$ hdfs dfs <args>
$ hdfs dfs -help
$ wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=122PnuKaSaA_OyYOKnxQOdlMc5awdyf5v' -O shakespeare.txt
$ hdfs dfs -put shakespeare.txt shakespeare.txt
You may add the -f option to force overwriting of the destination file in the distributed file system, e.g.
$ hdfs dfs -put -f shakespeare.txt shakespeare.txt
$ hdfs dfs -mkdir corpora
$ hdfs dfs -cat shakespeare.txt | less
Use the arrow keys to navigate the file. Type q to quit.
$ hdfs dfs -get shakespeare.txt ./shakespeare-dfs.txt
$ hdfs dfs -chmod 664 shakespeare.txt
664 is the octal representation of the permission triples for owner, group, and other. The command above changes the permissions to -rw-rw-r--.
6 is 110, which means read and write, but not execute.
7 is 111, which means complete permissions.
4 is 100, which means read-only.
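The mapping from octal digits to rwx triples can be sketched in a few lines of Python; this is an illustration of the arithmetic only, not anything HDFS-specific:

```python
# Translate an octal mode string such as "664" into rwx notation,
# mirroring what `hdfs dfs -chmod 664 shakespeare.txt` sets.

def octal_to_rwx(mode: str) -> str:
    flags = "rwx"
    out = []
    for digit in mode:
        bits = int(digit, 8)  # e.g. 6 -> 0b110 (read + write, no execute)
        out.append("".join(
            flags[i] if bits & (4 >> i) else "-"  # test the r, w, x bits in turn
            for i in range(3)
        ))
    return "".join(out)

print(octal_to_rwx("664"))  # rw-rw-r--
print(octal_to_rwx("750"))  # rwxr-x---
```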
$ hdfs dfs -ls
$ hdfs dfs -ls /user
$ hdfs dfs -mkdir testHDFS
$ hdfs dfs -ls /user/hduser
$ echo "HDFS test file" >> testFile
$ ls
$ cat testFile
$ hdfs dfs -copyFromLocal testFile
To copy files from your local machine to HDFS, use the -copyFromLocal command; the -cp command only copies files within HDFS. For more options and flexibility when copying files or directories to a destination within HDFS, consider the alternative command shown below:
hdfs dfs -put <Linux local file system> <distributed file system>
$ hdfs dfs -ls
$ hdfs dfs -cat testFile
$ hdfs dfs -mv testFile testHDFS
$ hdfs dfs -ls
$ hdfs dfs -ls testHDFS/
The first command moves testFile from the HDFS home directory into the test directory you created. The second command shows that it is no longer in the HDFS home directory, and the third confirms that it is now in the testHDFS directory.
$ hdfs dfs -cp testHDFS/testFile testHDFS/testFile2
$ hdfs dfs -ls testHDFS/
$ hdfs dfs -du
$ hdfs dfs -df
$ hdfs dfs -rm testHDFS/testFile
$ hdfs dfs -ls testHDFS/
$ hdfs dfs -rm -r testHDFS
$ hdfs dfs -ls
In addition to the above commands, there are a number of POSIX-like commands (https://en.wikipedia.org/wiki/List_of_POSIX_commands), including chgrp, chown, cp, du, mkdir, stat, and tail.
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
Read more on MapReduce at https://hadoop.apache.org/docs/r3.3.6/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html and https://en.wikipedia.org/wiki/MapReduce
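The dataflow above can be illustrated with a small in-memory word count. This is only a sketch of the map, shuffle/sort, and reduce phases; real MapReduce distributes each phase across the cluster:

```python
from itertools import groupby

def map_phase(offset, line):
    # map: <offset, line> -> <word, 1>
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # reduce (a combiner would run the same logic per mapper): <word, [1,1,...]> -> <word, n>
    yield word, sum(counts)

lines = ["to be or not to be", "that is the question"]

# map
pairs = [kv for i, line in enumerate(lines) for kv in map_phase(i, line)]
# shuffle/sort: bring equal keys together, as the framework does between phases
pairs.sort(key=lambda kv: kv[0])
grouped = [(k, [v for _, v in g]) for k, g in groupby(pairs, key=lambda kv: kv[0])]
# reduce
result = dict(kv for k, vs in grouped for kv in reduce_phase(k, vs))

print(result["to"], result["be"])  # 2 2
```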
$ sudo cp -r /mnt/c/de/WordCount /home/hduser
$ sudo chown hduser:hduser -R /home/hduser/WordCount
$ cd WordCount
$ cat ~/WordCount/WordCount.java
$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
$ ls ~/WordCount
$ hdfs dfs -ls /user/hduser
$ hadoop jar wc.jar WordCount shakespeare.txt wordcounts
If shakespeare.txt is not yet in HDFS, you may copy it from the local Linux file system to the distributed file system using the following command:
$ hdfs dfs -put ~/WordCount/shakespeare.txt /user/hduser
$ hdfs dfs -cat wordcounts/part-r-00000 | less
Exit the pager by typing q.
$ mapred job -list
$ sudo cp -r /mnt/c/de/StreamingOn-time /home/hduser
$ sudo chown hduser:hduser -R /home/hduser/StreamingOn-time
$ cd StreamingOn-time
$ cat flights.csv | ./mapper.py | sort | ./reducer.py
If you observe the error /usr/bin/env: ‘python’: No such file or directory, issue the following commands and then re-execute the same pipeline:
$ sudo apt update
$ sudo apt install python-is-python3
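The local pipeline emulates Hadoop Streaming: the mapper writes tab-separated key/value lines to stdout, sort plays the role of the shuffle, and the reducer aggregates consecutive lines sharing a key. A minimal sketch of that contract follows; the carrier and delay fields are assumptions for illustration, since the actual column layout of flights.csv (and the real logic of mapper.py/reducer.py) is not shown here:

```python
# Sketch of the Hadoop Streaming key/value contract behind
# `cat flights.csv | ./mapper.py | sort | ./reducer.py`.
# The two-column input (carrier, delay) is hypothetical.

def mapper(csv_lines):
    # emit "carrier<TAB>delay" for each record
    for line in csv_lines:
        carrier, delay = line.split(",")
        yield f"{carrier}\t{delay}"

def reducer(sorted_kv_lines):
    # average the delays for each run of identical keys,
    # relying on the input being sorted by key
    current, total, count = None, 0.0, 0
    for line in sorted_kv_lines:
        key, value = line.split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total / count}"
            current, total, count = key, 0.0, 0
        total += float(value)
        count += 1
    if current is not None:
        yield f"{current}\t{total / count}"

records = ["AA,10", "DL,5", "AA,20", "DL,15"]
shuffled = sorted(mapper(records))  # `sort` stands in for the shuffle
print(list(reducer(shuffled)))      # ['AA\t15.0', 'DL\t10.0']
```

Because the reducer only compares adjacent keys, the sort step is essential, exactly as in the real pipeline.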
$ hdfs dfs -mkdir /user/hduser/StreamingOn-time
$ hdfs dfs -ls /user/hduser
$ hdfs dfs -ls /user/hduser/StreamingOn-time
$ hdfs dfs -put /home/hduser/StreamingOn-time/flights.csv /user/hduser/StreamingOn-time
$ hdfs dfs -ls /user/hduser/StreamingOn-time
$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar -input StreamingOn-time/flights.csv -output StreamingOn-time/average_delay -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
$ hdfs dfs -ls /user/hduser/StreamingOn-time
$ hdfs dfs -ls /user/hduser/StreamingOn-time/average_delay
$ hdfs dfs -copyToLocal /user/hduser/StreamingOn-time/average_delay
$ ls /home/hduser/StreamingOn-time/average_delay
$ head /home/hduser/StreamingOn-time/average_delay/part-00000