Basic HDFS Commands and Running a Hadoop MapReduce Program

Date: 2024-04-29 23:50:42

  I. Basic HDFS commands

  1. Create a directory: -mkdir

[jun@master ~]$ hadoop fs -mkdir /test
[jun@master ~]$ hadoop fs -mkdir /test/input
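
  The two -mkdir calls above can be collapsed into one with the -p option, which also creates missing parent directories. The following is a minimal sketch, not part of the original session: the function name hdfs_mkdir_p and the HADOOP_BIN variable are hypothetical, introduced only so the command can be dry-run with a stand-in such as echo.

```shell
# Hypothetical helper: create an HDFS directory together with any missing parents.
# HADOOP_BIN is an assumed override for dry runs; it defaults to the real hadoop command.
hdfs_mkdir_p() {
  # -p creates parent directories as needed and does not fail if the path exists.
  "${HADOOP_BIN:-hadoop}" fs -mkdir -p "$1"
}
```

  Calling hdfs_mkdir_p /test/input would then create /test and /test/input in a single step.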

  2. List files: -ls

[jun@master ~]$ hadoop fs -ls /
Found items
drwxr-xr-x - jun supergroup -- : /test
[jun@master ~]$ hadoop fs -ls /test
Found items
drwxr-xr-x - jun supergroup -- : /test/input

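  3. Upload files to HDFS

  First create two files, jun.dat and jun.txt, in the local directory /home/jun.

  (1) Use -put to copy a file from the local file system to the HDFS cluster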

[jun@master ~]$ hadoop fs -put /home/jun/jun.dat /test/input/jun.dat

  (2) Use -copyFromLocal to copy a file from the local file system to the HDFS cluster

[jun@master ~]$ hadoop fs -copyFromLocal -f /home/jun/jun.txt  /test/input/jun.txt

  (3) Check that the files were copied

[jun@master ~]$ hadoop fs -ls /test/input
Found items
-rw-r--r-- jun supergroup -- : /test/input/jun.dat
-rw-r--r-- jun supergroup -- : /test/input/jun.txt
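
  Instead of reading the -ls listing by eye, the check can be scripted with hadoop fs -test -e, which exits with status 0 only when the path exists. This is a minimal sketch under assumptions: the function name hdfs_upload_checked and the HADOOP_BIN override are hypothetical and do not appear in the original session.

```shell
# Hypothetical helper: upload a local file and confirm that it exists in HDFS.
# HADOOP_BIN is an assumed override for dry runs; it defaults to the real hadoop command.
hdfs_upload_checked() {
  src="$1"
  dst="$2"
  hadoop_bin="${HADOOP_BIN:-hadoop}"

  # -f overwrites the destination if it already exists.
  "$hadoop_bin" fs -put -f "$src" "$dst" || return 1

  # -test -e returns exit status 0 only when the path exists in HDFS.
  if "$hadoop_bin" fs -test -e "$dst"; then
    echo "uploaded: $dst"
  else
    echo "missing: $dst" >&2
    return 1
  fi
}
```

  For example, hdfs_upload_checked /home/jun/jun.dat /test/input/jun.dat would upload the file and report whether it is present afterwards.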

  4. Download files to the local file system

  (1) Use -get to copy a file from the HDFS cluster to the local file system

[jun@master ~]$ hadoop fs -get /test/input/jun.dat /home/jun/jun1.dat

  (2) Use -copyToLocal to copy a file from the HDFS cluster to the local file system

[jun@master ~]$ hadoop fs -copyToLocal /test/input/jun.txt /home/jun/jun1.txt

  (3) Check that the files were copied

[jun@master ~]$ ls -l /home/jun/
total
drwxr-xr-x. jun jun Jul : Desktop
drwxr-xr-x. jun jun Jul : Documents
drwxr-xr-x. jun jun Jul : Downloads
drwxr-xr-x. jun jun Jul : hadoop
drwxrwxr-x. jun jun Jul : hadoopdata
-rw-r--r--. jun jun Jul : jun1.dat
-rw-r--r--. jun jun Jul : jun1.txt
-rw-rw-r--. jun jun Jul : jun.dat
-rw-rw-r--. jun jun Jul : jun.txt
drwxr-xr-x. jun jun Jul : Music
drwxr-xr-x. jun jun Jul : Pictures
drwxr-xr-x. jun jun Jul : Public
drwxr-xr-x. jun jun Jul : Resources
drwxr-xr-x. jun jun Jul : Templates
drwxr-xr-x. jun jun Jul : Videos

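  5. View files in the HDFS cluster

  -cat prints the file as-is, -text additionally decodes formats such as compressed files and SequenceFiles, and -tail prints only the last kilobyte of the file. For a short plain-text file all three produce the same output.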

[jun@master ~]$ hadoop fs -cat /test/input/jun.txt
This is the txt file.
[jun@master ~]$ hadoop fs -text /test/input/jun.txt
This is the txt file.
[jun@master ~]$ hadoop fs -tail /test/input/jun.txt
This is the txt file.

  6. Delete a file from HDFS

[jun@master ~]$ hadoop fs -rm /test/input/jun.txt
Deleted /test/input/jun.txt
[jun@master ~]$ hadoop fs -ls /test/input
Found items
-rw-r--r-- jun supergroup -- : /test/input/jun.dat

  7. The same commands can also be run on a slave node

[jun@slave0 ~]$ hadoop fs -ls /test/input
Found items
-rw-r--r-- jun supergroup -- : /test/input/jun.dat

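  II. Running a program on the Hadoop cluster

  The Hadoop distribution ships with a JAR of MapReduce example programs, one of which estimates the value of pi.

  Arguments: pi (the name of the example program to run), 10 (the number of map tasks), 10 (the number of sample points generated per map task)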

[jun@master ~]$ hadoop jar /home/jun/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8..jar pi 10 10
Number of Maps =
Samples per Map =
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Wrote input for Map #
Starting Job
// :: INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.100:
// :: INFO input.FileInputFormat: Total input files to process :
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532226440522_0001
// :: INFO impl.YarnClientImpl: Submitted application application_1532226440522_0001
// :: INFO mapreduce.Job: The url to track the job: http://master:18088/proxy/application_1532226440522_0001/
// :: INFO mapreduce.Job: Running job: job_1532226440522_0001
// :: INFO mapreduce.Job: Job job_1532226440522_0001 running in uber mode : false
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Job job_1532226440522_0001 completed successfully
// :: INFO mapreduce.Job: Counters:
File System Counters
FILE: Number of bytes read=
FILE: Number of bytes written=
FILE: Number of read operations=
FILE: Number of large read operations=
FILE: Number of write operations=
HDFS: Number of bytes read=
HDFS: Number of bytes written=
HDFS: Number of read operations=
HDFS: Number of large read operations=
HDFS: Number of write operations=
Job Counters
Launched map tasks=
Launched reduce tasks=
Data-local map tasks=
Total time spent by all maps in occupied slots (ms)=
Total time spent by all reduces in occupied slots (ms)=
Total time spent by all map tasks (ms)=
Total time spent by all reduce tasks (ms)=
Total vcore-milliseconds taken by all map tasks=
Total vcore-milliseconds taken by all reduce tasks=
Total megabyte-milliseconds taken by all map tasks=
Total megabyte-milliseconds taken by all reduce tasks=
Map-Reduce Framework
Map input records=
Map output records=
Map output bytes=
Map output materialized bytes=
Input split bytes=
Combine input records=
Combine output records=
Reduce input groups=
Reduce shuffle bytes=
Reduce input records=
Reduce output records=
Spilled Records=
Shuffled Maps =
Failed Shuffles=
Merged Map outputs=
GC time elapsed (ms)=
CPU time spent (ms)=
Physical memory (bytes) snapshot=
Virtual memory (bytes) snapshot=
Total committed heap usage (bytes)=
Shuffle Errors
BAD_ID=
CONNECTION=
IO_ERROR=
WRONG_LENGTH=
WRONG_MAP=
WRONG_REDUCE=
File Input Format Counters
Bytes Read=
File Output Format Counters
Bytes Written=
Job Finished in 88.689 seconds
Estimated value of Pi is 3.20000000000000000000

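  The last line of the output shows the estimate, approximately 3.2. With only 10 x 10 = 100 sample points the estimate is coarse; larger arguments bring it closer to 3.14159 at the cost of a longer run.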
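
  To get a feel for how the number of sample points affects the estimate, here is a minimal local sketch that needs no cluster. It is an assumption-laden stand-in, not the code Hadoop runs: the bundled example uses a quasi-Monte Carlo method, whereas this sketch uses plain pseudo-random Monte Carlo through awk's rand(). The function name estimate_pi and its seed argument are hypothetical.

```shell
# Hypothetical helper: estimate pi from pseudo-random points in the unit square.
# Plain Monte Carlo, swapped in for illustration; not the method of the Hadoop example.
# Usage: estimate_pi SAMPLES [SEED]
estimate_pi() {
  awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
    srand(seed)
    inside = 0
    for (i = 0; i < n; i++) {
      x = rand()
      y = rand()
      # Count the points that fall inside the quarter circle of radius 1.
      if (x * x + y * y <= 1) inside++
    }
    # Area ratio (quarter circle / unit square) is pi/4, so scale by 4.
    printf "%.4f\n", 4 * inside / n
  }'
}

estimate_pi 100        # as many points as 10 maps x 10 samples
estimate_pi 1000000    # far more points, typically much closer to pi
```

  The fraction of points landing inside the quarter circle approximates pi/4, so the estimate tightens as the sample count grows. On the cluster the same effect is obtained by passing larger values for the two numeric arguments.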