[Linux][Hadoop] Running the WordCount Example

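Following on from the previous post: with Hadoop installed and running, it is time to run an example. The simplest and most direct one is WordCount, the "Hello World" of Hadoop.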

I followed this blog post: http://xiejianglei163.blog.163.com/blog/static/1247276201443152533684/

First create a folder containing two files. The location is up to you; the structure is as follows:

examples

--file1.txt

--file2.txt

The file contents can be anything; I copied a passage of English text from a news article.
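For readers who want to reproduce the setup quickly, here is a minimal sketch that creates the two input files. It assumes the folder is named ./data_input, because that is the path the upload command reads from (the structure listing calls the folder examples). The sample sentences are placeholders, not the news text used in this post.

```shell
# Assumption: the input folder is ./data_input, matching the -put command
# used in this post. Any other name works if the upload path is changed too.
input_dir=./data_input
mkdir -p "$input_dir"

# Placeholder contents (hypothetical); any text will do for WordCount.
printf 'Hello Hadoop\nHello WordCount\n' > "$input_dir/file1.txt"
printf 'Goodbye Hadoop\n' > "$input_dir/file2.txt"

# Show what was created.
ls "$input_dir"
```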

Run the following commands:

hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -mkdir /data    # create the /data folder in Hadoop to hold the input data; this is a folder in HDFS, not one under the Linux root directory

hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -put -f ./data_input/* /data # copy the two files created earlier (stored here under ./data_input) into /data
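Before starting the job, it can help to confirm that the upload worked and that the output path is free, since a MapReduce job refuses to start when its output directory already exists. The following is a rough sketch, not part of the original walkthrough. The helper name preflight is hypothetical, and the default install path is an assumption taken from the shell prompts in this post.

```shell
# Assumption: the install path shown in this post's prompts; override with HADOOP_HOME.
HADOOP_BIN="${HADOOP_HOME:-/usr/local/gz/hadoop-2.4.1}/bin/hadoop"

# preflight <hdfs-input-dir> <hdfs-output-dir>   (hypothetical helper)
# Returns 0 only if the input can be listed and the output path does not exist.
preflight() {
    if [ ! -x "$HADOOP_BIN" ]; then
        echo "hadoop binary not found at $HADOOP_BIN" >&2
        return 1
    fi
    # List the uploaded input files; fails if the directory is missing.
    "$HADOOP_BIN" fs -ls "$1" || return 1
    # 'fs -test -e' exits 0 when the path exists.
    if "$HADOOP_BIN" fs -test -e "$2"; then
        echo "$2 already exists; remove it first with: $HADOOP_BIN fs -rm -r $2" >&2
        return 1
    fi
    echo "ready to run"
}

preflight /data /output || echo "preflight did not pass"
```

If the check reports that /output already exists, for example after an earlier run, removing it with fs -rm -r lets the job be submitted again.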

Run the WordCount job and check the result:

hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount /data /output

14/07/22 22:34:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/07/22 22:34:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

14/07/22 22:34:29 INFO input.FileInputFormat: Total input paths to process : 2

14/07/22 22:34:29 INFO mapreduce.JobSubmitter: number of splits:2

14/07/22 22:34:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406038146260_0001

14/07/22 22:34:32 INFO impl.YarnClientImpl: Submitted application application_1406038146260_0001

14/07/22 22:34:32 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1406038146260_0001/

14/07/22 22:34:32 INFO mapreduce.Job: Running job: job_1406038146260_0001

14/07/22 22:34:58 INFO mapreduce.Job: Job job_1406038146260_0001 running in uber mode : false

14/07/22 22:34:58 INFO mapreduce.Job:  map 0% reduce 0%

14/07/22 22:35:34 INFO mapreduce.Job:  map 100% reduce 0%

14/07/22 22:35:52 INFO mapreduce.Job:  map 100% reduce 100%

14/07/22 22:35:52 INFO mapreduce.Job: Job job_1406038146260_0001 completed successfully

14/07/22 22:35:53 INFO mapreduce.Job: Counters: 49

        File System Counters

                FILE: Number of bytes read=2521

                FILE: Number of bytes written=283699

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

                HDFS: Number of bytes read=2280

                HDFS: Number of bytes written=1710

                HDFS: Number of read operations=9

                HDFS: Number of large read operations=0

                HDFS: Number of write operations=2

        Job Counters

                Launched map tasks=2

                Launched reduce tasks=1

                Data-local map tasks=2

                Total time spent by all maps in occupied slots (ms)=71182

                Total time spent by all reduces in occupied slots (ms)=13937

                Total time spent by all map tasks (ms)=71182

                Total time spent by all reduce tasks (ms)=13937

                Total vcore-seconds taken by all map tasks=71182

                Total vcore-seconds taken by all reduce tasks=13937

                Total megabyte-seconds taken by all map tasks=72890368

                Total megabyte-seconds taken by all reduce tasks=14271488

        Map-Reduce Framework

                Map input records=29

                Map output records=274

                Map output bytes=2814

                Map output materialized bytes=2527

                Input split bytes=202

                Combine input records=274

                Combine output records=195

                Reduce input groups=190

                Reduce shuffle bytes=2527

                Reduce input records=195

                Reduce output records=190

                Spilled Records=390

                Shuffled Maps =2

                Failed Shuffles=0

                Merged Map outputs=2

                GC time elapsed (ms)=847

                CPU time spent (ms)=6410

                Physical memory (bytes) snapshot=426119168

                Virtual memory (bytes) snapshot=1953292288

                Total committed heap usage (bytes)=256843776

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters

                Bytes Read=2078

        File Output Format Counters

                Bytes Written=1710

hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$

The log above shows the details of the WordCount run. Next, run the following command to view the word counts:

hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -cat /output/part-r-00000

14/07/22 22:38:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

"as     1

"atrocious,"    1

-       1

10-day  1

13      1

18      1

20,     1

2006.   1

3,000   1

432     1

65      1

7.4.52  1

:help   2

:help<Enter>    1

:q<Enter>       1

<F1>    1

Already,        1

Ban     1

Benjamin        1

Many more counts are omitted here. The WordCount run is complete.
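To sanity-check the counts without Hadoop, the same tally can be approximated locally with standard Unix tools. This is a sketch under a few assumptions: the example WordCount splits on whitespace only (which is why tokens such as "atrocious," keep their punctuation in the output above), the helper name local_wordcount is hypothetical, and the demo files contain placeholder text rather than the article's input.

```shell
# local_wordcount <dir>   (hypothetical helper)
# Prints "word<TAB>count" for every whitespace-separated token in <dir>/*.txt,
# sorted in byte order (LC_ALL=C), similar to the single reducer's output.
local_wordcount() {
    # tr puts one token per line, grep drops empty lines,
    # sort + uniq -c count duplicates, awk reorders to "word<TAB>count".
    cat "$1"/*.txt |
        tr -s '[:space:]' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        LC_ALL=C uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Demo on throwaway files with placeholder text.
demo_dir=$(mktemp -d)
printf 'hello hadoop\nhello world\n' > "$demo_dir/file1.txt"
printf 'hadoop says hello\n' > "$demo_dir/file2.txt"
local_wordcount "$demo_dir"
```

Running the helper on the real input folder and comparing its output with the contents of /output/part-r-00000 should show the same words and counts, provided the job ran with a single reducer as in the log above.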
