Running the Hadoop WordCount program from the command line: fixing compilation and packaging errors

Date: 2021-10-10 17:43:06

I ran a Hadoop program from the command line and hit quite a few errors along the way. Here they are, with the fixes.


Put WordCount.java in the Hadoop installation directory. Mine is /home/administrator/hadoop-0.20.2/. In that directory, create an input directory named input containing the input files file01.txt and file02.txt.

Contents of file01.txt:

hello hadoop1

hello hadoop2
hello hadoop3
hello hadoop2
hello hadoop1
hello hadoop5
hello hadoop5
hello hadoop5
hello world1
hello world1
hello world2
hell word

Contents of file02.txt:

hello world1

hello world2
hello world2
hello world2
hello world1
hello hadoop5
hello hadoop5
hell word
hell word
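It helps to know what the job should output before running anything. The sketch below computes that output locally in plain Java, without Hadoop. It assumes the program follows the stock Hadoop WordCount example: the class names TokenizerMapper and IntSumReducer in the jar listing further down suggest it does. In that example the mapper splits each line on whitespace with StringTokenizer, and the reducer sums the counts. The class name LocalWordCount is hypothetical. The data is copied from the two listings above, including the one empty line in each file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local check; not part of the original WordCount program.
class LocalWordCount {

    // Contents of the two input files as listed above (one empty line each).
    static final List<String> FILE01 = Arrays.asList(
            "hello hadoop1", "", "hello hadoop2", "hello hadoop3", "hello hadoop2",
            "hello hadoop1", "hello hadoop5", "hello hadoop5", "hello hadoop5",
            "hello world1", "hello world1", "hello world2", "hell word");
    static final List<String> FILE02 = Arrays.asList(
            "hello world1", "", "hello world2", "hello world2", "hello world2",
            "hello world1", "hello hadoop5", "hello hadoop5", "hell word", "hell word");

    // Splits each line on whitespace (as the stock mapper's StringTokenizer does)
    // and sums occurrences per word; TreeMap keeps the words sorted.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>(FILE01);
        all.addAll(FILE02);
        count(all).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Run with `javac LocalWordCount.java && java LocalWordCount`. It prints hadoop1 2, hadoop2 2, hadoop3 1, hadoop5 5, hell 3, hello 18, word 3, world1 4 and world2 4, one per line. This matches the real job's output at the end of this post.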


Create the input directory on the cluster:
bin/hadoop fs -mkdir wordcount_input


Upload the files from the local input directory:

bin/hadoop fs -put /home/administrator/input/file01.txt wordcount_input
bin/hadoop fs -put /home/administrator/input/file02.txt wordcount_input

Compiling WordCount.java
My first attempt failed with a compilation error:

administrator@Master:/$ javac -classpath /home/administrator/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/administrator/hadoop-0.20.2/WordCount.java -d /home/administrator/hadoop-0.20.2/WordCount

Error message:

/home/administrator/hadoop-0.20.2/WordCount.java:46: cannot access org.apache.commons.cli.Options
class file for org.apache.commons.cli.Options not found
    String[] otherArgs = new GenericOptionsParser(conf, arg).getRemainingArgs();
                         ^
1 error
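javac reports this error because GenericOptionsParser, which is in the Hadoop core jar, refers to classes from Apache Commons CLI, and those classes live in a separate jar. You can check whether a classpath is complete with a small program that tries to load the classes by name. ClasspathCheck below is a hypothetical diagnostic, not part of Hadoop. It tests runtime class loading rather than compilation, but both use the same -classpath list of jars.

```java
// Hypothetical diagnostic: run it with the same -classpath you give javac.
class ClasspathCheck {

    // Returns true if the named class can be found on the current classpath.
    // initialize=false: only locate the class, do not run its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] needed = {
            "org.apache.hadoop.util.GenericOptionsParser", // from hadoop-0.20.2-core.jar
            "org.apache.commons.cli.Options"               // from lib/commons-cli-1.2.jar
        };
        for (String name : needed) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": MISSING"));
        }
    }
}
```

Suppose you run `java -cp .:/home/administrator/hadoop-0.20.2/hadoop-0.20.2-core.jar ClasspathCheck`. It should report the Commons CLI class as MISSING until lib/commons-cli-1.2.jar is appended to the list.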

The fix is to add one more jar to the classpath. GenericOptionsParser depends on Apache Commons CLI, which Hadoop ships separately as lib/commons-cli-1.2.jar. With that jar added, compilation succeeds:
administrator@Master:/$ javac -classpath /home/administrator/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/administrator/hadoop-0.20.2/lib/commons-cli-1.2.jar /home/administrator/hadoop-0.20.2/WordCount.java -d /home/administrator/hadoop-0.20.2/WordCount

The equivalent command on Hadoop 1.0.4:
ubuntu@ubuntu:~/dev/wordcount$ javac -classpath /home/ubuntu/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/ubuntu/hadoop-1.0.4/lib/commons-cli-1.2.jar -d bin WordCount.java


Packaging error:

administrator@Master:/$ jar -cvf wordcount.jar -C /home/administrator/hadoop-0.20.2/WordCount/v1/*.class

Error message:

java.io.FileNotFoundException: wordcount.jar (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:99)
    at sun.tools.jar.Main.run(Main.java:187)
    at sun.tools.jar.Main.main(Main.java:1167)
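`jar -cvf NAME` creates the archive relative to the current directory, so the current user must be able to write to that directory. The sketch below is a hypothetical pre-flight check for this kind of Permission denied error. JarTargetCheck is an illustrative name; the JDK provides no such tool.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check; not part of the JDK jar tool.
class JarTargetCheck {

    // A relative name like "wordcount.jar" resolves against the current
    // directory, so that directory must exist and be writable.
    static boolean canCreate(Path jarFile) {
        Path parent = jarFile.toAbsolutePath().getParent();
        return parent != null && Files.isDirectory(parent) && Files.isWritable(parent);
    }

    public static void main(String[] args) {
        Path target = Paths.get(args.length > 0 ? args[0] : "wordcount.jar");
        System.out.println(target.toAbsolutePath()
                + (canCreate(target) ? ": can be created" : ": cannot be created here"));
    }
}
```

Started from / as an ordinary user, the check should report that /wordcount.jar cannot be created. Started from ~/hadoop-0.20.2, it should report that the file can be created.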

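The fix: the prompt administrator@Master:/$ shows that the command was run from the root directory /. An ordinary user cannot write to /, so jar could not create wordcount.jar there. The -C option was also misused. It expects a directory, followed by the files to add from that directory. The jar usage text shows the correct forms:
Example 1: to archive two class files into an archive called classes.jar:
       jar cvf classes.jar Foo.class Bar.class
Example 2: use an existing manifest file "mymanifest" and
           archive all the files in the foo/ directory into "classes.jar":
       jar cvfm classes.jar mymanifest -C foo/ .

Running the command from ~/hadoop-0.20.2 with -C WordCount . packages the classes successfully: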
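administrator@Master:~/hadoop-0.20.2$ jar -cvf WordCount.jar -C WordCount .
(Note: the trailing dot is required. It tells jar to add everything under WordCount, which keeps the package directory v1/ in the entry names. My WordCount program is declared in a package.)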
added manifest
adding: v1/(in = 0) (out= 0)(stored 0%)
adding: v1/WordCount$TokenizerMapper.class(in = 1852) (out= 770)(deflated 58%)
adding: v1/WordCount$IntSumReducer.class(in = 1741) (out= 741)(deflated 57%)
adding: v1/WordCount.class(in = 1839) (out= 993)(deflated 46%)
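To show what `-C WordCount .` does, here is a rough plain-Java equivalent built on java.util.jar.JarOutputStream. It illustrates the idea and does not reproduce how the jar tool is implemented. It assumes the compiled classes sit in WordCount/v1/, as in the listing above. JarFromDir is a hypothetical name. Unlike the real tool, this sketch writes no separate directory entries such as v1/.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of "jar -cvf WordCount.jar -C WordCount .":
// every file under dir is stored under its path relative to dir.
class JarFromDir {

    static List<String> pack(Path dir, Path jarFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        List<String> names = new ArrayList<>();
        try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(jarFile), manifest)) {
            for (Path file : files) {
                // Entry names use '/' and keep the package directory, e.g. v1/WordCount.class
                String name = dir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout from this post: compiled classes under WordCount/v1/
        for (String name : pack(Paths.get("WordCount"), Paths.get("WordCount.jar"))) {
            System.out.println("adding: " + name);
        }
    }
}
```

The key step is dir.relativize(file). Because of it, entry names start at the package directory (v1/WordCount.class), which is where a class loader looks for the class v1.WordCount inside the jar.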


The job ran but produced no results:
administrator@Master:~/hadoop-0.20.2$ bin/hadoop jar wordcount.jar v1/WordCount wordcount_input wordcount_output
13/07/06 11:03:06 INFO input.FileInputFormat: Total input paths to process : 0
13/07/06 11:03:07 INFO mapred.JobClient: Running job: job_201307061012_0002
13/07/06 11:03:08 INFO mapred.JobClient:  map 0% reduce 0%
13/07/06 11:03:27 INFO mapred.JobClient:  map 0% reduce 100%
13/07/06 11:03:29 INFO mapred.JobClient: Job complete: job_201307061012_0002
13/07/06 11:03:29 INFO mapred.JobClient: Counters: 8
13/07/06 11:03:29 INFO mapred.JobClient:   Job Counters
13/07/06 11:03:29 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/06 11:03:29 INFO mapred.JobClient:   Map-Reduce Framework
13/07/06 11:03:29 INFO mapred.JobClient:     Reduce input groups=0
13/07/06 11:03:29 INFO mapred.JobClient:     Combine output records=0
13/07/06 11:03:29 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/07/06 11:03:29 INFO mapred.JobClient:     Reduce output records=0
13/07/06 11:03:29 INFO mapred.JobClient:     Spilled Records=0
13/07/06 11:03:29 INFO mapred.JobClient:     Combine input records=0
13/07/06 11:03:29 INFO mapred.JobClient:     Reduce input records=0
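Note: I later found that my WordCount.java contained an extra hard-coded line, String[] arg = { "hdfs://localhost:9000/user/administrator/input", "hdfs://localhost:9000/user/administrator/output" };, and passed arg to GenericOptionsParser (visible in the compile error above). The input and output directories are supposed to come from the command line. Instead, the program ignored wordcount_input and apparently read the hard-coded input directory, which held no files; hence "Total input paths to process : 0". Removing that line and using main's arguments instead fixed it.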
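Here is a sketch of the corrected argument handling. It assumes the driver follows the stock Hadoop 0.20 WordCount example. The Hadoop calls appear as comments so the snippet compiles with the plain JDK, and WordCountArgs and inputAndOutput are hypothetical names. The point is that GenericOptionsParser must receive main's args, so that the paths given on the command line are the ones actually used.

```java
import java.util.Arrays;

// Hypothetical sketch of the fix; Hadoop-specific lines are shown as comments.
class WordCountArgs {

    // Returns {inputDir, outputDir} from the command-line arguments, or null if
    // there are not exactly two. Nothing is hard-coded here.
    static String[] inputAndOutput(String[] remainingArgs) {
        if (remainingArgs.length != 2) {
            return null;
        }
        return Arrays.copyOf(remainingArgs, 2);
    }

    public static void main(String[] args) {
        // In the real driver:
        //   Configuration conf = new Configuration();
        //   String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        // i.e. pass main's args, not a hard-coded array.
        String[] otherArgs = args;
        String[] paths = inputAndOutput(otherArgs);
        if (paths == null) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
            return;
        }
        // Real driver: FileInputFormat.addInputPath(job, new Path(paths[0]));
        //              FileOutputFormat.setOutputPath(job, new Path(paths[1]));
        System.out.println("input=" + paths[0] + " output=" + paths[1]);
    }
}
```

With the arguments from the successful run below (wordcount_input wordcount_output), the job reads wordcount_input and writes wordcount_output. Both paths are relative to the user's HDFS home directory.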

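Successful run:

administrator@Master:~/hadoop-0.20.2$ bin/hadoop jar WordCount.jar v1/WordCount wordcount_input wordcount_output

v1/WordCount is the main class; don't forget the package name v1. I made exactly this mistake at first, mainly because I didn't know how to run a jar file. The basics really are worth learning properly.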

13/07/06 11:20:49 INFO input.FileInputFormat: Total input paths to process : 2
13/07/06 11:20:50 INFO mapred.JobClient: Running job: job_201307061012_0003
13/07/06 11:20:51 INFO mapred.JobClient:  map 0% reduce 0%
13/07/06 11:21:05 INFO mapred.JobClient:  map 50% reduce 0%
13/07/06 11:21:08 INFO mapred.JobClient:  map 100% reduce 0%
13/07/06 11:21:14 INFO mapred.JobClient:  map 100% reduce 16%
13/07/06 11:21:20 INFO mapred.JobClient:  map 100% reduce 100%
13/07/06 11:21:25 INFO mapred.JobClient: Job complete: job_201307061012_0003
13/07/06 11:21:25 INFO mapred.JobClient: Counters: 17
13/07/06 11:21:25 INFO mapred.JobClient:   Job Counters
13/07/06 11:21:25 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/06 11:21:25 INFO mapred.JobClient:     Launched map tasks=2
13/07/06 11:21:25 INFO mapred.JobClient:     Data-local map tasks=2
13/07/06 11:21:25 INFO mapred.JobClient:   FileSystemCounters
13/07/06 11:21:25 INFO mapred.JobClient:     FILE_BYTES_READ=196
13/07/06 11:21:25 INFO mapred.JobClient:     HDFS_BYTES_READ=276
13/07/06 11:21:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=462
13/07/06 11:21:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=81
13/07/06 11:21:25 INFO mapred.JobClient:   Map-Reduce Framework
13/07/06 11:21:25 INFO mapred.JobClient:     Reduce input groups=9
13/07/06 11:21:25 INFO mapred.JobClient:     Combine output records=15
13/07/06 11:21:25 INFO mapred.JobClient:     Map input records=23
13/07/06 11:21:25 INFO mapred.JobClient:     Reduce shuffle bytes=80
13/07/06 11:21:25 INFO mapred.JobClient:     Reduce output records=9
13/07/06 11:21:25 INFO mapred.JobClient:     Spilled Records=30
13/07/06 11:21:25 INFO mapred.JobClient:     Map output bytes=442
13/07/06 11:21:25 INFO mapred.JobClient:     Combine input records=42
13/07/06 11:21:25 INFO mapred.JobClient:     Map output records=42
13/07/06 11:21:25 INFO mapred.JobClient:     Reduce input records=15
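A class declared in a package can only be loaded by its fully qualified name, which here is v1.WordCount. The sketch below uses a JDK class as a stand-in, since v1.WordCount is not on a plain classpath. As far as I can tell, Hadoop's RunJar launcher converts / to . in the class name, which would explain why v1/WordCount is accepted; treat that as an assumption. MainClassName is a hypothetical name.

```java
// Hypothetical illustration (plain JDK, no Hadoop): package-qualified class names.
class MainClassName {

    // Turns a path-style name such as "v1/WordCount" into "v1.WordCount".
    static String toBinaryName(String name) {
        return name.replace('/', '.');
    }

    // Returns true if the class can be loaded under the given name.
    static boolean loads(String name) {
        try {
            Class.forName(toBinaryName(name));
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList stands in for v1.WordCount.
        System.out.println(loads("ArrayList"));           // false: package name missing
        System.out.println(loads("java/util/ArrayList")); // true
        System.out.println(loads("java.util.ArrayList")); // true
    }
}
```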

Viewing the results. The first attempt used the wrong path; the job wrote to wordcount_output:
administrator@Master:~/hadoop-0.20.2$ bin/hadoop fs -cat wordcount/part-r-00000
cat: File does not exist: wordcount/part-r-00000
administrator@Master:~/hadoop-0.20.2$ bin/hadoop fs -cat wordcount_output/part-r-00000
hadoop1    2
hadoop2    2
hadoop3    1
hadoop5    5
hell    3
hello    18
word    3
world1    4
world2    4
administrator@Master:~/hadoop-0.20.2$
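You can check these results, and the counters in the successful job log, by hand against the input files.

* Two map tasks handled the two input paths (Launched map tasks=2), one per file.
* The combiner ran on each map's output (Combine input records=42) and emitted one record per distinct word in that file: 9 in file01.txt and 6 in file02.txt.
* The listings above have 12 and 9 non-empty lines, plus one empty line each, and every non-empty line holds two words.

```latex
\begin{aligned}
\text{Map input records} &= (12 + 1) + (9 + 1) = 23 \\
\text{Map output records} &= \text{Combine input records} = 2 \times (12 + 9) = 42 \\
\text{Combine output records} &= \text{Reduce input records} = 9 + 6 = 15 \\
\text{Reduce input groups} &= \text{Reduce output records} = 9 \\
2 + 2 + 1 + 5 + 3 + 18 + 3 + 4 + 4 &= 42
\end{aligned}
```

The last line sums the final per-word counts. The total equals the number of words the mappers emitted, as it should.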