When doing MapReduce development you inevitably run into problems. This post records some of those problems and how they were solved.
1. ResourceManager cannot be found
The MapReduce client code was packaged into a jar and submitted to a server where the Hadoop cluster is deployed; running it produced the following error:
17/08/08 10:02:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/08 10:02:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/08/08 10:02:25 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:26 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:27 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:28 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:29 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Analysis:
The key line in this output is "Connecting to ResourceManager at /0.0.0.0:8032": the client is trying to reach the ResourceManager, and the port is 8032. In Hadoop 2.x, resources are managed by YARN, which also follows a master/slave architecture: once YARN is started, the master node runs a ResourceManager process and the slave nodes run NodeManager processes. When configuring YARN you normally set yarn.resourcemanager.address, the address the ResourceManager exposes to clients; clients use it to submit applications, kill applications, and so on. Its default value is ${yarn.resourcemanager.hostname}:8032, which matches the port 8032 seen here. The repeated attempts to connect to 0.0.0.0/0.0.0.0:8032 in the log confirm that the client is trying to reach the ResourceManager. At this point the cause is clear: either the ResourceManager process is not running or YARN has not been configured at all. Checking yarn-site.xml and yarn-env.sh showed that neither file had been configured.
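Before changing anything, this diagnosis can be confirmed on the cluster in a few seconds. A minimal sketch, assuming the Hadoop installation lives under /opt/package/hadoop-2.7.2 (the path that also appears in yarn-env.sh below):

# On node1: if YARN were running, this would show a ResourceManager process
jps | grep ResourceManager

# Check whether yarn-site.xml defines any ResourceManager address;
# if nothing is set, the client falls back to the default 0.0.0.0:8032
grep -A 1 "yarn.resourcemanager" /opt/package/hadoop-2.7.2/etc/hadoop/yarn-site.xml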
Solution:
Configure yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:8088</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
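One point these steps assume: yarn-site.xml has to be the same on every node, not only on node1 where the ResourceManager runs. A hedged sketch of pushing it out, using the installation path from yarn-env.sh below and the node names used later in this post:

# Copy the updated yarn-site.xml to the other nodes (hostnames and path are this cluster's)
scp /opt/package/hadoop-2.7.2/etc/hadoop/yarn-site.xml node2:/opt/package/hadoop-2.7.2/etc/hadoop/
scp /opt/package/hadoop-2.7.2/etc/hadoop/yarn-site.xml node3:/opt/package/hadoop-2.7.2/etc/hadoop/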
Configure yarn-env.sh
Add the following entries:
export JAVA_HOME=/opt/package/jdk1.7.0_76
export YARN_LOG_DIR=/opt/package/hadoop-2.7.2/logs
export YARN_ROOT_LOGGER=DEBUG
Restart the cluster, and during startup run start-yarn.sh on node1. This brings up a ResourceManager process on node1 and NodeManager processes on node2 and node3. Running the client MapReduce code again then completes normally.
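For reference, one way to verify the fix after the restart; the jar name, driver class, and HDFS paths below are placeholders for whatever your own client code uses:

# On node1: a ResourceManager process should now be listed
jps

# The NodeManagers on node2 and node3 should have registered with the ResourceManager
yarn node -list

# Re-submit the MapReduce job (jar, class, and paths are placeholders)
hadoop jar my-mapreduce-job.jar com.example.MyDriver /input /output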
In fact, starting and stopping a Hadoop HA cluster has to follow a specific order; I will cover that startup and shutdown order in a dedicated article later.