1. The Problem
An MR job run on YARN fails with the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:336)
This problem has come up on the hadoop-mapreduce-user mailing list (http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201207.mbox/browser), but the discussion there does not go very deep.
2. Analysis
This is obviously a class-loading failure, so the first thing to do is find out where the class lives. It is in hadoop-mapreduce-client-app-2.0.0-alpha.jar, under $HADOOP_HOME/share/hadoop/mapreduce (that is the layout in the 2.0 release; I expect the path may be adjusted in later versions).
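A quick way to confirm which jar actually ships the class is to scan the MapReduce jars directly (a minimal sketch; it assumes the stock 2.0.0-alpha directory layout and that the JDK jar tool is on the PATH):

# Print every MapReduce jar that contains the missing MRAppMaster class.
for j in $HADOOP_HOME/share/hadoop/mapreduce/*.jar; do
  if jar tf "$j" | grep -q 'org/apache/hadoop/mapreduce/v2/app/MRAppMaster.class'; then
    echo "$j"
  fi
done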
My guess was that this is a classpath problem, so I wanted to get hold of the exact parameters used when the container is launched.
We know containers are launched via a shell command. By debugging ContainerLaunch.java I eventually found the launch parameters (the script below is ultimately written to a file such as /tmp/nm-local-dir/nmPrivate/application_1350793073454_0005/container_1350793073454_0005_01_000001/launch_container.sh):
#!/bin/bash
export YARN_LOCAL_DIRS="/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003"
export NM_HTTP_PORT="8042"
export JAVA_HOME="/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre"
export NM_HOST="hd19-vm4.yunti.yh.aliyun.com"
export CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$YARN_HOME/share/hadoop/mapreduce/*:$YARN_HOME/share/hadoop/mapreduce/lib/*:job.jar:$PWD/*"
export HADOOP_TOKEN_FILE_LOCATION="/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/container_1350707900707_0003_01_000001/container_tokens"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1350707900707_0003"
export JVM_PID="$$"
export USER="yarn"
export PWD="/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/container_1350707900707_0003_01_000001"
export NM_PORT="49111"
export HOME="/home/"
export LOGNAME="yarn"
export APP_SUBMIT_TIME_ENV="1350788662618"
export HADOOP_CONF_DIR="/home/yarn/hadoop-2.0.0-alpha/conf"
export MALLOC_ARENA_MAX="4"
export AM_CONTAINER_ID="container_1350707900707_0003_01_000001"
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/-5059634618081520617/job.jar" "job.jar"
mkdir -p jobSubmitDir
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/8471400424465082106/appTokens" "jobSubmitDir/appTokens"
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/-511993817008097803/job.xml" "job.xml"
mkdir -p jobSubmitDir
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/5917092335430839370/job.split" "jobSubmitDir/job.split"
mkdir -p jobSubmitDir
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/5764499011863329844/job.splitmetainfo" "jobSubmitDir/job.splitmetainfo"
exec /bin/bash -c "$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/tmp/logs/application_1350707900707_0003/container_1350707900707_0003_01_000001 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/tmp/logs/application_1350707900707_0003/container_1350707900707_0003_01_000001/stdout 2>/tmp/logs/application_1350707900707_0003/container_1350707900707_0003_01_000001/stderr "
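If you would rather grab this script from a live NodeManager than step through the code in a debugger, something like the following can help (a sketch; it assumes yarn.nodemanager.local-dirs points at /tmp/nm-local-dir as in this setup, and the script is only there while the container still exists):

# List any generated launch scripts kept under the NodeManager's private directory,
# then print the one belonging to the container you care about.
find /tmp/nm-local-dir/nmPrivate -name launch_container.sh
cat /tmp/nm-local-dir/nmPrivate/application_*/container_*/launch_container.sh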
The CLASSPATH in that script is:
"$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$YARN_HOME/share/hadoop/mapreduce/*:$YARN_HOME/share/hadoop/mapreduce/lib/*:job.jar:$PWD/*"
It is controlled by the yarn.application.classpath property, whose default is:
<property>
  <description>Classpath for typical applications.</description>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,
    $YARN_HOME/share/hadoop/mapreduce/lib/*
  </value>
</property>
Comparing the two, $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.0.0-alpha.jar should be covered by the $YARN_HOME/share/hadoop/mapreduce/* entry. So the question becomes: what does $YARN_HOME actually evaluate to?
Running it on the command line:
[yarn@hd19-vm2 ~]$ echo $YARN_HOME
/home/yarn/hadoop-2.0.0-alpha

which is correct.
So why does it not take effect while launch_container.sh is being executed? To answer that we have to look at how Linux environment variables are loaded; see http://vbird.dic.ksu.edu.tw/linux_basic/0320bash_4.php for reference.
That page (by Vbird) explains the difference between a login shell and a non-login shell: a non-login shell does not read ~/.bash_profile, it reads ~/.bashrc instead. (Most of us, however, put our environment variables in ~/.bash_profile.)
Shells started through a remote call or spawned from Java never read ~/.bash_profile. That is also why launch_container.sh itself exports so many environment variables; they are written mainly by ContainerLaunch#sanitizeEnv().
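You can see the difference with a quick experiment (a sketch; nm-host stands for any NodeManager host, and it assumes YARN_HOME is set only in ~/.bash_profile there):

# Interactive login: ~/.bash_profile is read, so the variable is visible.
[yarn@client ~]$ ssh nm-host
[yarn@nm-host ~]$ echo $YARN_HOME
/home/yarn/hadoop-2.0.0-alpha

# Non-interactive remote command: ~/.bash_profile is not read (at most ~/.bashrc is sourced),
# so the variable comes back empty.
[yarn@client ~]$ ssh nm-host 'echo YARN_HOME=$YARN_HOME'
YARN_HOME=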
In the script we can see export JAVA_HOME="/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre", but there is no export YARN_HOME=xxx, so when launch_container.sh runs, YARN_HOME is in fact empty.
Because YARN_HOME is absent from the NodeManager's System.getenv(), no export line for it is written into launch_container.sh either (see the source of ContainerLaunch.java#sanitizeEnv()).
Let's also look at the environment the NodeManager JVM was started with:
System.getenv() (java.util.Collections$UnmodifiableMap<K,V>) {
  HADOOP_PREFIX=/home/yarn/hadoop-2.0.0-alpha,
  SHLVL=2,
  JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64,
  YARN_LOG_DIR=/home/yarn/hadoop-2.0.0-alpha/logs,
  XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt,
  SSH_CLIENT=10.249.197.55 47859 22,
  MAIL=/var/mail/yarn,
  PWD=/home/yarn/hadoop-2.0.0-alpha,
  LOGNAME=yarn,
  CVS_RSH=ssh,
  G_BROKEN_FILENAMES=1,
  NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat,
  LD_LIBRARY_PATH=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64,
  SSH_CONNECTION=10.249.197.55 47859 10.249.197.56 22,
  MALLOC_ARENA_MAX=4,
  SHELL=/bin/bash,
  YARN_ROOT_LOGGER=INFO,RFA,
  YARN_LOGFILE=yarn-yarn-nodemanager-hd19-vm2.yunti.yh.aliyun.com.log,
  PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin,
  USER=yarn,
  HOME=/home/yarn,
  LESSOPEN=|/usr/bin/lesspipe.sh %s,
  HADOOP_CONF_DIR=/home/yarn/hadoop-2.0.0-alpha/conf,
  LS_COLORS=,
  SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,
  LANG=en_US.UTF-8,
  YARN_IDENT_STRING=yarn,
  YARN_NICENESS=0
}

Note that YARN_HOME is nowhere in this map.
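Instead of attaching a debugger, you can also dump the environment of the running NodeManager process directly from /proc (a sketch; it assumes a Linux host and exactly one process whose command line matches NodeManager):

# Find the NodeManager PID, then print its environment one variable per line.
NM_PID=$(pgrep -f NodeManager)
tr '\0' '\n' < /proc/$NM_PID/environ | sort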
These values come mainly from ~/.bashrc and from the hadoop start-up scripts. (Variables can also be carried over from the machine that launched the daemon; try export a=b; ssh h2 "echo $a>test"; ssh h2 "cat test" as an experiment. Note that with double quotes, $a is expanded by the local shell before ssh runs, so it is the local value that ends up in the remote file.)
One more thing to watch out for: if a script sourced with ". xx.sh" sets x without exporting it, then x is visible only in the calling process and is not passed on to child processes. That is exactly what happens to YARN_HOME here.
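A minimal illustration of that scoping rule (the file name settings.sh is hypothetical, just to show the behaviour):

# settings.sh sets one variable without export and one with export.
cat > settings.sh <<'EOF'
NOT_EXPORTED=foo
export EXPORTED=bar
EOF

. settings.sh                                  # source it into the current shell
echo "$NOT_EXPORTED / $EXPORTED"               # both visible here: foo / bar
bash -c 'echo "$NOT_EXPORTED / $EXPORTED"'     # a child process sees only: / bar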
3. The Fix
With that understood, the fix is easy: set YARN_HOME and the related variables in ~/.bashrc. The variables to set are mainly JAVA_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_CONF_DIR, and YARN_HOME (a sample snippet is given at the end of this section).
Of these, JAVA_HOME is usually carried along when you ssh in (provided JAVA_HOME is identical on all machines), and HADOOP_CONF_DIR is already exported.
Alternatively, modify the scripts in $HADOOP_HOME/libexec so that YARN_HOME and the other variables are exported there.
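For the ~/.bashrc route, something like the following on every NodeManager host is enough (a sketch; the paths match the installation used in this article, so adjust them to your own layout):

# ~/.bashrc for the yarn user on every node
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
export HADOOP_COMMON_HOME=/home/yarn/hadoop-2.0.0-alpha
export HADOOP_HDFS_HOME=/home/yarn/hadoop-2.0.0-alpha
export YARN_HOME=/home/yarn/hadoop-2.0.0-alpha
export HADOOP_CONF_DIR=/home/yarn/hadoop-2.0.0-alpha/conf

After updating ~/.bashrc, restart the NodeManagers so the new environment is picked up and written into launch_container.sh.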