Hadoop 2.0: Exception java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster

Date: 2021-01-14 15:35:57

1. Problem

When running an MR job on YARN, it failed with the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:336)

This problem was discussed on the hadoop-mapreduce-user mailing list ( http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201207.mbox/browser ), but not in depth.

2. Analysis

This is obviously a class-loading failure, so the first step is to find where the class lives. It is in hadoop-mapreduce-client-app-2.0.0-alpha.jar, under $HADOOP_HOME/share/hadoop/mapreduce (as of the 2.0 release; I suspect this layout may change in later versions).
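To confirm which jar actually ships a class, you can grep the jars directly: zip/jar archives store entry file names uncompressed, so a plain string match on the raw bytes is enough. The helper below is just a sketch; the commented invocation uses this article's directory layout:

```shell
# Print every jar under a directory whose entry list contains the given
# class path. Relies on zip storing entry names uncompressed, so a raw
# string match on the archive bytes finds them.
find_class_jar() {
  grep -l "$1" "$2"/*.jar 2>/dev/null
}

# On the cluster from this article it would be run as:
# find_class_jar 'org/apache/hadoop/mapreduce/v2/app/MRAppMaster' \
#   "$HADOOP_HOME/share/hadoop/mapreduce"
```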

My guess was a classpath problem, so I wanted to get hold of the exact parameters used when the container is launched.

We know containers are launched via a shell command. Debugging ContainerLaunch.java, I finally found the launch parameters (the script below is ultimately written to a file such as /tmp/nm-local-dir/nmPrivate/application_1350793073454_0005/container_1350793073454_0005_01_000001/launch_container.sh):

#!/bin/bash

export YARN_LOCAL_DIRS="/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003"
export NM_HTTP_PORT="8042"
export JAVA_HOME="/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre"
export NM_HOST="hd19-vm4.yunti.yh.aliyun.com"
export CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$YARN_HOME/share/hadoop/mapreduce/*:$YARN_HOME/share/hadoop/mapreduce/lib/*:job.jar:$PWD/*"
export HADOOP_TOKEN_FILE_LOCATION="/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/container_1350707900707_0003_01_000001/container_tokens"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1350707900707_0003"
export JVM_PID="$$"
export USER="yarn"
export PWD="/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/container_1350707900707_0003_01_000001"
export NM_PORT="49111"
export HOME="/home/"
export LOGNAME="yarn"
export APP_SUBMIT_TIME_ENV="1350788662618"
export HADOOP_CONF_DIR="/home/yarn/hadoop-2.0.0-alpha/conf"
export MALLOC_ARENA_MAX="4"
export AM_CONTAINER_ID="container_1350707900707_0003_01_000001"
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/-5059634618081520617/job.jar" "job.jar"
mkdir -p jobSubmitDir
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/8471400424465082106/appTokens" "jobSubmitDir/appTokens"
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/-511993817008097803/job.xml" "job.xml"
mkdir -p jobSubmitDir
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/5917092335430839370/job.split" "jobSubmitDir/job.split"
mkdir -p jobSubmitDir
ln -sf "/tmp/nm-local-dir/usercache/yarn/appcache/application_1350707900707_0003/filecache/5764499011863329844/job.splitmetainfo" "jobSubmitDir/job.splitmetainfo"
exec /bin/bash -c "$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/tmp/logs/application_1350707900707_0003/container_1350707900707_0003_01_000001 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/tmp/logs/application_1350707900707_0003/container_1350707900707_0003_01_000001/stdout 2>/tmp/logs/application_1350707900707_0003/container_1350707900707_0003_01_000001/stderr  "

The classpath is:

"$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$YARN_HOME/share/hadoop/mapreduce/*:$YARN_HOME/share/hadoop/mapreduce/lib/*:job.jar:$PWD/*"


This is controlled by the yarn.application.classpath property, whose default is:

  <property>
    <description>Classpath for typical applications.</description>
     <name>yarn.application.classpath</name>
     <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/share/hadoop/common/*,
        $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
        $YARN_HOME/share/hadoop/mapreduce/*,
        $YARN_HOME/share/hadoop/mapreduce/lib/*
     </value>
  </property>

Comparing the two, $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.0.0-alpha.jar should be matched by the $YARN_HOME/share/hadoop/mapreduce/* entry. The question now is: what is the value of $YARN_HOME?

Running on the command line:

[yarn@hd19-vm2 ~]$ echo $YARN_HOME
/home/yarn/hadoop-2.0.0-alpha
which is correct.

So why doesn't it take effect while launch_container.sh is executing? This comes down to how Linux environment variables work; see http://vbird.dic.ksu.edu.tw/linux_basic/0320bash_4.php for reference.

That page explains the difference between login and non-login shells: a non-login shell does not read ~/.bash_profile; it reads ~/.bashrc instead. (Most of us put our environment variables in ~/.bash_profile.)
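The distinction is easy to see directly, because bash records it in the read-only `login_shell` shell option. A minimal demonstration (`-l` forces a login shell; `--norc`/`--noprofile` only keep the demo's output clean):

```shell
# A shell started with plain `bash -c` is a non-login shell: it would read
# ~/.bashrc (when interactive) but never ~/.bash_profile.
non_login=$(bash --norc -c 'shopt -q login_shell && echo login || echo non-login')

# `bash -l` starts a login shell, which is the kind that reads ~/.bash_profile.
login=$(bash --noprofile --norc -lc 'shopt -q login_shell && echo login || echo non-login')

echo "bash -c : $non_login"    # bash -c : non-login
echo "bash -lc: $login"        # bash -lc: login
```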

Neither invoking a shell remotely nor invoking one from Java reads ~/.bash_profile. That is exactly why launch_container.sh itself exports so many environment variables; they are written in by ContainerLaunch#sanitizeEnv().

Notice that the script contains export JAVA_HOME="/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre" but no export YARN_HOME=xxx, so while launch_container.sh runs, YARN_HOME is actually empty.

Because System.getenv() in the NodeManager JVM contains no YARN_HOME, launch_container.sh gets no export line for it either (see the source of ContainerLaunch.java#sanitizeEnv()).

Let's also look at the env at JVM startup:

System.getenv()
	 (java.util.Collections$UnmodifiableMap<K,V>) {HADOOP_PREFIX=/home/yarn/hadoop-2.0.0-alpha, SHLVL=2, JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64, YARN_LOG_DIR=/home/yarn/hadoop-2.0.0-alpha/logs, XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt, SSH_CLIENT=10.249.197.55 47859 22, MAIL=/var/mail/yarn, PWD=/home/yarn/hadoop-2.0.0-alpha, LOGNAME=yarn, CVS_RSH=ssh, G_BROKEN_FILENAMES=1, NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat, LD_LIBRARY_PATH=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64, SSH_CONNECTION=10.249.197.55 47859 10.249.197.56 22, MALLOC_ARENA_MAX=4, SHELL=/bin/bash, YARN_ROOT_LOGGER=INFO,RFA, YARN_LOGFILE=yarn-yarn-nodemanager-hd19-vm2.yunti.yh.aliyun.com.log, PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin, USER=yarn, HOME=/home/yarn, LESSOPEN=|/usr/bin/lesspipe.sh %s, HADOOP_CONF_DIR=/home/yarn/hadoop-2.0.0-alpha/conf, LS_COLORS=, SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass, LANG=en_US.UTF-8, YARN_IDENT_STRING=yarn, YARN_NICENESS=0}

These come mainly from ~/.bashrc and from the Hadoop startup scripts. (Values can also be carried over from the launching machine; you can try the experiment export a=b; ssh h2 "echo $a>test"; ssh h2 "cat test"; yourself.)
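One caveat about that experiment: because the command is in double quotes, `$a` is expanded by the *local* shell before ssh ever sends the command, so the remote side never needs the variable at all. The same effect can be reproduced locally, with a child bash standing in for the remote shell:

```shell
export a=b

# Double quotes: the calling shell expands $a first, so the child simply
# receives the literal command `printf '%s' b`.
local_exp=$(bash -c "printf '%s' $a")

# Single quotes plus a scrubbed environment (env -i stands in for a remote
# host that never received the variable): the child expands $a itself and
# finds nothing.
remote_exp=$(env -i bash -c 'printf "%s" "$a"')

echo "local expansion : [$local_exp]"    # [b]
echo "remote expansion: [$remote_exp]"   # []
```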


One more important point: if xx.sh is sourced (. xx.sh) and sets a variable x without export, then x is visible only in the calling process, not in its children. That is exactly what happened to YARN_HOME.
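This scoping rule is easy to verify: without export, an assignment stays in the current shell, and child processes (such as the java process forked by launch_container.sh) never see it. A sketch using a stand-in variable name:

```shell
# Plain assignment: visible in this shell only, not in its children.
DEMO_YARN_HOME=/home/yarn/hadoop-2.0.0-alpha
before=$(bash -c 'printf "%s" "$DEMO_YARN_HOME"')   # child sees nothing

# After export the variable enters the environment of every child process.
export DEMO_YARN_HOME
after=$(bash -c 'printf "%s" "$DEMO_YARN_HOME"')

echo "before export: [$before]"   # []
echo "after export : [$after]"    # [/home/yarn/hadoop-2.0.0-alpha]
```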

3. Fix

The fix is then straightforward: set YARN_HOME and friends in ~/.bashrc. The variables to set are JAVA_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_CONF_DIR, and YARN_HOME.

Of these, JAVA_HOME is usually carried over through ssh (provided JAVA_HOME is the same on all machines), and HADOOP_CONF_DIR is already exported.

Alternatively, modify the scripts under $HADOOP_HOME/libexec to export YARN_HOME and the other variables.
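Putting it together, here is a sketch of the ~/.bashrc additions for every NodeManager host, using the paths from this article's cluster (adjust them to your own install):

```shell
# ~/.bashrc -- read by non-login shells, so these values survive into
# the NodeManager JVM and hence into launch_container.sh.
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
export HADOOP_COMMON_HOME=/home/yarn/hadoop-2.0.0-alpha
export HADOOP_HDFS_HOME=/home/yarn/hadoop-2.0.0-alpha
export HADOOP_CONF_DIR=/home/yarn/hadoop-2.0.0-alpha/conf
export YARN_HOME=/home/yarn/hadoop-2.0.0-alpha
```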