Problems Encountered During Hadoop Cluster Installation

Date: 2022-09-25 07:45:51

[Problem 1]

# jps
27851 Jps
18198 -- process information unavailable

The process cannot be killed:

# kill -9 18198
-bash: kill: (18198) - No such process

Solution: go into /tmp on the Linux machine (cd /tmp) and delete the folder named hsperfdata_{username}. Run jps again and the stale entry is gone.
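A minimal command sketch (hsperfdata_root is an assumption; use whichever hsperfdata_{username} folder matches the user that owned the dead process):

cd /tmp
rm -rf hsperfdata_root    # assumed username; delete the hsperfdata_{username} folder you actually see
jps                       # the "process information unavailable" entry should no longer appear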

[Problem 2]

The Hadoop DataNode and ResourceManager will not start (or shut down right after starting), and the logs report UnknownHostException.

DataNode log:

2014-09-22 14:03:02,935 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = java.net.UnknownHostException: master: master

Solution: it turned out my /etc/hosts file had no mapping between the hostname (master) and its IP. Run vi /etc/hosts and add a line:

192.168.80.4    master

It may also have been caused (I hit so many problems that I no longer remember whether this was the fix for this particular one) by the changes made to core-site.xml; create the hadoop_tmp directory yourself first, as sketched below.
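A hedged sketch of what that means, assuming hadoop.tmp.dir in core-site.xml points at /usr/local/hadoop/tmp (the path is an assumption; substitute whatever value your core-site.xml actually uses):

# create the directory that hadoop.tmp.dir in core-site.xml points to,
# so the daemons do not fail on a missing path
mkdir -p /usr/local/hadoop/tmp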


[Problem 3] WARN net.DNS: Unable to determine address of the host -falling back to "localhost" address java.net.UnknownHostException: slave1: slave1
Solution: this error is again a hosts problem. Having localhost in /etc/hosts is not enough; the current hostname, slave1, must also be added. Then test with hostname -f: if it returns the current hostname, everything is fine. A hedged example follows.
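For reference, a minimal sketch on slave1 (the 192.168.80.5 address is an assumption; use slave1's real IP):

echo "192.168.80.5    slave1" >> /etc/hosts    # assumed IP; map the node's own hostname
hostname -f                                    # should now print slave1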
[Problem 4]
Exception:
2014-03-13 11:26:30,788 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1257313099-10.10.208.38-1394679083528 (storage id DS-743638901-127.0.0.1-50010-1394616048958) service to Linux-hadoop-38/10.10.208.38:9000
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: namenode clusterID = CID-8e201022-6faa-440a-b61c-290e4ccfb006; datanode clusterID = clustername
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:309)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
at java.lang.Thread.run(Thread.java:662)
Solution:
1. dfs.namenode.name.dir is configured in hdfs-site.xml. On the master, the configured directory contains a current folder with a VERSION file whose contents are:
#Thu Mar 13 10:51:23 CST 2014
namespaceID=1615021223
clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1257313099-10.10.208.38-1394679083528
layoutVersion=-40
2. hadoop.tmp.dir is configured in core-site.xml. On the slave, the configured directory contains a dfs/data/current directory, which also holds a VERSION file with these contents:
#Wed Mar 12 17:23:04 CST 2014
storageID=DS-414973036-10.10.208.54-50010-1394616184818
clusterID=clustername
cTime=0
storageType=DATA_NODE
layoutVersion=-40
3. At a glance, the two clusterID values do not match, and that is what causes the failure. Remove or correct the wrong value on the slave, restart, and the problem is gone. A hedged sketch follows.
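A minimal sketch of the fix on the slave (the /usr/local/hadoop/tmp path and clusterID are taken from the exception above; adjust them to your own hadoop.tmp.dir and NameNode VERSION file):

# Option A: copy the NameNode's clusterID into the DataNode's VERSION file
sed -i 's/^clusterID=.*/clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006/' \
    /usr/local/hadoop/tmp/dfs/data/current/VERSION

# Option B (destroys local block replicas): wipe the DataNode storage and let it re-register
# rm -rf /usr/local/hadoop/tmp/dfs/data

hadoop-daemon.sh start datanode    # restart the DataNode afterwards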

[Problem 5]

When starting Hadoop, the following output appears (the console is flooded with ssh errors, which is maddening):

[root@hd-m1 /]# ./hadoop/hadoop-2.6.0/sbin/start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/01/23 20:23:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Java HotSpot(TM) Client VM warning: You have loaded library /hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hd-m1]
sed: -e expression #1, char 6: unknown option to `s'
-c: Unknown cipher type 'cd'
hd-m1: starting namenode, logging to /hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-hd-m1.out
HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Temporary failure in name resolution
Java: ssh: Could not resolve hostname Java: Temporary failure in name resolution
Client: ssh: Could not resolve hostname Client: Temporary failure in name resolution
You: ssh: Could not resolve hostname You: Temporary failure in name resolution
warning:: ssh: Could not resolve hostname warning:: Temporary failure in name resolution
VM: ssh: Could not resolve hostname VM: Temporary failure in name resolution
have: ssh: Could not resolve hostname have: Temporary failure in name resolution
library: ssh: Could not resolve hostname library: Temporary failure in name resolution
loaded: ssh: Could not resolve hostname loaded: Temporary failure in name resolution
might: ssh: Could not resolve hostname might: Temporary failure in name resolution
which: ssh: Could not resolve hostname which: Temporary failure in name resolution
have: ssh: Could not resolve hostname have: Temporary failure in name resolution
disabled: ssh: Could not resolve hostname disabled: Temporary failure in name resolution
stack: ssh: Could not resolve hostname stack: Temporary failure in name resolution
guard.: ssh: Could not resolve hostname guard.: Temporary failure in name resolution
VM: ssh: Could not resolve hostname VM: Temporary failure in name resolution
The: ssh: Could not resolve hostname The: Temporary failure in name resolution
try: ssh: Could not resolve hostname try: Temporary failure in name resolution
will: ssh: Could not resolve hostname will: Temporary failure in name resolution
to: ssh: Could not resolve hostname to: Temporary failure in name resolution
fix: ssh: Could not resolve hostname fix: Temporary failure in name resolution
the: ssh: Could not resolve hostname the: Temporary failure in name resolution
stack: ssh: Could not resolve hostname stack: Temporary failure in name resolution
guard: ssh: Could not resolve hostname guard: Temporary failure in name resolution
It's: ssh: Could not resolve hostname It's: Temporary failure in name resolution
now.: ssh: Could not resolve hostname now.: Temporary failure in name resolution
recommended: ssh: Could not resolve hostname recommended: Temporary failure in name resolution
highly: ssh: Could not resolve hostname highly: Temporary failure in name resolution
that: ssh: Could not resolve hostname that: Temporary failure in name resolution
you: ssh: Could not resolve hostname you: Temporary failure in name resolution
with: ssh: Could not resolve hostname with: Temporary failure in name resolution
'execstack: ssh: Could not resolve hostname 'execstack: Temporary failure in name resolution
the: ssh: Could not resolve hostname the: Temporary failure in name resolution
library: ssh: Could not resolve hostname library: Temporary failure in name resolution
fix: ssh: Could not resolve hostname fix: Temporary failure in name resolution
<libfile>',: ssh: Could not resolve hostname <libfile>',: Temporary failure in name resolution
or: ssh: Could not resolve hostname or: Temporary failure in name resolution
link: ssh: Could not resolve hostname link: Temporary failure in name resolution
it: ssh: Could not resolve hostname it: Temporary failure in name resolution
'-z: ssh: Could not resolve hostname '-z: Temporary failure in name resolution
with: ssh: Could not resolve hostname with: Temporary failure in name resolution
noexecstack'.: ssh: Could not resolve hostname noexecstack'.: Temporary failure in name resolution ......

Solution:

The errors above are mainly caused by environment variables not being set correctly. Adding the following lines to ~/.bash_profile or /etc/profile fixes it.

  # vi /etc/profile    (or: vi ~/.bash_profile)
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Then source the file so the changes take effect:
  # source /etc/profile    (or: source ~/.bash_profile)
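Optionally, to check whether the native library is now being picked up, Hadoop 2.x ships a checknative command (output format may vary by version):

hadoop checknative -a    # lists whether libhadoop, zlib, snappy, etc. were loaded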

 

[Problem 6]

The DataNode usually fails to start because of one of the following:

1. The configuration files on the master node were modified.

2. The bad habit of running hadoop namenode -format multiple times.

You will generally see an error like:

java.io.IOException: Cannot lock storage /usr/hadoop/tmp/dfs/name. The directory is already locked.

Or:

[root@hadoop current]# hadoop-daemon.sh start datanode

starting datanode, logging to /usr/local/hadoop1.1/libexec/../logs/hadoop-root-datanode-hadoop.out

[root@hadoop ~]# jps

The jps output shows no DataNode process.

In this situation, first try running the following commands on the dead node:

bin/hadoop-daemon.sh start datanode

bin/hadoop-daemon.sh start jobtracker

If that still does not work, then congratulations: you have hit the same situation I did.

The correct fix is to go to each of your slaves, look under .../usr/hadoop/tmp/dfs/, and list it (ls);

it will show: data

Delete that data folder, then start Hadoop again right from that directory:

start-all.sh

Then check jps again,

and the DataNode process will appear.
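Putting those steps together, a minimal per-slave sketch (the /usr/hadoop/tmp path is the one mentioned above; substitute your own hadoop.tmp.dir, and note this follows the author's habit of launching start-all.sh from that directory):

cd /usr/hadoop/tmp/dfs    # directory under hadoop.tmp.dir on the slave
ls                        # shows: data
rm -rf data               # remove the stale DataNode storage
start-all.sh              # restart Hadoop
jps                       # DataNode should now be listed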

Then open

http://210.41.166.61:50070 (replace with your master's IP)

and check how many live nodes it reports, then

http://210.41.166.61:50030/ (again, your master's IP)

for the number of nodes shown.

OK, problem solved.