Fixing a YARN ResourceManager that fails to start and an HRegionServer that dies right after starting

Date: 2025-04-12 08:03:58

Problem 1: When starting YARN on Hadoop-2.2.0, the ResourceManager process never comes up.

Check the end of the log file with tail -n 50

The following exception appears:

2016-09-09 14:41:09,341 INFO : Service ResourceManager failed in state STARTED; cause: : Error starting http server
: Error starting http server
    at $(:262)
    at (:623)
    at (:655)
    at (:193)
    at (:872)
Caused by: : Port in use: 192.168.1.120:8088
    at (:742)
    at (:686)
    at $(:257)
    ... 4 more
Caused by: : Address already in use
    at .bind0(Native Method)
    at (:444)
    at (:436)
    at (:214)
    at (:74)
    at (:216)
    at (:738)
    ... 6 more

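Solution:

1. Run ps aux | grep -i resourcemanager to see how many resourcemanager processes are running on the master host.

2. Then kill the leftover processes with kill -9 <RESOURCE_MANAGER_PID>.

3. Restart YARN from the sbin directory to bring it back up: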

   ./   ./ 

The resourcemanager process should then appear on the master node.
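
The exception chain above ends in "Address already in use" for 192.168.1.120:8088, which suggests that an earlier ResourceManager (or some other process) is still bound to that port. As a rough sketch, the two helpers below extract the port from such a log line and count ResourceManager entries in ps output. The function names port_from_log and count_rm_processes are made up for illustration and are not part of Hadoop.

```shell
#!/usr/bin/env bash

# Print the port from the first "Port in use: host:port" line read on stdin.
port_from_log() {
  sed -n 's/.*Port in use: [^:]*:\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Count ResourceManager processes in `ps aux`-style text read on stdin,
# skipping the line that belongs to the grep command itself.
count_rm_processes() {
  grep -i 'resourcemanager' | grep -vc 'grep'
}

# Possible usage (RM_LOG is a hypothetical variable holding the log path):
#   tail -n 50 "$RM_LOG" | port_from_log
#   ps aux | count_rm_processes
```

If the count is above zero before YARN has been started, a stale process is the likely holder of the port, and killing it as in step 2 should free it.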


Problem 2: Sometimes the HRegionServer dies right after it starts. Check the HBase startup log:

dell@master1:/usr/local/hbase-0.98.7-hadoop2/logs$ tail -n 100 
    at (:1286)
    at (:862)
    at (:745)
2017-01-12 10:02:23,347 FATAL [regionserver60020] : ABORTING region server master1,60020,1484186540447: Unhandled: Cannot create directory /hbase/WALs/master1,60020,1484186540447. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
    at (:3355)
    at (:3330)
    at (:724)
    at (:502)
    at $ClientNamenodeProtocol$(:59598)
    at $Server$(:585)
    at $(:928)
    at $Handler$(:2048)
    at $Handler$(:2044)
    at (Native Method)
    at (:415)
    at (:1491)
    at $(:2042)

(): Cannot create directory /hbase/WALs/master1,60020,1484186540447. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
    at (:3355)
    at (:3330)
    at (:724)
    at (:502)
    at $ClientNamenodeProtocol$(:59598)
    at $Server$(:585)
    at $(:928)
    at $Handler$(:2048)
    at $Handler$(:2044)
    at (Native Method)
    at (:415)
    at (:1491)
    at $(:2042)

    at (:1347)
    at (:1300)
    at $(:206)
    at .$(Unknown Source)
    at .invoke0(Native Method)
    at (:57)
    at (:43)
    at (:606)
    at (:186)
    at (:102)
    at .$(Unknown Source)
    at (:467)
    at .invoke0(Native Method)
    at (:57)
    at (:43)
    at (:606)
    at $(:294)
    at .$(Unknown Source)
    at (:2394)
    at (:2365)
    at $(:817)
    at $(:813)
    at (:81)
    at (:813)
    at (:806)
    at (:1933)
    at .<init>(:408)
    at .<init>(:334)
    at (:58)
    at (:1552)
    at (:1531)
    at (:1286)
    at (:862)
    at (:745)
2017-01-12 10:02:23,350 FATAL [regionserver60020] : RegionServer abort: loaded coprocessors are: []
2017-01-12 10:02:23,367 INFO  [regionserver60020] : Stopping server on 60020
2017-01-12 10:02:23,368 INFO  [regionserver60020] : Stopping infoServer
2017-01-12 10:02:23,373 INFO  [regionserver60020] : Stopped SelectChannelConnector@0.0.0.0:60030
2017-01-12 10:02:23,475 INFO  [regionserver60020] : Stopping RegionServerSnapshotManager abruptly.
2017-01-12 10:02:23,475 INFO  [regionserver60020] : aborting server master1,60020,1484186540447
2017-01-12 10:02:23,475 DEBUG [regionserver60020] : Stopping catalog tracker @58465d50
2017-01-12 10:02:23,475 INFO  [regionserver60020] $HConnectionImplementation: Closing zookeeper sessionid=0x358d3e5582442fb
2017-01-12 10:02:23,485 INFO  [regionserver60020] : Session: 0x358d3e5582442fb closed
2017-01-12 10:02:23,485 INFO  [regionserver60020-EventThread] : EventThread shut down
2017-01-12 10:02:23,488 INFO  [regionserver60020] : stopping server master1,60020,1484186540447; all regions closed.
2017-01-12 10:02:23,588 INFO  [regionserver60020] : regionserver60020 closing leases
2017-01-12 10:02:23,588 INFO  [regionserver60020] : regionserver60020 closed leases
2017-01-12 10:02:23,589 INFO  [regionserver60020] : Waiting for Split Thread to finish...
2017-01-12 10:02:23,589 INFO  [regionserver60020] : Waiting for Merge Thread to finish...
2017-01-12 10:02:23,589 INFO  [regionserver60020] : Waiting for Large Compaction Thread to finish...
2017-01-12 10:02:23,589 INFO  [regionserver60020] : Waiting for Small Compaction Thread to finish...
2017-01-12 10:02:23,636 INFO  [regionserver60020] : Session: 0x558d3e6026242f9 closed
2017-01-12 10:02:23,636 INFO  [regionserver60020-EventThread] : EventThread shut down
2017-01-12 10:02:23,636 INFO  [regionserver60020] : stopping server master1,60020,1484186540447; zookeeper connection closed.
2017-01-12 10:02:23,636 INFO  [regionserver60020] : regionserver60020 exiting
2017-01-12 10:02:23,636 ERROR [main] : Region server exiting
: HRegionServer Aborted
    at (:66)
    at (:85)
    at (:70)
    at (:126)
    at (:2489)
2017-01-12 10:02:23,639 INFO  [Thread-10] : Shutdown hook starting; =true; fsShutdownHook=$Cache$ClientFinalizer@68ee3eb2
2017-01-12 10:02:23,640 INFO  [Thread-10] : Starting fs shutdown hook thread.
2017-01-12 10:02:23,641 INFO  [Thread-10] : Shutdown hook finished.
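
In a log this long, the line that matters is the first FATAL entry, which here reports that the NameNode is in safe mode. A minimal sketch for filtering a log down to that information follows. The helper names fatal_lines and mentions_safe_mode are hypothetical, and the match strings are taken from the log output shown above.

```shell
#!/usr/bin/env bash

# Print only the FATAL lines from a log read on stdin;
# in the log above these carry the abort reason.
fatal_lines() {
  grep ' FATAL '
}

# Succeed (exit 0) when the log on stdin reports the NameNode in safe mode.
mentions_safe_mode() {
  grep -q 'Name node is in safe mode'
}

# Possible usage (RS_LOG is a hypothetical variable holding the log path):
#   tail -n 100 "$RS_LOG" | fatal_lines
#   tail -n 100 "$RS_LOG" | mentions_safe_mode && echo "NameNode is in safe mode"
```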
Solution:

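1. Run hdfs dfsadmin -safemode leave to take the NameNode out of safe mode. As the log warns, free up resources on the NameNode first, otherwise it will immediately return to safe mode.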

2. Then start the regionservers. To start every regionserver in the cluster:

./ start regionserver
Or start a single regionserver:
./ start regionserver
3. Check the HBase web UI:
http://192.168.1.120:60010/master-status
The page shows how many Region Servers are alive.
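
Before restarting the regionservers it can help to confirm that the NameNode has really left safe mode. The sketch below assumes that hdfs dfsadmin -safemode get prints a line containing "Safe mode is ON" or "Safe mode is OFF". The helper names safemode_is_on and next_step are invented for this example.

```shell
#!/usr/bin/env bash

# Succeed (exit 0) when the text on stdin reports that safe mode is on.
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# Read `hdfs dfsadmin -safemode get` output on stdin and print a suggested next step.
next_step() {
  if safemode_is_on; then
    echo "free resources on the NameNode, then run: hdfs dfsadmin -safemode leave"
  else
    echo "safe mode is off; start the regionserver"
  fi
}

# Possible usage:
#   hdfs dfsadmin -safemode get | next_step
```

Checking first avoids restarting a regionserver that would abort again for the same reason.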






Reference: /questions/26704763/yarn-resourcetrackerservice-failed-in-state-started