Hadoop cluster slave nodes won't start: spark-slave1: ssh: Could not resolve hostname spark-slave1: Name or service not known

Date: 2021-08-28 13:45:33

Error output:

./start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out
spark-slave1: ssh: Could not resolve hostname spark-slave1: Name or service not known
spark-slave2: ssh: Could not resolve hostname spark-slave2: Name or service not known
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-master.out
spark-slave2: ssh: Could not resolve hostname spark-slave2: Name or service not known
spark-slave1: ssh: Could not resolve hostname spark-slave1: Name or service not known

Analysis: the error means the hostnames spark-slave1 and spark-slave2 cannot be resolved. But my slave nodes are clearly named node1 and node2, so after a long search I finally tracked down the cause.
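A quick way to confirm which names actually resolve from the master (the commands below are a sketch; the contents of /etc/hosts will differ on your cluster):

ping -c 1 spark-slave1    # fails: Name or service not known
ping -c 1 node1           # succeeds if the name is in /etc/hosts or DNS
cat /etc/hosts            # check which hostnames are actually mapped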

The culprit is the slaves file:

vim /usr/local/hadoop/etc/hadoop/slaves 

spark-slave1
spark-slave2

It still contained the default placeholder slave hostnames; change them to your own slave nodes:

vim /usr/local/hadoop/etc/hadoop/slaves 

node1
node2
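Note that whatever names you list here must also resolve from the master, usually via an /etc/hosts entry on every machine. A minimal sketch of those entries, with example addresses (replace them with your cluster's real IPs):

192.168.1.100 master
192.168.1.101 node1
192.168.1.102 node2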

Then restart Hadoop:

./stop-all.sh     # stop
./start-all.sh    # start
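As the startup log itself warns, start-all.sh is deprecated; the same restart can be done with the per-component scripts in the same sbin directory, stopping YARN before HDFS and starting HDFS before YARN:

./stop-yarn.sh && ./stop-dfs.sh     # stop
./start-dfs.sh && ./start-yarn.sh   # start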

This time there were no errors and the slave nodes started successfully:

@master:/usr/local/hadoop/sbin# jps
5698 ResourceManager
6403 Jps
5547 SecondaryNameNode
5358 NameNode
@node1:~# jps
885 Jps
744 NodeManager
681 DataNode
@node2:~# jps
914 Jps
773 NodeManager
710 DataNode
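Besides jps, you can ask the daemons themselves whether the slaves registered. A sketch using standard Hadoop CLI tools (the exact report wording varies by version):

hdfs dfsadmin -report    # should report 2 live datanodes (node1, node2)
yarn node -list          # should list node1 and node2 with state RUNNING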

Summary: the official description of Hadoop's slaves file is: one machine in the cluster is designated the NameNode and a different machine is designated the JobTracker; these machines are the masters. The remaining machines act as both DataNode and TaskTracker; these are the slaves. List the hostname or IP address of every slave in the slaves file, one per line. In other words, the file holds the slave nodes' hostnames or IPs, and IP addresses such as 172.x.x.x work just as well as hostnames.
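For example, an equivalent slaves file using IP addresses instead of hostnames (the addresses are illustrative):

vim /usr/local/hadoop/etc/hadoop/slaves

172.16.0.11
172.16.0.12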