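Yesterday one host in the big data cluster reported disk I/O errors. After taking it down for maintenance and confirming that disk reads and writes were normal, I added it back to the cluster, and the Cloudera page then showed an error.
Checking the host's log showed that one disk was reporting errors: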
2017-04-22 10:50:11,976 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /dn9/dfs/dn/in_use.lock acquired by nodename 11789@datanode26.wumart.com
2017-04-22 10:50:11,984 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /dn9/dfs/dn is in an inconsistent state: Root /dn9/dfs/dn: DatanodeUuid=031a3a79-8d18-4ba0-9dcf-6f2850e2b65e, does not match 280a0cac-e5d4-497c-baf2-86c3802f3db1 from other StorageDirectory.
2017-04-22 11:12:32,716 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /dn9/dfs/dn/in_use.lock acquired by nodename 15891@datanode26.wumart.com
2017-04-22 11:12:32,716 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /dn9/dfs/dn is in an inconsistent state: Root /dn9/dfs/dn: DatanodeUuid=031a3a79-8d18-4ba0-9dcf-6f2850e2b65e, does not match 280a0cac-e5d4-497c-baf2-86c3802f3db1 from other StorageDirectory.
2017-04-22 12:32:32,086 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /dn9/dfs/dn/in_use.lock acquired by nodename 26611@datanode26.wumart.com
2017-04-22 12:32:32,087 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /dn9/dfs/dn is in an inconsistent state: Root /dn9/dfs/dn: DatanodeUuid=031a3a79-8d18-4ba0-9dcf-6f2850e2b65e, does not match 280a0cac-e5d4-497c-baf2-86c3802f3db1 from other StorageDirectory.
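The same warning repeats on every DataNode start, so it can help to pull out just the directory and the two UUIDs. Below is a minimal sketch in Python. The function name `find_uuid_mismatches` is made up for illustration, and the pattern is based only on the message wording shown in this log, which may differ in other Hadoop versions.

```python
import re

# Pattern built from the message text in the log above (an assumption:
# other Hadoop versions may word this message differently).
MISMATCH = re.compile(
    r"Directory (\S+) is in an inconsistent state: Root \S+: "
    r"DatanodeUuid=([0-9a-f-]+), does not match ([0-9a-f-]+) "
    r"from other StorageDirectory"
)


def find_uuid_mismatches(log_text):
    """Return unique (directory, uuid_in_directory, uuid_expected) tuples."""
    # Rejoin lines first, in case the log was hard-wrapped mid-word
    # when it was copied out of a terminal.
    joined = "".join(log_text.splitlines())
    unique = []
    for match in MISMATCH.finditer(joined):
        if match.groups() not in unique:
            unique.append(match.groups())
    return unique
```

For the log above this should report a single entry: /dn9/dfs/dn carries 031a3a79-8d18-4ba0-9dcf-6f2850e2b65e, while the other storage directories carry 280a0cac-e5d4-497c-baf2-86c3802f3db1.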
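I then searched Baidu for the error message and changed the UUID as the method I found described, after which the node returned to normal.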
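The post does not say which file was edited. As a rough sketch, assuming the UUID in question is the `datanodeUuid` entry that the DataNode keeps in each data directory's `current/VERSION` file, the following Python lists the directories whose UUID differs from the majority. The function names are hypothetical, and this only reads the files; it changes nothing.

```python
from collections import Counter
from pathlib import Path


def read_datanode_uuid(version_file):
    """Return the datanodeUuid value from a VERSION file, or None if absent."""
    for line in Path(version_file).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "datanodeUuid":
            return value.strip()
    return None


def find_odd_directories(data_dirs):
    """Map each directory whose UUID differs from the most common one to its UUID."""
    # Assumed layout: <data_dir>/current/VERSION holds the datanodeUuid line.
    uuids = {
        str(d): read_datanode_uuid(Path(d) / "current" / "VERSION")
        for d in data_dirs
    }
    if not uuids:
        return {}
    majority, _ = Counter(uuids.values()).most_common(1)[0]
    return {d: u for d, u in uuids.items() if u != majority}
```

Called with the DataNode's configured data directories (for example /dn9/dfs/dn from the log plus its sibling directories), it would be expected to flag only the directory on the disk that went through maintenance. Stopping the DataNode role and backing up the VERSION file before editing anything is a sensible precaution.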
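Resolving a DataNode Volume Failures fault
A DataNode in the Hadoop cluster suffered a hardware failure. Because the follow-up repair would take a long time, the node was removed from the Cloudera cluster. When the node was later added back to the cluster, the DataNode raised a DataNode volume failure threshold warning.
2. Resolution
2.1 Troubleshooting
Checking the DataNode log revealed the following error: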