关于hadoop的AlreadyBeingCreatedException异常的解决

时间:2024-02-29 12:29:59

今天工作上遇到一个问题,报错如下:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_96889_m_000004_0_1149914572_1 on client 132.121.94.29, because this file is already being created by NN_Recovery on 132.121.94.29

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1826)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1649)  

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1595)  

at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:712)

at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:691)

at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)

at org.apache.hadoop.ipc.Client.call(Client.java:1118)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)

at com.sun.proxy.$Proxy7.create(Unknown Source)

at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)

at com.sun.proxy.$Proxy7.create(Unknown Source)

at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3660)

at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:930)

at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:207)

 

查NameNode的日志:

2016-02-15 00:02:28,392 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease.  Holder: NN_Recovery, pendingcreates: 182], src=/apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP from client NN_Recovery

2016-02-15 00:02:28,392 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease.  Holder: NN_Recovery, pendingcreates: 182], src=/apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP 2016-02-15 00:02:28,392 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* blk_-3441330245353537374_1007295680 recovery started, primary=132.121.94.29:1010

2016-02-15 00:02:28,407 WARN org.apache.hadoop.hdfs.StateChange: DIR* startFile: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29 2016-02-15 00:02:28,407 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:icps/c2a1.ecld.com@ECLD.COM cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29 2016-02-15 00:02:28,407 INFO org.apache.hadoop.ipc.Server: IPC Server handler 34 on 8020, call create(/apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP, rwxr-xr-x, DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1, true, 3, 67108864) from 132.121.94.7:48867: error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1826)         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1649)         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1595)         at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:712)         at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:691)

一直反复刷这三条日志,

而我的代码只是简单的在HDFS上创建一个文件:

FSDataOutputStream fsdout = fs.create(tmpPath);

默认是如果该文件存在则会覆盖掉,一开始很奇怪,我感觉这是hadoop本身的bug

于是上hadoop官网上查找这个异常,发现了这个:

Creating an already-open-for-write file with overwrite=true fails

Details

  • Type:Bug Bug
  • Status:CLOSED
  • Priority:Major Major
  • Resolution:Fixed
  • Affects Version/s:2.0.0-alpha
  • Fix Version/s:2.0.2-alpha
  • Component/s:namenode
  • Labels:
    None
  • Hadoop Flags:
    Reviewed
  • Release Note:
    This is an incompatible change: Before this change, if a file is already open for write by one client, and another client calls fs.create() with overwrite=true, an AlreadyBeingCreatedException is thrown. After this change, the file will be deleted and the new file will be created successfully.

    Description

    If a file is already open for write by one client, and another client calls fs.create() with overwrite=true, the file should be deleted and the new file successfully created. Instead, it is currently throwing AlreadyBeingCreatedException.

    This is a regression since branch-1.

    (详情请看:https://issues.apache.org/jira/browse/HDFS-3755)

     

     

    查看这个BUG的修改日志:

    --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java	(revision 1370568)
    +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java	(working copy)
    @@ -1755,8 +1755,6 @@
     
         try {
           INodeFile myFile = dir.getFileINode(src);
    -      recoverLeaseInternal(myFile, src, holder, clientMachine, false);
    -
           try {
             blockManager.verifyReplication(src, replication, clientMachine);
           } catch(IOException e) { recoverLeaseInternal(myFile, src, holder, clientMachine, false);这个函数被删掉了,取而代之的是blockManager.verifyReplication(src, replication, clientMachine); 我反编译我们所使用的依赖库代码,发现我们的代码还是使用这个函数: ......

     try {       

      INode myFile = dir.getFileINode(src);        

    recoverLeaseInternal(myFile, src, holder, clientMachine, false);

            try {       

        verifyReplication(src, replication, clientMachine);

    ......
    结论: 试图创建一个已存在的文件有可能会导致抛出AlreadyBeingCreatedException这个异常,由于namenode的server一直刷回收lease的日志,有可能是lease回收失败导致. 解决办法: 将代码: FSDataOutputStream fsdout = fs.create(tmpPath); 改成: if (fs.exists(tmpPath)) {      fs.delete(tmpPath, false);     } FSDataOutputStream fsdout = fs.create(tmpPath);