Google Dataflow job keeps failing: "Pipe broken"

Posted: 2022-08-01 15:40:37

I've been using the same code for a long time and it used to work, but when I re-ran our batch loader it failed with a "not enough disk space" error. I increased the disk size and ran it again, and now I get the "Pipe broken" error below:


    (84383c8e79f9b6a1): java.io.IOException: java.io.IOException: Pipe broken
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
    at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:254)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:191)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:144)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:180)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:161)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:148)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Pipe broken
    at java.io.PipedInputStream.read(PipedInputStream.java:321)
    at java.io.PipedInputStream.read(PipedInputStream.java:377)
    at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
    at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
    at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
    at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
    ... 4 more

This error is sometimes normal and in the past the batch job would still finish, but now it is not finishing and fails in the middle after a couple of hours.


I'm somewhat blocked by this error and not sure how to proceed to get our batch loader running again.

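For reference, the disk size increase was done through the Dataflow pipeline worker options, roughly like the sketch below (a minimal example against the Dataflow Java SDK 1.x that the stack trace comes from; the options class, flag parsing, and the 250 GB value are illustrative assumptions, not our actual launch code):

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

    public class BatchLoaderLauncher {
      public static void main(String[] args) {
        // Parse the usual --project/--stagingLocation/... flags.
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);

        // Ask for a larger persistent disk on each worker.
        // 250 GB is only an illustrative value.
        options.setDiskSizeGb(250);

        Pipeline pipeline = Pipeline.create(options);
        // ... build the batch loader's transforms here ...
        pipeline.run();
      }
    }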

1 Answer

#1



Posting an answer to address the last question on the comment thread above.


The message "CoGbkResult has more than 10000 elements, reiteration (which may be slow) is required" is not an error. 10000 elements is chosen as the maximum amount to keep in memory at once, and it's just letting you know that it must re-iterate on the remaining results if you have more than 10000 of them.

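For readers who haven't seen this message before, it comes out of iterating a CoGbkResult produced by a CoGroupByKey join. Below is a minimal sketch using the same Dataflow Java SDK 1.x as the stack trace above; the keys, tags, and inputs are invented for illustration and are not the asker's pipeline:

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.Create;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.transforms.join.CoGbkResult;
    import com.google.cloud.dataflow.sdk.transforms.join.CoGroupByKey;
    import com.google.cloud.dataflow.sdk.transforms.join.KeyedPCollectionTuple;
    import com.google.cloud.dataflow.sdk.values.KV;
    import com.google.cloud.dataflow.sdk.values.PCollection;
    import com.google.cloud.dataflow.sdk.values.TupleTag;

    public class CoGbkExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Two small illustrative inputs keyed by the same id.
        PCollection<KV<String, String>> orders =
            p.apply("Orders", Create.of(KV.of("cust-1", "order-1"), KV.of("cust-1", "order-2")));
        PCollection<KV<String, String>> payments =
            p.apply("Payments", Create.of(KV.of("cust-1", "payment-1")));

        final TupleTag<String> ordersTag = new TupleTag<>();
        final TupleTag<String> paymentsTag = new TupleTag<>();

        PCollection<KV<String, CoGbkResult>> joined =
            KeyedPCollectionTuple.of(ordersTag, orders)
                .and(paymentsTag, payments)
                .apply(CoGroupByKey.<String>create());

        joined.apply(ParDo.of(new DoFn<KV<String, CoGbkResult>, String>() {
          @Override
          public void processElement(ProcessContext c) {
            CoGbkResult result = c.element().getValue();
            // When a single key carries more than 10000 joined values, only the
            // first 10000 stay cached in memory; iterating getAll(...) again
            // re-reads the remainder, which is what the "reiteration (which may
            // be slow)" message is warning about.
            for (String order : result.getAll(ordersTag)) {
              c.output(c.element().getKey() + ":" + order);
            }
          }
        }));

        p.run();
      }
    }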

I'd advise continuing to debug the issue via dataflow-feedback@google.com, as jkff suggested, rather than in the comment thread, since it has grown beyond the scope of a Stack Overflow question.

