使用Dataflow Runner运行时Beam Sql失败

时间:2022-08-08 15:36:39

While Testing Beam Sql I used model class Example from Github, Running the Example of POJo on Local Machine(DirectRunner) Works Well but Fails with the Exception when running using DataflowRunner.

在测试Beam Sql时,我使用了Github中的模型类示例,在本地计算机上运行POJo示例(DirectRunner)运行良好,但在使用DataflowRunner运行时出现异常失败。

Exception:

例外:

java.lang.IllegalArgumentException: Unable to encode element 'com.test.Customer1@523377ea' with coder 'org.apache.beam.sdk.schemas.SchemaCoder@2574fe3c'.
    at org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:300)
    at org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:564)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:480)
    at com.google.cloud.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:398)
    at com.google.cloud.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:124)
    at com.google.cloud.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:63)
    at com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:42)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:200)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
    at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:393)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:362)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:290)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
    at org.apache.beam.sdk.coders.BigEndianIntegerCoder.encode(BigEndianIntegerCoder.java:30)
    at org.apache.beam.sdk.coders.RowCoderGenerator$EncodeInstruction.encodeDelegate(RowCoderGenerator.java:206)
    at org.apache.beam.sdk.coders.Coder$ByteBuddy$BXCW8AHn.encode(Unknown Source)
    at org.apache.beam.sdk.coders.Coder$ByteBuddy$BXCW8AHn.encode(Unknown Source)
    at org.apache.beam.sdk.coders.RowCoder.encode(RowCoder.java:105)
    at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:82)
    at org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:297)
    ... 20 more

Code : https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example

代码:https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example

1 个解决方案

#1


2  

This was actually a subtle bug in the Beam schema code, that would only exhibit on a multi-worker setup; this is the reason why the test passed on the DirectRunner. The fix is in https://github.com/apache/beam/pull/6218, which will be merged just as soon as it's reviewed.

这实际上是Beam模式代码中的一个微妙的错误,它只会出现在多工作者设置中;这就是为什么测试在DirectRunner上传递的原因。修复程序位于https://github.com/apache/beam/pull/6218,它将在审核后立即合并。

#1


2  

This was actually a subtle bug in the Beam schema code, that would only exhibit on a multi-worker setup; this is the reason why the test passed on the DirectRunner. The fix is in https://github.com/apache/beam/pull/6218, which will be merged just as soon as it's reviewed.

这实际上是Beam模式代码中的一个微妙的错误,它只会出现在多工作者设置中;这就是为什么测试在DirectRunner上传递的原因。修复程序位于https://github.com/apache/beam/pull/6218,它将在审核后立即合并。