How do I fix a NullPointerException from Google Dataflow Pipeline(args)?

Time: 2021-03-27 15:35:07

I'm trying to run a very simple Dataflow job: it reads some data from BigQuery, processes it a little, and writes the result to a new BigQuery table.

Pipeline p = Pipeline.create(
    PipelineOptionsFactory.fromArgs(args).withValidation().create());
p.apply(BigQueryIO.Read.fromQuery("SELECT * FROM realtime.status_6_output_11"));
p.run();

However, whenever I run it I get the following rather undescriptive NullPointerException:

Exception in thread "main" java.lang.NullPointerException
    at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
    at java.util.regex.Matcher.reset(Matcher.java:309)
    at java.util.regex.Matcher.<init>(Matcher.java:229)
    at java.util.regex.Pattern.matcher(Pattern.java:1093)
    at com.google.cloud.dataflow.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:174)
    at com.google.cloud.dataflow.sdk.io.BigQueryIO$Read$Bound.apply(BigQueryIO.java:553)
    at com.google.cloud.dataflow.sdk.io.BigQueryIO$Read$Bound.apply(BigQueryIO.java:387)
    at com.google.cloud.dataflow.sdk.runners.PipelineRunner.apply(PipelineRunner.java:74)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.apply(DirectPipelineRunner.java:247)
    at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:367)
    at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:274)
    at com.google.cloud.dataflow.sdk.values.PBegin.apply(PBegin.java:47)
    at com.google.cloud.dataflow.sdk.Pipeline.apply(Pipeline.java:156)
    at com.noraway.conductor.NormalizedPipeline.main(NormalizedPipeline.java:42)

I think there's a problem with my command-line arguments (I'm not passing any right now), but I'm not sure what it would be.

1 answer

#1


It looks like the --tempLocation option, which BigQuery needs, is missing from your arguments. The obscure error message is fixed as part of https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/313.
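
Until you are on an SDK version with the clearer message, one option is to check the raw arguments yourself before building the pipeline. The following is only a minimal sketch under stated assumptions: the class and method names (TempLocationCheck, findTempLocation, requireTempLocation) are hypothetical and not part of the Dataflow SDK, gs://my-bucket/tmp is a placeholder path, and the option is assumed to be passed in the --name=value form. It inspects the args array with plain Java and does not call any SDK API.

```java
import java.util.Optional;

// Hypothetical pre-flight helper; not part of the Dataflow SDK.
class TempLocationCheck {
    // Assumption: the option is passed as --tempLocation=<path>.
    private static final String FLAG = "--tempLocation=";

    // Returns the value of --tempLocation if present and non-empty.
    static Optional<String> findTempLocation(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(FLAG) && arg.length() > FLAG.length()) {
                return Optional.of(arg.substring(FLAG.length()));
            }
        }
        return Optional.empty();
    }

    // Fails fast with a descriptive message instead of a bare NullPointerException.
    static String requireTempLocation(String[] args) {
        return findTempLocation(args).orElseThrow(() ->
            new IllegalArgumentException(
                "Missing --tempLocation=<path>, e.g. --tempLocation=gs://my-bucket/tmp"));
    }

    public static void main(String[] args) {
        // Placeholder arguments for illustration only.
        String[] sample = {"--project=my-project", "--tempLocation=gs://my-bucket/tmp"};
        System.out.println("tempLocation = " + requireTempLocation(sample));
    }
}
```

In the question's main method, calling requireTempLocation(args) just before Pipeline.create(...) would surface the missing option immediately. The actual fix is still to pass --tempLocation with a location your job can write to.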
