I have a simple Dataflow test job that ran successfully with apache-beam 2.1.0. The code looks something like this:
public static void main(String[] args) throws Exception {
    DataflowPipelineOptions dataflowOptions = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    dataflowOptions.setProject("MY_PROJECT_ID");
    dataflowOptions.setStagingLocation("gs://MY_STAGING_LOC");
    dataflowOptions.setTempLocation("gs://MY_TEMP_LOC");
    dataflowOptions.setFilesToStage(Collections.singletonList("MY_LOCAL_JAR_FILE.jar"));
    dataflowOptions.setRunner(DataflowRunner.class);
    dataflowOptions.setNetwork("SOME_NETWORK");
    dataflowOptions.setSubnetwork("regions/SOME_REGION/subnetworks/SOME_SUBNETWORK");
    dataflowOptions.setZone("SOME_ZONE");
    Pipeline p = Pipeline.create(dataflowOptions);
    List<String> LINES = Arrays.asList("foobar");
    p.apply(Create.of(LINES)).setCoder(StringUtf8Coder.of());
    p.run().waitUntilFinish();
}
However, after migrating to apache-beam 2.4.0, I immediately get the following error when I try to submit a Dataflow job via the CLI:
Exception in thread "main" java.lang.RuntimeException: Error while staging packages
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:396)
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:273)
    at org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:76)
    at org.apache.beam.runners.dataflow.util.GcsStager.stageDefaultFiles(GcsStager.java:64)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:661)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:174)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at com.company.app.App.main(App.java:48)
Caused by: java.io.IOException: Error executing batch GCS request
    at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:607)
    at org.apache.beam.sdk.util.GcsUtil.getObjects(GcsUtil.java:339)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.matchNonGlobs(GcsFileSystem.java:216)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.match(GcsFileSystem.java:85)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:123)
    at org.apache.beam.sdk.io.FileSystems.matchSingleFileSpec(FileSystems.java:188)
    at org.apache.beam.runners.dataflow.util.PackageUtil.alreadyStaged(PackageUtil.java:160)
    at org.apache.beam.runners.dataflow.util.PackageUtil.stagePackageSynchronously(PackageUtil.java:184)
    at org.apache.beam.runners.dataflow.util.PackageUtil.lambda$stagePackage$1(PackageUtil.java:174)
    at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:101)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: com.google.api.client.http.HttpResponseException: 404 Not Found
    ...
I haven't changed any configuration settings.
Debugging further, I found that it fails on a POST request to https://www.googleapis.com/null
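The literal "null" at the end of that URL is what Java produces when a null String is concatenated onto another string. Here is a minimal sketch of that effect, assuming (the trace above does not confirm this) that the request URL is built from a root URL plus a path that was never set. NullPathDemo and buildUrl are hypothetical names for illustration only, not part of Beam or google-api-client:

```java
public class NullPathDemo {
    // Hypothetical helper: naive concatenation of a root URL and a path.
    // Java string concatenation renders a null reference as the text "null".
    static String buildUrl(String rootUrl, String path) {
        return rootUrl + path;
    }

    public static void main(String[] args) {
        // Assumption: the path was never initialized and is still null.
        String path = null;
        // prints "https://www.googleapis.com/null"
        System.out.println(buildUrl("https://www.googleapis.com/", path));
    }
}
```

If that is what is happening here, the 404 is just the server rejecting a path that does not exist, so the real problem would be on the client side that builds the URL, not in the GCS configuration.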
2 Solutions
#1
2
This looks like a bug in google-api-java-client that was fixed in the dev branch on Feb 13. Hopefully the fix will be released soon:
Original Issue: https://github.com/google/google-api-java-client/issues/1073
Flawed Fix: https://github.com/google/google-api-java-client/pull/1087
Corrected Fix: https://github.com/google/google-api-java-client/pull/1096
#2
0
You're hitting this issue: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/607
To fix, add the following if using Gradle:
compile (group: 'com.google.api-client', name: 'google-api-client', version: '1.22.0') {
    force = true
}
Or Maven:
<dependency>
    <groupId>com.google.api-client</groupId>
    <artifactId>google-api-client</artifactId>
    <version>[1.22.0]</version>
</dependency>
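After pinning the version, it can help to confirm which jar actually wins on the runtime classpath. The sketch below uses only the JDK. WhichJar and locate are hypothetical names, and the class name passed in main is assumed to be provided by the google-api-client artifact. If the library is not on the classpath, it reports that instead of failing:

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the location (usually a jar URL) a class was loaded from,
    // or a short message if the class or its source cannot be found.
    static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumption: this class ships in the google-api-client jar.
        System.out.println(locate("com.google.api.client.googleapis.batch.BatchRequest"));
    }
}
```

Run it with the same classpath as the pipeline. If the printed path still points at a jar other than google-api-client-1.22.0, the override did not take effect and another dependency is pulling in a different version.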