When trying to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline runs, where the root cause is that I'm missing something in the path.
Caused by: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions) ... Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://df-staging-bucket-57763/', did you mean: 'gs://some-bucket/df-staging-bucket-57763'?
The bucket gs://df-staging-bucket-57763/ already exists in my project, and I have access to it. What do I need to add to make this work?
1 Solution
#1
4
The DataflowRunner requires that the staging location and the temp location each be a location within a bucket rather than the top level of a bucket. Adding a directory to each of the stagingLocation and gcpTempLocation arguments (such as --stagingLocation=gs://df-staging-bucket-57763/staging or --tempLocation=gs://df-staging-bucket-57763/temp) will be sufficient to run the pipeline.
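To make the rule concrete, here is a minimal sketch in plain Java that tells a bare bucket apart from a location inside a bucket, and appends a directory when one is missing. The class GcsLocationCheck and its methods isInsideBucket and withSubdirectory are hypothetical names made up for illustration; this is not the check Beam performs internally, only an approximation of the behaviour described above.

```java
public class GcsLocationCheck {
    private static final String SCHEME = "gs://";

    // Hypothetical helper (not part of the Beam SDK): returns true when the
    // location names something inside a bucket, e.g. gs://bucket/staging,
    // and false for a bare bucket such as gs://bucket or gs://bucket/.
    static boolean isInsideBucket(String location) {
        if (!location.startsWith(SCHEME)) {
            return false;
        }
        String rest = location.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        // Require a non-empty bucket name, a slash, and at least one
        // character after that slash.
        return slash > 0 && slash < rest.length() - 1;
    }

    // Hypothetical helper: appends a directory to a bare bucket path and
    // leaves a location that is already inside a bucket unchanged.
    static String withSubdirectory(String location, String directory) {
        if (isInsideBucket(location)) {
            return location;
        }
        return location.endsWith("/")
                ? location + directory
                : location + "/" + directory;
    }

    public static void main(String[] args) {
        String bucket = "gs://df-staging-bucket-57763/";
        System.out.println(isInsideBucket(bucket));
        System.out.println(withSubdirectory(bucket, "staging"));
        System.out.println(withSubdirectory(bucket, "temp"));
    }
}
```

Running main with the bucket from the question reports that the bare bucket path is not inside a bucket, then prints the two adjusted locations that could be passed as --stagingLocation and --tempLocation.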