Dataflow can't read a BigQuery dataset in the "asia-northeast1" region

Time: 2021-08-14 15:24:04

I have a BigQuery dataset located in the new "asia-northeast1" region. I'm trying to run a templated Dataflow pipeline (running in the Australia region) to read a table from it. It throws the following error, even though the dataset/table does indeed exist:


Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo"
}

Am I doing something wrong here?


import java.util.stream.Collectors;

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import com.google.api.services.bigquery.model.TableRow;

import static org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.THROUGHPUT_BASED;
import static org.apache.beam.sdk.io.FileBasedSink.CompressionType.GZIP;

/**
 * BigQuery -> ParDo -> GCS (one file)
 */
public class BigQueryTableToOneFile {
    public static void main(String[] args) throws Exception {
        PipelineOptionsFactory.register(TemplateOptions.class);
        TemplateOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(TemplateOptions.class);
        options.setAutoscalingAlgorithm(THROUGHPUT_BASED);
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(BigQueryIO.read().from(options.getBigQueryTableName()).withoutValidation())
                .apply(ParDo.of(new DoFn<TableRow, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) throws Exception {
                        String commaSep = c.element().values()
                                .stream()
                                .map(cell -> cell.toString().trim())
                                .collect(Collectors.joining("\",\""));
                        c.output(commaSep);
                    }
                }))
                .apply(TextIO.write().to(options.getOutputFile())
                        .withoutSharding()
                        .withWritableByteChannelFactory(GZIP)
                );
        pipeline.run();
    }

    public interface TemplateOptions extends DataflowPipelineOptions {
        @Description("The BigQuery table to read from in the format project:dataset.table")
        @Default.String("bigquery-samples:wikipedia_benchmark.Wiki1k")
        ValueProvider<String> getBigQueryTableName();

        void setBigQueryTableName(ValueProvider<String> value);

        @Description("The name of the output file to produce in the format gs://bucket_name/filename.csv")
        @Default.String("gs://bigquery-table-to-one-file/output/bar.csv.gz")
        ValueProvider<String> getOutputFile();

        void setOutputFile(ValueProvider<String> value);
    }
}
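For what it's worth, a quick way to rule out a malformed table spec (as opposed to a region problem) is to check it against the project:dataset.table shape that BigQueryIO.read().from(...) expects. A minimal sketch — the helper below is hypothetical, not part of Beam:

```java
// Hypothetical helper: splits a "project:dataset.table" spec into its parts,
// throwing if the shape is wrong. Handy as a sanity check before launching the pipeline.
public class TableSpecCheck {
    static String[] parse(String spec) {
        int colon = spec.indexOf(':');
        int dot = spec.indexOf('.', colon + 1);
        if (colon < 0 || dot < 0) {
            throw new IllegalArgumentException(
                    "Expected project:dataset.table, got: " + spec);
        }
        return new String[] {
                spec.substring(0, colon),        // project
                spec.substring(colon + 1, dot),  // dataset
                spec.substring(dot + 1)          // table
        };
    }

    public static void main(String[] args) {
        for (String part : parse("bigquery-samples:wikipedia_benchmark.Wiki1k")) {
            System.out.println(part);
        }
    }
}
```

In this case the spec parses fine, which points the finger back at the dataset's location rather than the reference itself.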

Args:

--project=grey-sort-challenge
--runner=DataflowRunner
--jobName=bigquery-table-to-one-file
--maxNumWorkers=1
--zone=australia-southeast1-a
--stagingLocation=gs://bigquery-table-to-one-file/jars
--tempLocation=gs://bigquery-table-to-one-file/tmp
--templateLocation=gs://bigquery-table-to-one-file/template

Job id: 2018-05-05_05_37_08-8260293482986343692



1 Answer

#1

Sorry about that issue. It will be addressed in the upcoming Beam SDK 2.5.0 (you can try using current head snapshots from the Beam repo)
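For anyone wanting to try a head snapshot before 2.5.0 lands, a sketch of what that could look like in a Maven pom.xml — the repository URL and snapshot version string are assumptions and may have moved on:

```xml
<!-- Assumed snapshot repository and version; adjust to whatever the Beam repo currently publishes. -->
<repositories>
  <repository>
    <id>apache.snapshots</id>
    <url>https://repository.apache.org/snapshots/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.5.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```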
