I have a BigQuery dataset located in the new "asia-northeast1" region. I'm trying to run a templated Dataflow pipeline (running in the Australia region) to read a table from it. It throws the following error, even though the dataset/table does indeed exist:
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo"
}
Am I doing something wrong here?
import static org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.THROUGHPUT_BASED;
import static org.apache.beam.sdk.io.FileBasedSink.CompressionType.GZIP;

import java.util.stream.Collectors;

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

import com.google.api.services.bigquery.model.TableRow;

/**
 * BigQuery -> ParDo -> GCS (one file)
 */
public class BigQueryTableToOneFile {

    public static void main(String[] args) throws Exception {
        PipelineOptionsFactory.register(TemplateOptions.class);
        TemplateOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(TemplateOptions.class);
        options.setAutoscalingAlgorithm(THROUGHPUT_BASED);

        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(BigQueryIO.read().from(options.getBigQueryTableName()).withoutValidation())
                .apply(ParDo.of(new DoFn<TableRow, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) throws Exception {
                        // Join the row's cell values into one CSV-style line
                        String commaSep = c.element().values()
                                .stream()
                                .map(cell -> cell.toString().trim())
                                .collect(Collectors.joining("\",\""));
                        c.output(commaSep);
                    }
                }))
                .apply(TextIO.write().to(options.getOutputFile())
                        .withoutSharding()
                        .withWritableByteChannelFactory(GZIP)
                );
        pipeline.run();
    }

    public interface TemplateOptions extends DataflowPipelineOptions {
        @Description("The BigQuery table to read from in the format project:dataset.table")
        @Default.String("bigquery-samples:wikipedia_benchmark.Wiki1k")
        ValueProvider<String> getBigQueryTableName();

        void setBigQueryTableName(ValueProvider<String> value);

        @Description("The name of the output file to produce in the format gs://bucket_name/filename.csv")
        @Default.String("gs://bigquery-table-to-one-file/output/bar.csv.gz")
        ValueProvider<String> getOutputFile();

        void setOutputFile(ValueProvider<String> value);
    }
}
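As a side note (unrelated to the 404): the `Collectors.joining("\",\"")` delimiter only inserts quotes *between* fields, so each emitted line is missing its outer quotes. A standalone sketch of the DoFn's row-to-line logic, using a made-up row, shows this:

```java
import java.util.List;
import java.util.stream.Collectors;

// Standalone sketch of the row-joining logic in the DoFn above (made-up row).
public class RowJoinDemo {
    static String toCsvLine(List<Object> cells) {
        // Joining with "\",\"" quotes only *between* fields; the first and
        // last cells end up unquoted.
        return cells.stream()
                .map(cell -> cell.toString().trim())
                .collect(Collectors.joining("\",\""));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(List.of("Tokyo", 123, " padded ")));
        // prints: Tokyo","123","padded
    }
}
```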
Args:
--project=grey-sort-challenge
--runner=DataflowRunner
--jobName=bigquery-table-to-one-file
--maxNumWorkers=1
--zone=australia-southeast1-a
--stagingLocation=gs://bigquery-table-to-one-file/jars
--tempLocation=gs://bigquery-table-to-one-file/tmp
--templateLocation=gs://bigquery-table-to-one-file/template
Job id: 2018-05-05_05_37_08-8260293482986343692
1 Answer
Sorry about that issue. It will be addressed in the upcoming Beam SDK 2.5.0 (in the meantime, you can try the current HEAD snapshots from the Beam repo).
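For reference, here is one way to pull a pre-release Beam snapshot into a Maven build, as the answer suggests. This is a sketch: the repository URL is the standard ASF snapshots repository, and the `2.5.0-SNAPSHOT` version is an assumption based on the answer (2.5.0 had not shipped at the time).

```xml
<!-- ASF snapshot repository (assumed standard URL) -->
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.5.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```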