BigQuery backend error in Dataflow when reading from and writing to tables

Date: 2022-07-13 15:21:06

I get this error only when I read from a table and then write the rows to a different table. If I only read from the table, no error occurs. For example, the code below produces no error.

    // Read-only pipeline: this runs without error.
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());

    PCollection<TableRow> BigQueryTableRow = p
        .apply(BigQueryIO.Read.named("ReadTable")
            .from("project:dataset.data_table"));

    p.run();

But if I do the following, I get a 'BigQuery job Backend error'.

    // Read, then write to a second table: this fails with the backend error.
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());

    PCollection<TableRow> BigQueryTableRow = p
        .apply(BigQueryIO.Read.named("ReadTable")
            .from("project:dataset.data_table"));

    // 'fields' is a List<TableFieldSchema> built elsewhere; its definition is not shown in this post.
    TableSchema tableSchema = new TableSchema().setFields(fields);

    BigQueryTableRow.apply(BigQueryIO.Write
        .named("Write Members to BigQuery")
        .to("project:dataset.data_table_two")
        .withSchema(tableSchema)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

    p.run();

Some more details on the error:

BigQuery job "dataflow_job" in project "project-name" 
finished with error(s): errorResult: Backend error. 
Job aborted.

1 Answer

#1

I managed to figure out the problem on my own. The backend error is produced because my table has two independently repeated fields.

If I try to output the entire table using BigQuery's web service, it displays a more helpful error message:

Error: Cannot output multiple independently repeated fields
at the same time. Found memberships_is_coach and actions_type

It is unfortunate that the 'Backend error' message provides no real insight into the problem. Also, when only reading the data and not performing any further operations, no error is given, which makes the problem even harder to track down.
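
Since the pipeline error gives no hint, one option is to inspect the schema for repeated fields before running the job. The following is only a rough sketch of that idea in plain Java. The `RepeatedFieldCheck` class and its `Field` type are hypothetical stand-ins, not part of the Dataflow or BigQuery SDKs, and the `member_id` field and the field modes are assumptions made for illustration. The check simply lists every field whose mode is REPEATED; it does not decide whether two of them are truly independent, so treat more than one hit as a prompt to look closer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of any Google SDK.
class RepeatedFieldCheck {

    // Minimal stand-in for a schema field: a name, a mode, and nested
    // fields (non-empty only for record-type fields).
    static final class Field {
        final String name;
        final String mode; // "NULLABLE", "REQUIRED" or "REPEATED"
        final List<Field> children;

        Field(String name, String mode, Field... children) {
            this.name = name;
            this.mode = mode;
            this.children = Arrays.asList(children);
        }
    }

    // Returns the dotted paths of all REPEATED fields, including nested ones.
    static List<String> repeatedFieldPaths(List<Field> fields) {
        List<String> found = new ArrayList<>();
        collect(fields, "", found);
        return found;
    }

    private static void collect(List<Field> fields, String prefix, List<String> found) {
        for (Field f : fields) {
            String path = prefix.isEmpty() ? f.name : prefix + "." + f.name;
            if ("REPEATED".equalsIgnoreCase(f.mode)) {
                found.add(path);
            }
            collect(f.children, path, found);
        }
    }

    // Conservative heuristic: true when more than one REPEATED field exists.
    static boolean hasMultipleRepeatedFields(List<Field> fields) {
        return repeatedFieldPaths(fields).size() > 1;
    }

    public static void main(String[] args) {
        // Assumed schema: the two field names come from the error message,
        // member_id and all modes are made up for this example.
        List<Field> schema = Arrays.asList(
            new Field("member_id", "REQUIRED"),
            new Field("memberships_is_coach", "REPEATED"),
            new Field("actions_type", "REPEATED"));

        System.out.println(repeatedFieldPaths(schema));        // prints [memberships_is_coach, actions_type]
        System.out.println(hasMultipleRepeatedFields(schema)); // prints true
    }
}
```

In a real pipeline the same walk could be done over the table's actual schema fields instead of the stand-in `Field` type.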
