Dataflow error - "Sources are too large. Limit is 5.00Ti"

Time: 2021-06-06 15:45:30

We have a pipeline that looks like:


BigQuery -> ParDo -> BigQuery


The table has ~2B rows, and is just under 1TB.


After running for just over 8 hours, the job failed with the following error:


May 19, 2015, 10:09:15 PM
S09: (f5a951d84007ef89): Workflow failed. Causes: (f5a951d84007e064): BigQuery job "dataflow_job_17701769799585490748" in project "gdfp-xxxx" finished with error(s): job error: Sources are too large. Limit is 5.00Ti., error: Sources are too large. Limit is 5.00Ti.

Job id is: 2015-05-18_21_04_28-9907828662358367047


It's a big table, but it's not that big and Dataflow should be easily able to handle it. Why can't it handle this use case?


Also, even though the job failed, it still shows it as successful on the diagram. Why?



1 Answer



I think that error means the data you are trying to write to BigQuery exceeds the 5TB limit set by BigQuery for a single import job.


One way to work around this limit might be to split your BigQuery writes into multiple jobs by having multiple Write transforms so that no Write transform receives more than 5TB.


Before your write transform, you could have a DoFn with N outputs. For each record randomly assign it to one of the outputs. Each of the N outputs can then have its own BigQuery.Write transform. The write transforms could all append data to the same table so that all of the data will end up in the same table.
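
The random assignment described above can be sketched in plain Python, independent of any Dataflow SDK. This is only an illustration of the partitioning logic, not the SDK's API: `NUM_SHARDS`, `assign_shard`, and `split_into_shards` are hypothetical names, and the in-memory lists stand in for the N outputs that would each feed a separate append-mode write to the same table.

```python
import random

# Hypothetical N: choose it so that each shard stays well under the limit.
NUM_SHARDS = 4


def assign_shard(record, num_shards, rng=random):
    """Return a shard index in [0, num_shards), chosen uniformly at random."""
    return rng.randrange(num_shards)


def split_into_shards(records, num_shards, rng=random):
    """Distribute records across num_shards lists, one list per write."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        shards[assign_shard(record, num_shards, rng)].append(record)
    return shards


if __name__ == "__main__":
    rows = [{"id": i} for i in range(1000)]
    shards = split_into_shards(rows, NUM_SHARDS, random.Random(0))
    for i, shard in enumerate(shards):
        # Each shard would go to its own write that appends to the same table.
        print(f"shard {i}: {len(shard)} rows")
```

Because the assignment is uniform, each output receives roughly 1/N of the data, so N only needs to be large enough that total size divided by N stays under the limit. If you are using the Apache Beam Python SDK, a function taking `(element, num_partitions)` like `assign_shard` is the shape its `Partition` transform accepts, but check that against the SDK version you run.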




