Reading from Datastore in a Dataflow pipeline is slow

Date: 2022-12-12 15:23:25

When reading data from Datastore in my Dataflow pipeline, the job does not seem to be distributed across the number of workers I have configured for it. Does Dataflow parallelize the Datastore read, or does a single worker do it?

1 solution

#1


1  

Typically, reads made by DatastoreIO are spread across multiple workers and run in parallel. According to the documentation, however, not all queries can be parallelized: for instance, queries that specify a limit or use an inequality filter cannot. Such queries are read by a single worker to ensure correctness.

https://cloud.google.com/dataflow/model/datastore-io#reading-from-datastore
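
To make the rule above concrete, here is a minimal sketch in plain Python. It does not call Dataflow or Datastore; `QuerySpec` and `can_read_in_parallel` are hypothetical names made up for illustration, and the set of operators treated as inequalities is an assumption. It only encodes the two conditions mentioned in the answer (a limit, or an inequality filter), so check the linked documentation for the full list of restrictions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption for this sketch: these operators count as inequality filters.
INEQUALITY_OPS = {"<", "<=", ">", ">="}


@dataclass
class QuerySpec:
    """Hypothetical, simplified description of a Datastore query."""
    kind: str
    # Each filter is a (property, operator, value) tuple.
    filters: List[Tuple[str, str, object]] = field(default_factory=list)
    limit: Optional[int] = None


def can_read_in_parallel(query: QuerySpec) -> bool:
    """Return False for the two cases named in the answer above."""
    if query.limit is not None:
        # A query with a limit is read by a single worker.
        return False
    # An inequality filter also forces a single-worker read.
    return not any(op in INEQUALITY_OPS for _, op, _ in query.filters)


if __name__ == "__main__":
    queries = [
        QuerySpec(kind="Task"),
        QuerySpec(kind="Task", filters=[("done", "=", False)]),
        QuerySpec(kind="Task", limit=100),
        QuerySpec(kind="Task", filters=[("priority", ">", 3)]),
    ]
    for q in queries:
        print(q, can_read_in_parallel(q))
```

If a slow read matches one of the two single-worker cases, one option worth trying is to drop the limit or the inequality filter from the query and apply that condition in a later pipeline step instead, so the read itself can be split.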
