When reading data from Datastore in my Dataflow pipeline, it seems the job is not being distributed across the number of workers I have set for the job. Does Dataflow parallelize the read of Datastore data, or is it done by a single worker?
1 Answer
#1
Typically, reads made by DatastoreIO use multiple workers to read in parallel. However, according to the documentation, not all queries can be parallelized: for instance, queries that specify a limit or use an inequality filter. These queries must run on a single worker to ensure correctness.
https://cloud.google.com/dataflow/model/datastore-io#reading-from-datastore
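To illustrate, here is a minimal sketch of a parallelizable Datastore read using the Dataflow Java SDK's `DatastoreIO` connector covered by the documentation above. The project ID and kind name (`"MyKind"`) are placeholders, not values from the question.

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.datastore.DatastoreIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.datastore.v1.Query;

public class DatastoreReadExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // A plain kind query with no limit and no inequality filter:
    // DatastoreIO can split it into sub-queries and read in parallel
    // across workers.
    Query.Builder query = Query.newBuilder();
    query.addKindBuilder().setName("MyKind"); // placeholder kind

    p.apply(DatastoreIO.v1().read()
        .withProjectId("my-project") // placeholder project ID
        .withQuery(query.build()));

    // Note: setting a limit or adding an inequality filter on the query
    // would, per the documentation, force the read onto a single worker.
    p.run();
  }
}
```

If your read is not parallelizing, check the query for a limit or inequality filter first, since either one forces the single-worker path described in the answer.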