Google Cloud Dataflow - autoscaling not working

Time: 2021-12-22 14:29:19

I'm running a Google Dataflow pipeline job and its job ID is: 2018-08-17_03_35_19-3029795715446392897

The console says that it has adjusted my autoscaling from 3 to 1000 nodes based on the current rate of progress, but the job is still stuck at only 3 nodes.

I also haven't received any errors in the Google Cloud Console regarding quota limits, so I'm not sure why Dataflow isn't scaling my pipeline, despite it saying so.

Thank you for the help!

2 Answers

#1

To autoscale your Dataflow Job, be sure that you use "autoscalingAlgorithm":"THROUGHPUT_BASED".

If you use "autoscalingAlgorithm":"NONE" and numWorkers: 3 (or you don't specify numWorkers, which defaults to 3), then your Dataflow job will be stuck at 3 nodes even though it could otherwise autoscale up to the maximum number of nodes (which is 1000 if you set maxNumWorkers to 0 or 1000).

If you don't want to use THROUGHPUT_BASED, then you will need to specify the number of workers you want via numWorkers, not maxNumWorkers.

Also, to scale to the number of workers you want, be sure to specify a number equal to or lower than your quota. You can check your quota with:

gcloud compute project-info describe
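
For reference, with the Apache Beam Python SDK these settings map to the --autoscaling_algorithm, --max_num_workers and --num_workers pipeline options. Below is a minimal sketch of launching a pipeline with throughput-based autoscaling; the project, region and bucket names are placeholders, not values from this question:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project/region/bucket -- substitute your own values.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    autoscaling_algorithm="THROUGHPUT_BASED",  # scale workers on throughput
    max_num_workers=100,                       # upper bound; keep at or below your quota
)

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | "Create" >> beam.Create(["hello", "dataflow"])
     | "Print" >> beam.Map(print))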

#2

It looks like you're getting an error for the quota on the number of VM instances when attempting to scale to 1000 workers. According to these docs, that quota is a function of your permitted CPU cores for the region. I would check your CPU quotas to see whether they would permit 1000x your configured instance size. I would also check that you have enough persistent disk and IP addresses to scale to a worker pool of that size.
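
As a rough back-of-the-envelope check, you can estimate what a 1000-worker pool would consume against those regional quotas. The per-worker figures below are illustrative assumptions (a single-vCPU machine type and a 250 GB persistent disk per worker), not values taken from this job:

# Rough estimate of the regional quota a 1000-worker pool would need.
# vcpus_per_worker and disk_gb_per_worker are assumptions -- substitute your
# pipeline's actual machine type and disk size.
max_workers = 1000
vcpus_per_worker = 1        # e.g. n1-standard-1 (assumption)
disk_gb_per_worker = 250    # assumed per-worker persistent disk size

print("vCPUs needed:        ", max_workers * vcpus_per_worker)
print("Persistent disk (GB):", max_workers * disk_gb_per_worker)
print("In-use IP addresses: ", max_workers)  # one per worker unless workers use internal IPs only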

To request additional quota, follow the instructions here.
