Google Cloud Dataflow - Autoscaling not working properly

Time: 2021-01-09 15:25:59

I'm running a Google Dataflow pipeline job and its job ID is: 2018-08-17_03_35_19-3029795715446392897

The console says that it has adjusted my autoscaling from 3 to 1000 nodes based on the current rate of progress, but the job is still staying at only 3 nodes.

I also haven't received any errors in the Google Cloud Console regarding quota limits, so I'm not sure why Dataflow isn't scaling my pipeline even though it says it will.

Thank you for the help!

2 Solutions

#1

To autoscale your Dataflow Job, be sure that you use "autoscalingAlgorithm":"THROUGHPUT_BASED".

If you use "autoscalingAlgorithm":"NONE" and numWorkers: 3 (or you don't specify numWorkers, which will default to 3), then your Dataflow Job will get stuck at 3 nodes even if it could autoscale to the max number of nodes (which is 1000 if you set maxNumWorkers to 0 or 1000).

If you don't want to use THROUGHPUT_BASED, then you will need to specify the number of workers you want on numWorkers, not on maxNumWorkers.

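As a rough sketch, here is how those options might be passed when launching a Beam pipeline on Dataflow with the Java SDK (the main class, project, bucket, and region below are placeholders, not taken from the question):

# Launch with throughput-based autoscaling and a 1000-worker ceiling
mvn compile exec:java -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --tempLocation=gs://my-bucket/tmp \
    --autoscalingAlgorithm=THROUGHPUT_BASED \
    --maxNumWorkers=1000"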

Also, to scale to the number of workers you want, be sure to specify a number equal to or lower than your quota. You can check your quota with:

gcloud compute project-info describe
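
If the full output is hard to scan, the regional quotas can also be viewed directly; for example (us-central1 is just an assumed region here):

# Lists the region's CPUS, INSTANCES, DISKS_TOTAL_GB and IN_USE_ADDRESSES quotas with current usage
gcloud compute regions describe us-central1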

#2

It looks like you're getting an error for the quota on the number of VM instances when attempting to scale to 1000 workers. According to these docs, that quota is a factor of your permitted CPU cores for the region. I would check your CPU quotas to see if they would permit 1000x your configured instance size. I would also check that you have enough persistent disk and IP addresses to scale to a worker pool of that size.

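As a rough, hypothetical sizing check (assuming n1-standard-4 workers and the 250 GB default persistent disk per batch worker; substitute your actual machine type and disk size):

1000 workers x 4 vCPUs       = 4,000 vCPUs      (CPUS quota)
1000 workers x 250 GB disk   = 250,000 GB       (DISKS_TOTAL_GB quota)
1000 workers x 1 external IP = 1,000 addresses  (IN_USE_ADDRESSES quota)

All three need to fit within the regional quota before the job can actually reach 1000 workers.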

To request additional quota, follow the instructions here.
