Assume we have data coming in through a Google PubSub topic with a spiky traffic pattern: potentially long quiet periods, followed by bursts of data arriving at a fast rate for several minutes.
To process that data, suppose we use Dataflow in streaming mode with a subscription-based PubSubIO as the data source. Will the Dataflow job always stay in the running state with the minimum number of workers, or will it be restarted when a burst of data comes in and then stopped once the quiet period begins?
1 solution
#1
2
If you enable autoscaling, Dataflow raises or lowers the number of workers dynamically according to load, without restarting the pipeline. The Dataflow documentation on autoscaling covers this in more detail.
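As a rough illustration of what enabling autoscaling can look like, here is a minimal sketch that assembles pipeline flags for a streaming job using the Apache Beam Python SDK flag names (`--streaming`, `--autoscaling_algorithm`, `--num_workers`, `--max_num_workers`). The helper `dataflow_streaming_args`, the project ID, the region, and the worker counts are all hypothetical placeholders, not values from the question; with the Java SDK the equivalent flags are camelCase (for example `--maxNumWorkers`).

```python
def dataflow_streaming_args(project, region, max_workers, initial_workers=1):
    """Build command-line style flags for a streaming Dataflow job with autoscaling.

    Hypothetical helper for illustration; the flag names are the ones the
    Apache Beam Python SDK accepts for the Dataflow runner.
    """
    if max_workers < initial_workers:
        raise ValueError("max_workers must be >= initial_workers")
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        "--streaming",                               # unbounded PubSub source
        "--autoscaling_algorithm=THROUGHPUT_BASED",  # turn autoscaling on
        f"--num_workers={initial_workers}",          # workers at job start
        f"--max_num_workers={max_workers}",          # ceiling during bursts
    ]


# Placeholder project and region; replace with real values.
args = dataflow_streaming_args("my-project", "us-central1", max_workers=10)
print(" ".join(args))

# With apache_beam installed, the list could be passed on as:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(args)
```

With settings like these, the job would be expected to stay in the running state during quiet periods at a small worker count and scale up toward the configured maximum during a burst, rather than being stopped and restarted.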