Cancelling a job on Dataflow without data loss

Time: 2020-11-29 15:33:59

I'm trying to find a way to gracefully end my jobs, which stream from PubSub and write to BigQuery, so as not to lose any data.


A possible approach I can envision is to have the job stop pulling new data and then run until it has processed everything, but I don't know if/how this is possible to implement.


2 solutions

#1



It appears this feature was added in the latest release.


All you have to do now is select the drain option when cancelling a job.


Thanks.

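For reference, here is a minimal sketch of what triggering a drain can look like outside the console, assuming the Dataflow v1b3 REST API accessed through the google-api-python-client library; the project, region, and job ID values below are placeholders, not values from the original question:

    # Minimal sketch (assumed setup): ask Dataflow to drain a running
    # streaming job instead of cancelling it outright.
    from googleapiclient.discovery import build

    PROJECT = "my-project"          # placeholder project ID
    REGION = "us-central1"          # placeholder regional endpoint
    JOB_ID = "my-streaming-job-id"  # placeholder ID of the job to stop

    dataflow = build("dataflow", "v1b3")
    dataflow.projects().locations().jobs().update(
        projectId=PROJECT,
        location=REGION,
        jobId=JOB_ID,
        # JOB_STATE_DRAINED requests that the job stop pulling new input
        # and finish processing buffered data before shutting down.
        body={"requestedState": "JOB_STATE_DRAINED"},
    ).execute()

The equivalent from the command line is gcloud dataflow jobs drain JOB_ID --region=REGION.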

#2



I believe this would be difficult (if not impossible) to do on your own. We (Google Cloud Dataflow team) are aware of this need and are working on addressing it with a new feature in the coming months.

