Run a function / pipeline after a pipeline completes on Google DataFlow

Time: 2021-09-13 15:36:44

I'm trying to run a function (or a pipeline) after a Beam pipeline completes, on Google DataFlow.

Currently I've built a hack to run the function by writing the results of the previous Pipeline to null with

_ = existing_pipeline | "do next task" >> beam.Map(func)

...where func is:

def func(_):
    # do some work, and ignore `_`
    pass
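
For reference, a minimal self-contained version of this hack looks roughly like the following; the Create/Map steps are just placeholders standing in for the real pipeline:

import apache_beam as beam

def func(_):
    # Runs once per element of the upstream PCollection, which is exactly
    # why this feels like a hack rather than a proper "after completion" hook.
    print("doing follow-up work")

with beam.Pipeline() as p:
    results = (
        p
        | "create" >> beam.Create([1, 2, 3])
        | "process" >> beam.Map(lambda x: x * 2)
    )
    # Pipe the results into a throwaway Map so func runs after they are produced.
    _ = results | "do next task" >> beam.Map(func)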

But is there a better way?

1 solution

#1

Assuming you want the function to run on your machine and not in the Cloud, you should do something like this:

result = existing_pipeline.run()
result.wait_until_finish()
# do some work
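
For example, a self-contained sketch of this approach; the pipeline contents, the options, and the follow-up function are placeholders for whatever you actually run:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def do_follow_up_work():
    # Placeholder for the work you want to run after the job completes.
    print("pipeline finished, running follow-up work")

# Runner, project, region, temp_location etc. come from the command line,
# e.g. --runner=DataflowRunner --project=... --region=... --temp_location=...
options = PipelineOptions()
pipeline = beam.Pipeline(options=options)

_ = (
    pipeline
    | "create" >> beam.Create([1, 2, 3])
    | "process" >> beam.Map(lambda x: x * 2)
)

result = pipeline.run()
result.wait_until_finish()  # blocks until the Dataflow job reaches a terminal state
do_follow_up_work()         # runs on the submitting machine, after the job is done

wait_until_finish() keeps the submitting process alive until the job finishes (or fails), so anything placed after it is guaranteed to run only once the Dataflow job is done.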
