Is there a way to publish a message onto Google Pubsub after a Google Dataflow job completes? We have a need to notify dependent systems that the processing of incoming data is complete. How could Dataflow publish after writing data to the sink?
在Google Dataflow作业完成后,有没有办法在Google Pubsub上发布消息?我们需要通知从属系统传入数据的处理已完成。将数据写入接收器后,Dataflow如何发布?
EDIT: We want to notify after a pipeline completes writing to GCS. Our pipeline looks like this:
编辑:我们想在管道完成写入GCS后通知。我们的管道如下:
Pipeline.create(options) .apply(....) .apply(AvroIO.Write.named("Write to GCS") .withSchema(Extract.class) .to(options.getOutputPath()) .withSuffix(".avro")); p.run();
If we add logic outside of the pipeline.apply(...) methods we are notified when the code completes execution, not when the pipeline is completed. Ideally we could add another .apply(...)
after the AvroIO sink and publish a message to PubSub.
如果我们在pipeline.apply(...)方法之外添加逻辑,那么在代码完成执行时会通知我们,而不是在管道完成时。理想情况下,我们可以在AvroIO接收器之后添加另一个.apply(...)并向PubSub发布消息。
1 个解决方案
#1
1
You have two options to get notified when your pipeline finishes, and then subsequently publish a message - or do whatever you want to after the pipeline finishes running:
您有两个选项可以在管道完成时收到通知,然后发布消息 - 或者在管道完成运行后执行任何操作:
- Use the
BlockingPipelineRunner
. This will run your pipeline synchronously. - 使用BlockingPipelineRunner。这将同步运行您的管道。
- Use the
DataflowPipelineRunner
. This will run your pipeline asynchronously. You can then poll the pipeline for its status, and wait for it to finish. - 使用DataflowPipelineRunner。这将异步运行您的管道。然后,您可以轮询管道的状态,并等待它完成。
#1
1
You have two options to get notified when your pipeline finishes, and then subsequently publish a message - or do whatever you want to after the pipeline finishes running:
您有两个选项可以在管道完成时收到通知,然后发布消息 - 或者在管道完成运行后执行任何操作:
- Use the
BlockingPipelineRunner
. This will run your pipeline synchronously. - 使用BlockingPipelineRunner。这将同步运行您的管道。
- Use the
DataflowPipelineRunner
. This will run your pipeline asynchronously. You can then poll the pipeline for its status, and wait for it to finish. - 使用DataflowPipelineRunner。这将异步运行您的管道。然后,您可以轮询管道的状态,并等待它完成。