无法通过在Apache Beam中创建模板以所需顺序运行多个管道

时间:2021-02-07 15:38:55

I have two separate Pipelines say 'P1' and 'P2'. As per my requirement I need to run P2 only after P1 has completely finished its execution. I need to get this entire operation done through a single Template.

我有两个单独的管道说'P1'和'P2'。根据我的要求,我需要在P1完全执行后才运行P2。我需要通过单个模板完成整个操作。

Basically Template gets created the moment it finds run() its way say p1.run().

基本上模板会在它找到run()的时候创建它的方式说p1.run()。

So what I can see that I need to handle two different Pipelines using two different templates but that would not satisfy my strict order based Pipeline execution requirement.

所以我可以看到我需要使用两个不同的模板处理两个不同的管道,但这不符合我严格的基于订单的管道执行要求。

Another way I could think of calling p1.run() inside the ParDo of p2.run() and keep the run() of p2 wait until finish of run() of p1. I tried this way but stuck at IllegalArgumentException given below.

我可以想到在p2.run()的ParDo中调用p1.run()并保持p2的run()等到p1的run()结束的另一种方式。我试过这种方式,但坚持下面给出的IllegalArgumentException。

java.io.NotSerializableException: PipelineOptions objects are not serializable and should not be embedded into transforms (did you capture a PipelineOptions object in a field or in an anonymous class?). Instead, if you're using a DoFn, access PipelineOptions at runtime via ProcessContext/StartBundleContext/FinishBundleContext.getPipelineOptions(), or pre-extract necessary fields from PipelineOptions at pipeline construction time.

java.io.NotSerializableException:PipelineOptions对象不可序列化,不应嵌入到转换中(您是在字段中还是在匿名类中捕获PipelineOptions对象?)。相反,如果您正在使用DoFn,请在运行时通过ProcessContext / StartBundleContext / FinishBundleContext.getPipelineOptions()访问PipelineOptions,或者在管道构建时从PipelineOptions中预提取必要的字段。

Is it not possible at all to call the run() of a pipeline inside any transform say 'Pardo' of another Pipeline?

难道根本不可能在任何转换中调用管道的run()说“另一个管道的'Pardo”吗?

If this is the case then how to satisfy my requirement of calling two different Pipelines in sequence by creating a single template?

如果是这种情况那么如何满足我通过创建单个模板按顺序调用两个不同的管道的要求?

1 个解决方案

#1


2  

A template can contain only a single pipeline. In order to sequence the execution of two separate pipelines each of which is a template, you'll need to schedule them externally, e.g. via some workflow management system (such as what Anuj mentioned, or Airflow, or something else - you might draw some inspiration from this post for example).

模板只能包含一个管道。为了对两个独立管道的执行进行排序,每个管道都是模板,您需要在外部安排它们,例如,通过一些工作流程管理系统(例如Anuj所提到的,或Airflow,或其他东西 - 你可以从这篇文章中汲取灵感)。

We are aware of the need for better sequencing primitives in Beam within a single pipeline, but do not have a concrete design yet.

我们意识到需要在单个管道中更好地对Beam中的基元进行排序,但还没有具体的设计。

#1


2  

A template can contain only a single pipeline. In order to sequence the execution of two separate pipelines each of which is a template, you'll need to schedule them externally, e.g. via some workflow management system (such as what Anuj mentioned, or Airflow, or something else - you might draw some inspiration from this post for example).

模板只能包含一个管道。为了对两个独立管道的执行进行排序,每个管道都是模板,您需要在外部安排它们,例如,通过一些工作流程管理系统(例如Anuj所提到的,或Airflow,或其他东西 - 你可以从这篇文章中汲取灵感)。

We are aware of the need for better sequencing primitives in Beam within a single pipeline, but do not have a concrete design yet.

我们意识到需要在单个管道中更好地对Beam中的基元进行排序,但还没有具体的设计。