Apache Beam: Skipping steps in an already-constructed pipeline

Time: 2021-03-25 15:35:12

Is there a way to conditionally skip steps in an already-constructed pipeline? Or is pipeline construction designed to be the only way to control which steps are run?


1 answer

#1



Normally, pipeline construction determines which transforms in a pipeline will be executed.


You can, however, use a single-input, multiple-output ParDo that routes each element of the input PCollection to one of several output PCollections. By choosing which output receives the data, you can dynamically control which steps do any work: a step that receives no input might not be executed at all, or it runs with nothing to process, so its execution does not matter.


A related feature is "parameterized pipelines" or "template pipelines". This is something we are very interested in and are actively working on.

