如何将PCollection >转换为dataflow / beam中的PCollection

时间:2022-06-10 15:22:21

I have a use case in which I need to output multiple T from a DoFn. So DoFn function is returning a PCollection<List<T>>. I want to convert it to PCollection<T> so that later in pipeline I can just filter like:

我有一个用例,我需要从DoFn输出多个T.所以DoFn函数返回一个PCollection >。我想将它转换为PCollection ,以便稍后在管道中我可以像以下一样过滤:

PCollection<T> filteredT = filterationResult.apply(Filter.byPredicate(p -> p.equals(T) == T));

Currently the best method I can think of is, instead returning List<T> from the ParDo function I return KV<String,List<T>> with same key for every item. Then in pipeline I can do below to combine result:

目前我能想到的最好的方法是,从ParDo函数返回List ,我为每个项返回KV >和相同的键。然后在管道中我可以在下面结合结果: ,list>

filterationResult.apply("Group", GroupByKey.<String, List<T>>create())

Or can I call c.output(T) from DoFn multiple times?

或者我可以多次从DoFn调用c.output(T)吗?

1 个解决方案

#1


3  

You can call c.output(T) from a DoFn multiple times.

您可以多次从DoFn调用c.output(T)。

There is also a library transform Flatten.iterables() but you don't need it in this case.

还有一个库变换Flatten.iterables()但在这种情况下你不需要它。

#1


3  

You can call c.output(T) from a DoFn multiple times.

您可以多次从DoFn调用c.output(T)。

There is also a library transform Flatten.iterables() but you don't need it in this case.

还有一个库变换Flatten.iterables()但在这种情况下你不需要它。