I have a use case in which I need to output multiple T
from a DoFn. So DoFn
function is returning a PCollection<List<T>>
. I want to convert it to PCollection<T>
so that later in pipeline I can just filter like:
我有一个用例,我需要从DoFn输出多个T.所以DoFn函数返回一个PCollection
>。我想将它转换为PCollection
PCollection<T> filteredT = filterationResult.apply(Filter.byPredicate(p -> p.equals(T) == T));
Currently the best method I can think of is, instead returning List<T>
from the ParDo
function I return KV<String,List<T>>
with same key for every item. Then in pipeline I can do below to combine result:
目前我能想到的最好的方法是,从ParDo函数返回List
filterationResult.apply("Group", GroupByKey.<String, List<T>>create())
Or can I call c.output(T) from DoFn multiple times?
或者我可以多次从DoFn调用c.output(T)吗?
1 个解决方案
#1
3
You can call c.output(T)
from a DoFn
multiple times.
您可以多次从DoFn调用c.output(T)。
There is also a library transform Flatten.iterables()
but you don't need it in this case.
还有一个库变换Flatten.iterables()但在这种情况下你不需要它。
#1
3
You can call c.output(T)
from a DoFn
multiple times.
您可以多次从DoFn调用c.output(T)。
There is also a library transform Flatten.iterables()
but you don't need it in this case.
还有一个库变换Flatten.iterables()但在这种情况下你不需要它。