How do I output to multiple files from PCollection<KV<String, String>>?
The key in each entry is the file name. The groupByKey transform gives me PCollection<KV<String, Iterable<String>>>, but how can I write each group to its own file?
For example, given the following input:
<file1, value1>
<file2, value2>
<file1, value3>
I'd like to output two files:
file1:
value1
value3
file2:
value2
Dataflow currently does not have a transform that can do this for you. As a workaround, you can use a simple DoFn that extracts the file name from the KV, opens the file using IOChannelFactory, and writes the Iterable<String> to it.
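A minimal sketch of that workaround, under stated assumptions: it uses plain Java with `java.nio.file` on the local filesystem in place of the Dataflow SDK's DoFn and IOChannelFactory, so it runs without the SDK. The hypothetical `groupByKey` helper stands in for the GroupByKey transform, and the hypothetical `writeGroup` method plays the role of the DoFn's per-element logic: it treats the key as the file name and writes each value from the Iterable<String> on its own line. The class name `WriteByKey` and the temporary output directory are illustrative only.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WriteByKey {

    // Hypothetical stand-in for GroupByKey: collects <file name, value> pairs
    // into file name -> values, keeping first-seen order.
    static Map<String, List<String>> groupByKey(List<Map.Entry<String, String>> input) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> kv : input) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Hypothetical stand-in for the DoFn body: the key names the file and every
    // value in the group is written to it, one per line. java.nio is used here
    // in place of IOChannelFactory (an assumption, local filesystem only).
    static void writeGroup(Path outputDir, String fileName, Iterable<String> values) throws IOException {
        Path target = outputDir.resolve(fileName);
        // Default options create the file or truncate an existing one.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String value : values) {
                out.write(value);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The input from the question.
        List<Map.Entry<String, String>> input = List.of(
                Map.entry("file1", "value1"),
                Map.entry("file2", "value2"),
                Map.entry("file1", "value3"));

        Path outputDir = Files.createTempDirectory("write-by-key");
        for (Map.Entry<String, List<String>> group : groupByKey(input).entrySet()) {
            writeGroup(outputDir, group.getKey(), group.getValue());
        }

        System.out.println(Files.readAllLines(outputDir.resolve("file1")));
        System.out.println(Files.readAllLines(outputDir.resolve("file2")));
    }
}
```

Running `main` prints `[value1, value3]` and then `[value2]`, matching the two files in the question. Assuming each key arrives in a single group after GroupByKey (as in a batch job in the global window), writing the whole group in one call that truncates the file means a retried element overwrites its file rather than appending duplicate lines. In an actual pipeline the file would be opened through IOChannelFactory rather than `java.nio`.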
See a similar question and another one.
We plan to address this in https://issues.apache.org/jira/browse/BEAM-92, but there is no concrete timeline yet.