Selecting elements in processElement() - Apache Beam

Date: 2022-11-06 15:40:07

I know that when we implement a ParDo transform, we pick up individual elements from our data (basically separated by "\n"). But what if I have an element that spans two lines in my file? Can I apply my own condition to pick out elements? Or does each element always have to be on a single line?

1 answer

#1


Reading text files is controlled by TextIO, not by ParDo; I assume that is what you meant. Right now TextIO does split files into one element per line, but there is work in progress to change that. You can follow it at https://issues.apache.org/jira/browse/BEAM-2802.

It would help that work if you could tell us more about your file format, so we can make sure it is in scope.
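
In the meantime, one possible workaround is to get the file contents as a single string and group the lines into records yourself, for example inside your own processing step. The snippet below is only a sketch of that grouping logic in plain Python, with no Beam dependency. The function name `split_records` and the parameter `lines_per_record` are made up for illustration, and it assumes every record occupies a fixed number of consecutive non-empty lines, which may not match your format.

```python
from typing import Iterator, List


def split_records(text: str, lines_per_record: int = 2) -> Iterator[List[str]]:
    """Yield records made of `lines_per_record` consecutive non-empty lines.

    Hypothetical helper: assumes a fixed number of lines per record.
    A trailing incomplete record is yielded as-is, not dropped.
    """
    if lines_per_record < 1:
        raise ValueError("lines_per_record must be at least 1")
    # Drop blank lines so they do not shift the record boundaries.
    lines = [line for line in text.splitlines() if line.strip()]
    for start in range(0, len(lines), lines_per_record):
        yield lines[start:start + lines_per_record]


if __name__ == "__main__":
    sample = "alice\n30\nbob\n25\n"
    for record in split_records(sample):
        print(record)
```

If your records are delimited by a marker line rather than a fixed line count, the same idea applies, but the loop would need to start a new record whenever it sees that marker.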
