I know that when we implement a ParDo transform, we process individual elements of our data (basically, lines separated by "\n"). But what if I have an element that spans two lines in my file? Can I apply my own condition to decide how elements are picked? Or must each element always fit on a single line?
1 solution
#1
Reading of text files is controlled by TextIO, not by ParDo; I suppose that's what you meant. Right now TextIO does split files into one element per line, but there is work in progress on changing that. You can follow the work at https://issues.apache.org/jira/browse/BEAM-2802.
It would help that work if you said more about your file format, so that we can make sure it is in scope.
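To make "my own condition" concrete, here is a minimal sketch in plain Python of grouping lines into multi-line records. It is not Beam API: the function name `group_lines_into_records`, the `starts_record` predicate, and the "ID:" file format are all made up for illustration. It also assumes the lines of one file are available together and in order (for example, the whole file's contents held as a single string), because the elements of a PCollection carry no ordering that would let you stitch neighbouring lines back together.

```python
from typing import Callable, Iterable, Iterator, List


def group_lines_into_records(
    lines: Iterable[str],
    starts_record: Callable[[str], bool],
) -> Iterator[str]:
    """Yield multi-line records from ordered lines.

    A new record begins at every line for which starts_record(line) is true.
    Any lines before the first such line are emitted as a record of their own.
    """
    current: List[str] = []
    for line in lines:
        # Close the record in progress when the next record's first line appears.
        if starts_record(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    # Emit whatever is left once the input is exhausted.
    if current:
        yield "\n".join(current)


# Hypothetical format: each record starts with a line beginning with "ID:".
sample = "ID: 1\nname: alice\nID: 2\nname: bob"
records = list(
    group_lines_into_records(sample.splitlines(), lambda line: line.startswith("ID:"))
)
```

The predicate is the only part that depends on the file format, so a different rule (a blank line, a closing brace, a fixed number of lines) would only need a different `starts_record` or a small variation of the loop.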