I want to skip leading rows when reading files with Google Cloud Dataflow. Is that feature available in the latest version? The files are stored in Google Cloud Storage, and I will be writing them to BigQuery.
The bq load command has a --skip_leading_rows option, which skips the leading rows when reading from the files.
I want a similar feature in Google Cloud Dataflow. My input is in the following format.
I want Dataflow to ignore the first line and write only the remaining lines to BigQuery.
1 solution
#1
2
This feature is not directly supported in Dataflow or its ParDo transforms.
You need to use a Filter with a predicate (Filter.byPredicate(), or Filter.by() as in the example below) to achieve this.
e.g.
PCollection<X> rows = ...;
PCollection<X> nonHeaders =
    rows.apply(Filter.by(new MatchIfNonHeader()));
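The answer does not show MatchIfNonHeader, so here is a minimal sketch of what such a predicate might look like, assuming the rows are plain text lines and the header text is known in advance. The constructor argument and the sample header "id,name,value" are hypothetical, since the question's input format is not shown. The class is written against plain Java so it compiles on its own; in a pipeline it would also need to implement the SDK's SerializableFunction<String, Boolean> so that Filter.by can accept it.

```java
import java.io.Serializable;

// Hypothetical predicate: keeps every line that is not the header row.
// Assumption: in a real pipeline this class would additionally implement
// the SDK's SerializableFunction<String, Boolean> to be usable with Filter.by(...).
class MatchIfNonHeader implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumed header text, e.g. "id,name,value"; replace with the real first line.
    private final String header;

    MatchIfNonHeader(String header) {
        this.header = header.trim();
    }

    // Returns true for lines to keep and false for the header line.
    public Boolean apply(String line) {
        return !header.equals(line.trim());
    }
}
```

Because the elements of a PCollection carry no ordering, the predicate matches the header by its content rather than by position. It therefore drops every line equal to the header, which also covers the case where several input files each start with the same header, but it would also drop a data row that happens to be identical to the header.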