I have a pipeline that is loading a CSV file from GCS into BQ. The details are here: Import CSV file from GCS to BigQuery.
I'm splitting the CSV in a ParDo into a TableRow where some of the fields are empty.
String inputLine = c.element();
String[] split = inputLine.split(",");
TableRow output = new TableRow();
output.set("Event_Time", split[0]);
output.set("Name", split[1]);
...
c.output(output);
My question is, how can I have the empty fields show up as a null in BigQuery? Currently they are coming through as empty fields.
1 Answer
#1
2
It's turning up in BigQuery as an empty String because split() returns an empty String for ,, and not null in the array.
Two options:
- Check for an empty String in your result array and don't set the field in output.
- Check for an empty String in your result array and explicitly set null for the field in output.
Either way will result in null for BigQuery.
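Here is a minimal sketch of both options. To keep it runnable without the Dataflow SDK, a plain LinkedHashMap stands in for TableRow (an assumption; in the real DoFn you would call output.set(...) instead of row.put(...)). The class and method names are made up for illustration, and only the two column names from the question are used. It splits with a limit of -1 so a trailing empty field is still present in the array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EmptyFieldsToNull {
    // Column names taken from the question; add the rest of your schema here.
    static final String[] COLUMNS = {"Event_Time", "Name"};

    // Option 1: skip empty fields, so the key is simply absent from the row.
    static Map<String, Object> skipEmpty(String line) {
        String[] split = line.split(",", -1);
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length && i < split.length; i++) {
            if (!split[i].isEmpty()) {
                row.put(COLUMNS[i], split[i]);
            }
        }
        return row;
    }

    // Option 2: always set the field, using null when the value is empty.
    static Map<String, Object> nullForEmpty(String line) {
        String[] split = line.split(",", -1);
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length && i < split.length; i++) {
            row.put(COLUMNS[i], split[i].isEmpty() ? null : split[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(skipEmpty("2016-01-01,"));    // {Event_Time=2016-01-01}
        System.out.println(nullForEmpty("2016-01-01,")); // {Event_Time=2016-01-01, Name=null}
    }
}
```

Option 1 keeps the row smaller; option 2 makes it obvious in the code that the column is intentionally null.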
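Note: be careful splitting Strings in Java like this. By default, split() drops trailing empty strings from the result (leading ones are kept), so a line that ends in empty fields yields a shorter array than you expect. Use split(",", -1) instead. See here.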
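A quick way to see the difference is to split the same line both ways. This is just a sketch; the class and method names are made up for illustration, and the sample line is invented.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Default behaviour (limit 0): trailing empty strings are discarded.
    static String[] defaultSplit(String line) {
        return line.split(",");
    }

    // Negative limit: every field is kept, including trailing empties.
    static String[] keepTrailing(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = ",a,,b,,";
        System.out.println(Arrays.toString(defaultSplit(line))); // [, a, , b]
        System.out.println(Arrays.toString(keepTrailing(line))); // [, a, , b, , ]
    }
}
```

With the default call the array has 4 elements, so indexing the last two columns would throw ArrayIndexOutOfBoundsException; with -1 it has all 6.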
BTW: unless you're doing some complex/advanced transformations in Dataflow, you don't have to use a pipeline to load your CSV files. You could just load them into BigQuery or have BigQuery read them directly from GCS.