CSV to BQ: empty fields instead of null

Time: 2022-05-24 11:49:47

I have a pipeline that is loading a CSV file from GCS into BQ. The details are here: Import CSV file from GCS to BigQuery.

I'm splitting the CSV in a ParDo into a TableRow where some of the fields are empty.

String inputLine = c.element();

// split() takes a regex String, so the delimiter needs double quotes, not a char literal.
String[] split = inputLine.split(",");

TableRow output = new TableRow();
output.set("Event_Time", split[0]);
output.set("Name", split[1]);
...
c.output(output);

My question is: how can I make the empty fields show up as null in BigQuery? Currently they come through as empty strings.

1 solution

#1


2  

The field turns up in BigQuery as an empty String because split() returns an empty String, not null, for the gap between two consecutive commas (,,).

Two options:

  1. Check for an empty String in your result array and don't set that field in output.
  2. Check for an empty String in your result array and explicitly set the field to null in output.

Either way, the field ends up as null in BigQuery.
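
The two options could look roughly like the sketch below. To keep it runnable without the Dataflow/Beam dependencies, a plain LinkedHashMap stands in for TableRow (in the real ParDo, row.put(...) would be output.set(...)). The class name, method names, and the COLUMNS array are made up for illustration; only Event_Time and Name come from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EmptyFieldsToNull {
    // Hypothetical column list; the question only shows these first two names.
    static final String[] COLUMNS = {"Event_Time", "Name"};

    // Option 1: skip empty fields, so the key is never set on the row.
    static Map<String, Object> skipEmpty(String line) {
        String[] split = line.split(",", -1); // -1 keeps trailing empty fields
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length && i < split.length; i++) {
            if (!split[i].isEmpty()) {
                row.put(COLUMNS[i], split[i]);
            }
        }
        return row;
    }

    // Option 2: set every field, replacing empty strings with an explicit null.
    static Map<String, Object> explicitNull(String line) {
        String[] split = line.split(",", -1);
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length && i < split.length; i++) {
            row.put(COLUMNS[i], split[i].isEmpty() ? null : split[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String line = "2022-05-24 11:49:47,"; // Name is empty
        System.out.println(skipEmpty(line));    // {Event_Time=2022-05-24 11:49:47}
        System.out.println(explicitNull(line)); // {Event_Time=2022-05-24 11:49:47, Name=null}
    }
}
```

Option 1 leaves the key out of the row entirely, while option 2 keeps the key with a null value; per the answer, both should load as null.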

Note: be careful splitting Strings in Java like this. With no limit argument, split() drops trailing empty strings, so a line ending in empty fields yields a shorter array than expected. Use split(",", -1) instead.
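
A small standalone check makes the difference visible. The class name, the fieldCount helper, and the sample line are invented for the demonstration; only String.split itself is the real API.

```java
class SplitLimitDemo {
    // Number of fields produced with or without the -1 limit.
    static int fieldCount(String line, boolean keepTrailing) {
        return keepTrailing ? line.split(",", -1).length : line.split(",").length;
    }

    public static void main(String[] args) {
        String line = "2022-05-24,,Alice,,"; // five fields, the last two empty

        // Default split: trailing empty strings are discarded.
        System.out.println(fieldCount(line, false)); // 3

        // Negative limit: every field is kept, including trailing empties.
        System.out.println(fieldCount(line, true));  // 5

        // An empty field in the middle or at the start survives either way.
        System.out.println(fieldCount(",Alice", false)); // 2
    }
}
```

Without the -1, indexing a trailing column on a line like the one above would throw ArrayIndexOutOfBoundsException.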

BTW: unless you're doing complex or advanced transformations in Dataflow, you don't need a pipeline to load your CSV files. You can load them into BigQuery directly, or query them in place from GCS.