How BigQuery handles missing fields and unknown/extra fields when importing JSON

Date: 2021-09-30 13:52:16

The schema for my BigQuery table looks like:

A:STRING,B:STRING,C:STRING,D:STRING,E:STRING,F:STRING,createdAt:INTEGER,updatedAt:INTEGER,I:STRING

The file (JSON) in cloud storage has a single item and it looks like:

{
    "A": "AAA",
    "B": "BBB",
    "E": "EEEEE",
    "F": "FFFFF",
    "createdAt": 1364226047214,
    "XXXX": "XXXXXXXXXXX",
    "I": "IIIIII",
    "YYYY": {
        "Y1": 1.99,
        "Y2": "YYYYYY"
    },
    "location": {
        "city": "Amherst",
        "region": "NS",
        "country": "CA"
    },
    "purchaseDate": 1364225968000,
    "updatedAt": 1364226052634
}

I get the following error:

Errors:
Line:1 / Column:173, The field "createdAt" was not found on the current message.
Too many errors encountered. Limit is: 0.

Two questions related to the error above:

  1. How does BigQuery deal with missing fields in data? Aren't all fields by default nullable?

  2. How does BigQuery deal with unknown/extra fields in data?

2 Answers

#1


4  

I have just tested your schema/data using the webUI and got the following error:

Line:1 / Column:84, The field "XXXX" was not found on the current message.
Too many errors encountered. Limit is: 0.
  1. Yes, fields are nullable by default. You need to define the field mode explicitly (nullable, required or repeated) if you want to change it. If a required field is missing from your JSON data you will get an error, but if the same field is nullable the load will work.

  2. As the error on the "XXXX" field shows, the load fails if your data contains extra fields. Your data must match the table schema as described in the documentation, and you can't modify the table schema, which is immutable (you can find some information here if you need to add fields by using another table).

  3. Another thing that could be useful for people dealing with JSON data: each data row must be contained in a single line of your file (as in Google's example file). If your JSON data is pretty-printed across several lines, the import will fail.
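
The three points above can be handled before the upload with a small preprocessing step. What follows is only a rough sketch in plain Python, not part of any BigQuery API: the names `SCHEMA`, `PRETTY` and `to_load_line` are made up for illustration, and the field modes are an assumption based on the schema in the question (everything left at the default, nullable). It rejects a record that lacks a required field, drops fields the schema does not know about, and writes the record as a single line.

```python
import json

# Field names taken from the schema in the question. The modes are an
# assumption: nothing was declared REQUIRED there, so all are NULLABLE.
SCHEMA = {
    "A": "NULLABLE", "B": "NULLABLE", "C": "NULLABLE",
    "D": "NULLABLE", "E": "NULLABLE", "F": "NULLABLE",
    "createdAt": "NULLABLE", "updatedAt": "NULLABLE", "I": "NULLABLE",
}

# The pretty-printed record from the question, spread over several lines.
PRETTY = """
{
    "A": "AAA",
    "B": "BBB",
    "E": "EEEEE",
    "F": "FFFFF",
    "createdAt": 1364226047214,
    "XXXX": "XXXXXXXXXXX",
    "I": "IIIIII",
    "YYYY": {"Y1": 1.99, "Y2": "YYYYYY"},
    "location": {"city": "Amherst", "region": "NS", "country": "CA"},
    "purchaseDate": 1364225968000,
    "updatedAt": 1364226052634
}
"""


def to_load_line(record, schema):
    """Return the record as one JSON line holding only fields in the schema."""
    # A REQUIRED field that is absent (or null) would make the load fail.
    missing = [name for name, mode in schema.items()
               if mode == "REQUIRED" and record.get(name) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Drop unknown/extra fields such as "XXXX" or "location".
    kept = {key: value for key, value in record.items() if key in schema}
    # Without an indent argument json.dumps emits no line breaks.
    return json.dumps(kept, separators=(",", ":"))


line = to_load_line(json.loads(PRETTY), SCHEMA)
print(line)
```

Nullable fields that are absent from the record, such as "C" and "D", are simply left out of the output line. For a file with many records, the same function would be called once per record and each result written on its own line.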

Hope this helps.

#2


0  

I ran into this problem this week. After looking through the code over the last few days, I found that the BQ TableRow did not contain all the elements defined in the BigQuery TableSchema.

Please re-check all the fields you add to the TableRow and make sure each one is represented correctly in the TableSchema.
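
One way to do that re-check is to compare the two sets of field names directly. The snippet below is only an illustrative sketch: it uses a plain Python dict and a list of names as stand-ins for the TableRow and TableSchema objects, and `diff_row_against_schema` is a hypothetical helper, not a function of any BigQuery or Dataflow library. The sample values come from the question above.

```python
def diff_row_against_schema(row, schema_fields):
    """Return (fields in the schema but not the row, fields in the row but not the schema)."""
    row_keys = set(row)
    schema_keys = set(schema_fields)
    absent = sorted(schema_keys - row_keys)
    unknown = sorted(row_keys - schema_keys)
    return absent, unknown


# Field names from the schema in the question.
schema_fields = ["A", "B", "C", "D", "E", "F", "createdAt", "updatedAt", "I"]

# The record from the question, as a plain dict.
row = {
    "A": "AAA", "B": "BBB", "E": "EEEEE", "F": "FFFFF",
    "createdAt": 1364226047214, "XXXX": "XXXXXXXXXXX", "I": "IIIIII",
    "YYYY": {"Y1": 1.99, "Y2": "YYYYYY"},
    "location": {"city": "Amherst", "region": "NS", "country": "CA"},
    "purchaseDate": 1364225968000, "updatedAt": 1364226052634,
}

absent, unknown = diff_row_against_schema(row, schema_fields)
print("in schema but not in row:", absent)
print("in row but not in schema:", unknown)
```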

Good luck!
