BigQuery - Flexible schema in Record field

Time: 2022-05-03 15:39:58

I have a BigQuery schema in which a Record field is JSON-like. However, the keys in the JSON are dynamic: new keys may appear with new data, and it is hard to know how many keys there are in total. As far as I understand, BigQuery cannot be used for such a table, because the schema of a Record field must be defined explicitly; otherwise it throws an error.

The only alternative I can see is to store the JSON as a text field and use the JSON_EXTRACT function at query time to parse it. Is there any other way to have dynamic nested schemas in a BigQuery table?

1 solution

#1


7  

You can create a fixed schema for the common fields and set them as nullable. A column of type string can then store the rest of the JSON, and you can use the JSON functions to query it.
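As a rough illustration of that split, here is a minimal Python sketch that separates an incoming record into fixed columns and a JSON string for everything else. The column names in `FIXED_COLUMNS` and the `meta` column name are assumptions made for the example, not something BigQuery prescribes.

```python
import json

# Hypothetical fixed (nullable) columns; every other key goes into `meta`.
FIXED_COLUMNS = ("user_id", "event", "created_at")


def split_record(record):
    """Return a row with the fixed columns plus a JSON string of the remaining keys."""
    # Missing fixed fields become None, which maps to NULL in a nullable column.
    row = {name: record.get(name) for name in FIXED_COLUMNS}
    extras = {k: v for k, v in record.items() if k not in FIXED_COLUMNS}
    # Store the leftover keys as a single JSON string, or NULL if there are none.
    row["meta"] = json.dumps(extras, sort_keys=True) if extras else None
    return row
```

For example, `split_record({"user_id": 1, "event": "click", "color": "red"})` keeps `user_id` and `event` as columns, leaves `created_at` as `None`, and puts `color` into the `meta` string.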

We always keep a meta column in our tables, which holds additional raw unstructured data as a JSON object.

Please note that you can currently store up to 2 megabytes in a string column, which is plenty for a typical JSON document.

To make the data easier to work with, you can create views from the queries that use JSON_EXTRACT, and then reference the view in other, simpler queries.
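Since the set of keys changes over time, one option is to generate the view definition from a list of known keys. The sketch below builds such a statement as a string; the table and view names, the `meta` column, and the helper `build_view_sql` are all hypothetical, and it uses JSON_EXTRACT_SCALAR so that extracted values come back as plain strings rather than JSON-quoted text.

```python
import re

# Only allow simple identifier-like keys, since they are used as column aliases.
_KEY_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_view_sql(view, table, fixed_columns, meta_keys, meta_column="meta"):
    """Build a CREATE VIEW statement exposing selected JSON keys as columns."""
    for key in meta_keys:
        if not _KEY_RE.fullmatch(key):
            raise ValueError(f"unsupported key name: {key!r}")
    selects = list(fixed_columns)
    selects += [
        f"JSON_EXTRACT_SCALAR({meta_column}, '$.{key}') AS {key}"
        for key in meta_keys
    ]
    body = ",\n  ".join(selects)
    return f"CREATE OR REPLACE VIEW `{view}` AS\nSELECT\n  {body}\nFROM `{table}`"


if __name__ == "__main__":
    # Hypothetical dataset, table, and key names.
    print(build_view_sql("ds.events_flat", "ds.events", ["user_id"], ["color", "size"]))
```

When a new key shows up in the data, the view can be regenerated with the extended key list while the underlying table schema stays unchanged.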

Alternatively, at the streaming insert phase, your app could denormalize the JSON into proper tables.
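One possible reading of that suggestion is to flatten the dynamic part of each document into key/value rows for a separate table with a fixed schema. The following Python sketch shows that idea only; the column names `record_id`, `key`, and `value` and the dotted-path convention are assumptions for illustration, and the actual insert call is left out.

```python
import json


def to_attribute_rows(record_id, payload, prefix=""):
    """Flatten a nested dict into one row per leaf value."""
    rows = []
    for key, value in payload.items():
        # Nested keys are joined with dots, e.g. {"b": {"c": 1}} -> "b.c".
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            rows.extend(to_attribute_rows(record_id, value, path))
        else:
            # Values are JSON-encoded so every row fits one string column.
            rows.append({"record_id": record_id, "key": path, "value": json.dumps(value)})
    return rows
```

The resulting rows all share the same three columns, so new keys in the source JSON only add rows and never require a schema change.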
