I am extracting JSON data out of a BigQuery column using JSON_EXTRACT. Now I want to extract lists of values and run aggregate functions (like AVG) against them. Testing the JsonPath expression $.objects[*].v succeeds on http://jsonpath.curiousconcept.com/, but the query:
SELECT
JSON_EXTRACT(json_column, "$.id") as id,
AVG(JSON_EXTRACT(json_column, "$.objects[*].v")) as average_value
FROM [tablename]
throws a JsonPath parse error on BigQuery. Is this possible on BigQuery, or do I need to preprocess my data before I can run aggregate functions against data inside my JSON?
My data looks similar to this:
# Record 1
{
"id": "abc",
"objects": [
{
"id": 1,
"v": 1
},
{
"id": 2,
"v": 3
}
]
}
# Record 2
{
"id": "def",
"objects": [
{
"id": 1,
"v": 2
},
{
"id": 2,
"v": 5
}
]
}
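As a point of reference, here is a minimal Python sketch (not BigQuery) of the result the query is meant to produce for the two sample records. The JsonPath $.objects[*].v selects every "v" inside the "objects" array, and AVG averages those values per record. The record strings and the helper names select_objects_v and average_v_per_record are illustrative assumptions, not part of BigQuery.

```python
import json

# The two sample records from the question, as JSON strings
# (the "# Record N" labels are not part of the JSON itself).
RECORDS = [
    '{"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]}',
    '{"id": "def", "objects": [{"id": 1, "v": 2}, {"id": 2, "v": 5}]}',
]


def select_objects_v(doc):
    """Mimic the JsonPath $.objects[*].v: every "v" under "objects"."""
    return [obj["v"] for obj in doc["objects"] if "v" in obj]


def average_v_per_record(records):
    """Return {id: average of $.objects[*].v} for each JSON record."""
    result = {}
    for text in records:
        doc = json.loads(text)
        values = select_objects_v(doc)
        # Like SQL AVG over no rows, report None when there is nothing to average.
        result[doc["id"]] = sum(values) / len(values) if values else None
    return result


print(average_v_per_record(RECORDS))  # {'abc': 2.0, 'def': 3.5}
```

The expected averages (2.0 for "abc", 3.5 for "def") are a handy check for whichever BigQuery query ends up being used.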
This is related to another question.
Update: The problem can be simplified by running two queries. First, run JSON_EXTRACT and save the results into a view. Second, run the aggregate function against this view. But even then, I need to correct the JsonPath expression $.objects[*].v to prevent the JsonPath parse error.
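The two-step idea only helps if the intermediate result holds one row per array element rather than one JSON list per record, because AVG aggregates across rows. A minimal Python sketch of that shape, using hypothetical helpers flatten (step 1, standing in for the view) and average_by_id (step 2, standing in for AVG ... GROUP BY id):

```python
import json
from collections import defaultdict

# Same sample records as in the question.
RECORDS = [
    '{"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]}',
    '{"id": "def", "objects": [{"id": 1, "v": 2}, {"id": 2, "v": 5}]}',
]


def flatten(records):
    """Step 1: emit one (id, v) row per element of "objects"."""
    rows = []
    for text in records:
        doc = json.loads(text)
        for obj in doc["objects"]:
            rows.append((doc["id"], obj["v"]))
    return rows


def average_by_id(rows):
    """Step 2: the equivalent of AVG(v) ... GROUP BY id over the flat rows."""
    totals = defaultdict(lambda: [0, 0])  # id -> [sum, count]
    for record_id, v in rows:
        totals[record_id][0] += v
        totals[record_id][1] += 1
    return {record_id: s / n for record_id, (s, n) in totals.items()}


rows = flatten(RECORDS)
print(rows)                 # [('abc', 1), ('abc', 3), ('def', 2), ('def', 5)]
print(average_by_id(rows))  # {'abc': 2.0, 'def': 3.5}
```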
Answer:
Use SPLIT() to pivot the repeated fields into separate rows. It may also be easier and cleaner to do this in a subquery and apply AVG outside it:
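SELECT id, AVG(v) as average
FROM (
  SELECT
    JSON_EXTRACT(json_column, "$.id") as id,
    -- SPLIT breaks the "objects" array text into one piece per element
    -- (a repeated field, so each piece becomes its own row); REGEXP_EXTRACT
    -- then pulls the number after "v": out of each piece.
    INTEGER(
      REGEXP_EXTRACT(
        SPLIT(
          JSON_EXTRACT(json_column, "$.objects"),
          "},{"
        ),
        r'"v":([^,}]+)')) as v
  FROM [mytable]
)
GROUP BY id;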
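The SPLIT/REGEXP_EXTRACT trick can be traced outside BigQuery with a rough Python analogue. It assumes that JSON_EXTRACT returns compact JSON with no spaces and that SPLIT matches the "},{" delimiter literally; both are assumptions here, not verified BigQuery behavior. It also shows why the pattern should not require a comma after the value: in the sample data "v" is the last key of each object, so its value is followed by } or by the end of the piece.

```python
import json
import re

# Assumption: JSON_EXTRACT(json_column, "$.objects") yields compact JSON,
# which json.dumps reproduces with these separators.
doc = {"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]}
objects_text = json.dumps(doc["objects"], separators=(",", ":"))
# objects_text == '[{"id":1,"v":1},{"id":2,"v":3}]'

# Analogue of SPLIT(..., "},{"), assuming the delimiter is matched literally.
pieces = objects_text.split("},{")
# pieces == ['[{"id":1,"v":1', '"id":2,"v":3}]']


def extract_v(piece, pattern):
    """REGEXP_EXTRACT analogue: first capture group as int, or None if no match."""
    m = re.search(pattern, piece)
    return int(m.group(1)) if m else None


# Requires a comma after the value, so it misses "v" when it is the last key.
with_trailing_comma = r'"v":([^,]+),'
# Stops at a comma or a closing brace instead.
comma_or_brace = r'"v":([^,}]+)'

print([extract_v(p, with_trailing_comma) for p in pieces])  # [None, None]
print([extract_v(p, comma_or_brace) for p in pieces])       # [1, 3]
```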