How do I apply aggregate functions to data extracted from JSON in Google BigQuery?

Time: 2020-12-22 15:52:20

I am extracting JSON data out of a BigQuery column using JSON_EXTRACT. Now I want to extract lists of values and run aggregate functions (like AVG) against them. Testing the JsonPath expression .objects[*].v succeeds on http://jsonpath.curiousconcept.com/. But the query:

SELECT
  JSON_EXTRACT(json_column, "$.id") as id,
  AVG(JSON_EXTRACT(json_column, "$.objects[*].v")) as average_value
FROM [tablename]

throws a JsonPath parse error on BigQuery. Is this possible on BigQuery? Or do I need to preprocess my data in order to run aggregate functions against data inside of my JSON?

My data looks similar to this:

# Record 1
{
  "id": "abc",
  "objects": [
    {
      "id": 1,
      "v": 1
    },
    {
      "id": 2,
      "v": 3
    }
  ]
}
# Record 2
{
  "id": "def",
  "objects": [
    {
      "id": 1,
      "v": 2
    },
    {
      "id": 2,
      "v": 5
    }
  ]
}

This is related to another question.

Update: The problem can be simplified by running two queries. First, run JSON_EXTRACT and save the results into a view. Secondly, run the aggregate function against this view. But even then I need to correct the JsonPath expression $.objects[*].v to prevent the JSONPath parse error.

1 solution

#1


6  

Use SPLIT() to turn the repeated field into separate rows, one per object. It may also be cleaner to do the extraction in a subquery and apply AVG in the outer query:

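SELECT id, AVG(v) AS average
FROM (
  SELECT
    JSON_EXTRACT(json_column, "$.id") AS id,
    -- SPLIT() breaks the objects array into one string per object
    -- (this assumes JSON_EXTRACT returns the array without whitespace),
    -- and REGEXP_EXTRACT() pulls the value of "v" out of each piece.
    -- No trailing comma is required, so "v" may be the last key.
    INTEGER(
      REGEXP_EXTRACT(
        SPLIT(JSON_EXTRACT(json_column, "$.objects"), "},{"),
        r'"v":(-?\d+)'
      )
    ) AS v
  FROM [mytable]
)
GROUP BY id;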
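
To sanity-check the split-and-regex idea before running it on BigQuery, the same steps can be replayed locally on the two sample records from the question. This is only a rough Python sketch, not BigQuery code: it assumes JSON_EXTRACT returns the objects array in compact form (no whitespace), and the names records, V_PATTERN and average_v are made up for illustration. The pattern here does not require a comma after the value, because in the sample records "v" is the last key of each object.

```python
import json
import re

# Sample records copied from the question.
records = [
    {"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]},
    {"id": "def", "objects": [{"id": 1, "v": 2}, {"id": 2, "v": 5}]},
]

# Matches an integer value of "v" whether or not another key follows it.
V_PATTERN = re.compile(r'"v":(-?\d+)')


def average_v(record):
    """Average the "v" values of one record using split + regex."""
    # Assumption: compact JSON stands in for the string that
    # JSON_EXTRACT(json_column, "$.objects") would return.
    objects_json = json.dumps(record["objects"], separators=(",", ":"))
    values = []
    # Same separator as the SPLIT() call: one piece per object.
    for piece in objects_json.split("},{"):
        match = V_PATTERN.search(piece)
        if match:
            values.append(int(match.group(1)))
    # Return None when nothing matched, like AVG over no rows.
    return sum(values) / len(values) if values else None


averages = {record["id"]: average_v(record) for record in records}
print(averages)
```

If the averages come out as expected for the sample data (2.0 for abc and 3.5 for def), the same separator and pattern are reasonable candidates to try in the query.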
