I'm after an efficient(ish) BigQuery SQL query to address the following:
I've got a table that looks like so:
Row | Col_A | Col_B
----+-------+------
1   | 2     | 3
2   | 1     | 4
3   | 5     | 7
4   | 2     | 3
5   | 6     | 1
...and so on (>million rows)
The value of each column is an ID with the range [1..7].
The query should produce the following, i.e. count how many times each code occurs in each column:
Code | Total Col_A | Total Col_B
-----+-------------+------------
1    | 1           | 1
2    | 2           | 0
3    | 0           | 2
4    | 0           | 1
5    | 1           | 0
6    | 1           | 0
7    | 0           | 1
Anyone know of a way of doing this in BigQuery without using multiple SELECTs?
Cheers.
2 answers
#1
2
Can you create a public dataset with your sample data? That would make it much easier to write a query that works on your data and to validate the results.
A starting query:
SELECT Code, COUNT(Col_A) count_column_x, COUNT(Col_B) count_column_y
FROM [your:list.of_codes] a
LEFT JOIN EACH [your:sample.table] b
ON a.Code=b.Col_A
GROUP BY 1
(It's not perfect; I can take it further if you share a table to work with.)
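As a rough sketch of how the join idea could be taken further in Standard SQL: the starting query joins only on Col_A, so its second count never looks at matches in Col_B. One option is to stack both columns into a single (src, Code) list first and then LEFT JOIN the list of codes against it. The `codes` and `stacked` names and the inline `logs` sample are assumptions made for illustration, not tables from the question.

```sql
#standardSQL
-- Assumption: `logs` stands in for the real table, `codes` for the list of codes.
WITH logs AS (
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 1 AS Col_A, 4 AS Col_B UNION ALL
  SELECT 5 AS Col_A, 7 AS Col_B UNION ALL
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 6 AS Col_A, 1 AS Col_B
),
codes AS (
  -- One row per possible code in the range [1..7]
  SELECT Code FROM UNNEST(GENERATE_ARRAY(1, 7)) AS Code
),
stacked AS (
  -- Stack both columns, remembering which column each value came from
  SELECT 'A' AS src, Col_A AS Code FROM logs
  UNION ALL
  SELECT 'B' AS src, Col_B AS Code FROM logs
)
SELECT
  c.Code,
  COUNTIF(s.src = 'A') AS Total_Col_A,
  COUNTIF(s.src = 'B') AS Total_Col_B
FROM codes AS c
LEFT JOIN stacked AS s
  ON c.Code = s.Code
GROUP BY c.Code
ORDER BY c.Code
```

Because the join starts from `codes`, a code that never occurs in either column still gets a row with zeros. Note that this version does use more than one SELECT.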
#2
1
Anyone know of a way of doing this in BigQuery without using multiple SELECTs?
One SELECT with Standard SQL
#standardSQL
WITH logs AS (
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 1 AS Col_A, 4 AS Col_B UNION ALL
  SELECT 5 AS Col_A, 7 AS Col_B UNION ALL
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 6 AS Col_A, 1 AS Col_B
)
SELECT
  id,
  SUM(CAST(id = Col_A AS INT64)) AS Total_Col_A,
  SUM(CAST(id = Col_B AS INT64)) AS Total_Col_B
FROM logs, UNNEST(GENERATE_ARRAY(1, 7)) AS id
GROUP BY id
ORDER BY id
Or with COUNTIF()
#standardSQL
WITH logs AS (
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 1 AS Col_A, 4 AS Col_B UNION ALL
  SELECT 5 AS Col_A, 7 AS Col_B UNION ALL
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 6 AS Col_A, 1 AS Col_B
)
SELECT
  id,
  COUNTIF(id = Col_A) AS Total_Col_A,
  COUNTIF(id = Col_B) AS Total_Col_B
FROM logs, UNNEST(GENERATE_ARRAY(1, 7)) AS id
GROUP BY id
ORDER BY id
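A possible variation, sketched under the assumption that every code you care about occurs at least once in the data: instead of cross joining each row with all 7 ids, unpivot the two columns with an inline array of structs, so each input row is expanded twice rather than 7 times. The `v`, `col` and `code` names are hypothetical, chosen only for this example.

```sql
#standardSQL
WITH logs AS (
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 1 AS Col_A, 4 AS Col_B UNION ALL
  SELECT 5 AS Col_A, 7 AS Col_B UNION ALL
  SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
  SELECT 6 AS Col_A, 1 AS Col_B
)
SELECT
  v.code AS id,
  -- Count how often this code came from each source column
  COUNTIF(v.col = 'A') AS Total_Col_A,
  COUNTIF(v.col = 'B') AS Total_Col_B
FROM logs,
  -- Turn each row into two (col, code) pairs, one per source column
  UNNEST([STRUCT('A' AS col, Col_A AS code),
          STRUCT('B' AS col, Col_B AS code)]) AS v
GROUP BY id
ORDER BY id
```

The trade-off is that a code which appears in neither column produces no row at all, whereas the GENERATE_ARRAY versions always return all 7 ids.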