I have a BigQuery database with multiple tables:
table1
id,timestamp,data
1,1428969600,AAAAA
2,1428969600,CCCCC
[..]
20,1428969600,ZZZZZ
table2
id,timestamp,data
1,1429056000,AAAAA
2,1429056000,BBBBB
3,1429056000,CCCCC
[..]
20,1429056000,ZZZZZ
table3
id,timestamp,data
1,1429142400,AAAAA
2,1429142400,BBBBB
3,1429142400,CCCCC
[..]
20,1429142400,ZZZZZ
I want to query all three tables (table1, table2 and table3) to find when each value in the "data" field first and last appeared, and return the associated "timestamp" values.
This should be the result:
id,timestamp_first,timestamp_last,data
1,1428969600,1429142400,AAAAA
2,1429056000,1429142400,BBBBB
3,1428969600,1429142400,CCCCC
[..]
20,1428969600,1429142400,ZZZZZ
Can someone give me some tips on how to write a query like this?
Martin
1 solution
#1
I would first union the tables (in BigQuery legacy SQL, a comma between tables in the FROM clause means UNION ALL). Then there are two approaches:
- Use analytic functions FIRST_VALUE and LAST_VALUE.
SELECT id, timestamp_first, timestamp_last, data
FROM (
  SELECT id, timestamp, data,
    FIRST_VALUE(timestamp) OVER (PARTITION BY id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS timestamp_first,
    LAST_VALUE(timestamp) OVER (PARTITION BY id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS timestamp_last
  FROM table1, table2, table3
)
- Use aggregation MIN/MAX on timestamp to find first/last and then join back to the same tables.
SELECT a.id id, timestamp_first, timestamp_last, data
FROM (SELECT id, data FROM table1, table2, table3) a
INNER JOIN (
  SELECT id, MIN(timestamp) timestamp_first, MAX(timestamp) timestamp_last
  FROM table1, table2, table3
  GROUP BY id
) b
ON a.id = b.id
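For anyone who wants to try the union-then-MIN/MAX idea locally, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. That substitution is an assumption made purely for illustration; nothing here calls the BigQuery API. SQLite has no comma-union, so the tables are combined with UNION ALL. The sketch groups by data rather than by id, because in the question's expected output CCCCC is traced back to table1, where it carries a different id; that reading of the question is also an assumption. The helper names build_sample_db and first_last_by_data are hypothetical, and only the rows actually visible in the question are loaded.

```python
import sqlite3

# Only the rows shown in the question; the elided "[..]" rows are left out.
SAMPLE_ROWS = {
    "table1": [(1, 1428969600, "AAAAA"), (2, 1428969600, "CCCCC"),
               (20, 1428969600, "ZZZZZ")],
    "table2": [(1, 1429056000, "AAAAA"), (2, 1429056000, "BBBBB"),
               (3, 1429056000, "CCCCC"), (20, 1429056000, "ZZZZZ")],
    "table3": [(1, 1429142400, "AAAAA"), (2, 1429142400, "BBBBB"),
               (3, 1429142400, "CCCCC"), (20, 1429142400, "ZZZZZ")],
}

# Union all three tables, then aggregate once per distinct data value.
QUERY = """
SELECT data,
       MIN(timestamp) AS timestamp_first,
       MAX(timestamp) AS timestamp_last
FROM (
    SELECT id, timestamp, data FROM table1
    UNION ALL
    SELECT id, timestamp, data FROM table2
    UNION ALL
    SELECT id, timestamp, data FROM table3
) AS all_rows
GROUP BY data
ORDER BY data
"""


def build_sample_db():
    """Create an in-memory SQLite database holding the sample tables."""
    conn = sqlite3.connect(":memory:")
    for name, rows in SAMPLE_ROWS.items():
        conn.execute(
            f"CREATE TABLE {name} (id INTEGER, timestamp INTEGER, data TEXT)"
        )
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    return conn


def first_last_by_data(conn):
    """Return (data, timestamp_first, timestamp_last) tuples, sorted by data."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in first_last_by_data(build_sample_db()):
        print(row)
```

Grouping in a single pass avoids the join back to the source tables, so each data value comes out exactly once. If you really need the result keyed by id as in the queries above, swap data for id in the SELECT and GROUP BY.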