BigQuery: Searching multiple tables and aggregating with first_seen and last_seen

I have a BigQuery database with multiple tables:

table1
    id,timestamp,data
    1,1428969600,AAAAA
    2,1428969600,CCCCC
    [..]
    20,1428969600,ZZZZZ

table2
    id,timestamp,data
    1,1429056000,AAAAA
    2,1429056000,BBBBB
    3,1429056000,CCCCC
    [..]
    20,1429056000,ZZZZZ

table3
    id,timestamp,data
    1,1429142400,AAAAA
    2,1429142400,BBBBB
    3,1429142400,CCCCC
    [..]
    20,1429142400,ZZZZZ

I want to run a query over all the tables (table1, table2 and table3) to see when each value in the "data" field first and last appeared, and return the associated "timestamp" values.

This should be the result:

id,timestamp_first,timestamp_last,data
1,1428969600,1429142400,AAAAA
2,1429056000,1429142400,BBBBB
3,1428969600,1429142400,CCCCC
[..]
20,1428969600,1429142400,ZZZZZ

Can someone give me some tips on how to write a query like this?

Martin

1 Solution

#1

I would first union the tables (in BigQuery legacy SQL, a comma between table names in the FROM clause means UNION ALL, not a join). Then there are two approaches:

Use the analytic functions FIRST_VALUE and LAST_VALUE.

SELECT id, timestamp_first, timestamp_last, data FROM
(SELECT 
  id,
  data,  -- must be selected here so the outer query can return it
  timestamp,
  -- the full-partition frame makes both functions see every row for the id
  FIRST_VALUE(timestamp) OVER(
    PARTITION BY id
    ORDER BY timestamp ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  AS timestamp_first,
  LAST_VALUE(timestamp) OVER(
    PARTITION BY id
    ORDER BY timestamp ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  AS timestamp_last
FROM table1, table2, table3)

Use the aggregations MIN/MAX on timestamp to find the first/last values, and then join back to the same tables.

SELECT a.id id, timestamp_first, timestamp_last, data FROM
(SELECT id, data FROM table1,table2,table3) a
INNER JOIN
(SELECT 
   id, 
   MIN(timestamp) timestamp_first,
   MAX(timestamp) timestamp_last 
 FROM table1,table2,table3 GROUP BY id) b
ON a.id = b.id
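
To try the MIN/MAX idea locally, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. It is not BigQuery: SQLite has no comma-union, so the tables are combined with an explicit UNION ALL. The helper name first_last_seen and the TABLES dictionary are made up for illustration, and the rows hidden behind "[..]" are left out. One more assumption: the sketch groups by data rather than id, because in the sample the id of a value is not the same in every table (CCCCC is id 2 in table1 but id 3 in table2).

```python
import sqlite3

# Sample rows copied from the question; the elided "[..]" rows are omitted.
TABLES = {
    "table1": [(1, 1428969600, "AAAAA"), (2, 1428969600, "CCCCC"),
               (20, 1428969600, "ZZZZZ")],
    "table2": [(1, 1429056000, "AAAAA"), (2, 1429056000, "BBBBB"),
               (3, 1429056000, "CCCCC"), (20, 1429056000, "ZZZZZ")],
    "table3": [(1, 1429142400, "AAAAA"), (2, 1429142400, "BBBBB"),
               (3, 1429142400, "CCCCC"), (20, 1429142400, "ZZZZZ")],
}

# UNION ALL stands in for the legacy SQL comma; MIN/MAX give first/last seen.
QUERY = """
SELECT data,
       MIN(timestamp) AS timestamp_first,
       MAX(timestamp) AS timestamp_last
FROM (
    SELECT id, timestamp, data FROM table1
    UNION ALL
    SELECT id, timestamp, data FROM table2
    UNION ALL
    SELECT id, timestamp, data FROM table3
)
GROUP BY data
ORDER BY data
"""


def first_last_seen():
    """Return (data, timestamp_first, timestamp_last) tuples, sorted by data."""
    conn = sqlite3.connect(":memory:")
    for name, rows in TABLES.items():
        conn.execute(
            f"CREATE TABLE {name} (id INTEGER, timestamp INTEGER, data TEXT)"
        )
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_last_seen():
        print(row)
```

Because the grouping is on data, BBBBB (which is missing from table1) gets the table2 timestamp as its first-seen value, as in the expected result from the question.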
