SQL Bigquery将时间戳转换为5分钟的间隔

时间:2022-09-26 15:33:15

In Google bigquery database, Convert minute level timestamp to an interval of 5 minutes. The 5 minutes interval is standard time interval . Below is just an example how i want the data to be presented

在Google bigquery数据库中,将分钟级别时间戳转换为5分钟的间隔。 5分钟的间隔是标准时间间隔。以下是我希望如何呈现数据的示例

test hd_count 2013-12-20 10:40:30 1 2013-12-20 10:41:30 3 2013-12-20 10:42:30 2 2013-12-20 10:43:30 1 2013-12-20 10:44:30 1

test hd_count 2013-12-20 10:40:30 1 2013-12-20 10:41:30 3 2013-12-20 10:42:30 2 2013-12-20 10:43:30 1 2013-12- 20 10:44:30 1

I want this to be represented as

我想把它表示为

test_1 test_2 hd_count 2013-12-20 10:40:30 2013-12-20 10:44:30 8

test_1 test_2 hd_count 2013-12-20 10:40:30 2013-12-20 10:44:30 8

I have looked into similar request in other answer but none of them seem to work for big query.Any help would be appreciated

我在其他答案中调查了类似的请求,但它们似乎都不适用于大查询。任何帮助将不胜感激

2 个解决方案

#1


1  

Below is for BigQuery Standard SQL

以下是BigQuery Standard SQL

#standardSQL
WITH minmax AS (
  SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
  FROM `project.dataset.table`
), intervals AS (
  SELECT 
    TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
    TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
  FROM minmax, 
  UNNEST(GENERATE_ARRAY(0,  DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
)
SELECT test1, test2, SUM(hd_count) AS hd_count
FROM intervals JOIN `project.dataset.table`
ON test BETWEEN test1 AND test2
GROUP BY test1, test2

you can test / play with above using dummy data as below

您可以使用如下虚拟数据进行上述测试/播放

#standardSQL
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2013-12-20 10:40:30' test, 1 hd_count UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:41:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:42:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:43:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:44:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:45:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:46:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:47:30', 1  
), minmax AS (
  SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
  FROM `project.dataset.table`
), intervals AS (
  SELECT 
    TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
    TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
  FROM minmax, 
  UNNEST(GENERATE_ARRAY(0,  DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
)
SELECT test1, test2, SUM(hd_count) AS hd_count
FROM intervals JOIN `project.dataset.table`
ON test BETWEEN test1 AND test2
GROUP BY test1, test2
ORDER BY test1   

output is as below

输出如下

test1                       test2                       hd_count     
2013-12-20 10:40:30 UTC     2013-12-20 10:45:29 UTC     8    
2013-12-20 10:45:30 UTC     2013-12-20 10:50:29 UTC     6    

#2


1  

Here's a standard SQL UDF-based approach that enables arbitrary alignments up to millisecond precision. I use it when I'm working with more granular time intervals:

这是一种基于SQL UDF的标准方法,可以实现高达毫秒精度的任意对齐。我在使用更精细的时间间隔时使用它:

CREATE TEMPORARY FUNCTION bracketTimestampByMillis(ts TIMESTAMP, bracketMillis INT64) RETURNS TIMESTAMP AS (
  TIMESTAMP_MILLIS(CAST(FLOOR(
       (UNIX_MILLIS(ts) -  UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))) / bracketMillis) AS INT64) 
  * bracketMillis + UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))));

To demonstrate, here's another UDF which consumes the first to build an array of timestamps aligned to various intervals:

为了演示,这里是另一个UDF,它使用第一个构建一个与各种间隔对齐的时间戳数组:

CREATE TEMPORARY FUNCTION emitTimeBrackets(ts TIMESTAMP) RETURNS ARRAY<STRUCT<bracket STRING, tsVal TIMESTAMP>> AS (
    [STRUCT("exact" as bracket, ts as tsVal),
     STRUCT("minute", bracketTimestampByMillis(ts, 60 * 1000)),
     STRUCT("5 minute", bracketTimestampByMillis(ts, 5 * 60 * 1000)),
     STRUCT("15 minute", bracketTimestampByMillis(ts, 15 * 60 * 1000)),
     STRUCT("hour", bracketTimestampByMillis(ts, 60 * 60 * 1000)),
     STRUCT("quarter day", bracketTimestampByMillis(ts, 6 * 3600 * 1000))
    ]
);

SELECT emitTimeBrackets(CURRENT_TIMESTAMP()) as b

#1


1  

Below is for BigQuery Standard SQL

以下是BigQuery Standard SQL

#standardSQL
WITH minmax AS (
  SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
  FROM `project.dataset.table`
), intervals AS (
  SELECT 
    TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
    TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
  FROM minmax, 
  UNNEST(GENERATE_ARRAY(0,  DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
)
SELECT test1, test2, SUM(hd_count) AS hd_count
FROM intervals JOIN `project.dataset.table`
ON test BETWEEN test1 AND test2
GROUP BY test1, test2

you can test / play with above using dummy data as below

您可以使用如下虚拟数据进行上述测试/播放

#standardSQL
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2013-12-20 10:40:30' test, 1 hd_count UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:41:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:42:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:43:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:44:30', 1 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:45:30', 3 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:46:30', 2 UNION ALL
  SELECT TIMESTAMP '2013-12-20 10:47:30', 1  
), minmax AS (
  SELECT MIN(test) AS mintest, MAX(test) AS maxtest, 5 AS step
  FROM `project.dataset.table`
), intervals AS (
  SELECT 
    TIMESTAMP_ADD(mintest, INTERVAL step * num MINUTE) AS test1,
    TIMESTAMP_ADD(mintest, INTERVAL step * 60* (1 + num) - 1 SECOND) AS test2
  FROM minmax, 
  UNNEST(GENERATE_ARRAY(0,  DIV(TIMESTAMP_DIFF(maxtest, mintest, MINUTE) , step))) AS num
)
SELECT test1, test2, SUM(hd_count) AS hd_count
FROM intervals JOIN `project.dataset.table`
ON test BETWEEN test1 AND test2
GROUP BY test1, test2
ORDER BY test1   

output is as below

输出如下

test1                       test2                       hd_count     
2013-12-20 10:40:30 UTC     2013-12-20 10:45:29 UTC     8    
2013-12-20 10:45:30 UTC     2013-12-20 10:50:29 UTC     6    

#2


1  

Here's a standard SQL UDF-based approach that enables arbitrary alignments up to millisecond precision. I use it when I'm working with more granular time intervals:

这是一种基于SQL UDF的标准方法,可以实现高达毫秒精度的任意对齐。我在使用更精细的时间间隔时使用它:

CREATE TEMPORARY FUNCTION bracketTimestampByMillis(ts TIMESTAMP, bracketMillis INT64) RETURNS TIMESTAMP AS (
  TIMESTAMP_MILLIS(CAST(FLOOR(
       (UNIX_MILLIS(ts) -  UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))) / bracketMillis) AS INT64) 
  * bracketMillis + UNIX_MILLIS(TIMESTAMP_TRUNC(ts, DAY))));

To demonstrate, here's another UDF which consumes the first to build an array of timestamps aligned to various intervals:

为了演示,这里是另一个UDF,它使用第一个构建一个与各种间隔对齐的时间戳数组:

CREATE TEMPORARY FUNCTION emitTimeBrackets(ts TIMESTAMP) RETURNS ARRAY<STRUCT<bracket STRING, tsVal TIMESTAMP>> AS (
    [STRUCT("exact" as bracket, ts as tsVal),
     STRUCT("minute", bracketTimestampByMillis(ts, 60 * 1000)),
     STRUCT("5 minute", bracketTimestampByMillis(ts, 5 * 60 * 1000)),
     STRUCT("15 minute", bracketTimestampByMillis(ts, 15 * 60 * 1000)),
     STRUCT("hour", bracketTimestampByMillis(ts, 60 * 60 * 1000)),
     STRUCT("quarter day", bracketTimestampByMillis(ts, 6 * 3600 * 1000))
    ]
);

SELECT emitTimeBrackets(CURRENT_TIMESTAMP()) as b