PostgreSQL: running count of rows for a query by minute

Date: 2022-04-28 23:00:15

I need to query for each minute the total count of rows up to that minute.

The best I could achieve so far doesn't do the trick. It returns the count per minute, not the total count up to each minute:

SELECT COUNT(id) AS count
     , EXTRACT(hour from "when") AS hour
     , EXTRACT(minute from "when") AS minute
  FROM mytable
 GROUP BY hour, minute
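
The sketch below makes the distinction concrete. It is plain Python, not SQL, and uses made-up timestamps in place of the "when" column. The per-minute counts are what grouping by minute gives you. Their running total is what the question is after.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# Made-up sample values standing in for the "when" column.
events = [
    datetime(2022, 4, 28, 23, 0, 5),
    datetime(2022, 4, 28, 23, 0, 40),
    datetime(2022, 4, 28, 23, 1, 10),
    datetime(2022, 4, 28, 23, 3, 2),
    datetime(2022, 4, 28, 23, 3, 59),
]


def truncate_to_minute(ts: datetime) -> datetime:
    """Drop seconds and microseconds, like date_trunc('minute', ts)."""
    return ts.replace(second=0, microsecond=0)


# Count per minute: what a GROUP BY over minutes produces.
per_minute = Counter(truncate_to_minute(ts) for ts in events)
minutes = sorted(per_minute)
counts = [per_minute[m] for m in minutes]

# Running total: rows up to and including each minute.
running = list(accumulate(counts))

for m, c, r in zip(minutes, counts, running):
    print(m.strftime("%H:%M"), c, r)
```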

#1

Only return minutes with activity

Shortest

SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;
  • Use date_trunc(); it returns exactly what you need.

  • Don't include id in the query, since you want to GROUP BY minute slices.

  • count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. I quote the manual:

    The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

    And that happens to be exactly what you need.

  • Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).

  • You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
    You can, however, use SELECT DISTINCT, because DISTINCT is applied after window functions.

  • ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
    1 is a positional reference to the 1st expression in the SELECT list.

  • Use to_char() if you need to format the result. For example:

SELECT DISTINCT
       to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY date_trunc('minute', "when");
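
You can check the points above without a PostgreSQL server. This sketch uses Python's built-in sqlite3 module as a stand-in, under a few assumptions:
- SQLite 3.25 or later, which supports window functions and uses the same default frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
- strftime('%Y-%m-%d %H:%M', ...) takes the place of date_trunc('minute', ...).
- The rows are made up.

It demonstrates four things:
- ORDER BY peers share one running count under the default frame.
- An explicit ROWS frame would break that.
- GROUP BY at the same level makes the window count minutes instead of rows.
- count(id) skips a NULL id.

```python
import sqlite3

# SQLite (in memory) as a stand-in for PostgreSQL; assumes SQLite >= 3.25 for
# window functions. strftime('%Y-%m-%d %H:%M', ...) plays date_trunc('minute', ...).
# Table name matches the question; the rows are made up.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE mytable (id INTEGER, "when" TEXT)')
con.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2022-04-28 23:00:05"),
        (2, "2022-04-28 23:00:40"),
        (3, "2022-04-28 23:01:10"),
        (4, "2022-04-28 23:03:02"),
        (None, "2022-04-28 23:03:59"),  # NULL id on purpose
    ],
)


def run(sql):
    return con.execute(sql).fetchall()


# "Shortest": the default RANGE frame ends at the last ORDER BY peer, so all rows of
# one minute get the same running count; DISTINCT then keeps one row per minute.
shortest = run("""
    SELECT DISTINCT
           strftime('%Y-%m-%d %H:%M', "when") AS minute
         , count(*) OVER (ORDER BY strftime('%Y-%m-%d %H:%M', "when")) AS running_ct
    FROM   mytable
    ORDER  BY 1
""")

# Explicit ROWS frame: peers get different counts, so DISTINCT cannot merge them.
rows_frame = run("""
    SELECT DISTINCT
           strftime('%Y-%m-%d %H:%M', "when") AS minute
         , count(*) OVER (ORDER BY strftime('%Y-%m-%d %H:%M', "when")
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_ct
    FROM   mytable
    ORDER  BY 1, 2
""")

# GROUP BY at the same level: the window function only sees one row per minute,
# so it counts minutes instead of rows.
grouped = run("""
    SELECT strftime('%Y-%m-%d %H:%M', "when") AS minute
         , count(*) OVER (ORDER BY strftime('%Y-%m-%d %H:%M', "when")) AS running_ct
    FROM   mytable
    GROUP  BY strftime('%Y-%m-%d %H:%M', "when")
    ORDER  BY 1
""")

# count(*) counts every row; count(id) skips the NULL id.
totals = run("SELECT count(*), count(id) FROM mytable")

print(shortest)    # running counts 2, 3, 5
print(rows_frame)  # 1 and 2 for 23:00, 4 and 5 for 23:03
print(grouped)     # 1, 2, 3: minutes, not rows
print(totals)      # [(5, 4)]
```

SQLite differs from PostgreSQL in many details. The sketch only relies on behavior the two share: the default frame, DISTINCT after window functions, and NULL handling in count().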

Fastest

SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) sub
ORDER  BY 1;

Much like the above, but:

  • I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.

  • Now use sum() as a window aggregate function to add up the counts from the subquery.

I found this to be substantially faster with many rows per minute.
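
This sketch checks that the two-step version returns the same rows as the one-step version. It uses Python's sqlite3 as a stand-in for PostgreSQL, assuming SQLite 3.25 or later, with strftime() in place of date_trunc() and made-up rows. It only compares results; it says nothing about speed, which depends on PostgreSQL and your data.

```python
import sqlite3

# SQLite stand-in for PostgreSQL (assumes SQLite >= 3.25 for window functions);
# strftime() replaces date_trunc('minute', ...). Sample rows are made up.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE mytable (id INTEGER, "when" TEXT)')
con.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2022-04-28 23:00:05"),
        (2, "2022-04-28 23:00:40"),
        (3, "2022-04-28 23:01:10"),
        (4, "2022-04-28 23:03:02"),
        (5, "2022-04-28 23:03:59"),
    ],
)

# One step: window count over every row, then DISTINCT.
shortest = con.execute("""
    SELECT DISTINCT
           strftime('%Y-%m-%d %H:%M', "when") AS minute
         , count(*) OVER (ORDER BY strftime('%Y-%m-%d %H:%M', "when")) AS running_ct
    FROM   mytable
    ORDER  BY 1
""").fetchall()

# Two steps: aggregate per minute in a subquery, then a running sum() over
# the smaller set of per-minute rows. No DISTINCT needed.
fastest = con.execute("""
    SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
    FROM  (
       SELECT strftime('%Y-%m-%d %H:%M', "when") AS minute
            , count(*) AS minute_ct
       FROM   mytable
       GROUP  BY 1
       ) sub
    ORDER  BY 1
""").fetchall()

print(fastest)
print(fastest == shortest)  # both variants return the same rows
```

There is one PostgreSQL-specific difference. count(*) returns bigint, and sum() over bigint returns numeric, so running_ct comes back as numeric in this variant (SQLite returns integers here). Cast it to bigint if you need an integer type.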

Include minutes without activity

Shortest

@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in base table):

SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   mytable
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM mytable) c(minute) USING (minute)
ORDER  BY 1;
  • Generate a row for every minute in the time frame between the first and the last event with generate_series(), here directly based on aggregated values from the subquery.

  • LEFT JOIN to all timestamps truncated to the minute, and count them. NULL values (minutes where no row exists) do not add to the running count.
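
The three steps can be traced in plain Python. This is a sketch, not PostgreSQL; the timestamps are made up and leave 23:02 empty. It also shows why the upper bound max("when") can stay untruncated: the series steps in whole minutes from a truncated start, so it stops at the minute that contains the last event.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Made-up sample values standing in for the "when" column; note the gap at 23:02.
events = [
    datetime(2022, 4, 28, 23, 0, 5),
    datetime(2022, 4, 28, 23, 0, 40),
    datetime(2022, 4, 28, 23, 1, 10),
    datetime(2022, 4, 28, 23, 3, 2),
    datetime(2022, 4, 28, 23, 3, 59),
]


def truncate_to_minute(ts: datetime) -> datetime:
    """Drop seconds and microseconds, like date_trunc('minute', ts)."""
    return ts.replace(second=0, microsecond=0)


# Like generate_series(date_trunc('minute', min("when")), max("when"), interval '1 min'):
# start at the truncated first event, step one minute, stop after the untruncated last event.
series = []
minute = truncate_to_minute(min(events))
while minute <= max(events):
    series.append(minute)
    minute += timedelta(minutes=1)

# LEFT JOIN to the truncated timestamps: each match yields a pair,
# a minute without matches yields one pair with None (SQL NULL).
truncated = [truncate_to_minute(ts) for ts in events]
joined = []
for m in series:
    matches = [c for c in truncated if c == m]
    if matches:
        joined.extend((m, c) for c in matches)
    else:
        joined.append((m, None))

# count(c.minute) OVER (ORDER BY minute) plus DISTINCT: all peers of a minute
# share one running count, and None values are not counted.
running_ct = []
total = 0
for m, pairs in groupby(joined, key=lambda pair: pair[0]):
    total += sum(1 for _, c in pairs if c is not None)
    running_ct.append((m.strftime("%H:%M"), total))

print(running_ct)  # [('23:00', 2), ('23:01', 3), ('23:02', 3), ('23:03', 5)]
```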

Fastest

With a CTE:

WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) 
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;
  • Again, aggregate and count rows per minute in the first step; this removes the need for a later DISTINCT.

  • Unlike count(), sum() can return NULL: it does so when there are no non-null values to add. Default to 0 with COALESCE.
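
A tiny hypothetical case makes the NULL visible. In the queries of this answer the series starts at the first active minute, so the first frame always contains a count and COALESCE only guards against NULL. A NULL appears as soon as the series starts earlier, as in this sketch. It uses Python's sqlite3 as a stand-in for PostgreSQL (assumes SQLite 3.25 or later), with VALUES lists in place of generate_series() and the CTE.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical data: the series starts at 23:00, but the first counted minute is
# 23:01, so the running frame of 23:00 contains only a NULL from the LEFT JOIN.
rows = con.execute("""
    WITH series(minute) AS (VALUES ('23:00'), ('23:01'), ('23:02'))
       , counts(minute, minute_ct) AS (VALUES ('23:01', 2))
    SELECT s.minute
         , sum(c.minute_ct)   OVER (ORDER BY s.minute) AS raw_sum    -- NULL at 23:00
         , count(c.minute_ct) OVER (ORDER BY s.minute) AS raw_count  -- 0 at 23:00, never NULL
         , COALESCE(sum(c.minute_ct) OVER (ORDER BY s.minute), 0) AS running_ct
    FROM   series s
    LEFT   JOIN counts c USING (minute)
    ORDER  BY 1
""").fetchall()

for row in rows:
    print(row)
```

Here raw_count counts per-minute rows, not events. It is only there to show that count() yields 0 where sum() yields NULL.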

With many rows and an index on "when", this version with a subquery was the fastest among a couple of variants I tested with Postgres 9.1 - 9.4:

SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   mytable
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;
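
This sketch runs the last query end to end and compares it with the gap-filling "Shortest" variant. It uses Python's sqlite3 as a stand-in for PostgreSQL (assumes SQLite 3.25 or later; made-up rows). generate_series() may not be available in Python's sqlite3, so a recursive CTE generates the minutes instead. That is a swapped-in technique, not part of the original queries. The sketch only checks that both variants agree; it does not reproduce the speed comparison, which was done on PostgreSQL.

```python
import sqlite3

# SQLite stand-in for PostgreSQL (assumes SQLite >= 3.25). Minutes are kept as
# 'YYYY-MM-DD HH:MM:00' text so they compare and join correctly. Rows are made up.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE mytable (id INTEGER, "when" TEXT)')
con.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2022-04-28 23:00:05"),
        (2, "2022-04-28 23:00:40"),
        (3, "2022-04-28 23:01:10"),
        (4, "2022-04-28 23:03:02"),
        (5, "2022-04-28 23:03:59"),
    ],
)

# Recursive CTE in place of generate_series(): from the truncated first event,
# one minute at a time, up to the (untruncated) last event.
minutes_cte = """
    WITH RECURSIVE bounds(lo, hi) AS (
       SELECT strftime('%Y-%m-%d %H:%M:00', min("when")), max("when") FROM mytable
    )
    , m(minute) AS (
       SELECT lo FROM bounds
       UNION ALL
       SELECT datetime(m.minute, '+1 minute') FROM m, bounds
       WHERE  datetime(m.minute, '+1 minute') <= bounds.hi
    )
"""

# Last variant: aggregate per minute, LEFT JOIN, running sum(), COALESCE to 0.
fastest = con.execute(minutes_cte + """
    SELECT m.minute
         , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
    FROM   m
    LEFT   JOIN (
       SELECT strftime('%Y-%m-%d %H:%M:00', "when") AS minute
            , count(*) AS minute_ct
       FROM   mytable
       GROUP  BY 1
       ) c USING (minute)
    ORDER  BY 1
""").fetchall()

# "Shortest" variant with gaps: join every truncated timestamp, count(c.minute)
# skips NULLs, DISTINCT collapses the peers of each minute.
shortest = con.execute(minutes_cte + """
    SELECT DISTINCT
           m.minute, count(c.minute) OVER (ORDER BY m.minute) AS running_ct
    FROM   m
    LEFT   JOIN (SELECT strftime('%Y-%m-%d %H:%M:00', "when") AS minute
                 FROM   mytable) c USING (minute)
    ORDER  BY 1
""").fetchall()

print(fastest)
print(fastest == shortest)
```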
