
Time: 2022-02-16 22:58:18

In PostgreSQL 9.4 the window functions have the new option of a FILTER to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including one from 2ndQuadrant, but all that I found were rather trivial examples with constant expressions. What I am looking for is a filter expression that includes the value of the current row.

Assume I have a table with a bunch of columns, one of which is of date type:


col1 | col2 |     dt
  1  |  a   | 2015-07-01
  2  |  b   | 2015-07-03
  3  |  c   | 2015-07-10
  4  |  d   | 2015-07-11
  5  |  e   | 2015-07-11
  6  |  f   | 2015-07-13
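
To reproduce the examples, the sample data could be set up roughly as follows. This is only a sketch: the table name tbl is borrowed from the answer's query further down, and the column types and the primary key are assumptions, since the question does not state them.

```sql
-- Hypothetical setup for the sample data; table name and types are assumed.
CREATE TABLE tbl (
   col1 integer PRIMARY KEY,
   col2 text,
   dt   date NOT NULL
);

INSERT INTO tbl (col1, col2, dt) VALUES
   (1, 'a', DATE '2015-07-01'),
   (2, 'b', DATE '2015-07-03'),
   (3, 'c', DATE '2015-07-10'),
   (4, 'd', DATE '2015-07-11'),
   (5, 'e', DATE '2015-07-11'),
   (6, 'f', DATE '2015-07-13');
```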

A window definition for processing on the date over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)

I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:


col1 | col2 |     dt     | count
  1  |  a   | 2015-07-01 |   1
  2  |  b   | 2015-07-03 |   2
  3  |  c   | 2015-07-10 |   1
  4  |  d   | 2015-07-11 |   3
  5  |  e   | 2015-07-11 |   3
  6  |  f   | 2015-07-13 |   4

The FILTER clause of the window functions seems like the obvious choice:


count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win

But how do I specify current_row.dt (for lack of a better syntax)? Is this even possible?


If this is not possible, are there other ways of selecting date ranges in a window frame? The frame specification is no help as it is all row-based.


I am not interested in alternative solutions using sub-queries, it has to be based on window processing.


2 Solutions



You are not actually aggregating rows, so the new aggregate FILTER clause is not the right tool. A window function is more like it, but a problem remains: in PostgreSQL 9.4 the frame definition of a window cannot depend on values of the current row. With the ROWS clause it can only count a given number of rows preceding or following.


To make that work, aggregate counts per day and LEFT JOIN to a full set of days in range. Then you can apply a window function:

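SELECT t.*, ct.ct_last4days
FROM  (
   -- one row per calendar day, so "ROWS 3 PRECEDING" spans exactly 4 days
   SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
   FROM  (
      -- full set of days between the first and the last date in the table
      SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
      FROM   tbl t1
      ) d
   -- row count per day; days without rows get NULL, which sum() ignores
   LEFT   JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
   ) ct
JOIN  tbl t USING (dt);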

Omitting ORDER BY dt in the window frame definition usually works, since the order is carried over from generate_series() in the subquery. But there are no guarantees in the SQL standard without explicit ORDER BY and it might break in more complex queries.

SQL Fiddle.
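
For readers on a newer server: the question is about 9.4, but PostgreSQL 11 and later accept an offset in RANGE mode, and for a date ordering column that offset is an interval. Assuming such a version is available, the count can be sketched without the helper series of days. The inline VALUES list only stands in for the table so that the statement runs on its own, and the 3-day offset assumes the same 4-day window (current day plus the 3 days before it) as the query above.

```sql
-- Requires PostgreSQL 11 or later (assumption; not available in 9.4).
WITH tbl (col1, col2, dt) AS (
   VALUES (1, 'a', DATE '2015-07-01'),
          (2, 'b', DATE '2015-07-03'),
          (3, 'c', DATE '2015-07-10'),
          (4, 'd', DATE '2015-07-11'),
          (5, 'e', DATE '2015-07-11'),
          (6, 'f', DATE '2015-07-13')
)
SELECT col1, col2, dt,
       -- frame = all rows whose dt lies within the 3 days before the
       -- current row's dt, plus the current row and its peers (same dt)
       count(*) OVER (ORDER BY dt
                      RANGE BETWEEN INTERVAL '3 days' PRECEDING
                                AND CURRENT ROW) AS ct_last4days
FROM   tbl
ORDER  BY dt, col1;
```

Because CURRENT ROW in RANGE mode includes peer rows, both rows dated 2015-07-11 see each other, which is what the desired output in the question asks for.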




I don't think there is any syntax that means "current row" in an expression. The gram.y file for PostgreSQL makes a filter clause take just an a_expr, which is just a normal expression. There is nothing specific to window functions or filter clauses in an expression. As far as I can find, the only current row notion in a window clause is for specifying the window frame boundaries. I don't think this gets you what you want.
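
To illustrate what that means in practice, here is a small sketch of what a FILTER on a window aggregate can do: column references in the predicate are resolved against each frame row as it is fed to the aggregate, not against the row the result is being computed for. The inline VALUES list stands in for the question's table, and the predicate on col2 is purely an illustrative assumption.

```sql
WITH tbl (col1, col2, dt) AS (
   VALUES (1, 'a', DATE '2015-07-01'),
          (2, 'b', DATE '2015-07-03'),
          (3, 'c', DATE '2015-07-10'),
          (4, 'd', DATE '2015-07-11'),
          (5, 'e', DATE '2015-07-11'),
          (6, 'f', DATE '2015-07-13')
)
SELECT col1, col2, dt,
       -- running count of all rows in the frame
       count(*) OVER win AS ct_all,
       -- same frame, but only rows whose own col2 is not 'b' are counted;
       -- col2 here is the frame row's value, never the current row's
       count(*) FILTER (WHERE col2 <> 'b') OVER win AS ct_not_b
FROM   tbl
WINDOW win AS (ORDER BY dt)
ORDER  BY dt, col1;
```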

It's possible that you could get some traction from an enclosing query:



When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query.


but it's not obvious to me how.



