PostgreSQL - How should I use first_value()?

Time: 2022-08-12 22:59:07

This answer shows how to produce High/Low/Open/Close values from a ticker:
Retrieve aggregates for arbitrary time intervals

I am trying to implement a solution based on this (PG 9.2), but am having difficulty getting the correct value for first_value().

So far, I have tried two queries:

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
WHERE date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC
    )
ORDER BY cstamp;

Here's a piece of the result:

cstamp;price;h;m5;min;max;first;last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004

As you can see, first_value() returns 77.8 for the rows with m5 = 5, whereas I believe the correct value is 77.0.

I thought this might be due to the ambiguous ORDER BY in the WINDOW, so I changed it to

ORDER BY cstamp ASC 

but this appears to upset the PARTITION as well:

cstamp;price;h;m5;min;max;first;last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000

since the values for max and last now vary within the partition.

What am I doing wrong? Could someone help me better understand the relation between PARTITION and ORDER within a WINDOW?

Although I have an answer, here's a trimmed-down pg_dump which will allow anyone to recreate the table. The only thing that's different is the table name.

CREATE TABLE wtest (
    cstamp timestamp without time zone,
    price numeric(10,5)
);

COPY wtest (cstamp, price) FROM stdin;
2013-03-29 09:04:54 77.80000
2013-03-29 09:04:50 76.98000
2013-03-29 09:29:51 77.00000
2013-03-29 09:29:41 77.80000
2013-03-29 09:26:18 77.00000
2013-03-29 09:19:14 77.00000
2013-03-29 09:19:10 77.00000
2013-03-29 09:33:50 76.00000
2013-03-29 09:33:46 76.10000
2013-03-29 09:33:15 77.79000
2013-03-29 09:30:08 77.80000
2013-03-29 09:30:04 77.00000
\.

3 Answers

#1


19  

SQL Fiddle

All the functions you used act on the window frame, not on the partition. If the frame clause is omitted and the window has an ORDER BY, the frame ends at the current row (more precisely, at the current row's last peer). To make the window frame span the whole partition, declare it in the frame clause (range ...):

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
WHERE date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
ORDER BY cstamp;
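
With the frame widened to the whole partition, every row of a bucket carries the same four values, so the result can be collapsed to one row per 5-minute bucket. The following is only a sketch under assumptions: it uses the wtest table from the question's dump, and the aliases low_price, high_price, open_price and close_price are illustrative names that do not appear in the original query.

```sql
-- One OHLC row per 5-minute bucket; assumes the wtest table from the question.
SELECT DISTINCT
    date_trunc('hour', cstamp)             AS h,
    floor(extract(minute FROM cstamp) / 5) AS m5,
    min(price)         OVER w AS low_price,
    max(price)         OVER w AS high_price,
    first_value(price) OVER w AS open_price,   -- earliest trade in the bucket
    last_value(price)  OVER w AS close_price   -- latest trade in the bucket
FROM wtest
WHERE date_trunc('hour', cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour', cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    -- Without this frame clause, last_value() would just return the current row.
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY h, m5;
```

DISTINCT is applied after the window functions are evaluated, which is why identical rows within a bucket fold into one. For the sample data, the bucket with m5 = 5 should come out with an open and close of 77.00000, a high of 77.80000 and a low of 77.00000.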

#2


8  

Here's a quick query to illustrate the behaviour:

select 
  v,
  first_value(v) over w1 f1,
  first_value(v) over w2 f2,
  first_value(v) over w3 f3,
  last_value (v) over w1 l1,
  last_value (v) over w2 l2,
  last_value (v) over w3 l3,
  max        (v) over w1 m1,
  max        (v) over w2 m2,
  max        (v) over w3 m3,
  max        (v) over () m4
from (values(1),(2),(3),(4)) t(v)
window
  w1 as (order by v),
  w2 as (order by v rows between unbounded preceding and current row),
  w3 as (order by v rows between unbounded preceding and unbounded following)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  1 |  1 |  4 |  4 |
| 2 |  1 |  1 |  1 |  2 |  2 |  4 |  2 |  2 |  4 |  4 |
| 3 |  1 |  1 |  1 |  3 |  3 |  4 |  3 |  3 |  4 |  4 |
| 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |

Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows default to the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which behaves like ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW here because the values of v are distinct (compare w1 and w2 in the output above). Think about it this way:

  • On the row with v = 1 the ordered window's frame spans v IN (1)
  • On the row with v = 2 the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3 the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4 the ordered window's frame spans v IN (1, 2, 3, 4)
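
The growing frame described in the list above can be made visible by aggregating the frame's rows into an array. This is a small illustrative sketch, not part of the original answer; the column aliases default_frame and full_frame are made up for the example.

```sql
-- array_agg() used as a window function lists the rows in each row's frame.
SELECT
    v,
    array_agg(v) OVER (ORDER BY v) AS default_frame,
    array_agg(v) OVER (ORDER BY v
                       ROWS BETWEEN UNBOUNDED PRECEDING
                                AND UNBOUNDED FOLLOWING) AS full_frame
FROM (VALUES (1), (2), (3), (4)) t(v)
ORDER BY v;
--  v | default_frame | full_frame
--  1 | {1}           | {1,2,3,4}
--  2 | {1,2}         | {1,2,3,4}
--  3 | {1,2,3}       | {1,2,3,4}
--  4 | {1,2,3,4}     | {1,2,3,4}
```

first_value() and last_value() simply pick the first and last element of these arrays, which is why last_value() over the default frame always returns the current row's value here.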

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow for omitting them (as MAX(v) OVER())
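
Applied to the query from the question, the two options can be combined: an unordered window for min() and max(), and an ordered window with an explicit frame for first_value() and last_value(), which need a defined row order to be meaningful. The sketch below assumes the wtest table from the question; the window names w_all and w_ord and the column aliases are hypothetical.

```sql
SELECT
    cstamp,
    price,
    min(price)         OVER w_all AS low_price,
    max(price)         OVER w_all AS high_price,
    first_value(price) OVER w_ord AS open_price,
    last_value(price)  OVER w_ord AS close_price
FROM wtest
WINDOW
    -- Option 2: no ORDER BY, so the frame is the whole partition.
    w_all AS (PARTITION BY date_trunc('hour', cstamp),
                           floor(extract(minute FROM cstamp) / 5)),
    -- Option 1: reuse the partitioning of w_all, add an ordering
    -- and an explicit frame that spans the whole partition.
    w_ord AS (w_all ORDER BY cstamp
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY cstamp;
```

Defining w_ord on top of w_all keeps the PARTITION BY expression in one place. PostgreSQL allows this as long as the referenced window appears earlier in the WINDOW list and has no frame clause of its own.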

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()

#3


3  

The result of max() as a window function is based on the frame definition.

The default frame definition (with ORDER BY) runs from the start of the partition up to the last peer of the current row (including the current row and possibly more rows that rank equally according to ORDER BY). In the absence of ORDER BY (as in my answer you are referring to), or if ORDER BY treats every row in the partition as equal (as in your first example), all rows in the partition are peers, and max() produces the same result for every row in the partition, effectively considering all rows of the partition.

Per documentation:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last peer. Without ORDER BY, all rows of the partition are included in the window frame, since all rows become peers of the current row.

Bold emphasis mine.
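
To see what "last peer" means in practice, the following sketch compares the default RANGE frame with a ROWS frame on data that contains a tie. The id column, the sample values and the aliases are made up for illustration.

```sql
-- The rows with id 2 and id 3 share v = 2, so they are peers under ORDER BY v.
SELECT
    id,
    v,
    array_agg(v) OVER (ORDER BY v) AS range_default,
    array_agg(v) OVER (ORDER BY v, id
                       ROWS BETWEEN UNBOUNDED PRECEDING
                                AND CURRENT ROW) AS rows_frame
FROM (VALUES (1, 1), (2, 2), (3, 2), (4, 3)) t(id, v)
ORDER BY id;
--  id | v | range_default | rows_frame
--   1 | 1 | {1}           | {1}
--   2 | 2 | {1,2,2}       | {1,2}
--   3 | 2 | {1,2,2}       | {1,2,2}
--   4 | 3 | {1,2,2,3}     | {1,2,2,3}
```

With the default RANGE frame both peers see the same frame, which extends through the last of them. In the first query of the question the window is ordered only by the partitioning expressions, so all rows of a bucket are peers and the frame covers the whole bucket. The order of rows among peers is not specified, though, which is presumably why first_value() returned 77.8 there.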

The simple solution would be to omit the ORDER BY in the window definition - just like I demonstrated in the example you are referring to.

All the gory details about frame specifications are in the chapter Window Function Calls in the manual.
