How to use a ring data structure in window functions

I have data arranged in a ring structure (or circular buffer); that is, it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring:

[Figure: a 5-part ring]

I'd like to create a window query that combines the lag and lead items into a three-element array, but I can't figure it out. For example, at part 1 of a 5-part ring the lag/lead sequence is 5-1-2, and at part 4 it is 3-4-5.
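
To pin down the expected result independently of SQL, the wrap-around can be sketched with modular arithmetic. This is only an illustrative Python sketch, not part of the original question: the function name `ring_neighbours` is hypothetical, and it assumes the parts of a ring are numbered 1..n without gaps.

```python
def ring_neighbours(n_parts):
    """Return [previous, current, next] for each part of a ring numbered 1..n_parts."""
    result = []
    for part in range(1, n_parts + 1):
        # Shift to 0-based, step backwards or forwards modulo the ring size, shift back.
        prev_part = (part - 2) % n_parts + 1
        next_part = part % n_parts + 1
        result.append([prev_part, part, next_part])
    return result


if __name__ == "__main__":
    for triple in ring_neighbours(5):
        print(triple)
```

For a 5-part ring, part 1 yields 5-1-2 and part 4 yields 3-4-5, which is the output the window query should reproduce.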

Here is an example table of two rings with different numbers of parts (always more than three per ring):

create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5);
insert into rp(ring, part) select 2, generate_series(1, 7);

Here is a nearly successful query:

SELECT ring, part, array[
    lag(part, 1, NULL) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

 ring | part | neighbours
------+------+------------
    1 |    1 | {NULL,1,2}
    1 |    2 | {1,2,3}
    1 |    3 | {2,3,4}
    1 |    4 | {3,4,5}
    1 |    5 | {4,5,1}
    2 |    1 | {NULL,1,2}
    2 |    2 | {1,2,3}
    2 |    3 | {2,3,4}
    2 |    4 | {3,4,5}
    2 |    5 | {4,5,6}
    2 |    6 | {5,6,7}
    2 |    7 | {6,7,1}
(12 rows)

The only thing I need to do is to replace the NULL with the ending point of each ring, which is the last value. Now, along with lag and lead window functions, there is a last_value function which would be ideal. However, these cannot be nested:

SELECT ring, part, array[
    lag(part, 1, last_value(part) over (partition by ring)) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;
ERROR:  window function calls cannot be nested
LINE 2:     lag(part, 1, last_value(part) over (partition by ring)) ...

Update: Thanks to @Justin for the suggestion to use coalesce to avoid nesting window functions. Furthermore, numerous folks have pointed out that first/last values need an explicit order by on the ring sequence, which happens to be part in this example. So, randomising the input data a bit:

create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5) order by random();
insert into rp(ring, part) select 2, generate_series(1, 7) order by random();

2 solutions

#1

Use COALESCE, as @Justin suggested.

With first_value() / last_value() you need to add an ORDER BY clause to the window definition, or the order is undefined. You just got lucky in the example, because the rows happen to be in order right after creating the dummy table.
Once you add ORDER BY, the default window frame ends at the current row, so you need to special-case the last_value() call, or reverse the sort order in the window definition as demonstrated in my first example.
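
The frame behaviour described above can be observed without a PostgreSQL server. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in, assuming the bundled SQLite is version 3.25 or later (window function support) and relying on SQLite using the same default frame as PostgreSQL. The column aliases `default_frame`, `full_frame` and `reversed_sort` are introduced here for illustration only.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption: SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rp (ring INT, part INT)")
# Insert the parts out of order so nothing depends on insertion order.
conn.executemany("INSERT INTO rp VALUES (1, ?)", [(3,), (1,), (5,), (2,), (4,)])

rows = conn.execute("""
    SELECT part,
           -- default frame ends at the current row: returns the current part
           last_value(part) OVER (PARTITION BY ring ORDER BY part) AS default_frame,
           -- explicit frame spanning the whole partition: returns the true last part
           last_value(part) OVER (PARTITION BY ring ORDER BY part
                                  RANGE BETWEEN UNBOUNDED PRECEDING
                                            AND UNBOUNDED FOLLOWING) AS full_frame,
           -- reversed sort order: first_value() now yields the highest part
           first_value(part) OVER (PARTITION BY ring ORDER BY part DESC) AS reversed_sort
    FROM rp
    ORDER BY part
""").fetchall()

for row in rows:
    print(row)
```

With the default frame, last_value() simply echoes the current part, while the other two columns return 5 for every row of the 5-part ring.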

When reusing a window definition multiple times, an explicit WINDOW clause simplifies syntax a lot:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part);

Better yet, reuse the same window definition, so Postgres can calculate all values in a single scan. For this to work we need to define a custom window frame:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER w)
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring
             ORDER BY part
             RANGE BETWEEN UNBOUNDED PRECEDING
                       AND UNBOUNDED FOLLOWING)
ORDER  BY 1,2;
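
The single-window query above can be tried out on shuffled input as a quick check. This is a hedged sketch using Python's `sqlite3` module instead of PostgreSQL (assumption: SQLite 3.25 or later). SQLite has no array type, so the neighbours come back as separate columns; the names `prev_part` and `next_part` and the `neighbours` dictionary are introduced here for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rp (ring INT, part INT)")

# Two rings with 5 and 7 parts, inserted in random order.
pairs = [(1, p) for p in range(1, 6)] + [(2, p) for p in range(1, 8)]
random.shuffle(pairs)
conn.executemany("INSERT INTO rp VALUES (?, ?)", pairs)

query = """
    SELECT ring, part,
           coalesce(lag(part)  OVER w, last_value(part)  OVER w) AS prev_part,
           coalesce(lead(part) OVER w, first_value(part) OVER w) AS next_part
    FROM   rp
    WINDOW w AS (PARTITION BY ring
                 ORDER BY part
                 RANGE BETWEEN UNBOUNDED PRECEDING
                           AND UNBOUNDED FOLLOWING)
    ORDER  BY 1, 2
"""

# Map (ring, part) to its (previous, current, next) triple.
neighbours = {(ring, part): (prev_part, part, next_part)
              for ring, part, prev_part, next_part in conn.execute(query)}

for key in sorted(neighbours):
    print(key, neighbours[key])
```

Because the frame spans the whole partition, last_value() and first_value() supply the wrap-around values at the two ends of each ring regardless of insertion order.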

You can even adapt the frame definition for each window function call:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
                                                AND UNBOUNDED FOLLOWING))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER  BY 1,2;

This might be faster for rings with many parts. You'll have to test.

SQL Fiddle demonstrating all three with an improved test case. Consider query plans.

More about window frame definitions:

In the manual.

PostgreSQL window function: partition by comparison

PostgreSQL query with max and min date plus associated id per row

#2

Query:

SQL Fiddle example

SELECT ring, part, array[
    coalesce(lag(part, 1, NULL) over (partition by ring), 
             max(part) over (partition by ring)),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

Result:

| RING | PART | NEIGHBOURS |
|------|------|------------|
|    1 |    1 |      5,1,2 |
|    1 |    2 |      1,2,3 |
|    1 |    3 |      2,3,4 |
|    1 |    4 |      3,4,5 |
|    1 |    5 |      4,5,1 |
|    2 |    1 |      7,1,2 |
|    2 |    2 |      1,2,3 |
|    2 |    3 |      2,3,4 |
|    2 |    4 |      3,4,5 |
|    2 |    5 |      4,5,6 |
|    2 |    6 |      5,6,7 |
|    2 |    7 |      6,7,1 |
