How does a SELECT using the WITH RECURSIVE clause work?

I have googled and read through some articles like this PostgreSQL manual page or this blog page, and tried writing queries myself with moderate success (some of them hang, while others work well and fast), but so far I cannot completely understand how this magic works.

Can anybody give a very clear explanation demonstrating the semantics and execution process of such queries, preferably based on typical examples like factorial calculation or full tree expansion from an (id,parent_id,name) table?

And what are the basic guidelines and typical mistakes that one should know to write good WITH RECURSIVE queries?

1 solution

#1

First of all, let us try to simplify and clarify the algorithm description given on the manual page. To keep it simple, consider only UNION ALL in the WITH RECURSIVE clause for now (UNION comes later):

WITH RECURSIVE pseudo-entity-name(column-names) AS (
    Initial-SELECT
UNION ALL
    Recursive-SELECT using pseudo-entity-name
)
Outer-SELECT using pseudo-entity-name

To clarify it, let us describe the query execution process in pseudocode:

working-recordset = result of Initial-SELECT

append working-recordset to empty outer-recordset

while( working-recordset is not empty ) begin

    new working-recordset = result of Recursive-SELECT 
        taking previous working-recordset as pseudo-entity-name

    append working-recordset to outer-recordset

end

overall-result = result of Outer-SELECT 
    taking outer-recordset as pseudo-entity-name

Or even shorter: the database engine executes the initial select and takes its result rows as the working set. Then it repeatedly executes the recursive select on the working set, each time replacing the contents of the working set with the query result obtained. This process ends when the recursive select returns an empty set. All result rows, given first by the initial select and then by the recursive select, are gathered and fed to the outer select, whose result becomes the overall query result.
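
The loop described above can be sketched in a few lines of Python. This is only an illustration of the pseudocode, not how any database engine is actually implemented; the names run_union_all and next_numbers are made up for this sketch, and rows are modeled as plain tuples.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple


def run_union_all(initial_rows: Iterable[Row],
                  recursive_select: Callable[[List[Row]], List[Row]]) -> List[Row]:
    """Mimic the UNION ALL loop from the pseudocode (illustrative only)."""
    working = list(initial_rows)      # working-recordset = result of Initial-SELECT
    outer = list(working)             # append it to the empty outer-recordset
    while working:                    # loop until the working set is empty
        # The recursive select sees only the previous working set.
        working = recursive_select(working)
        outer.extend(working)
    return outer


# Hypothetical recursive select: count upwards, stopping at 5.
def next_numbers(rows: List[Row]) -> List[Row]:
    return [(n + 1,) for (n,) in rows if n < 5]


rows = run_union_all([(1,)], next_numbers)
print(rows)  # [(1,), (2,), (3,), (4,), (5,)]
```

The outer select would then simply be whatever filtering or sorting you apply to the returned list.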

This query calculates the factorial of 3:

WITH RECURSIVE factorial(F,n) AS (
    SELECT 1 F, 3 n
UNION ALL
    SELECT F*n F, n-1 n from factorial where n>1
)
SELECT F from factorial where n=1

The initial select SELECT 1 F, 3 n gives us the initial values: 3 for the argument and 1 for the function value.
The recursive select SELECT F*n F, n-1 n from factorial where n>1 states that each time we multiply the last function value by the last argument value and decrement the argument value.
The database engine executes it like this:

First of all, it executes the initial select, which gives the initial state of the working recordset:

F | n
--+--
1 | 3

Then it transforms the working recordset with the recursive query and obtains its second state:

F | n
--+--
3 | 2

Then the third state:

F | n
--+--
6 | 1

In the third state there is no row that satisfies the n>1 condition in the recursive select, so the fourth working set is empty and the loop exits.

The outer recordset now holds all the rows returned by the initial and recursive selects:

F | n
--+--
1 | 3
3 | 2
6 | 1

The outer select filters out all intermediate results from the outer recordset, showing only the final factorial value, which becomes the overall query result:

F 
--
6
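
If you want to watch these states yourself, here is a small sketch that runs the same factorial query from Python. It assumes SQLite's in-memory engine as a stand-in for PostgreSQL (both accept this particular WITH RECURSIVE syntax), and the ORDER BY n DESC is my addition to make the printed order explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL

cte = """
WITH RECURSIVE factorial(F, n) AS (
    SELECT 1, 3
UNION ALL
    SELECT F*n, n-1 FROM factorial WHERE n > 1
)
"""

# Outer select without the n=1 filter: shows the whole outer recordset.
all_rows = conn.execute(cte + "SELECT F, n FROM factorial ORDER BY n DESC").fetchall()

# Outer select from the answer: keeps only the final factorial value.
final = conn.execute(cte + "SELECT F FROM factorial WHERE n = 1").fetchall()

print(all_rows)  # [(1, 3), (3, 2), (6, 1)]
print(final)     # [(6,)]
conn.close()
```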

And now let us consider the table forest(id,parent_id,name):

id | parent_id | name
---+-----------+-----------------
1  |           | item 1
2  | 1         | subitem 1.1
3  | 1         | subitem 1.2
4  | 1         | subitem 1.3
5  | 3         | subsubitem 1.2.1
6  |           | item 2
7  | 6         | subitem 2.1
8  |           | item 3

'Expanding the full tree' here means sorting the tree items in human-readable depth-first order while calculating their levels and (maybe) paths. Neither task (correct sorting, or calculating the level or path) is solvable in one (or even any constant number of) SELECT without using the WITH RECURSIVE clause (or the Oracle CONNECT BY clause, which is not supported by PostgreSQL). But this recursive query does the job (well, almost; see the note below):

WITH RECURSIVE fulltree(id,parent_id,level,name,path) AS (
    SELECT id, parent_id, 1 as level, name, name||'' as path from forest where parent_id is null
UNION ALL
    SELECT t.id, t.parent_id, ft.level+1 as level, t.name, ft.path||' / '||t.name as path
    from forest t, fulltree ft where t.parent_id = ft.id
)
SELECT * from fulltree order by path

The database engine executes it like this:

Firstly, it executes the initial select, which gives all the highest-level items (roots) from the forest table:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
1  |           | 1     | item 1           | item 1
8  |           | 1     | item 3           | item 3
6  |           | 1     | item 2           | item 2

Then it executes the recursive select, which gives all 2nd-level items from the forest table:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1
3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2
4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3
7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1

Then it executes the recursive select again, retrieving the 3rd-level items:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1

And now it executes the recursive select again, trying to retrieve 4th-level items, but there are none, so the loop exits.

The outer SELECT sets the correct human-readable row order by sorting on the path column:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
1  |           | 1     | item 1           | item 1
2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1
3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2
5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1
4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3
6  |           | 1     | item 2           | item 2
7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1
8  |           | 1     | item 3           | item 3
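
To reproduce this result, here is a sketch that loads the sample forest table and runs the fulltree query from Python. As an assumption for the sake of a runnable example, it uses SQLite's in-memory engine instead of PostgreSQL, so the name||'' cast from the original query is dropped and the sort uses SQLite's default byte-wise text comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
conn.execute("CREATE TABLE forest (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO forest VALUES (?, ?, ?)",
    [
        (1, None, "item 1"),
        (2, 1, "subitem 1.1"),
        (3, 1, "subitem 1.2"),
        (4, 1, "subitem 1.3"),
        (5, 3, "subsubitem 1.2.1"),
        (6, None, "item 2"),
        (7, 6, "subitem 2.1"),
        (8, None, "item 3"),
    ],
)

query = """
WITH RECURSIVE fulltree(id, parent_id, level, name, path) AS (
    SELECT id, parent_id, 1, name, name FROM forest WHERE parent_id IS NULL
UNION ALL
    SELECT t.id, t.parent_id, ft.level + 1, t.name, ft.path || ' / ' || t.name
    FROM forest t JOIN fulltree ft ON t.parent_id = ft.id
)
SELECT id, level, path FROM fulltree ORDER BY path
"""

tree_rows = conn.execute(query).fetchall()
for row_id, level, path in tree_rows:
    print(row_id, level, path)
conn.close()
```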

NOTE: The resulting row order will remain correct only as long as the item names contain no punctuation characters that sort before / in the collation. If we rename item 2 to item 1 *, it will break the row order, landing between item 1 and its descendants.
A more stable solution is to use the tab character (E'\t') as the path separator in the query (it can be replaced by a more readable path separator later: in the outer select, before displaying to a human, etc.). Tab-separated paths will retain the correct order as long as there are no tabs or control characters in the item names, which can easily be checked and ruled out without loss of usability.
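
The effect of the separator is easy to see outside the database. The sketch below sorts a few paths in Python, whose string comparison goes by code point (roughly what a C collation does; a locale-aware collation in your database may behave differently). The helper name sorted_paths is made up for this example.

```python
# Path components as the query would build them,
# after renaming "item 2" to "item 1 *".
parts = [
    ("item 1",),
    ("item 1", "subitem 1.1"),
    ("item 1", "subitem 1.2"),
    ("item 1 *",),
]


def sorted_paths(separator):
    # Sort on the path joined with the given separator,
    # then display it with a readable separator.
    ordered = sorted(parts, key=lambda p: separator.join(p))
    return [" / ".join(p) for p in ordered]


slash_order = sorted_paths(" / ")
tab_order = sorted_paths("\t")
print(slash_order)  # "item 1 *" lands between "item 1" and its children
print(tab_order)    # "item 1 *" comes after the whole "item 1" subtree
```

With ' / ' the space and asterisk compete with the separator itself; with a tab the separator sorts before any printable character, so a parent always stays glued to its descendants.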

It is very simple to modify the last query to expand an arbitrary subtree: you only need to replace the condition parent_id is null with parent_id=1 (for example). Note that this query variant will return all levels and paths relative to item 1.
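
A sketch of that subtree variant, again assuming SQLite's in-memory engine as a stand-in for PostgreSQL. Passing the root id as a bound parameter (the ? placeholder) is my choice for this example, not something the query requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
conn.execute("CREATE TABLE forest (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO forest VALUES (?, ?, ?)",
    [
        (1, None, "item 1"), (2, 1, "subitem 1.1"), (3, 1, "subitem 1.2"),
        (4, 1, "subitem 1.3"), (5, 3, "subsubitem 1.2.1"),
        (6, None, "item 2"), (7, 6, "subitem 2.1"), (8, None, "item 3"),
    ],
)

subtree_query = """
WITH RECURSIVE fulltree(id, parent_id, level, name, path) AS (
    -- only the initial select changes: start from the children of one item
    SELECT id, parent_id, 1, name, name FROM forest WHERE parent_id = ?
UNION ALL
    SELECT t.id, t.parent_id, ft.level + 1, t.name, ft.path || ' / ' || t.name
    FROM forest t JOIN fulltree ft ON t.parent_id = ft.id
)
SELECT id, level, path FROM fulltree ORDER BY path
"""

subtree_rows = conn.execute(subtree_query, (1,)).fetchall()
for row in subtree_rows:
    print(row)
conn.close()
```

Levels start at 1 again and the paths no longer contain item 1, which is what "relative to item 1" means here.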

And now about typical mistakes. The most notable mistake specific to recursive queries is defining a faulty stop condition in the recursive select, which results in infinite looping.

For example, if we omit the where n>1 condition in the factorial sample above, the recursive select will never return an empty set (because there is no condition to filter out the single row) and the looping will continue indefinitely.

That is the most probable reason why some of your queries hang (the other, non-specific but still possible, reason is a very inefficient select, which executes in finite but very long time).
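
To make the failure mode concrete, here is the same loop simulated in Python with a safety cap. The cap (max_iterations) and the function names are inventions of this sketch; a real database has no such guard built into the query and would simply keep running.

```python
def run_with_guard(initial_rows, recursive_select, max_iterations=1000):
    """UNION ALL loop with a hypothetical cap on the number of iterations."""
    working = list(initial_rows)
    outer = list(working)
    iterations = 0
    while working:
        if iterations >= max_iterations:
            raise RuntimeError("working set never became empty")
        working = recursive_select(working)
        outer.extend(working)
        iterations += 1
    return outer


def with_stop(rows):
    # Recursive select of the factorial sample, including "where n>1".
    return [(f * n, n - 1) for (f, n) in rows if n > 1]


def without_stop(rows):
    # Same select with the stop condition omitted: always returns one row.
    return [(f * n, n - 1) for (f, n) in rows]


good = run_with_guard([(1, 3)], with_stop)
try:
    run_with_guard([(1, 3)], without_stop, max_iterations=50)
    looped_forever = False
except RuntimeError:
    looped_forever = True

print(good)            # [(1, 3), (3, 2), (6, 1)]
print(looped_forever)  # True
```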

There are not many RECURSIVE-specific querying guidelines to mention, as far as I know. But I would like to suggest a (rather obvious) step-by-step procedure for building recursive queries.

Separately build and debug your initial select.

Wrap it in a scaffolding WITH RECURSIVE construct and begin building and debugging your recursive select.

The recommended scaffolding construct looks like this:

WITH RECURSIVE rec( <Your column names> ) AS (
    <Your ready and working initial SELECT>
UNION ALL
    <Recursive SELECT that you are debugging now>
)
SELECT * from rec limit 1000

This simplest outer select will output the whole outer recordset, which, as we know, contains all output rows from the initial select and from every execution of the recursive select in the loop, in their original output order, just like in the samples above! The limit 1000 part will prevent hanging, replacing it with oversized output in which you will be able to see the missed stop point.
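
The same debugging idea can be mimicked in Python with a lazy generator, where itertools.islice plays the role of the limit. This is a sketch under the assumption that rows are only produced as the outer select asks for them; recursive_rows and broken_step are made-up names.

```python
from itertools import islice


def recursive_rows(initial_rows, recursive_select):
    """Yield outer-recordset rows lazily, in their original output order."""
    working = list(initial_rows)
    while working:
        yield from working
        working = recursive_select(working)


def broken_step(rows):
    # Factorial recursive select with the stop condition forgotten.
    return [(f * n, n - 1) for (f, n) in rows]


# islice stands in for "limit 10" in the outer select.
preview = list(islice(recursive_rows([(1, 3)], broken_step), 10))
for row in preview:
    print(row)
```

In the printed rows, (6, 1) is followed by (6, 0), (0, -1) and so on, which shows exactly where the recursion should have stopped.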

After debugging the initial and recursive selects, build and debug your outer select.

And now the last thing to mention: the difference between using UNION and UNION ALL in the WITH RECURSIVE clause. UNION introduces a row uniqueness constraint, which results in two extra lines in our execution pseudocode:

working-recordset = result of Initial-SELECT

discard duplicate rows from working-recordset /*union-specific*/

append working-recordset to empty outer-recordset

while( working-recordset is not empty ) begin

    new working-recordset = result of Recursive-SELECT 
        taking previous working-recordset as pseudo-entity-name

    discard duplicate rows and rows that have duplicates in outer-recordset 
        from working-recordset /*union-specific*/

    append working-recordset to outer-recordset

end

overall-result = result of Outer-SELECT 
    taking outer-recordset as pseudo-entity-name
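
Here is the UNION variant of the loop as a Python sketch, again only an illustration of the pseudocode and not of any engine's internals. The cyclic graph and the names run_union and neighbours are hypothetical; the example is chosen because the UNION ALL loop would never terminate on it.

```python
def run_union(initial_rows, recursive_select):
    """Mimic the UNION loop: duplicate rows never re-enter the working set."""
    working = list(dict.fromkeys(initial_rows))   # discard duplicate rows
    outer = list(working)
    seen = set(outer)
    while working:
        candidates = recursive_select(working)
        working = []
        for row in candidates:
            # Drop duplicates and rows already present in the outer recordset.
            if row not in seen:
                seen.add(row)
                working.append(row)
        outer.extend(working)
    return outer


# Hypothetical graph with a cycle: 1 -> 2 -> 3 -> 1.
edges = {1: [2], 2: [3], 3: [1]}


def neighbours(rows):
    return [(dst,) for (src,) in rows for dst in edges[src]]


reachable = run_union([(1,)], neighbours)
print(reachable)  # [(1,), (2,), (3,)]
```

The third iteration produces only the row (1,), which is already in the outer recordset, so the working set becomes empty and the loop ends.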
