将一个平面表解析为树的最有效/最优雅的方法是什么?

Assume you have a flat table that stores an ordered tree hierarchy:

假设您有一个存储有序树层次结构的平面表:

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

Here's a diagram, where we have [id] Name. Root node 0 is fictional.

这是一个图，我们有[id]名。根节点0是虚构的。

                       [0] ROOT
                          /    \ 
              [1] Node 1          [3] Node 2
              /       \                   \
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly indented tree?

您将使用什么最小化的方法将其输出到HTML(或者文本，就这个问题而言)作为一个正确的有序的、正确的缩进树?

Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly.

假设您只有基本的数据结构(数组和hashmap)，没有带父/子引用的奇特对象，没有ORM，没有框架，只有双手。该表表示为结果集，可以随机访问。

Pseudo code or plain English is okay, this is purely a conceptional question.

伪代码或简单的英语是可以的，这纯粹是一个概念上的问题。

Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS?

额外的问题:在RDBMS中是否有一种从根本上更好的存储树结构的方法?

EDITS AND ADDITIONS

编辑和添加

To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted.

要回答一个评论者的问题:根本不需要根节点，因为它永远不会显示。ParentId = 0是表示“这些是*”的约定。Order列定义了具有相同父节点的节点将如何排序。

The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example was meant to be already there. Some answers go the extra mile and construct it first, but thats okay.

我提到的“结果集”可以被描绘成hashmaps的数组(以保留术语)。因为我的例子已经在那里了。有些答案需要付出更多的努力，并首先构建它，但这没关系。

The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though.

树可以是任意的深。每个节点可以有N个子节点。不过，我并没有确切地想到“数以百万计的条目”。

Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable.

不要将我所选择的节点命名(“node 1.1.1”)误认为是要依赖的东西。节点也可以被称为“Frank”或“Bob”，不包含任何命名结构，这只是为了使其可读。

I have posted my own solution so you guys can pull it to pieces.

我已经发布了我自己的解决方案，你们可以把它拆开。

14 个解决方案

#1

401

Now that MySQL 8.0 is nearing release, all popular SQL databases will support recursive queries in standard syntax.

现在MySQL 8.0即将发布，所有流行的SQL数据库都将支持标准语法中的递归查询。

WITH RECURSIVE MyTree AS (
    SELECT * FROM MyTable WHERE ParentId IS NULL
    UNION ALL
    SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

Below is my original answer from 2008:

以下是我2008年的原始答案:

There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

在关系数据库中存储树结构数据有几种方法。您在示例中展示的使用了两种方法:

Adjacency List (the "parent" column) and
邻接表(“父”列)和。
Path Enumeration (the dotted-numbers in your name column).
路径枚举(名称列中的点号)。

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

另一个解决方案称为嵌套集，它也可以存储在同一个表中。阅读Joe Celko的《SQL中的树和层次结构来获取关于这些设计的更多信息》。

I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

我通常喜欢一种叫做闭表(即“邻接关系”)的设计来存储树结构数据。它需要另一个表，但是查询树是很容易的。

I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

在我的SQL和PHP层次化数据的表示模型中，以及在我的《SQL反模式:避免数据库编程的陷阱》一书中，我介绍了闭包表。

CREATE TABLE ClosureTable (
  ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
  descendant_id INT NOT NULL REFERENCES FlatTable(id),
  PRIMARY KEY (ancestor_id, descendant_id)
);

Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:

在闭包表中存储所有路径，其中一个节点到另一个节点有一个直接的祖先。为每个节点包含引用自己的行。例如，使用您在问题中显示的数据集:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
  (1,1), (1,2), (1,4), (1,6),
  (2,2), (2,4),
  (3,3), (3,5),
  (4,4),
  (5,5),
  (6,6);

Now you can get a tree starting at node 1 like this:

现在你可以从结点1开始得到树:

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

The output (in MySQL client) looks like the following:

输出(在MySQL客户端)如下:

+----+
| id |
+----+
|  1 | 
|  2 | 
|  4 | 
|  6 | 
+----+

In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.

换句话说，节点3和5被排除在外，因为它们是一个单独的层次结构的一部分，而不是从节点1降下来的。

Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).

回复:评论e-satis关于直系子女(或直系父母)。您可以向ClosureTable添加“path_length”列，以便更方便地查询直接的子或父(或任何其他距离)。

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
  (1,1,0), (1,2,1), (1,4,2), (1,6,1),
  (2,2,0), (2,4,1),
  (3,3,0), (3,5,1),
  (4,4,0),
  (5,5,0),
  (6,6,0);

Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.

然后，您可以在搜索中添加一个术语，用于查询给定节点的直接子节点。这些是路径长度为1的后代。

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
  AND path_length = 1;

+----+
| id |
+----+
|  2 | 
|  6 | 
+----+

Re comment from @ashraf: "How about sorting the whole tree [by name]?"

回复@ashraf的评论:“对整棵树(按名字)排序怎么样?”

Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.

下面是一个示例查询，它返回所有节点1的后代节点，将它们连接到包含其他节点属性(如名称)的FlatTable，并按名称进行排序。

SELECT f.name
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;

Re comment from @Nate:

从@Nate再保险的评论:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id) 
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
WHERE a.ancestor_id = 1 
GROUP BY a.descendant_id 
ORDER BY f.name

+------------+-------------+
| name       | breadcrumbs |
+------------+-------------+
| Node 1     | 1           |
| Node 1.1   | 1,2         |
| Node 1.1.1 | 1,2,4       |
| Node 1.2   | 1,6         |
+------------+-------------+

A user suggested an edit today. SO moderators approved the edit, but I am reversing it.

一位用户建议今天进行编辑。版主批准了这个编辑，但是我把它推翻了。

The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".

编辑建议上面最后一个查询中的ORDER BY应该是b的ORDER。path_length, f.name，可能是为了确保排序匹配层次结构。但这并不奏效，因为它会将“Node 1.1.1”命令在“Node 1.2”之后。

If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.

如果希望排序以一种合理的方式匹配层次结构，这是可能的，但不是简单地按路径长度排序。例如，请参见我对MySQL闭包表分层数据库的回答—如何以正确的顺序提取信息。

#2

If you use nested sets (sometimes referred to as Modified Pre-order Tree Traversal) you can extract the entire tree structure or any subtree within it in tree order with a single query, at the cost of inserts being more expensive, as you need to manage columns which describe an in-order path through thee tree structure.

如果你使用嵌套集(有时称为修改预订树遍历)你可以提取整个树结构或任何子树中与一个查询树的顺序,在插入的成本变得更加昂贵,你需要管理列描述一个顺序的路径通过你树结构。

For django-mptt, I used a structure like this:

对于django-mptt，我使用了如下结构:

id  parent_id  tree_id  level  lft  rght
--  ---------  -------  -----  ---  ----
 1       null        1      0    1    14
 2          1        1      1    2     7
 3          2        1      2    3     4
 4          2        1      2    5     6
 5          1        1      1    8    13
 6          5        1      2    9    10
 7          5        1      2    11   12

Which describes a tree which looks like this (with id representing each item):

它描述了这样的树(id表示每个项目):

 1
 +-- 2
 |   +-- 3
 |   +-- 4
 |
 +-- 5
     +-- 6
     +-- 7

Or, as a nested set diagram which makes it more obvious how the lft and rght values work:

或者，作为一个嵌套的集合图，使lft和rght的工作原理更加明显:

 __________________________________________________________________________
|  Root 1                                                                  |
|   ________________________________    ________________________________   |
|  |  Child 1.1                     |  |  Child 1.2                     |  |
|  |   ___________    ___________   |  |   ___________    ___________   |  |
|  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  |  |  C 1.2.2  |  |  |
1  2  3___________4  5___________6  7  8  9___________10 11__________12 13 14
|  |________________________________|  |________________________________|  |
|__________________________________________________________________________|

As you can see, to get the entire subtree for a given node, in tree order, you simply have to select all rows which have lft and rght values between its lft and rght values. It's also simple to retrieve the tree of ancestors for a given node.

如您所见，要按照树的顺序获得给定节点的整个子树，您只需在其lft和rght值之间选择所有具有lft和rght值的行。为给定的节点检索祖先树也很简单。

The level column is a bit of denormalisation for convenience more than anything and the tree_id column allows you to restart the lft and rght numbering for each top-level node, which reduces the number of columns affected by inserts, moves and deletions, as the lft and rght columns have to be adjusted accordingly when these operations take place in order to create or close gaps. I made some development notes at the time when I was trying to wrap my head around the queries required for each operation.

列的非标准化水平为了方便更重要的是,tree_id列允许您重新启动融通和现在每个*节点编号,这减少了受插入的列数,移动和删除,融通和现在列必须相应地调整这些操作发生时为了创建或关闭缺口。当我试图把头脑集中在每次操作所需的查询时，我做了一些开发记录。

In terms of actually working with this data to display a tree, I created a tree_item_iterator utility function which, for each node, should give you sufficient information to generate whatever kind of display you want.

就实际使用这些数据来显示树而言，我创建了一个tree_item_iterator实用函数，对于每个节点，它应该为您提供足够的信息来生成您想要的任何类型的显示。

More info about MPTT:

更多信息关于MPTT:

Trees in SQL
树木在SQL
Storing Hierarchical Data in a Database
在数据库中存储分层数据
Managing Hierarchical Data in MySQL
管理MySQL中的分层数据

#3

As of Oracle 9i, you can use CONNECT BY.

在Oracle 9i中，您可以使用CONNECT BY。

SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name"
FROM (SELECT * FROM TMP_NODE ORDER BY "Order")
CONNECT BY PRIOR "Id" = "ParentId"
START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0)

As of SQL Server 2005, you can use a recursive common table expression (CTE).

从SQL Server 2005开始，您可以使用递归公共表表达式(CTE)。

WITH [NodeList] (
  [Id]
  , [ParentId]
  , [Level]
  , [Order]
) AS (
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , 0 AS [Level]
    , CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
  WHERE [Node].[ParentId] = 0
  UNION ALL
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , [NodeList].[Level] + 1 AS [Level]
    , [NodeList].[Order] + '|'
      + CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
    INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[ParentId]
) SELECT REPLICATE(' ', [NodeList].[Level] * 4) + [Node].[Name] AS [Name]
FROM [Node]
  INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[Id]
ORDER BY [NodeList].[Order]

Both will output the following results.

两者都将输出以下结果。

Name
'Node 1'
'    Node 1.1'
'        Node 1.1.1'
'    Node 1.2'
'Node 2'
'    Node 2.1'

#4

It's a quite old question, but as it's got many views I think it's worth to present an alternative, and in my opinion very elegant, solution.

这是一个非常古老的问题，但由于有很多观点，我认为有必要提出一个替代方案，在我看来，非常优雅的解决方案。

In order to read a tree structure you can use recursive Common Table Expressions (CTEs). It gives a possibility to fetch whole tree structure at once, have the information about the level of the node, its parent node and order within children of the parent node.

为了读取树结构，可以使用递归公共表表达式(CTEs)。它提供了一次获取整个树结构的可能性，具有节点的级别、其父节点和父节点子节点的顺序的信息。

Let me show you how this would work in PostgreSQL 9.1.

让我在PostgreSQL 9.1中向您展示这是如何工作的。

Create a structure

创建一个结构

CREATE TABLE tree (
    id int  NOT NULL,
    name varchar(32)  NOT NULL,
    parent_id int  NULL,
    node_order int  NOT NULL,
    CONSTRAINT tree_pk PRIMARY KEY (id),
    CONSTRAINT tree_tree_fk FOREIGN KEY (parent_id) 
      REFERENCES tree (id) NOT DEFERRABLE
);


insert into tree values
  (0, 'ROOT', NULL, 0),
  (1, 'Node 1', 0, 10),
  (2, 'Node 1.1', 1, 10),
  (3, 'Node 2', 0, 20),
  (4, 'Node 1.1.1', 2, 10),
  (5, 'Node 2.1', 3, 10),
  (6, 'Node 1.2', 1, 20);

Write a query

写一个查询
```
WITH RECURSIVE 
tree_search (id, name, level, parent_id, node_order) AS (
  SELECT 
    id, 
    name,
    0,
    parent_id, 
    1 
  FROM tree
  WHERE parent_id is NULL

  UNION ALL 
  SELECT 
    t.id, 
    t.name,
    ts.level + 1, 
    ts.id, 
    t.node_order 
  FROM tree t, tree_search ts 
  WHERE t.parent_id = ts.id 
) 
SELECT * FROM tree_search 
WHERE level > 0 
ORDER BY level, parent_id, node_order;
```
Here are the results:

这里是结果:
```
 id |    name    | level | parent_id | node_order 
----+------------+-------+-----------+------------
  1 | Node 1     |     1 |         0 |         10
  3 | Node 2     |     1 |         0 |         20
  2 | Node 1.1   |     2 |         1 |         10
  6 | Node 1.2   |     2 |         1 |         20
  5 | Node 2.1   |     2 |         3 |         10
  4 | Node 1.1.1 |     3 |         2 |         10
(6 rows)
```
The tree nodes are ordered by a level of depth. In the final output we would present them in the subsequent lines.

树节点按深度级别排序。在最终输出中，我们将在后面的行中显示它们。

For each level, they are ordered by parent_id and node_order within the parent. This tells us how to present them in the output - link node to the parent in this order.

对于每个级别，它们都由父级中的parent_id和node_order排序。这告诉我们如何在输出链接节点中按此顺序将它们呈现给父节点。

Having such a structure it wouldn't be difficult to make a really nice presentation in HTML.

有了这样的结构，用HTML制作一个漂亮的表示就不难了。

Recursive CTEs are available in PostgreSQL, IBM DB2, MS SQL Server and Oracle.

在PostgreSQL、IBM DB2、MS SQL Server和Oracle中都可以使用递归cte。

If you'd like to read more on recursive SQL queries, you can either check the documentation of your favourite DBMS or read my two articles covering this topic:

如果您想阅读更多关于递归SQL查询的信息，您可以查看您最喜欢的DBMS的文档，也可以阅读我关于这个主题的两篇文章:
- Do It In SQL: Recursive Tree Traversal
- 用SQL:递归树遍历算
- Get to know the power of SQL recursive queries
- 了解SQL递归查询的威力。

#5

Bill's answer is pretty gosh-darned good, this answer adds some things to it which makes me wish SO supported threaded answers.

比尔的回答非常好，这个回答增加了一些东西，这让我希望得到如此支持的线程式回答。

Anyway I wanted to support the tree structure and the Order property. I included a single property in each Node called leftSibling that does the same thing Order is meant to do in the original question (maintain left-to-right order).

总之，我想支持树结构和Order属性。我在每个节点中都包含了一个名为leftSibling的属性，该属性执行与原始问题相同的顺序(保持从左到右的顺序)。

mysql> desc nodes ;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| name        | varchar(255) | YES  |     | NULL    |                |
| leftSibling | int(11)      | NO   |     | 0       |                |
+-------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> desc adjacencies;
+------------+---------+------+-----+---------+----------------+
| Field      | Type    | Null | Key | Default | Extra          |
+------------+---------+------+-----+---------+----------------+
| relationId | int(11) | NO   | PRI | NULL    | auto_increment |
| parent     | int(11) | NO   |     | NULL    |                |
| child      | int(11) | NO   |     | NULL    |                |
| pathLen    | int(11) | NO   |     | NULL    |                |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

More detail and SQL code on my blog.

更多细节和SQL代码在我的博客上。

Thanks Bill your answer was helpful in getting started!

谢谢比尔，你的回答对开始很有帮助!

#6

Well given the choice, I'd be using objects. I'd create an object for each record where each object has a children collection and store them all in an assoc array (/hashtable) where the Id is the key. And blitz through the collection once, adding the children to the relevant children fields. Simple.

如果有选择的话，我会用对象。我将为每个记录创建一个对象，其中每个对象都有一个子集合，并将它们存储在一个assoc数组(/hashtable)中，其中Id是键。并快速浏览一次收藏，将孩子添加到相关的儿童领域。简单。

But because you're being no fun by restricting use of some good OOP, I'd probably iterate based on:

但是，因为限制使用一些好的OOP并不是一件有趣的事情，我可能会基于以下几点进行迭代:

function PrintLine(int pID, int level)
    foreach record where ParentID == pID
        print level*tabs + record-data
        PrintLine(record.ID, level + 1)

PrintLine(0, 0)

Edit: this is similar to a couple of other entries, but I think it's slightly cleaner. One thing I'll add: this is extremely SQL-intensive. It's nasty. If you have the choice, go the OOP route.

编辑:这与其他一些条目相似，但我认为它更简洁一些。我要补充一点:这是非常密集的sql。这是令人讨厌的。如果你有选择，走OOP路线。

#7

This was written quickly, and is neither pretty nor efficient (plus it autoboxes alot, converting between int and Integer is annoying!), but it works.

这写得很快，既不漂亮也不高效(加上它的autobox alot，在整数和整数之间转换很烦人!)，但它确实有效。

It probably breaks the rules since I'm creating my own objects but hey I'm doing this as a diversion from real work :)

它可能违反了规则，因为我正在创建我自己的对象，但嘿，我这么做是为了转移我的注意力，不去做真正的工作:)

This also assumes that the resultSet/table is completely read into some sort of structure before you start building Nodes, which wouldn't be the best solution if you have hundreds of thousands of rows.

这还假定，在开始构建节点之前，resultSet/表已经被完全读入某种结构，如果有数十万行，这将不是最佳的解决方案。

public class Node {

    private Node parent = null;

    private List<Node> children;

    private String name;

    private int id = -1;

    public Node(Node parent, int id, String name) {
        this.parent = parent;
        this.children = new ArrayList<Node>();
        this.name = name;
        this.id = id;
    }

    public int getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }

    public void addChild(Node child) {
        children.add(child);
    }

    public List<Node> getChildren() {
        return children;
    }

    public boolean isRoot() {
        return (this.parent == null);
    }

    @Override
    public String toString() {
        return "id=" + id + ", name=" + name + ", parent=" + parent;
    }
}

public class NodeBuilder {

    public static Node build(List<Map<String, String>> input) {

        // maps id of a node to it's Node object
        Map<Integer, Node> nodeMap = new HashMap<Integer, Node>();

        // maps id of a node to the id of it's parent
        Map<Integer, Integer> childParentMap = new HashMap<Integer, Integer>();

        // create special 'root' Node with id=0
        Node root = new Node(null, 0, "root");
        nodeMap.put(root.getId(), root);

        // iterate thru the input
        for (Map<String, String> map : input) {

            // expect each Map to have keys for "id", "name", "parent" ... a
            // real implementation would read from a SQL object or resultset
            int id = Integer.parseInt(map.get("id"));
            String name = map.get("name");
            int parent = Integer.parseInt(map.get("parent"));

            Node node = new Node(null, id, name);
            nodeMap.put(id, node);

            childParentMap.put(id, parent);
        }

        // now that each Node is created, setup the child-parent relationships
        for (Map.Entry<Integer, Integer> entry : childParentMap.entrySet()) {
            int nodeId = entry.getKey();
            int parentId = entry.getValue();

            Node child = nodeMap.get(nodeId);
            Node parent = nodeMap.get(parentId);
            parent.addChild(child);
        }

        return root;
    }
}

public class NodePrinter {

    static void printRootNode(Node root) {
        printNodes(root, 0);
    }

    static void printNodes(Node node, int indentLevel) {

        printNode(node, indentLevel);
        // recurse
        for (Node child : node.getChildren()) {
            printNodes(child, indentLevel + 1);
        }
    }

    static void printNode(Node node, int indentLevel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            sb.append("\t");
        }
        sb.append(node);

        System.out.println(sb.toString());
    }

    public static void main(String[] args) {

        // setup dummy data
        List<Map<String, String>> resultSet = new ArrayList<Map<String, String>>();
        resultSet.add(newMap("1", "Node 1", "0"));
        resultSet.add(newMap("2", "Node 1.1", "1"));
        resultSet.add(newMap("3", "Node 2", "0"));
        resultSet.add(newMap("4", "Node 1.1.1", "2"));
        resultSet.add(newMap("5", "Node 2.1", "3"));
        resultSet.add(newMap("6", "Node 1.2", "1"));

        Node root = NodeBuilder.build(resultSet);
        printRootNode(root);

    }

    //convenience method for creating our dummy data
    private static Map<String, String> newMap(String id, String name, String parentId) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("id", id);
        row.put("name", name);
        row.put("parent", parentId);
        return row;
    }
}

#8

Assuming that you know that the root elements are zero, here's the pseudocode to output to text:

假设您知道根元素为零，下面是输出到文本的伪代码:

function PrintLevel (int curr, int level)
    //print the indents
    for (i=1; i<=level; i++)
        print a tab
    print curr \n;
    for each child in the table with a parent of curr
        PrintLevel (child, level+1)


for each elementID where the parentid is zero
    PrintLevel(elementID, 0)

#9

You can emulate any other data structure with a hashmap, so that's not a terrible limitation. Scanning from the top to the bottom, you create a hashmap for each row of the database, with an entry for each column. Add each of these hashmaps to a "master" hashmap, keyed on the id. If any node has a "parent" that you haven't seen yet, create an placeholder entry for it in the master hashmap, and fill it in when you see the actual node.

您可以使用hashmap来模拟任何其他数据结构，因此这并不是一个可怕的限制。从顶部到底部扫描，您将为数据库的每一行创建一个hashmap，并为每个列创建一个条目。将这些hashmap添加到一个按id键的“master”hashmap中，如果任何节点有一个尚未见过的“parent”，则在master hashmap中为其创建一个占位符条目，并在看到实际节点时将其填充。

To print it out, do a simple depth-first pass through the data, keeping track of indent level along the way. You can make this easier by keeping a "children" entry for each row, and populating it as you scan the data.

要打印出来，请执行简单的深度优先传递数据，并在此过程中跟踪缩进级别。您可以为每一行保留一个“子”条目，并在扫描数据时填充它，从而使这变得更容易。

As for whether there's a "better" way to store a tree in a database, that depends on how you're going to use the data. I've seen systems that had a known maximum depth that used a different table for each level in the hierarchy. That makes a lot of sense if the levels in the tree aren't quite equivalent after all (top level categories being different than the leaves).

至于是否有“更好”的方式将树存储在数据库中，这取决于您将如何使用数据。我曾见过一些系统，它们具有已知的最大深度，在层次结构中的每个级别都使用不同的表。如果树中的层次并不完全相等(顶层的类别与叶子不同)，这就很有意义。

#10

There are really good solutions which exploit the internal btree representation of sql indices. This is based on some great research done back around 1998.

有一些很好的解决方案可以利用sql索引的内部btree表示。这是基于1998年的一些伟大研究。

Here is an example table (in mysql).

下面是一个示例表(在mysql中)。

CREATE TABLE `node` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `tw` int(10) unsigned NOT NULL,
  `pa` int(10) unsigned DEFAULT NULL,
  `sz` int(10) unsigned DEFAULT NULL,
  `nc` int(11) GENERATED ALWAYS AS (tw+sz) STORED,
  PRIMARY KEY (`id`),
  KEY `node_tw_index` (`tw`),
  KEY `node_pa_index` (`pa`),
  KEY `node_nc_index` (`nc`),
  CONSTRAINT `node_pa_fk` FOREIGN KEY (`pa`) REFERENCES `node` (`tw`) ON DELETE CASCADE
)

The only fields necessary for the tree representation are:

树表示所必需的唯一字段是:

tw: The Left to Right DFS Pre-order index, where root = 1.
tw:从左到右的DFS预处理索引，其中root = 1。
pa: The reference (using tw) to the parent node, root has null.
pa:对父节点的引用(使用tw)， root具有null。
sz: The size of the node's branch including itself.
sz:节点的分支的大小，包括它自己。
nc: is used as syntactic sugar. it is tw+nc and represents the tw of the node's "next child".
nc:用作句法糖。它是tw+nc，表示节点“下一个子节点”的tw。

Here is an example 24 node population, ordered by tw:

下面是一个由tw排序的24节点总体示例:

+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|   2 | A       |  2 |    1 |   14 |   16 |
|   3 | AA      |  3 |    2 |    1 |    4 |
|   4 | AB      |  4 |    2 |    7 |   11 |
|   5 | ABA     |  5 |    4 |    1 |    6 |
|   6 | ABB     |  6 |    4 |    3 |    9 |
|   7 | ABBA    |  7 |    6 |    1 |    8 |
|   8 | ABBB    |  8 |    6 |    1 |    9 |
|   9 | ABC     |  9 |    4 |    2 |   11 |
|  10 | ABCD    | 10 |    9 |    1 |   11 |
|  11 | AC      | 11 |    2 |    4 |   15 |
|  12 | ACA     | 12 |   11 |    2 |   14 |
|  13 | ACAA    | 13 |   12 |    1 |   14 |
|  14 | ACB     | 14 |   11 |    1 |   15 |
|  15 | AD      | 15 |    2 |    1 |   16 |
|  16 | B       | 16 |    1 |    1 |   17 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
|  18 | D       | 23 |    1 |    1 |   24 |
|  19 | E       | 24 |    1 |    1 |   25 |
+-----+---------+----+------+------+------+

Every tree result can be done non-recursively. For instance, to get a list of ancestors of node at tw='22'

每个树结果都可以非递归地完成。例如，要获取在tw='22'中节点的祖先列表

Ancestors

的祖先

select anc.* from node me,node anc 
where me.tw=22 and anc.nc >= me.tw and anc.tw <= me.tw 
order by anc.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

Siblings and children are trivial - just use pa field ordering by tw.

兄弟姐妹和孩子是琐碎的-只是使用tw的pa字段排序。

Descendants

的后代

For example the set (branch) of nodes that are rooted at tw = 17.

例如，植根于tw = 17的节点集(分支)。

select des.* from node me,node des 
where me.tw=17 and des.tw < me.nc and des.tw >= me.tw 
order by des.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

Additional Notes

额外的笔记

This methodology is extremely useful for when there are a far greater number of reads than there are inserts or updates.

当读取的数量远远超过插入或更新的数量时，这种方法非常有用。

Because the insertion, movement, or updating of a node in the tree requires the tree to be adjusted, it is necessary to lock the table before commencing with the action.

因为插入、移动或更新树中的节点需要调整树，所以必须在开始操作之前锁定表。

The insertion/deletion cost is high because the tw index and sz (branch size) values will need to be updated on all the nodes after the insertion point, and for all ancestors respectively.

插入/删除成本很高，因为tw索引和sz(分支大小)值需要在插入点之后的所有节点上更新，对于所有祖先节点也需要更新。

Branch moving involves moving the tw value of the branch out of range, so it is also necessary to disable foreign key constraints when moving a branch. There are, essentially four queries required to move a branch:

分支移动涉及将分支的tw值移动到范围之外，因此在移动分支时也需要禁用外键约束。移动一个分支需要四个查询:

Move the branch out of range.
把树枝移到范围之外。
Close the gap that it left. (the remaining tree is now normalised).
关闭它留下的空隙。(剩下的树现在是正常的)。
Open the gap where it will go to.
打开缺口。
Move the branch into it's new position.
将分支移动到它的新位置。

Adjust Tree Queries

调整树的查询

The opening/closing of gaps in the tree is an important sub-function used by create/update/delete methods, so I include it here.

在树中打开/关闭间隙是create/update/delete方法使用的一个重要子函数，因此我在这里包含它。

We need two parameters - a flag representing whether or not we are downsizing or upsizing, and the node's tw index. So, for example tw=18 (which has a branch size of 5). Let's assume that we are downsizing (removing tw) - this means that we are using '-' instead of '+' in the updates of the following example.

我们需要两个参数——表示我们是否在缩减规模或升级规模的标志，以及节点的tw索引。例如tw=18(它的分支大小为5)，让我们假设我们正在缩减规模(删除tw)——这意味着我们在以下示例的更新中使用'-'而不是'+'。

We first use a (slightly altered) ancestor function to update the sz value.

我们首先使用(稍微修改过的)祖先函数来更新sz值。

update node me, node anc set anc.sz = anc.sz - me.sz from 
node me, node anc where me.tw=18 
and ((anc.nc >= me.tw and anc.tw < me.pa) or (anc.tw=me.pa));

Then we need to adjust the tw for those whose tw is higher than the branch to be removed.

然后我们需要调整tw，以适应那些tw高于要移除的分支的情况。

update node me, node anc set anc.tw = anc.tw - me.sz from 
node me, node anc where me.tw=18 and anc.tw >= me.tw;

Then we need to adjust the parent for those whose pa's tw is higher than the branch to be removed.

然后我们需要为那些pa的tw大于要删除的分支调整父节点。

update node me, node anc set anc.pa = anc.pa - me.sz from 
node me, node anc where me.tw=18 and anc.pa >= me.tw;

#11

If nested hash maps or arrays can be created, then I can simply go down the table from the beginning and add each item to the nested array. I must trace each line to the root node in order to know which level in the nested array to insert into. I can employ memoization so that I do not need to look up the same parent over and over again.

如果可以创建嵌套的散列映射或数组，那么我可以简单地从一开始就从表中删除，并将每个项添加到嵌套的数组中。我必须跟踪每一行到根节点，以便知道要插入到嵌套数组中的哪个级别。我可以使用记忆，这样我就不需要一遍又一遍地查找同一个父类。

Edit: I would read the entire table into an array first, so it will not query the DB repeatedly. Of course this won't be practical if your table is very large.

编辑:我先将整个表读入一个数组，这样它就不会重复地查询DB。当然，如果你的桌子很大的话，这是不现实的。

After the structure is built, I must do a depth first traverse through it and print out the HTML.

在构建结构之后，我必须首先对其进行深度遍历并打印出HTML。

There's no better fundamental way to store this information using one table (I could be wrong though ;), and would love to see a better solution ). However, if you create a scheme to employ dynamically created db tables, then you opened up a whole new world at the sacrifice of simplicity, and the risk of SQL hell ;).

没有更好的基本方法来使用一个表来存储这些信息(我可能是错的;)，并且希望看到更好的解决方案)。但是，如果您创建了一个使用动态创建的db表的方案，那么您将以牺牲简单性和SQL hell的风险为代价打开一个全新的世界;)

#12

If elements are in tree order, as shown in your example, you can use something like the following Python example:

如果元素按照树的顺序排列，如您的示例所示，您可以使用以下Python示例:

delimiter = '.'
stack = []
for item in items:
  while stack and not item.startswith(stack[-1]+delimiter):
    print "</div>"
    stack.pop()
  print "<div>"
  print item
  stack.append(item)

What this does is maintain a stack representing the current position in the tree. For each element in the table, it pops stack elements (closing the matching divs) until it finds the parent of the current item. Then it outputs the start of that node and pushes it to the stack.

这样做的目的是维护表示树中当前位置的堆栈。对于表中的每个元素，它都会弹出堆栈元素(关闭匹配的div)，直到找到当前项的父元素。然后输出该节点的开始并将其推入堆栈。

If you want to output the tree using indenting rather than nested elements, you can simply skip the print statements to print the divs, and print a number of spaces equal to some multiple of the size of the stack before each item. For example, in Python:

如果您想使用缩进而不是嵌套元素来输出树，您可以简单地跳过打印语句来打印div，并在每个条目之前打印一些等于堆栈大小的倍数的空格。例如,在Python中:

print "  " * len(stack)

You could also easily use this method to construct a set of nested lists or dictionaries.

您也可以很容易地使用此方法来构造一组嵌套列表或字典。

Edit: I see from your clarification that the names were not intended to be node paths. That suggests an alternate approach:

编辑:我从您的澄清中看到，这些名称并不打算是节点路径。这意味着另一种方法:

idx = {}
idx[0] = []
for node in results:
  child_list = []
  idx[node.Id] = child_list
  idx[node.ParentId].append((node, child_list))

This constructs a tree of arrays of tuples(!). idx[0] represents the root(s) of the tree. Each element in an array is a 2-tuple consisting of the node itself and a list of all its children. Once constructed, you can hold on to idx[0] and discard idx, unless you want to access nodes by their ID.

这构造了一个元组数组(!)的树。idx[0]表示树的根。数组中的每个元素都是一个2元组，由节点本身和所有子元素的列表组成。一旦构造好，您可以保留idx[0]并丢弃idx，除非您希望通过它们的ID访问节点。

#13

To Extend Bill's SQL solution you can basically do the same using a flat array. Further more if your strings all have the same lenght and your maximum number of children are known (say in a binary tree) you can do it using a single string (character array). If you have arbitrary number of children this complicates things a bit... I would have to check my old notes to see what can be done.

要扩展Bill的SQL解决方案，您基本上可以使用平面数组进行相同的操作。此外，如果您的字符串都具有相同的lenght，并且您的最大子代数已知(比如在二叉树中)，您可以使用单个字符串(字符数组)来完成。如果你有任意数量的孩子这就有点复杂了…我得查一下我的旧笔记，看看能做些什么。

Then, sacrificing a bit of memory, especially if your tree is sparse and/or unballanced, you can, with a bit of index math, access all the strings randomly by storing your tree, width first in the array like so (for a binary tree):

然后，牺牲一点内存，特别是如果你的树是稀疏的和/或不受限制的，你可以，用一些索引数学，通过存储你的树来随机访问所有的字符串，宽度首先在数组中像这样(对于二叉树):

String[] nodeArray = [L0root, L1child1, L1child2, L2Child1, L2Child2, L2Child3, L2Child4] ...

yo know your string length, you know it

你知道你的线绳长度，你知道

I'm at work now so cannot spend much time on it but with interest I can fetch a bit of code to do this.

我现在正在工作，所以不能在这上面花太多时间，但是有兴趣的话，我可以取一些代码来做这个。

We use to do it to search in binary trees made of DNA codons, a process built the tree, then we flattened it to search text patterns and when found, though index math (revers from above) we get the node back... very fast and efficient, tough our tree rarely had empty nodes, but we could searh gigabytes of data in a jiffy.

我们用它来搜索由DNA密码子组成的二叉树，一个构建树的过程，然后我们将它变平以搜索文本模式，当找到时，通过索引数学(从上面的revers)我们得到了节点……我们的树很少有空节点，但是我们可以在短时间内搜索千兆字节的数据。

#14

Think about using nosql tools like neo4j for hierarchial structures. e.g a networked application like linkedin uses couchbase (another nosql solution)

考虑使用nosql工具，比如neo4j来实现层次结构。e。g像linkedin这样的网络应用使用couchbase(另一个nosql解决方案)

But use nosql only for data-mart level queries and not to store / maintain transactions

但是只对数据集市级别的查询使用nosql，而不存储/维护事务

#1

401