提高修改的预订树遍历算法的可扩展性

时间:2022-09-09 16:39:34

I've been thinking about the modified preorder tree traversal algorithm for storing trees within a flat table (such as SQL).

我一直在考虑修改的预订树遍历算法,用于在平面表(例如SQL)中存储树。

One property I dislike about the standard approach is that to insert a node you have to touch (on average) N/2 of the nodes (everything with left or right higher than the insert point).

我不喜欢标准方法的一个属性是插入一个节点,你必须触摸(平均)N / 2个节点(左边或右边高于插入点的所有东西)。

The implementations I've seen rely on sequentially numbered values. This leaves no room for updates.

我见过的实现依赖于顺序编号的值。这没有留下更新的余地。

This seems bad for concurrency and scaling. Imagine you have a tree rooted at the world containing user groups for every account in a large system, it's extremely large, to the point you must store subsets of the tree on different servers. Touching half of all the nodes to add a node to the bottom of the tree is bad.

这对于并发和扩展似乎很糟糕。想象一下,你有一棵植根于世界的树,它包含大型系统中每个帐户的用户组,它非常大,你必须将树的子集存储在不同的服务器上。触摸所有节点的一半以将节点添加到树的底部是不好的。

Here is the idea I was considering. Basically leave room for inserts by partitioning the keyspace and dividing at each level.

这是我正在考虑的想法。基本上通过划分键空间并在每个级别划分来为插入留出空间。

Here's an example with Nmax = 64 (this would normally be the MAX_INT of your DB)

这是Nmax = 64的示例(这通常是数据库的MAX_INT)

                     0:64
              ________|________
             /                 \
         1:31                   32:63
        /    \                 /     \
    2:14    15-30         33:47       48:62

Here, a node is added to the left half of the tree.

这里,节点被添加到树的左半部分。

                     0:64  
              ________|________
             /                 \
         1:31                  32:63
      /   |   \               /     \
  2:11  11:20  21:30     33:47       48:62

The alogorithm must be extended for the insert and removal process to recursively renumber to the left/right indexes for the subtree. Since querying for immediate children of a node is complicated, I think it makes sense to also store the parent id in the table. The algorithm can then select the sub tree (using left > p.left && right < p.right), then use node.id and node.parent to work through the list, subdividing the indexes.

必须扩展算法以进行插入和删除过程,以递归重新编号为子树的左/右索引。由于查询节点的直接子节点很复杂,我认为将父节点ID存储在表中也是有意义的。然后,算法可以选择子树(使用left> p.left && right

),然后使用node.id和node.parent来处理列表,细分索引。

This is more complex than just incrementing all the indexes to make room for the insert (or decrementing for removal), but it has the potential to affect far fewer nodes (only decendenants of the parent of the inserted/removed node).

这比仅增加所有索引以便为插入腾出空间(或减少删除)更复杂,但它有可能影响更少的节点(仅插入/删除节点的父节点的后代)。

My question(s) are basically:

我的问题基本上是:

  1. Has this idea been formalized or implemented?

    这个想法是否正式化或实施?

  2. Is this the same as nested intervals?

    这与嵌套间隔相同吗?

3 个解决方案

#1


5  

I have heard of people doing this before, for the same reasons, yes.

我之前听说有人这样做,出于同样的原因,是的。

Note that you do lose at a couple of small advantages of the algorithm by doing this

请注意,通过执行此操作,您确实会失去算法的一些小优势

  • normally, you can tell the number of descendants of a node by ((right - left + 1) div 2). This can occasionally be useful, if e.g. you'd displaying a count in a treeview which should include the number of children to be found further down in the tree
  • 通常,您可以通过((右 - 左+ 1)div 2)来判断节点的后代数。如果例如,这有时可能是有用的。您将在树视图中显示一个计数,其中应包括要在树中找到的子项数

  • Flowing from the above, it's easy to select out all leaf nodes -- WHERE (right = left + 1).
  • 从上面开始,很容易选择所有叶节点 - WHERE(右=左+ 1)。

These are fairly minor advantages and may not be useful to you anyway, though for some usage patterns they're obviously handy.

这些都是相当小的优点,无论如何可能对你没用,但对于某些使用模式,它们显然很方便。

That said, it does sound like materialized paths may be more useful to you, as suggested above.

也就是说,听起来像物化路径可能对您更有用,如上所述。

#2


2  

I think you're better off looking at a different way of storing trees. If your tree is broad but not terribly deep (which seems likely for the case you suggested), you can store the complete list of ancestors up to the root against each node. That way, modifying a node doesn't require touching any nodes other than the node being modified.

我认为你最好看一下不同的树木储存方式。如果您的树很宽但不是非常深(这似乎可能是您建议的情况),您可以将完整的祖先列表存储到每个节点的根目录。这样,修改节点不需要触摸除被修改节点之外的任何节点。

#3


1  

You can split your table into two: the first is (node ID, node value), the second (node ID, child ID), which stores all the edges of the tree. Insertion and deletion then become O(tree depth) (you have to navigate to the element and fix what is below it).

您可以将表拆分为两个:第一个是(节点ID,节点值),第二个(节点ID,子ID),它存储树的所有边。插入和删除然后变为O(树深度)(您必须导航到元素并修复其下面的内容)。

The solution you propose looks like a B-tree. If you can estimate the total number of nodes in your tree, then you can choose the depth of the tree beforehand.

您提出的解决方案看起来像B树。如果可以估计树中的节点总数,则可以预先选择树的深度。

#1


5  

I have heard of people doing this before, for the same reasons, yes.

我之前听说有人这样做,出于同样的原因,是的。

Note that you do lose at a couple of small advantages of the algorithm by doing this

请注意,通过执行此操作,您确实会失去算法的一些小优势

  • normally, you can tell the number of descendants of a node by ((right - left + 1) div 2). This can occasionally be useful, if e.g. you'd displaying a count in a treeview which should include the number of children to be found further down in the tree
  • 通常,您可以通过((右 - 左+ 1)div 2)来判断节点的后代数。如果例如,这有时可能是有用的。您将在树视图中显示一个计数,其中应包括要在树中找到的子项数

  • Flowing from the above, it's easy to select out all leaf nodes -- WHERE (right = left + 1).
  • 从上面开始,很容易选择所有叶节点 - WHERE(右=左+ 1)。

These are fairly minor advantages and may not be useful to you anyway, though for some usage patterns they're obviously handy.

这些都是相当小的优点,无论如何可能对你没用,但对于某些使用模式,它们显然很方便。

That said, it does sound like materialized paths may be more useful to you, as suggested above.

也就是说,听起来像物化路径可能对您更有用,如上所述。

#2


2  

I think you're better off looking at a different way of storing trees. If your tree is broad but not terribly deep (which seems likely for the case you suggested), you can store the complete list of ancestors up to the root against each node. That way, modifying a node doesn't require touching any nodes other than the node being modified.

我认为你最好看一下不同的树木储存方式。如果您的树很宽但不是非常深(这似乎可能是您建议的情况),您可以将完整的祖先列表存储到每个节点的根目录。这样,修改节点不需要触摸除被修改节点之外的任何节点。

#3


1  

You can split your table into two: the first is (node ID, node value), the second (node ID, child ID), which stores all the edges of the tree. Insertion and deletion then become O(tree depth) (you have to navigate to the element and fix what is below it).

您可以将表拆分为两个:第一个是(节点ID,节点值),第二个(节点ID,子ID),它存储树的所有边。插入和删除然后变为O(树深度)(您必须导航到元素并修复其下面的内容)。

The solution you propose looks like a B-tree. If you can estimate the total number of nodes in your tree, then you can choose the depth of the tree beforehand.

您提出的解决方案看起来像B树。如果可以估计树中的节点总数,则可以预先选择树的深度。