是否有可能延迟地遍历O(1)内存使用、尾部调用优化的递归数据结构?

时间:2022-02-17 17:04:25

Let's say that we have a recursive data-structure, like a binary tree. There are many ways to traverse it, and they have different memory-usage profiles. For instance, if we were to simply print the value of each node, using pseudo-code like the following in-order traversal...

假设我们有一个递归的数据结构,比如二叉树。有很多遍历它的方法,它们有不同的内存使用概要。例如,如果我们只是简单地打印每个节点的值,使用伪代码,比如下面的顺序遍历…

visitNode(node) {
    if (node == null) return;
    visitNode(node.leftChild);
    print(node.value);
    visitNode(node.rightChild);
}

...our memory usage would be constant, but due to the recursive calls, we would increase the size of the call stack. On very large trees, this could potentially overflow it.

…我们的内存使用将是常量,但是由于递归调用,我们将增加调用堆栈的大小。在非常大的树上,这可能会溢出。

Let's say that we decided to optimize for call-stack size; assuming that this language is capable of proper tailcalls, we could rewrite this as the following pre-order traversal...

假设我们决定优化调用堆栈大小;假设这种语言能够进行适当的尾调用,我们可以将其重写为以下顺序遍历…

visitNode(node, nodes = []) {
    if (node != null) {
        print(node.value);
        visitNode(nodes.head, nodes.tail + [node.left, node.right]);
    } else if (node == null && nodes.length != 0 ) {
        visitNode(nodes.head, nodes.tail);
    } else return;
}

While we would never blow the stack, we would now see heap usage increase linearly with respect to the size of the tree.

虽然我们不会破坏堆栈,但是我们现在会看到堆使用随着树的大小线性增加。

Let's say we were then to attempt to lazily traverse the tree - here is where my reasoning gets fuzzy. I think that even using a basic lazy evaluation strategy, we would grow memory at the same rate as the tailcall optimized version. Here is a concrete example using Scala's Stream class, which provides lazy evaluation:

假设我们试图惰性地遍历这棵树——这就是我的推理变得模糊的地方。我认为,即使使用基本的惰性评估策略,我们也会以与尾调优化版本相同的速度增长内存。下面是一个使用Scala的流类的具体示例,它提供了延迟计算:

sealed abstract class Node[A] {
  def toStream: Stream[Node[A]]
  def value: A
}

case class Fork[A](value: A, left: Node[A], right: Node[A]) extends Node[A] {
  def toStream: Stream[Node[A]] = this #:: left.toStream.append(right.toStream)
}

case class Leaf[A](value: A) extends Node[A] {
  def toStream: Stream[Node[A]] = this #:: Stream.empty
}

Although only the head of the stream is strictly evaluated, anytime the left.toStream.append(right.toStream) is evaluated, I think this would actually evaluate the head of both the left and right streams. Even if it doesn't (due to append cleverness), I think that recursively building this thunk (to borrow a term from Haskell) would essentially grow memory at the same rate. Rather than saying, "put this node in the list of nodes to traverse", we're basically saying, "here's another value to evaluate that will tell you what to traverse next", but the outcome is the same; linear memory growth.

虽然严格地只计算流的头部,但是无论何时计算左。tostream .append(右。tostream)时,我认为这实际上会同时计算左和右流的头部。即使它没有(由于附加的聪明),我认为递归地构建这个thunk(借用Haskell的一个术语)本质上将以同样的速度增长内存。与其说“将这个节点放在要遍历的节点列表中”,不如说“这是另一个要计算的值,它会告诉你接下来要遍历什么”,但结果是一样的;线性内存增长。

The only strategy I can think of that would avoid this is having mutable state in each node declaring which paths have been traversed. This would allow us to have a referentially transparent function that says, "Given a node, I will tell you which single node you should traverse next", and we could use that to build an O(1) iterator.

我能想到的唯一可以避免这种情况的策略是在每个节点中声明已遍历的路径都具有可变状态。这将允许我们有一个引用透明的函数,它说,“给定一个节点,我将告诉您接下来应该遍历哪个节点”,我们可以使用它来构建一个O(1)迭代器。

Is there another way to accomplish O(1), tailcall optimized traversal of a binary tree, possibly without mutable state?

有没有另一种方法来完成O(1),尾部调用优化遍历一个二叉树,可能没有可变状态?

4 个解决方案

#1


11  

Is there another way to accomplish O(1), tailcall optimized traversal of a binary tree, possibly without mutable state?

有没有另一种方法来完成O(1),尾部调用优化遍历一个二叉树,可能没有可变状态?

As I stated in my comment, you can do this if the tree need not survive the traversal. Here's a Haskell example:

正如我在注释中所指出的,如果树在遍历过程中不能存活,您可以这样做。这是一个Haskell的例子:

data T = Leaf | Node T Int T

inOrder :: T -> [Int]
inOrder Leaf                     =  []
inOrder (Node Leaf x r)          =  x : inOrder r
inOrder (Node (Node l x m) y r)  =  inOrder $ Node l x (Node m y r)

This takes O(1) auxiliary space if we assume the garbage collector will clean up any Node that we just processed, so we effectively replace it by a right-rotated version. However, if the nodes we process cannot immediately be garbage-collected, then the final clause may build up an O(n) number of nodes before it hits a leaf.

如果我们假设垃圾收集器将清理我们刚刚处理的任何节点,那么这将占用O(1)辅助空间,因此我们可以用右旋转的版本替换它。但是,如果我们处理的节点不能立即被垃圾收集,那么最后的子句可以在到达叶子之前构建O(n)个节点。

If you have parent pointers, then it's also doable. Parent pointers require mutable state, though, and prevent sharing of subtrees, so they're not really functional. If you represent an iterator by a pair (cur, prev) that is initially (root, nil), then you can perform iteration as outlined here. You need a language with pointer comparisons to make this work, though.

如果你有父指针,那么它也是可行的。但是,父指针需要可变的状态,并且防止子树的共享,所以它们不是真正的功能。如果您用一对(cur, prev)表示一个迭代器,它最初是(root, nil),那么您可以执行这里列出的迭代。不过,您需要一种具有指针比较的语言来实现这一点。

Without parent pointers and mutable state, you need to maintain some data structure that at least tracks where the root of the tree is and how to get there, since you'll need such a structure at some point during in-order or post-order traversal. Such a structure necessarily takes Ω(d) space where d is the depth of the tree.

在没有父指针和可变状态的情况下,您需要维护一些数据结构,这些数据结构至少可以跟踪树的根在哪里,以及如何到达那里,因为在顺序遍历或后顺序遍历的过程中,您将需要这样的结构。这种结构必然需要Ω(d)空间,d是树的深度。

#2


8  

A fancy answer.

一个漂亮的答案。

We can use free monads to get efficient memory utilization bound.

我们可以使用免费的monads来获得有效的内存利用边界。

    {-# LANGUAGE RankNTypes
               , MultiParamTypeClasses
               , FlexibleInstances
               , UndecidableInstances #-}

    class Algebra f x where
      phi :: f x -> x

A algebra of a functor f is a function phi from f x to x for some x. For example, any monad has a algebra for any object m x:

一个函数f的代数是一个函数从f x到x的函数,例如,任何一元函数都有一个关于任何对象m x的代数:

    instance (Monad m) => Algebra m (m x) where
      phi = join

A free monad for any functor f can be constructed (possibly, some sort of functors only, like omega-cocomplete, or some such; but all Haskell types are polynomial functors, which are omega-cocomplete, so the statement is certainly true for all Haskell functors):

可以构造任意f函子的一个*单链元(可能只有某种函子,如-cocomplete等;但是所有Haskell类型都是多项式函子,它们是-cocomplete,所以这个命题对所有Haskell函子都是成立的):

    data Free f a = Free (forall x. Algebra f x => (a -> x) -> x)
    runFree g (Free m) = m g

    instance Functor (Free f) where
      fmap f m = Free $ \g -> runFree (g . f) m

    wrap :: (Functor f) => f (Free f a) -> Free f a
    wrap f = Free $ \g -> phi $ fmap (runFree g) f

    instance (Functor f) => Algebra f (Free f a) where
      phi = wrap

    instance (Functor f) => Monad (Free f) where
      return a = Free ($ a)
      m >>= f = fjoin $ fmap f m

    fjoin :: (Functor f) => Free f (Free f a) -> Free f a
    fjoin mma = Free $ \g -> runFree (runFree g) mma

Now we can use Free to construct free monad for functor T a:

现在我们可以用Free构造函数T a的Free monad:

    data T a b = T a b b
    instance Functor (T a) where
      fmap f (T a l r) = T a (f l) (f r)

For this functor we can define a algebra for object [a]

对于这个函数,我们可以为对象定义一个代数[a]

    instance Algebra (T a) [a] where
      phi (T a l r) = l++(a:r)

A tree is a free monad over functor T a:

一棵树是一种*的monad,它的作用是:

    type Tree a = Free (T a) ()

It can be constructed using the following functions (if defined as ADT, they'd be constructor names, so nothing extraordinary):

它可以使用以下函数来构造(如果定义为ADT,它们就是构造函数名,所以没什么特别的):

    tree :: a -> Tree a -> Tree a -> Tree a
    tree a l r = phi $ T a l r -- phi here is for Algebra f (Free f a)
    -- and translates T a (Tree a) into Tree a

    leaf :: Tree a
    leaf = return ()

To demonstrate how this works:

为了演示这是如何工作的:

    bar = tree 'a' (tree 'b' leaf leaf) $ tree 'r' leaf leaf
    buz = tree 'b' leaf $ tree 'u' leaf $ tree 'z' leaf leaf
    foo = tree 'f' leaf $ tree 'o' (tree 'o' leaf leaf) leaf

    toString = runFree (\_ -> [] :: String)

    main = print $ map toString [bar, buz, foo]

As runFree traverses the tree to replace leaf () with [], the algebra for T a [a] in all contexts is the algebra that constructs a string representing in-order traversal of the tree. Because functor T a b constructs a new tree as it goes, it must have the same memory consumption characteristics as the solution quoted by larsmans - if the tree is not kept in memory, the nodes are discarded as soon as they are replaced by the string representing the whole subtree.

当runFree遍历树以替换leaf()时,所有上下文中的T a [a]的代数都是构造一个表示树的有序遍历的字符串的代数。因为函子T b构造一个新的树就其本身而言,它必须有相同的内存消耗特征解援引larsmans——如果这棵树不是保存在内存中,节点丢弃就取代了字符串代表整个子树。

#3


1  

Given that you have references to nodes' parents, there's a nice solution posted here. Replace the while loop with a tail-recursive call (passing in last and current and that should do it.

假设您有对节点的父节点的引用,这里有一个很好的解决方案。用尾部递归调用替换while循环(传入最后一个循环和当前循环,应该这样做。

The built-in back-references allow you to keep track of traversal ordering. Without these, I can't think of a way to do it on a (balanced) tree with less than O(log(n)) auxiliary space.

内置的反向引用允许您跟踪遍历排序。如果没有这些,我就想不出在一个(平衡的)树上使用小于O(log(n))的辅助空间的方法。

#4


0  

I was not able to find an answer but I got some pointers. Go have a look at http://www.ics.uci.edu/~dan/pub.html, scroll down to

我找不到答案,但我得到了一些建议。去看看http://www.ics.uci.edu/~dan/pub.html,向下滚动到

[33] D.S. Hirschberg and S.S. Seiden, A bounded-space tree traversal algorithm, Information Processing Letters 47 (1993)

D.S. Hirschberg和s . Seiden,一个有界空间树遍历算法,信息处理字母47 (1993)

Download the postscript file, you may need to convert it to PDF (my ps viewer was unable to present it correctly). It mentions on page 2 (Table 1) a number of algorithms and additional literature.

下载postscript文件,您可能需要将它转换为PDF(我的ps查看器无法正确显示它)。它在第2页(表1)中提到了一些算法和其他文献。

#1


11  

Is there another way to accomplish O(1), tailcall optimized traversal of a binary tree, possibly without mutable state?

有没有另一种方法来完成O(1),尾部调用优化遍历一个二叉树,可能没有可变状态?

As I stated in my comment, you can do this if the tree need not survive the traversal. Here's a Haskell example:

正如我在注释中所指出的,如果树在遍历过程中不能存活,您可以这样做。这是一个Haskell的例子:

data T = Leaf | Node T Int T

inOrder :: T -> [Int]
inOrder Leaf                     =  []
inOrder (Node Leaf x r)          =  x : inOrder r
inOrder (Node (Node l x m) y r)  =  inOrder $ Node l x (Node m y r)

This takes O(1) auxiliary space if we assume the garbage collector will clean up any Node that we just processed, so we effectively replace it by a right-rotated version. However, if the nodes we process cannot immediately be garbage-collected, then the final clause may build up an O(n) number of nodes before it hits a leaf.

如果我们假设垃圾收集器将清理我们刚刚处理的任何节点,那么这将占用O(1)辅助空间,因此我们可以用右旋转的版本替换它。但是,如果我们处理的节点不能立即被垃圾收集,那么最后的子句可以在到达叶子之前构建O(n)个节点。

If you have parent pointers, then it's also doable. Parent pointers require mutable state, though, and prevent sharing of subtrees, so they're not really functional. If you represent an iterator by a pair (cur, prev) that is initially (root, nil), then you can perform iteration as outlined here. You need a language with pointer comparisons to make this work, though.

如果你有父指针,那么它也是可行的。但是,父指针需要可变的状态,并且防止子树的共享,所以它们不是真正的功能。如果您用一对(cur, prev)表示一个迭代器,它最初是(root, nil),那么您可以执行这里列出的迭代。不过,您需要一种具有指针比较的语言来实现这一点。

Without parent pointers and mutable state, you need to maintain some data structure that at least tracks where the root of the tree is and how to get there, since you'll need such a structure at some point during in-order or post-order traversal. Such a structure necessarily takes Ω(d) space where d is the depth of the tree.

在没有父指针和可变状态的情况下,您需要维护一些数据结构,这些数据结构至少可以跟踪树的根在哪里,以及如何到达那里,因为在顺序遍历或后顺序遍历的过程中,您将需要这样的结构。这种结构必然需要Ω(d)空间,d是树的深度。

#2


8  

A fancy answer.

一个漂亮的答案。

We can use free monads to get efficient memory utilization bound.

我们可以使用免费的monads来获得有效的内存利用边界。

    {-# LANGUAGE RankNTypes
               , MultiParamTypeClasses
               , FlexibleInstances
               , UndecidableInstances #-}

    class Algebra f x where
      phi :: f x -> x

A algebra of a functor f is a function phi from f x to x for some x. For example, any monad has a algebra for any object m x:

一个函数f的代数是一个函数从f x到x的函数,例如,任何一元函数都有一个关于任何对象m x的代数:

    instance (Monad m) => Algebra m (m x) where
      phi = join

A free monad for any functor f can be constructed (possibly, some sort of functors only, like omega-cocomplete, or some such; but all Haskell types are polynomial functors, which are omega-cocomplete, so the statement is certainly true for all Haskell functors):

可以构造任意f函子的一个*单链元(可能只有某种函子,如-cocomplete等;但是所有Haskell类型都是多项式函子,它们是-cocomplete,所以这个命题对所有Haskell函子都是成立的):

    data Free f a = Free (forall x. Algebra f x => (a -> x) -> x)
    runFree g (Free m) = m g

    instance Functor (Free f) where
      fmap f m = Free $ \g -> runFree (g . f) m

    wrap :: (Functor f) => f (Free f a) -> Free f a
    wrap f = Free $ \g -> phi $ fmap (runFree g) f

    instance (Functor f) => Algebra f (Free f a) where
      phi = wrap

    instance (Functor f) => Monad (Free f) where
      return a = Free ($ a)
      m >>= f = fjoin $ fmap f m

    fjoin :: (Functor f) => Free f (Free f a) -> Free f a
    fjoin mma = Free $ \g -> runFree (runFree g) mma

Now we can use Free to construct free monad for functor T a:

现在我们可以用Free构造函数T a的Free monad:

    data T a b = T a b b
    instance Functor (T a) where
      fmap f (T a l r) = T a (f l) (f r)

For this functor we can define a algebra for object [a]

对于这个函数,我们可以为对象定义一个代数[a]

    instance Algebra (T a) [a] where
      phi (T a l r) = l++(a:r)

A tree is a free monad over functor T a:

一棵树是一种*的monad,它的作用是:

    type Tree a = Free (T a) ()

It can be constructed using the following functions (if defined as ADT, they'd be constructor names, so nothing extraordinary):

它可以使用以下函数来构造(如果定义为ADT,它们就是构造函数名,所以没什么特别的):

    tree :: a -> Tree a -> Tree a -> Tree a
    tree a l r = phi $ T a l r -- phi here is for Algebra f (Free f a)
    -- and translates T a (Tree a) into Tree a

    leaf :: Tree a
    leaf = return ()

To demonstrate how this works:

为了演示这是如何工作的:

    bar = tree 'a' (tree 'b' leaf leaf) $ tree 'r' leaf leaf
    buz = tree 'b' leaf $ tree 'u' leaf $ tree 'z' leaf leaf
    foo = tree 'f' leaf $ tree 'o' (tree 'o' leaf leaf) leaf

    toString = runFree (\_ -> [] :: String)

    main = print $ map toString [bar, buz, foo]

As runFree traverses the tree to replace leaf () with [], the algebra for T a [a] in all contexts is the algebra that constructs a string representing in-order traversal of the tree. Because functor T a b constructs a new tree as it goes, it must have the same memory consumption characteristics as the solution quoted by larsmans - if the tree is not kept in memory, the nodes are discarded as soon as they are replaced by the string representing the whole subtree.

当runFree遍历树以替换leaf()时,所有上下文中的T a [a]的代数都是构造一个表示树的有序遍历的字符串的代数。因为函子T b构造一个新的树就其本身而言,它必须有相同的内存消耗特征解援引larsmans——如果这棵树不是保存在内存中,节点丢弃就取代了字符串代表整个子树。

#3


1  

Given that you have references to nodes' parents, there's a nice solution posted here. Replace the while loop with a tail-recursive call (passing in last and current and that should do it.

假设您有对节点的父节点的引用,这里有一个很好的解决方案。用尾部递归调用替换while循环(传入最后一个循环和当前循环,应该这样做。

The built-in back-references allow you to keep track of traversal ordering. Without these, I can't think of a way to do it on a (balanced) tree with less than O(log(n)) auxiliary space.

内置的反向引用允许您跟踪遍历排序。如果没有这些,我就想不出在一个(平衡的)树上使用小于O(log(n))的辅助空间的方法。

#4


0  

I was not able to find an answer but I got some pointers. Go have a look at http://www.ics.uci.edu/~dan/pub.html, scroll down to

我找不到答案,但我得到了一些建议。去看看http://www.ics.uci.edu/~dan/pub.html,向下滚动到

[33] D.S. Hirschberg and S.S. Seiden, A bounded-space tree traversal algorithm, Information Processing Letters 47 (1993)

D.S. Hirschberg和s . Seiden,一个有界空间树遍历算法,信息处理字母47 (1993)

Download the postscript file, you may need to convert it to PDF (my ps viewer was unable to present it correctly). It mentions on page 2 (Table 1) a number of algorithms and additional literature.

下载postscript文件,您可能需要将它转换为PDF(我的ps查看器无法正确显示它)。它在第2页(表1)中提到了一些算法和其他文献。