What is your preferred method of traversing a tree data structure, since recursive method calls can be pretty inefficient in some circumstances. I am simply using a generator like the one above. Do you have any hints to make it faster?
遍历树数据结构的首选方法是什么,因为在某些情况下递归方法调用可能效率很低。我只是使用像上面那样的发电机。你有什么提示让它更快吗?
def children(self):
stack = [self.entities]
while stack:
for e in stack.pop():
yield e
if e.entities:
stack.append(e.entities)
Here is some test data. The first one is recursive, the second uses the generator:
这是一些测试数据。第一个是递归的,第二个使用生成器:
s = time.time()
for i in range(100000):
e.inc_counter()
print time.time() - s
s = time.time()
for i in range(100000):
for e in e.children():
e.inc_counter_s()
print time.time() - s
Results:
0.416000127792
0.298999786377
Test code:
import random
class Entity():
def __init__(self, name):
self.entities = []
self.name = name
self.counter = 1
self.depth = 0
def add_entity(self, e):
e.depth = self.depth + 1
self.entities.append(e)
def inc_counter_r(self):
for e in self.entities:
e.counter += 1
e.inc_counter_r()
def children(self):
stack = [self.entities]
while stack:
for e in stack.pop():
yield e
if e.entities:
stack.append(e.entities)
root = Entity("main")
def fill_node(root, max_depth):
if root.depth <= max_depth:
for i in range(random.randint(10, 15)):
e = Entity("node_%s_%s" % (root.depth, i))
root.add_entity(e)
fill_node(e, max_depth)
fill_node(root, 3)
import time
s = time.time()
for i in range(100):
root.inc_counter_r()
print "recursive:", time.time() - s
s = time.time()
for i in range(100):
for e in root.children():
e.counter += 1
print "generator:", time.time() - s
8 个解决方案
#1
5
I can't think of any big algorithmic improvements, but a simple microoptimisation you can make is to bind frequently called methods (such as stack.append / stack.pop) to locals (this saves a dictionary lookup)
我无法想到任何大的算法改进,但是你可以做的一个简单的微优化是将频繁调用的方法(例如stack.append / stack.pop)绑定到本地(这节省了字典查找)
def children(self):
stack = [self.entities]
push = stack.append
pop = stack.pop
while stack:
for e in pop():
yield e
if e.entities:
push(e.entities)
This gives a small (~15%) speedup by my tests (using 100 traversals of an 8-deep tree with 4 children at each node gives me the below timings:)
这通过我的测试给出了一个小的(~15%)加速(使用100个遍历的8个深度树,每个节点有4个孩子给我下面的时间:)
children : 5.53942348004
children_bind: 4.77636131253
Not huge, but worth doing if speed is important.
不是很大,但如果速度很重要,那就值得做。
#2
5
Unless your tree is really large or you have really high (real) requirements for speed, I would choose the recursive method. Easier to read, easier to code.
除非你的树真的很大或你对速度有很高的(实际)要求,否则我会选择递归方法。更易于阅读,更易于编码。
#3
4
I'm not sure if you can reduce the overhead much on a full in-order traversal of a tree, if you use recursion the call stack will grow some, otherwise you must manually use a stack to push references of the children while visiting each node. Which way is fastest and uses less memory, depends on the expensiveness of the call stack vs. a normal stack. (I would guess the callstack is faster since it should be optimized for its use, and recursion is much easier to implement)
我不确定你是否可以在树的完整有序遍历上减少开销,如果你使用递归调用堆栈会增长一些,否则你必须手动使用堆栈来推送孩子们的参考节点。哪种方式最快并且使用更少的内存,取决于调用堆栈与普通堆栈的昂贵程度。 (我猜测callstack更快,因为它应该针对其使用进行优化,并且递归更容易实现)
If you don't care about the order you visit the nodes, some implementations of trees is actually stored in a dynamic array or linked list or stack wich you can traverse linearly if you don't care about the order it's traversed.
如果您不关心访问节点的顺序,树的某些实现实际上存储在动态数组或链表或堆栈中,如果您不关心它遍历的顺序,您可以线性遍历。
But why is it important to have a fast traversal anyway? Trees are good for searching, arrays/linked lists is good for full traversal. If you often need full in-order traversal but few searches and insertions/deletions, an ordered linked list might be best, if searching is what you do most you use a tree. If the data is really massive, so that memory overhead may render recursion impossible, you should use a database.
但是为什么快速遍历很重要呢?树很适合搜索,数组/链表有利于完全遍历。如果您经常需要完整的有序遍历,但搜索和插入/删除很少,那么有序链表可能是最好的,如果搜索是您最常使用的树。如果数据非常庞大,那么内存开销可能导致递归变得不可能,那么您应该使用数据库。
#4
4
Recursive function calls are not incredibly inefficient, that is an old programming myth. (If they're badly implemented, they may incur a larger overhead than necessary, but calling them "incredibly inefficient" is plain wrong.)
递归函数调用并不是非常低效,这是一个古老的编程神话。 (如果它们执行得很糟糕,它们可能会产生比必要的更大的开销,但称它们“非常低效”是完全错误的。)
Remember: don't optimize prematurely, and never optimize without benchmarking first.
请记住:不要过早优化,不要在没有基准测试的情况下进行优化。
#5
3
If you have a lot of RAM and the tree doesn't change often, you can cache the result of the call:
如果你有很多RAM并且树不经常更改,你可以缓存调用的结果:
def children(self):
if self._children_cache is not None:
return self._children_cache
# Put your code into collectChildren()
self._children_cache = self.collectChildren()
return self._children_cache
Whenever the tree changes, set the cache to None. In this case, using recursive calls might be more effective since the results will accumulate faster.
每当树发生更改时,将缓存设置为“无”。在这种情况下,使用递归调用可能更有效,因为结果将更快地累积。
#6
1
I've written iterative tree-traversal code in the past: it's very ugly, and not fast, unless you know exactly how many children not only each subtree will have, but how many levels there are.
我过去编写了迭代树遍历代码:它非常难看,而且速度不快,除非你确切地知道每个子树不仅有多少个子,而且知道有多少个级别。
#7
0
I don't know too much about Python internals of function calls, but I really can't imagine that your code snippet is faster than recursively traversing the tree.
我不太了解函数调用的Python内部,但我真的无法想象你的代码片段比递归遍历树更快。
The call stack (used for function calls, including recursive ones) is typically very fast. Going to the next object will only cost you a single function call. But in your snippet - where you use a stack object, going to the next object will cost you a stack.append (possibly allocating memory on heap), a stack.push (possibly freeing memory from heap), and a yield.
调用堆栈(用于函数调用,包括递归调用)通常非常快。转到下一个对象只需要花费一个函数调用。但是在你的代码片段中 - 你使用堆栈对象,转到下一个对象将花费你stack.append(可能在堆上分配内存),stack.push(可能从堆中释放内存)和yield。
The main problem with recursive calls is that you might blow the stack if your tree gets too deep. This isn't likely to happen.
递归调用的主要问题是如果树太深,你可能会破坏堆栈。这不太可能发生。
#8
0
Here's a pair of small corrections.
这是一对小修正。
def children(self):
stack = [self.entities]
for e in stack:
yield e
if e.entities:
stack.extend(e.entities)
I actually think the generator, using append, isn't visiting all the nodes. I think you mean to extend
the stack with all entities, not append
a simple list of entities to the stack.
我实际上认为使用append的生成器不会访问所有节点。我认为你的意思是扩展堆栈与所有实体,而不是将简单的实体列表附加到堆栈。
Also, when the for
loop terminates, the while
loop in your original example will also terminate because there's no change to the empty stack after the for
loop.
此外,当for循环终止时,原始示例中的while循环也将终止,因为for循环后空堆栈没有变化。
#1
5
I can't think of any big algorithmic improvements, but a simple microoptimisation you can make is to bind frequently called methods (such as stack.append / stack.pop) to locals (this saves a dictionary lookup)
我无法想到任何大的算法改进,但是你可以做的一个简单的微优化是将频繁调用的方法(例如stack.append / stack.pop)绑定到本地(这节省了字典查找)
def children(self):
stack = [self.entities]
push = stack.append
pop = stack.pop
while stack:
for e in pop():
yield e
if e.entities:
push(e.entities)
This gives a small (~15%) speedup by my tests (using 100 traversals of an 8-deep tree with 4 children at each node gives me the below timings:)
这通过我的测试给出了一个小的(~15%)加速(使用100个遍历的8个深度树,每个节点有4个孩子给我下面的时间:)
children : 5.53942348004
children_bind: 4.77636131253
Not huge, but worth doing if speed is important.
不是很大,但如果速度很重要,那就值得做。
#2
5
Unless your tree is really large or you have really high (real) requirements for speed, I would choose the recursive method. Easier to read, easier to code.
除非你的树真的很大或你对速度有很高的(实际)要求,否则我会选择递归方法。更易于阅读,更易于编码。
#3
4
I'm not sure if you can reduce the overhead much on a full in-order traversal of a tree, if you use recursion the call stack will grow some, otherwise you must manually use a stack to push references of the children while visiting each node. Which way is fastest and uses less memory, depends on the expensiveness of the call stack vs. a normal stack. (I would guess the callstack is faster since it should be optimized for its use, and recursion is much easier to implement)
我不确定你是否可以在树的完整有序遍历上减少开销,如果你使用递归调用堆栈会增长一些,否则你必须手动使用堆栈来推送孩子们的参考节点。哪种方式最快并且使用更少的内存,取决于调用堆栈与普通堆栈的昂贵程度。 (我猜测callstack更快,因为它应该针对其使用进行优化,并且递归更容易实现)
If you don't care about the order you visit the nodes, some implementations of trees is actually stored in a dynamic array or linked list or stack wich you can traverse linearly if you don't care about the order it's traversed.
如果您不关心访问节点的顺序,树的某些实现实际上存储在动态数组或链表或堆栈中,如果您不关心它遍历的顺序,您可以线性遍历。
But why is it important to have a fast traversal anyway? Trees are good for searching, arrays/linked lists is good for full traversal. If you often need full in-order traversal but few searches and insertions/deletions, an ordered linked list might be best, if searching is what you do most you use a tree. If the data is really massive, so that memory overhead may render recursion impossible, you should use a database.
但是为什么快速遍历很重要呢?树很适合搜索,数组/链表有利于完全遍历。如果您经常需要完整的有序遍历,但搜索和插入/删除很少,那么有序链表可能是最好的,如果搜索是您最常使用的树。如果数据非常庞大,那么内存开销可能导致递归变得不可能,那么您应该使用数据库。
#4
4
Recursive function calls are not incredibly inefficient, that is an old programming myth. (If they're badly implemented, they may incur a larger overhead than necessary, but calling them "incredibly inefficient" is plain wrong.)
递归函数调用并不是非常低效,这是一个古老的编程神话。 (如果它们执行得很糟糕,它们可能会产生比必要的更大的开销,但称它们“非常低效”是完全错误的。)
Remember: don't optimize prematurely, and never optimize without benchmarking first.
请记住:不要过早优化,不要在没有基准测试的情况下进行优化。
#5
3
If you have a lot of RAM and the tree doesn't change often, you can cache the result of the call:
如果你有很多RAM并且树不经常更改,你可以缓存调用的结果:
def children(self):
if self._children_cache is not None:
return self._children_cache
# Put your code into collectChildren()
self._children_cache = self.collectChildren()
return self._children_cache
Whenever the tree changes, set the cache to None. In this case, using recursive calls might be more effective since the results will accumulate faster.
每当树发生更改时,将缓存设置为“无”。在这种情况下,使用递归调用可能更有效,因为结果将更快地累积。
#6
1
I've written iterative tree-traversal code in the past: it's very ugly, and not fast, unless you know exactly how many children not only each subtree will have, but how many levels there are.
我过去编写了迭代树遍历代码:它非常难看,而且速度不快,除非你确切地知道每个子树不仅有多少个子,而且知道有多少个级别。
#7
0
I don't know too much about Python internals of function calls, but I really can't imagine that your code snippet is faster than recursively traversing the tree.
我不太了解函数调用的Python内部,但我真的无法想象你的代码片段比递归遍历树更快。
The call stack (used for function calls, including recursive ones) is typically very fast. Going to the next object will only cost you a single function call. But in your snippet - where you use a stack object, going to the next object will cost you a stack.append (possibly allocating memory on heap), a stack.push (possibly freeing memory from heap), and a yield.
调用堆栈(用于函数调用,包括递归调用)通常非常快。转到下一个对象只需要花费一个函数调用。但是在你的代码片段中 - 你使用堆栈对象,转到下一个对象将花费你stack.append(可能在堆上分配内存),stack.push(可能从堆中释放内存)和yield。
The main problem with recursive calls is that you might blow the stack if your tree gets too deep. This isn't likely to happen.
递归调用的主要问题是如果树太深,你可能会破坏堆栈。这不太可能发生。
#8
0
Here's a pair of small corrections.
这是一对小修正。
def children(self):
stack = [self.entities]
for e in stack:
yield e
if e.entities:
stack.extend(e.entities)
I actually think the generator, using append, isn't visiting all the nodes. I think you mean to extend
the stack with all entities, not append
a simple list of entities to the stack.
我实际上认为使用append的生成器不会访问所有节点。我认为你的意思是扩展堆栈与所有实体,而不是将简单的实体列表附加到堆栈。
Also, when the for
loop terminates, the while
loop in your original example will also terminate because there's no change to the empty stack after the for
loop.
此外,当for循环终止时,原始示例中的while循环也将终止,因为for循环后空堆栈没有变化。