I started using LinkedLists instead of Lists in some of my C# algorithms, hoping to speed them up. However, I noticed that they just felt slower. Like any good developer, I figured I should do my due diligence and verify my feelings. So I decided to benchmark some simple loops.
I thought that populating the collections with some random integers should be sufficient. I ran this code in Debug mode to avoid any compiler optimizations. Here is the code that I used:
var rand = new Random(Environment.TickCount);
var ll = new LinkedList<int>();
var list = new List<int>();
int count = 20000000;
BenchmarkTimer.Start("Linked List Insert");
for (int x = 0; x < count; ++x)
ll.AddFirst(rand.Next(int.MaxValue));
BenchmarkTimer.StopAndOutput();
BenchmarkTimer.Start("List Insert");
for (int x = 0; x < count; ++x)
list.Add(rand.Next(int.MaxValue));
BenchmarkTimer.StopAndOutput();
int y = 0;
BenchmarkTimer.Start("Linked List Iterate");
foreach (var i in ll)
++y; //some atomic operation;
BenchmarkTimer.StopAndOutput();
int z = 0;
BenchmarkTimer.Start("List Iterate");
foreach (var i in list)
++z; //some atomic operation;
BenchmarkTimer.StopAndOutput();
Here is the output:
Linked List Insert: 8959.808 ms
List Insert: 845.856 ms
Linked List Iterate: 203.632 ms
List Iterate: 125.312 ms
This result baffled me. A LinkedList insert should be O(1), whereas a List insert is Θ(1), or O(n) (because of the copy) if it needs to be resized. Each iteration step should be O(1) for both collections because of the enumerator. I looked at the disassembled output, and it doesn't shed much light on the situation.
Anyone else have any thoughts on why this is? Did I miss something glaringly obvious?
Note: here is the source for the simple BenchmarkTimer class: http://procbits.com/2010/08/25/benchmarking-c-apps-algorithms/
6 Answers
#1
28
Update (in response to your comment): you're right, discussing big-O notation by itself is not exactly useful. I included a link to James's answer in my original response because he already offered a good explanation of the technical reasons why List<T> outperforms LinkedList<T> in general.
Basically, it's a matter of memory allocation and locality. When all of your collection's elements are stored in an array internally (as is the case with List<T>), it's all in one contiguous block of memory which can be accessed very quickly. This applies both to adding (as this simply writes to a location within the already-allocated array) and to iterating (as this accesses many memory locations that are very close together rather than having to follow pointers to completely disconnected memory locations).
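As a rough illustration of what that means (this is a simplified sketch, not the actual List<T> source; TinyIntList is a made-up name), an array-backed add usually boils down to a single write into memory that is already allocated and contiguous:

class TinyIntList // hypothetical, for illustration only; requires using System;
{
    private int[] _items = new int[4]; // one contiguous block
    private int _count;

    public void Add(int value)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2); // occasional copy into a bigger block
        _items[_count++] = value; // common case: one write, no allocation
    }
}

Every LinkedList<T>.AddFirst or AddLast call, by contrast, has to allocate a new node object on the heap and wire up its pointers, and those nodes can end up scattered all over memory.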
A LinkedList<T> is a specialized collection, which only outshines List<T> in the case where you are performing random insertions or removals from the middle of the list (and even then, only maybe).
As for the question of scaling: you're right, if big-O notation is all about how well an operation scales, then an O(1) operation should eventually beat out an O(>1) operation given a large enough input—which is obviously what you were going for with 20 million iterations.
This is why I mentioned that List<T>.Add has an amortized complexity of O(1). That means adding to a list is also an operation that scales linearly with the size of the input, the same (effectively) as with a linked list. Forget about the fact that occasionally the list has to resize itself (this is where the "amortized" comes in; I encourage you to visit that Wikipedia article if you haven't already). They scale the same.
Now, interestingly, and perhaps counter-intuitively, this means that if anything, the performance difference between List<T> and LinkedList<T> (again, when it comes to adding) actually becomes more obvious as the number of elements increases. The reason is that when the list runs out of space in its internal array, it doubles the size of the array; and thus with more and more elements, the frequency of resizing operations decreases, to the point where the array is basically never resizing.
So let's say a List<T> starts with an internal array large enough to hold 4 elements (I believe that's accurate, though I don't remember for sure). Then as you add up to 20 million elements, it resizes itself a total of ~(log2(20000000) - 1) or 23 times. Compare this to the 20 million times you're performing the considerably less efficient AddFirst on a LinkedList<T>, which allocates a new LinkedListNode<T> with every call, and those 23 resizes suddenly seem pretty insignificant.
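As a rough sanity check on that resize count (assuming an initial capacity of 4 and doubling on every resize; the exact growth policy of List<T> may vary between framework versions), a quick throwaway calculation looks like this:

int capacity = 4, resizes = 0;
long copies = 0;
while (capacity < 20000000)
{
    copies += capacity; // elements copied into the new, larger array
    capacity *= 2;
    ++resizes;
}
Console.WriteLine(resizes + " resizes, ~" + copies + " element copies");
// prints 23 resizes and roughly 33.5 million copies, i.e. fewer than
// 2 copies per element on average, which is where "amortized O(1)" comes from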
I hope this helps! If I haven't been clear on any points, let me know and I will do my best to clarify and/or correct myself.
James is right on.
Remember that big-O notation is meant to give you an idea of how the performance of an algorithm scales. It does not mean that something that performs in guaranteed O(1) time will outperform something else that performs in amortized O(1) time (as is the case with List<T>).
Suppose you have a choice of two jobs, one of which requires a commute 5 miles down a road that occasionally suffers from traffic jams. Ordinarily this drive should take you about 10 minutes, but on a bad day it could be more like 30 minutes. The other job is 60 miles away but the highway is always clear and never has any traffic jams. This drive always takes you an hour.
That's basically the situation with List<T> and LinkedList<T> for purposes of adding to the end of the list.
#2
13
Keep in mind that you've got lists of primitives. For List this is very simple because it keeps a whole array of int, and it's very easy for it to drop new values in when it doesn't have to allocate more memory.
Contrast this with a LinkedList, which must always allocate memory to wrap each int. So I think the memory allocation is probably what's contributing the most to your time. If you already had the nodes allocated, it should be faster overall. I'd try an experiment with the overload of AddFirst that takes a LinkedListNode to verify (that is, create the LinkedListNode outside the scope of the timer and just time the add).
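A rough sketch of that experiment (reusing rand, count, and the BenchmarkTimer helper from the question, and assuming a fresh LinkedList<int> so the pre-built nodes don't already belong to a list) might look like this:

var ll2 = new LinkedList<int>();
var nodes = new LinkedListNode<int>[count];
for (int x = 0; x < count; ++x) // allocate every node outside the timed section
    nodes[x] = new LinkedListNode<int>(rand.Next(int.MaxValue));

BenchmarkTimer.Start("Linked List Insert (pre-allocated nodes)");
for (int x = 0; x < count; ++x)
    ll2.AddFirst(nodes[x]); // overload that takes an existing LinkedListNode<int>
BenchmarkTimer.StopAndOutput();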
Iterating is similar: it's much more efficient to go to the next index in an internal array than to follow links.
#3
5
As James stated in his answer, memory allocation is probably one reason why the LinkedList is slower.
Additionally, I believe the major difference originates from an invalid test. You are adding items to the beginning of the linked list, but to the end of the ordinary list. Wouldn't adding items to the beginning of the ordinary list shift the benchmarking results in favor of the LinkedList again?
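That variant is easy to try with the question's harness; something roughly like the following would do it. Note that List<T>.Insert(0, ...) shifts every existing element on each call, so a much smaller count is advisable if you want it to finish:

int count = 200000; // far fewer items: Insert(0, ...) is O(n) per call
var list = new List<int>();
var ll = new LinkedList<int>();

BenchmarkTimer.Start("List Insert at front");
for (int x = 0; x < count; ++x)
    list.Insert(0, x); // shifts all existing elements right by one
BenchmarkTimer.StopAndOutput();

BenchmarkTimer.Start("Linked List Insert at front");
for (int x = 0; x < count; ++x)
    ll.AddFirst(x); // just links a new node at the head
BenchmarkTimer.StopAndOutput();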
#4
3
I highly recommend the article "Number crunching: why you should never use a linked-list again". There isn't much in it that you can't find elsewhere, but I spent quite a bit of time trying to figure out why LinkedList<T> was so much slower than List<T> in situations I thought would obviously favor the linked list before I found it, and after reading it, things made a bit more sense:
The linked list has items in disjoint areas of memory, and as a result, one could say it is cache line hostile, because it maximizes cache misses. The disjoint memory makes traversing the list result in frequent and costly unexpected RAM lookups.
A vector [equivalent to ArrayList or List<T>], on the other hand, has its items stored in adjacent memory, and in so doing, is able to maximize cache utilization and avoid cache misses. Often, in practice, this more than offsets the cost incurred when shuffling data around.
If you'd like to hear that from a more authoritative source, this is from Tips for Improving Time-Critical Code on MSDN:
Sometimes a data structure that looks great turns out to be horrible because of poor locality of reference. Here are two examples:
Dynamically allocated linked lists (LinkedListNode<T> is a reference type, so it is dynamically allocated) can reduce program performance because when you search for an item or when you traverse a list to the end, each skipped link could miss the cache or cause a page fault. A list implementation based on simple arrays might actually be much faster because of better caching and fewer page faults; even allowing for the fact that the array would be harder to grow, it still might be faster.

Hash tables that use dynamically allocated linked lists can degrade performance. By extension, hash tables that use dynamically allocated linked lists to store their contents might perform substantially worse. In fact, in the final analysis, a simple linear search through an array might actually be faster (depending on the circumstances). Array-based hash tables (IIRC, Dictionary<TKey,TValue> is array-based) are an often-overlooked implementation which frequently has superior performance.
This is my original (far less useful) answer where I did some performance tests.
The general consensus seems to be that the linked list is allocating memory on every add (because the node is a class) and that does seem to be the case. I tried to isolate the allocation code from the timed code that adds items to the list and made a gist from the result: https://gist.github.com/zeldafreak/d11ae7781f5d43206f65
I ran the test code 5 times, calling GC.Collect() between runs. Inserting 20 million nodes into the linked list takes 193-211 ms (198 ms) compared to 77-89 ms (81 ms) for the standard list, so even without the allocation, a standard list is a little over 2x faster. Iterating over the list takes 54-59 ms, compared to 76-101 ms for the linked list, so the standard list is a more modest ~50% faster there.
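The gist itself isn't reproduced here, but the measurement pattern described above is roughly the following, where RunLinkedListTest and RunListTest are hypothetical placeholders for the actual timed insert/iterate bodies:

for (int run = 0; run < 5; ++run)
{
    RunLinkedListTest(); // placeholder: timed linked-list insert + iterate
    GC.Collect();
    GC.WaitForPendingFinalizers();

    RunListTest(); // placeholder: timed List<int> insert + iterate
    GC.Collect();
    GC.WaitForPendingFinalizers();
}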
#5
2
I've done the same test with List and LinkedList, inserting actual objects (anonymous types, actually) into the list, and LinkedList is slower than List in that case as well.
However, LinkedList DOES speed up if you insert items like this, instead of using AddFirst and AddLast:
LinkedList<T> list = new LinkedList<T>();
LinkedListNode<T> last = null;
foreach (var x in aLotOfStuff)
{
    if (last == null)
        last = list.AddFirst(x);
    else
        last = list.AddAfter(last, x);
}
AddAfter seems to be faster than AddLast. I would assume internally .NET would track the 'tail'/last object by ref, and go right to it when doing an AddLast(), but perhaps AddLast() causes it to traverse the entire list to the end?
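One way to check that guess is to time the two calls directly. A rough sketch with Stopwatch (requires using System.Diagnostics; the item count is arbitrary):

var sw = Stopwatch.StartNew();
var a = new LinkedList<int>();
for (int i = 0; i < 10000000; ++i)
    a.AddLast(i); // append via AddLast
sw.Stop();
Console.WriteLine("AddLast:  " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
var b = new LinkedList<int>();
LinkedListNode<int> last = null;
for (int i = 0; i < 10000000; ++i)
    last = (last == null) ? b.AddFirst(i) : b.AddAfter(last, i); // append via AddAfter on a saved node
sw.Stop();
Console.WriteLine("AddAfter: " + sw.ElapsedMilliseconds + " ms");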
#6
2
Since the other answers didn't mention this, I'm adding another.
Although your print statement says "List Insert", you actually called List<T>.Add, which is the one kind of "insertion" that List is actually good at. Add is a special case of just using the next free slot in the underlying storage array, where nothing has to get moved out of the way. Try really using List<T>.Insert instead, to make it the worst case instead of the best case.
Edit:
To summarize, for the purposes of insertion, a list is a special-purpose data structure that is only fast at one kind of insertion: append to the end. A linked-list is a general-purpose data structure that is equally fast at inserting anywhere into the list. And there is one more detail: the linked-list has higher memory and CPU overhead so its fixed costs are higher.
So your benchmark compares general-purpose linked-list insertion against the special-purpose list append-to-the-end, so it is not surprising that the finely-tuned, optimized data structure being used exactly as intended performs well. If you want the linked list to compare favorably, you need a benchmark that the list will find challenging, and that means you will need to insert at the beginning or into the middle of the list.
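As a concrete example of a benchmark the list would find challenging, a mid-list insertion test might look roughly like this (sizes are arbitrary, Enumerable.Range needs using System.Linq, and BenchmarkTimer is the helper linked in the question; the linked list keeps a reference to its middle node so it never has to walk to the insertion point):

int initial = 100000, inserts = 100000;
var list = new List<int>(Enumerable.Range(0, initial));
var ll = new LinkedList<int>(Enumerable.Range(0, initial));

// walk to the middle node of the linked list once, up front
var middle = ll.First;
for (int i = 0; i < initial / 2; ++i)
    middle = middle.Next;

BenchmarkTimer.Start("List Insert at middle");
for (int x = 0; x < inserts; ++x)
    list.Insert(list.Count / 2, x); // shifts ~half the array on every call
BenchmarkTimer.StopAndOutput();

BenchmarkTimer.Start("Linked List Insert at middle");
for (int x = 0; x < inserts; ++x)
    ll.AddAfter(middle, x); // O(1) once the node is known
BenchmarkTimer.StopAndOutput();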