支持O(1)随机访问和最坏情况O(1)附加的数据结构?

I realize a resizable indexed collection that uses an array to store its elements (like List<T> in .NET or ArrayList in Java) has amortized O(1) insertion time at the end of the collection. But then there's always that one pesky insertion at critical junctures where the collection has just reached its capacity and the next insertion requires a full copy of all elements in the internal array to a new one (presumably twice as large).

我实现了一个可调整的索引集合，它使用一个数组来存储它的元素(比如在。net中List 或Java中的ArrayList)，在集合的末尾处有一个已摊销的O(1)插入时间。但是在关键节点上总是有一个讨厌的插入，在那里集合刚刚达到它的容量，下一个插入需要将内部数组中的所有元素全部复制到一个新的数组中(大概是原来的两倍)。

A common mistake (in my opinion) is to go with a linked list to "fix" this problem; but I believe the overhead of allocating a node for every element can be quite wasteful, and in fact would dwarf the benefit of a guaranteed O(1) insertion in that rare case that the array insertion is costly—when, in fact, every other array insertion is significantly cheaper (and faster).

一个常见的错误(在我看来)是使用一个链表来“修复”这个问题;但是我相信为每个元素分配一个节点的开销是非常浪费的，而且实际上会使O(1)的保证插入的好处相形见绌，在这种罕见的情况下，数组插入是非常昂贵的——实际上，其他所有的数组插入都非常便宜(而且更快)。

What I was thinking might make sense is a hybrid approach consisting of a linked list of arrays, where every time the current "head" array reaches its capacity, a new array twice as large is added to the linked list. Then no copy would be necessary since the linked list would still have the original array. Essentially this seems analogous (to me) to the List<T> or ArrayList approach, except that wherever you previously would've incurred the cost of copying all the elements of the internal array, here you only incur the cost of allocating a new array plus a single node insertion.

我所想到的可能是一种混合方法，它由一个数组链表组成，每当当前的“head”数组达到其容量时，就会向链表中添加一个两倍大的新数组。那么就不需要复制，因为链表仍然具有原始数组。从本质上来说，这与List 或ArrayList方法类似，但是无论您以前在什么地方需要复制内部数组的所有元素，在这里您只需要分配一个新数组和一个节点插入。

Of course, this would complicate other features if they were desired (e.g., inserting/removing into/from the middle of the collection); but as I've expressed in the title, I'm really just looking for an add-only (and iterate) collection.

当然，如果需要其他特性(例如，将/移除到/从集合中间插入/移除)，这将使其他特性复杂化;但是，正如我在标题中所表示的，我实际上只是在寻找一个仅加载项(和迭代)集合。

Are there any data structures out there ideally suited to this purpose? Or, can you think of one yourself?

有什么数据结构最适合这个目的吗?或者，你能自己想出一个吗?

4 个解决方案

#1

There is a beautiful structure called an extendible array that has worst-case O(1) insertion and O(n) memory overhead (that is, it's asymptotically comparable to a dynamic array, but has O(1) worst-case insertion). The trick is to take the approach that the vector uses - doubling and copying - but to make the copying lazy. For example, suppose you have an array of four elements like this one:

有一个漂亮的结构叫做可扩展数组，它有最坏情况的O(1)插入和O(n)内存开销(也就是说，它与动态数组渐进地类似，但是有O(1)最坏情况的插入)。诀窍是采取向量使用的方法——加倍和复制——但要使复制变得懒惰。例如，假设你有一个由四个元素组成的数组，比如这个:

[1] [2] [3] [4]

If you want to add a new number, say 5, you begin by allocating an array that's twice as large:

如果你想增加一个新的数字，比如5，你首先要分配一个两倍大的数组:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

Next, you insert 5 into the new array:

接下来，将5插入到新数组中:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [5] [ ] [ ] [ ]

Finally, pull down the 4 from the old array into the new:

最后，将旧数组中的4拉到新数组中:

[1] [2] [3] [ ]
[ ] [ ] [ ] [4] [5] [ ] [ ] [ ]

From now on, any time you do an insert, add the element to the new array and pull down one more element from the old array. For example, after adding 6, we'd get

从现在开始，每当执行插入操作时，将元素添加到新数组中，并从旧数组中再拉出一个元素。例如，在添加6之后，我们会得到。

[1] [2] [ ] [ ]
[ ] [ ] [3] [4] [5] [6] [ ] [ ]

After inserting two more values, we end up here:

在插入了另外两个值之后，我们在这里结束:

[ ] [ ] [ ] [ ]
[1] [2] [3] [4] [5] [6] [7] [8]

If we now need to add one more element, we discard the now-empty old array and allocate an array twice as large as the current array (capable of holding 16 elements):

如果我们现在需要添加一个元素，我们将丢弃已清空的旧数组，并分配一个比当前数组大两倍的数组(能够容纳16个元素):

[1] [2] [3] [4] [5] [6] [7] [8]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

And repeat this process. Discounting the cost of a memory allocation (which is usually sublinear in the size of the array), you do at most O(1) work per insertion.

并重复这个过程。考虑到内存分配的成本(通常是数组大小的次线性)，每次插入最多只能做O(1)项工作。

Lookups are still O(1), since you just decide which of the two arrays to look in, while insertions in the middle are O(n) because of the shuffling.

查找仍然是O(1)，因为您刚刚决定要查看两个数组中的哪一个，而中间的插入是O(n)，因为拖移。

If you're curious, I have a Java implementation of this structure on my personal site. I don't know how useful you'll find it, but you're more than welcome to try it out.

如果您好奇，我在我的个人站点上有这个结构的Java实现。我不知道你会发现它有多有用，但你可以试一试。

Hope this helps!

希望这可以帮助!

EDIT: If you want to invest a bit of time reading over a research paper and trying to implement a fairly complex data structure, you can get the same result (worst-case O(1) append) in O(√n) space overhead (which is provably optimal, by the way) using the ideas in this paper. I never got around to actually implementing this, but it's certainly well-worth the read if memory is a super-scarce resource. Interestingly, it uses this above construction as a subroutine!

编辑:如果你想投资一些时间阅读研究论文,试图实施一个相当复杂的数据结构,可以得到相同的结果(最坏的O(1)附加)在O(√n)空间开销(可能为最优,顺便)使用本文中的观点。我从来没有真正地去实现它，但是如果内存是一种非常稀缺的资源，那么它确实值得一读。有趣的是，它将上面的构造用作子例程!

#2

When I need a container like that, I use my implementation of the structure described in "Resizeable Arrays in Optimal Time and Space"

当我需要这样的容器时，我使用了“在最佳时间和空间中可调整数组”中描述的结构实现

#3

OK. What you have described is almost exactly what std::deque is in the C++ standard library. The difference is that an array(usually) is used to hold the pointers to the sub arrays instead of using a linked list.

好的。您所描述的几乎正是std::deque所在的c++标准库。不同之处在于，数组(通常)用于保存指向子数组的指针，而不是使用链表。

#4

One idea would be to create a list of few elements, like:

一种想法是创建一个包含少数元素的列表，比如:

struct item
{
    int data[NUM_ITEMS];
    item *next;
}

In this case insert would take O(1) and if you reached the limit just create a new block and append it to the end of your list

在这种情况下，insert将取O(1)，如果达到了极限，只需创建一个新的块并将其附加到列表的末尾

#1

[1] [2] [3] [4]