Index set (for efficient removal from a vector)

Time: 2021-02-28 04:18:30

I was just about to implement my own class for efficient removal from an array, but thought I'd ask to see if anything like it already exists. What I want is list-like removal efficiency but using an array. I want to use an array for reasons of cache locality and so I don't have to keep calling a memory allocator (as std::list would when allocating nodes).

What I thought about doing was creating a class with two arrays. The first is a set of elements and the second array is a set of integers where each integer is a free slot in the first array. So I can add/remove elements from the array fairly easily, without allocating new memory for them, simply by taking an index from the free list and using that for the new element.
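
The two-array idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name SlotSet, its member names, and the use of std::optional (C++17) to mark empty slots are choices made for this sketch and do not come from the question. Iteration is done with a for_each callback instead of custom iterators to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical name: SlotSet is illustrative, not an existing library class.
template <typename T>
class SlotSet {
public:
    using Handle = std::size_t;

    // Reuse a free slot if one exists, otherwise grow the element array.
    Handle add(T value) {
        if (!free_.empty()) {
            Handle h = free_.back();
            free_.pop_back();
            slots_[h] = std::move(value);
            return h;
        }
        slots_.emplace_back(std::move(value));
        return slots_.size() - 1;
    }

    // Empty the slot and remember its index for later reuse.
    void remove(Handle h) {
        assert(contains(h));
        slots_[h].reset();
        free_.push_back(h);
    }

    bool contains(Handle h) const {
        return h < slots_.size() && slots_[h].has_value();
    }

    T& operator[](Handle h) {
        assert(contains(h));
        return *slots_[h];
    }

    // Visit every live element, skipping empty slots.
    template <typename F>
    void for_each(F&& f) {
        for (auto& slot : slots_) {
            if (slot) f(*slot);
        }
    }

    std::size_t size() const { return slots_.size() - free_.size(); }

private:
    std::vector<std::optional<T>> slots_;  // first array: elements (or empty)
    std::vector<Handle> free_;             // second array: indices of free slots
};
```

With this layout, adding and removing only touch the back of the free-index vector, so neither needs a per-element allocation once the vectors have grown to their working size. A handle stays valid until its own element is removed, after which the same index may be handed out again.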

Does anything like this exist already? If I do my own, I'll have to also make my own iterators, so you can iterate the set avoiding any empty slots in the array, and I don't fancy that very much.

Thanks.

Note: The kind of operations I want to perform on the set are:

  1. Iteration
  2. Random access of individual elements, by index (or "handle" as I'm thinking of it)
  3. Removal of an element anywhere in the set
  4. Addition of an element to the set (order unimportant)

4 solutions

#1


1  

std::list<T> actually does sound exactly like the theoretically correct data structure for your job, because it supports the four operations you listed, all with optimal space and time complexity. std::list<T>::iterator is a handle that remains valid even if you add/remove other items to/from the list.
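
As a minimal sketch of using a list iterator as a handle (the aliases Items and Handle and the helper names add_item and remove_item are made up for this example):

```cpp
#include <cassert>
#include <list>

using Items = std::list<int>;
using Handle = Items::iterator;  // stays valid until its own element is erased

// Adds a value and returns a handle to it (order is unimportant here).
inline Handle add_item(Items& items, int value) {
    return items.insert(items.end(), value);
}

// Removes the element behind a handle in O(1); other handles are unaffected.
inline void remove_item(Items& items, Handle h) {
    assert(h != items.end());
    items.erase(h);
}
```

Note that access through such a handle is O(1), whereas reaching the n-th element by position in a std::list means walking the list.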

It may be that there is a custom allocator (i.e. not std::allocator<T>) that you could use with std::list<T, Allocator> to get the performance you want (pool the nodes internally so there is no runtime allocation every time you add or remove a node). But that might be overkill.
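
One way to try the pooled-allocator idea, assuming a C++17 toolchain, is the standard library's polymorphic memory resources. This is a swapped-in technique, not something the answer names: std::pmr::list and std::pmr::unsynchronized_pool_resource are standard, while the function name sum_with_pool is invented for this sketch.

```cpp
#include <cassert>
#include <list>
#include <memory_resource>

// Builds a list whose nodes come from the given memory resource
// and returns the sum of its elements.
inline int sum_with_pool(std::pmr::memory_resource& pool, int count) {
    assert(count > 0);
    std::pmr::list<int> items{&pool};  // every node is allocated from `pool`
    for (int i = 1; i <= count; ++i) items.push_back(i);
    items.pop_front();           // the freed node goes back to the pool
    items.push_back(count + 1);  // a pool resource can hand that block out again
    int sum = 0;
    for (int v : items) sum += v;
    return sum;
}
```

A caller would create the resource once, for example `std::pmr::unsynchronized_pool_resource pool;`, and pass it in. Whether this beats the default allocator depends on the workload, so it is worth measuring before committing to it.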

I would start by just using a std::list<T> with the default allocator, and only look at custom allocators or other data structures if you find the performance is too poor for your application.

#2


1  

If maintaining order of elements is irrelevant, use swap-and-pop.

Copy/move the last element over the one to be removed, then pop the back element. Super easy and efficient. You don't even need to bother with special checks for removing the element since it'll Just Work(tm) if you use the standard C++ vector and operations.

*iter = std::move(container.back());
container.pop_back();
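
Wrapped up as a reusable helper, the same idea might look like this. The function name swap_and_pop and the index-based interface are choices made for this sketch, and the explicit check for the last element is an addition that avoids a self-move-assignment when the element being removed is already at the back.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Removes v[index] in O(1) by overwriting it with the last element.
// Element order is not preserved.
template <typename T>
void swap_and_pop(std::vector<T>& v, std::size_t index) {
    assert(index < v.size());
    if (index + 1 != v.size()) {
        v[index] = std::move(v.back());  // skipped when removing the last element
    }
    v.pop_back();
}
```

Because the last element moves into the vacated position, any index or iterator that referred to the old last element now refers to a different slot, so this only suits callers that do not hold long-lived handles.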

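pop_back() on a vector only invalidates iterators and references to the removed last element (and end()), so iter stays valid unless it pointed at the last element. For mutations that can reallocate or shift elements, just use indices directly or recalculate a new valid iterator.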

auto delta = iter - container.begin();
// mutate container
iter = container.begin() + delta;

#3


1  

You can use a single array by storing the information about the "empty" slots in the space of the empty slots.

For a contiguous block of empty slots in your array A, say of k slots starting from index n, store (k, n') at location A[n] (where n' is the index of the next block of free indexes). You may have to pack the two ints into a single word if your array is storing word-sized objects.

You're essentially storing a linked-list of free blocks, like a memory-manager might do.

It's a bit of a pain to code, but this'll allow you to allocate a free index in O(1) time, and to iterate through the allocated indices in O(n) time, where n is the number of allocated slots. Freeing an index will be O(n) time though in the worst case: this is the same problem as fragmented memory.

For the first free block, you can either store the index separately, or have the convention that you never allocate A[0] so you can always start a free-index search from there.
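
A simplified sketch of the single-array approach is shown below. It deliberately differs from the scheme described above: every free "block" is exactly one slot, so a free slot stores only the index of the next free slot (the n' part) and no run length k. The class name InPlaceFreeList and the use of std::variant (C++17) to overlay the link and the element are assumptions of this sketch. The head of the free list is stored separately, which is the first of the two options mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical name; simplified so each free "block" is a single slot.
template <typename T>
class InPlaceFreeList {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    std::size_t add(T value) {
        if (first_free_ == npos) {  // no free slot: grow the array
            slots_.emplace_back(std::in_place_index<1>, std::move(value));
            return slots_.size() - 1;
        }
        std::size_t index = first_free_;
        first_free_ = std::get<0>(slots_[index]);  // follow the link stored in the slot
        slots_[index].template emplace<1>(std::move(value));
        return index;
    }

    void remove(std::size_t index) {
        assert(is_live(index));
        slots_[index].template emplace<0>(first_free_);  // slot now holds the old head
        first_free_ = index;
    }

    bool is_live(std::size_t index) const {
        return index < slots_.size() && slots_[index].index() == 1;
    }

    T& at(std::size_t index) {
        assert(is_live(index));
        return std::get<1>(slots_[index]);
    }

private:
    // Alternative 0: index of the next free slot. Alternative 1: a live element.
    using Slot = std::variant<std::size_t, T>;
    std::vector<Slot> slots_;
    std::size_t first_free_ = npos;  // head of the free list, stored separately
};
```

In this single-slot variant both allocation and freeing are O(1), but iteration has to test every slot, so it costs time proportional to the array's capacity, not to the number of live elements. Tracking runs of free slots as (k, n') pairs is what would let iteration skip whole gaps, at the cost of the more expensive free operation.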

#4


0  

std::map might be useful in your case.
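
For illustration only, a map keyed by integer handles might be used like this. The wrapper name HandleMap, its members, and the choice of std::string as the element type are invented for this sketch. Note that std::map allocates a node per element, much as std::list does.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical wrapper: hands out increasing integer handles as map keys.
class HandleMap {
public:
    int add(std::string value) {
        int handle = next_++;
        bool inserted = items_.emplace(handle, std::move(value)).second;
        assert(inserted);  // handles are never reused, so the key is always new
        (void)inserted;
        return handle;
    }

    // Returns true if an element with that handle existed and was removed.
    bool remove(int handle) { return items_.erase(handle) == 1; }

    // Returns nullptr when the handle is unknown or already removed.
    const std::string* find(int handle) const {
        auto it = items_.find(handle);
        return it == items_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::map<int, std::string> items_;  // iteration visits handles in increasing order
    int next_ = 0;
};
```

Lookup, insertion, and removal are all O(log n) here, and a stale handle is detected by a failed lookup, since handles are never handed out twice.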
