对于非重复项目,最有效的std容器是什么?

时间:2022-09-05 22:52:11

What is the most efficient way of adding non-repeated elements into STL container and what kind of container is the fastest? I have a large amount of data and I'm afraid each time I try to check if it is a new element or not, it takes a lot of time. I hope map be very fast.

将非重复元素添加到STL容器中的最有效方法是什么?哪种容器最快?我有大量的数据,我担心每次尝试检查它是否是新元素时,都需要花费很多时间。我希望地图非常快。

// 1- Map
map<int, int> Map;
...
if(Map.find(Element)!=Map.end()) Map[Element]=ID;

// 2-Vector
vector<int> Vec;
...
if(find(Vec.begin(), Vec.end(), Element)!=Vec.end()) Vec.push_back(Element);

// 3-Set
// Edit: I made a mistake: set::find is O(LogN) not O(N)

6 个解决方案

#1


12  

Both set and map has O(log(N)) performance for looking up keys. vector has O(N).

set和map都具有O(log(N))性能,可用于查找键。矢量有O(N)。

The difference between set and map, as far as you should be concerned, is whether you need to associate a key with a value, or just store a value directly. If you need the former, use a map, if you need the latter, use a set.

就您应该关注的而言,集合和映射之间的区别在于您是需要将键与值关联,还是仅直接存储值。如果您需要前者,请使用地图,如果您需要后者,请使用一套。

In both cases, you should just use insert() instead of doing a find().

在这两种情况下,您应该只使用insert()而不是find()。

The reason is insert() will insert the value into the container if and only if the container does not already contain that value (in the case of map, if the container does not contain that key). This might look like

原因是当且仅当容器尚未包含该值时,insert()才会将值插入容器中(如果容器不包含该键,则为map的情况)。这可能看起来像

Map.insert(std::make_pair(Element, ID));

for a map or

对于地图或

Set.insert(Element);

for a set.

一套。

You can consult the return value to determine whether or not an insertion was actually performed.

您可以查阅返回值以确定是否实际执行了插入。


If you're using C++11, you have two more choices, which are std::unordered_map and std::unordered_set. These both have amortized O(1) performance for insertions and lookups. However, they also require that the key (or value, in the case of set) be hashable, which means you'll need to specialize std::hash<> for your key. Conversely, std::map and std::set require that your key (or value, in the case of set) respond to operator<().

如果您使用的是C ++ 11,则还有两个选项,即std :: unordered_map和std :: unordered_set。这些都为插入和查找分摊了O(1)性能。但是,它们还要求密钥(或值,在set的情况下)是可清除的,这意味着您需要为密钥专门化std :: hash <>。相反,std :: map和std :: set要求您的键(或值,在set的情况下)响应operator <()。

#2


6  

If you're using C++11, you can use std::unordered_set. That would allow you O(1) existence-checking (technically amortized O(1) -- O(n) in the worst case).

如果您使用的是C ++ 11,则可以使用std :: unordered_set。这将允许你进行O(1)存在检查(在最坏的情况下技术上摊销的O(1) - O(n))。

std::set would probably be your second choice with O(lg n).

std :: set可能是你的第二个选择O(lg n)。

Basically, std::unordered_set is a hash table and std::set is a tree structure (a red black tree in every implementation I've ever seen)1.

基本上,std :: unordered_set是一个哈希表,std :: set是一个树结构(在我见过的每个实现中都是一个红黑树)1。

Depending on how well your hashes distribute and how many items you have, a std::set might actually be faster. If it's truly performance critical, then as always, you'll want to do benchmarking.

根据您的哈希分布的程度以及您拥有的项目数量,std :: set实际上可能更快。如果它真的对性能至关重要,那么一如既往,您将需要进行基准测试。

1) Technically speaking, I don't believe either are required to be implemented as a hash table or as a balanced BST. If I remember correctly, the standard just mandates the run time bounds, not the implementation -- it just turns out that those are the only viable implementations that fit the bounds.

1)从技术上讲,我不相信任何一个都需要实现为哈希表或平衡的BST。如果我没记错的话,标准只是强制执行运行时限,而不是实现 - 它只是证明那些是唯一适合边界的可行实现。

#3


3  

You should use a std::set; it is a container designed to hold a single (equivalent) copy of an object and is implemented as a binary search tree. Therefore, it is O(log N), not O(N), in the size of the container.

你应该使用std :: set;它是一个容器,用于保存对象的单个(等效)副本,并实现为二叉搜索树。因此,容器的大小为O(log N),而不是O(N)。

std::set and std::map often share a large part of their underlying implementation; you should check out your local STL implementation.

std :: set和std :: map经常共享其底层实现的很大一部分;你应该检查你当地的STL实施。

Having said all this, complexity is only one measure of performance. You may have better performance using a sorted vector, as it keeps the data local to one another and, therefore, more likely to hit the caches. Cache coherence is a large part of data structure design these days.

说到这一切,复杂性只是性能的一个衡量标准。使用排序向量可能会有更好的性能,因为它会使数据彼此保持局部性,因此更有可能触及缓存。如今,缓存一致性是数据结构设计的重要组成部分。

#4


1  

Sounds like you want to use a std::set. It's elements are unique, so you don't need to care about uniqueness when adding elements, and a.find(k) (where a is an std::set and k is a value) is defined as being logarithmic in complexity.

听起来你想要使用std :: set。它的元素是唯一的,因此在添加元素时不需要关心唯一性,并且a.find(k)(其中a是std :: set且k是值)被定义为复杂性的对数。

#5


1  

if your elements can be hashed for O(1), then better to use an index in a unordered_map or unordered_set (not in a map/set because they use RB tree in implementation which is O(logN) find complexity)

如果你的元素可以为O(1)进行散列,那么最好在unordered_map或unordered_set中使用索引(不在map / set中,因为它们在实现中使用RB树,这是O(logN)发现复杂性)

#6


1  

Your examples show a definite pattern:

您的示例显示了明确的模式:

check if the value is already in container
  if not, add the value to the container.

Both of these operation would potentially take some time. First, looking up an element can be done in O(N) time (linear search) if the elements are not arranged in any particular manner (e.g., just a plain std::vector), it could be done in O(logN) time (binary search) if the elements are sorted (e.g., either std::map or std::set), and it could be done in O(1) time if the elements are hashed (e.g., either std::unordered_map or std::unordered_set).

这两种操作都可能需要一些时间。首先,如果元素没有以任何特定的方式排列(例如,只是普通的std :: vector),可以在O(N)时间(线性搜索)中查找元素,它可以在O(logN)中完成time(binary search)如果元素被排序(例如,std :: map或std :: set),并且如果元素被散列(例如,std :: unordered_map或者std :: unordered_map,则可以在O(1)时间内完成)的std :: unordered_set)。

The insertion will be O(1) (amortized) for a plain vector or an unordered container (hash container), although the hash container will be a bit slower. For a sorted container like set or map, you'll have log-time insertions because it needs to look for the place to insert it before inserting it.

对于普通向量或无序容器(哈希容器),插入将是O(1)(摊销),尽管哈希容器会慢一点。对于像set或map这样的已排序容器,您将具有日志时间插入,因为它需要在插入之前查找插入它的位置。

So, the conclusion, use std::unordered_set or std::unordered_map (if you need the key-value feature). And you don't need to check before doing the insertion, these are unique-key containers, they don't allow duplicates.

所以,结论是,使用std :: unordered_set或std :: unordered_map(如果你需要键值功能)。并且您在插入之前不需要检查,这些是唯一密钥容器,它们不允许重复。

If std::unordered_set / std::unordered_map (from C++11) or std::tr1::unordered_set / std::tr1::unordered_map (since 2007) are not available to you (or any equivalent), then the next best alternative is std::set / std::map.

如果您(或任何等价物)无法使用std :: unordered_set / std :: unordered_map(来自C ++ 11)或std :: tr1 :: unordered_set / std :: tr1 :: unordered_map(自2007年起),那么下一个最好的选择是std :: set / std :: map。

#1


12  

Both set and map has O(log(N)) performance for looking up keys. vector has O(N).

set和map都具有O(log(N))性能,可用于查找键。矢量有O(N)。

The difference between set and map, as far as you should be concerned, is whether you need to associate a key with a value, or just store a value directly. If you need the former, use a map, if you need the latter, use a set.

就您应该关注的而言,集合和映射之间的区别在于您是需要将键与值关联,还是仅直接存储值。如果您需要前者,请使用地图,如果您需要后者,请使用一套。

In both cases, you should just use insert() instead of doing a find().

在这两种情况下,您应该只使用insert()而不是find()。

The reason is insert() will insert the value into the container if and only if the container does not already contain that value (in the case of map, if the container does not contain that key). This might look like

原因是当且仅当容器尚未包含该值时,insert()才会将值插入容器中(如果容器不包含该键,则为map的情况)。这可能看起来像

Map.insert(std::make_pair(Element, ID));

for a map or

对于地图或

Set.insert(Element);

for a set.

一套。

You can consult the return value to determine whether or not an insertion was actually performed.

您可以查阅返回值以确定是否实际执行了插入。


If you're using C++11, you have two more choices, which are std::unordered_map and std::unordered_set. These both have amortized O(1) performance for insertions and lookups. However, they also require that the key (or value, in the case of set) be hashable, which means you'll need to specialize std::hash<> for your key. Conversely, std::map and std::set require that your key (or value, in the case of set) respond to operator<().

如果您使用的是C ++ 11,则还有两个选项,即std :: unordered_map和std :: unordered_set。这些都为插入和查找分摊了O(1)性能。但是,它们还要求密钥(或值,在set的情况下)是可清除的,这意味着您需要为密钥专门化std :: hash <>。相反,std :: map和std :: set要求您的键(或值,在set的情况下)响应operator <()。

#2


6  

If you're using C++11, you can use std::unordered_set. That would allow you O(1) existence-checking (technically amortized O(1) -- O(n) in the worst case).

如果您使用的是C ++ 11,则可以使用std :: unordered_set。这将允许你进行O(1)存在检查(在最坏的情况下技术上摊销的O(1) - O(n))。

std::set would probably be your second choice with O(lg n).

std :: set可能是你的第二个选择O(lg n)。

Basically, std::unordered_set is a hash table and std::set is a tree structure (a red black tree in every implementation I've ever seen)1.

基本上,std :: unordered_set是一个哈希表,std :: set是一个树结构(在我见过的每个实现中都是一个红黑树)1。

Depending on how well your hashes distribute and how many items you have, a std::set might actually be faster. If it's truly performance critical, then as always, you'll want to do benchmarking.

根据您的哈希分布的程度以及您拥有的项目数量,std :: set实际上可能更快。如果它真的对性能至关重要,那么一如既往,您将需要进行基准测试。

1) Technically speaking, I don't believe either are required to be implemented as a hash table or as a balanced BST. If I remember correctly, the standard just mandates the run time bounds, not the implementation -- it just turns out that those are the only viable implementations that fit the bounds.

1)从技术上讲,我不相信任何一个都需要实现为哈希表或平衡的BST。如果我没记错的话,标准只是强制执行运行时限,而不是实现 - 它只是证明那些是唯一适合边界的可行实现。

#3


3  

You should use a std::set; it is a container designed to hold a single (equivalent) copy of an object and is implemented as a binary search tree. Therefore, it is O(log N), not O(N), in the size of the container.

你应该使用std :: set;它是一个容器,用于保存对象的单个(等效)副本,并实现为二叉搜索树。因此,容器的大小为O(log N),而不是O(N)。

std::set and std::map often share a large part of their underlying implementation; you should check out your local STL implementation.

std :: set和std :: map经常共享其底层实现的很大一部分;你应该检查你当地的STL实施。

Having said all this, complexity is only one measure of performance. You may have better performance using a sorted vector, as it keeps the data local to one another and, therefore, more likely to hit the caches. Cache coherence is a large part of data structure design these days.

说到这一切,复杂性只是性能的一个衡量标准。使用排序向量可能会有更好的性能,因为它会使数据彼此保持局部性,因此更有可能触及缓存。如今,缓存一致性是数据结构设计的重要组成部分。

#4


1  

Sounds like you want to use a std::set. It's elements are unique, so you don't need to care about uniqueness when adding elements, and a.find(k) (where a is an std::set and k is a value) is defined as being logarithmic in complexity.

听起来你想要使用std :: set。它的元素是唯一的,因此在添加元素时不需要关心唯一性,并且a.find(k)(其中a是std :: set且k是值)被定义为复杂性的对数。

#5


1  

if your elements can be hashed for O(1), then better to use an index in a unordered_map or unordered_set (not in a map/set because they use RB tree in implementation which is O(logN) find complexity)

如果你的元素可以为O(1)进行散列,那么最好在unordered_map或unordered_set中使用索引(不在map / set中,因为它们在实现中使用RB树,这是O(logN)发现复杂性)

#6


1  

Your examples show a definite pattern:

您的示例显示了明确的模式:

check if the value is already in container
  if not, add the value to the container.

Both of these operation would potentially take some time. First, looking up an element can be done in O(N) time (linear search) if the elements are not arranged in any particular manner (e.g., just a plain std::vector), it could be done in O(logN) time (binary search) if the elements are sorted (e.g., either std::map or std::set), and it could be done in O(1) time if the elements are hashed (e.g., either std::unordered_map or std::unordered_set).

这两种操作都可能需要一些时间。首先,如果元素没有以任何特定的方式排列(例如,只是普通的std :: vector),可以在O(N)时间(线性搜索)中查找元素,它可以在O(logN)中完成time(binary search)如果元素被排序(例如,std :: map或std :: set),并且如果元素被散列(例如,std :: unordered_map或者std :: unordered_map,则可以在O(1)时间内完成)的std :: unordered_set)。

The insertion will be O(1) (amortized) for a plain vector or an unordered container (hash container), although the hash container will be a bit slower. For a sorted container like set or map, you'll have log-time insertions because it needs to look for the place to insert it before inserting it.

对于普通向量或无序容器(哈希容器),插入将是O(1)(摊销),尽管哈希容器会慢一点。对于像set或map这样的已排序容器,您将具有日志时间插入,因为它需要在插入之前查找插入它的位置。

So, the conclusion, use std::unordered_set or std::unordered_map (if you need the key-value feature). And you don't need to check before doing the insertion, these are unique-key containers, they don't allow duplicates.

所以,结论是,使用std :: unordered_set或std :: unordered_map(如果你需要键值功能)。并且您在插入之前不需要检查,这些是唯一密钥容器,它们不允许重复。

If std::unordered_set / std::unordered_map (from C++11) or std::tr1::unordered_set / std::tr1::unordered_map (since 2007) are not available to you (or any equivalent), then the next best alternative is std::set / std::map.

如果您(或任何等价物)无法使用std :: unordered_set / std :: unordered_map(来自C ++ 11)或std :: tr1 :: unordered_set / std :: tr1 :: unordered_map(自2007年起),那么下一个最好的选择是std :: set / std :: map。