I am wondering why std::map and std::set use std::less as the default functor to compare keys. Why not use a functor that works similarly to strcmp? Something like:
template <typename T> struct compare
{
    // Return less than 0 if lhs < rhs
    // Return 0 if lhs == rhs
    // Return greater than 0 if lhs > rhs
    // (sketch only: assumes T supports subtraction with a result convertible to int)
    int operator()(T const& lhs, T const& rhs) const
    {
        return (lhs - rhs);
    }
};
Say a map has two objects in it, with keys key1 and key2. Now we want to insert another object with key key3.
When using std::less, the insert function first needs to call std::less::operator() with key1 and key3. Assume std::less::operator()(key1, key3) returns false. It then has to call std::less::operator() again with the keys switched, std::less::operator()(key3, key1), to decide whether key1 is equal to key3 or key3 is less than key1. So two calls to std::less::operator() are needed to make a decision whenever the first call returns false.
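One way to check how many calls a given standard library really makes is to count them. The sketch below is only an illustration: `CountingLess` and `comparisons_for_insert` are hypothetical names, not part of the standard library, and the exact count depends on the implementation and on the shape of the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical helper: behaves like std::less<int> but counts its calls.
struct CountingLess {
    std::size_t* calls;  // external counter, not owned by the comparator
    bool operator()(int lhs, int rhs) const {
        ++*calls;
        return lhs < rhs;
    }
};

// Returns how many comparator calls a single emplace of `key` costs
// in a map that already holds the keys 0 .. n-1.
std::size_t comparisons_for_insert(int n, int key) {
    assert(n >= 0);
    std::size_t calls = 0;
    std::map<int, int, CountingLess> m(CountingLess{&calls});
    for (int i = 0; i < n; ++i) {
        m.emplace(i, i);
    }
    calls = 0;  // ignore the cost of building the map
    m.emplace(key, 0);
    return calls;
}
```

For example, `comparisons_for_insert(1000, 500)` reports the cost of trying to insert a key that is already present. A binary tree with 1000 keys is at least 10 levels deep, so comparing the reported count with that depth gives a rough sense of how many calls are made per level.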
Had std::map::insert used compare, there would be sufficient information to make the right decision using just one call.
Depending on the type of the key in the map, std::less::operator()(key1, key2) could be expensive.
Unless I am missing something very basic, shouldn't std::map and std::set use something like compare instead of std::less as the default functor to compare keys?
2 Answers
#1
20
I decided to ask Alexander Stepanov (designer of the STL) about this. I'm allowed to quote him as follows:
Originally, I proposed 3-way comparisons. The standard committee asked me to change to standard comparison operators. I did what I was told. I have been advocating adding 3-way components to the standard for over 20 years.
But note that perhaps unintuitively, 2-way is not a huge overhead. You don't have to make twice as many comparisons. It's only one comparison per node on the way down (no equality check). The cost is not being able to return early (when the key is in a non-leaf) and one extra comparison at the end (swapping the arguments to check equality). If I'm not mistaken, that makes
1 + 1/2*1 + 1/4*2 + 1/8*3 + ...
= 1 + 1/2+1/4+1/8+... + 1/4+1/8+... + ...
-> 3 (depth -> infty)
extra comparisons on average on a balanced tree that contains the queried element.
On the other hand, 3-way comparison doesn't have terrible overhead either (see "Branchless 3-way integer comparison"). Whether an extra branch at each node to check the comparison result against 0 (equality) costs less than paying ~3 extra comparisons at the end is another question. It probably doesn't matter much. But I think the comparison itself should have been 3-valued, so that the decision whether to use all 3 outcomes could be changed.
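The trade-off described above can be made concrete with a small sketch. It uses binary search over a sorted vector as a stand-in for a balanced tree, which is a simplifying assumption, and `SearchResult`, `find_two_way` and `find_three_way` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Outcome of a search: whether the key was found and how many
// comparator calls were needed.
struct SearchResult {
    bool found;
    std::size_t comparisons;
};

// 2-way search: one less-than call per step and no early exit, plus
// one call with swapped arguments at the end to test equivalence.
SearchResult find_two_way(const std::vector<int>& sorted, int key) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::size_t calls = 0;
    auto less = [&calls](int a, int b) { ++calls; return a < b; };
    std::size_t lo = 0, hi = sorted.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (less(sorted[mid], key)) lo = mid + 1;
        else hi = mid;
    }
    bool found = lo < sorted.size() && !less(key, sorted[lo]);
    return {found, calls};
}

// 3-way search: one three-valued call per step, returning as soon as
// the comparison reports equality.
SearchResult find_three_way(const std::vector<int>& sorted, int key) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::size_t calls = 0;
    auto cmp = [&calls](int a, int b) { ++calls; return (a > b) - (a < b); };
    std::size_t lo = 0, hi = sorted.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        int c = cmp(sorted[mid], key);
        if (c == 0) return {true, calls};
        if (c < 0) lo = mid + 1;
        else hi = mid;
    }
    return {false, calls};
}
```

On a sorted vector of 1023 keys, the 2-way version takes 11 calls for every key that is present, while the 3-way version takes between 1 and 10. What the sketch does not measure is the cost of the extra equality branch inside the 3-way loop, which is the open question raised above.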
Update: See comments below for why I think that 3-way comparison is better in trees, but not necessarily in flat arrays.
#2
17
Tree-based containers only require a strict weak ordering.
See https://www.sgi.com/tech/stl/StrictWeakOrdering.html
- write access
The insertion point for maps and sets is determined purely by a single binary search, e.g. lower_bound or upper_bound. The runtime complexity of binary search is O(log n).
- read access
The same applies to searching: the search is vastly more efficient than a linear equality scan, precisely because most elements do not need to be compared. The trick is that the containers are ordered.
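The write-access case above can be sketched with the public std::map interface. This is a minimal illustration, not how any particular library implements insert internally; `insert_if_absent` is a hypothetical helper name.

```cpp
#include <map>
#include <string>

// Inserts (key, value) unless an equivalent key already exists.
// Returns true when a new element was inserted.
bool insert_if_absent(std::map<int, std::string>& m, int key,
                      const std::string& value) {
    // First element whose key is not ordered before `key`.
    auto pos = m.lower_bound(key);
    // One call with swapped arguments settles equivalence:
    // lower_bound already established !(pos->first < key).
    if (pos != m.end() && !m.key_comp()(key, pos->first)) {
        return false;
    }
    // pos is exactly where the new element belongs, so pass it as a hint.
    m.emplace_hint(pos, key, value);
    return true;
}
```

The helper only ever uses the ordering predicate: a single search finds the position, and a single extra comparison decides whether an equivalent key is already there.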
The upshot is that equality information need not be present. It is enough that items can have equivalent ordering.
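A short sketch of the difference between equivalence and equality, assuming a case-insensitive ordering chosen purely for illustration. `CaseInsensitiveLess`, `equivalent` and `distinct_after_insert` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical comparator: orders strings while ignoring letter case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Two keys are equivalent when neither is ordered before the other.
// operator== is never consulted.
bool equivalent(const std::string& a, const std::string& b) {
    CaseInsensitiveLess less;
    return !less(a, b) && !less(b, a);
}

// Number of elements a set keeps after inserting both keys.
std::size_t distinct_after_insert(const std::string& a, const std::string& b) {
    std::set<std::string, CaseInsensitiveLess> s;
    s.insert(a);
    s.insert(b);
    return s.size();
}
```

With this comparator, "Key" and "KEY" are different strings under operator== but equivalent under the ordering, so a set holding one of them rejects the other.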
In practice this just means fewer constraints on your element types, less work to implement the requirements, and optimal performance in common usage scenarios. There will always be trade-offs. (E.g. for large collections, hash tables (unordered sets and maps) are often more efficient. Note that these do require equatable elements, and they employ a hashing scheme for fast lookup.)
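To illustrate the difference in requirements, here is a sketch with a hypothetical `Point` type. The ordered set needs only an ordering predicate, while the unordered set needs both a hash and an equality predicate. All type names and the hash-mixing formula are assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

// Hypothetical element type with no operators of its own.
struct Point {
    int x;
    int y;
};

// Ordered containers: a strict weak ordering is all that is required.
struct PointLess {
    bool operator()(const Point& a, const Point& b) const {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    }
};

// Hash containers: a hash function ...
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h1 = std::hash<int>{}(p.x);
        std::size_t h2 = std::hash<int>{}(p.y);
        // A common mixing pattern, used here only for illustration.
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// ... and an equality predicate.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

using OrderedPoints = std::set<Point, PointLess>;
using HashedPoints = std::unordered_set<Point, PointHash, PointEqual>;
```

Both aliases store each distinct point once. The ordered version got there with one predicate, the hashed version needed two.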