为什么标准迭代器范围[begin, end]而不是[begin, end]?

时间:2021-09-05 18:18:58

Why does the Standard define end() as one past the end, instead of at the actual end?

为什么标准将end()定义为结束后的一个,而不是实际的结束?

7 个解决方案

#1


270  

The best argument easily is the one made by Dijkstra himself:

最简单的论据就是Dijkstra自己提出的:

  • You want the size of the range to be a simple difference end − begin;

    你想要的大小范围是一个简单的差异最终−开始;

  • including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

    当序列退化为空的时候,包含下界更“自然”,而且还因为另一种选择(不包括下界)需要存在一个“一开始”的前哨值。

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

你仍然需要证明为什么你开始数0而不是1,但这不是你问题的一部分。

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n0, n1)[n1, n2)[n2,n3). Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

当你有任何一种算法处理多个嵌套或迭代调用的基于区间结构的调用时,(开始、结束)约定的智慧就会得到回报。相比之下,使用双闭合范围将会导致一个接一个的、非常不愉快和嘈杂的代码。例如,考虑一个分区[n0, n1] [n1, n2] [n2,n3]。另一个例子是标准的迭代循环(it = begin;它! =结束;+it),运行结束-开始时间。如果两个端点都包含在内,相应的代码的可读性将大大降低——想象一下如何处理空范围。

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.

最后,我们也可以做一个好的论点为什么计数应该从0开始:半开的公约的范围,我们就建立了,如果你有一系列的N个元素(例如数组枚举的成员),0是自然的“开始”,然后你可以写范围为(0,N),没有任何尴尬的补偿和修正。

In a nutshell: the fact that we don't see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.

简而言之:在基于范围的算法中,我们并没有在所有地方看到数字1,这是[开始,结束]约定的直接后果和动机。

#2


71  

Why does the Standard define end() as one past the end, instead of at the actual end?

为什么标准将end()定义为结束后的一个,而不是实际的结束?

Because:

因为:

  1. It avoids special handling for empty ranges. For empty ranges, begin() is equal to end() &
  2. 它避免了对空范围的特殊处理。对于空范围,begin()等于end() &
  3. It makes the end criterion simple for loops that iterate over the elements: The loops simply continue as long as end() is not reached.
  4. 它使循环遍历元素的结束条件变得简单:只要没有到达end(),循环就会继续。

#3


67  

Actually, a lot of iterator related stuff suddenly makes much more sense if you consider the iterators not pointing at the elements of the sequence but in between, with dereferencing accessing the next element right to it. Then the "one past end" iterator suddenly makes immediate sense:

实际上,如果考虑到迭代器不是指向序列的元素,而是指向序列之间的元素,并取消对其下一个元素的引用,那么很多与迭代器相关的内容会突然变得更有意义。然后“one past end”迭代器突然有了立意:

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^               ^
   |               |
 begin            end

Obviously begin points to the beginning of the sequence, and end points to the end of the same sequence. Dereferencing begin accesses the element A, and dereferencing end makes no sense because there's no element right to it. Also, adding an iterator i in the middle gives

显然,开始点指向序列的开始点,结束点指向同一个序列的结束点。取消引用开始访问元素A,而取消引用结束没有意义,因为它没有元素。此外,在中间添加迭代器i也会得到这个结果

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
 begin     i      end

and you immediately see that the range of elements from begin to i contains the elements A and B while the range of elements from i to end contains the elements C and D. Dereferencing i gives the element right of it, that is the first element of the second sequence.

你马上看到的范围从开始我包含元素A和B,而元素的范围从我结束包含元素C和d .废弃我给元素,这是第二个序列的第一个元素。

Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:

甚至反向迭代器的“off-by-one”也突然变得如此明显:反转序列会得到:

   +---+---+---+---+
   | D | C | B | A |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
rbegin     ri     rend
 (end)    (i)   (begin)

I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i (which I've named ri) still points in between elements B and C. However due to reversing the sequence, now element B is on the right to it.

我在下面的括号中编写了相应的非反向(基本)迭代器。你看,属于i(我命名为ri)的反向迭代器仍然指向元素B和c之间。但是由于顺序颠倒,现在元素B在它的右边。

#4


58  

Because then

因为这样

size() == end() - begin()   // For iterators for whom subtraction is valid

and you won't have to do awkward things like

你也不需要做一些尴尬的事情

// Never mind that this is INVALID for input iterators...
bool empty() { return begin() == end() + 1; }

and you won't accidentally write erroneous code like

你不会不小心写出错误的代码

bool empty() { return begin() == end() - 1; }    // a typo from the first version
                                                 // of this post
                                                 // (see, it really is confusing)

bool empty() { return end() - begin() == -1; }   // Signed/unsigned mismatch
// Plus the fact that subtracting is also invalid for many iterators

Also: What would find() return if end() pointed to a valid element?
Do you really want another member called invalid() which returns an invalid iterator?!
Two iterators is already painful enough...

另外:如果end()指向一个有效的元素,会发现()返回什么?您真的希望另一个名为invalid()的成员返回一个无效的迭代器吗?两个迭代器已经够痛苦的了……

Oh, and see this related post.

哦,看看这个相关的帖子。


Also:

If the end was before the last element, how would you insert() at the true end?!

如果结束在最后一个元素之前,那么如何在真正的结束处插入()?

#5


22  

The iterator idiom of half-closed ranges [begin(), end()) is originally based on pointer arithmetic for plain arrays. In that mode of operation, you would have functions that were passed an array and a size.

半闭范围[begin(), end())]的迭代器习惯用法最初是基于普通数组的指针算法。在这种操作模式下,您将拥有传递数组和大小的函数。

void func(int* array, size_t size)

Converting to half-closed ranges [begin, end) is very simple when you have that information:

转换到半闭合范围[开始,结束]是非常简单的,当你有那个信息:

int* begin;
int* end = array + size;

for (int* it = begin; it < end; ++it) { ... }

To work with fully-closed ranges, it's harder:

要在完全闭合的范围内工作,就更难了:

int* begin;
int* end = array + size - 1;

for (int* it = begin; it <= end; ++it) { ... }

Since pointers to arrays are iterators in C++ (and the syntax was designed to allow this), it's much easier to call std::find(array, array + size, some_value) than it is to call std::find(array, array + size - 1, some_value).

由于指向数组的指针在c++中是迭代器(语法被设计为允许这样做),所以调用std: find(数组、数组+ size、some_value)要比调用std::find(数组、数组+ size - 1、some_value)容易得多。


Plus, if you work with half-closed ranges, you can use the != operator to check for the end condition, becuase (if your operators are defined correctly) < implies !=.

另外,如果您使用的是半闭范围,您可以使用!=操作符检查结束条件,因为(如果您的操作符定义正确)< implies !=。

for (int* it = begin; it != end; ++ it) { ... }

However there's no easy way to do this with fully-closed ranges. You're stuck with <=.

然而,对于完全闭合的范围来说,要做到这一点并不容易。你坚持< =。

The only kind of iterator that supports < and > operations in C++ are random-access iterators. If you had to write a <= operator for every iterator class in C++, you'd have to make all of your iterators fully comparable, and you'd fewer choices for creating less capable iterators (such as the bidirectional iterators on std::list, or the input iterators that operate on iostreams) if C++ used fully-closed ranges.

在c++中,唯一支持 <和> 操作的迭代器是随机访问迭代器。如果你有写< =操作符为每个迭代器类在c++中,你必须使你所有的迭代器完全可比,你会选择用于创建更少少迭代器(如双向迭代器在std::列表,或输入迭代器操作iostream)如果使用c++完全关闭。

#6


8  

With the end() pointing one past the end, it is easy to iterate a collection with a for loop:

用end()指向结束后的一个,很容易使用for循环迭代一个集合:

for (iterator it = collection.begin(); it != collection.end(); it++)
{
    DoStuff(*it);
}

With end() pointing to the last element, a loop would be more complex:

使用end()指向最后一个元素,循环将更加复杂:

iterator it = collection.begin();
while (!collection.empty())
{
    DoStuff(*it);

    if (it == collection.end())
        break;

    it++;
}

#7


0  

  1. If a container is empty, begin() == end().
  2. 如果容器是空的,那么begin() = end()。
  3. C++ Programmers tend to use != instead of < (less than) in loop conditions, therefore having end() pointing to a position one off-the-end is convenient.
  4. c++程序员倾向于在循环条件下使用!=而不是<(小于),因此有end()指向一个现成的位置是很方便的。

#1


270  

The best argument easily is the one made by Dijkstra himself:

最简单的论据就是Dijkstra自己提出的:

  • You want the size of the range to be a simple difference end − begin;

    你想要的大小范围是一个简单的差异最终−开始;

  • including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

    当序列退化为空的时候,包含下界更“自然”,而且还因为另一种选择(不包括下界)需要存在一个“一开始”的前哨值。

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

你仍然需要证明为什么你开始数0而不是1,但这不是你问题的一部分。

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n0, n1)[n1, n2)[n2,n3). Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

当你有任何一种算法处理多个嵌套或迭代调用的基于区间结构的调用时,(开始、结束)约定的智慧就会得到回报。相比之下,使用双闭合范围将会导致一个接一个的、非常不愉快和嘈杂的代码。例如,考虑一个分区[n0, n1] [n1, n2] [n2,n3]。另一个例子是标准的迭代循环(it = begin;它! =结束;+it),运行结束-开始时间。如果两个端点都包含在内,相应的代码的可读性将大大降低——想象一下如何处理空范围。

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.

最后,我们也可以做一个好的论点为什么计数应该从0开始:半开的公约的范围,我们就建立了,如果你有一系列的N个元素(例如数组枚举的成员),0是自然的“开始”,然后你可以写范围为(0,N),没有任何尴尬的补偿和修正。

In a nutshell: the fact that we don't see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.

简而言之:在基于范围的算法中,我们并没有在所有地方看到数字1,这是[开始,结束]约定的直接后果和动机。

#2


71  

Why does the Standard define end() as one past the end, instead of at the actual end?

为什么标准将end()定义为结束后的一个,而不是实际的结束?

Because:

因为:

  1. It avoids special handling for empty ranges. For empty ranges, begin() is equal to end() &
  2. 它避免了对空范围的特殊处理。对于空范围,begin()等于end() &
  3. It makes the end criterion simple for loops that iterate over the elements: The loops simply continue as long as end() is not reached.
  4. 它使循环遍历元素的结束条件变得简单:只要没有到达end(),循环就会继续。

#3


67  

Actually, a lot of iterator related stuff suddenly makes much more sense if you consider the iterators not pointing at the elements of the sequence but in between, with dereferencing accessing the next element right to it. Then the "one past end" iterator suddenly makes immediate sense:

实际上,如果考虑到迭代器不是指向序列的元素,而是指向序列之间的元素,并取消对其下一个元素的引用,那么很多与迭代器相关的内容会突然变得更有意义。然后“one past end”迭代器突然有了立意:

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^               ^
   |               |
 begin            end

Obviously begin points to the beginning of the sequence, and end points to the end of the same sequence. Dereferencing begin accesses the element A, and dereferencing end makes no sense because there's no element right to it. Also, adding an iterator i in the middle gives

显然,开始点指向序列的开始点,结束点指向同一个序列的结束点。取消引用开始访问元素A,而取消引用结束没有意义,因为它没有元素。此外,在中间添加迭代器i也会得到这个结果

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
 begin     i      end

and you immediately see that the range of elements from begin to i contains the elements A and B while the range of elements from i to end contains the elements C and D. Dereferencing i gives the element right of it, that is the first element of the second sequence.

你马上看到的范围从开始我包含元素A和B,而元素的范围从我结束包含元素C和d .废弃我给元素,这是第二个序列的第一个元素。

Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:

甚至反向迭代器的“off-by-one”也突然变得如此明显:反转序列会得到:

   +---+---+---+---+
   | D | C | B | A |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
rbegin     ri     rend
 (end)    (i)   (begin)

I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i (which I've named ri) still points in between elements B and C. However due to reversing the sequence, now element B is on the right to it.

我在下面的括号中编写了相应的非反向(基本)迭代器。你看,属于i(我命名为ri)的反向迭代器仍然指向元素B和c之间。但是由于顺序颠倒,现在元素B在它的右边。

#4


58  

Because then

因为这样

size() == end() - begin()   // For iterators for whom subtraction is valid

and you won't have to do awkward things like

你也不需要做一些尴尬的事情

// Never mind that this is INVALID for input iterators...
bool empty() { return begin() == end() + 1; }

and you won't accidentally write erroneous code like

你不会不小心写出错误的代码

bool empty() { return begin() == end() - 1; }    // a typo from the first version
                                                 // of this post
                                                 // (see, it really is confusing)

bool empty() { return end() - begin() == -1; }   // Signed/unsigned mismatch
// Plus the fact that subtracting is also invalid for many iterators

Also: What would find() return if end() pointed to a valid element?
Do you really want another member called invalid() which returns an invalid iterator?!
Two iterators is already painful enough...

另外:如果end()指向一个有效的元素,会发现()返回什么?您真的希望另一个名为invalid()的成员返回一个无效的迭代器吗?两个迭代器已经够痛苦的了……

Oh, and see this related post.

哦,看看这个相关的帖子。


Also:

If the end was before the last element, how would you insert() at the true end?!

如果结束在最后一个元素之前,那么如何在真正的结束处插入()?

#5


22  

The iterator idiom of half-closed ranges [begin(), end()) is originally based on pointer arithmetic for plain arrays. In that mode of operation, you would have functions that were passed an array and a size.

半闭范围[begin(), end())]的迭代器习惯用法最初是基于普通数组的指针算法。在这种操作模式下,您将拥有传递数组和大小的函数。

void func(int* array, size_t size)

Converting to half-closed ranges [begin, end) is very simple when you have that information:

转换到半闭合范围[开始,结束]是非常简单的,当你有那个信息:

int* begin;
int* end = array + size;

for (int* it = begin; it < end; ++it) { ... }

To work with fully-closed ranges, it's harder:

要在完全闭合的范围内工作,就更难了:

int* begin;
int* end = array + size - 1;

for (int* it = begin; it <= end; ++it) { ... }

Since pointers to arrays are iterators in C++ (and the syntax was designed to allow this), it's much easier to call std::find(array, array + size, some_value) than it is to call std::find(array, array + size - 1, some_value).

由于指向数组的指针在c++中是迭代器(语法被设计为允许这样做),所以调用std: find(数组、数组+ size、some_value)要比调用std::find(数组、数组+ size - 1、some_value)容易得多。


Plus, if you work with half-closed ranges, you can use the != operator to check for the end condition, becuase (if your operators are defined correctly) < implies !=.

另外,如果您使用的是半闭范围,您可以使用!=操作符检查结束条件,因为(如果您的操作符定义正确)< implies !=。

for (int* it = begin; it != end; ++ it) { ... }

However there's no easy way to do this with fully-closed ranges. You're stuck with <=.

然而,对于完全闭合的范围来说,要做到这一点并不容易。你坚持< =。

The only kind of iterator that supports < and > operations in C++ are random-access iterators. If you had to write a <= operator for every iterator class in C++, you'd have to make all of your iterators fully comparable, and you'd fewer choices for creating less capable iterators (such as the bidirectional iterators on std::list, or the input iterators that operate on iostreams) if C++ used fully-closed ranges.

在c++中,唯一支持 <和> 操作的迭代器是随机访问迭代器。如果你有写< =操作符为每个迭代器类在c++中,你必须使你所有的迭代器完全可比,你会选择用于创建更少少迭代器(如双向迭代器在std::列表,或输入迭代器操作iostream)如果使用c++完全关闭。

#6


8  

With the end() pointing one past the end, it is easy to iterate a collection with a for loop:

用end()指向结束后的一个,很容易使用for循环迭代一个集合:

for (iterator it = collection.begin(); it != collection.end(); it++)
{
    DoStuff(*it);
}

With end() pointing to the last element, a loop would be more complex:

使用end()指向最后一个元素,循环将更加复杂:

iterator it = collection.begin();
while (!collection.empty())
{
    DoStuff(*it);

    if (it == collection.end())
        break;

    it++;
}

#7


0  

  1. If a container is empty, begin() == end().
  2. 如果容器是空的,那么begin() = end()。
  3. C++ Programmers tend to use != instead of < (less than) in loop conditions, therefore having end() pointing to a position one off-the-end is convenient.
  4. c++程序员倾向于在循环条件下使用!=而不是<(小于),因此有end()指向一个现成的位置是很方便的。