Performance of pIter != cont.end() in a for loop

Time: 2021-04-05 04:07:11

I have been working through "Exceptional C++" by Herb Sutter lately, and I have serious doubts about one particular recommendation he gives in Item 6 - Temporary Objects.

He asks the reader to find the unnecessary temporary objects in the following code:

string FindAddr(list<Employee> emps, string name) 
{
  for (list<Employee>::iterator i = emps.begin(); i != emps.end(); i++)
  {
    if( *i == name )
    {
      return i->addr;
    }
  }
  return "";
}

As one example, he recommends precomputing the value of emps.end() before the loop, since a temporary object is created on every iteration:

For most containers (including list), calling end() returns a temporary object that must be constructed and destroyed. Because the value will not change, recomputing (and reconstructing and redestroying) it on every loop iteration is both needlessly inefficient and unaesthetic. The value should be computed only once, stored in a local object, and reused.

He suggests replacing the loop header with the following:

list<Employee>::const_iterator end(emps.end());
for (list<Employee>::const_iterator i = emps.begin(); i != end; ++i)

To me, this is an unnecessary complication. Even if one replaces the ugly type declarations with a compact auto, it still takes two lines of code instead of one. What's more, the end variable now lives in the outer scope.

I was sure modern compilers would optimize this piece of code anyway, because I'm actually using const_iterator here and it is easy to check whether the loop body accesses the container in some way. Compilers have gotten smarter in the last 13 years, right?

Anyway, in most cases, where I'm not so worried about performance, I will prefer the first version with i != emps.end(). But I want to know for sure: is this a kind of construct I can rely on a compiler to optimize?

Update

Thanks for your suggestions on how to make this useless code better. Please note that my question is about the compiler, not about programming techniques. The only relevant answers for now are from NPE and Ellioh.

7 answers

#1


9  

UPD: The book you are speaking about was published in 1999, unless I'm mistaken. That's 14 years ago, and in modern programming 14 years is a long time. Many recommendations that were good and reliable in 1999 may be completely obsolete by now. Though my answer is about a single compiler and a single platform, there is also a more general idea.

Worrying about extra variables, reusing the return value of trivial methods, and similar tricks of old C++ are a step back towards the C++ of the 1990s. Trivial methods like end() should be inlined quite well, and the result of inlining should be optimized as part of the code it is called from. In 99% of situations there is no need for manual measures such as creating an end variable at all. Such things should be done only if:

  1. You KNOW that the code is not optimized well on some of the compilers/platforms you have to run on.
  2. It has become a bottleneck in your program ("avoid premature optimization").

I've looked at what is generated by 64-bit g++:

gcc version 4.6.3 20120918 (prerelease) (Ubuntu/Linaro 4.6.3-10ubuntu1)

Initially I thought that with optimizations on everything would be fine and there would be no difference between the two versions. But things look strange: the version you considered non-optimal is actually better. I think the moral is: there is no reason to try to be smarter than the compiler. Let's see both versions.

#include <list>

using namespace std;

int main() {
  list<char> l;
  l.push_back('a');

  for(list<char>::iterator i=l.begin(); i != l.end(); i++)
      ;

  return 0;
}

int main1() {
  list<char> l;
  l.push_back('a');
  list<char>::iterator e=l.end();
  for(list<char>::iterator i=l.begin(); i != e; i++)
      ;

  return 0;
}

Then we compile this with optimizations on (I use 64-bit g++; you may try your own compiler) and disassemble main and main1:

For main:

(gdb) disas main
Dump of assembler code for function main():
   0x0000000000400650 <+0>: push   %rbx
   0x0000000000400651 <+1>: mov    $0x18,%edi
   0x0000000000400656 <+6>: sub    $0x20,%rsp
   0x000000000040065a <+10>:    lea    0x10(%rsp),%rbx
   0x000000000040065f <+15>:    mov    %rbx,0x10(%rsp)
   0x0000000000400664 <+20>:    mov    %rbx,0x18(%rsp)
   0x0000000000400669 <+25>:    callq  0x400630 <_Znwm@plt>
   0x000000000040066e <+30>:    cmp    $0xfffffffffffffff0,%rax
   0x0000000000400672 <+34>:    je     0x400678 <main()+40>
   0x0000000000400674 <+36>:    movb   $0x61,0x10(%rax)
   0x0000000000400678 <+40>:    mov    %rax,%rdi
   0x000000000040067b <+43>:    mov    %rbx,%rsi
   0x000000000040067e <+46>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x0000000000400683 <+51>:    mov    0x10(%rsp),%rax
   0x0000000000400688 <+56>:    cmp    %rbx,%rax
   0x000000000040068b <+59>:    je     0x400698 <main()+72>
   0x000000000040068d <+61>:    nopl   (%rax)
   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690 <main()+64>
   0x0000000000400698 <+72>:    mov    %rbx,%rdi
   0x000000000040069b <+75>:    callq  0x400840 <std::list<char, std::allocator<char> >::~list()>
   0x00000000004006a0 <+80>:    add    $0x20,%rsp
   0x00000000004006a4 <+84>:    xor    %eax,%eax
   0x00000000004006a6 <+86>:    pop    %rbx
   0x00000000004006a7 <+87>:    retq   

Look at the instructions at 0x0000000000400690-0x0000000000400696 (the three instructions just before them, at 0x0000000000400683-0x000000000040068b, are the initial check for an empty list). That's the loop itself, and it seems to be perfectly optimized:

   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690 <main()+64>
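
To see why three instructions can be enough, it helps to picture the list as a ring of nodes closed by a sentinel whose address serves as end(). The sketch below is a hand-written model under that assumption; Node and count_hops are hypothetical names and this is not libstdc++'s actual source. Each pass loads the next pointer and compares it with the sentinel's address, which is the same shape as the mov/cmp/jne triple above.

```cpp
#include <cstddef>

// Hypothetical stand-in for a list node header: only the link matters here.
struct Node {
    Node* next;
};

// Walks from `first` until the sentinel is reached and returns the number
// of hops. The sentinel's address plays the role of end(), so the loop
// condition is a plain pointer comparison against a value that never changes.
std::size_t count_hops(const Node* first, const Node* sentinel) {
    std::size_t hops = 0;
    for (const Node* p = first; p != sentinel; p = p->next) {
        ++hops;
    }
    return hops;
}
```

Because the sentinel address is loop-invariant, there is nothing left for a manually cached end variable to save once end() has been inlined.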

For main1:

(gdb) disas main1
Dump of assembler code for function main1():
   0x00000000004007b0 <+0>: push   %rbp
   0x00000000004007b1 <+1>: mov    $0x18,%edi
   0x00000000004007b6 <+6>: push   %rbx
   0x00000000004007b7 <+7>: sub    $0x18,%rsp
   0x00000000004007bb <+11>:    mov    %rsp,%rbx
   0x00000000004007be <+14>:    mov    %rsp,(%rsp)
   0x00000000004007c2 <+18>:    mov    %rsp,0x8(%rsp)
   0x00000000004007c7 <+23>:    callq  0x400630 <_Znwm@plt>
   0x00000000004007cc <+28>:    cmp    $0xfffffffffffffff0,%rax
   0x00000000004007d0 <+32>:    je     0x4007d6 <main1()+38>
   0x00000000004007d2 <+34>:    movb   $0x61,0x10(%rax)
   0x00000000004007d6 <+38>:    mov    %rax,%rdi
   0x00000000004007d9 <+41>:    mov    %rsp,%rsi
   0x00000000004007dc <+44>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x00000000004007e1 <+49>:    mov    (%rsp),%rdi
   0x00000000004007e5 <+53>:    cmp    %rbx,%rdi
   0x00000000004007e8 <+56>:    je     0x400818 <main1()+104>
   0x00000000004007ea <+58>:    mov    %rdi,%rax
   0x00000000004007ed <+61>:    nopl   (%rax)
   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0 <main1()+64>
   0x00000000004007f8 <+72>:    mov    (%rdi),%rbp
   0x00000000004007fb <+75>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400800 <+80>:    cmp    %rbx,%rbp
   0x0000000000400803 <+83>:    je     0x400818 <main1()+104>
   0x0000000000400805 <+85>:    nopl   (%rax)
   0x0000000000400808 <+88>:    mov    %rbp,%rdi
   0x000000000040080b <+91>:    mov    (%rdi),%rbp
   0x000000000040080e <+94>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400813 <+99>:    cmp    %rbx,%rbp
   0x0000000000400816 <+102>:   jne    0x400808 <main1()+88>
   0x0000000000400818 <+104>:   add    $0x18,%rsp
   0x000000000040081c <+108>:   xor    %eax,%eax
   0x000000000040081e <+110>:   pop    %rbx
   0x000000000040081f <+111>:   pop    %rbp
   0x0000000000400820 <+112>:   retq   

The code for the loop is the same:

   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0 <main1()+64>

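But there is a lot of extra code around the loop: in main1 the node-freeing loop that calls operator delete (_ZdlPv) is emitted inline, where main simply calls ~list(). Apparently, the extra code in the source has made things WORSE.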

#2


8  

I've compiled the following slightly hacky code using g++ 4.7.2 with -O3 -std=c++11, and got identical assembly for both functions:

#include <list>
#include <string>

using namespace std;

struct Employee: public string { string addr; };

string FindAddr1(list<Employee> emps, string name)
{
  for (list<Employee>::const_iterator i = emps.begin(); i != emps.end(); i++)
  {
    if( *i == name )
    {
      return i->addr;
    }
  }
  return "";
}

string FindAddr2(list<Employee> emps, string name)
{
  list<Employee>::const_iterator end(emps.end());
  for (list<Employee>::const_iterator i = emps.begin(); i != end; i++)
  {
    if( *i == name )
    {
      return i->addr;
    }
  }
  return "";
}

In any event, I think the choice between the two versions should be based primarily on readability. Without profiling data, micro-optimizations like this look premature to me.

#3


4  

Contrary to popular belief, I don't see any difference between VC++ and gcc in this respect. I did a quick check with both g++ 4.7.2 and MS C++ 17 (aka VC++ 2012).

In both cases I compared the code generated for the function as given in the question (with headers and such added to let it compile) against the code generated for the following:

string FindAddr(list<Employee> emps, string name) 
{
    auto end = emps.end();
    for (list<Employee>::iterator i = emps.begin(); i != end; i++)
    {
        if( *i == name )
        {
            return i->addr;
        }
    }
    return "";
}

In both cases the result was essentially identical for the two pieces of code. VC++ includes line-number comments in the code, which changed because of the extra line, but that was the only difference. With g++ the output files were identical.

Doing the same with std::vector instead of std::list gave pretty much the same result -- no significant difference. For some reason, g++ did switch the order of operands for one instruction, from cmp esi, DWORD PTR [eax+4] to cmp DWORD PTR [eax+4], esi, but (again) this is utterly irrelevant.

Bottom line: no, you're not likely to gain anything from manually hoisting the call out of the loop with a modern compiler (at least with optimization enabled -- I was using /O2b2 with VC++ and -O3 with g++; comparing optimization with optimization turned off seems pretty pointless to me).

#4


3  

A couple of things... the first is that in general the cost of building an iterator (in Release mode, with unchecked iterators) is minimal. Iterators are usually thin wrappers around a pointer. With checked iterators (the default in VS) you might pay some cost, but if you really need the performance, you can rebuild with unchecked iterators after testing.

The code need not be as ugly as what you posted:

for (list<Employee>::const_iterator it=emps.begin(), end=emps.end(); 
                                    it != end; ++it )

The main decision on whether to use one approach or the other should depend on what operations are being applied to the container. If the container might be changing its size, then you might want to recompute the end iterator in each iteration. If not, you can just precompute it once and reuse it, as in the code above.
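
As a minimal sketch of the "container changes size" case, assuming a std::vector<int> and a hypothetical helper named remove_negatives: erasing from a vector invalidates iterators at and after the erased element, including any end iterator saved before the loop, so here the condition has to ask for end() again on every pass.

```cpp
#include <vector>

// Erases every negative value in place. vector::erase invalidates iterators
// at and after the erased position, including a previously saved end(), so
// the loop condition calls v.end() afresh on every pass.
void remove_negatives(std::vector<int>& v) {
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ) {
        if (*it < 0) {
            it = v.erase(it);   // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}
```

A loop that only reads the elements has no such constraint, which is the situation in the question.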

#5


2  

For containers like vector, end() just returns a stored pointer to the end, and the call gets optimized away. If you've written a container whose end() does some lookups or similar work, consider writing

for (list<Employee>::const_iterator i = emps.begin(), end = emps.end(); i != end; ++i)
{
...
}

for speed
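
To make the difference concrete, here is a small sketch with a made-up wrapper, CountingRange, whose end() counts its own calls. It stands in for a container where end() does real work; the class and both function names are assumptions for illustration, not anything from a real library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container wrapper that counts how often end() is called,
// standing in for a container whose end() does real work.
class CountingRange {
public:
    explicit CountingRange(const std::vector<int>& data)
        : data_(data), end_calls_(0) {}

    std::vector<int>::const_iterator begin() const { return data_.begin(); }
    std::vector<int>::const_iterator end() const {
        ++end_calls_;
        return data_.end();
    }
    std::size_t end_calls() const { return end_calls_; }

private:
    std::vector<int> data_;
    mutable std::size_t end_calls_;
};

// Calls end() once per loop test: size + 1 calls in total.
int sum_recompute(const CountingRange& r) {
    int total = 0;
    for (std::vector<int>::const_iterator i = r.begin(); i != r.end(); ++i) {
        total += *i;
    }
    return total;
}

// Calls end() exactly once, in the loop initializer.
int sum_cached(const CountingRange& r) {
    int total = 0;
    for (std::vector<int>::const_iterator i = r.begin(), e = r.end(); i != e; ++i) {
        total += *i;
    }
    return total;
}
```

With three elements, sum_recompute triggers four end() calls and sum_cached triggers one. Because the counter is an observable side effect, the compiler is not allowed to hoist the call on its own here, unlike the trivial end() of the standard containers.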

#6


2  

If you really need the performance, you let your shiny new C++11 compiler write it for you:

for (const auto &i : emps) {
    /* ... */
}
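
For reference, a rough sketch of what that loop stands for. Under the C++11 rules a range-based for evaluates the end expression once, before the first comparison, so it behaves like the "cached end" version written by hand. The function names and the std::list<std::string> element type below are assumptions chosen for illustration, and the compiler's real temporaries have names you cannot spell.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Range-based for: the compiler generates the begin/end bookkeeping.
std::size_t count_matches(const std::list<std::string>& names,
                          const std::string& wanted) {
    std::size_t n = 0;
    for (const auto& s : names) {
        if (s == wanted) ++n;
    }
    return n;
}

// Roughly what the loop above expands to under the C++11 rules: end() is
// evaluated once, before the first comparison. The variable names here are
// illustrative; the real ones are invented by the compiler.
std::size_t count_matches_expanded(const std::list<std::string>& names,
                                   const std::string& wanted) {
    std::size_t n = 0;
    {
        auto&& range = names;
        for (auto first = range.begin(), last = range.end(); first != last; ++first) {
            const auto& s = *first;
            if (s == wanted) ++n;
        }
    }
    return n;
}
```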

Yes, this is tongue-in-cheek (sort of). Herb's example here is now out of date. But since your compiler doesn't support it yet, let's get to the real question:

Is this a kind of construction I could rely on a compiler to optimize?

My rule of thumb is that the compiler writers are way smarter than I am. I can't rely on a compiler to optimize any one piece of code, because it might choose to optimize something else that has a bigger impact. The only way to know for sure is to try both approaches with your compiler on your system and see what happens. Check your profiler results. If the call to .end() sticks out, save it in a separate variable. Otherwise, don't worry about it.
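
If no profiler is at hand, a crude timing harness can serve as a first check. The following is only a sketch under stated assumptions: walk_recompute, walk_cached and time_ns are hypothetical names, std::chrono::steady_clock is used for wall-clock timing, and an empty counting loop says little about a real program, so treat any numbers it produces as a hint rather than a measurement.

```cpp
#include <chrono>
#include <cstddef>
#include <list>

// Counts elements, calling end() in every loop test.
std::size_t walk_recompute(const std::list<int>& l) {
    std::size_t n = 0;
    for (std::list<int>::const_iterator i = l.begin(); i != l.end(); ++i) ++n;
    return n;
}

// Counts elements, calling end() once.
std::size_t walk_cached(const std::list<int>& l) {
    std::size_t n = 0;
    for (std::list<int>::const_iterator i = l.begin(), e = l.end(); i != e; ++i) ++n;
    return n;
}

// Runs f(l) `repeats` times and returns the elapsed wall-clock time in
// nanoseconds. The results are accumulated into `sink` so the calls have an
// observable effect and are less likely to be optimized away entirely.
template <typename F>
long long time_ns(F f, const std::list<int>& l, int repeats, std::size_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) sink += f(l);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A call such as time_ns(walk_recompute, l, 1000, sink) followed by time_ns(walk_cached, l, 1000, sink) gives two figures to compare; run each several times and build with the same optimization flags as your real code.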

#7


0  

Use std algorithms

He's right of course; calling end can instantiate and destroy a temporary object, which is generally bad.

Of course, the compiler can optimise this away in a lot of cases.

There is a better and more robust solution: encapsulate your loops.

The example you gave is in fact std::find, give or take the return value. Many other loops also have std algorithms, or at least something similar enough that you can adapt - my utility library has a transform_if implementation, for example.

So, hide loops in a function and take a const& to end. Same fix as your example, but much much cleaner.
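
As a sketch of that idea applied to the question's function: the Employee struct below is a guess (the question never shows the real definition), the parameters are taken by const reference rather than by value, and std::find_if with a C++11 lambda replaces the hand-written loop.

```cpp
#include <algorithm>
#include <list>
#include <string>

// Hypothetical Employee; the question never shows the real definition.
struct Employee {
    std::string name;
    std::string addr;
};

// Same lookup as FindAddr, with the loop hidden inside std::find_if.
// emps.end() is evaluated once as an argument and once for the final check;
// neither sits inside a hand-written loop condition.
std::string FindAddr(const std::list<Employee>& emps, const std::string& name) {
    const auto it = std::find_if(emps.begin(), emps.end(),
                                 [&name](const Employee& e) { return e.name == name; });
    return it != emps.end() ? it->addr : std::string();
}
```

The algorithm receives the end iterator by value, so whether end() is cheap or not no longer affects the loop itself.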
