Is there any performance difference between ++i and i++ in C#?

Is there any performance difference between using something like

for(int i = 0; i < 10; i++) { ... }

and

for(int i = 0; i < 10; ++i) { ... }

or is the compiler able to optimize in such a way that they are equally fast in the case where they are functionally equivalent?

Edit: This was asked because I had a discussion with a co-worker about it, not because I think it's a useful optimization in any practical sense. It is largely academic.

9 answers

#1

There is no difference in the generated intermediate code for ++i and i++ in this case. Given this program:

using System;

class Program
{
    const int counter = 1024 * 1024;
    static void Main(string[] args)
    {
        for (int i = 0; i < counter; ++i)
        {
            Console.WriteLine(i);
        }

        for (int i = 0; i < counter; i++)
        {
            Console.WriteLine(i);
        }
    }
}

The generated IL code is the same for both loops:

  IL_0000:  ldc.i4.0
  IL_0001:  stloc.0
  // Start of first loop
  IL_0002:  ldc.i4.0
  IL_0003:  stloc.0
  IL_0004:  br.s       IL_0010
  IL_0006:  ldloc.0
  IL_0007:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_000c:  ldloc.0
  IL_000d:  ldc.i4.1
  IL_000e:  add
  IL_000f:  stloc.0
  IL_0010:  ldloc.0
  IL_0011:  ldc.i4     0x100000
  IL_0016:  blt.s      IL_0006
  // Start of second loop
  IL_0018:  ldc.i4.0
  IL_0019:  stloc.0
  IL_001a:  br.s       IL_0026
  IL_001c:  ldloc.0
  IL_001d:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0022:  ldloc.0
  IL_0023:  ldc.i4.1
  IL_0024:  add
  IL_0025:  stloc.0
  IL_0026:  ldloc.0
  IL_0027:  ldc.i4     0x100000
  IL_002c:  blt.s      IL_001c
  IL_002e:  ret

That said, it's possible (although highly unlikely) that the JIT compiler can do some optimizations in certain contexts that will favor one version over the other. If there is such an optimization, though, it would likely only affect the final (or perhaps the first) iteration of a loop.

In short, there will be no difference in the runtime of simple pre-increment or post-increment of the control variable in the looping construct that you've described.
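
The loops above discard the value of the increment expression, which is why the two forms are interchangeable there. As a minimal sketch of the one case where they are not functionally equivalent, here is what happens when the value of the expression is actually used. The class and method names (`IncrementDemo`, `Post`, `Pre`) are made up for illustration and are not from the original answer.

```csharp
using System;

static class IncrementDemo
{
    // Returns the value of the expression "i++" together with the final value of i.
    public static (int Value, int After) Post(int i)
    {
        int value = i++;   // value receives the old i, then i is incremented
        return (value, i);
    }

    // Returns the value of the expression "++i" together with the final value of i.
    public static (int Value, int After) Pre(int i)
    {
        int value = ++i;   // i is incremented first, value receives the new i
        return (value, i);
    }

    static void Main()
    {
        Console.WriteLine(Post(5));  // (5, 6)
        Console.WriteLine(Pre(5));   // (6, 6)
    }
}
```

In a for-loop header the result is thrown away, so only the side effect on i remains and it is the same in both forms.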

#2

Ah... Open again. OK. Here's the deal.

ILDASM is a start, but not an end. The key is: What will the JIT generate for assembly code?

Here's what you want to do.

Take a couple samples of what you are trying to look at. Obviously you can wall-clock time them if you want - but I assume you want to know more than that.

Here's what's not obvious. The C# compiler generates some MSIL sequences that are non-optimal in a lot of situations. The JIT is tuned to deal with these, and with quirks from other languages. The problem: only 'quirks' that someone has noticed have been tuned.

What you really want is a sample program that runs your implementations once, returns back up to Main (or wherever), and then calls Sleep() or otherwise pauses so you can attach a debugger, then runs the routines again.

You DO NOT want to start the code under the debugger, or the JIT will generate non-optimized code, and it sounds like you want to know how it will behave in a real environment. The JIT does this to maximize debug info and to keep the current source location from 'jumping around'. Never start a perf evaluation under the debugger.

OK. So once the code has run once (i.e., the JIT has generated code for it), attach the debugger during the sleep (or whatever). Then look at the x86/x64 code that was generated for the two routines.
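
The procedure described above could be sketched roughly like this. It is only an illustration under assumptions: the names `JitInspection`, `SumPre` and `SumPost` are invented, `Console.ReadLine()` stands in for the Sleep() mentioned above, and the program is assumed to be a Release build started without a debugger. On newer runtimes with tiered compilation the first compiled version of a method may not be the fully optimized one, so treat this as a starting point rather than a definitive recipe.

```csharp
using System;
using System.Runtime.CompilerServices;

static class JitInspection
{
    // NoInlining keeps each routine as its own method so it is easy to find in the disassembly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumPre(int n)
    {
        int sum = 0;
        for (int i = 0; i < n; ++i) { sum += i; }
        return sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumPost(int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++) { sum += i; }
        return sum;
    }

    static void Main()
    {
        // First call: makes the JIT compile both routines while no debugger is attached.
        Console.WriteLine(SumPre(1000) + SumPost(1000));

        // Pause so a debugger can be attached to the already-running process.
        Console.WriteLine("Attach a debugger now, then press Enter.");
        Console.ReadLine();

        // Second call: break here and compare the native code of the two routines.
        Console.WriteLine(SumPre(1000) + SumPost(1000));
    }
}
```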

My gut tells me that if you are using ++i/i++ as you described, i.e., in a standalone expression where the rvalue result is not reused, there won't be a difference. But won't it be fun to go find out and see all the neat stuff! :)

#3

As Jim Mischel has shown, the compiler will generate identical MSIL for the two ways of writing the for-loop.

But that settles it: there is no reason to speculate about the JIT or perform speed measurements. If the two lines of code generate identical MSIL, not only will they perform identically, they are effectively identical.

No possible JIT would be able to distinguish between the loops, so the generated machine code must necessarily be identical, too.

#4

If you're asking this question, you're trying to solve the wrong problem.

The first question to ask is "how do I improve customer satisfaction with my software by making it run faster?" and the answer is almost never "use ++i instead of i++" or vice versa.

From Coding Horror's post "Hardware is Cheap, Programmers are Expensive":

Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson

I read rule 2 to mean "first write clean, clear code that meets your customer's needs, then speed it up where it's too slow". It's highly unlikely that ++i vs. i++ is going to be the solution.

#5

Guys, guys, the "answers" are for C and C++.

C# is a different animal.

Use ILDASM to look at the compiled output to verify if there is an MSIL difference.

#6

Have a concrete piece of code and CLR release in mind? If so, benchmark it. If not, forget about it. Micro-optimization, and all that... Besides, you can't even be sure that a different CLR release will produce the same result.

#7

In addition to the other answers: there can be a difference if your i is not an int. In C++, if it is an object of a class that overloads operator++() and operator++(int), then it can make a difference, and possibly have a side effect. Performance of ++i should be better in this case (dependent on the implementation).
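
For comparison with C#, which is what the question asks about: a user-defined type in C# declares a single `operator ++`, and the compiler derives both the prefix and the postfix behaviour from it, so there are not two separately implemented overloads as in C++. A small sketch, with the `Counter` struct and the `OperatorDemo` helpers being hypothetical names used only for illustration:

```csharp
using System;

struct Counter
{
    public int Value;
    public Counter(int value) { Value = value; }

    // C# allows only this one form; it serves both ++c and c++.
    public static Counter operator ++(Counter c) => new Counter(c.Value + 1);
}

static class OperatorDemo
{
    public static (int Result, int After) Post(Counter c)
    {
        Counter result = c++;  // result gets the old value, c gets the incremented one
        return (result.Value, c.Value);
    }

    public static (int Result, int After) Pre(Counter c)
    {
        Counter result = ++c;  // both result and c get the incremented value
        return (result.Value, c.Value);
    }

    static void Main()
    {
        Console.WriteLine(Post(new Counter(5)));  // (5, 6)
        Console.WriteLine(Pre(new Counter(5)));   // (6, 6)
    }
}
```

Whether the two forms cost the same for a large struct is a separate question that would still need measuring; the sketch only shows that the operator body itself is shared.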

#8

According to this answer, i++ uses one more CPU instruction than ++i. But whether this results in a performance difference, I don't know.

Since either loop can easily be rewritten to use either a post-increment or a pre-increment, I guess that the compiler will always use the more efficient version.

#9

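using System;
using System.Diagnostics;

class Program {
  static void Main(string[] args) {
     var sw = new Stopwatch(); sw.Start();
     for (int i = 0; i < 2000000000; ++i) { }  // change ++i to i++ for the other variant
     //int i = 0;                              // while-loop variant, timed separately
     //while (i < 2000000000){++i;}
     Console.WriteLine(sw.ElapsedMilliseconds);
  }
}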

Average from 3 runs (in milliseconds):

for with i++: 1307
for with ++i: 1314
while with i++: 1261
while with ++i: 1276

That's a Celeron D at 2.53 GHz. Each iteration took about 1.6 CPU cycles. That means either that the CPU was executing more than 1 instruction each cycle or that the JIT compiler unrolled the loops. The difference between i++ and ++i was only about 0.01 CPU cycles per iteration, probably caused by OS services running in the background.
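
The cycles-per-iteration figures follow from elapsed time × clock frequency ÷ iteration count. A small sketch of that arithmetic for the for-loop timings, assuming the CPU ran at its nominal 2.53 GHz for the whole measurement (the `CycleMath` helper is a made-up name, not part of the original benchmark):

```csharp
using System;

static class CycleMath
{
    // Cycles per iteration = elapsed seconds * clock frequency / iteration count.
    public static double CyclesPerIteration(double elapsedMs, double clockHz, double iterations)
        => elapsedMs / 1000.0 * clockHz / iterations;

    static void Main()
    {
        const double clockHz = 2.53e9;   // nominal Celeron D clock from the post (assumed constant)
        const double iterations = 2e9;   // loop bound used in the benchmark

        Console.WriteLine(CyclesPerIteration(1307, clockHz, iterations));        // for with i++
        Console.WriteLine(CyclesPerIteration(1314, clockHz, iterations));        // for with ++i
        Console.WriteLine(CyclesPerIteration(1314 - 1307, clockHz, iterations)); // difference
    }
}
```

With these inputs the per-iteration cost comes out at roughly 1.65 cycles for either form, and the 7 ms gap corresponds to a bit under 0.01 cycles per iteration.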
