Why is "not equal" (!=, <>) faster than "equal" (=, ==)?

Date: 2021-07-05 20:56:04

I've seen comments on SO saying "<> is faster than =" or "!= is faster than ==" in an if() statement.

I'd like to know why that is so. Could you show an example in asm?

Thanks! :)

EDIT:

Source

Here is what he did.


  function Check(var MemoryData:Array of byte;MemorySignature:Array of byte;Position:integer):boolean;
   var i:byte;
   begin
    Result := True; //moved at top. Your function always returned 'True'. This is what you wanted?
    for i := 0 to Length(MemorySignature) - 1 do //are you sure??? Perhaps you want High(MemorySignature) here... 
    begin
{!}  if MemorySignature[i] <> $FF then //speedup - '<>' evaluates faster than '='
     begin
      Result:=memorydata[i + position] <> MemorySignature[i]; //speedup.
      if not Result then 
        Break; //added this! - speedup. We already know the result. So, no need to scan till end.
     end;
    end;
   end;

12 answers

#1


I'd claim that this is flat out wrong except perhaps in very special circumstances. Compilers can refactor one into the other effortlessly (by just switching the if and else cases).
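
As a rough illustration of that refactoring, here is a small C sketch (C rather than the Delphi from the question; both function names are made up for this example). The two functions return the same value for every input. Only the operator and the order of the two branches differ, which is the kind of rewrite a compiler is free to make on its own.

```c
/* Hypothetical helpers, not taken from the question's code. */

/* Written with "==": the equal case comes first. */
int classify_eq(int a, int b)
{
    if (a == b) {
        return 1;   /* values match */
    } else {
        return 0;   /* values differ */
    }
}

/* Written with "!=": same logic, branches swapped. */
int classify_ne(int a, int b)
{
    if (a != b) {
        return 0;   /* values differ */
    } else {
        return 1;   /* values match */
    }
}
```

Since both spellings describe the same decision, any speed difference would have to come from how a particular compiler lays out the branches, not from the operator itself.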


#2


It could have something to do with branch prediction on the CPU. Static branch prediction would predict that a branch simply wouldn't be taken and fetch the next instruction. However, hardly anybody uses that anymore. Other than that, I'd say it's bull because the comparisons should be identical.


#3


I think there's some confusion in your previous question about what the algorithm was that you were trying to implement, and therefore in what the claimed "speedup" purports to do.


Here's some disassembly from Delphi 2007, with optimization on. (Note: turning optimization off changed the code a little, but not in a relevant way.)

Unit70.pas.31: for I := 0 to 100 do
004552B5 33C0             xor eax,eax
Unit70.pas.33: if i = j then
004552B7 3B02             cmp eax,[edx]
004552B9 7506             jnz $004552c1
Unit70.pas.34: k := k+1;
004552BB FF05D0DC4500     inc dword ptr [$0045dcd0]
Unit70.pas.35: if i <> j then
004552C1 3B02             cmp eax,[edx]
004552C3 7406             jz $004552cb
Unit70.pas.36: l := l + 1;
004552C5 FF05D4DC4500     inc dword ptr [$0045dcd4]
Unit70.pas.37: end;
004552CB 40               inc eax
Unit70.pas.31: for I := 0 to 100 do
004552CC 83F865           cmp eax,$65
004552CF 75E6             jnz $004552b7
Unit70.pas.38: end;
004552D1 C3               ret 

As you can see, the only difference between the two cases is a jz vs. a jnz instruction. These WILL run at the same speed. What's likely to affect things much more is how often the branch is taken, and whether the entire loop fits into cache.
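
For readers who want to repeat the experiment with another compiler, the loop in the disassembly corresponds roughly to the C sketch below. The original Pascal source is not shown in full, so the struct, the function name count_matches and the pointer parameter are assumptions; only the loop bounds and the two tests follow the listing.

```c
/* Sketch only: mirrors the shape of the loop in the disassembly above. */
struct counts {
    int k;  /* incremented when i == j */
    int l;  /* incremented when i != j */
};

struct counts count_matches(const int *j)
{
    struct counts c = {0, 0};
    for (int i = 0; i <= 100; i++) {   /* for I := 0 to 100 do */
        if (i == *j) {
            c.k = c.k + 1;
        }
        if (i != *j) {
            c.l = c.l + 1;
        }
    }
    return c;
}
```

Compiling this with assembly output enabled (for example `gcc -O2 -S`) lets you inspect the conditional jumps yourself. The exact instructions depend on the compiler and target, and an optimizer may merge the two tests into one.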

#4


For .NET languages:

If you look at the IL of the string.op_Equality and string.op_Inequality methods, you will see that both internally call string.Equals.

But op_Inequality inverts the result, which takes two more IL instructions.

I would say the performance is the same, with maybe a small (very, very small) advantage for the == operator. But I believe that the optimizer and JIT compiler will remove this.

#5


Spontaneous thought: most other things in your code will affect performance more than the choice between == and != (or = and <>, depending on language).

When I ran a test in C# over 1,000,000 iterations of comparing strings (containing the alphabet, a-z, with the last two letters reversed in one of them), the difference was between 0 and 1 milliseconds.

It has been said before: write code for readability; change into more performant code when it has been established that it will make a difference.


Edit: repeated the same test with byte arrays; same thing. The performance difference is negligible.
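
The C# test itself is not shown, so the following is only a C sketch of the same kind of measurement, with every name in it invented for illustration. It compares two 26-letter strings whose last two letters are swapped, once through an equality test and once through an inequality test, and measures elapsed CPU time with the standard clock() function. No timings are quoted because they depend on compiler, flags and machine.

```c
#include <string.h>
#include <time.h>

/* Hypothetical test data: the alphabet, with the last two letters
   swapped in the second string. */
static const char *const text_a = "abcdefghijklmnopqrstuvwxyz";
static const char *const text_b = "abcdefghijklmnopqrstuvwxzy";

/* Returns how many iterations found the strings equal. */
long run_equal(long iterations)
{
    volatile long hits = 0;   /* volatile so the loop is not removed */
    for (long n = 0; n < iterations; n++) {
        if (strcmp(text_a, text_b) == 0) {
            hits = hits + 1;
        }
    }
    return hits;
}

/* Returns how many iterations found the strings different. */
long run_not_equal(long iterations)
{
    volatile long hits = 0;   /* volatile so the loop is not removed */
    for (long n = 0; n < iterations; n++) {
        if (strcmp(text_a, text_b) != 0) {
            hits = hits + 1;
        }
    }
    return hits;
}

/* Elapsed CPU time in milliseconds for one of the two loops. */
double time_ms(long (*fn)(long), long iterations)
{
    clock_t start = clock();
    fn(iterations);
    clock_t stop = clock();
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A call such as `time_ms(run_equal, 1000000)` next to `time_ms(run_not_equal, 1000000)` gives the two numbers to compare. Keep in mind that an optimizing compiler may evaluate the constant strcmp once and hoist it out of the loop, so a measurement like this says little unless you also check the generated code.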

#6


It could also be a result of misinterpretation of an experiment.


Most compilers/optimizers assume a branch is taken by default. If you invert the operator and the if-then-else order, and the branch that is now taken is the ELSE clause, that might cause an additional speed effect in computation-heavy code (*).

(*) Obviously you need to do a lot of operations for that to matter. But it can matter for the tightest loops, e.g. in codecs or image analysis/machine vision, where you have 50 MByte/s of data to trawl through. And even then I only stoop to this level for really heavily reused code. For ordinary business code it is not worth it.

#7


I'd claim this is flat-out wrong, full stop. The test for equality is always the same as the test for inequality. With strings (or complex structures), you're always going to break at exactly the same point. Until that break point is reached, the answer for equality is unknown.
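
A minimal C sketch of that argument, using hypothetical helper names: one scan finds the first differing byte, and both the equality and the inequality answer are read off the same result, so neither can finish earlier than the other.

```c
#include <stddef.h>

/* Hypothetical helper: index of the first differing byte, or len if none. */
size_t first_mismatch(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i = 0;
    while (i < len && a[i] == b[i]) {
        i++;
    }
    return i;
}

/* Equality: the scan reached the end without finding a difference. */
int buffers_equal(const unsigned char *a, const unsigned char *b, size_t len)
{
    return first_mismatch(a, b, len) == len;
}

/* Inequality: the very same scan stopped before the end. */
int buffers_differ(const unsigned char *a, const unsigned char *b, size_t len)
{
    return first_mismatch(a, b, len) != len;
}
```

This is only one way to structure a comparison; real library routines compare several bytes at a time, but the point about the shared break position carries over.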

#8


I strongly doubt there is any speed difference. For integral types, for example, you get a CMP instruction followed by either JZ (jump if zero) or JNZ (jump if not zero), depending on whether you used = or ≠. There is no speed difference here, and I'd expect that to hold true at higher levels too.

#9


If you can provide a small example that clearly shows a difference, then I'm sure the Stack Overflow community could explain why. However, I think you might have difficulty constructing a clear example. I don't think there will be any performance difference noticeable at any reasonable scale.


#10


Well, it could be or it couldn't be, that is the question :-) The thing is, this depends heavily on the programming language you are using. Since all your statements eventually end up as CPU instructions, the one that needs the fewest instructions to achieve the result will generally be the fastest.

For example, to test whether bits x equal bits y, you could use an instruction that XORs the two as inputs; if the result is anything but 0, they are not the same. So how would you know that the result is anything but 0? By using an instruction that returns true if its input is greater than 0.

So that is already 2 instructions, but since most CPUs have an instruction that does the comparison in a single cycle, it is a bad example.
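
The XOR idea can be sketched in C as follows. The function names are hypothetical, and how many machine instructions either version becomes is up to the compiler; the sketch only shows that the two formulations give the same answer.

```c
/* Hypothetical helpers showing the XOR idea from the text. */

/* a ^ b has a 1 bit wherever a and b differ, so it is 0 only when a == b. */
int equal_via_xor(unsigned int a, unsigned int b)
{
    return (a ^ b) == 0u;
}

/* The direct comparison, which maps onto a single compare instruction
   on most CPUs. */
int equal_direct(unsigned int a, unsigned int b)
{
    return a == b;
}
```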

The point I am making is still the same: you can't make general statements like this without specifying the programming language and the CPU architecture.

#11


This list (assuming it's on x86) of ASM instructions might help:


(Disclaimer: I have nothing more than very basic experience with writing assembler, so I could be off the mark.)

However, it obviously depends purely on what assembly instructions the Delphi compiler is producing. Without seeing that output, it's guesswork. I'm going to keep my Donald Knuth quote in, because for all but a niche set of applications (games, mobile devices, high-performance server apps, safety-critical software, missile launchers, etc.) this kind of thing is, in my view, the thing you worry about last.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."


If you're writing one of those or similar then obviously you do care, but you didn't specify it.


#12


Just guessing, but given you want to preserve the logic, you cannot just replace


if A = B then

with

if A <> B then

To preserve the logic, the original code must have been something like

if not (A = B) then

or

if A <> B then
else

and that may truly be a little bit slower than the direct test for inequality.
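
To make the guess concrete, here are the three spellings side by side in C (hypothetical function names; the empty-branch variant is written with the equality test in the if and the work in the else). All three return the same result, and whether the negated or empty-branch forms cost anything extra depends on whether the compiler simplifies them, which optimizing compilers commonly do.

```c
/* Hypothetical helpers: three ways to act only when a and b differ. */

int differ_negated(int a, int b)
{
    if (!(a == b)) {       /* if not (A = B) then */
        return 1;
    }
    return 0;
}

int differ_empty_then(int a, int b)
{
    if (a == b) {
        /* nothing to do in the equal case */
    } else {
        return 1;
    }
    return 0;
}

int differ_direct(int a, int b)
{
    if (a != b) {          /* if A <> B then */
        return 1;
    }
    return 0;
}
```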
