I'm currently doing a University project which is marked heavily upon the speed and efficiency of my solution. Minor changes I make to the code have massive impacts, as the particular function I am writing is called many hundreds of thousands of times.
我目前正在做一个大学项目,这个项目在很大程度上取决于我的解决方案的速度和效率。我对代码做的微小修改会产生巨大的影响,因为我正在编写的特定函数会被调用成千上万次。
I have written the main functionality of my project now, and am currently in the process of optimising everything I possibly can. One particular part of my code that I am questioning looks like this:
我现在已经编写了我的项目的主要功能,并且目前正在优化我所能做到的一切。我所质疑的代码的一个特定部分是这样的:
array[i] *= -1;
Which I was considering optimising to:
我正在考虑优化:
array[i] = 0 - array[i];
Would changing this code actually affect the speed? Is a subtraction operation faster than a multiplication operation? Or is this kind of issue a thing of the past?
改变这段代码真的会影响速度吗?减法运算比乘法运算快吗?还是这类问题已经成为过去了?
6 个解决方案
#1
17
Overlooking the fact that you should probably use this instead:
忽略了这样一个事实:你可能应该使用它:
array[i] = -array[i];
as it's much clearer IMO since it directly states intent, lets check what the compiler does (GCC 4.7.2 on x86-64) for this program:
因为它直接表明了意图,所以在IMO上要清楚得多,让我们检查一下编译器对这个程序做了什么(x86-64上的GCC 4.7.2):
#include <stdio.h>
#include <time.h>
int main(void)
{
time_t t = time(NULL);
t *= -1;
return 0;
}
gcc -S mult.c -o 1.s
And for this:
对于这个:
#include <stdio.h>
#include <time.h>
int main(void)
{
time_t t = time(NULL);
t = 0 - t;
return 0;
}
gcc -S sub.c -o 2.s
Now compare the two assembly outputs:
现在比较两个汇编输出:
diff 1.s 2.s
Nothing is printed. The compiler generated the same exact code for both versions. So the answer is: it doesn't matter what you use. The compiler will choose whatever is fastest. This is a pretty easy optimization to make (if you can even call it an optimization), so we can assume that virtually every compiler out there will pick the fastest way to do it for a given CPU architecture.
没有打印出来。编译器为两个版本生成相同的代码。所以答案是:使用什么并不重要。编译器会选择最快的。这是一种非常简单的优化(如果您甚至可以称之为优化),因此我们可以假定,对于给定的CPU体系结构,几乎所有的编译器都将选择最快的方式进行优化。
For reference, the generated code is:
所产生的代码为:
int main() { time_t t = time(NULL); mov edi,0x0 call 12 mov QWORD PTR [rbp-0x8],rax t *= -1; neg QWORD PTR [rbp-0x8] t = 0 - t; neg QWORD PTR [rbp-0x8] return 0; mov eax,0x0 }
In both cases, it uses NEG to negate the value. t *= -1
and t = 0 - t
both generate:
在这两种情况下,它都使用NEG来否定值。t *= -1和t = 0 - t都生成:
neg QWORD PTR [rbp-0x8]
#2
13
There's only one sane way of going about optimization, and that is by measuring the performance of your application. A good profiler will be able to tell you a lot, but simply timing the execution of your program and various modifications can also be of great help. I'd go with the profiler first, though to find where the bottlenecks are.
优化只有一种合理的方式,那就是度量应用程序的性能。一个好的剖析器可以告诉您很多信息,但是简单地选择程序的执行时间和各种修改也会有很大的帮助。我将首先使用分析器,尽管要找到瓶颈在哪里。
As for your specific question, as others have pointed out this will be highly architecture dependant.
至于您的具体问题,正如其他人指出的那样,这将高度依赖于体系结构。
#3
5
Compilers are smart enough to convert this to an efficient operation. For example
编译器足够聪明,可以将其转换为有效的操作。例如
C source
C源
void f()
{
int a = 7, b = 7;
a *= -1;
b = -b;
}
gives using gcc -S a.c
使用gcc -S a.c给出
.file "a.c"
.text
.globl f
.type f, @function
f:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $7, -8(%rbp) ; assign 7
movl $7, -4(%rbp) ; assign 7
negl -8(%rbp) ; negate variable
negl -4(%rbp) ; negate variable
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size f, .-f
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
This is on a PC using Ubuntu 12.04 and gcc 4.6.3. Your architecture might be different.
这是在一台使用Ubuntu 12.04和gcc 4.6.3的PC上。您的架构可能不同。
#4
3
Multiplication will be slower on just about every device. It's a more complex operation.
几乎每台设备的乘法运算速度都会变慢。这是一个更复杂的操作。
However your compiler may well be smart enough to do the transformation itsself. And on modern CPUs operations can get overlapped in such a way that the extra time for the instruction doesn't result in the execution time being increased as it's overlapped with other work. And most likely it's so small a difference as to be unmeasurable without significant effort unless you are doing it many millions of times.
然而,您的编译器可能非常聪明,能够自己进行转换。在现代cpu上,操作可以重叠,这样指令的额外时间不会增加执行时间,因为它与其他工作重叠。而且很有可能这种差异是如此之小,以至于不需要付出多大的努力就无法衡量,除非你做了数百万次。
In general write clear code first, and optimize it if it is necessary later. If you want the negative value of a variable write "-value" rather than "-1*value" as it more accurately reflects your intent, rather than just being a way to calculate it.
一般来说,首先编写清晰的代码,然后在必要时进行优化。如果你想要一个变量的负值写“-value”而不是“-1*value”,因为它更准确地反映了你的意图,而不是仅仅是一种计算它的方法。
#5
2
Here's what gcc 4.6.1 does with -O:
下面是gcc 4.6.1对-O的处理:
double a1(double b) { return -b; } // xors sign bit with constant, 2 instr
// movsd .LC0(%rip), %xmm1 (instr 1)
// xorpd %xmm1, %xmm0 (instr 2)
// ret (not counted)
double a2(double b) { return -1.0*b; } // xors sign bit with constant, 2 instr
// same code as above
double a3(double b) { return 0.0-b; } // substract b from 0, 3 instructions
// xorpd %xmm1, %xmm1
// subsd %xmm0, %xmm1
// movapd %xmm1, %xmm0 (+ret)
int a4(int a){return -a;} // neg rax (+ret) 1 instruction
int a5(int a){return a*(-1);} // neg rax
int a6(int a){return 0-a;} // neg rax
double a7(double b) { return 0-b;} // same as a3() -- 3 instructions
Thus, the suggested optimization makes it worse on this compiler (depending on array type).
因此,建议的优化会使这个编译器变得更糟(取决于数组类型)。
Then about the question of multiplications being slower than additions. A rule of thumb is that if multiplications are as fast as additions, we are talking about DSP architectures or DSP extensions: Texas C64, Arm NEON, Arm VFP, MMX, SSE. The same goes for many floating point extensions, starting from Pentium, where both FADD and FMUL have a latency of 3 cycles and throughput of 1 instruction per cycle. ARM integer core also executes multiplications in 1 cycle.
然后是乘法比加法慢的问题。经验法则是,如果乘法的速度和加法一样快,我们讨论的是DSP架构或DSP扩展:Texas C64、Arm NEON、Arm VFP、MMX、SSE。许多浮点扩展也是如此,从奔腾(Pentium)开始,FADD和FMUL的延迟为3个周期,每个周期的吞吐量为1条指令。ARM integer core也在一个周期内执行乘法。
#6
1
Ok, trying to clean up my mess, and transform my sillyness into useful knowledge, not only for myself, but others as well.
好吧,试着清理我的混乱,把我的愚蠢转变成有用的知识,不仅是对我自己,还有其他人。
Main conclusion and summary:
This kind of optimization is done automatically by the compiler, in this case both approaches got compiled to one single ASM instruction on x86. (see above posts) Don't make the compiler's work tougher than it has to be, just do what the logic implies.
这种优化是由编译器自动完成的,在这种情况下,两种方法都编译为x86上的一个ASM指令。(参见上面的文章)不要让编译器的工作变得比它必须的要难,只要按照逻辑去做就可以了。
Several answers show that this is compiled to the exact same instruction in both ways.
有几个答案表明,这是按照相同的指令编译的。
TL;DR
博士TL;
To remedy the blunder I made regarding this topic, I decided to dedicate some efforts to clear this up for myself - and for those who suffer from mental outages like I did when answering this question with a miraculously bad answer...
为了弥补我在这个问题上犯的错误,我决定付出一些努力来为我自己——以及那些像我回答这个问题时一样遭受精神崩溃的人清理这个问题。
Negating a number depends on the architecture, and how data is represented.
否定一个数字取决于架构,以及数据是如何表示的。
Sign and magnitude representation
Somehow I assumed that this implementation is used - it is not. This represents numbers as one sign bit, and all the rest for the value. So it can represent numbers from -2n-1-1 to 2n-1-1, and has a negative zero value too. In this kind of representation, it would be enough to flip the sign bit:
不知何故,我假定这个实现被使用了——它不是。它将数字表示为一个符号位,其余的都表示该值。所以它可以表示从-2n-1到2n-1的数,并且有一个负值。在这种表示法中,只要把符号位翻转一下就足够了:
input ^ -0; // as the negative zero has all bits but the MSB as zero
One's complement representation
A one's complement integer representation represents negative numbers as the bitwise negation of the positive representation. This is however not really used, from the 8080 on, two's compliment is used. A strange consequence of this representation is the negative zero, which can give a lot of troubles. Also, the numbers represented range from -2n-1-1 to 2n-1-1 where n is the number of bits the numbers are stored on.
一个人的补体整数表示表示负数,表示正表示的位否定。但这并不是真正的使用,从8080年开始,就用了二人的赞美。这个表达式的一个奇怪的结果是- 0,这会带来很多麻烦。同样,表示的数字的范围从-2n-1到2n-1,其中n是数字存储的位数。
In this case, the quickest "manual" way of negating a number would be to flip all the bits representing the sign:
在这种情况下,最快速的“手工”否定数字的方法是翻转表示符号的所有位元:
input ^ 0xFFFFFFFF; //assuming 32 bits architecture
or
或
input ^ -0; //as negative zero is a "full one" binary value
Two's complement representation
The more widely (always?) used representation is the two's complement system. It represents numbers from -2n-1 to 2n-1-1, and has only one zero value. It represents the positive range as their ordinary binary representation. However, adding 1 to 2n-1-1 (represented by having 1 in all the bits other than the MSB) will result in -2n-1 (represented a 1 at MSB, and all other bits zero).
更广泛(总是)使用的表示法是二补系统。它表示从-2n-1到2n-1的数字,并且只有一个0值。它表示正的范围作为它们的普通二进制表示。但是,将1添加到2n-1-1(表示在除MSB之外的所有位中都有1)将导致-2n-1(在MSB中表示为1,所有其他位为零)。
Negating a two's complement number manually would need negating all the bits and adding 1:
手工否定一个2的补数需要否定所有的位并加上1:
(input ^ -1) + 1 //as -1 is represented by all bits as 1
However as the range of negative values is broader than that of the positive values, the most negative number does not have a positive counterpart in this representation, this has to be taken into count, when dealing with these numbers! Inverting the most negative value would result in itself, just as it happens with zero (for sake of simplicity, in 8 bits)
但是,由于负数的取值范围比正数的取值范围更广,在这个表达式中,最负数没有一个正数对应,所以在处理这些数字时,必须考虑到这一点!将最负的值反演就会产生自身,就像0一样(为了简单起见,取8位)
most negative number: -128, represented as 10000000
inverting all bits: 01111111
adding one: 10000000 -> -128 again
But please, everyone* remember: premature optimization is the root of all evil! (and with the optimizers these are a thing of the past on any resourceful architectures)
但请大家记住:过早优化是万恶之源!(使用这些优化器,在任何资源丰富的体系结构中,这些都是过去的事情)
*: the OP is already through this, so this is for all others, like me.
OP已经通过了,所以这是给所有其他人的,像我。
(Note to self: being (prematurely) silly is the root of all the (rightful) downvotes.)
(对自己说:做(过早地)傻是所有(合法的)低选票的根源。
#1
17
Overlooking the fact that you should probably use this instead:
忽略了这样一个事实:你可能应该使用它:
array[i] = -array[i];
as it's much clearer IMO since it directly states intent, lets check what the compiler does (GCC 4.7.2 on x86-64) for this program:
因为它直接表明了意图,所以在IMO上要清楚得多,让我们检查一下编译器对这个程序做了什么(x86-64上的GCC 4.7.2):
#include <stdio.h>
#include <time.h>
int main(void)
{
time_t t = time(NULL);
t *= -1;
return 0;
}
gcc -S mult.c -o 1.s
And for this:
对于这个:
#include <stdio.h>
#include <time.h>
int main(void)
{
time_t t = time(NULL);
t = 0 - t;
return 0;
}
gcc -S sub.c -o 2.s
Now compare the two assembly outputs:
现在比较两个汇编输出:
diff 1.s 2.s
Nothing is printed. The compiler generated the same exact code for both versions. So the answer is: it doesn't matter what you use. The compiler will choose whatever is fastest. This is a pretty easy optimization to make (if you can even call it an optimization), so we can assume that virtually every compiler out there will pick the fastest way to do it for a given CPU architecture.
没有打印出来。编译器为两个版本生成相同的代码。所以答案是:使用什么并不重要。编译器会选择最快的。这是一种非常简单的优化(如果您甚至可以称之为优化),因此我们可以假定,对于给定的CPU体系结构,几乎所有的编译器都将选择最快的方式进行优化。
For reference, the generated code is:
所产生的代码为:
int main() { time_t t = time(NULL); mov edi,0x0 call 12 mov QWORD PTR [rbp-0x8],rax t *= -1; neg QWORD PTR [rbp-0x8] t = 0 - t; neg QWORD PTR [rbp-0x8] return 0; mov eax,0x0 }
In both cases, it uses NEG to negate the value. t *= -1
and t = 0 - t
both generate:
在这两种情况下,它都使用NEG来否定值。t *= -1和t = 0 - t都生成:
neg QWORD PTR [rbp-0x8]
#2
13
There's only one sane way of going about optimization, and that is by measuring the performance of your application. A good profiler will be able to tell you a lot, but simply timing the execution of your program and various modifications can also be of great help. I'd go with the profiler first, though to find where the bottlenecks are.
优化只有一种合理的方式,那就是度量应用程序的性能。一个好的剖析器可以告诉您很多信息,但是简单地选择程序的执行时间和各种修改也会有很大的帮助。我将首先使用分析器,尽管要找到瓶颈在哪里。
As for your specific question, as others have pointed out this will be highly architecture dependant.
至于您的具体问题,正如其他人指出的那样,这将高度依赖于体系结构。
#3
5
Compilers are smart enough to convert this to an efficient operation. For example
编译器足够聪明,可以将其转换为有效的操作。例如
C source
C源
void f()
{
int a = 7, b = 7;
a *= -1;
b = -b;
}
gives using gcc -S a.c
使用gcc -S a.c给出
.file "a.c"
.text
.globl f
.type f, @function
f:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $7, -8(%rbp) ; assign 7
movl $7, -4(%rbp) ; assign 7
negl -8(%rbp) ; negate variable
negl -4(%rbp) ; negate variable
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size f, .-f
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
This is on a PC using Ubuntu 12.04 and gcc 4.6.3. Your architecture might be different.
这是在一台使用Ubuntu 12.04和gcc 4.6.3的PC上。您的架构可能不同。
#4
3
Multiplication will be slower on just about every device. It's a more complex operation.
几乎每台设备的乘法运算速度都会变慢。这是一个更复杂的操作。
However your compiler may well be smart enough to do the transformation itsself. And on modern CPUs operations can get overlapped in such a way that the extra time for the instruction doesn't result in the execution time being increased as it's overlapped with other work. And most likely it's so small a difference as to be unmeasurable without significant effort unless you are doing it many millions of times.
然而,您的编译器可能非常聪明,能够自己进行转换。在现代cpu上,操作可以重叠,这样指令的额外时间不会增加执行时间,因为它与其他工作重叠。而且很有可能这种差异是如此之小,以至于不需要付出多大的努力就无法衡量,除非你做了数百万次。
In general write clear code first, and optimize it if it is necessary later. If you want the negative value of a variable write "-value" rather than "-1*value" as it more accurately reflects your intent, rather than just being a way to calculate it.
一般来说,首先编写清晰的代码,然后在必要时进行优化。如果你想要一个变量的负值写“-value”而不是“-1*value”,因为它更准确地反映了你的意图,而不是仅仅是一种计算它的方法。
#5
2
Here's what gcc 4.6.1 does with -O:
下面是gcc 4.6.1对-O的处理:
double a1(double b) { return -b; } // xors sign bit with constant, 2 instr
// movsd .LC0(%rip), %xmm1 (instr 1)
// xorpd %xmm1, %xmm0 (instr 2)
// ret (not counted)
double a2(double b) { return -1.0*b; } // xors sign bit with constant, 2 instr
// same code as above
double a3(double b) { return 0.0-b; } // substract b from 0, 3 instructions
// xorpd %xmm1, %xmm1
// subsd %xmm0, %xmm1
// movapd %xmm1, %xmm0 (+ret)
int a4(int a){return -a;} // neg rax (+ret) 1 instruction
int a5(int a){return a*(-1);} // neg rax
int a6(int a){return 0-a;} // neg rax
double a7(double b) { return 0-b;} // same as a3() -- 3 instructions
Thus, the suggested optimization makes it worse on this compiler (depending on array type).
因此,建议的优化会使这个编译器变得更糟(取决于数组类型)。
Then about the question of multiplications being slower than additions. A rule of thumb is that if multiplications are as fast as additions, we are talking about DSP architectures or DSP extensions: Texas C64, Arm NEON, Arm VFP, MMX, SSE. The same goes for many floating point extensions, starting from Pentium, where both FADD and FMUL have a latency of 3 cycles and throughput of 1 instruction per cycle. ARM integer core also executes multiplications in 1 cycle.
然后是乘法比加法慢的问题。经验法则是,如果乘法的速度和加法一样快,我们讨论的是DSP架构或DSP扩展:Texas C64、Arm NEON、Arm VFP、MMX、SSE。许多浮点扩展也是如此,从奔腾(Pentium)开始,FADD和FMUL的延迟为3个周期,每个周期的吞吐量为1条指令。ARM integer core也在一个周期内执行乘法。
#6
1
Ok, trying to clean up my mess, and transform my sillyness into useful knowledge, not only for myself, but others as well.
好吧,试着清理我的混乱,把我的愚蠢转变成有用的知识,不仅是对我自己,还有其他人。
Main conclusion and summary:
This kind of optimization is done automatically by the compiler, in this case both approaches got compiled to one single ASM instruction on x86. (see above posts) Don't make the compiler's work tougher than it has to be, just do what the logic implies.
这种优化是由编译器自动完成的,在这种情况下,两种方法都编译为x86上的一个ASM指令。(参见上面的文章)不要让编译器的工作变得比它必须的要难,只要按照逻辑去做就可以了。
Several answers show that this is compiled to the exact same instruction in both ways.
有几个答案表明,这是按照相同的指令编译的。
TL;DR
博士TL;
To remedy the blunder I made regarding this topic, I decided to dedicate some efforts to clear this up for myself - and for those who suffer from mental outages like I did when answering this question with a miraculously bad answer...
为了弥补我在这个问题上犯的错误,我决定付出一些努力来为我自己——以及那些像我回答这个问题时一样遭受精神崩溃的人清理这个问题。
Negating a number depends on the architecture, and how data is represented.
否定一个数字取决于架构,以及数据是如何表示的。
Sign and magnitude representation
Somehow I assumed that this implementation is used - it is not. This represents numbers as one sign bit, and all the rest for the value. So it can represent numbers from -2n-1-1 to 2n-1-1, and has a negative zero value too. In this kind of representation, it would be enough to flip the sign bit:
不知何故,我假定这个实现被使用了——它不是。它将数字表示为一个符号位,其余的都表示该值。所以它可以表示从-2n-1到2n-1的数,并且有一个负值。在这种表示法中,只要把符号位翻转一下就足够了:
input ^ -0; // as the negative zero has all bits but the MSB as zero
One's complement representation
A one's complement integer representation represents negative numbers as the bitwise negation of the positive representation. This is however not really used, from the 8080 on, two's compliment is used. A strange consequence of this representation is the negative zero, which can give a lot of troubles. Also, the numbers represented range from -2n-1-1 to 2n-1-1 where n is the number of bits the numbers are stored on.
一个人的补体整数表示表示负数,表示正表示的位否定。但这并不是真正的使用,从8080年开始,就用了二人的赞美。这个表达式的一个奇怪的结果是- 0,这会带来很多麻烦。同样,表示的数字的范围从-2n-1到2n-1,其中n是数字存储的位数。
In this case, the quickest "manual" way of negating a number would be to flip all the bits representing the sign:
在这种情况下,最快速的“手工”否定数字的方法是翻转表示符号的所有位元:
input ^ 0xFFFFFFFF; //assuming 32 bits architecture
or
或
input ^ -0; //as negative zero is a "full one" binary value
Two's complement representation
The more widely (always?) used representation is the two's complement system. It represents numbers from -2n-1 to 2n-1-1, and has only one zero value. It represents the positive range as their ordinary binary representation. However, adding 1 to 2n-1-1 (represented by having 1 in all the bits other than the MSB) will result in -2n-1 (represented a 1 at MSB, and all other bits zero).
更广泛(总是)使用的表示法是二补系统。它表示从-2n-1到2n-1的数字,并且只有一个0值。它表示正的范围作为它们的普通二进制表示。但是,将1添加到2n-1-1(表示在除MSB之外的所有位中都有1)将导致-2n-1(在MSB中表示为1,所有其他位为零)。
Negating a two's complement number manually would need negating all the bits and adding 1:
手工否定一个2的补数需要否定所有的位并加上1:
(input ^ -1) + 1 //as -1 is represented by all bits as 1
However as the range of negative values is broader than that of the positive values, the most negative number does not have a positive counterpart in this representation, this has to be taken into count, when dealing with these numbers! Inverting the most negative value would result in itself, just as it happens with zero (for sake of simplicity, in 8 bits)
但是,由于负数的取值范围比正数的取值范围更广,在这个表达式中,最负数没有一个正数对应,所以在处理这些数字时,必须考虑到这一点!将最负的值反演就会产生自身,就像0一样(为了简单起见,取8位)
most negative number: -128, represented as 10000000
inverting all bits: 01111111
adding one: 10000000 -> -128 again
But please, everyone* remember: premature optimization is the root of all evil! (and with the optimizers these are a thing of the past on any resourceful architectures)
但请大家记住:过早优化是万恶之源!(使用这些优化器,在任何资源丰富的体系结构中,这些都是过去的事情)
*: the OP is already through this, so this is for all others, like me.
OP已经通过了,所以这是给所有其他人的,像我。
(Note to self: being (prematurely) silly is the root of all the (rightful) downvotes.)
(对自己说:做(过早地)傻是所有(合法的)低选票的根源。