Using integer math alone, I'd like to "safely" average two unsigned ints in C++.
What I mean by "safely" is avoiding overflows (and anything else that can be thought of).
For instance, averaging 200 and 5000 is easy:
unsigned int a = 200;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2600 as intended
But in the case of 4294967295 and 5000:
unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2499 instead of 2147486147
The best I've come up with is:
unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a / 2) + (b / 2); // Equals: 2147486147 as expected
Are there better ways?
10 Answers
#1
46
Your last approach seems promising. You can improve on that by manually considering the lowest bits of a and b:
unsigned int average = (a / 2) + (b / 2) + (a & b & 1);
This gives the correct results in case both a and b are odd.
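A minimal sketch of this fix wrapped in a function, with a few checks. The name average_halves is made up for illustration, and the large constants assume a 32-bit unsigned int as in the question.

```cpp
#include <cassert>

// Average of two unsigned ints without forming a + b: halve each operand
// first, then add back the 1 that is lost when both operands are odd.
// (average_halves is a hypothetical name, not from the answer.)
constexpr unsigned int average_halves(unsigned int a, unsigned int b)
{
    return (a / 2) + (b / 2) + (a & b & 1);
}

// Small self-check; the large constants assume a 32-bit unsigned int.
inline void check_average_halves()
{
    assert(average_halves(200u, 5000u) == 2600u);
    assert(average_halves(5u, 7u) == 6u);  // both odd
    assert(average_halves(4294967295u, 5000u) == 2147486147u);
    assert(average_halves(4294967295u, 4294967295u) == 4294967295u);
}
```

Like (a + b) / 2, this rounds down when the exact average ends in .5.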
#2
26
unsigned int average = low + ((high - low) / 2);
EDIT
Here's a related article: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
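The low + (high - low) / 2 formula only works when low <= high; with the operands the other way round, the unsigned subtraction wraps. A minimal sketch that orders the arguments first (average_ordered is a hypothetical name, and the large constants assume a 32-bit unsigned int):

```cpp
#include <cassert>

// Midpoint via low + (high - low) / 2, picking low and high by comparison
// so that the subtraction never wraps.
// (average_ordered is a hypothetical name, not from the answer.)
constexpr unsigned int average_ordered(unsigned int a, unsigned int b)
{
    return a < b ? a + (b - a) / 2
                 : b + (a - b) / 2;
}

// Small self-check; the large constants assume a 32-bit unsigned int.
inline void check_average_ordered()
{
    assert(average_ordered(200u, 5000u) == 2600u);
    assert(average_ordered(5000u, 200u) == 2600u);  // order does not matter
    assert(average_ordered(4294967295u, 5000u) == 2147486147u);
}
```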
#3
17
Your method is not correct if both numbers are odd, e.g. 5 and 7: the average is 6, but your method #3 returns 5.
Try this:
average = (a>>1) + (b>>1) + (a & b & 1)
With arithmetic operators only:
average = a/2 + b/2 + (a%2) * (b%2)
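As a sanity check, the shift form and the arithmetic form can be compared against a reference that widens to 64 bits. This is only a sketch: the function names are hypothetical, fixed-width types from <cstdint> stand in for unsigned int, and the values tested are a handful of edge cases rather than an exhaustive sweep.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-mask form.
constexpr std::uint32_t average_shift(std::uint32_t a, std::uint32_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}

// Same computation with arithmetic operators only.
constexpr std::uint32_t average_arith(std::uint32_t a, std::uint32_t b)
{
    return a / 2 + b / 2 + (a % 2) * (b % 2);
}

// Reference: widen to 64 bits so the sum cannot overflow.
constexpr std::uint32_t average_reference(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) + b) / 2);
}

// Compare all three on values near 0 and near the 32-bit maximum.
inline void check_forms_agree()
{
    const std::uint32_t edge[] = {0u, 1u, 2u, 3u, 5u, 7u, 5000u,
                                  0x7FFFFFFFu, 0x80000000u,
                                  0xFFFFFFFEu, 0xFFFFFFFFu};
    for (std::uint32_t a : edge) {
        for (std::uint32_t b : edge) {
            assert(average_shift(a, b) == average_reference(a, b));
            assert(average_arith(a, b) == average_reference(a, b));
        }
    }
}
```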
#4
9
If you don't mind a little x86 inline assembly (GNU C syntax), you can take advantage of supercat's suggestion to use rotate-with-carry after an add to put the high 32 bits of the full 33-bit result into a register.
Of course, you usually should mind using inline-asm, because it defeats some optimizations (https://gcc.gnu.org/wiki/DontUseInlineAsm). But here we go anyway:
// works for 64-bit long as well on x86-64, and doesn't depend on calling convention
unsigned average(unsigned x, unsigned y)
{
    unsigned result;
    asm("add   %[x], %[res]\n\t"
        "rcr   %[res]"
        : [res] "=r" (result)   // output
        : [y] "%0"(y),  // input: in the same reg as results output.  Commutative with next operand
          [x] "rme"(x)  // input: reg, mem, or immediate
        :               // no clobbers.  ("cc" is implicit on x86)
    );
    return result;
}
The % modifier to tell the compiler the args are commutative doesn't actually help make better asm in the case I tried, calling the function with y being a constant or pointer-deref (memory operand). Probably using a matching constraint for an output operand defeats that, since you can't use it with read-write operands.
As you can see on the Godbolt compiler explorer, this compiles correctly, and so does a version where we change the operands to unsigned long, with the same inline asm. clang3.9 makes a mess of it, though, and decides to use the "m" option for the "rme" constraint, so it stores to memory and uses a memory operand.
RCR-by-one is not too slow, but it's still 3 uops on Skylake, with 2 cycle latency. It's great on AMD CPUs, where RCR has single-cycle latency. (Source: Agner Fog's instruction tables, see also the x86 tag wiki for x86 performance links). It's still better than @sellibitze's version, but worse than @Sheldon's order-dependent version. (See code on Godbolt)
But remember that inline-asm defeats optimizations like constant-propagation, so any pure-C++ version will be better in that case.
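For comparison, the same add-then-shift-the-carry-back-in idea can be sketched in portable C++, relying on the fact that unsigned addition wraps modulo 2^N. This is not the answer's code: average_carry is a hypothetical name, and there is no guarantee that a given compiler turns it into add/rcr. The constants in the check assume a 32-bit unsigned.

```cpp
#include <cassert>
#include <limits>

// Portable version of "add, then rotate the carry into the top bit".
// (average_carry is a hypothetical name, not from the answer.)
inline unsigned average_carry(unsigned x, unsigned y)
{
    const unsigned sum = x + y;      // low N bits of the (N+1)-bit sum
    const unsigned carry = sum < x;  // 1 if the addition wrapped, else 0
    const int top = std::numeric_limits<unsigned>::digits - 1;
    return (sum >> 1) | (carry << top);  // shift the carry back in
}

// Small self-check; the large constants assume a 32-bit unsigned.
inline void check_average_carry()
{
    assert(average_carry(200u, 5000u) == 2600u);
    assert(average_carry(4294967295u, 5000u) == 2147486147u);
    assert(average_carry(4294967295u, 4294967295u) == 4294967295u);
}
```

Being plain C++, this version stays visible to the optimizer, so constant arguments can still be folded.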
#5
6
And the correct answer is...
(A&B)+((A^B)>>1)
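One way to see why this works: a + b equals 2 * (a & b) + (a ^ b), because a & b marks the bit positions where both operands have a 1 (each contributes twice its weight) and a ^ b marks the positions where exactly one does. Halving that sum gives (a & b) + ((a ^ b) >> 1), and neither term needs the full a + b. A minimal sketch (average_bits is a hypothetical name, and the large constant assumes a 32-bit unsigned int):

```cpp
#include <cassert>

// a + b == 2 * (a & b) + (a ^ b), so half the sum, rounded down, is
// (a & b) + ((a ^ b) >> 1), computed without forming a + b.
// (average_bits is a hypothetical name, not from the answer.)
constexpr unsigned int average_bits(unsigned int a, unsigned int b)
{
    return (a & b) + ((a ^ b) >> 1);
}

// Small self-check; the large constant assumes a 32-bit unsigned int.
inline void check_average_bits()
{
    assert(average_bits(5u, 7u) == 6u);
    assert(average_bits(3u, 3u) == 3u);
    assert(average_bits(4294967295u, 5000u) == 2147486147u);
}
```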
#6
4
What you have is fine, with the minor detail that it will claim that the average of 3 and 3 is 2. I'm guessing that you don't want that; fortunately, there's an easy fix:
unsigned int average = a/2 + b/2 + (a & b & 1);
This just bumps the average back up in the case that both divisions were truncated.
#7
2
If the code is for an embedded micro, and if speed is critical, assembly language may be helpful. On many microcontrollers, the result of the add would naturally go into the carry flag, and instructions exist to shift it back into a register. On an ARM, the average operation (source and dest. in registers) could be done in two instructions; any C-language equivalent would likely yield at least 5, and probably a fair bit more than that.
Incidentally, on machines with shorter word sizes, the differences can be even more substantial. On an 8-bit PIC-18 series, averaging two 32-bit numbers would take twelve instructions. Doing the shifts, add, and correction, would take 5 instructions for each shift, eight for the add, and eight for the correction, so 26 (not quite a 2.5x difference, but probably more significant in absolute terms).
#8
1
Use a 64-bit unsigned int as the placeholder for the sum, then cast back to unsigned int after dividing by 2. It is questionable whether this is 'better', but you certainly avoid the overflow issue with minimal effort.
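A minimal sketch of the widening approach, assuming the operands are 32 bits wide. It uses the fixed-width types from <cstdint> to make that assumption explicit, and average_wide is a hypothetical name. The trick does not carry over to operands that are already the widest available integer type.

```cpp
#include <cassert>
#include <cstdint>

// Widen to 64 bits so the sum of two 32-bit values cannot overflow,
// then narrow the halved result back to 32 bits.
// (average_wide is a hypothetical name, not from the answer.)
inline std::uint32_t average_wide(std::uint32_t a, std::uint32_t b)
{
    const std::uint64_t sum = static_cast<std::uint64_t>(a) + b;
    return static_cast<std::uint32_t>(sum / 2);
}

// Small self-check using the values from the question.
inline void check_average_wide()
{
    assert(average_wide(200u, 5000u) == 2600u);
    assert(average_wide(4294967295u, 5000u) == 2147486147u);
    assert(average_wide(4294967295u, 4294967295u) == 4294967295u);
}
```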
#9
0
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++){
    avg = (array[i] - avg) / (i+1) + avg;
}
expects avg == 5.0 for this test
#10
-2
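(((a & b) << 1) + (a ^ b)) >> 1
is also a nice way, provided a + b does not overflow: ((a & b) << 1) + (a ^ b) is equal to a + b, so for large inputs it wraps around just like (a + b) / 2. Note the parentheses around a & b, since << binds tighter than &.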
Courtesy: http://www.ragestorm.net/blogs/?p=29