I am doing some SSE vector3 math.
Generally, I set the 4th component of my vector to 1.0f, as this makes most of my math work, but sometimes I need to set it to 0.0f.
So I want to change something like: (32.4f, 21.2f, -4.0f, 1.0f) to (32.4f, 21.2f, -4.0f, 0.0f)
I was wondering what the best way to do this would be:
- Convert to 4 floats, set the 4th float, send it back to SSE
- XOR a register with itself, then do two shufps
- Do all the SSE math with 1.0f and then set the variables to what they should be when finished
- Other?
Note: The vector is already in an SSE register when I need to change it.
5 solutions
#1
4
Assuming your original vector is in xmm0:
# xmm0 = [x y z w]; below, 1 means a lane with all bits set, not 1.0f
xorps %xmm1, %xmm1 # [0 0 0 0]
pcmpeqd %xmm2, %xmm2 # [1 1 1 1]
movss %xmm1, %xmm2 # [0 1 1 1]
pshufd $0x15, %xmm2, %xmm2 # [1 1 1 0]
andps %xmm2, %xmm0 # [x y z 0]
This should be fast since it does not access memory.
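For anyone working from C rather than raw assembly, the same in-register mask construction might look roughly like the sketch below. The helper name zero_w_inreg is made up for illustration, and it assumes SSE2 is available (the integer compare and shuffle need it). Treat it as one possible rendering of the idea, not a tuned implementation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Hypothetical helper: returns v with its 4th (highest) lane cleared to 0.0f.
 * The mask is built in registers only, with no constant loaded from memory. */
static inline __m128 zero_w_inreg(__m128 v)
{
    __m128  zero = _mm_setzero_ps();                       /* xorps: [0 0 0 0]           */
    __m128i z    = _mm_setzero_si128();
    __m128i ones = _mm_cmpeq_epi32(z, z);                  /* pcmpeqd: every bit set     */
    __m128  m    = _mm_move_ss(_mm_castsi128_ps(ones), zero); /* movss: lane 0 becomes 0 */
    /* pshufd 0x15: lanes 0..2 take old lane 1 (all ones), lane 3 takes old lane 0 (zero) */
    __m128i mask = _mm_shuffle_epi32(_mm_castps_si128(m), 0x15);
    return _mm_and_ps(v, _mm_castsi128_ps(mask));          /* andps: [x y z 0]           */
}
```

Note that an optimizing compiler is free to fold all of this into a constant and load it from memory anyway, so the "no memory access" property is only guaranteed in hand-written assembly.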
#2
5
AND with a constant mask.
In assembly ...
.balign 16 # andps with a memory operand needs 16-byte alignment
myMask:
.long 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000
...
andps myMask, %xmm#
where # is the register number (0, 1, 2, ...).
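The constant-mask approach can be sketched in C with intrinsics as follows. The function name zero_w_mask is hypothetical. The compiler decides where the constant lives and how it is aligned, so the alignment concern of the assembly version does not come up here.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Hypothetical helper: AND the vector with a mask that keeps x, y, z and clears w. */
static inline __m128 zero_w_mask(__m128 v)
{
    /* _mm_set_epi32 lists lanes from highest to lowest, so the 0 lands in lane 3. */
    const __m128i mask = _mm_set_epi32(0, -1, -1, -1);
    return _mm_and_ps(v, _mm_castsi128_ps(mask));
}
```

Because this is a pure bitwise operation, it clears the 4th lane whatever value it held, including NaN or infinity.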
Hope this helps.
#3
2
If you want to do it without memory access, note that the bit pattern of 1.0f (0x3F800000) has an all-zero low 16-bit word, and 0.0f is all zeroes. So you can just copy that zero word over the high word. This only works when the 4th component really holds 1.0f. If the 1.0f is in the highest dword, pshufhw xmm0, xmm0, 0xa4 should do the trick:
(gdb) ni
4 pshufhw $0xa4, %xmm0, %xmm0
(gdb) p $xmm0.v4_float
$4 = {32.4000015, 21.2000008, -4, 1}
(gdb) ni
5 ret
(gdb) p $xmm0.v4_float
$5 = {32.4000015, 21.2000008, -4, 0}
A similar trick for the other positions is left as an exercise for the reader :)
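A rough C equivalent of the pshufhw trick, for illustration. The helper name one_to_zero_w is invented here, and the sketch assumes the 4th lane holds exactly 1.0f (or some other value whose low 16 bits are zero). For any other value the result in that lane is not zero.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Hypothetical helper: turns a 1.0f in the highest lane into 0.0f.
 * ASSUMPTION: the low 16 bits of lane 3 are zero, as they are for 1.0f (0x3F800000). */
static inline __m128 one_to_zero_w(__m128 v)
{
    /* 0xA4 keeps words 4, 5, 6 in place and copies word 6 into word 7,
     * so the upper half of lane 3 is overwritten by its (zero) lower half. */
    __m128i r = _mm_shufflehi_epi16(_mm_castps_si128(v), 0xA4);
    return _mm_castsi128_ps(r);
}
```

Words 4 and 5 (the z component) and the whole low 64 bits (x and y) pass through unchanged.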
#5
-1
Why not multiply your vector element-wise by [1 1 1 0]? I'm pretty sure there is an SSE instruction for element-wise multiplication (mulps).
Then, to get back to a vector with a 1 in the 4th component, just add [0 0 0 1]. There is an SSE instruction for that too (addps).
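A minimal sketch of the multiply/add idea in C, with made-up helper names w_to_zero_mul and w_to_one_add. It only needs SSE1. Unlike a bitwise mask, this is real floating-point arithmetic: multiplying by zero yields NaN if the 4th component is NaN or infinity, and -0.0f if it is negative, and the add only gives exactly 1.0f when the 4th component was 0.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: multiply by [1 1 1 0] to zero the 4th component (mulps). */
static inline __m128 w_to_zero_mul(__m128 v)
{
    /* _mm_set_ps lists lanes from highest to lowest, so the 0.0f lands in lane 3. */
    return _mm_mul_ps(v, _mm_set_ps(0.0f, 1.0f, 1.0f, 1.0f));
}

/* Hypothetical helper: add [0 0 0 1] to turn a 0 in the 4th component back into 1 (addps). */
static inline __m128 w_to_one_add(__m128 v)
{
    return _mm_add_ps(v, _mm_set_ps(1.0f, 0.0f, 0.0f, 0.0f));
}
```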