What is the best way to set a register to 0.0 and 1.0 in SSE?

I am doing some SSE vector3 math.

Generally, I set the 4th component of my vector to 1.0f, as this makes most of my math work, but sometimes I need to set it to 0.0f.

So I want to change something like: (32.4f, 21.2f, -4.0f, 1.0f) to (32.4f, 21.2f, -4.0f, 0.0f)

I was wondering what the best way of doing so would be:

Convert to 4 floats, set the 4th float, send back to SSE
xor a register with itself, then do 2 shufps
Do all the SSE math with 1.0f and then set the variables to what they should be when finished.
Other?

Note: The vector is already in an SSE register when I need to change it.

5 solutions

#1

Assuming your original vector is in xmm0:

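# xmm0 = [x y z w], lowest lane first
xorps %xmm1, %xmm1           # xmm1 = [0 0 0 0]
pcmpeqd %xmm2, %xmm2         # xmm2 = [1 1 1 1] (all bits set in every lane)
movss %xmm1, %xmm2           # xmm2 = [0 1 1 1] (low lane cleared)
pshufd $0x15, %xmm2, %xmm2   # xmm2 = [1 1 1 0] (zero lane moved to the top)
andps %xmm2, %xmm0           # xmm0 = [x y z 0]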

This should be fast, since it does not access memory.
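
For readers working in C rather than assembly, the same idea (build the mask in registers, then AND) can be sketched with SSE2 intrinsics. This is only a sketch under assumptions: the helper name clear_w_no_load is made up for illustration, and it uses a byte shift (psrldq) instead of the movss/pshufd pair to move the zero lane to the top. A compiler is also free to fold the mask into a constant load.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: clears the 4th (highest) lane of v using a mask
   that is generated in registers rather than loaded from memory. */
static __m128 clear_w_no_load(__m128 v)
{
    __m128i zero = _mm_setzero_si128();
    __m128i ones = _mm_cmpeq_epi32(zero, zero);  /* all bits set in every lane */
    __m128i mask = _mm_srli_si128(ones, 4);      /* shift right by 4 bytes: top lane becomes 0 */
    return _mm_and_ps(v, _mm_castsi128_ps(mask));
}
```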

#2

AND with a constant mask.

In assembly ...

.balign 16                  # andps needs a 16-byte aligned memory operand
myMask:
.long 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000

...
andps  myMask, %xmm#

where # = {0, 1, 2, ....}
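
A rough C equivalent of the constant-mask approach, using SSE2 intrinsics, might look like the following. The helper name clear_w_mask is an assumption made for this sketch; the compiler normally places the mask in read-only memory and emits an andps against it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: ANDs v with a constant mask that keeps lanes 0..2
   and clears lane 3 (the 4th component). */
static __m128 clear_w_mask(__m128 v)
{
    /* _mm_set_epi32 takes its arguments from the highest lane to the lowest */
    const __m128i mask = _mm_set_epi32(0, -1, -1, -1);
    return _mm_and_ps(v, _mm_castsi128_ps(mask));
}
```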

Hope this helps.

#3

If you want to do it without memory access, note that 1.0f (0x3F800000) has an all-zero low 16-bit word, and 0.0f is all zeroes. So you can just copy that zero word over the other word of the same dword. This relies on the 4th component being exactly 1.0f. If you have the 1 in the highest dword, pshufhw xmm0, xmm0, 0xa4 should do the trick:

(gdb) ni
4       pshufhw $0xa4, %xmm0, %xmm0
(gdb) p $xmm0.v4_float
$4 = {32.4000015, 21.2000008, -4, 1}
(gdb) ni
5       ret
(gdb) p $xmm0.v4_float
$5 = {32.4000015, 21.2000008, -4, 0}

A similar trick for the other locations is left as an exercise to the reader :)
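
The pshufhw trick can also be written with SSE2 intrinsics. The sketch below assumes the 4th component is exactly 1.0f on entry, and the helper name w_one_to_zero is invented for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: turns a 1.0f in the top lane into 0.0f by copying
   the zero low word of that lane over its high word (pshufhw with 0xA4).
   Lane 2 and the low 64 bits are left unchanged. */
static __m128 w_one_to_zero(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);
    bits = _mm_shufflehi_epi16(bits, 0xA4);  /* high words become: w4, w5, w6, w6 */
    return _mm_castsi128_ps(bits);
}
```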

#4

pinsrw?
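
One possible reading of this suggestion, sketched with SSE2 intrinsics: overwrite only the upper 16-bit word of the 4th component. This assumes the component is always exactly 0.0f or 1.0f, so its low word is already zero. The helper names w_to_zero and w_to_one are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helpers: pinsrw into word 7, the high half of the top lane.
   1.0f is 0x3F800000, so writing 0x0000 there gives 0.0f and writing
   0x3F80 there gives 1.0f, provided the low half of the lane is zero. */
static __m128 w_to_zero(__m128 v)
{
    return _mm_castsi128_ps(_mm_insert_epi16(_mm_castps_si128(v), 0x0000, 7));
}

static __m128 w_to_one(__m128 v)
{
    return _mm_castsi128_ps(_mm_insert_epi16(_mm_castps_si128(v), 0x3F80, 7));
}
```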

#5

Why not multiply your vector element-wise by [1 1 1 0]? There is an SSE instruction for element-wise multiplication (mulps).

Then, to go back to a vector with a 1 in the 4th dimension, just add [0 0 0 1]. There is an SSE instruction for that, too (addps).
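
As a sketch of the multiply/add idea with SSE intrinsics (the helper names w_to_zero_mul and w_to_one_add are assumptions for illustration): both constants typically end up as memory loads, and the multiply yields NaN rather than 0 if the 4th component happens to be NaN or infinity.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: zeroes the 4th component by multiplying with [1 1 1 0]. */
static __m128 w_to_zero_mul(__m128 v)
{
    return _mm_mul_ps(v, _mm_setr_ps(1.0f, 1.0f, 1.0f, 0.0f));
}

/* Hypothetical helper: restores a 1 in the 4th component by adding [0 0 0 1].
   Only correct when that component is currently 0. */
static __m128 w_to_one_add(__m128 v)
{
    return _mm_add_ps(v, _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f));
}
```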
