SSE: What is the best way to set a register to 0.0 and 1.0?

Time: 2021-09-12 13:53:22

I am doing some SSE vector3 math.

Generally, I set the 4th component of my vector to 1.0f, because that makes most of my math work, but sometimes I need to set it to 0.0f.

So I want to change something like (32.4f, 21.2f, -4.0f, 1.0f) to (32.4f, 21.2f, -4.0f, 0.0f).

I was wondering what the best way to do this would be:

  1. Convert to 4 floats, set the 4th float, send back to SSE
  2. xor a register with itself, then do 2 shufps
  3. Do all the SSE math with 1.0f and then set the variables to what they should be when finished.
  4. Other?

Note: The vector is already in an SSE register when I need to change it.

5 Solutions

#1


4  

Assuming your original vector is in xmm0:

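# xmm0 = [x y z w], lanes listed lowest first; "1" below means all bits set
xorps   %xmm1, %xmm1         # xmm1 = [0 0 0 0]
pcmpeqd %xmm2, %xmm2         # xmm2 = [1 1 1 1]
movss   %xmm1, %xmm2         # xmm2 = [0 1 1 1] (low lane replaced by 0)
pshufd  $0x15, %xmm2, %xmm2  # xmm2 = [1 1 1 0] (lanes 1,1,1,0 of xmm2)
andps   %xmm2, %xmm0         # xmm0 = [x y z 0]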

This should be fast since it does not access memory.
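
For readers working in C rather than assembly, the same register-only sequence can be sketched with SSE2 intrinsics. This is only an illustration, assuming an x86-64 compiler that provides `<emmintrin.h>`; the function name `zero_w_register_mask` is made up for this example, and the compiler is free to emit different instructions than the ones listed above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Hypothetical helper: returns v with its 4th lane (w) cleared to 0.0f,
 * building the mask in registers instead of loading it from memory. */
static __m128 zero_w_register_mask(__m128 v)
{
    __m128i zero = _mm_setzero_si128();                  /* [0 0 0 0]          */
    __m128i ones = _mm_cmpeq_epi32(zero, zero);          /* [1 1 1 1] all bits */
    /* movss: take the low lane from zero, the upper three from ones */
    __m128 low0 = _mm_move_ss(_mm_castsi128_ps(ones),
                              _mm_castsi128_ps(zero));   /* [0 1 1 1]          */
    /* pshufd 0x15: result lanes = source lanes 1, 1, 1, 0 */
    __m128i mask = _mm_shuffle_epi32(_mm_castps_si128(low0), 0x15); /* [1 1 1 0] */
    return _mm_and_ps(v, _mm_castsi128_ps(mask));        /* [x y z 0]          */
}
```

Note that `_mm_set_ps` takes its arguments highest lane first, so the vector from the question would be built as `_mm_set_ps(1.0f, -4.0f, 21.2f, 32.4f)`.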

#2


5  

AND with a constant mask.

In assembly ...

.balign 16    # andps requires a 16-byte-aligned memory operand
myMask:
.long 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000

...
andps  myMask, %xmm#

where # = {0, 1, 2, ...}

Hope this helps.
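
A rough C equivalent of the constant-mask approach, using intrinsics. The helper names are invented for this sketch, and it assumes an x86-64 compiler with `<emmintrin.h>`. The second helper is an addition not found in the answer: it ORs the bit pattern of 1.0f into the cleared lane to go back the other way.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: clear the 4th lane (w) with a constant AND mask. */
static __m128 zero_w_const_mask(__m128 v)
{
    /* _mm_set_epi32 lists lanes highest first: w mask = 0, z/y/x = all ones */
    const __m128i mask = _mm_set_epi32(0, -1, -1, -1);
    return _mm_and_ps(v, _mm_castsi128_ps(mask));
}

/* Hypothetical helper (assumption, not from the answer): force w to 1.0f
 * by clearing it first and then OR-ing in the bits of 1.0f. */
static __m128 set_w_one(__m128 v)
{
    const __m128 one_in_w = _mm_set_ps(1.0f, 0.0f, 0.0f, 0.0f);
    return _mm_or_ps(zero_w_const_mask(v), one_in_w);
}
```

The compiler will usually place the constant in memory, as in the assembly version, so this trades one load for a shorter instruction sequence.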

#3


2  

If you want to do it without memory access, note that the value 1.0f (0x3F800000) has an all-zero low 16-bit word in it, and the value 0.0f is all zeroes. So you can just copy that zero word over the other word. If the 1.0f is in the highest dword, pshufhw xmm0, xmm0, 0xa4 should do the trick:

(gdb) ni
4       pshufhw $0xa4, %xmm0, %xmm0
(gdb) p $xmm0.v4_float
$4 = {32.4000015, 21.2000008, -4, 1}
(gdb) ni
5       ret
(gdb) p $xmm0.v4_float
$5 = {32.4000015, 21.2000008, -4, 0}

The similar trick for the other locations is left as an exercise to the reader :)
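
The same word shuffle can be written with the `_mm_shufflehi_epi16` intrinsic. This is a sketch with an invented function name, assuming `<emmintrin.h>` on x86-64. As with the assembly, it only yields 0.0f when the low 16 bits of w are already zero, which holds for 1.0f but not for arbitrary values.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: turns w == 1.0f into 0.0f by copying the zero low
 * word of w over its high word. Only valid when w's low 16 bits are zero. */
static __m128 zero_w_if_one(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);
    /* 0xA4 selects high-half words 0, 1, 2, 2: z is kept, and both words
     * of w become w's low word (0x0000 for 1.0f = 0x3F800000). */
    bits = _mm_shufflehi_epi16(bits, 0xA4);
    return _mm_castsi128_ps(bits);
}
```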

#4


1  

pinsrw?
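
The answer gives no details, so the following is only one possible reading of the suggestion: use `pinsrw` (the `_mm_insert_epi16` intrinsic) to overwrite the two 16-bit words that make up the 4th lane. The function names are invented for this sketch, and it assumes `<emmintrin.h>` on x86-64.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: write 0.0f (0x00000000) into lane 3 via two pinsrw. */
static __m128 zero_w_pinsrw(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);
    bits = _mm_insert_epi16(bits, 0, 6);       /* low word of w  */
    bits = _mm_insert_epi16(bits, 0, 7);       /* high word of w */
    return _mm_castsi128_ps(bits);
}

/* Hypothetical helper: write 1.0f (0x3F800000) into lane 3 via two pinsrw. */
static __m128 set_w_one_pinsrw(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);
    bits = _mm_insert_epi16(bits, 0x0000, 6);  /* low word of 1.0f  */
    bits = _mm_insert_epi16(bits, 0x3F80, 7);  /* high word of 1.0f */
    return _mm_castsi128_ps(bits);
}
```

If w is known to be exactly 1.0f or 0.0f, a single insert into word 7 is enough, because the low word is zero in both bit patterns.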

#5


-1  

Why not multiply your vector element-wise with [1 1 1 0]? I'm pretty sure there is an SSE instruction for element-wise multiplication.

Then to go back to a vector with a 1 in the 4th dimension, just add [0 0 0 1]. There is an SSE instruction for that, too.
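
The instructions in question are `mulps` and `addps`. A minimal sketch with the corresponding intrinsics follows; the function names are invented, and it assumes `<xmmintrin.h>` on an x86 target with SSE. One caveat worth keeping in mind: if w holds NaN or infinity, multiplying by 0.0f gives NaN rather than 0.0f, which the bitwise approaches avoid.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: zero w by multiplying element-wise with [1 1 1 0]. */
static __m128 zero_w_mul(__m128 v)
{
    /* _mm_set_ps lists lanes highest first, so w's factor comes first */
    return _mm_mul_ps(v, _mm_set_ps(0.0f, 1.0f, 1.0f, 1.0f));
}

/* Hypothetical helper: turn w == 0.0f back into 1.0f by adding [0 0 0 1]. */
static __m128 restore_w_add(__m128 v)
{
    return _mm_add_ps(v, _mm_set_ps(1.0f, 0.0f, 0.0f, 0.0f));
}
```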
