There are some slides here that discuss these SSE4.1 instructions, but I am still not sure what they're good for when using GCC's vector types.
When I create a vector type in GCC C, in the following way:
typedef char v16s8 __attribute__ ((vector_size (16)));
v16s8 test = {2,-1,3,4,2,-3,1,5,6,-3,1,0,2,3,-4,2};
int putin = 99;
And then I decide to put in "putin" using one of these two methods:
test[1] = putin;
test = __builtin_ia32_vec_set_v16qi (test, putin, 1);
The first statement generates a single movb instruction, but the second generates a movdqa, then a pinsrb, then a movaps, then another movdqa.
Would it be correct to assume that the pinsrb instruction is only useful when you wish to preserve the original vector and create a new one with one byte changed, thereby accomplishing two things (duplication and element insertion) in one instruction?
Another hypothesis: my test code is worthless because GCC is really just putting the byte in its internal type and never loading it back into the original xmm register. But I don't know how best to test this either.
1 Answer
#1
3
If the value is still in memory, the compiler may very well decide to use a simple mov. You should play around to see what happens if it's already in a register. Also don't forget to enable optimizations, not to mention the relevant SSE instruction set (-msse4.1).
Given this code (next time please post complete code yourself, too):
typedef char v16s8 __attribute__ ((vector_size (16)));

void foo(v16s8* arg)
{
    v16s8 test = {2,-1,3,4,2,-3,1,5,6,-3,1,0,2,3,-4,2};
    int putin = 99;
    test[1] = putin;
    *arg = test;
}
gcc 5.2 with -O2 -msse4.1 produces:
movdqa .LC0(%rip), %xmm0
movl $99, %eax
pinsrb $1, %eax, %xmm0
movaps %xmm0, (%rdi)
ret
(See it in gcc explorer.)
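To try the "already in a register" case mentioned above, here is a minimal sketch. It assumes the x86-64 System V calling convention, where a 16-byte vector argument is typically passed and returned in xmm0. The function name insert_byte is made up for illustration. What the compiler actually emits depends on its version and flags, so inspect the assembly (for example with gcc -O2 -msse4.1 -S) rather than trusting a guess.

```c
typedef char v16s8 __attribute__ ((vector_size (16)));

/* Hypothetical helper: the vector is taken by value, so under the
 * assumed ABI it starts out in a register instead of in memory.
 * noinline keeps the function visible as its own unit in the
 * generated assembly. */
__attribute__ ((noinline))
v16s8 insert_byte(v16s8 v, char x)
{
    v[1] = x;   /* same element assignment as in foo() above */
    return v;   /* v is a local copy; the caller's vector is unchanged */
}
```

Because v is passed by value, the caller's vector keeps its old contents and the modified copy comes back as the return value. Any duplication is therefore done by the surrounding code, not by the insertion itself: the SSE4.1 form of pinsrb overwrites its destination register.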