I am working on a machine with an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz. It supports SSE4.2.
I have written C code that XORs two bit strings character by character. Now I want to write the corresponding SIMD code and check whether performance improves. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LENGTH 10

unsigned char xor_val[LENGTH];

/* XOR the two strings byte by byte into xor_val, printing each result. */
void oper_xor(const char *r1, const char *r2)
{
    unsigned int i;
    for (i = 0; i < LENGTH; ++i)
    {
        xor_val[i] = (unsigned char)(r1[i] ^ r2[i]);
        printf("%d", xor_val[i]);
    }
}

int main(void) {
    int i;
    clock_t start, stop;   /* clock() returns clock_t, not time_t */
    double cur_time;

    start = clock();
    oper_xor("1110001111", "0000110011");
    stop = clock();
    cur_time = (double)(stop - start) / CLOCKS_PER_SEC;
    printf("Time used %f seconds.\n", cur_time / 100);

    for (i = 0; i < LENGTH; ++i)
        printf("%d", xor_val[i]);
    printf("\n");
    return 0;
}
When I compile and run this sample code I get the output shown below. The time shows as 0 here, but in the actual project this operation takes a significant amount of time.
gcc xor_scalar.c -o xor_scalar
pan88: ./xor_scalar
1110111100 Time used 0.000000 seconds.
1110111100
How can I start writing the corresponding SIMD code for SSE4.2?
1 solution
#1
The Intel Compiler supports #pragma simd, and any compiler that implements OpenMP 4.0 or later supports #pragma omp simd. These are your best bet for getting the compiler to generate SIMD code for you. If that fails, you can use intrinsics or, as a last resort, inline assembly.
Note that the printf function calls will almost certainly interfere with vectorization, so you should remove them from any loop that you want the compiler to vectorize.