So, I am facing some difficulties using OpenMP. I am a beginner and I do not know what I am doing wrong. This is a project for one of my courses at university, so I am not looking for a full solution, rather for a hint or an explanation.
The project is to calculate the Hamming distance between two strings that belong to different sets (let's say setA and setB). The two sets may contain 100, 1000, or 10000 strings, each composed of the same number of characters.
My problem is that, although I have reduced the execution time of the parallel program, it still takes more time than the serial algorithm.
So, I am attaching my code to show what I have done so far.
Serial C code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    // initialize the sets' sizes and the string length
    int m = atoi(argv[1]), n = atoi(argv[2]), I = atoi(argv[3]);
    int i = 0, j = 0, l = 0, TotalHammingDistance = 0, count;

    // creation of 2-dimensional matrices for setA and setB
    char **setA = malloc(m * sizeof(char *));      // Allocate row pointers
    for (i = 0; i < m; i++)
        setA[i] = malloc((I + 1) * sizeof(char)); // Allocate each row separately
    char **setB = malloc(n * sizeof(char *));      // Allocate row pointers
    for (i = 0; i < n; i++)
        setB[i] = malloc((I + 1) * sizeof(char)); // Allocate each row separately

    // initialize the matrices with random alphanumeric strings
    for (i = 0; i < m; i++) {
        for (j = 0; j < I; j++) {
            setA[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setA[i][I] = '\0';
    }
    for (i = 0; i < n; i++) {
        for (j = 0; j < I; j++) {
            setB[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setB[i][I] = '\0';
    }

    // creation of an m*n matrix to store all Hamming distances, initialized to 0
    int **HamDist = malloc(m * sizeof(int *));     // Allocate row pointers
    for (i = 0; i < m; i++)
        HamDist[i] = malloc(n * sizeof(int));
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            HamDist[i][j] = 0;
        }
    }

    clock_t start = clock();
    // Calculate the Hamming distance for all combinations of strings
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            count = 0;
            for (l = 0; l <= I; l++) {
                if (setA[i][l] != setB[j][l])
                    count++;
            }
            HamDist[i][j] = count;
            TotalHammingDistance += HamDist[i][j];
        }
    }
    clock_t end = clock();

    double hamm_time = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\n|Total Hamming execution time= %f", hamm_time);
    printf("\n\n*|The Total Hamming Distance is: %d\n", TotalHammingDistance);
    return 0;
}
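(Both versions take the two set sizes and the string length as command-line arguments; assuming the serial source is saved as serial.c, a build and run could look like

$ gcc -W -Wall -Wextra -O3 serial.c -o serial
$ ./serial 10000 10000 100

with the OpenMP version built the same way plus -fopenmp.)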
OpenMP C code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <omp.h>

int main(int argc, char **argv)
{
    // initialize the sets' sizes and the string length
    int m = atoi(argv[1]), n = atoi(argv[2]), I = atoi(argv[3]);
    int i = 0, j = 0, TotalHammingDistance = 0, tid, nthreads, chunk;

    // creation of 2-dimensional matrices for setA and setB
    char **setA = malloc(m * sizeof(char *));      // Allocate row pointers
    for (i = 0; i < m; i++)
        setA[i] = malloc((I + 1) * sizeof(char)); // Allocate each row separately
    char **setB = malloc(n * sizeof(char *));      // Allocate row pointers
    for (i = 0; i < n; i++)
        setB[i] = malloc((I + 1) * sizeof(char)); // Allocate each row separately

    // initialize the matrices with random alphanumeric strings
    for (i = 0; i < m; i++) {
        for (j = 0; j < I; j++) {
            setA[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setA[i][I] = '\0';
    }
    for (i = 0; i < n; i++) {
        for (j = 0; j < I; j++) {
            setB[i][j] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
        }
        setB[i][I] = '\0';
    }

    // creation of an m*n matrix to store all Hamming distances, initialized to 0
    uint16_t **HamDist = malloc(m * sizeof(uint16_t *)); // Allocate row pointers
    for (i = 0; i < m; i++)
        HamDist[i] = malloc(n * sizeof(uint16_t));
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            HamDist[i][j] = 0;
        }
    }
    printf("\n HamDist set \n");

    int count = 0;
    clock_t start = clock();
    omp_set_num_threads(2);
    #pragma omp parallel shared(setA, setB, HamDist)
    {
        int k, p, l, count = 0;
        #pragma omp for schedule(dynamic, 10000)
        for (k = 0; k < m; k++) {
            for (p = 0; p < n; p++) {
                count = 0;
                for (l = 0; l <= I; l++) {
                    if (setA[k][l] != setB[p][l]) {
                        count++;
                    }
                }
                HamDist[k][p] = count;
            }
        }
    }
    clock_t end = clock();

    double per_time = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\n|Total time for two sets= %f", per_time);

    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            TotalHammingDistance += HamDist[i][j];
        }
    }
    printf("\n|Total execution time= %f", per_time);
    printf("\n\n*|The Total Hamming Distance is: %d\n", TotalHammingDistance);
    return 0;
}
The execution time I get is around 42.011104 s for the OpenMP program and about 32.876482 s for the serial algorithm (m = n = 10000 and I = 100, where m and n are the number of strings in each set and I is the string length).
I strongly believe that the parallel program should take less execution time. Any ideas?
Thanks in advance!
1 Answer
Measuring multiprocessor performance is a bit more involved, but we can get a good approximation of "does it work or not?" with time(1). If I do it with your code as it is (with GCC gcc-4.8.real (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5, invoked with gcc -W -Wall -Wextra -O3 -fopenmp openmptest.c -o openmptest) I get:
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 9.620011
|Total execution time= 9.620011
*|The Total Hamming Distance is: 1248788142
real 0m9.815s
user 0m9.700s
sys 0m0.116s
Here both real and user are roughly the same value, and also roughly the same as the serial version. If I remove schedule(dynamic, 10000) completely and let OpenMP decide for itself, I get:
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 9.187761
|Total execution time= 9.187761
*|The Total Hamming Distance is: 1248788142
real 0m4.819s
user 0m9.265s
sys 0m0.112s
That is roughly 5 s of real (wall-clock) time for 9 s of user (CPU) time, i.e. 5/9 instead of 9/9. If I set omp_set_num_threads(2) to 4 instead (I have four CPUs here), I get:
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 11.438243
|Total execution time= 11.438243
*|The Total Hamming Distance is: 1248788142
real 0m3.080s
user 0m11.540s
sys 0m0.104s
That is 3/11 < 5/9 < 9/9: the wall-clock time keeps dropping even though the total CPU time grows. So it works as expected if you let OpenMP handle it itself. Removing omp_set_num_threads() entirely made no difference compared to the last try.
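If you do remove the hard-coded omp_set_num_threads() call, you can still choose the thread count at run time via the standard OMP_NUM_THREADS environment variable, for example:

$ OMP_NUM_THREADS=4 ./openmptest 10000 10000 100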
You have a very simple program for which OpenMP's defaults work quite well. Fine-tuning OpenMP is a science in and of itself, but @Davislor's comment about using reduction, for example, seems to be a good place to start.
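For completeness, here is a minimal sketch of what that could look like, using the same setA/setB/HamDist and m/n/I as in your code (this is just my reading of the suggestion, not necessarily the exact code @Davislor had in mind):

int k;
#pragma omp parallel for reduction(+:TotalHammingDistance)
for (k = 0; k < m; k++) {
    int p, l, count;
    for (p = 0; p < n; p++) {
        count = 0;
        for (l = 0; l < I; l++) {       /* compare the I characters; the '\0' never differs */
            if (setA[k][l] != setB[p][l])
                count++;
        }
        HamDist[k][p] = count;
        TotalHammingDistance += count;  /* each thread sums privately, OpenMP combines at the end */
    }
}

With the reduction, the separate summation loop over HamDist after the parallel region is no longer needed, since the total is accumulated while the distances are computed.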
BTW: you also get a lot of warnings, one of them about shadowing count, which you declared twice, once before the loop and once inside it. You should get rid of all of the warnings. More often than not, a very significant piece of information is hiding among those dozens of warnings.
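For instance, the shadowing warning goes away if you delete the outer

int count = 0;

just before the timing code (it is never used outside the parallel region anyway) and keep only the count declared inside it; the unused tid, nthreads and chunk in the declarations at the top can be removed as well.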