Here's the basic premise of my code:
while(norm_of_error > tol){
    #pragma omp parallel for
    for(i = 1; i <= N*N; i++){
        //printf("thread id: %d\n",omp_get_thread_num());
        :
        int val = based on i
        :
        #pragma omp critical
        x[i-1] = val;
    }
    #pragma omp barrier
    iter++;
}
In short, I am solving Ax = b using the Jacobi iterative method. My problem is that, with the printf() statement uncommented, norm_of_error tends to zero and the while loop ends. However, when I simply comment out the printf() statement, this doesn't happen. Can anyone give me a hint as to why the printf() statement has any impact? I'm guessing that the issue has to do with the call to omp_get_thread_num(), but I don't see why that would make any difference.
Edit: I changed the printf() statement to printf("hi\n"); and the code works. Comment that out, and the code doesn't work.
3 Solutions
#1
3
Code that works with a printf() statement present, but fails when it is removed, is usually a sign of some invalid operation affecting memory somewhere in the program (e.g. falling off the end of an array, dereferencing NULL, etc.). The misbehaving code may be in some other section of the program entirely (e.g. not within the function that contains the printf() statement).
That is even more likely when the offending printf() statement is something obviously innocent, without any side effects that can affect the behaviour of other code (such as printf("Hi\n")).
The reason is that the presence of the extra printf() does actually affect the layout of memory for the program as a whole. So the offending code (which may be in some completely unrelated part of the program) still overwrites memory, but the consequence changes (e.g. it overwrites some data the program is permitted to change, rather than some area of memory that causes the operating system to terminate the program).
This is true whether or not the code is multithreaded.
Without complete code that illustrates the problem (i.e. a small sample that someone else can compile, build, and run to get the same symptom) it is not possible to be more specific.
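One way to hunt for this kind of bug is to route suspicious array writes through a bounds-checked helper, so that an out-of-range index is reported instead of silently overwriting a neighbour. The sketch below is only an illustration: `checked_vec`, `checked_store` and `must_store` are hypothetical names, not anything from the question's code, and it assumes the array's length is known at the point of the write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper that keeps an array together with its length. */
typedef struct {
    double *data;
    size_t  len;
} checked_vec;

/* Stores value at index if the index is in range.
 * Returns 1 on success, 0 if the vector is invalid or index >= len. */
int checked_store(checked_vec *v, size_t index, double value)
{
    if (v == NULL || v->data == NULL || index >= v->len)
        return 0;
    v->data[index] = value;
    return 1;
}

/* Debug-build variant: aborts at the faulty write instead of
 * letting the program limp on with corrupted memory. */
void must_store(checked_vec *v, size_t index, double value)
{
    int ok = checked_store(v, index, value);
    assert(ok && "index out of range");
    (void)ok; /* silences the unused warning when NDEBUG is defined */
}
```

In the question's loop, for example, a write such as x[i-1] = val could go through must_store while debugging; if the index ever exceeds the allocated length, the assertion fires at the faulty write rather than at some unrelated point later.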
#2
1
Keep in mind that C and C++ are different languages.
The C FAQ has a section on strange problems:
comp.lang.c FAQ list · Question 16.5
Q: This program runs perfectly on one machine, but I get weird results on another. Stranger still, adding or removing a debugging printout changes the symptoms...
A: Lots of things could be going wrong; here are a few of the more common things to check:
- uninitialized local variables [footnote] (see also question 7.1)
- integer overflow, especially on 16-bit machines, especially of an intermediate result when doing things like a * b / c (see also question 3.14)
- undefined evaluation order (see questions 3.1 through 3.4)
- omitted declaration of external functions, especially those which return something other than int, or have "narrow" or variable arguments (see questions 1.25, 11.3, 14.2, and 15.1)
- dereferenced null pointers (see section 5)
- improper malloc/free use: assuming malloc'ed memory contains 0, assuming freed storage persists, freeing something twice, corrupting the malloc arena (see also questions 7.19 and 7.20)
- pointer problems in general (see also questions 16.7 and 16.8)
- mismatch between printf format and arguments, especially trying to print long ints using %d (see questions 12.7 and 12.9)
- trying to allocate more memory than an unsigned int can count, especially on machines with limited memory (see also questions 7.16 and 19.23)
- array bounds problems, especially of small, temporary buffers, perhaps used for constructing strings with sprintf [footnote] (see also questions 7.1, 12.21, and 19.28)
- invalid assumptions about the mapping of typedefs, especially size_t (see question 7.15)
- floating point problems (see questions 14.1 and 14.4a)
- anything you thought was a clever exploitation of the way you believe code is generated for your specific system
Proper use of function prototypes can catch several of these problems; lint would catch several more. See also questions 16.3, 16.4, and 18.4.
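To make the "intermediate result" item from the list concrete: in an expression like a * b / c, the product can overflow even when the final quotient would fit. A minimal sketch of one common workaround is to widen before multiplying. The name `mul_div` is made up for this illustration, and it assumes 32-bit inputs and a nonzero divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Computes a * b / c using a 64-bit intermediate, so the product of
 * two 32-bit values cannot overflow. c must be nonzero. */
int64_t mul_div(int32_t a, int32_t b, int32_t c)
{
    assert(c != 0);
    /* The cast applies to a first, so the multiplication itself is
     * carried out in 64 bits. */
    return (int64_t)a * b / c;
}
```

Written with plain 32-bit arithmetic, 100000 * 100000 would already overflow before the division is reached; here the product is formed in 64 bits first.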
#3
1
You haven't posted your code so we can't know for sure, but this typically arises because you are trying to share data amongst threads without adequately indicating that the data is to be shared.
With the printf removed, your program is loading the data into a register, and when it needs the data again, it reuses the value in the register rather than fetching it from memory again, so it doesn't see any changes your other threads may have made.
With the printf in place, your program doesn't hold the data in a register -- maybe it can't afford to spend a register that way, or it can't determine that a function call is incapable of changing the data (sure, it's just printf, but it might not be special-cased, and even if it is, the compiler is better at finding loopholes that might allow printf to change data than you are) -- so it rereads the data from memory after the call to printf, and thus sees whatever changes have been made in other threads.
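If the register explanation above is what is happening, the usual remedy is to tell the compiler explicitly that a variable is shared, rather than relying on a function call to force a reload. A minimal sketch using OpenMP's atomic read and atomic write constructs (OpenMP 3.1 and later) follows. `ready_value`, `publish` and `poll_ready` are hypothetical names; the code only shows how one shared int can be handed between threads, and makes no claim about ordering of any other data.

```c
#include <assert.h>

/* Shared between threads; 0 means "nothing published yet". */
static int ready_value = 0;

/* Writer side: the atomic write makes the store a real memory
 * operation that the compiler may not defer or drop. */
void publish(int value)
{
    assert(value != 0); /* 0 is reserved for "not ready" */
    #pragma omp atomic write
    ready_value = value;
}

/* Reader side: the atomic read forces a fresh load on every call
 * instead of reusing a copy cached in a register. */
int poll_ready(void)
{
    int seen;
    #pragma omp atomic read
    seen = ready_value;
    return seen;
}
```

A loop such as while (poll_ready() == 0) { } then re-reads memory on every pass. Without OpenMP enabled the pragmas are ignored and the functions behave as ordinary serial code.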
Another thing the printf could change is timing: I/O statements are pretty slow compared to computation, and there is likely some amount of synchronization happening inside the I/O library; your printf might be acting as a pseudo-barrier that is preventing a race condition from occurring.
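For a Jacobi solver specifically, one race-free arrangement is to read only the previous iterate and write only a separate array, and to collect the error with a reduction instead of a shared variable. The code below is a sketch under assumptions: the question's real code is not shown, so `jacobi_sweep`, `jacobi_solve`, the dense row-major layout of A, and the max-difference stopping test are all choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One Jacobi sweep for A x = b, with A stored row-major as n*n doubles.
 * Reads only x_old and writes only x_new, so no two threads touch the
 * same element. Returns the largest change between the two iterates. */
double jacobi_sweep(size_t n, const double *A, const double *b,
                    const double *x_old, double *x_new)
{
    double max_diff = 0.0;

    #pragma omp parallel for reduction(max : max_diff)
    for (long i = 0; i < (long)n; i++) {
        size_t row = (size_t)i;
        double sum = b[row];
        for (size_t j = 0; j < n; j++) {
            if (j != row)
                sum -= A[row * n + j] * x_old[j];
        }
        x_new[row] = sum / A[row * n + row];

        double diff = x_new[row] - x_old[row];
        if (diff < 0.0) diff = -diff;
        if (diff > max_diff) max_diff = diff;
    }
    return max_diff;
}

/* Iterates until the largest change is <= tol. x holds the initial
 * guess on entry and the result on exit; scratch must have room for
 * n doubles. Returns the iteration count, or -1 if max_iter is hit. */
int jacobi_solve(size_t n, const double *A, const double *b,
                 double *x, double *scratch, double tol, int max_iter)
{
    assert(n > 0 && max_iter > 0);
    for (int iter = 1; iter <= max_iter; iter++) {
        double diff = jacobi_sweep(n, A, b, x, scratch);
        memcpy(x, scratch, n * sizeof *x); /* serial: all threads are done */
        if (diff <= tol)
            return iter;
    }
    return -1;
}
```

Two points about the design: a parallel for already ends with an implicit barrier, so no explicit barrier is needed after the loop, and because each iteration writes a distinct element of x_new, no critical section is needed around the store either.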