Program terminates early under valgrind memcheck

I have a C program (a lot of numerics and too long to post) which I compile with

gcc -g -O0 program.c -o program

I am trying to debug it using gdb and valgrind memcheck. After some changes to the code I found that

valgrind --tool=memcheck --log-file=output.log ./program

gives

==15866== Memcheck, a memory error detector
==15866== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==15866== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==15866== Command: ./program
==15866== Parent PID: 3362
==15866== 
==15866== Warning: client switching stacks?  SP change: 0xbe88bcd8 --> 0xbe4e1f70
==15866==          to suppress, use: --max-stackframe=3841384 or greater
==15866== Invalid write of size 4
==15866==    at 0x804B7BE: main (program.c:1396)
==15866==  Address 0xbe4e1f74 is on thread 1's stack
==15866== 
==15866== Invalid write of size 4
==15866==    at 0x804B7C2: main (program.c:1396)
==15866==  Address 0xbe4e1f70 is on thread 1's stack
==15866== 
==15866== Invalid read of size 4
==15866==    at 0x4320011: on_exit (on_exit.c:34)
==15866==    by 0x43064D2: (below main) (libc-start.c:226)
==15866==  Address 0xbe4e1f70 is on thread 1's stack
==15866== 
==15866== Invalid read of size 4
==15866==    at 0x4320022: on_exit (on_exit.c:37)
==15866==    by 0x43064D2: (below main) (libc-start.c:226)
==15866==  Address 0xbe4e1f74 is on thread 1's stack

and many more of this kind.

valgrind --tool=memcheck --max-stackframe=3841384 --log-file=output.log ./program

does not print any errors. What puzzles me is that with both valgrind invocations the program exits early (without any error message) and does not do the computation it is supposed to do. Run without valgrind, with the same compiler options, it behaves entirely differently and looks pretty normal. I nevertheless suspect a memory error and want to use valgrind to find it. My question therefore: what kind of error can make a program behave so differently when executed under valgrind? And if it is a memory-related error, how can I identify it? Note that it is clear to me that I can "debug by hand" to locate it. But can I maybe run gdb together with valgrind to see where the program exits?

1 Answer

#1

I originally answered in the comments:

You're probably causing a stack overflow. Are you allocating "large" arrays on the stack? E.g. double myArray[10000000]; If so then you should replace such allocations with heap memory using malloc and free.

I wrote a short C program to intentionally cause a stack overflow like this and check what valgrind reports:

#include <stdio.h>

int main(){

  // imax*sizeof(double) is too big for the stack.
  int imax = 10000000;
  double test[imax];

  // I do a little math to prevent the stack overflow from being optimized away if -O3 is used.
  test[0]=0;
  test[1]=1;
  for(int i=2; i<imax; i++)
    test[i]=0.5*(test[i-1]+test[i-2]);
  printf("%e\n", test[imax-1]);

}

Sure enough, valgrind comes up with:

==83869== Warning: client switching stacks?  SP change: 0x104802930 --> 0xffbb7520
==83869==          to suppress, use: --max-stackframe=80000016 or greater
==83869== Invalid write of size 8
==83869==    at 0x100000ED0: main (in ./a.out)
==83869==  Address 0xffbb7520 is on thread 1's stack

along with tons of other error messages, and eventually quits with a Segmentation fault: 11.
