Edit: Full source is here:
http://code.seanwoods.com/reynard.fossil.cgi/artifact/0cc9cbfbe021c2ba86dcb4d0cf6ada52f0a80063
Calling program here:
http://code.seanwoods.com/reynard.fossil.cgi/artifact/891405e62c95349aaf461dfb8ba82259f77fac9b
I've got a relatively simple memory allocation that's failing. The application is not particularly complicated although it does allocate memory in a few places. It's C, not C++. I'm positive this is an issue allocating memory, not freeing memory.
Here's the code:
printf(":2 %d %d\n", initial_len, initial_len * sizeof(char));
o->data = (char*) malloc(initial_len * sizeof(char));
printf(":3 \n");
Upon execution, I get:
:1
:2 1024 1024
*** glibc detected *** ./menv: corrupted double-linked list: 0x0000000001d14400 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x7f680cfc4d76]
/lib/x86_64-linux-gnu/libc.so.6(+0x771ed)[0x7f680cfc51ed]
/lib/x86_64-linux-gnu/libc.so.6(+0x794d4)[0x7f680cfc74d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x70)[0x7f680cfc9b90]
./menv[0x403971]
./menv[0x40391d]
./menv[0x4030ec]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f680cf6cead]
./menv[0x401369]
======= Memory map: ========
00400000-00405000 r-xp 00000000 08:03 2621441 /home/swoods/code/reynard/modules/stdlib/menv
00605000-00606000 rw-p 00005000 08:03 2621441 /home/swoods/code/reynard/modules/stdlib/menv
00606000-00706000 rw-p 00000000 00:00 0
01cfd000-01d3d000 rw-p 00000000 00:00 0 [heap]
7f6808000000-7f6808021000 rw-p 00000000 00:00 0
7f6808021000-7f680c000000 ---p 00000000 00:00 0
7f680cd38000-7f680cd4d000 r-xp 00000000 08:05 10354962 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f680cd4d000-7f680cf4d000 ---p 00015000 08:05 10354962 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f680cf4d000-7f680cf4e000 rw-p 00015000 08:05 10354962 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f680cf4e000-7f680d0ce000 r-xp 00000000 08:05 10354980 /lib/x86_64-linux-gnu/libc-2.13.so
7f680d0ce000-7f680d2ce000 ---p 00180000 08:05 10354980 /lib/x86_64-linux-gnu/libc-2.13.so
7f680d2ce000-7f680d2d2000 r--p 00180000 08:05 10354980 /lib/x86_64-linux-gnu/libc-2.13.so
7f680d2d2000-7f680d2d3000 rw-p 00184000 08:05 10354980 /lib/x86_64-linux-gnu/libc-2.13.so
7f680d2d3000-7f680d2d8000 rw-p 00000000 00:00 0
7f680d2d8000-7f680d2da000 r-xp 00000000 08:05 10354973 /lib/x86_64-linux-gnu/libdl-2.13.so
7f680d2da000-7f680d4da000 ---p 00002000 08:05 10354973 /lib/x86_64-linux-gnu/libdl-2.13.so
7f680d4da000-7f680d4db000 r--p 00002000 08:05 10354973 /lib/x86_64-linux-gnu/libdl-2.13.so
7f680d4db000-7f680d4dc000 rw-p 00003000 08:05 10354973 /lib/x86_64-linux-gnu/libdl-2.13.so
7f680d4dc000-7f680d4fc000 r-xp 00000000 08:05 10354984 /lib/x86_64-linux-gnu/ld-2.13.so
7f680d6df000-7f680d6e2000 rw-p 00000000 00:00 0
7f680d6f8000-7f680d6fb000 rw-p 00000000 00:00 0
7f680d6fb000-7f680d6fc000 r--p 0001f000 08:05 10354984 /lib/x86_64-linux-gnu/ld-2.13.so
7f680d6fc000-7f680d6fd000 rw-p 00020000 08:05 10354984 /lib/x86_64-linux-gnu/ld-2.13.so
7f680d6fd000-7f680d6fe000 rw-p 00000000 00:00 0
7ffff3bd6000-7ffff3bf7000 rw-p 00000000 00:00 0 [stack]
7ffff3bff000-7ffff3c00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted
- The code compiles without issue.
- When I run it "standalone," it crashes with the error above. I see :2 but I don't see :3, which tells me it's an error within malloc. (I hope I'm wrong.)
- When I run the same binary through valgrind, it works as expected.
- It does not appear to be an issue with the variable declaration o->data, which is a char*. If I declare char* A; A = instead of o->data = it still crashes.
I would greatly appreciate any ideas as to how to troubleshoot/why this happens.
Thanks!
2 Answers
#1
13
So, I think I found it. We may need to file this under "Sean needs to learn basic Valgrind skills." Here's how I solved it for any future observers.
- Okay, we're dealing with a really strange error thrown by a tried and tested library function, so it must be something specific to my setup. The algorithm is the same, so it must be the data.
- The dynamic memory implementation has an underlying data structure to track allocated memory, which happens to be a doubly linked list -- thus the message.
- So, there must be a memory operation somewhere that corrupts this data structure in a subtle way.
- Okay, what tools do we have at our disposal? Valgrind is highly praised, let's try that. Strange, it works in Valgrind. Hmm.
- Actually read what Valgrind is telling you. (This is where I didn't do my part.) It flags you with errors such as "Invalid write of size 1" along with a trace of the various labels/symbols where this shows up. Look for possible errors and adjust as necessary.
- In this case, it was pointing me to an invocation of memcpy() in the hashtable_put function of hashtable.c. The subtle hint is that I was passing the first argument to memcpy using the address-of operator &, which caused the corruption.
- When I fixed that, Valgrind no longer complained.
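A minimal sketch of the kind of mistake described in the steps above. The struct entry type and the entry_set_key helper are hypothetical, made up for illustration; the real hashtable_put in hashtable.c is not shown in this post and may look quite different. The point is that memcpy takes a void* destination, so passing the address of a pointer member compiles cleanly even though it targets the pointer itself rather than the buffer it points to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type, for illustration only. */
struct entry {
    char  *key;  /* points at a separately allocated buffer */
    size_t len;  /* number of bytes stored in key, excluding the terminator */
};

/* Copies len bytes from src into a freshly allocated key buffer.
 * Returns 0 on success, -1 if the allocation fails. */
static int entry_set_key(struct entry *e, const char *src, size_t len)
{
    assert(e != NULL && src != NULL);

    e->key = malloc(len + 1);
    if (e->key == NULL)
        return -1;

    /* WRONG (left as a comment on purpose):
     *     memcpy(&e->key, src, len);
     * &e->key is the address of the pointer member, so the bytes would
     * overwrite the pointer and, once len exceeds sizeof(char *), whatever
     * follows it in memory. For a heap-allocated struct that can include
     * the allocator's own bookkeeping, and the damage may only surface in
     * a later, unrelated malloc call. */

    /* RIGHT: copy into the buffer the pointer refers to. */
    memcpy(e->key, src, len);
    e->key[len] = '\0';
    e->len = len;
    return 0;
}
```

Calling entry_set_key(&e, "hello", 5) on a zero-initialized struct entry leaves e.key pointing at a NUL-terminated copy of the string; the caller is responsible for free(e.key).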
The moral of the story:
- Don't ignore feedback from the tools. No news is [usually] good news, so if Valgrind spits out lots of info there is an increased likelihood of a problem.
- Dynamic memory allocation bugs are subtle (dynamic in the true sense of the word) and can be affected by many variables. Valgrind sits between your program and the memory library (it substitutes its own allocator) so it knows what's going on, and I think that is why the program behaved differently when run under it.
The commit that has so far fixed the issue:
http://code.seanwoods.com/reynard.fossil.cgi/ci/bd6a5a23c1?sbs=0
#2
-1
EDIT: As we have almost no idea what your struct o looks like and what datatype o->data should be, we can only speculate about what you're trying to do.
Please specify the o struct definition, so we can help.