Why does gdb report "can't compute CFA" when using a separate debug symbols file?

I'm trying to invoke gdb with a stripped executable and a separate debug symbols file, on a core dump generated from running the stripped executable.

But when I use the separate debug symbols file, gdb is unable to give information on local variables for me.

Here is a log showing entirely how I produce my 3 ELF files and the core file and then run them through gdb 3 times.

First I just run gdb with the stripped executable and of course can't see any file names or line numbers, and can't inspect variables.

Then I run gdb using the stripped executable and grabbing the debug symbols from the original unstripped executable. This works pretty well but does give a disturbing and apparently unwarranted warning about the core and executable possibly mismatching.

Finally I run gdb with the stripped executable and the separate debug file. This still gives filenames and line numbers, but I can't inspect local variables and I get a "can't compute CFA for this frame" error.

Here is the log:

2016-09-16 16:01:45 barry@somehost ~/proj/segfault/segfault
$ cat segfault.c
#include <stdio.h>
int main(int argc, char **argv) {
    char *badpointer = (char *)0x2398723;
    printf("badpointer: %s\n", badpointer);
    return 0;
}

2016-09-16 16:03:31 barry@somehost ~/proj/segfault/segfault
$ gcc -g -o segfault segfault.c

2016-09-16 16:03:37 barry@somehost ~/proj/segfault/segfault
$ objcopy --strip-debug segfault segfault.stripped

2016-09-16 16:03:40 barry@somehost ~/proj/segfault/segfault
$ objcopy --only-keep-debug segfault segfault.debug

2016-09-16 16:03:43 barry@somehost ~/proj/segfault/segfault
$ ./segfault.stripped
Segmentation fault (core dumped)

2016-09-16 16:03:48 barry@somehost ~/proj/segfault/segfault
$ ll /tmp/core.segfault.stripp.11
-rw------- 1 barry bsm-it 188416 2016-09-16 16:03 /tmp/core.segfault.stripp.11

2016-09-16 16:03:51 barry@somehost ~/proj/segfault/segfault
$ gdb ./segfault.stripped /tmp/core.segfault.stripp.11
GNU gdb (GDB) Fedora (7.0.1-50.fc12)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/barry/proj/segfault/segfault/segfault.stripped...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/a6/8dce9115a92508af92ac4ccac24b9f0cc34d71
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault.stripped'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-3.x86_64
(gdb) bt
#0  0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
#1  0x00000035fec4ec4a in printf () from /lib64/libc.so.6
#2  0x00000000004004f4 in main ()
(gdb) up
#1  0x00000035fec4ec4a in printf () from /lib64/libc.so.6
(gdb) up
#2  0x00000000004004f4 in main ()
(gdb) p argc
No symbol table is loaded.  Use the "file" command.
(gdb) q

2016-09-16 16:04:19 barry@somehost ~/proj/segfault/segfault
$ gdb -q -e ./segfault.stripped -s ./segfault -c /tmp/core.segfault.stripp.11
Reading symbols from /home/barry/proj/segfault/segfault/segfault...done.

warning: core file may not match specified executable file.
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/a6/8dce9115a92508af92ac4ccac24b9f0cc34d71
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault.stripped'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-3.x86_64
(gdb) bt
#0  0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
#1  0x00000035fec4ec4a in printf () from /lib64/libc.so.6
#2  0x00000000004004f4 in main (argc=1, argv=0x7fffd1c0a728) at segfault.c:4
(gdb) up
#1  0x00000035fec4ec4a in printf () from /lib64/libc.so.6
(gdb) up
#2  0x00000000004004f4 in main (argc=1, argv=0x7fffd1c0a728) at segfault.c:4
4       printf("badpointer: %s\n", badpointer);
(gdb) p argc
$1 = 1
(gdb) q

2016-09-16 16:04:39 barry@somehost ~/proj/segfault/segfault
$ gdb -q -e ./segfault.stripped -s ./segfault.debug -c /tmp/core.segfault.stripp.11
Reading symbols from /home/barry/proj/segfault/segfault/segfault.debug...done.

warning: core file may not match specified executable file.
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/a6/8dce9115a92508af92ac4ccac24b9f0cc34d71
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault.stripped'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-3.x86_64
(gdb) bt
#0  0x00000035fec47cb7 in vfprintf () from /lib64/libc.so.6
#1  0x00000035fec4ec4a in printf () from /lib64/libc.so.6
#2  0x00000000004004f4 in main (argc=can't compute CFA for this frame
) at segfault.c:4
(gdb) up
#1  0x00000035fec4ec4a in printf () from /lib64/libc.so.6
(gdb) up
#2  0x00000000004004f4 in main (argc=can't compute CFA for this frame
) at segfault.c:4
4       printf("badpointer: %s\n", badpointer);
(gdb) p argc
can't compute CFA for this frame
(gdb) q

I have some questions about this:

Why does it display the warning "warning: core file may not match specified executable file.", even though I'm using the exact same executable path as was used when the core dump was originally generated?
Why does using the separate debug symbols (-s ./segfault.debug) result in the error "can't compute CFA for this frame" when attempting to inspect local variables?

What is a CFA anyway?

Am I using an incorrect method to produce the debug symbol file? I confirmed that using "objcopy --strip-debug" gives the same result as "strip -g". Am I using the right options to feed the debug info into gdb?

My intention is that the stripped executables will be installed on a binary-compatible production system and any core dumps generated due to segfaults can be copied back to the devel system where we can feed them into gdb with the debug info and analyse the crash position and stack variables. But as a first step I'm trying to sort out the issues with using separate debug info files on the devel system.

It seems that using a separate debug symbols file causes the "can't compute CFA for this frame" error, even when a core file is not used.

My gcc version:

2016-09-16 16:07:39 barry@somehost ~/proj/segfault/segfault
$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)

I suspect that gdb might be looking for symbols related to the variables in the segfault.debug file when objcopy actually only put them in the segfault.stripped file. If this is the case, perhaps some small adjustment to the options to objcopy could put those symbols in the place gdb is looking?

2 Solutions

#1

I commend you for wanting to keep a set of symbol files for everything that is deployed to the production server; in my opinion this is an often overlooked practice, but you will not regret it -- one day it will save you a lot of debugging trouble.

As I have had similar issues in the past, I will try to answer some of your questions. You have quite an ancient toolchain, if you don't mind me saying so, so I'm not sure how much of this really applies here. I'll post it here anyway.

CFA = Canonical Frame Address. This is the reference address of a stack frame, relative to which every local variable is addressed; DWARF typically defines it as the value of the stack pointer at the call site in the caller's frame. If you have done some traditional x86 assembly programming, the BP register played a similar role. So "can't compute CFA for this frame" basically says "I know of these local variables, but I don't know where they are located on the stack".

There used to be code in GDB that worked only for the DWARF-2 debugging format, and debug info that did not conform to it triggered at least this particular error. That restriction was lifted some time ago, but that change won't be in your version.

The other thing is that the debug information describing how variables are moved around is not always generated. This usually happens with newer compilers though, as they get better at optimizing.

I was able to get rid of my problems by compiling like this:

gcc -g3 -gdwarf-2 -fvar-tracking -fvar-tracking-assignments -o segfault segfault.c

You can try this to see if it solves your problem, too.

Regarding the message about the location of the symbol file: it seems that the debugger wants to load it from the system directory. Maybe you have to link the executable to the symbol file with:

objcopy --add-gnu-debuglink=segfault.debug segfault

#2

I found this question while searching for an answer to the following part of the original question:

Why does it display the warning "warning: core file may not match specified executable file.", even though I'm using the exact same executable path as was used when the core dump was originally generated?

There was no answer to this particular question, but through experimentation and research I believe I have found one.

Below is a transcript of using gdb to debug a core file. Notice that the "warning: core file may not match specified executable file." message appears when the name of the executable that produced the core is longer than 15 characters.

[~/t]$cat do_abort.c 
#include <stdlib.h>

int func4(int f) { if(f) {abort();} return 0;}
int func3(int f) { return func4(f); }
int func2(int f) { return func3(f); }
int func1(int f) { return func2(f); }
int main(void) {  return func1(1); }

[~/t]$gcc -g -o 123456789012345 do_abort.c 
[~/t]$./123456789012345 
Aborted (core dumped)
[~/t]$ll core*
-rw-------. 1 dev wheel 240K Apr 22 03:19 core.42697
[~/t]$gdb -q -c core.42697 123456789012345 
Reading symbols from /home/dev/t/123456789012345...done.
[New LWP 42697]
Core was generated by `./123456789012345'.
Program terminated with signal 6, Aborted.
#0  0x00007f0be67631d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f0be67631d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f0be67648c8 in __GI_abort () at abort.c:90
#2  0x0000000000400543 in func4 (f=1) at do_abort.c:3
#3  0x000000000040055f in func3 (f=1) at do_abort.c:4
#4  0x0000000000400576 in func2 (f=1) at do_abort.c:5
#5  0x000000000040058d in func1 (f=1) at do_abort.c:6
#6  0x000000000040059d in main () at do_abort.c:7
(gdb) quit
[~/t]$rm core.42697 
[~/t]$
[~/t]$mv 123456789012345 1234567890123456
[~/t]$./1234567890123456 
Aborted (core dumped)
[~/t]$ll core*
-rw-------. 1 dev wheel 240K Apr 22 03:20 core.42721
[~/t]$gdb -q -c core.42721 1234567890123456 
Reading symbols from /home/dev/t/1234567890123456...done.

warning: core file may not match specified executable file.
[New LWP 42721]
Core was generated by `./1234567890123456'.
Program terminated with signal 6, Aborted.
#0  0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f5b271fb8c8 in __GI_abort () at abort.c:90
#2  0x0000000000400543 in func4 (f=1) at do_abort.c:3
#3  0x000000000040055f in func3 (f=1) at do_abort.c:4
#4  0x0000000000400576 in func2 (f=1) at do_abort.c:5
#5  0x000000000040058d in func1 (f=1) at do_abort.c:6
#6  0x000000000040059d in main () at do_abort.c:7
(gdb) quit

[~/t]$mv 1234567890123456 123456789012345
[~/t]$gdb -q -c core.42721 123456789012345 
Reading symbols from /home/dev/t/123456789012345...done.
[New LWP 42721]
Core was generated by `./1234567890123456'.
Program terminated with signal 6, Aborted.
#0  0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f5b271fa1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f5b271fb8c8 in __GI_abort () at abort.c:90
#2  0x0000000000400543 in func4 (f=1) at do_abort.c:3
#3  0x000000000040055f in func3 (f=1) at do_abort.c:4
#4  0x0000000000400576 in func2 (f=1) at do_abort.c:5
#5  0x000000000040058d in func1 (f=1) at do_abort.c:6
#6  0x000000000040059d in main () at do_abort.c:7
(gdb) quit

Following through the gdb source code, I discovered that the ELF core file structure reserves only sixteen bytes to hold the executable filename, pr_fname[16], including the NUL terminator:

struct elf_external_linux_prpsinfo32_ugid32
  {
    char pr_state;                      /* Numeric process state.  */
    char pr_sname;                      /* Char for pr_state.  */
    char pr_zomb;                       /* Zombie.  */
    char pr_nice;                       /* Nice val.  */
    char pr_flag[4];                    /* Flags.  */
    char pr_uid[4];
    char pr_gid[4];
    char pr_pid[4];
    char pr_ppid[4];
    char pr_pgrp[4];
    char pr_sid[4];
    char pr_fname[16];                  /* Filename of executable.  */
    char pr_psargs[80];                 /* Initial part of arg list.  */
  };

The "warning: core file may not match specified executable file." message is issued by gdb when the name of the executable passed to gdb on the command line doesn't match the value stored in pr_fname[] in the core file.

Using the demonstration I showed at the start of this answer, when the filename is 1234567890123456 the filename stored in the core file as pr_fname[] is 123456789012345 (truncated to 15 characters). If gdb is started using gdb -c core.XXXX 1234567890123456 then the warning will be issued. If gdb is started using gdb -c core.XXXX 123456789012345 then the warning will not be issued.
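
The comparison described above can be imitated in a few lines of shell. This is only a simulation of the truncation, not gdb's actual code; pr_fname_of and check_name are hypothetical helper names, and the sketch assumes the comparison is made against the base name of the executable.

```shell
#!/bin/sh
# Mimic what fits in a 16-byte pr_fname field: at most 15 characters
# of the executable's base name, plus the NUL terminator.
pr_fname_of() {
    printf '%.15s' "$(basename "$1")"
}

# Report whether the name given to gdb would equal the truncated name
# stored in the core file.
check_name() {
    if [ "$(basename "$1")" = "$(pr_fname_of "$1")" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

check_name ./123456789012345     # 15 characters: match
check_name ./1234567890123456    # 16 characters: mismatch
check_name ./segfault.stripped   # 17 characters: mismatch
pr_fname_of ./segfault.stripped; echo   # segfault.stripp
```

The last line shows the same truncated name, segfault.stripp, that appears in the core file name /tmp/core.segfault.stripp.11 in the question.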

It should follow that in the example from the original question, if segfault.stripped was renamed to segfault.stripp and gdb was run using gdb ./segfault.stripp /tmp/core.segfault.stripp.11 then the warning should not be issued.
