Debugging a running daemon with gdb

Time: 2020-12-27 16:39:59

I am developing a high-traffic network server in C that runs as a daemon. Under some circumstances the app crashes (always without a core dump). How can I debug the running daemon with gdb to find the place that generates the SIGSEGV?

Explanatory notes:

  1. I know how to attach gdb to a running process using the attach command.

  2. After gdb attaches, the process stops. If I then run "continue", gdb stays blocked for as long as the program does not crash. If I press CTRL-C, the process exits and I am unable to simply detach gdb.

So the question is: is there a way to let the process continue without gdb blocking, while still being able to detach if the process does not crash?

2 Answers

#1


6  

Try async mode and "continue &":

Save the following as non-stop.gdb:

set target-async on
set pagination off
set non-stop on

Then run:

$ gdb -x non-stop.gdb
(gdb) !pgrep YOUR-DAEMON
1234
(gdb) attach 1234
(gdb) continue -a &
(gdb)

#2


3  

This page on attach/detach says that the detach command works inside gdb.

If you want to catch a segmentation fault in an application, the application has to be running under the debugger when the fault occurs. When the signal is caught, you can use where or bt to see a stack trace. Of course, you cannot continue the application after it has faulted; how would it recover? If you expect the fault to trigger soon, you can instead attach to the running process and wait for the fault in the debugger.
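
The attach-and-wait approach can be scripted. The following is only a sketch: the function name trace_on_crash is hypothetical, and it assumes gdb is installed and that you are allowed to attach to the process (usually the same user or root). With -batch, gdb runs the -ex commands in order and then exits; "continue" blocks until the process stops on a signal or exits, so the backtrace commands only produce useful output after a stop.

```shell
#!/bin/sh
# Hypothetical helper: attach gdb to a running PID, wait until the process
# stops (for example on SIGSEGV), then print backtraces.
trace_on_crash() {
    pid=${1:-}
    # Accept only a non-empty, purely numeric PID.
    case $pid in
        ''|*[!0-9]*) echo "usage: trace_on_crash PID" >&2; return 2 ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    # and that we have permission to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or no permission): $pid" >&2
        return 1
    fi
    # Blocks here until the process stops or exits.
    gdb -p "$pid" -batch \
        -ex 'set pagination off' \
        -ex 'continue' \
        -ex 'bt full' \
        -ex 'thread apply all bt'
}
```

A possible invocation would be trace_on_crash "$(pgrep -o YOUR-DAEMON)" > crash.log 2>&1 &, which leaves the terminal free while gdb waits in the background.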

If you want a stack trace after the fault has occurred, then you really need a core file, as there will be no process left to attach to. If your daemon is started as part of the system, it may be hard to configure it to dump core, and you may not want other applications to leave core dumps all over the place. In that case I'd advise stopping the system daemon and starting it again under your own user account, where you can allow it to dump core. If it is really essential that it starts up as part of the system, check whether the start-up of the daemon is confined to a single sub-shell, and use ulimit -c in that sub-shell to set an appropriate maximum size for the core dump.
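
As a minimal sketch of the sub-shell idea: the function name run_with_core is hypothetical, and the -c option of ulimit is a common shell extension (bash, dash, ksh) rather than something POSIX guarantees. The sketch raises the soft core limit to whatever the hard limit allows; substitute a specific size if you want a cap.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with core dumps enabled, without
# changing the limit of the calling shell.
run_with_core() {
    (
        # Inside the sub-shell only: raise the soft core-size limit to the
        # hard limit (often "unlimited"; may be 0 on locked-down systems).
        ulimit -S -c "$(ulimit -H -c)"
        "$@"
    )
}
```

It could be used as run_with_core /path/to/your-daemon (the path is a placeholder). Where the core file ends up, and whether one is written at all, still depends on system settings outside this script.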
