Linux Kernel Debugging

Date: 2021-05-12 09:21:41

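The kernel has several built-in debugging features, but they produce extra output and degrade performance, so distribution vendors usually disable them in the kernels they ship. For a kernel developer, however, debugging takes priority, so you should build and install your own kernel with these options turned on.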

I. Debugging Options in the Kernel

CONFIG_DEBUG_KERNEL
This option just makes other debugging options available; it should be turned on but does not, by itself, enable any features.
CONFIG_DEBUG_SLAB
This crucial option turns on several types of checks in the kernel memory allocation functions; with these checks enabled, it is possible to detect a number of memory overrun and missing initialization errors. Each byte of allocated memory is set to 0xa5 before being handed to the caller and then set to 0x6b when it is freed. If you ever see either of those “poison” patterns repeating in output from your driver (or often in an oops listing), you’ll know exactly what sort of error to look for. When debugging is enabled, the kernel also places special guard values before and after every allocated memory object; if those values ever get changed, the kernel knows that somebody has overrun a memory allocation, and it complains loudly. Various checks for more obscure errors are enabled as well.
CONFIG_DEBUG_PAGEALLOC
Full pages are removed from the kernel address space when freed. This option can slow things down significantly, but it can also quickly point out certain kinds of memory corruption errors.
CONFIG_DEBUG_SPINLOCK
With this option enabled, the kernel catches operations on uninitialized spinlocks and various other errors (such as unlocking a lock twice).
CONFIG_DEBUG_SPINLOCK_SLEEP
This option enables a check for attempts to sleep while holding a spinlock. In fact, it complains if you call a function that could potentially sleep, even if the call in question would not sleep.
CONFIG_INIT_DEBUG
Items marked with __init (or __initdata) are discarded after system initialization or module load time. This option enables checks for code that attempts to access initialization-time memory after initialization is complete.
CONFIG_DEBUG_INFO
This option causes the kernel to be built with full debugging information included. You’ll need that information if you want to debug the kernel with gdb. You may also want to enable CONFIG_FRAME_POINTER if you plan to use gdb.
CONFIG_MAGIC_SYSRQ
Enables the “magic SysRq” key. We look at this key in the section “System Hangs,” later in this chapter.
CONFIG_DEBUG_STACKOVERFLOW
CONFIG_DEBUG_STACK_USAGE
These options can help track down kernel stack overflows. A sure sign of a stack overflow is an oops listing without any sort of reasonable back trace. The first option adds explicit overflow checks to the kernel; the second causes the kernel to monitor stack usage and make some statistics available via the magic SysRq key.
CONFIG_KALLSYMS
This option (under “General setup/Standard features”) causes kernel symbol information to be built into the kernel; it is enabled by default. The symbol information is used in debugging contexts; without it, an oops listing can give you a kernel traceback only in hexadecimal, which is not very useful.

CONFIG_IKCONFIG
CONFIG_IKCONFIG_PROC
These options (found in the “General setup” menu) cause the full kernel configuration state to be built into the kernel and to be made available via /proc. Most kernel developers know which configuration they used and do not need these options (which make the kernel bigger). They can be useful, though, if you are trying to debug a problem in a kernel built by somebody else.
CONFIG_ACPI_DEBUG
Under “Power management/ACPI.” This option turns on verbose ACPI (Advanced Configuration and Power Interface) debugging information, which can be useful if you suspect a problem related to ACPI.
CONFIG_DEBUG_DRIVER
Under “Device drivers.” Turns on debugging information in the driver core, which can be useful for tracking down problems in the low-level support code. We’ll look at the driver core in Chapter 14.
CONFIG_SCSI_CONSTANTS
This option, found under “Device drivers/SCSI device support,” builds in information for verbose SCSI error messages. If you are working on a SCSI driver, you probably want this option.
CONFIG_INPUT_EVBUG
This option (under “Device drivers/Input device support”) turns on verbose logging of input events. If you are working on a driver for an input device, this option may be helpful. Be aware of the security implications of this option, however: it logs everything you type, including your passwords.
CONFIG_PROFILING
This option is found under “Profiling support.” Profiling is normally used for system performance tuning, but it can also be useful for tracking down some kernel hangs and related problems.

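II. Debugging by Printing

1. printk

printk is similar to printf. One difference is that printk lets you attach a log level to each message, so that messages are classified by the severity that level indicates.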

 printk(KERN_DEBUG "Here I am: %s:%i\n", __FILE__, __LINE__);
printk(KERN_CRIT "I'm trashed; giving up on %p\n", ptr);

Eight log levels are defined in <linux/kernel.h> (newer kernels define them in <linux/kern_levels.h>):

 #define KERN_EMERG "<0>" /* system is unusable */
#define KERN_ALERT "<1>" /* action must be taken immediately*/
#define KERN_CRIT "<2>" /* critical conditions */
#define KERN_ERR "<3>" /* error conditions */
#define KERN_WARNING "<4>" /* warning conditions */
#define KERN_NOTICE "<5>" /* normal but significant condition */
#define KERN_INFO "<6>" /* informational */
#define KERN_DEBUG "<7>" /* debug-level messages */

A printk call with no explicit priority uses the default level DEFAULT_MESSAGE_LOGLEVEL, a macro defined in kernel/printk.c as an integer:

 #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */  

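Depending on the log level, the kernel may print the message to the current console. A message is displayed only when its priority value is numerically less than console_loglevel. If both klogd and syslogd are running, kernel messages are appended to /var/log/messages regardless of the value of console_loglevel. If klogd is not running, the messages never reach user space, and the only way to see them is to read /proc/kmsg (most easily done with the dmesg command). When using klogd, keep in mind that it does not save consecutive identical lines: it keeps only the first such line and later reports how many times that line was repeated.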
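The console filtering rule described above can be modelled in user space. The following is a minimal sketch, not kernel code: the helper names and the `SKETCH_` macro are made up for illustration, and the default level of 4 is taken from the DEFAULT_MESSAGE_LOGLEVEL definition quoted in this article.

```c
/* Assumed default, copied from the DEFAULT_MESSAGE_LOGLEVEL value quoted above. */
#define SKETCH_DEFAULT_MESSAGE_LOGLEVEL 4

/*
 * Hypothetical helper: extract the level from a "<N>" prefix.
 * A message without a prefix gets the default level.
 */
int sketch_message_level(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return SKETCH_DEFAULT_MESSAGE_LOGLEVEL;
}

/*
 * Hypothetical helper: a message reaches the console only if its
 * level is numerically lower than console_loglevel.
 */
int sketch_shown_on_console(const char *msg, int console_loglevel)
{
    return sketch_message_level(msg) < console_loglevel;
}
```

With a console loglevel of 7, a KERN_CRIT message ("<2>...") passes the check while a KERN_DEBUG message ("<7>...") does not; raising the console loglevel to 8 lets debug messages through as well.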

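The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL (kernel/printk.c) and can be modified through the sys_syslog system call. One way to change it is to pass the -c switch when invoking klogd. Note that to change the current value, you must first kill klogd and then restart it with the new -c option.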

 #define DEFAULT_CONSOLE_LOGLEVEL 7 /*anything MORE serious than KERN_DEBUG*/ 

The console log level can also be read and modified through the text file /proc/sys/kernel/printk.

 # echo 8 > /proc/sys/kernel/printk
 # cat /proc/sys/kernel/printk

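The file holds four integer values: the current loglevel, the default loglevel for messages that lack an explicit one, the minimum allowed loglevel, and the boot-time default loglevel. Writing a single value changes the current loglevel.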
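To make the four fields concrete, here is a small user-space sketch that parses one line in the format of /proc/sys/kernel/printk. The struct and function names are hypothetical and chosen only for this example; the sketch parses a string rather than opening the file, so it can be tried without a Linux /proc.

```c
#include <stdio.h>

/* Hypothetical struct; field names are illustrative, not kernel identifiers. */
struct printk_levels {
    int current_level;  /* current console loglevel */
    int default_msg;    /* default level for messages without a prefix */
    int minimum;        /* minimum allowed console loglevel */
    int boot_default;   /* boot-time default console loglevel */
};

/*
 * Parse one line holding four whitespace-separated integers.
 * Returns 0 on success, -1 if fewer than four integers were found.
 */
int parse_printk_line(const char *line, struct printk_levels *out)
{
    int n = sscanf(line, "%d %d %d %d",
                   &out->current_level, &out->default_msg,
                   &out->minimum, &out->boot_default);
    return n == 4 ? 0 : -1;
}
```

A caller would typically read the line with fgets from the opened file and pass it to parse_printk_line.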

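2. Redirecting Console Messages

From any console you can call ioctl(TIOCLINUX) to designate a different virtual terminal to receive kernel messages.

The setconsole program below selects the console that receives kernel messages.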

 /* setconsole: choose the console that receives kernel messages */
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <errno.h>
 #include <unistd.h>
 #include <sys/ioctl.h>

 int main(int argc, char **argv)
 {
     char bytes[2] = {11, 0}; /* 11 is the TIOCLINUX cmd number */

     if (argc == 2) {
         bytes[1] = atoi(argv[1]); /* the chosen console */
     } else {
         fprintf(stderr, "%s: need a single arg\n", argv[0]);
         exit(1);
     }
     if (ioctl(STDIN_FILENO, TIOCLINUX, bytes) < 0) { /* use stdin */
         fprintf(stderr, "%s: ioctl(stdin, TIOCLINUX): %s\n",
                 argv[0], strerror(errno));
         exit(1);
     }
     exit(0);
 }

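3. How Messages Get Logged

The printk function writes messages into a circular buffer that is __LOG_BUF_LEN bytes long; the size can be set to a value between 4 KB and 1 MB when configuring the kernel.

If the circular buffer fills up, printk wraps around to the beginning of the buffer and writes the new data there, overwriting the oldest data.

Reading from /proc/kmsg consumes the data: whatever has been read is no longer kept in the log buffer.

The syslog system call, by contrast, can optionally return log data while leaving it in the buffer.

The dmesg command retrieves the buffer contents without flushing the buffer.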
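The wrap-around and the two kinds of read described above can be illustrated with a toy circular buffer. This is a user-space sketch under stated assumptions, not the kernel's implementation: all names are hypothetical, and the buffer is only 8 bytes long so that overwriting is easy to observe.

```c
#include <stddef.h>

#define SKETCH_LOG_BUF_LEN 8  /* tiny on purpose; the real buffer is 4 KB to 1 MB */

struct sketch_log {
    char buf[SKETCH_LOG_BUF_LEN];
    size_t start;  /* index of the oldest byte still in the buffer */
    size_t len;    /* number of valid bytes */
};

/* Append a string; once the buffer is full, each new byte overwrites the oldest one. */
void sketch_log_write(struct sketch_log *log, const char *s)
{
    for (; *s; s++) {
        size_t end = (log->start + log->len) % SKETCH_LOG_BUF_LEN;
        log->buf[end] = *s;
        if (log->len < SKETCH_LOG_BUF_LEN)
            log->len++;
        else
            log->start = (log->start + 1) % SKETCH_LOG_BUF_LEN;
    }
}

/*
 * Copy up to max bytes into out, oldest first, and return the count.
 * consume != 0: the copied bytes are dropped (like reading /proc/kmsg).
 * consume == 0: the bytes stay in the buffer (like dmesg).
 */
size_t sketch_log_read(struct sketch_log *log, char *out, size_t max, int consume)
{
    size_t n = log->len < max ? log->len : max;
    for (size_t i = 0; i < n; i++)
        out[i] = log->buf[(log->start + i) % SKETCH_LOG_BUF_LEN];
    if (consume) {
        log->start = (log->start + n) % SKETCH_LOG_BUF_LEN;
        log->len -= n;
    }
    return n;
}
```

Writing the 10 bytes "abcdefghij" into a zero-initialized `struct sketch_log` leaves only the newest 8 bytes, "cdefghij". A non-consuming read can be repeated and returns the same data each time, whereas after a consuming read the buffer is empty.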

4. Turning the Messages On and Off

Debugging messages can be turned on and off with preprocessor directives:

 #undef PDEBUG             /* undef it, just in case */
 #ifdef SCULL_DEBUG
 #  ifdef __KERNEL__
      /* This one if debugging is on, and kernel space */
 #    define PDEBUG(fmt, args...) printk( KERN_DEBUG "scull: " fmt, ## args)
 #  else
      /* This one for user space */
 #    define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args)
 #  endif
 #else
 #  define PDEBUG(fmt, args...) /* not debugging: nothing */
 #endif

 #undef PDEBUGG
 #define PDEBUGG(fmt, args...) /* nothing: it's a placeholder */

Add the following to the corresponding Makefile:

 # Comment/uncomment the following line to disable/enable debugging
 DEBUG = y

 # Add your debugging flag (or not) to CFLAGS
 ifeq ($(DEBUG),y)
   DEBFLAGS = -O -g -DSCULL_DEBUG # "-O" is needed to expand inlines
 else
   DEBFLAGS = -O2
 endif

 CFLAGS += $(DEBFLAGS)

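5. Rate Limiting

To keep rapidly repeated printk output from swamping the system, the kernel provides the following function, which lets callers skip part of the output: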

 int printk_ratelimit(void); 

A typical use:

 if (printk_ratelimit( ))
printk(KERN_NOTICE "The printer is still on fire\n");

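The behavior of printk_ratelimit can be tuned through /proc/sys/kernel/printk_ratelimit (the number of seconds to wait before re-enabling messages) and /proc/sys/kernel/printk_ratelimit_burst (the number of messages accepted before rate limiting begins).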
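How the two tunables interact can be sketched with a simplified user-space model. This is an assumption-laden illustration, not the kernel's code: the struct and function are hypothetical, time is passed in explicitly as seconds, and the model uses a plain fixed window in which at most `burst` messages are allowed per `interval`.

```c
/* Hypothetical user-space model of interval/burst rate limiting. */
struct sketch_ratelimit {
    long interval;  /* window length in seconds (cf. printk_ratelimit) */
    int burst;      /* messages allowed per window (cf. printk_ratelimit_burst) */
    long begin;     /* start time of the current window */
    int printed;    /* messages let through so far in this window */
};

/* Returns 1 if a message may be printed at time now, 0 if it should be skipped. */
int sketch_ratelimit_ok(struct sketch_ratelimit *rl, long now)
{
    if (now - rl->begin >= rl->interval) {
        /* The window has expired: start a new one and re-enable messages. */
        rl->begin = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    return 0;
}
```

With an interval of 5 seconds and a burst of 2 (values chosen only for illustration), two messages at t=0 and t=1 pass, a third at t=2 is suppressed, and a message at t=5 passes again because a new window has started.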

6. Printing Device Numbers

 #include <linux/kdev_t.h>  

 int print_dev_t(char *buffer, dev_t dev);
char *format_dev_t(char *buffer, dev_t dev);

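Both macros write the device number into the given buffer. The only difference is that print_dev_t returns the number of characters printed, while format_dev_t returns the buffer.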
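The difference in return values can be seen in a user-space sketch. Everything here is hypothetical and only mimics the idea: the `SKETCH_` macros assume a 12-bit major / 20-bit minor split like the kernel's internal dev_t encoding, the "major:minor" output format is an assumption, and the functions take an explicit buffer size, which the kernel macros do not.

```c
#include <stdio.h>

/* Assumption: 20 minor bits, mirroring the kernel's internal dev_t layout. */
#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1U << SKETCH_MINORBITS) - 1)
#define SKETCH_MAJOR(dev) ((unsigned int)((dev) >> SKETCH_MINORBITS))
#define SKETCH_MINOR(dev) ((unsigned int)((dev) & SKETCH_MINORMASK))
#define SKETCH_MKDEV(ma, mi) (((unsigned int)(ma) << SKETCH_MINORBITS) | (mi))

/* In the spirit of print_dev_t: returns the number of characters written. */
int sketch_print_dev(char *buffer, size_t size, unsigned int dev)
{
    return snprintf(buffer, size, "%u:%u",
                    SKETCH_MAJOR(dev), SKETCH_MINOR(dev));
}

/* In the spirit of format_dev_t: returns the buffer, so it can be used inline. */
char *sketch_format_dev(char *buffer, size_t size, unsigned int dev)
{
    snprintf(buffer, size, "%u:%u",
             SKETCH_MAJOR(dev), SKETCH_MINOR(dev));
    return buffer;
}
```

Returning the buffer is convenient when the result is passed straight to another call, for example as a "%s" argument to a print function, whereas the character count is useful when appending to a larger buffer.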