A small pitfall in kdump configuration on Linux 3.10

Date: 2021-12-05 08:18:24

On 2.6-series Linux kernels, when a module should not be loaded in the capture (reserved) kernel, it could be excluded with the blacklist option in /etc/kdump.conf:

blacklist <list of kernel modules>

Recently I found a problem with a SAS driver and tried to exclude it the same way. The result was an error:

[root@localhost ~]# service kdump restart
Redirecting to /bin/systemctl restart kdump.service
Job for kdump.service failed because the control process exited with error code. See "systemctl status kdump.service" and "journalctl -xe" for details.
[root@localhost ~]# systemctl status kdump.service
* kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-11-28 11:58:28 UTC; 10s ago
  Process: 60563 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 60572 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 60572 (code=exited, status=1/FAILURE)

Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Deprecated kdump config option: blacklist. Refer to kdump.conf manpage for alternatives.
Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Starting kdump: [FAILED]
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Nov 28 11:58:28 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.
Nov 28 11:58:28 localhost.localdomain systemd[1]: Unit kdump.service entered failed state.
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service failed.
[root@localhost ~]# journalctl -xe
Nov 28 11:58:28 localhost.localdomain kdumpctl[60563]: kexec: unloaded kdump kernel
Nov 28 11:58:28 localhost.localdomain kdumpctl[60563]: Stopping kdump: [OK]
Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Deprecated kdump config option: blacklist. Refer to kdump.conf manpage for alternatives.
Nov 28 11:58:28 localhost.localdomain kdumpctl[60572]: Starting kdump: [FAILED]
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Nov 28 11:58:28 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.
-- Subject: Unit kdump.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit kdump.service has failed.
-- 
-- The result is failed.
Nov 28 11:58:28 localhost.localdomain systemd[1]: Unit kdump.service entered failed state.
Nov 28 11:58:28 localhost.localdomain systemd[1]: kdump.service failed.
Nov 28 11:58:28 localhost.localdomain polkitd[2087]: Unregistered Authentication Agent for unix-process:60547:533046 (system bus name :1.5128, object path /org/freedesktop/PolicyKit1/AuthenticationAgent
[root@localhost ~]# 

So blacklist turns out to be a deprecated option. Following the hint in the log:

man kdump.conf prints the following:

blacklist option was recently being used to prevent loading modules in initramfs. General terminology for blacklist has been that module is present in initramfs but it is not actually loaded in kernel. Hence retaining blacklist option creates more confusing behavior. It has been deprecated.
Instead, use rd.driver.blacklist option on second kernel to blacklist a certain module. One can edit /etc/sysconfig/kdump.conf and edit KDUMP_COMMANDLINE_APPEND to pass kernel command line options. Refer to dracut.cmdline man page for more details on module blacklist option.

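Fine. Following the newer recommendation, I went to edit /etc/sysconfig/kdump.conf and found that this file does not exist. The path in the man page looks like a typo: the file that actually contains KDUMP_COMMANDLINE_APPEND is /etc/sysconfig/kdump (/etc/kdump.conf holds kdump's other options).

So, following the man page's hint, I edited /etc/sysconfig/kdump and changed the KDUMP_COMMANDLINE_APPEND line. For the exact format, see: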

man dracut.cmdline

       rd.driver.blacklist=<drivername>[,<drivername>,...]
           do not load kernel module <drivername>. This parameter can be specified multiple times.

       rd.driver.pre=<drivername>[,<drivername>,...]
           force loading kernel module <drivername>. This parameter can be specified multiple times.

       rd.driver.post=<drivername>[,<drivername>,...]
           force loading kernel module <drivername> after all automatic loading modules have been loaded. This parameter can be specified multiple times.
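
To make the edit concrete, here is a minimal sketch of appending rd.driver.blacklist to KDUMP_COMMANDLINE_APPEND. The helper name add_kdump_blacklist and the module name mpt3sas are assumptions for illustration only (the driver involved is not named above), and the sketch assumes GNU sed and a double-quoted value on a single line, as in the stock /etc/sysconfig/kdump.

```shell
#!/bin/sh
# Hypothetical helper: append rd.driver.blacklist=<module> to the
# KDUMP_COMMANDLINE_APPEND line of a sysconfig-style file.
# Idempotent: a second call with the same module changes nothing.
add_kdump_blacklist() {
    conf="$1"      # e.g. /etc/sysconfig/kdump
    module="$2"    # e.g. mpt3sas (placeholder module name)
    opt="rd.driver.blacklist=${module}"

    # Already present as a whole word on the line: nothing to do.
    if grep -Eq "^KDUMP_COMMANDLINE_APPEND=.*${opt}([ \"]|\$)" "$conf"; then
        return 0
    fi

    # Insert the option just before the closing double quote of the value.
    sed -i "s|^\(KDUMP_COMMANDLINE_APPEND=\".*\)\"|\1 ${opt}\"|" "$conf"
}
```

A typical use would be to back up the file first, then run add_kdump_blacklist /etc/sysconfig/kdump mpt3sas and restart the kdump service.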

 

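Also note that after changing the configuration you have to restart the kdump service. In this case the restart is noticeably slower, because any configuration change, such as a different dump path or an addition to the blacklist, requires the kdump initramfs image to be regenerated; otherwise the old image would still be used when a crash happens. These images are the files under /boot whose names end in kdump.img: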

[root@localhost ~]# ls -l /boot/*kdump*
-rw------- 1 root root 16878919 Nov 29 01:02 /boot/initramfs-3.10.0-693.5.2.el7.x86_64kdump.img
-rw------- 1 root root 35261890 Nov 27 07:04 /boot/initramfs-3.10.0caq1.0kdump.img
-rw------- 1 root root 36508192 Nov 24 06:21 /boot/initramfs-3.10.0kdump.img
[root@localhost ~]# 
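
As a rough way to see whether an image predates a configuration change, the modification times can be compared. This is only a sketch: the function name kdump_image_stale is made up for illustration, and it relies on the -nt file test, which bash, dash and ksh provide but strict POSIX does not.

```shell
#!/bin/sh
# Hypothetical helper: return 0 ("stale") if the kdump image is missing
# or if any of the given config files is newer than the image.
kdump_image_stale() {
    img="$1"; shift
    [ -e "$img" ] || return 0
    for conf in "$@"; do
        # -nt is true when the left file has a later modification time.
        if [ "$conf" -nt "$img" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, kdump_image_stale "/boot/initramfs-$(uname -r)kdump.img" /etc/kdump.conf /etc/sysconfig/kdump && echo "image older than config" would flag an image that still needs to be rebuilt by a service restart.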

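Finally, note that if the capture kernel itself panics while loading drivers or while running, there is no further kernel to take over; the messages can only be read on the screen or over a serial console. I once hit a case where the reserved memory was too small and the capture kernel itself ran out of memory (OOM), so no crash dump could be collected. The currently reserved memory can be checked with: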

[root@localhost ~]# cat /sys/kernel/kexec_crash_size
536870912

To check whether the capture kernel is loaded:
[root@localhost ~]# cat /sys/kernel/kexec_crash_loaded
1
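
The two sysfs checks can be combined into one line of output, with the reserved size converted from bytes to MiB. This is a small sketch; the function name kdump_status and its optional directory argument are assumptions added so it can be pointed at a test directory instead of /sys/kernel.

```shell
#!/bin/sh
# Hypothetical helper: summarize capture-kernel state from sysfs.
# The optional first argument overrides the directory (default /sys/kernel).
kdump_status() {
    dir="${1:-/sys/kernel}"
    loaded=$(cat "$dir/kexec_crash_loaded")   # 1 if a capture kernel is loaded
    bytes=$(cat "$dir/kexec_crash_size")      # reserved crash memory in bytes
    echo "loaded=${loaded} reserved_mib=$(( $bytes / 1048576 ))"
}
```

With the values shown above (536870912 bytes reserved, capture kernel loaded), kdump_status prints loaded=1 reserved_mib=512.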