Fixing Docker failing to start on a Kubernetes master node because of a missing nat table and nf_conntrack module problems

Date: 2022-12-25 07:54:53

I. Symptoms

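The operating system is CentOS 7.9. A node in the Kubernetes 1.16 cluster was reported as unavailable, and attempts to restart the docker and kubelet services failed: neither service would start.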

The Docker startup error includes the messages "Perhaps iptables or your kernel needs to be upgraded" and "Running iptables --wait -t nat -L -n failed".

II. Troubleshooting and verification

$ iptables -nvL -t nat
iptables v1.4.21: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.

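The command fails with the error above.

This shows that the nat table does not exist, so we try to load the module that provides it manually: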

$ modprobe iptable_nat
modprobe: ERROR: Error running install command for nf_conntrack
modprobe: ERROR: could not insert 'iptable_nat': Operation not permitted
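
Before retrying, it can help to confirm which of the modules involved are actually loaded. The following is a minimal sketch, not part of the original procedure: the function names module_loaded and check_nat_modules are hypothetical helpers, and the module list is simply the set of names that appear in this write-up.

```shell
# Report whether a kernel module appears in a /proc/modules-style listing.
# Usage: module_loaded NAME [FILE]   (FILE defaults to /proc/modules)
module_loaded() (
    name=$1
    file=${2:-/proc/modules}
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
)

# Print the state of the modules discussed in this article.
# Usage: check_nat_modules [FILE]
check_nat_modules() {
    for m in iptable_nat nf_nat nf_conntrack; do
        if module_loaded "$m" "${1:-/proc/modules}"; then
            echo "$m: loaded"
        else
            echo "$m: not loaded"
        fi
    done
}
```

Running check_nat_modules with no argument reads the live /proc/modules of the node; passing a file makes it possible to inspect a saved listing from another host.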

The module cannot be loaded. Next, we try to load the nf_conntrack module directly:

$ modprobe nf_conntrack
modprobe: ERROR: Error running install command for nf_conntrack
modprobe: ERROR: could not insert 'nf_conntrack': Operation not permitted
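
The wording "Error running install command for nf_conntrack" is what modprobe prints when an `install` directive in its configuration fails, so one thing worth checking is whether such a directive exists for this module. The original does not say whether this was the cause here, so treat the following as a hedged diagnostic sketch: find_install_overrides is a hypothetical helper, and the default directory list is an assumption based on the usual modprobe.d locations.

```shell
# List modprobe "install" directives for a module in the given config dirs.
# Usage: find_install_overrides MODULE [DIR...]
find_install_overrides() (
    module=$1
    shift
    # Default search path: an assumption based on common modprobe.d locations.
    [ "$#" -gt 0 ] || set -- /etc/modprobe.d /run/modprobe.d /usr/lib/modprobe.d /lib/modprobe.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Match lines such as: install nf_conntrack /bin/false
        grep -HnE "^[[:space:]]*install[[:space:]]+${module}([[:space:]]|\$)" "$dir"/*.conf 2>/dev/null || true
    done
)
```

For example, `find_install_overrides nf_conntrack` prints each matching line with its file name and line number, and prints nothing when no override is configured.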

$ modinfo nf_conntrack
filename:       /lib/modules/3.10.0-327.el7.x86_64/kernel/net/netfilter/nf_conntrack.ko
license:        GPL
rhelversion:    7.2
srcversion:     CEF737212F3398E931CF137
depends:        
intree:         Y
vermagic:       3.10.0-327.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        79:AD:88:6A:11:3C:A0:22:35:26:33:6C:0F:82:5B:8A:94:29:6A:B3
sig_hashalgo:   sha256
parm:           tstamp:Enable connection tracking flow timestamping. (bool)
parm:           acct:Enable connection tracking flow accounting. (bool)
parm:           nf_conntrack_helper:Enable automatic conntrack helper assignment (default 1) (bool)
parm:           expect_hashsize:uint

# The filename field above gives the path of the module file; load nf_conntrack directly from it
$ insmod /lib/modules/3.10.0-327.el7.x86_64/kernel/net/netfilter/nf_conntrack.ko
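
The path passed to insmod is the filename field of the modinfo output. As a small sketch of how that path could be picked out without copying it by hand (module_file_from_modinfo is a hypothetical helper name, not something from the original):

```shell
# Extract the "filename:" field from modinfo-style output read on stdin.
module_file_from_modinfo() {
    awk '$1 == "filename:" { print $2 }'
}

# Example (needs root, so it is left commented out):
#   insmod "$(modinfo nf_conntrack | module_file_from_modinfo)"
```

modinfo also has a -n (--filename) option that prints only this path, which is the shorter route when the output does not need to be inspected first.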

Then load the iptable_nat and nf_conntrack modules again:

$ modprobe iptable_nat && modprobe nf_conntrack

The Docker daemon can now be started:

$ systemctl start docker

kubelet then fails to start with the following error:

failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

Step one is to check the following file:
    1. /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Check whether 10-kubeadm.conf exists. If it does not, create it with the following content:
            # Note: This dropin only works with kubeadm and kubelet v1.11+
            [Service]
            Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
            Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
            # This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
            EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
            # This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
            # the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
            EnvironmentFile=-/etc/default/kubelet
            ExecStart=
            ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
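After that, all that remains is to run:
kubeadm join
If that reports an error, go to:
            /var/lib/kubelet/
and create the file kubeadm-flags.env with the following content:
            KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1"
That resolves the problem.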
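
Since the kubelet error is about two cgroup drivers disagreeing, it is useful to read both values before changing anything. The sketch below is an illustration under assumptions: kubelet_cgroup_driver and drivers_match are hypothetical helper names, and the kubelet driver is assumed to be set through the --cgroup-driver flag in kubeadm-flags.env (it may instead be set elsewhere, for example in /var/lib/kubelet/config.yaml, which this sketch does not read).

```shell
# Print the value of --cgroup-driver found in a kubeadm-flags.env style file.
# Prints nothing if the flag is absent.
# Usage: kubelet_cgroup_driver [FILE]
kubelet_cgroup_driver() (
    file=${1:-/var/lib/kubelet/kubeadm-flags.env}
    sed -n 's/.*--cgroup-driver=\([A-Za-z]*\).*/\1/p' "$file"
)

# Succeed only if both driver names are non-empty and identical.
# Usage: drivers_match DOCKER_DRIVER KUBELET_DRIVER
drivers_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (needs a running Docker daemon, so it is left commented out):
#   drivers_match "$(docker info --format '{{.CgroupDriver}}')" "$(kubelet_cgroup_driver)" \
#       || echo "cgroup drivers differ"
```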
 

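In this case, the checks above turned up no problems.

Checking Docker's cgroup driver shows that it is still set to the default, so we change it to systemd: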


cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
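
Note that the heredoc above replaces the whole of /etc/docker/daemon.json, so any settings already in that file are lost. A cautious variant, sketched here as an assumption rather than as the original procedure, copies the existing file aside first. The function name write_daemon_json and the .bak suffix are hypothetical choices.

```shell
# Write the systemd cgroup driver setting to a Docker daemon.json file,
# keeping a backup copy of any existing file first.
# Usage: write_daemon_json [PATH]
write_daemon_json() (
    target=${1:-/etc/docker/daemon.json}
    if [ -f "$target" ]; then
        # Preserve the previous configuration next to the new one.
        cp -p "$target" "$target.bak"
    fi
    cat > "$target" <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
)
```

If the backup shows other options were in use, they need to be merged back into the new file by hand before Docker is restarted.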

After restarting docker once more, it starts normally.

Note (the following is background only and is not needed for the fix):

nf_conntrack cannot be unloaded directly:

$ modprobe -r nf_conntrack       
modprobe: FATAL: Module nf_conntrack is in use.

Other modules are still using it. The dependency list can be viewed as follows:

$ lsmod | grep nf_
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
nf_conntrack_netlink    40960  0 
nfnetlink              16384  2 nf_conntrack_netlink
nf_conntrack_ipv4      16384  3 
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
nf_nat                 24576  3 nf_nat_ipv4,xt_nat,nf_nat_masquerade_ipv4
nf_conntrack          110592  6 nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
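
The fourth column of the lsmod output lists the modules that use the one named in the first column, which is what determines the unload order. A minimal sketch for pulling that list out for a single module (module_users is a hypothetical helper name):

```shell
# Print the modules that use MODULE, one per line, from lsmod-style input on stdin.
# Usage: lsmod | module_users MODULE
module_users() {
    awk -v m="$1" '$1 == m && NF >= 4 {
        n = split($4, users, ",")
        for (i = 1; i <= n; i++)
            if (users[i] != "") print users[i]
    }'
}
```

With the listing above, `lsmod | module_users nf_conntrack` prints the six modules from the last line, each of which has to be unloaded before nf_conntrack itself.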

We therefore need to unload the modules step by step:

$ modprobe -r iptable_nat nf_conntrack_ipv4
$ modprobe -r ipt_MASQUERADE
$ modprobe -r nf_nat_masquerade_ipv4
$ modprobe -r nf_nat nf_nat_masquerade_ipv4
$ modprobe -r nf_conntrack

Once the modules have been unloaded, running iptables gives the following result:

$ iptables -nvL -t nat
iptables v1.4.21: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.