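I. Symptoms
The operating system is CentOS 7.9 and the cluster runs Kubernetes 1.16. A cluster node was reported as unavailable, and restarting the docker and kubelet services failed: neither would start.
Docker's startup errors included "Perhaps iptables or your kernel needs to be upgraded" and "Running iptables --wait -t nat -L -n failed".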
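II. Troubleshooting and verification
$ iptables -nvL -t nat
iptables v1.4.21: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
Running the command by hand reproduces the error shown above.
The nat table is missing, so we load its module manually: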
$ modprobe iptable_nat
modprobe: ERROR: Error running install command for nf_conntrack
modprobe: ERROR: could not insert 'iptable_nat': Operation not permitted
The module fails to load. Next, try loading the nf_conntrack module on its own:
$ modprobe nf_conntrack
modprobe: ERROR: Error running install command for nf_conntrack
modprobe: ERROR: could not insert 'nf_conntrack': Operation not permitted
modprobe fails for nf_conntrack too, so locate the module file with modinfo and load it directly with insmod:
$ modinfo nf_conntrack
filename:       /lib/modules/3.10.0-327.el7.x86_64/kernel/net/netfilter/nf_conntrack.ko
license:        GPL
rhelversion:    7.2
srcversion:     CEF737212F3398E931CF137
depends:
intree:         Y
vermagic:       3.10.0-327.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        79:AD:88:6A:11:3C:A0:22:35:26:33:6C:0F:82:5B:8A:94:29:6A:B3
sig_hashalgo:   sha256
parm:           tstamp:Enable connection tracking flow timestamping. (bool)
parm:           acct:Enable connection tracking flow accounting. (bool)
parm:           nf_conntrack_helper:Enable automatic conntrack helper assignment (default 1) (bool)
parm:           expect_hashsize:uint
# Load nf_conntrack
$ insmod /lib/modules/3.10.0-327.el7.x86_64/kernel/net/netfilter/nf_conntrack.ko
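One possible explanation for the "Error running install command" message, not confirmed by the steps above, is an install or blacklist rule for the module in the modprobe configuration. The sketch below searches a configuration directory for such rules. The function name find_module_rules is hypothetical, and /etc/modprobe.d is only assumed to be the directory of interest.

```shell
#!/bin/sh
# find_module_rules DIR MODULE
# Print every "install MODULE ..." or "blacklist MODULE" line found in the
# *.conf files under DIR, prefixed with file name and line number.
# Returns grep's exit status: 0 if at least one rule was found.
find_module_rules() {
    dir=$1
    module=$2
    # -H always prints the file name, -n adds line numbers, -E enables ERE.
    grep -HnE "^[[:space:]]*(install|blacklist)[[:space:]]+${module}([[:space:]]|\$)" \
        "$dir"/*.conf 2>/dev/null
}

# Example (assumed location, adjust as needed):
# find_module_rules /etc/modprobe.d nf_conntrack
```

The pattern matches the module name literally, so a rule written with hyphens instead of underscores would be missed.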
Then load the iptable_nat and nf_conntrack modules with modprobe:
$ modprobe iptable_nat && modprobe nf_conntrack
The Docker daemon can now be started:
$ systemctl start docker
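Before starting Docker, it can help to confirm that both modules really are loaded. The sketch below reads the kernel's module list from /proc/modules. The function names module_loaded and missing_modules are hypothetical, and the optional file argument exists only so the check can be run against saved output.

```shell
#!/bin/sh
# module_loaded MODULE [MODULES_FILE]
# Succeeds if MODULE is the first field of some line in MODULES_FILE
# (default: /proc/modules, the kernel's list of loaded modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# missing_modules MODULES_FILE MODULE...
# Print the name of each listed module that is not loaded.
missing_modules() {
    file=$1
    shift
    for m in "$@"; do
        module_loaded "$m" "$file" || echo "$m"
    done
}

# Example:
# missing_modules /proc/modules iptable_nat nf_conntrack
```

An empty result means both modules are present. A module compiled into the kernel would not appear in /proc/modules, so this check only applies to loadable modules such as the ones shown by modinfo above.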
kubelet fails to start with the following error:
failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
The kubelet and Docker must use the same cgroup driver. Start by checking the kubelet configuration files:
1. /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Check whether 10-kubeadm.conf exists. If it does not, create it with the following content:
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
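The drop-in reads kubelet flags from two optional environment files, so it is worth knowing which of them exist on the node. The sketch below lists the EnvironmentFile= entries of a drop-in and reports whether each path is present. The function names dropin_env_files and report_env_files are hypothetical.

```shell
#!/bin/sh
# dropin_env_files DROPIN
# Print the path of every EnvironmentFile= entry in a systemd drop-in,
# without the optional leading "-" (which tells systemd to ignore a
# missing file).
dropin_env_files() {
    sed -n 's/^[[:space:]]*EnvironmentFile=-\{0,1\}//p' "$1"
}

# report_env_files DROPIN
# For each referenced environment file, print "present PATH" or "missing PATH".
report_env_files() {
    dropin_env_files "$1" | while IFS= read -r path; do
        if [ -e "$path" ]; then
            echo "present $path"
        else
            echo "missing $path"
        fi
    done
}

# Example:
# report_env_files /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
```

A "missing" line is not an error in itself, because both entries carry the "-" prefix, but it does mean the corresponding variable stays empty when kubelet starts.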
Then all that remains is to run kubeadm join.
If that reports an error, create a kubeadm-flags.env file under /var/lib/kubelet/ with the following content:
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1"
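Flags copied from web pages sometimes carry typographic dashes in place of "--", which the kubelet will not parse as a flag. The sketch below reads the configured cgroup driver out of a flags file and counts bytes that are not plain ASCII. The function names kubelet_cgroup_driver and non_ascii_bytes are hypothetical.

```shell
#!/bin/sh
# kubelet_cgroup_driver FLAGS_FILE
# Print the value of the first --cgroup-driver= flag found in FLAGS_FILE.
kubelet_cgroup_driver() {
    sed -n 's/.*--cgroup-driver=\([A-Za-z0-9_-]*\).*/\1/p' "$1" | head -n 1
}

# non_ascii_bytes FILE
# Print how many bytes in FILE are neither printable ASCII nor
# tab, newline, or carriage return. A typographic dash encoded as
# UTF-8 contributes three such bytes.
non_ascii_bytes() {
    LC_ALL=C tr -d '\11\12\15\40-\176' < "$1" | wc -c | tr -d ' '
}

# Example:
# kubelet_cgroup_driver /var/lib/kubelet/kubeadm-flags.env
# non_ascii_bytes /var/lib/kubelet/kubeadm-flags.env
```

A non-zero count from non_ascii_bytes is a hint to retype the dashes by hand.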
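That resolves the error if the kubelet configuration was the cause.
On this node, the checks above turned up no problems.
Docker's cgroup driver, however, was still at its default (cgroupfs), so change it to systemd: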
cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
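To confirm that the two sides now agree, the driver configured in daemon.json can be compared with the one the kubelet uses. The sketch below does this with hypothetical function names docker_cgroup_driver and drivers_match. It extracts the value with sed rather than a JSON parser, which assumes the option is written as in the file above.

```shell
#!/bin/sh
# docker_cgroup_driver DAEMON_JSON
# Print the value of native.cgroupdriver from Docker's daemon.json.
# Text matching only, not a real JSON parse.
docker_cgroup_driver() {
    sed -n 's/.*native\.cgroupdriver=\([A-Za-z0-9_-]*\).*/\1/p' "$1" | head -n 1
}

# drivers_match KUBELET_DRIVER DOCKER_DRIVER
# Print a one-line verdict; return 1 if the drivers differ or are unset.
drivers_match() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "ok: both use $1"
    else
        echo "mismatch: kubelet=${1:-unset} docker=${2:-unset}"
        return 1
    fi
}

# Example:
# drivers_match systemd "$(docker_cgroup_driver /etc/docker/daemon.json)"
```

The file only shows the configured value. For the driver of a running daemon, docker info --format '{{.CgroupDriver}}' reports what Docker is actually using.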
After restarting docker once more, it starts normally.
Note (for reference only; not needed for the fix):
nf_conntrack cannot be unloaded directly:
$ modprobe -r nf_conntrack
modprobe: FATAL: Module nf_conntrack is in use.
Other modules are still using it. List the dependencies as follows:
$ lsmod | grep nf_
nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE
nf_conntrack_netlink 40960 0
nfnetlink 16384 2 nf_conntrack_netlink
nf_conntrack_ipv4 16384 3
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nf_nat_ipv4 16384 1 iptable_nat
nf_nat 24576 3 nf_nat_ipv4,xt_nat,nf_nat_masquerade_ipv4
nf_conntrack 110592 6 nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
The modules then have to be unloaded step by step:
$ modprobe -r iptable_nat nf_conntrack_ipv4
$ modprobe -r ipt_MASQUERADE
$ modprobe -r nf_nat_masquerade_ipv4
$ modprobe -r nf_nat nf_nat_masquerade_ipv4
$ modprobe -r nf_conntrack
Unloading is now complete. Running iptables at this point gives the following result:
$ iptables -nvL -t nat
iptables v1.4.21: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
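The unload order in this note follows from the last column of the lsmod listing, which names the modules that use each entry. The sketch below pulls that column out for one module, so its users can be removed first. The function name module_users is hypothetical, and it reads lsmod-style text on standard input.

```shell
#!/bin/sh
# module_users MODULE
# Read lsmod-style lines ("name size refcount users") on stdin and print
# the modules that use MODULE, one per line. Prints nothing if the users
# column is empty.
module_users() {
    awk -v m="$1" '$1 == m && NF >= 4 { gsub(",", "\n", $4); print $4 }'
}

# Example:
# lsmod | module_users nf_conntrack
```

A module can also be held by a non-module user, in which case the reference count is non-zero while this list is empty.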