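Problem description
When installing Kubernetes with Ansible, the run ended with the error shown below: the task that polls for kubelet startup (轮询等待kubelet启动) failed because kubelet never reached the active state.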
TASK [kube-node : 轮询等待kubelet启动] ******************************************************************************************************************************
fatal: [192.168.10.52]: FAILED! => {"attempts": 4, "changed": true, "cmd": "systemctl is-active kubelet.service", "delta": "0:00:00.006796", "end": "2023-02-01 22:30:10.756458", "msg": "non-zero return code", "rc": 3, "start": "2023-02-01 22:30:10.749662", "stderr": "", "stderr_lines": [], "stdout": "activating", "stdout_lines": ["activating"]}
fatal: [192.168.10.51]: FAILED! => {"attempts": 4, "changed": true, "cmd": "systemctl is-active kubelet.service", "delta": "0:00:00.010879", "end": "2023-02-01 22:30:10.859450", "msg": "non-zero return code", "rc": 3, "start": "2023-02-01 22:30:10.848571", "stderr": "", "stderr_lines": [], "stdout": "activating", "stdout_lines": ["activating"]}
PLAY RECAP **********************************************************************************************************************************************************
192.168.10.51 : ok=50 changed=30 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
192.168.10.52 : ok=49 changed=30 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Troubleshooting
Checking the kubelet status shows that it did not start successfully.
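The failed Ansible task above runs systemctl is-active kubelet.service repeatedly and gives up after 4 attempts, each returning rc 3 with the output "activating". A minimal sketch of the same polling check, for reproducing it by hand on a node, might look like this. The function name wait_for_unit, the 2-second default delay, and the IS_ACTIVE_CMD override are assumptions made for illustration and are not taken from the playbook.

```shell
# Poll a systemd unit until `systemctl is-active` prints "active", or give up.
# IS_ACTIVE_CMD is a hypothetical override so the loop can be tried without systemd.
wait_for_unit() {
    unit="$1"
    retries="${2:-4}"   # the Ansible output above shows 4 attempts
    delay="${3:-2}"     # assumed delay in seconds between attempts
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        state="$(${IS_ACTIVE_CMD:-systemctl is-active} "$unit" 2>/dev/null || true)"
        if [ "$state" = "active" ]; then
            echo "$unit is active (attempt $attempt)"
            return 0
        fi
        echo "$unit is '$state' (attempt $attempt/$retries)"
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Usage on a real node (requires systemd):
#   wait_for_unit kubelet.service || systemctl status kubelet --no-pager
```

If the unit stays in "activating", as it does here, kubelet is most likely crashing and being restarted by systemd, so the next step is to read its logs.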
Use journalctl -u kubelet --no-pager to view the startup error log:
Dec 07 23:50:21 iZ2vc2h2j9l2p8zqnwy6zoZ kubelet[24786]: E1207 23:50:21.347929 24786 remote_runtime.go:168] "Version from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
Dec 07 23:50:21 iZ2vc2h2j9l2p8zqnwy6zoZ kubelet[24786]: E1207 23:50:21.348041 24786 kuberuntime_manager.go:225] "Get runtime version failed" err="get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
Dec 07 23:50:21 iZ2vc2h2j9l2p8zqnwy6zoZ kubelet[24786]: Error: failed to run Kubelet: failed to create kubelet: get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
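The kubelet journal can be long, so it may help to search it for this specific error. The sketch below assumes the message text is exactly the one shown in the log lines above; the function name has_cri_unimplemented is made up for illustration.

```shell
# Return 0 if the log text (file arguments or stdin) contains the CRI
# "Unimplemented" error quoted in the kubelet log above.
has_cri_unimplemented() {
    grep -q 'code = Unimplemented desc = unknown service runtime\.v1alpha2\.RuntimeService' "$@"
}

# Usage on a real node (requires systemd):
#   journalctl -u kubelet --no-pager | has_cri_unimplemented \
#       && echo "kubelet cannot reach the CRI service in containerd"
```

A match means kubelet reached the containerd socket, but containerd is not serving the CRI API that kubelet asked for.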
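Judging from the error, the problem is most likely in containerd, so check the containerd status.
It is probably related to the setting disabled_plugins = ["cri"] in the config file /etc/containerd/config.toml. For details, see https://github.com/containerd/containerd/issues/4581
Move the /etc/containerd/config.toml config file out of the way: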
grep "disabled_plugins" /etc/containerd/config.toml
mv /etc/containerd/config.toml /tmp/
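The two commands above can be combined so that the file is moved only when it actually disables the CRI plugin. This is a sketch under assumptions: the function name move_config_if_cri_disabled and the timestamped backup name are illustrative, and the default paths are the ones used in this post.

```shell
# Move a containerd config aside only if it lists "cri" in disabled_plugins.
# Arguments are illustrative: $1 = config path, $2 = backup directory.
move_config_if_cri_disabled() {
    config="${1:-/etc/containerd/config.toml}"
    backup_dir="${2:-/tmp}"
    if [ ! -f "$config" ]; then
        echo "no config at $config, nothing to do"
        return 0
    fi
    if grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$config"; then
        # Timestamped name so repeated runs do not overwrite an earlier backup.
        mv "$config" "$backup_dir/config.toml.$(date +%Y%m%d%H%M%S)"
        echo "moved $config to $backup_dir"
    else
        echo "CRI is not disabled in $config, leaving it in place"
    fi
}
```

With no config file present, containerd falls back to its built-in defaults, which keep the CRI plugin enabled.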
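After restarting containerd, kubelet restarts successfully.
Root cause
The containerd config file disables the CRI plugin (disabled_plugins = ["cri"]), so containerd does not serve the CRI API that kubelet calls. See https://github.com/containerd/containerd/issues/4581
Solution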
mv /etc/containerd/config.toml /tmp
systemctl restart containerd
systemctl restart kubelet
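If the rest of config.toml should be kept, one alternative to moving the whole file is to comment out only the disabled_plugins line. This is a different technique from the steps above, shown as a hedged sketch: the function name enable_cri_plugin is made up for illustration, and it assumes the setting sits on a single line as in the grep output.

```shell
# Comment out a disabled_plugins line that lists "cri", keeping a .bak copy
# of the original file next to it.
enable_cri_plugin() {
    config="${1:-/etc/containerd/config.toml}"
    sed -i.bak -e 's/^\([[:space:]]*disabled_plugins[[:space:]]*=.*"cri".*\)$/# \1/' "$config"
}

# After running it on a real node, restart the services as in the steps above:
#   enable_cri_plugin && systemctl restart containerd && systemctl restart kubelet
```

Note that commenting out the line re-enables every plugin listed on it, not just cri, so check the line first if it names more than one plugin.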