Pod 是可以在 Kubernetes 中创建和管理的、最小的可部署的计算单元, 它包含一组容器,这些容器共享存储、网络、以及怎样运行这些容器的声明。
当Pod备调度期调度到某个节点后,节点上的kubelet就会和High-Level的容器运行时通信,把创建pod所涉及的容器参数传递给容器运行时,容器运行时最终通过Low-Level的runc或者其它运行时工具创建对应的容器。当容器创建成功后,我们可以通过docker inspect 或者 nerdctl inspect 来找到容器的pid,然后通过pid找到具体的cgroup信息。
k8s pod相关cgroup基础信息位置如下
k8s pod相关cgroup位置 : /sys/fs/cgroup/systemd/kubepods.slice
Guaranteed 类型pod 直接存放在 kubepods.slice 根目录下,例如:
/sys/fs/cgroup/systemd/kubepods.slice/kubepods-pod48574e3c_f4d0_4a5c_84bb_166fd32ea22b.slice
Burstable 类型pod直接在子目录 kubepods-burstable.slice下, 例如 :
/sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod49f8fde6_4c35_44c7_a237_c5b8c4312953.slice
Best-Effort 类型pod直接在子目录 kubepods-besteffort.slice下, 例如:
/sys/fs/cgroup/systemd/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice
此处以 xg-dev命名空间的 pod besteffort-busybox-778b6fb576-8p59p为例,可以通过如下步骤找到对应的cgroup详细信息
1) 获取容器信息
# nerdctl --namespace=k8s.io ps|grep besteffort
b6ec5bbb1447 docker.io/kubesphere/pause:3.7 "/pause" 36 minutes ago Up k8s://xg-dev/besteffort-busybox-778b6fb576-8p59p
be0bfc9325bc docker.io/library/busybox:1.32 "/bin/sh -c sleep 36…" 36 minutes ago Up k8s://xg-dev/besteffort-busybox-778b6fb576-8p59p/busybox
2)通过nerdctl inspect 获取容器pid
# nerdctl --namespace=k8s.io inspect be0bfc9325bc|grep -i pid
"Pid": 33866,
3)通过pid获取 cgroup位置
通过 cat /proc/${pid}/cgroup 来找到实际pid的cgroup配置
# cat /proc/33866/cgroup
11:devices:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
10:blkio:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
9:hugetlb:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
8:memory:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
7:freezer:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
6:perf_event:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
5:net_prio,net_cls:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
4:cpuset:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
3:cpuacct,cpu:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
2:pids:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
1:name=systemd:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
4)通过 ls /sys/fs/cgroup/systemd/** 就可以看到这个pod指定容器的cgroup基础信息,其中 cgroup.procs 存放了容器进程的id
# ls /sys/fs/cgroup/systemd/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
cgroup.clone_children cgroup.event_control cgroup.procs notify_on_release tasks
5)进一步可以在 /sys/fs/cgroup/cpu/kubepods.slice/* 中查看cpu相关详细cgroup信息
# ls /sys/fs/cgroup/cpu/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc1bb8757_1115_4dc5_a0cf_5fd39e14fdd9.slice/cri-containerd-be0bfc9325bcf952464d5bf613e29afd792cbb9069bd34164ddc6a23e5b10ea5.scope
cgroup.clone_children cgroup.procs cpuacct.usage cpu.cfs_period_us cpu.rt_period_us cpu.shares notify_on_release
cgroup.event_control cpuacct.stat cpuacct.usage_percpu cpu.cfs_quota_us cpu.rt_runtime_us cpu.stat tasks
同理可以在 /sys/fs/cgroup/{blkio,memory}/** 中查看blkio、memory等详细信息。
创建3个不同qosClass 的 deployment, 相关参数如下:
besteffort-busybox | burstable-busybox | guaranted-busybox | |
---|---|---|---|
cpu requests | 60m | 60m | 空 |
memory requests | 50Mi | 50Mi | 空 |
cpu limits | 60m | 100m | 空 |
memory limits | 50Mi | 100Mi | 空 |
cgroup 位置 | /sys/*/kubepods.slice/ | /sys/*/kubepods.slice/kubepods-burstable.slice | /sys/*/kubepods.slice/kubepods-besteffort.slice |
guaranted-busybox 容器的CPU和Memory信息如下
这里 cfs_period_us 默认为100ms, 100ms内cfs_quota_us为6ms,即1000ms内为60ms,等价于我们的60m
这里 5010241024 = 52428800 ,刚好为50Mi
同理 burstable-busybox 容器的CPU和Memory信息如下
这里100ms内cfs_quota_us为10ms,刚好对应CPU limit 100m, 10010241024 = 104857600 ,刚好对应memory limit 100Mi
同理 besteffort-busybox 容器的CPU和Memory信息如下
可以发现 besteffort 的pod对应的容器cpu.cfs_quota_us为-1, memory.limit_in_bytes为一个极大值(远超实际的内存)。
通过上述内容可以发现当Pod对应的容器在机器创建成功后,系统上会对该容器做对应的cgroup限制,后续CPU、内存的使用就会被限制了。