Create a liveness.yaml file:
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: mirrorgooglecontainers/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
This is an example the official documentation once provided.
In this configuration file, the Pod again has only one container.
The periodSeconds field tells the kubelet to run the liveness probe every 3 seconds.
The initialDelaySeconds field tells the kubelet to wait 3 seconds before performing the first probe.
To execute a probe, the kubelet sends an HTTP GET request to the service running inside the container (which listens on port 8080).
If the handler for the server's /healthz path returns a success code, the kubelet considers the container healthy and alive.
If the handler returns a failure code, the kubelet kills the container and restarts it.
Any return code greater than or equal to 200 and less than 400 indicates success; any other return code indicates failure.
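The success rule above can be expressed as a tiny predicate. Note that `probeSucceeded` is a hypothetical helper name for illustration, not the kubelet's actual function:

```go
package main

import "fmt"

// probeSucceeded mirrors the rule stated above: any HTTP status
// code in the range [200, 400) counts as a successful liveness probe.
func probeSucceeded(statusCode int) bool {
	return statusCode >= 200 && statusCode < 400
}

func main() {
	fmt.Println(probeSucceeded(200)) // true
	fmt.Println(probeSucceeded(302)) // true: redirects also count as success
	fmt.Println(probeSucceeded(500)) // false: triggers a probe failure
}
```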
The relevant part of server.go, the source of the service running in the container:
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
	duration := time.Since(started) // time elapsed since the process started
	if duration.Seconds() > 10 {
		// after 10 seconds, report unhealthy
		w.WriteHeader(500)
		w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
	} else {
		w.WriteHeader(200)
		w.Write([]byte("ok"))
	}
})
For the first 10 seconds of the container's life, the /healthz handler returns a 200 status code; after that it returns 500.
With periodSeconds set to 3 and the default failureThreshold of 3, the kubelet kills and restarts the container after three consecutive failed probes, so each restart happens roughly 20 seconds after the previous one.
Create the Pod:
$ kubectl apply -f liveness.yaml
pod/liveness-http created
Watch the Pod:
$ kubectl get po -w | grep live
liveness-http 0/1 ContainerCreating 0 2s
liveness-http 1/1 Running 0 5s
liveness-http 1/1 Running 1 (4s ago) 26s
liveness-http 1/1 Running 2 (3s ago) 46s
liveness-http 1/1 Running 3 (3s ago) 67s
liveness-http 1/1 Running 4 (3s ago) 88s
liveness-http 0/1 CrashLoopBackOff 4 (0s ago) 106s
liveness-http 1/1 Running 5 (56s ago) 2m42s
liveness-http 0/1 CrashLoopBackOff 5 (0s ago) 2m58s
liveness-http 1/1 Running 6 (95s ago) 4m33s
liveness-http 0/1 CrashLoopBackOff 6 (0s ago) 4m52s
- The container keeps restarting. The growing gaps between restarts reflect the kubelet's exponential back-off for crashing containers (10s, 20s, 40s, ..., capped at 5 minutes), which is where the CrashLoopBackOff status comes from.
Inspect the Pod for details:
$ kubectl describe pod liveness-http
Name: liveness-http
Namespace: default
Priority: 0
Node: node1.k8s/10.211.55.11
Start Time: Thu, 18 Apr 2024 16:45:57 +0800
Labels: test=liveness
Annotations: <none>
Status: Running
IP: 10.244.1.27
IPs:
IP: 10.244.1.27
Containers:
liveness:
Container ID: docker://0bde3a8c2ab79d2fd389659354771da10edbdfd29e3bb3d5c87fcbcb44f918b1
Image: mirrorgooglecontainers/liveness
Image ID: docker-pullable://mirrorgooglecontainers/liveness@sha256:854458862be990608ad916980f9d3c552ac978ff70ceb0f90508858ec8fc4a62
Port: <none>
Host Port: <none>
Args:
/server
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 18 Apr 2024 16:47:19 +0800
Finished: Thu, 18 Apr 2024 16:47:36 +0800
Ready: False
Restart Count: 4
Liveness: http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nv6lw (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-nv6lw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 114s default-scheduler Successfully assigned default/liveness-http to node1.k8s
Normal Pulled 111s kubelet Successfully pulled image "mirrorgooglecontainers/liveness" in 2.298827654s
Normal Pulled 92s kubelet Successfully pulled image "mirrorgooglecontainers/liveness" in 1.281083837s
Normal Created 74s (x3 over 111s) kubelet Created container liveness
Normal Started 74s (x3 over 111s) kubelet Started container liveness
Normal Pulled 74s kubelet Successfully pulled image "mirrorgooglecontainers/liveness" in 1.048779521s
Warning Unhealthy 58s (x9 over 100s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing 58s (x3 over 94s) kubelet Container liveness failed liveness probe, will be restarted
Normal Pulling 57s (x4 over 114s) kubelet Pulling image "mirrorgooglecontainers/liveness"
- The events show:
Liveness probe failed: HTTP probe failed with statuscode: 500
Container liveness failed liveness probe, will be restarted
- After /healthz returned 500, the container was restarted
In this way, probe-based health checks let Kubernetes determine whether a Pod is running healthily and restart its container when it is not.