k8s: Container Probes

Time: 2022-12-14 17:57:27

Introduction to Container Probes

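Container probes check whether the application instance in a container is working properly; they are a long-established mechanism for keeping a service available. If a probe shows that an instance is not in the expected state, k8s takes the faulty instance out of rotation so that it no longer carries business traffic. The two probes covered here are:

  • liveness probes: check whether the application instance is currently running normally; if it is not, k8s restarts the container
  • readiness probes: check whether the application instance can currently accept requests; if it cannot, k8s does not forward traffic to it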
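The difference between the two probes lies in what happens on failure. Below is a minimal Python sketch of that difference, not how the kubelet is implemented; `ContainerState` and `apply_probe_result` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContainerState:
    """Hypothetical, simplified view of one container in a Pod."""
    restart_count: int = 0   # what kubectl shows in the RESTARTS column
    ready: bool = True       # whether traffic would be forwarded to it


def apply_probe_result(state, probe_kind, healthy):
    """Apply one (already threshold-evaluated) probe verdict to the state."""
    if probe_kind == "liveness":
        if not healthy:
            # A failed liveness probe leads to a container restart.
            state.restart_count += 1
    elif probe_kind == "readiness":
        # A failed readiness probe only stops traffic; nothing is restarted.
        state.ready = healthy
    else:
        raise ValueError(f"unknown probe kind: {probe_kind}")
    return state
```

In this model a failed liveness check only bumps `restart_count`, while a failed readiness check only flips `ready`.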

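Both probes currently support the following three probing methods

  • Exec: runs a command inside the container; if the command exits with code 0 the program is considered healthy, otherwise unhealthy
……
  livenessProbe:
    exec:
      command:
      - cat
      - /tmp/healthy
……
  • TCPSocket: tries to open a connection to a port of the container; if the connection can be established the program is considered healthy, otherwise unhealthy
……
  livenessProbe:
    tcpSocket:
      port: 8080
……
  • HTTPGet: sends an HTTP GET request to a URL of the web application in the container; if the returned status code is at least 200 and below 400 the program is considered healthy, otherwise unhealthy
……
  livenessProbe:
    httpGet:
      path: / # URI path
      port: 80 # port number
      host: 127.0.0.1 # host address (defaults to the Pod IP when omitted)
      scheme: HTTP # protocol to use, HTTP or HTTPS
……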
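The success criteria of the three methods can be written down as three small checks. The Python below is only a sketch of those rules under simplifying assumptions (it runs on the local machine rather than inside a container, and the function names are made up for this article); it is not the kubelet's code.

```python
import socket
import subprocess
import sys
import urllib.error
import urllib.request


def exec_probe(command, timeout=1.0):
    """Exec rule: healthy only if the command exits with code 0."""
    try:
        result = subprocess.run(command, timeout=timeout,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except (OSError, subprocess.TimeoutExpired):
        return False  # command missing or too slow counts as a failure here
    return result.returncode == 0


def tcp_probe(host, port, timeout=1.0):
    """TCPSocket rule: healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status_ok(status_code):
    """HTTPGet rule: a status code of at least 200 and below 400 is healthy."""
    return 200 <= status_code < 400


def http_probe(url, timeout=1.0):
    """HTTPGet check built on http_status_ok (urllib follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return http_status_ok(resp.status)
    except urllib.error.HTTPError as err:
        return http_status_ok(err.code)  # e.g. 404 is reported as unhealthy
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...
```

For example, `exec_probe(["/bin/cat", "/tmp/hello.txt"])` returns False when the file is missing, which mirrors the failing Exec demonstration later in this article.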

The following demonstrations use liveness probes as the example

Method 1: Exec

Create pod-liveness-exec.yaml

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-exec
  namespace: zouzou
spec:
  containers:
  - name: nginx
    image: nginx:1.14
    ports:
    - name: nginx-port
      containerPort: 80
    livenessProbe:
      exec:
        command: ["/bin/cat","/tmp/hello.txt"] # read a file inside the container; the file does not exist there, so the probe fails

Create the pod and observe the result

# Create the pod
[root@dce-10-6-215-215 tmp]# kubectl apply -f pod-liveness-exec.yaml
pod/pod-liveness-exec created

# Check the pod: it has already restarted 2 times
[root@dce-10-6-215-215 tmp]# kubectl get pod pod-liveness-exec -n zouzou
NAME READY STATUS RESTARTS AGE
pod-liveness-exec 1/1 Running 2 70s

# View the pod details: the events show Liveness probe failed: /bin/cat: /tmp/hello.txt: No such file or directory
[root@dce-10-6-215-215 tmp]# kubectl describe pod pod-liveness-exec -n zouzou
Name: pod-liveness-exec
Namespace: zouzou
Priority: 0
.....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 85s default-scheduler Successfully assigned zouzou/pod-liveness-exec to dce-10-6-215-200
Normal Pulled 22s (x3 over 82s) kubelet Container image "nginx:1.14" already present on machine
Normal Created 22s (x3 over 81s) kubelet Created container nginx
Normal Killing 22s (x2 over 52s) kubelet Container nginx failed liveness probe, will be restarted
Normal Started 21s (x3 over 81s) kubelet Started container nginx
Warning Unhealthy 2s (x8 over 72s) kubelet Liveness probe failed: /bin/cat: /tmp/hello.txt: No such file or directory # the error message appears here

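As the events show, the container keeps restarting because the file does not exist inside the container. Next, delete this pod and then change the manifest so that the probe succeeds

kubectl delete -f pod-liveness-exec.yaml # delete the pod

Modify pod-liveness-exec.yaml so that it reads as follows

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-exec
  namespace: zouzou
spec:
  containers:
  - name: nginx
    image: nginx:1.14
    ports:
    - name: nginx-port
      containerPort: 80
    livenessProbe:
      exec:
        command: ["/bin/ls","/tmp"] # list a directory inside the container; this succeeds because /tmp exists

Create the pod and check the result

# Create the pod
kubectl apply -f pod-liveness-exec.yaml

Check the events

# No restarts this time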
[root@dce-10-6-215-215 tmp]# kubectl get pod pod-liveness-exec -n zouzou
NAME READY STATUS RESTARTS AGE
pod-liveness-exec 1/1 Running 0 50s

# Check the events: everything is normal
[root@dce-10-6-215-215 tmp]# kubectl describe pod pod-liveness-exec -n zouzou
Name: pod-liveness-exec
Namespace: zouzou
Priority: 0
Node: dce-10-6-215-200/10.6.215.200
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 61s default-scheduler Successfully assigned zouzou/pod-liveness-exec to dce-10-6-215-200
Normal Pulled 59s kubelet Container image "nginx:1.14" already present on machine
Normal Created 59s kubelet Created container nginx
Normal Started 58s kubelet Started container nginx

Method 2: TCPSocket

tcpSocket simply tries to connect to a port and checks whether it is reachable

Create pod-liveness-tcpsocket.yaml with the following content

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-tcpsocket
  namespace: zouzou
spec:
  containers:
  - name: nginx
    image: nginx:1.14
    ports:
    - name: nginx-port
      containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 8080 # try port 8080; the pod only runs an nginx container, which listens on port 80 only, so 8080 is unreachable

Create the pod and observe the result

# Create the pod
[root@dce-10-6-215-215 tmp]# kubectl apply -f pod-liveness-tcpsocket.yaml
pod/pod-liveness-tcpsocket created

# Check the pod: RESTARTS is no longer 0
[root@dce-10-6-215-215 tmp]# kubectl get pod pod-liveness-tcpsocket -n zouzou
NAME READY STATUS RESTARTS AGE
pod-liveness-tcpsocket 1/1 Running 1 34s

# Check the events: the error message shows that the port is unreachable
[root@dce-10-6-215-215 tmp]# kubectl describe pod pod-liveness-tcpsocket -n zouzou
Name: pod-liveness-tcpsocket
Namespace: zouzou
Priority: 0
Node: dce-10-6-215-200/10.6.215.200
Start Time: Fri, 15 Apr 2022 18:25:10 +0800
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 48s default-scheduler Successfully assigned zouzou/pod-liveness-tcpsocket to dce-10-6-215-200
Normal Pulled 18s (x2 over 45s) kubelet Container image "nginx:1.14" already present on machine
Normal Killing 18s kubelet Container nginx failed liveness probe, will be restarted
Normal Created 17s (x2 over 45s) kubelet Created container nginx
Normal Started 17s (x2 over 44s) kubelet Started container nginx
Warning Unhealthy 8s (x4 over 38s) kubelet Liveness probe failed: dial tcp 172.29.34.232:8080: connect: connection refused # the connection to port 8080 failed

As the events show, the container keeps restarting because nothing is listening on port 8080 inside the container. Next, delete this pod and then change the manifest so that the probe succeeds

# Delete the pod
kubectl delete -f pod-liveness-tcpsocket.yaml

Modify pod-liveness-tcpsocket.yaml so that it reads as follows

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-tcpsocket
  namespace: zouzou
spec:
  containers:
  - name: nginx
    image: nginx:1.14
    ports:
    - name: nginx-port
      containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 80 # port 80 is reachable

Create the pod and check the result

# Create the pod
kubectl apply -f pod-liveness-tcpsocket.yaml

Check the events

# Check the pod: no restarts
[root@dce-10-6-215-215 tmp]# kubectl get pod pod-liveness-tcpsocket -n zouzou
NAME READY STATUS RESTARTS AGE
pod-liveness-tcpsocket 1/1 Running 0 93s

# Check the events: everything is normal
[root@dce-10-6-215-215 tmp]# kubectl describe pod pod-liveness-tcpsocket -n zouzou
Name: pod-liveness-tcpsocket
Namespace: zouzou
Priority: 0
Node: dce-10-6-215-200/10.6.215.200
Start Time: Fri, 15 Apr 2022 18:30:32 +0800
.....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 100s default-scheduler Successfully assigned zouzou/pod-liveness-tcpsocket to dce-10-6-215-200
Normal Pulled 97s kubelet Container image "nginx:1.14" already present on machine
Normal Created 97s kubelet Created container nginx
Normal Started 96s kubelet Started container nginx

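Method 3: HTTPGet

httpGet simply requests a URL and checks whether the request succeeds

Create pod-liveness-httpget.yaml with the following content

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-httpget
  namespace: zouzou
spec:
  containers:
  - name: nginx
    image: nginx:1.14
    ports:
    - name: nginx-port
      containerPort: 80
    livenessProbe:
      httpGet: # effectively requests http://<Pod IP>:80/hello, a path the nginx container does not serve
        scheme: HTTP # protocol to use, HTTP or HTTPS
        port: 80 # port number
        path: /hello # URI path

The httpGet fields are:

  • host: the host name to connect to; defaults to the Pod IP. You can set "Host" in the HTTP headers instead.
  • scheme: how to connect to the host (HTTP or HTTPS). Defaults to "HTTP".
  • path: the path to access on the HTTP server. Defaults to "/".
  • httpHeaders: custom HTTP headers to set in the request. HTTP allows repeated header fields.
  • port: the number or name of the container port to access. A number must be in the range 1 to 65535.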

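For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the Pod's IP address unless the host field in httpGet is set. If the scheme field is set to HTTPS, the kubelet sends an HTTPS request and skips certificate verification. In most cases you do not need to set the host field. One scenario where you would: suppose the container listens on 127.0.0.1 and the Pod's hostNetwork field is set to true; then host in httpGet should be set to 127.0.0.1. In the probably more common case where the Pod relies on virtual hosts, you should not set host but should instead set the Host header in httpHeaders.

For an HTTP probe, the kubelet sends two request headers in addition to the mandatory Host header: User-Agent and Accept. Their default values are kube-probe/1.24 (where 1.24 is the kubelet version) and */* respectively.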
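To make the host and header rules concrete, here is a small Python sketch that assembles the URL and headers such a probe request would carry. `build_http_probe_request` is a hypothetical helper written for this article, the version string is passed in as an assumption, and httpHeaders is modelled as a plain dict for brevity (so it cannot represent repeated header fields).

```python
def build_http_probe_request(pod_ip, port, path="/", scheme="HTTP",
                             host=None, http_headers=None,
                             kubelet_version="1.24"):
    """Return (url, headers) for an httpGet probe, following the rules above."""
    # The request goes to the Pod IP unless the host field is set explicitly.
    target = host if host else pod_ip
    url = f"{scheme.lower()}://{target}:{port}{path}"

    # Default headers sent with every HTTP probe.
    headers = {
        "User-Agent": f"kube-probe/{kubelet_version}",
        "Accept": "*/*",
    }
    # Custom headers (for example a virtual-host "Host") are added on top.
    headers.update(http_headers or {})
    return url, headers
```

With the manifest above, `build_http_probe_request("172.29.34.232", 80, "/hello")` targets the Pod IP rather than 127.0.0.1; passing `http_headers={"Host": "example.internal"}` (a made-up virtual host) changes only the headers, not the address the request is sent to.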

Create the pod and observe the result

# Create the pod
[root@dce-10-6-215-215 tmp]# kubectl apply -f pod-liveness-httpget.yaml
pod/pod-liveness-httpget created

# Check the pod: RESTARTS is no longer 0
[root@dce-10-6-215-215 tmp]# kubectl get pod pod-liveness-httpget -n zouzou
NAME READY STATUS RESTARTS AGE
pod-liveness-httpget 1/1 Running 1 37s

# Check the events: the returned status code is 404
[root@dce-10-6-215-215 tmp]# kubectl describe pod pod-liveness-httpget -n zouzou
Name: pod-liveness-httpget
Namespace: zouzou
Priority: 0
Node: dce-10-6-215-200/10.6.215.200
Start Time: Fri, 15 Apr 2022 18:36:18 +0800
.....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 49s default-scheduler Successfully assigned zouzou/pod-liveness-httpget to dce-10-6-215-200
Normal Killing 16s kubelet Container nginx failed liveness probe, will be restarted
Normal Pulled 15s (x2 over 46s) kubelet Container image "nginx:1.14" already present on machine
Normal Created 15s (x2 over 46s) kubelet Created container nginx
Normal Started 15s (x2 over 46s) kubelet Started container nginx
Warning Unhealthy 6s (x4 over 36s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 404 # the status code is 404

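As the events show, the container keeps restarting because nginx does not serve that path. Next, delete this pod and then change the manifest so that the probe succeeds

# Delete the pod
kubectl delete -f pod-liveness-httpget.yaml

Modify pod-liveness-httpget.yaml so that it reads as follows

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-httpget
  namespace: zouzou
spec:
  containers:
  - name: nginx
    image: nginx:1.14
    ports:
    - name: nginx-port
      containerPort: 80
    livenessProbe:
      httpGet: # effectively requests http://<Pod IP>:80/, which the nginx container does serve
        scheme: HTTP # protocol to use, HTTP or HTTPS
        port: 80 # port number
        path: / # URI path

Create the pod and check the result

# Create the pod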
[root@dce-10-6-215-215 tmp]# kubectl apply -f pod-liveness-httpget.yaml
pod/pod-liveness-httpget created

# Check the pod: no restarts
[root@dce-10-6-215-215 tmp]# kubectl get pod pod-liveness-httpget -n zouzou
NAME READY STATUS RESTARTS AGE
pod-liveness-httpget 1/1 Running 0 220s

# Check the events: the pod started normally
[root@dce-10-6-215-215 tmp]# kubectl describe pod pod-liveness-httpget -n zouzou
Name: pod-liveness-httpget
Namespace: zouzou
Priority: 0
Node: dce-10-6-215-200/10.6.215.200
Start Time: Fri, 15 Apr 2022 18:41:35 +0800
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27s default-scheduler Successfully assigned zouzou/pod-liveness-httpget to dce-10-6-215-200
Normal Pulled 24s kubelet Container image "nginx:1.14" already present on machine
Normal Created 24s kubelet Created container nginx
Normal Started 23s kubelet Started container nginx

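Other Configuration Parameters

So far, the three probing methods have been demonstrated with livenessProbe. Looking at the sub-fields of livenessProbe, however, shows that there are further settings besides these three methods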

[root@dce-10-6-215-215 tmp]# kubectl explain pod.spec.containers.livenessProbe
FIELDS:
exec <Object>
tcpSocket <Object>
httpGet <Object>
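initialDelaySeconds <integer> # seconds to wait after the container starts before the first probe runs. Defaults to 0
timeoutSeconds <integer> # probe timeout. Defaults to 1 second, minimum 1 second
periodSeconds <integer> # how often to run the probe. Defaults to 10 seconds, minimum 1 second
failureThreshold <integer> # consecutive failures required before the probe is considered failed. Defaults to 3, minimum 1
successThreshold <integer> # consecutive successes required before the probe is considered successful. Defaults to 1 (must be 1 for liveness probes)

Note: initialDelaySeconds delays the first probe, and no probe runs before that time has elapsed. During the delay a liveness probe therefore cannot trigger a restart, while for a readiness probe the container is treated as not ready. The container is considered ready only once the delay has passed and the probe succeeds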
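The role of failureThreshold can be illustrated with a short simulation. The Python below is a sketch under simplifying assumptions (probe results are given as a list of booleans, one per periodSeconds tick, and timing and timeouts are ignored); `evaluate_liveness` is a hypothetical name, not a Kubernetes API.

```python
def evaluate_liveness(results, failure_threshold=3):
    """Return the indexes of the probe runs that would trigger a restart.

    results: iterable of booleans, True for a passed probe, False for a failed one.
    """
    restarts = []
    consecutive_failures = 0
    for index, passed in enumerate(results):
        if passed:
            # Any success resets the failure streak.
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(index)
            # The restarted container begins with a clean streak.
            consecutive_failures = 0
    return restarts
```

With the default threshold of 3, a probe that always fails (as in the failing demonstrations above) triggers a restart on every third run, while a single success in between resets the count.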