Kubernetes Network Performance Testing: Calico Plugin Environment

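This test is a baseline assessment of the performance of the Calico network plugin.

1. Test Preparation

1.1 Test Environment

The test environment is a Kubernetes cluster built on VMware Workstation virtual machines. It runs Kubernetes 1.28.2 with Calico 3.28.0 as the network plugin.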

hostname	IP	Spec (CPU/memory)	Role
master	192.168.0.61	2C4G	master
node1	192.168.0.62	2C4G	worker
node2	192.168.0.63	2C4G	worker

The test application sample-webapp is already deployed with three replicas:

root@master1:~# kubectl get svc
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
kubernetes      ClusterIP      10.96.0.1        <none>          443/TCP        15d
sample-webapp   ClusterIP      10.106.172.112   <none>          8000/TCP       47s
root@master1:~# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
sample-webapp-77745d4db-57rbx   1/1     Running   0          50s
sample-webapp-77745d4db-vprjc   1/1     Running   0          50s
sample-webapp-77745d4db-wltzl   1/1     Running   0          12s

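Notes:

In this test environment the NoSchedule taint was removed from the master node, so application pods can also be scheduled onto it.

The demo program used in the tests is available from this repository: https://gitee.com/lldhsds/distributed-load-testing-using-k8s.git. Only sample-webapp needs to be deployed. Any other application, such as nginx, can be used instead.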
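For reference, a kubeadm-built control-plane node on Kubernetes 1.28 normally carries the node-role.kubernetes.io/control-plane:NoSchedule taint. The article does not show the command it used, so the following is only a minimal sketch. It assumes a kubeadm cluster and a node registered under the name master; both are assumptions.

```shell
# Assumption: the control-plane node is named "master" (check with `kubectl get nodes`).
# The trailing "-" removes the taint instead of adding it.
kubectl taint nodes master node-role.kubernetes.io/control-plane:NoSchedule-

# Verify that the taint is gone.
kubectl describe node master | grep -i taints
```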

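1.2 Test Scenarios

Access via the Cluster IP from a Kubernetes cluster node
Access via the service from inside the Kubernetes cluster
Access from outside the Kubernetes cluster via the address exposed by MetalLB

1.3 Test Tools

curl
Test program: sample-webapp; for the source, see the Kubernetes distributed load testing project on GitHub.

1.4 Test Notes

Response times are obtained by sending curl requests to sample-webapp. A plain curl returns: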

root@master1:~# curl "http://10.106.172.112:8000"
Welcome to the "Distributed Load Testing Using Kubernetes" sample web app

2. Network Latency Tests

2.1 Access via the Cluster IP from a Kubernetes cluster node

Test command:

curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}' "http://10.106.172.112:8000"

Collect 10 samples and take the average:

[root@k8s-node1 ~]# echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://10.106.172.112:8000"; done
time_connect  time_starttransfer time_total
0.000362 0.001538 0.001681
0.000456 0.002825 0.003137
0.000744 0.003152 0.003394
0.000388 0.001750 0.001847
0.000617 0.002405 0.003527
0.000656 0.002153 0.002390
0.000204 0.001597 0.001735
0.000568 0.002759 0.003117
0.000826 0.002278 0.002463
0.000582 0.002062 0.002120

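Average response time (time_total): 2.541 ms

Metric definitions:

time_connect: time taken to establish the TCP connection to the server
time_starttransfer: time from the start of the request until the first byte of the response is received from the web server
time_total: total time taken to complete the request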
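The average can be computed directly from the loop output rather than by hand. A minimal sketch, assuming the three-column format printed above; avg_total_ms is a hypothetical helper name, not part of curl:

```shell
# avg_total_ms: read "time_connect time_starttransfer time_total" rows on stdin
# and print the mean of the third column, converted from seconds to milliseconds.
# Non-numeric rows (such as the header line) are skipped.
avg_total_ms() {
  awk 'NF == 3 && $3 ~ /^[0-9.]+$/ { sum += $3; n++ }
       END { if (n > 0) printf "%.3f\n", sum / n * 1000 }'
}
```

To use it, pipe the measurement loop into the function, for example: for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://10.106.172.112:8000"; done | avg_total_ms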

2.2 Access via the service from inside the Kubernetes cluster

# Enter the client pod used for testing
root@master1:~# kubectl exec -it test-78bcb7b45d-t9mvw -- /bin/bash
# Run the test
test-78bcb7b45d-t9mvw:/# echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp:8000/"; done
time_connect  time_starttransfer time_total
0.001359 0.004808 0.005001
0.001470 0.003394 0.003508
0.001888 0.003702 0.004018
0.001630 0.003267 0.003622
0.001621 0.003235 0.003491
0.001008 0.002891 0.002951
0.001508 0.003137 0.003313
0.004427 0.006035 0.006293
0.001512 0.002924 0.002962
0.001649 0.003774 0.003981

Average response time: 3.914 ms

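2.3 Access from outside the cluster via the MetalLB LoadBalancer address

This test uses a LoadBalancer service; the service can also be switched to NodePort for testing.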

root@master1:~# kubectl get svc
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
kubernetes      ClusterIP      10.96.0.1        <none>          443/TCP          15d
sample-webapp   LoadBalancer   10.98.247.190    192.168.0.242   80:31401/TCP     6m16s

On the external client, add a name resolution entry that maps sample-webapp.test.com to 192.168.0.242, then run the access test:

$ echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp.test.com/"; done
time_connect  time_starttransfer time_total
0.015586 0.017507 0.028004
0.039582 0.042034 0.052532
0.006702 0.008856 0.019461
0.041783 0.046729 0.057053
0.005912 0.008222 0.019276
0.006669 0.009513 0.020058
0.006592 0.010141 0.020598
0.011113 0.014444 0.026173
0.006471 0.008854 0.019571
0.041217 0.043651 0.055667

Average response time: 31.839 ms

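2.4 Test Results

Response times for the three scenarios:

Access via the Cluster IP from a Kubernetes cluster node: 2.541 ms
Access via the service from inside the Kubernetes cluster: 3.914 ms
Access from outside the Kubernetes cluster via the address exposed by MetalLB: 31.839 ms

Notes:

For the first two scenarios, whether the node or pod running the test is on the same host as the pods behind the service may affect the results.

The results are for reference only; they depend on resource allocation, the network environment, and other factors.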

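3. Network Performance Tests

The cluster network uses Calico (the mode is stated in each subsection), and iperf3 is used for the tests.

Server command:

iperf3 -s -p 12345 -i 1

Client command (SERVER_IP holds the address of the iperf3 server; the runs below use -t 60):

iperf3 -c "${SERVER_IP}" -p 12345 -i 1 -t 10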
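When comparing many runs, it helps to pull only the final bitrate out of the iperf3 text output. A minimal sketch, assuming the summary layout shown in the transcripts in this article; summary_bitrate is a hypothetical helper name:

```shell
# summary_bitrate ROLE: read iperf3 text output on stdin and print the bitrate
# from the summary line whose last field is ROLE ("sender" or "receiver").
summary_bitrate() {
  awk -v role="$1" '
    $NF == role {
      # The bitrate value is the field just before the "...bits/sec" unit.
      for (i = 2; i <= NF; i++)
        if ($i ~ /bits\/sec$/) rate = $(i - 1) " " $i
    }
    END { if (rate != "") print rate }'
}
```

For example, on the client side: iperf3 -c "${SERVER_IP}" -p 12345 -i 1 -t 60 | summary_bitrate receiver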

3.1 Between nodes

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Run the iperf3 client test from node2
root@node2:~# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 192.168.0.63 port 40948 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   325 MBytes  2.72 Gbits/sec  107   1.53 MBytes
...
[  5]  59.00-60.00  sec   339 MBytes  2.84 Gbits/sec    2   1.29 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.9 GBytes  2.85 Gbits/sec  421             sender
[  5]   0.00-60.00  sec  19.9 GBytes  2.85 Gbits/sec                  receiver

3.2 Between pods on different nodes (native routing, BGP mode)

Check Calico's current network mode, which is also the default configuration of this Calico deployment:

root@master1:~# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.0.62 | node-to-node mesh | up    | 04:06:45 | Established |
| 192.168.0.63 | node-to-node mesh | up    | 04:06:39 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

root@master1:~# calicoctl get ipPool default-ipv4-ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE     DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.244.0.0/16   true   Never      CrossSubnet   false      false              all()

# Check the node routing table: cross-node pod traffic is forwarded directly through the node NIC (ens33)
root@master1:~# ip route
default via 192.168.0.1 dev ens33 proto static
10.244.44.0/24 via 192.168.0.63 dev ens33 proto 80 onlink
10.244.154.0/24 via 192.168.0.62 dev ens33 proto 80 onlink
...

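The current network mode is BGP with VXLANMODE set to CrossSubnet. VXLAN encapsulation is applied only when traffic crosses a subnet boundary between nodes; nodes in the same subnet forward pod traffic with plain layer 3 routing.

Notes:

The configuration above is Calico's default here. When nodes are in the same subnet, pod-to-pod traffic is forwarded directly using the node routing tables, which is similar to flannel's host-gw mode. VXLAN encapsulation and decapsulation happen only when the nodes are in different subnets.

Calico also supports sending pod traffic between same-subnet nodes through a VXLAN tunnel, by changing the parameter above to vxlanMode: Always. The overlay encapsulation costs some performance, which is similar to flannel's vxlan mode.

For details, see: Overlay networking | Calico Documentation (tigera.io)

Test the network performance between pods on different nodes: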

# Deploy the test pods, spread across two nodes
[root@k8s-master ~]# cat test-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: lldhsds/alpine:perf-20240717
        command: ["sleep", "3600"]

[root@master1 ~]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn   1/1   Running     0     50s   10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw   1/1   Running     0     84m   10.244.154.221   node1     <none>           <none>
...

# Start the iperf3 server in the pod on node2
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -s -p 12345 -i 1

# Run the iperf3 client test in the pod on node1
root@master1:~# kubectl exec -it test-78bcb7b45d-t9mvw -- /bin/bash
test-78bcb7b45d-t9mvw:/# iperf3 -c 10.244.44.136 -p 12345 -i 1 -t 60
Connecting to host 10.244.44.136, port 12345
[  5] local 10.244.154.221 port 39024 connected to 10.244.44.136 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   351 MBytes  2.94 Gbits/sec  1565   1.55 MBytes
...
[  5]  59.00-60.00  sec   347 MBytes  2.91 Gbits/sec   25   1.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.2 GBytes  2.75 Gbits/sec  4355             sender
[  5]   0.00-60.01  sec  19.2 GBytes  2.75 Gbits/sec                  receiver

Note:

The test image has been uploaded to Docker Hub and can be pulled with: docker pull lldhsds/alpine:perf-20240717

3.3 Between a node and a pod on a different node (native routing, BGP mode)

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Run the iperf3 client test in the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME                 READY   STATUS RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn  1/1   Running    0    40m     10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw  1/1   Running    0    124m    10.244.154.221   node1     <none>           <none>
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 10.244.44.136 port 32770 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   328 MBytes  2.75 Gbits/sec  771   1.48 MBytes
...
[  5]  59.00-60.00  sec   326 MBytes  2.73 Gbits/sec  191   1.14 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.4 GBytes  2.78 Gbits/sec  4237             sender
[  5]   0.00-60.00  sec  19.4 GBytes  2.78 Gbits/sec                  receiver

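3.4 Between pods on different nodes (VXLAN)

Change the VXLANMODE of the Calico IP pool to Always, so that VXLAN encapsulation and decapsulation are used whenever pods communicate across nodes, even within the same subnet.

# List Calico's IPv4 address pools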
root@master1:~# calicoctl get ippool
NAME                  CIDR            SELECTOR
default-ipv4-ippool   10.244.0.0/16   all()

# Export the configuration of the default IPv4 address pool
root@master1:~# calicoctl  get ippool default-ipv4-ippool -o yaml > ippool.yaml

# Change vxlanMode (default: CrossSubnet)
root@master1:~# vi ippool.yaml
...
  vxlanMode: Always

# Apply the change
root@master1:~# calicoctl apply -f ippool.yaml
Successfully applied 1 'IPPool' resource(s)

root@master1:~# calicoctl  get ippool default-ipv4-ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.244.0.0/16   true   Never      Always      false      false              all()

# Now check the node routing table: cross-node pod traffic is encapsulated via vxlan.calico
root@master1:~#  ip route
default via 192.168.0.1 dev ens33 proto static
10.244.44.0/24 via 10.244.44.0 dev vxlan.calico onlink
10.244.154.0/24 via 10.244.154.0 dev vxlan.calico onlink
...
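The same pool change can probably be made without editing a YAML file. A minimal sketch, assuming the installed calicoctl provides the patch subcommand (it is documented for Calico 3.x, but confirm with calicoctl patch --help on your version):

```shell
# Switch the default pool to always-on VXLAN; intended to have the same effect
# as editing ippool.yaml and running `calicoctl apply -f ippool.yaml`.
calicoctl patch ippool default-ipv4-ippool -p '{"spec": {"vxlanMode": "Always"}}'

# Confirm the VXLANMODE column now shows Always.
calicoctl get ippool default-ipv4-ippool -o wide
```

Patching only the one field leaves the rest of the pool spec untouched, which avoids accidentally re-applying stale values from an old export.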

Run the pod-to-pod bandwidth test:

# Start the iperf3 server in the pod on node2
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -s -p 12345 -i 1

# Run the iperf3 client test in the pod on node1
test-78bcb7b45d-t9mvw:/# iperf3 -c 10.244.44.136 -p 12345 -i 1 -t 60
Connecting to host 10.244.44.136, port 12345
[  5] local 10.244.154.221 port 59994 connected to 10.244.44.136 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   123 MBytes  1.03 Gbits/sec  196    793 KBytes
[  5]   1.00-2.00   sec   120 MBytes  1.00 Gbits/sec   73    896 KBytes
...
[  5]  59.00-60.00  sec   172 MBytes  1.45 Gbits/sec    0   2.30 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  6.83 GBytes   978 Mbits/sec  1650             sender
[  5]   0.00-60.02  sec  6.83 GBytes   977 Mbits/sec                  receiver

3.5 Between a node and a pod on a different node (VXLAN)

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Run the iperf3 client test in the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME                 READY   STATUS RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn  1/1   Running    0    40m     10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw  1/1   Running    0    124m    10.244.154.221   node1     <none>           <none>
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 10.244.44.136 port 51004 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   346 MBytes  2.89 Gbits/sec  992   1.46 MBytes
[  5]   1.00-2.00   sec   350 MBytes  2.94 Gbits/sec  347   1.61 MBytes
...
[  5]  59.00-60.00  sec   314 MBytes  2.63 Gbits/sec    0   1.49 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  18.6 GBytes  2.67 Gbits/sec  4278             sender
[  5]   0.00-60.01  sec  18.6 GBytes  2.67 Gbits/sec                  receiver

3.6 Between pods on different nodes (IP-IP mode)

Switch to IP-IP mode:

root@master1:~# calicoctl  get ippool default-ipv4-ippool -o yaml > ippool.yaml

# Set ipipMode and remove the vxlanMode: CrossSubnet setting
root@master1:~# vi ippool.yaml
...
  ipipMode: Always
...
# Apply the change
root@master1:~# calicoctl apply -f ippool.yaml
Successfully applied 1 'IPPool' resource(s)

# Check the node routing table: traffic is now forwarded out through the tunl0 device
root@master1:~# ip route
default via 192.168.0.1 dev ens33 proto static
10.244.44.0/24 via 192.168.0.63 dev tunl0 proto bird onlink
10.244.154.0/24 via 192.168.0.62 dev tunl0 proto bird onlink
blackhole 10.244.161.0/24 proto bird
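Across the three modes, the outgoing device in the routing table is what tells them apart: ens33 for direct routing, vxlan.calico for VXLAN, and tunl0 for IP-IP. A minimal sketch that labels each routed prefix accordingly. route_modes is a hypothetical helper. It judges only by device name, so any non-Calico route with a next hop is also reported as direct.

```shell
# route_modes: read `ip route` output on stdin. For every non-default route
# with a next hop ("via"), print: destination, outgoing device, inferred mode.
route_modes() {
  awk '$2 == "via" && $1 != "default" {
    dev = ""
    for (i = 3; i < NF; i++) if ($i == "dev") dev = $(i + 1)
    if (dev == "vxlan.calico") mode = "vxlan"
    else if (dev == "tunl0")   mode = "ipip"
    else                       mode = "direct"
    print $1, dev, mode
  }'
}
```

Run it on a node as: ip route | route_modes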

Run the pod-to-pod bandwidth test:

# Start the iperf3 server in the pod on node2
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -s -p 12345 -i 1

# Run the iperf3 client test in the pod on node1
test-78bcb7b45d-t9mvw:/# iperf3 -c 10.244.44.136 -p 12345 -i 1 -t 60
Connecting to host 10.244.44.136, port 12345
[  5] local 10.244.154.221 port 59994 connected to 10.244.44.136 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   163 MBytes  1.37 Gbits/sec  119   1.04 MBytes
[  5]   1.00-2.00   sec   129 MBytes  1.08 Gbits/sec  128   1.12 MBytes
...
[  5]  58.00-59.00  sec   172 MBytes  1.44 Gbits/sec    0   2.92 MBytes
[  5]  59.00-60.00  sec   161 MBytes  1.35 Gbits/sec    0   3.13 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  8.60 GBytes  1.23 Gbits/sec  1256             sender
[  5]   0.00-60.02  sec  8.60 GBytes  1.23 Gbits/sec                  receiver

3.7 Between a node and a pod on a different node (IP-IP)

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Run the iperf3 client test in the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME                 READY   STATUS RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn  1/1   Running    0    40m     10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw  1/1   Running    0    124m    10.244.154.221   node1     <none>           <none>
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 10.244.44.136 port 51004 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   407 MBytes  3.41 Gbits/sec  506    692 KBytes
[  5]   1.00-2.00   sec   336 MBytes  2.82 Gbits/sec  577    949 KBytes
[  5]   2.00-3.00   sec   310 MBytes  2.60 Gbits/sec   92   1.13 MBytes
...
[  5]  58.00-59.00  sec   323 MBytes  2.71 Gbits/sec    3   1.28 MBytes
[  5]  59.00-60.00  sec   329 MBytes  2.76 Gbits/sec    0   1.44 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.4 GBytes  2.78 Gbits/sec  3960             sender
[  5]   0.00-60.00  sec  19.4 GBytes  2.77 Gbits/sec                  receiver

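4. Summary

4.1 Bandwidth Comparison

Scenario	Calico network mode	Bandwidth (Gbits/sec)
Between nodes	N/A	2.85
Between pods on different nodes	Native routing (BGP)	2.75
Between a node and a pod on a different node	Native routing (BGP)	2.78
Between pods on different nodes	VXLAN	0.98
Between a node and a pod on a different node	VXLAN	2.67
Between pods on different nodes	IP-IP	1.23
Between a node and a pod on a different node	IP-IP	2.77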

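Conclusions from the data above:

In Calico's BGP mode, pod-to-pod throughput is about 3.5% below direct node-to-node throughput (2.75 vs 2.85 Gbits/sec), which is close to node-level performance. Allowing for fluctuation and measurement error, the network can be regarded as essentially lossless. In terms of the forwarding path, pod traffic only adds the veth pair that connects the pod to the node.
In VXLAN and IP-IP modes, pod-to-pod throughput drops by more than 50% (0.98 and 1.23 Gbits/sec vs 2.85 Gbits/sec). In both modes every packet is encapsulated and decapsulated. IP-IP performs better than VXLAN here, which is consistent with IP-IP having a smaller per-packet encapsulation overhead.
Between a node and a pod on a different node, the loss is small in all three modes (2.67 to 2.78 Gbits/sec).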
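The loss figures can be reproduced from the table with a one-line calculation. A small sketch, using a hypothetical helper loss_pct that takes the node-to-node baseline and a measured value (both in Gbits/sec):

```shell
# loss_pct BASELINE MEASURED: percentage drop of MEASURED relative to BASELINE,
# rounded to one decimal place.
loss_pct() {
  awk -v base="$1" -v val="$2" 'BEGIN { printf "%.1f\n", (base - val) / base * 100 }'
}

loss_pct 2.85 2.75   # BGP, pod to pod    -> 3.5
loss_pct 2.85 0.98   # VXLAN, pod to pod  -> 65.6
loss_pct 2.85 1.23   # IP-IP, pod to pod  -> 56.8
```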

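Notes:

During testing, pod-to-pod bandwidth was occasionally higher than node-to-node bandwidth; the data fluctuates somewhat. The test environment consists of virtual machines created on a laptop, where the VMs compete for resources, and this affects the results.

The loss of more than 50% in VXLAN and IP-IP modes should not be taken as a general rule of thumb. It likely depends on the specific environment and is given here for reference only.

For further details on Calico networking, see: Overlay networking | Calico Documentation (tigera.io).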
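The point that IP-IP carries less encapsulation overhead than VXLAN can be made concrete with header sizes. For IPv4, IP-IP adds a 20-byte outer IP header, while VXLAN adds 50 bytes (outer Ethernet 14 + IP 20 + UDP 8 + VXLAN 8). A minimal sketch of the resulting pod MTU, assuming a 1500-byte underlay MTU (an assumption; the article does not state the MTU of its test network) and a hypothetical helper named tunnel_mtu:

```shell
# tunnel_mtu UNDERLAY_MTU MODE: largest pod MTU that still fits in one underlay
# packet after encapsulation. MODE is one of: none, ipip, vxlan (IPv4 only).
tunnel_mtu() {
  case "$2" in
    none)  echo $(( $1 - 0 )) ;;
    ipip)  echo $(( $1 - 20 )) ;;   # one extra IPv4 header
    vxlan) echo $(( $1 - 50 )) ;;   # Ethernet + IPv4 + UDP + VXLAN headers
    *)     echo "unknown mode: $2" >&2; return 1 ;;
  esac
}

tunnel_mtu 1500 ipip    # -> 1480
tunnel_mtu 1500 vxlan   # -> 1450
```

Header size alone therefore costs only a few percent of each packet, which fits the note above that the loss of more than 50% is likely specific to this environment.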

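4.2 Overall Comparison

All nodes in this test environment are in the same subnet, so only two cases were compared: encapsulated forwarding and direct forwarding.

The cross-subnet case remains to be explored. The following table is a summary:

Mode	Network type	Description	Performance	Applicable scenario
BGP	underlay	Uses BGP to propagate routes between nodes, with no encapsulation	Lowest loss	The physical network supports BGP and nodes can communicate directly. This is the preferred mode.
IPIP-CrossSubnet	underlay+overlay	Encapsulates traffic between nodes in different subnets; direct communication within a subnet	Almost no loss within a subnet; higher loss across subnets	The physical network does not support BGP and most nodes are in the same subnet
IPIP-Always	overlay	Uses IPIP encapsulation for all cross-node traffic, whether or not the nodes are in different subnets	Higher loss, but better than VXLAN	The physical network does not support BGP and the cluster spans multiple subnets
VXLAN-CrossSubnet	underlay+overlay	Encapsulates traffic between nodes in different subnets; direct communication within a subnet	Almost no loss within a subnet; higher loss across subnets	The physical network does not support BGP and most nodes are in the same subnet
VXLAN-Always	overlay	Uses VXLAN encapsulation for all cross-node traffic, whether or not the nodes are in different subnets	Higher loss	The physical network does not support BGP and the cluster spans multiple subnets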
