K8s Series (3): How to Configure HTTPS Certificates for etcd

Date: 2023-02-04 15:31:50

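In K8s, kube-apiserver uses etcd as the persistent store for its REST objects. This article shows how to generate self-signed HTTPS certificates, set up an etcd cluster for the apiserver to use, and records the pitfalls hit along the way.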

1. Install the cfssl tools

cd /data/work

wget https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssl_1.6.0_linux_amd64 -O cfssl
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssljson_1.6.0_linux_amd64 -O cfssljson
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssl-certinfo_1.6.0_linux_amd64 -O cfssl-certinfo
chmod +x cfssl*
mv cfssl* /usr/local/bin/

2. Create the CA certificate

cat > ca-csr.json <<EOF
{
"CN": "etcd-ca",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Beijing",
"L": "Beijing",
"O": "etcd-ca",
"OU": "etcd-ca"
}
],
"ca": {
"expiry": "87600h"
}
}
EOF

cfssl gencert -initca ca-csr.json | cfssljson -bare ca

=> generates: ca-key.pem, ca.csr, ca.pem

3. Configure the CA signing policy

cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"etcd-ca": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
}
}
}
}
EOF

4. Create the etcd certificate signing request (CSR)

cat > etcd-csr.json <<EOF
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"etcd0-0.etcd",
"etcd1-0.etcd",
"etcd2-0.etcd"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [{
"C": "CN",
"ST": "Beijing",
"L": "Beijing",
"O": "etcd",
"OU": "etcd"
}]
}
EOF

5. Generate the etcd certificate

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd-ca etcd-csr.json | cfssljson -bare etcd

=> generates: etcd-key.pem, etcd.csr, etcd.pem

mv etcd.pem etcd-server.pem
mv etcd-key.pem etcd-server-key.pem
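Each entry in the CSR's `hosts` becomes a Subject Alternative Name (SAN): entries that parse as IP addresses go into the IP SANs, everything else into the DNS SANs, and a TLS client only accepts the server if the name or IP it dialled matches one of them. The Go sketch below, using only the standard library, signs a certificate for the hosts from etcd-csr.json with a throwaway CA and checks which names verify. The helpers `splitHosts`, `issue`, `newCA`, `signServer` and `verifyFor` are hypothetical and only approximate what cfssl does.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// splitHosts sorts CSR "hosts" entries into IP SANs and DNS SANs.
func splitHosts(hosts []string) ([]net.IP, []string) {
	var ips []net.IP
	var names []string
	for _, h := range hosts {
		if ip := net.ParseIP(h); ip != nil {
			ips = append(ips, ip)
		} else {
			names = append(names, h)
		}
	}
	return ips, names
}

// issue signs tmpl with parent (tmpl itself when parent is nil) and returns
// the certificate together with its freshly generated RSA 2048 key.
func issue(tmpl, parent *x509.Certificate, parentKey *rsa.PrivateKey) (*x509.Certificate, *rsa.PrivateKey, error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil { // self-signed
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newCA returns a minimal self-signed CA standing in for ca.pem/ca-key.pem.
func newCA() (*x509.Certificate, *rsa.PrivateKey, error) {
	now := time.Now()
	return issue(&x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "etcd-ca"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(87600 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}, nil, nil)
}

// signServer issues a server+client certificate for hosts, as the etcd-ca profile does.
func signServer(ca *x509.Certificate, caKey *rsa.PrivateKey, cn string, hosts []string) (*x509.Certificate, error) {
	ips, names := splitHosts(hosts)
	now := time.Now()
	leaf, _, err := issue(&x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    now.Add(-time.Minute),
		NotAfter:     now.Add(87600 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		IPAddresses:  ips,
		DNSNames:     names,
	}, ca, caKey)
	return leaf, err
}

// verifyFor checks that leaf chains to ca and is valid for host (name or IP).
func verifyFor(leaf, ca *x509.Certificate, host string) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{DNSName: host, Roots: roots})
	return err
}

func main() {
	ca, caKey, err := newCA()
	if err != nil {
		panic(err)
	}
	leaf, err := signServer(ca, caKey, "etcd", []string{"127.0.0.1", "etcd0-0.etcd", "etcd1-0.etcd", "etcd2-0.etcd"})
	if err != nil {
		panic(err)
	}
	for _, h := range []string{"etcd1-0.etcd", "127.0.0.1", "etcd3-0.etcd"} {
		fmt.Printf("%-13s valid: %v\n", h, verifyFor(leaf, ca, h) == nil)
	}
}
```

With these hosts, etcd1-0.etcd and 127.0.0.1 verify while etcd3-0.etcd does not, so adding a fourth member would mean re-issuing the certificate unless a wildcard is used (see 13.2 and 13.3).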

6. Create the etcd cluster

YAML file: https://github.com/k8s-club/etcd-operator

kubectl apply -f etcd-cluster.yaml

7. Check DNS resolution

Installing dnsutils: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/

kubectl exec -it -n etcd dnsutils -- nslookup etcd

Server:         9.165.x.x
Address:        9.165.x.x#53

Name:   etcd.etcd.svc.cluster.local
Address: 9.165.x.x
Name:   etcd.etcd.svc.cluster.local
Address: 9.165.x.x
Name:   etcd.etcd.svc.cluster.local
Address: 9.165.x.x

8. Check the etcd cluster status

kubectl exec -it -n etcd etcd0-0 -- sh

/usr/local/bin/etcdctl --write-out=table --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd-server.pem --key=/etc/etcd/ssl/etcd-server-key.pem --endpoints=https://etcd0-0.etcd:2379,https://etcd1-0.etcd:2379,https://etcd2-0.etcd:2379 endpoint health

+---------------------------+--------+-------------+-------+
| ENDPOINT                  | HEALTH | TOOK        | ERROR |
+---------------------------+--------+-------------+-------+
| https://etcd0-0.etcd:2379 | true   | 13.551982ms |       |
| https://etcd1-0.etcd:2379 | true   | 13.540498ms |       |
| https://etcd2-0.etcd:2379 | true   | 23.119639ms |       |
+---------------------------+--------+-------------+-------+

/usr/local/bin/etcdctl --write-out=table --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd-server.pem --key=/etc/etcd/ssl/etcd-server-key.pem --endpoints=https://etcd0-0.etcd:2379,https://etcd1-0.etcd:2379,https://etcd2-0.etcd:2379 endpoint status

+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT                 | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://etcd0-0.etcd:2379 | 4dde210279eea33a | 3.4.13  | 20 kB   | true      | false      | 2         | 9          | 9                  |        |
| http://etcd1-0.etcd:2379 | 20669865d12a473b | 3.4.13  | 20 kB   | false     | false      | 2         | 9          | 9                  |        |
| http://etcd2-0.etcd:2379 | 3f17922d1ed63113 | 3.4.13  | 20 kB   | false     | false      | 2         | 9          | 9                  |        |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
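The three flags passed to etcdctl on every call are what turn on mutual TLS: `--cacert` is the CA used to verify the server, and `--cert`/`--key` are the certificate the client presents. For a Go program talking to the same cluster, a roughly equivalent `crypto/tls` configuration can be built as sketched below. `tlsConfigFromPEM` and `tlsConfigFromFiles` are hypothetical helpers, and the paths are the ones mounted in the etcd pods in this article. The resulting `*tls.Config` could then be passed, for example, to the `TLS` field of the official etcd Go client's `clientv3.Config`.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// tlsConfigFromPEM builds client-side mutual-TLS settings roughly matching
// etcdctl's --cacert/--cert/--key: trust only the etcd CA, and present the
// given certificate and key to the server.
func tlsConfigFromPEM(caPEM, certPEM, keyPEM []byte, serverName string) (*tls.Config, error) {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in CA PEM data")
	}
	pair, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading client key pair: %w", err)
	}
	return &tls.Config{
		RootCAs:      roots,
		Certificates: []tls.Certificate{pair},
		// Must match a SAN of the server certificate, e.g. "etcd0-0.etcd".
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
	}, nil
}

// tlsConfigFromFiles reads the three PEM files from disk.
func tlsConfigFromFiles(caFile, certFile, keyFile, serverName string) (*tls.Config, error) {
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	certPEM, err := os.ReadFile(certFile)
	if err != nil {
		return nil, err
	}
	keyPEM, err := os.ReadFile(keyFile)
	if err != nil {
		return nil, err
	}
	return tlsConfigFromPEM(caPEM, certPEM, keyPEM, serverName)
}

func main() {
	cfg, err := tlsConfigFromFiles(
		"/etc/etcd/ssl/ca.pem",
		"/etc/etcd/ssl/etcd-server.pem",
		"/etc/etcd/ssl/etcd-server-key.pem",
		"etcd0-0.etcd",
	)
	if err != nil {
		fmt.Println("TLS setup failed:", err)
		return
	}
	fmt.Println("client certificates loaded:", len(cfg.Certificates))
}
```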

9. Verify etcd reads and writes

kubectl exec -it -n etcd etcd0-0 -- sh

/usr/local/bin/etcdctl --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd-server.pem --key=/etc/etcd/ssl/etcd-server-key.pem put hello world
OK

/usr/local/bin/etcdctl --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd-server.pem --key=/etc/etcd/ssl/etcd-server-key.pem get hello
hello
world

List all keys:
/usr/local/bin/etcdctl --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd-server.pem --key=/etc/etcd/ssl/etcd-server-key.pem get "" --keys-only --prefix
hello

List all key-value pairs:
/usr/local/bin/etcdctl --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd-server.pem --key=/etc/etcd/ssl/etcd-server-key.pem get "" --prefix
hello
world

10. Create the apiserver CSR

cat > apiserver-csr.json <<EOF
{
"CN": "apiserver",
"hosts": [
"*.etcd"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [{
"C": "CN",
"ST": "Beijing",
"L": "Beijing",
"O": "apiserver",
"OU": "apiserver"
}]
}
EOF

11. Generate the apiserver certificate

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd-ca apiserver-csr.json | cfssljson -bare apiserver

=> generates: apiserver-key.pem, apiserver.csr, apiserver.pem

mv apiserver.pem etcd-client-apiserver.pem
mv apiserver-key.pem etcd-client-apiserver-key.pem

12. Create the extension-apiserver

apiserver.yaml: mount the generated *.pem certificates into the apiserver through a ConfigMap

...
containers:
- image: apiserver.xxxxx:latest
  args:
  - --etcd-servers=https://etcd0-0.etcd:2379
  - --etcd-cafile=/etc/kubernetes/certs/ca.pem
  - --etcd-certfile=/etc/kubernetes/certs/etcd-client-apiserver.pem
  - --etcd-keyfile=/etc/kubernetes/certs/etcd-client-apiserver-key.pem
...

kubectl apply -f apiserver.yaml

13. Pitfalls

13.1 Wrong hosts in the certificate

log:

etcd0-0:
{"level":"warn","ts":"2021-08-19T11:55:07.755Z","caller":"embed/config_logging.go:279","msg":"rejected connection","remote-addr":"127.0.0.1:41226","server-name":"","error":"tls: first record does not look like a TLS handshake"}

etcd1-0:
{"level":"info","ts":"2021-08-19T11:54:16.830Z","caller":"embed/serve.go:191","msg":"serving client traffic securely","address":"[::]:2379"}
{"level":"info","ts":"2021-08-19T11:54:16.838Z","caller":"etcdserver/server.go:716","msg":"initialized peer connections; fast-forwarding election ticks","local-member-id":"30dd90df9a304e97","forward-ticks":18,"forward-duration":"4.5s","election-ticks":20,"election-timeout":"5s","active-remote-members":2}
{"level":"info","ts":"2021-08-19T11:54:16.867Z","caller":"membership/cluster.go:558","msg":"set initial cluster version","cluster-id":"80c7f1f6c2848777","local-member-id":"30dd90df9a304e97","cluster-version":"3.4"}
{"level":"info","ts":"2021-08-19T11:54:16.867Z","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}

etcd2-0:
{"level":"warn","ts":"2021-08-19T11:54:17.782Z","caller":"embed/config_logging.go:270","msg":"rejected connection","remote-addr":"9.165.x.x:40180","server-name":"etcd2-0.etcd","ip-addresses":["0.0.0.0","127.0.0.1"],"dns-names":["etcd0-0.etcd","etcd1-0.etcd","etcd2-0.etcd"],"error":"tls: \"9.165.x.x\" does not match any of DNSNames [\"etcd0-0.etcd\" \"etcd1-0.etcd\" \"etcd2-0.etcd\"] (lookup etcd1-0.etcd on 9.165.x.x:53: no such host)"}

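In the etcd2-0 entry, the connection came from 9.165.x.x, which is not among the certificate's IP SANs (0.0.0.0, 127.0.0.1), and etcd could not resolve its DNS SANs to that address (lookup etcd1-0.etcd ... no such host), so the connection was rejected.

Fix: regenerate the certificate with the correct domain names in hosts.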

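13.2 Pitfalls in the certificate hosts setting

"hosts": [
"127.0.0.1",
"etcd0-0.etcd",
"*.etcd"
],

Wildcard entries such as "*.etcd" are allowed, but an entry must not be an empty string "" or a bare *.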
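Go clients such as etcdctl and kube-apiserver apply the standard `crypto/x509` hostname rules, under which a wildcard stands for exactly one left-most label. The sketch below checks a few names against the SANs etcd0-0.etcd and *.etcd; `matchesSAN` is a hypothetical helper that builds an unsigned `x509.Certificate` only to call `VerifyHostname`.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// matchesSAN reports whether a certificate with the given DNS SANs passes
// Go's standard hostname check for host. Only VerifyHostname is exercised,
// so the certificate needs no key or signature.
func matchesSAN(sans []string, host string) bool {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host) == nil
}

func main() {
	sans := []string{"etcd0-0.etcd", "*.etcd"}
	for _, host := range []string{"etcd0-0.etcd", "etcd5-0.etcd", "etcd", "a.b.etcd"} {
		fmt.Printf("%-13s %v\n", host, matchesSAN(sans, host))
	}
	// A bare "*" SAN matches nothing.
	fmt.Println(matchesSAN([]string{"*"}, "etcd"))
}
```

It prints true for etcd0-0.etcd and etcd5-0.etcd and false for etcd, a.b.etcd and the bare * case. A wildcard therefore covers new members at the same level but neither the bare Service name nor deeper names, which is why the code in 13.3 emits both the exact and the *. forms.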

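13.3 DNS naming recommendations

Recommended: use a wildcard such as *.xxx.ns.svc, so that scaling out does not require re-issuing the certificate.

Reference: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

Go code for reference: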

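import "fmt"

// DnsBase: assumed default cluster DNS suffix; adjust for your cluster.
const DnsBase = "svc.cluster.local"

// Exact and wildcard SANs for a headless Service, short and fully qualified.
func genEtcdWildcardDnsName(namespace, serviceName string) []string {
	return []string{
		fmt.Sprintf("%s.%s.%s", serviceName, namespace, "svc"),
		fmt.Sprintf("*.%s.%s.%s", serviceName, namespace, "svc"),
		fmt.Sprintf("%s.%s.%s", serviceName, namespace, DnsBase),
		fmt.Sprintf("*.%s.%s.%s", serviceName, namespace, DnsBase),
	}
}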

13.4 Leader/follower election succeeded, but requests fail

# /usr/local/bin/etcdctl put hello world
{"level":"warn","ts":"2021-08-19T12:32:11.200Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-05ed1825-e70f-492a-af94-03c633d0affc/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection closed"}
Error: context deadline exceeded

Fix: etcdctl must be given the certificates:

/usr/local/bin/etcdctl --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd-server.pem --key=/etc/etcd/ssl/etcd-server-key.pem put hello world

13.5 Cannot switch between http and https

If the cluster is first built over http and then rebuilt over https with the self-signed certificates, it fails with:

tls: first record does not look like a TLS handshake

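In my tests, switching in either direction (http => https or https => http) produces this error: once the cluster has been established, the connection protocol (http/https) is written into etcd's storage and can no longer be changed.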

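Fix: if you really need to switch protocols, try one of the following:

  • If the data can be deleted: delete it and rebuild the cluster
  • If the data must be kept: try taking a snapshot and restoring it (snapshot & restore)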

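13.6 Can the apiserver use the etcd certificate generated in step 5 directly?

In my tests, yes: the apiserver can use the etcd certificate as-is, because the etcd-ca profile grants both server auth and client auth. It is not recommended in production, though.

In production, generate a separate certificate for the apiserver (or any other application). That allows flexible settings such as wildcard domains (*.xx.xx) and different expiry times, and it makes the cluster easier to manage.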