k8s series --- Resource Metrics API and Custom Metrics API

Date: 2023-10-09 10:36:01

 https://www.linuxea.com/2112.html

Previously, heapster was used to collect resource metrics so they could be viewed; heapster is now being deprecated.

Starting with k8s v1.8, new functionality was introduced: resource metrics are exposed through an API.

Resource metrics: metrics-server

Custom metrics: prometheus, k8s-prometheus-adapter

So the new-generation architecture is:

1) Core metrics pipeline: made up of kubelet, metrics-server, and the API exposed by the API server; it provides cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage.

2) Monitoring pipeline: collects all kinds of metric data from the system and exposes it to end users, storage systems, and the HPA. It carries the core metrics plus many non-core metrics; the non-core metrics cannot be interpreted by k8s itself.

metrics-server is itself an API server that collects only CPU usage, memory usage, and the like.
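Once metrics-server has registered itself under the metrics.k8s.io group (the manifests below take care of that), the aggregated API can be queried directly, for example:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods"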

[root@master ~]# kubectl api-versions
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
apps/v1beta1
apps/v1beta2
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1

  

Go to https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server to get the yaml files, but the yaml files there have since been updated and differ from the ones used in the video.

Here are my modified yaml files, kept for reference.

[root@master metrics-server]# cat auth-delegator.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system


[root@master metrics-server]# cat auth-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system


[root@master metrics-server]# cat metrics-apiservice.yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100


The key file is this one:

[root@master metrics-server]# cat metrics-server-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.1
  namespace: kube-system
  labels:
    k8s-app: metrics-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.3.1
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
      version: v0.3.1
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
        version: v0.3.1
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        # These are needed for GKE, which doesn't support secure communication yet.
        # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
        #- --kubelet-port=10250
        #- --deprecated-kubelet-completely-insecure=true
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
      - name: metrics-server-nanny
        image: mirrorgooglecontainers/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 5m
            memory: 50Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: metrics-server-config-volume
          mountPath: /etc/config
        command:
        - /pod_nanny
        - --config-dir=/etc/config
        - --cpu=100m
        - --extra-cpu=0.5m
        - --memory=100Mi
        - --extra-memory=50Mi
        - --threshold=5
        - --deployment=metrics-server-v0.3.1
        - --container=metrics-server
        - --poll-period=300000
        - --estimator=exponential
        # Specifies the smallest cluster (defined in number of nodes)
        # resources will be scaled to.
        - --minClusterSize=10
      volumes:
      - name: metrics-server-config-volume
        configMap:
          name: metrics-server-config
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"


[root@master metrics-server]# cat metrics-server-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: https

[root@master metrics-server]# cat resource-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system


If applying the files downloaded from GitHub fails, use the metrics-server-deployment.yaml above instead: delete the resources and apply again, and it will work.

[root@master metrics-server]# kubectl apply -f ./
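Before going through the proxy, it is worth checking that the APIService created by metrics-apiservice.yaml has registered and reports as Available (the exact output varies with the cluster version):

kubectl get apiservice v1beta1.metrics.k8s.io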

  

[root@master ~]#  kubectl proxy --port=8080

  

Make sure metrics-server-v0.3.1-76b796b-4xgvp is in the Running state. Mine showed Error at first; the problem turned out to be in the yaml, and after going back and forth with fixes it ended up Running with the final version shown above.

[root@master metrics-server]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
canal-mgbc2 3/3 Running 12 3d23h
canal-s4xgb 3/3 Running 23 3d23h
canal-z98bc 3/3 Running 15 3d23h
coredns-78d4cf999f-5shdq 1/1 Running 0 6m4s
coredns-78d4cf999f-xj5pj 1/1 Running 0 5m53s
etcd-master 1/1 Running 13 17d
kube-apiserver-master 1/1 Running 13 17d
kube-controller-manager-master 1/1 Running 19 17d
kube-flannel-ds-amd64-8xkfn 1/1 Running 0 <invalid>
kube-flannel-ds-amd64-t7jpc 1/1 Running 0 <invalid>
kube-flannel-ds-amd64-vlbjz 1/1 Running 0 <invalid>
kube-proxy-ggcbf 1/1 Running 11 17d
kube-proxy-jxksd 1/1 Running 11 17d
kube-proxy-nkkpc 1/1 Running 12 17d
kube-scheduler-master 1/1 Running 19 17d
kubernetes-dashboard-76479d66bb-zr4dd 1/1 Running 0 <invalid>
metrics-server-v0.3.1-76b796b-4xgvp 2/2 Running 0 9s

  

To view the error logs, use -c to specify the container name. This pod has two containers and metrics-server is only one of them; to check the other one, the command is the same, just change the name.

[root@master metrics-server]# kubectl logs metrics-server-v0.3.1-76b796b-4xgvp   -c metrics-server -n kube-system

  

The errors in the logs were roughly the following:

403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)

E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

E1109 09:54:49.509521       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:linuxea.node-2.com: unable to fetch metrics from Kubelet linuxea.node-2.com (10.10.240.203): Get https://10.10.240.203:10255/stats/summary/: dial tcp 10.10.240.203:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-3.com: unable to fetch metrics from Kubelet linuxea.node-3.com (10.10.240.143): Get https://10.10.240.143:10255/stats/summary/: dial tcp 10.10.240.143:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-4.com: unable to fetch metrics from Kubelet linuxea.node-4.com (10.10.240.142): Get https://10.10.240.142:10255/stats/summary/: dial tcp 10.10.240.142:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.master-1.com: unable to fetch metrics from Kubelet linuxea.master-1.com (10.10.240.161): Get https://10.10.240.161:10255/stats/summary/: dial tcp 10.10.240.161:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-1.com: unable to fetch metrics from Kubelet linuxea.node-1.com (10.10.240.202): Get https://10.10.240.202:10255/stats/summary/: dial tcp 10.10.240.202:10255: connect: connection refused]

  

At the time I tried the fix suggested online of editing the coredns config, which ended up making the logs report that metrics could not be fetched for every pod, so I reverted the change and deleted the coredns pods so that two new ones would be recreated.

--kubelet-insecure-tls disables TLS verification, which is generally not recommended in production. And since DNS cannot resolve these hostnames, --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP is used to work around it. Another option is to modify coredns, but I don't recommend that.

See: https://github.com/kubernetes-incubator/metrics-server/issues/131

metrics-server unable to fetch pod metrics for pod

  

Those are the problems I ran into; the yaml above resolves all of them. There is also the issue that flannel, after being switched to DirectRouting, stops working every time the cluster machines are rebooted, and I have to delete flannel and recreate it; that was covered in an earlier article.

At this point the following commands all succeed, and items now has values:

[root@master ~]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "nodes",
      "singularName": "",
      "namespaced": false,
      "kind": "NodeMetrics",
      "verbs": [
        "get",
        "list"
      ]
    },
    {
      "name": "pods",
      "singularName": "",
      "namespaced": true,
      "kind": "PodMetrics",
      "verbs": [
        "get",
        "list"
      ]
    }
  ]
}

  

[root@master metrics-server]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods | more
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14868 0 14868 0 0 1521k 0 --:--:-- --:--:-- --:--:-- 1613k
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/pods"
  },
  "items": [
    {
      "metadata": {
        "name": "pod1",
        "namespace": "prod",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prod/pods/pod1",
        "creationTimestamp": "2019-01-29T02:39:12Z"
      },

  

[root@master metrics-server]# kubectl top pods
NAME CPU(cores) MEMORY(bytes)
filebeat-ds-4llpp 1m 2Mi
filebeat-ds-dv49l 1m 5Mi
myapp-0 0m 1Mi
myapp-1 0m 2Mi
myapp-2 0m 1Mi
myapp-3 0m 1Mi
myapp-4 0m 2Mi

  

[root@master metrics-server]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 206m 5% 1377Mi 72%
node1 88m 8% 534Mi 28%
node2 78m 7% 935Mi 49%

  

Custom metrics (prometheus)

As you can see, metrics is now working. However, metrics-server only covers CPU and memory; other metrics, such as user-defined monitoring metrics, are beyond it. That is where another component, prometheus, comes in.


Deploying prometheus is fairly involved.

node_exporter is the agent;

PromQL is the query language, comparable to SQL, used to query the data;
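For example, two PromQL expressions over standard node_exporter series (metric names as exposed by recent node_exporter versions):

# per-node CPU usage over the last 5 minutes
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)

# available memory per node, in bytes
node_memory_MemAvailable_bytes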

k8s-prometheus-adapter: prometheus metrics cannot be consumed by k8s directly; k8s-prometheus-adapter is needed to convert them into an API, as sketched below.
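The adapter is driven by rules in its config map (custom-metrics-config-map.yaml). A rule looks roughly like this, based on the adapter's documented rule format rather than the exact file shipped with this repo (the label names in seriesQuery are assumptions about how Prometheus relabels pod metadata):

rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'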

kube-state-metrics is used to aggregate and expose cluster state data.


Now let's deploy all of this.

Go to https://github.com/ikubernetes/k8s-prom

[root@master pro]# git clone https://github.com/iKubernetes/k8s-prom.git

  

First create a namespace called prom:

[root@master k8s-prom]# kubectl apply -f namespace.yaml
namespace/prom created
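The namespace.yaml in the repo is presumably nothing more than a plain Namespace object, along these lines:

apiVersion: v1
kind: Namespace
metadata:
  name: prom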

  

Deploy node_exporter:

[root@master k8s-prom]# cd node_exporter/
[root@master node_exporter]# ls
node-exporter-ds.yaml node-exporter-svc.yaml
[root@master node_exporter]# kubectl apply -f .
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created

  

[root@master node_exporter]# kubectl get pods -n prom
NAME READY STATUS RESTARTS AGE
prometheus-node-exporter-dmmjj 1/1 Running 0 7m
prometheus-node-exporter-ghz2l 1/1 Running 0 7m
prometheus-node-exporter-zt2lw 1/1 Running 0 7m

  

Deploy prometheus:

[root@master k8s-prom]# cd prometheus/
[root@master prometheus]# ls
prometheus-cfg.yaml prometheus-deploy.yaml prometheus-rbac.yaml prometheus-svc.yaml
[root@master prometheus]# kubectl apply -f .
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created

  

Looking at all the resources in the prom namespace: pod/prometheus-server-76dc8df7b-hw8xc was stuck in Pending, and the output showed insufficient memory.

 [root@master prometheus]# kubectl logs prometheus-server-556b8896d6-dfqkp -n prom
Warning FailedScheduling 2m52s (x2 over 2m52s) default-scheduler 0/3 nodes are available: 3 Insufficient memory.

  

Edit prometheus-deploy.yaml and delete the three lines for the memory limit:

        resources:
          limits:
            memory: 2Gi

  

Re-apply:

[root@master prometheus]# kubectl apply -f prometheus-deploy.yaml

  

[root@master prometheus]# kubectl get all -n prom
NAME READY STATUS RESTARTS AGE
pod/prometheus-node-exporter-dmmjj 1/1 Running 0 10m
pod/prometheus-node-exporter-ghz2l 1/1 Running 0 10m
pod/prometheus-node-exporter-zt2lw 1/1 Running 0 10m
pod/prometheus-server-65f5d59585-6l8m8 1/1 Running 0 55s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus NodePort 10.111.127.64 <none> 9090:30090/TCP 56s
service/prometheus-node-exporter ClusterIP None <none> 9100/TCP 10m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prometheus-node-exporter 3 3 3 3 3 <none> 10m
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-server 1 1 1 1 56s
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-server-65f5d59585 1 1 1 56s

  

As shown above, via the NodePort service we can reach the prometheus application inside the container through port 30090 on the host.


It is best to mount a PVC for storage, otherwise the monitoring data is lost after a while.
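A minimal sketch of what that could look like (the claim name, size, and the wiring into prometheus-deploy.yaml are my assumptions, not part of the repo's manifests):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: prom
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

The claim would then be referenced from the prometheus Deployment as a volume and mounted at the TSDB data directory (typically /prometheus in the official image) in place of any non-persistent volume the manifest uses.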

Deploy kube-state-metrics, which aggregates the state data:

[root@master k8s-prom]# cd kube-state-metrics/
[root@master kube-state-metrics]# ls
kube-state-metrics-deploy.yaml kube-state-metrics-rbac.yaml kube-state-metrics-svc.yaml
[root@master kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created

  

[root@master kube-state-metrics]# kubectl get all -n prom
NAME READY STATUS RESTARTS AGE
pod/kube-state-metrics-58dffdf67d-v9klh 1/1 Running 0 14m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-state-metrics ClusterIP 10.111.41.139 <none> 8080/TCP 14m

  

Deploy k8s-prometheus-adapter; it needs a self-signed certificate:

[root@master k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
[root@master pki]# (umask 077; openssl genrsa -out serving.key 2048)
Generating RSA private key, 2048 bit long modulus
...........................................................................................+++
...............+++
e is 65537 (0x10001)

  

Create the certificate signing request:

[root@master pki]#  openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"

  

Sign the certificate:

[root@master pki]# openssl  x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
Signature ok
subject=/CN=serving
Getting CA Private Key

  

Create the secret holding the certificate:

[root@master pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key  -n prom
secret/cm-adapter-serving-certs created

  

Note: cm-adapter-serving-certs is the name referenced in custom-metrics-apiserver-deployment.yaml.
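Inside custom-metrics-apiserver-deployment.yaml the secret is consumed as a volume, roughly like this (a fragment sketch; the volume name and mount path in the actual manifest may differ):

        volumeMounts:
        - name: volume-serving-cert
          mountPath: /var/run/serving-cert
          readOnly: true
      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs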

[root@master pki]# kubectl get secrets -n prom
NAME TYPE DATA AGE
cm-adapter-serving-certs Opaque 2 51s
default-token-knsbg kubernetes.io/service-account-token 3 4h
kube-state-metrics-token-sccdf kubernetes.io/service-account-token 3 3h
prometheus-token-nqzbz kubernetes.io/service-account-token 3 3h

  

Deploy k8s-prometheus-adapter:

[root@master k8s-prom]# cd k8s-prometheus-adapter/
[root@master k8s-prometheus-adapter]# ls
custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml custom-metrics-apiserver-service.yaml
custom-metrics-apiserver-auth-reader-role-binding.yaml custom-metrics-apiservice.yaml
custom-metrics-apiserver-deployment.yaml custom-metrics-cluster-role.yaml
custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml custom-metrics-resource-reader-cluster-role.yaml
custom-metrics-apiserver-service-account.yaml hpa-custom-metrics-cluster-role-binding.yaml

  

Because k8s v1.11.2 (and 1.13 as well) is not compatible with the latest k8s-prometheus-adapter, the fix is to download the latest custom-metrics-apiserver-deployment.yaml from https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests and change its namespace to prom; also download custom-metrics-config-map.yaml and change its namespace to prom as well.
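Roughly, something along these lines (the raw file URLs and the upstream namespace value custom-metrics are assumptions about that repository's layout; adjust them to whatever you actually find there):

wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-apiserver-deployment.yaml
wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-config-map.yaml
sed -i 's/namespace: custom-metrics/namespace: prom/' custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml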

[root@master k8s-prometheus-adapter]# kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
deployment.apps/custom-metrics-apiserver created
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-resource-reader created
serviceaccount/custom-metrics-apiserver created
service/custom-metrics-apiserver created
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created

  

[root@master k8s-prometheus-adapter]# kubectl get all -n prom
NAME READY STATUS RESTARTS AGE
pod/custom-metrics-apiserver-65f545496-64lsz 1/1 Running 0 6m
pod/kube-state-metrics-58dffdf67d-v9klh 1/1 Running 0 4h
pod/prometheus-node-exporter-dmmjj 1/1 Running 0 4h
pod/prometheus-node-exporter-ghz2l 1/1 Running 0 4h
pod/prometheus-node-exporter-zt2lw 1/1 Running 0 4h
pod/prometheus-server-65f5d59585-6l8m8 1/1 Running 0 4h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/custom-metrics-apiserver ClusterIP 10.103.87.246 <none> 443/TCP 36m
service/kube-state-metrics ClusterIP 10.111.41.139 <none> 8080/TCP 4h
service/prometheus NodePort 10.111.127.64 <none> 9090:30090/TCP 4h
service/prometheus-node-exporter ClusterIP None <none> 9100/TCP 4h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prometheus-node-exporter 3 3 3 3 3 <none> 4h
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/custom-metrics-apiserver 1 1 1 1 36m
deployment.apps/kube-state-metrics 1 1 1 1 4h
deployment.apps/prometheus-server 1 1 1 1 4h
NAME DESIRED CURRENT READY AGE
replicaset.apps/custom-metrics-apiserver-5f6b4d857d 0 0 0 36m
replicaset.apps/custom-metrics-apiserver-65f545496 1 1 1 6m
replicaset.apps/custom-metrics-apiserver-86ccf774d5 0 0 0 17m
replicaset.apps/kube-state-metrics-58dffdf67d 1 1 1 4h
replicaset.apps/prometheus-server-65f5d59585 1 1 1 4h

  

In the end, all the resources in the prom namespace are in the Running state.

[root@master k8s-prometheus-adapter]# kubectl api-versions
custom.metrics.k8s.io/v1beta1

  

Now the custom.metrics.k8s.io/v1beta1 API is visible. (On my setup it didn't show up in the api-versions output, but that didn't affect anything.)

Start a proxy:

[root@master k8s-prometheus-adapter]# kubectl proxy --port=8080

  

Now the metric data is visible:

[root@master pki]# curl  http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
{
  "name": "pods/ceph_rocksdb_submit_transaction_sync",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
},
{
  "name": "jobs.batch/kube_deployment_created",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
},
{
  "name": "jobs.batch/kube_pod_owner",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
},
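Individual metrics can also be pulled through the aggregated API. For example, assuming an application exposes a metric that the adapter publishes as http_requests (a hypothetical metric name, not one guaranteed to exist in this cluster):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/prom/pods/*/http_requests"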

  

Now we can happily create HPAs (Horizontal Pod Autoscalers).

In addition, prometheus can be integrated with grafana, as follows.

First download grafana.yaml, from https://github.com/kubernetes/heapster/blob/master/deploy/kube-config/influxdb/grafana.yaml

[root@master pro]# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml

  

Modify grafana.yaml as follows:

Change namespace: kube-system to prom (there are two occurrences);
comment out the following two lines under env:
  - name: INFLUXDB_HOST
    value: monitoring-influxdb
and add type: NodePort at the end of the Service spec:
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: NodePort

  

[root@master pro]# kubectl apply -f grafana.yaml
deployment.extensions/monitoring-grafana created
service/monitoring-grafana created

  

[root@master pro]# kubectl get pods -n prom
NAME READY STATUS RESTARTS AGE
monitoring-grafana-ffb4d59bd-gdbsk 1/1 Running 0 5s

  

If there are still problems, delete those resources and apply again.

The grafana pod is now running.

[root@master pro]# kubectl get svc -n prom
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
monitoring-grafana NodePort 10.106.164.205 <none> 80:32659/TCP 19m

  

We can now access it via the master host IP: http://172.16.1.100:32659


The port in the screenshot is 9090; fill in whatever your own svc actually uses. Apart from changing 80 to 9090, nothing else changes. The address can be written in that form because grafana and prometheus are in the same namespace, so prometheus is reachable by its service name (e.g. http://prometheus:9090).

[root@master pro]# kubectl get svc -n prom
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
custom-metrics-apiserver ClusterIP 10.109.58.249 <none> 443/TCP 52m
kube-state-metrics ClusterIP 10.103.52.45 <none> 8080/TCP 69m
monitoring-grafana NodePort 10.110.240.31 <none> 80:31128/TCP 17m
prometheus NodePort 10.110.19.171 <none> 9090:30090/TCP 145m
prometheus-node-exporter ClusterIP None <none> 9100/TCP 146m

  


After that, the corresponding data shows up in the UI.

Go to the site below to download a grafana dashboard template for monitoring k8s with prometheus: https://grafana.com/dashboards/6417


Then import the downloaded template in the grafana UI:


After importing the template, the monitoring data is visible:


I didn't actually walk through the HPA part again since I had already done it before; it is copied over as-is. If you hit problems, work them out yourself.

HPA (Horizontal Pod Autoscaling)

When pods come under load, HPA automatically scales the number of pods out according to the load to spread the pressure.

Currently HPA comes in two versions; v1 only supports core metrics, i.e. it can only scale pods based on CPU utilization.

[root@master pro]# kubectl explain hpa.spec.scaleTargetRef
scaleTargetRef: specifies the target resource (for example a Deployment) whose pods the HPA scales

  

[root@master pro]# kubectl api-versions |grep auto
autoscaling/v1
autoscaling/v2beta1

  

As shown above, both hpa v1 and hpa v2 are supported.

Now let's use the command line to recreate a myapp pod with resource limits:

[root@master ~]# kubectl run myapp --image=ikubernetes/myapp:v1 --replicas=1 --requests='cpu=50m,memory=256Mi' --limits='cpu=50m,memory=256Mi' --labels='app=myapp' --expose --port=80
service/myapp created
deployment.apps/myapp created

  

[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-6985749785-fcvwn 1/1 Running 0 58s

  

Now let's enable automatic horizontal scaling for myapp with kubectl autoscale, which essentially creates the HPA controller object for it.

[root@master ~]# kubectl autoscale deployment myapp --min=1 --max=8 --cpu-percent=60
horizontalpodautoscaler.autoscaling/myapp autoscaled

  

--min: the minimum number of pods

--max: the maximum number of pods

--cpu-percent: the target CPU utilization
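For reference, the same HPA written out as an autoscaling/v1 manifest would look roughly like this (a sketch equivalent to the kubectl autoscale command above, not a file from the repo):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 8
  targetCPUUtilizationPercentage: 60    # same as --cpu-percent=60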

[root@master ~]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
myapp Deployment/myapp 0%/60% 1 8 1 4m

  

[root@master ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myapp ClusterIP 10.105.235.197 <none> 80/TCP 19

  

Now let's change the service to the NodePort type:

[root@master ~]# kubectl patch svc myapp -p '{"spec":{"type": "NodePort"}}'
service/myapp patched

  

[root@master ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myapp NodePort 10.105.235.197 <none> 80:31990/TCP 22m

  

[root@master ~]# yum install httpd-tools #mainly to install the ab benchmarking tool

  

[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myapp-6985749785-fcvwn 1/1 Running 0 25m 10.244.2.84 node2

  

Start load testing with the ab tool:

[root@master ~]# ab -c 1000 -n 5000000 http://172.16.1.100:31990/index.html
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 172.16.1.100 (be patient)

  

Wait a while and you will see the pod's CPU utilization hit 98%, so it needs to scale out to 2 pods:

[root@master ~]# kubectl describe hpa
resource cpu on pods (as a percentage of request): 98% (49m) / 60%
Deployment pods: 1 current / 2 desired
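This is consistent with the scaling rule documented for the HPA: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric) = ceil(1 × 98% / 60%) = 2.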

  

[root@master ~]# kubectl top pods
NAME CPU(cores) MEMORY(bytes)
myapp-6985749785-fcvwn 49m (the total cpu we set is 50m) 3Mi

  

[root@master ~]#  kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myapp-6985749785-fcvwn 1/1 Running 0 32m 10.244.2.84 node2
myapp-6985749785-sr4qv 1/1 Running 0 2m 10.244.1.105 node1

  

We can see it has automatically scaled out to 2 pods. Wait a bit longer and, as CPU pressure keeps climbing, it will scale to 4 or more pods:

[root@master ~]#  kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myapp-6985749785-2mjrd 1/1 Running 0 1m 10.244.1.107 node1
myapp-6985749785-bgz6p 1/1 Running 0 1m 10.244.1.108 node1
myapp-6985749785-fcvwn 1/1 Running 0 35m 10.244.2.84 node2
myapp-6985749785-sr4qv 1/1 Running 0 5m 10.244.1.105 node1

  

Once the load test stops, the pod count scales back down to the normal number.

Above we used hpa v1 to do horizontal pod autoscaling; as mentioned earlier, hpa v1 can only scale pods based on CPU utilization.

Now let's look at hpa v2, which can scale pods horizontally based on custom metrics.

Before using hpa v2, delete the hpa v1 object created earlier so it doesn't conflict with the v2 test:

[root@master hpa]# kubectl delete hpa myapp
horizontalpodautoscaler.autoscaling "myapp" deleted

  

OK, now let's create an hpa v2:

[root@master hpa]# cat hpa-v2-demo.yaml
apiVersion: autoscaling/v2beta1      #this is the hpa v2 API version
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:                    #the target workload to autoscale
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1                     #minimum number of replicas
  maxReplicas: 10
  metrics:                           #the metrics the scaling decision is based on
  - type: Resource                   #evaluate a resource metric
    resource:
      name: cpu
      targetAverageUtilization: 55   #scale out when average pod CPU utilization exceeds 55%
  - type: Resource
    resource:
      name: memory                   #hpa v1 only understands cpu; hpa v2 can also evaluate memory
      targetAverageValue: 50Mi       #scale out when average pod memory usage exceeds 50Mi

  

[root@master hpa]# kubectl apply -f hpa-v2-demo.yaml
horizontalpodautoscaler.autoscaling/myapp-hpa-v2 created

  

[root@master hpa]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
myapp-hpa-v2 Deployment/myapp 3723264/50Mi, 0%/55% 1 10 1 37s

  

We can see there is currently only one pod:

[root@master hpa]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myapp-6985749785-fcvwn 1/1 Running 0 57m 10.244.2.84 node2

  

Start the load test:

[root@master ~]# ab -c 100 -n 5000000 http://172.16.1.100:31990/index.html

  

Check what hpa v2 observes:

[root@master hpa]# kubectl describe hpa
Metrics: ( current / target )
resource memory on pods: 3756032 / 50Mi
resource cpu on pods (as a percentage of request): 82% (41m) / 55%
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 2 desired

  

[root@master hpa]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myapp-6985749785-8frq4 1/1 Running 0 1m 10.244.1.109 node1
myapp-6985749785-fcvwn 1/1 Running 0 1h 10.244.2.84 node2

  

It has automatically scaled out to 2 pods. Once the load test stops, the pod count scales back down to the normal number.

With hpa v2 we can scale the number of pods not only on CPU and memory utilization but also on things like HTTP request concurrency.

For example:

[root@master hpa]# cat hpa-v2-custom.yaml
apiVersion: autoscaling/v2beta1      #this is the hpa v2 API version
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:                    #the target workload to autoscale
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1                     #minimum number of replicas
  maxReplicas: 10
  metrics:                           #the metrics the scaling decision is based on
  - type: Pods                       #evaluate a per-pod (custom) metric
    pods:
      metricName: http_requests      #the custom metric name
      targetAverageValue: 800m       #a Kubernetes quantity: 800m = 0.8 requests per pod on average

  

For the concurrency-based HPA, the image at https://hub.docker.com/r/ikubernetes/metrics-app/ can be used as a concrete example.
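To make such a metric show up in Prometheus at all, the application's pods have to be scraped. With annotation-based scrape configs that typically means something like the sketch below (the prometheus.io/* annotation keys, the port, and the assumption that ikubernetes/metrics-app exposes http_requests_total on /metrics are all assumptions, not verified against this repo):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metrics-app
  template:
    metadata:
      labels:
        app: metrics-app
      annotations:
        prometheus.io/scrape: "true"   # assumed annotation convention
        prometheus.io/port: "80"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: metrics-app
        image: ikubernetes/metrics-app
        ports:
        - containerPort: 80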