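Overview
Monitoring of Redis instances falls into two categories:
- Self-hosted Redis instances
- Cloud-managed Redis instances
This article briefly walks through how redis_exporter + Prometheus + Consul + Grafana + PrometheusAlert work together to collect metrics from self-hosted Redis, display them on dashboards, and send alerts.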
Deploying redis_exporter
This deployment runs on Kubernetes. The YAML manifest is as follows:
cat > redis-exporter-selfbuild.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-10-x-x-x-master
  name: redis-10-x-x-x-master
  namespace: kubesphere-monitoring-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-10-x-x-x-master
  template:
    metadata:
      labels:
        app: redis-10-x-x-x-master
    spec:
      containers:
        - name: redis-exporter
          image: oliver006/redis_exporter:latest
          env:
            - name: TZ
              value: "Asia/Shanghai"
            - name: REDIS_ADDR
              value: "redis://10.x.x.x:6379"
            - name: REDIS_PASSWORD
              value: "test@2023"
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - name: http-metrics
              containerPort: 9121
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: redis-10-x-x-x-master
  name: redis-10-x-x-x-master
  namespace: kubesphere-monitoring-system
spec:
  ports:
    - name: http-metrics
      protocol: TCP
      port: 9121
      targetPort: 9121
  selector:
    app: redis-10-x-x-x-master
EOF
## Deploy redis_exporter
kubectl apply -f redis-exporter-selfbuild.yaml
## List the resources
kubectl get pod -n kubesphere-monitoring-system | grep redis
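Once the pod is running, it can help to confirm that the exporter actually reaches Redis, not just that the container started. The following is a minimal sketch, assuming the exporter's metrics text is piped in on stdin; `check_redis_up` is a hypothetical helper name, and `redis_up` is the same gauge the alert rules in this article rely on.

```shell
#!/bin/sh
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# succeeds only if a "redis_up" sample is present and its value is 1.
check_redis_up() {
  awk '$1 == "redis_up" { found = 1; up = ($2 == 1) }
       END { exit !(found && up) }'
}

# Example usage (assumes kubectl access to the cluster; run the port-forward
# in a separate terminal):
#   kubectl -n kubesphere-monitoring-system port-forward svc/redis-10-x-x-x-master 9121:9121
#   curl -s http://127.0.0.1:9121/metrics | check_redis_up && echo "exporter can reach Redis"
```

A non-zero exit status means either the sample is missing or the exporter could not connect, which usually points at a wrong REDIS_ADDR or REDIS_PASSWORD.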
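Notes:
- For self-hosted Redis instances, this setup uses one exporter per Redis instance.
- For cloud Redis instances, a single exporter can collect from multiple Redis instances with different passwords; a YAML file is provided below for reference.
- Collecting from several instances with different passwords is possible through the exporter's "Multiple instances with different passwords" approach, but it is not recommended because it exposes the Redis passwords.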
Collecting metrics from cloud Redis instances
A single exporter can collect from multiple Redis instances that use different passwords.
cat > redis-exporter-cloud.yaml <<'EOF'
---
apiVersion: v1
data:
  # Maps each Redis URI to its password.
  # Use an empty string for instances that have no password.
  redis_passwd.json: |
    {
      "redis://xxxxxxxx:6379": "test@2023",
      "redis://xxxxxxxx:6379": ""
    }
kind: ConfigMap
metadata:
  name: redis-passwd-cm
  namespace: kubesphere-monitoring-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter-prod
  template:
    metadata:
      labels:
        app: redis-exporter-prod
    spec:
      containers:
        - name: redis-exporter
          image: oliver006/redis_exporter:latest
          env:
            - name: TZ
              value: "Asia/Shanghai"
          command:
            - "/redis_exporter"
          args:
            - '-redis.password-file=/redis_passwd.json'
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - name: http-metrics
              containerPort: 9121
              protocol: TCP
          volumeMounts:
            - name: redis-passwd-conf-map
              mountPath: /redis_passwd.json
              subPath: redis_passwd.json
      volumes:
        - name: redis-passwd-conf-map
          configMap:
            name: redis-passwd-cm
            items:
              - key: redis_passwd.json
                path: redis_passwd.json
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  ports:
    - name: http-metrics
      protocol: TCP
      port: 9121
      targetPort: 9121
  selector:
    app: redis-exporter-prod
EOF
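With a password file, one exporter serves several Redis instances, and each scrape names the instance it wants through the exporter's `/scrape?target=` endpoint. The following is a minimal sketch for building that URL by hand when checking a single instance; `scrape_url` is a hypothetical helper, and the host names in the usage comment are placeholders taken from the manifest above.

```shell
#!/bin/sh
# Hypothetical helper: prints the exporter URL that scrapes one Redis target.
# $1: exporter host:port
# $2: Redis address, written exactly as the key in redis_passwd.json
scrape_url() {
  printf 'http://%s/scrape?target=%s\n' "$1" "$2"
}

# Example usage (assumes the Service is reachable from where curl runs):
#   curl -s "$(scrape_url redis-exporter-prod.kubesphere-monitoring-system.svc:9121 redis://xxxxxxxx:6379)"
```

The target is assumed to match a key in the password file so that the exporter can look up the right password.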
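Dynamic discovery with Consul
Deploying Consul is out of scope for this article; search for a Helm-based installation guide.
Deploying ConsulManager as well is recommended, since it provides a web interface for adding, deleting, modifying, and querying Consul entries.
The Consul-related deployment manifests are kept in the author's private repository.
kubectl apply -f yamls/consulmanager.yaml
Exporter targets can be registered in Consul either by adding them directly in the ConsulManager web interface or from the command line: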
cat > redis-10-x-x-x-master.sh <<'EOF'
#!/bin/sh
# "Tags" carries the marker that Prometheus uses for auto-discovery.
# "Meta" holds the custom labels.
curl --location --request PUT 'https://consul.test.com/v1/agent/service/register?replace-existing-checks=1' \
  --header 'Content-Type: application/json' \
  --data '{
    "ID": "redis-10-x-x-x-master",
    "Name": "redis-prod-outk8s",
    "Tags": [
      "devops-ci"
    ],
    "Address": "redis-10-x-x-x-master.kubesphere-monitoring-system.svc",
    "Port": 9121,
    "Meta": {
      "namespace": "outk8s-redis",
      "project": "prod",
      "vendor": "self-built",
      "account": "outk8s",
      "group": "global",
      "name": "redis-10-x-x-x-master",
      "instance": "10-x-x-x",
      "iid": "20230106"
    },
    "EnableTagOverride": false,
    "Check": {
      "HTTP": "http://redis-10-x-x-x-master.kubesphere-monitoring-system.svc:9121/metrics",
      "Interval": "10s"
    },
    "Weights": {
      "Passing": 10,
      "Warning": 1
    }
  }'
EOF
sh redis-10-x-x-x-master.sh
After logging in to the Consul or ConsulManager web interface, the registered metrics endpoint is visible.
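When there are many exporters, writing one script per instance gets repetitive. The following is a minimal sketch of generating the registration payload from parameters. `consul_payload` is a hypothetical helper, and it emits only a subset of the fields (the service name and tag are the ones used in this article; add a "Meta" block as needed).

```shell
#!/bin/sh
# Hypothetical helper: prints a minimal Consul service-registration payload.
# $1: service ID, $2: exporter address, $3: exporter port
consul_payload() {
  cat <<EOF
{
  "ID": "$1",
  "Name": "redis-prod-outk8s",
  "Tags": ["devops-ci"],
  "Address": "$2",
  "Port": $3,
  "Check": {
    "HTTP": "http://$2:$3/metrics",
    "Interval": "10s"
  }
}
EOF
}

# Example usage (consul.test.com is the address used in this article):
#   consul_payload redis-10-x-x-x-master redis-10-x-x-x-master.kubesphere-monitoring-system.svc 9121 |
#     curl --request PUT --header 'Content-Type: application/json' --data @- \
#       'https://consul.test.com/v1/agent/service/register?replace-existing-checks=1'
```

Because the payload is built by plain string substitution, the arguments are assumed not to contain double quotes or backslashes.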
Prometheus auto-discovery
For the Prometheus Consul auto-discovery configuration, see "Prometheus Operator: automatic service discovery through Consul".
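For orientation, a Consul-based scrape job could look roughly like the sketch below. It is not taken from the referenced article: the file name, job name, and Secret name are assumptions, while the `devops-ci` tag and the Consul address are the ones used in the registration step. The relabel rule copies each Consul "Meta" key into a Prometheus label of the same name.

```shell
#!/bin/sh
# Write an additional scrape configuration (assumed file name).
cat > prometheus-additional.yaml <<'EOF'
# Sketch of a Consul service-discovery job; adjust the values to your setup.
- job_name: consul-redis-exporter
  consul_sd_configs:
    - server: consul.test.com
      scheme: https
      tags:
        - devops-ci   # only services registered with this tag are discovered
  relabel_configs:
    # Copy every Consul "Meta" key into a Prometheus label of the same name.
    - action: labelmap
      regex: __meta_consul_service_metadata_(.+)
EOF

# With Prometheus Operator, such a file is typically stored in a Secret and
# referenced from the Prometheus resource through spec.additionalScrapeConfigs.
# The Secret name below is hypothetical:
#   kubectl -n kubesphere-monitoring-system create secret generic additional-scrape-configs \
#     --from-file=prometheus-additional.yaml
```

Since the registered "Meta" block defines `instance` and `name`, those labels would take the values from Consul rather than the scrape address.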
PrometheusAlert integration
The project lives at PrometheusAlert; for deployment, refer to:
Grafana dashboard integration
Redis Exporter Dashboard (Chinese edition)
Alert rules
cat > consul-redis-exporter-rules.yaml << 'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: consul-redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.3.1
    prometheus: k8s
    role: alert-rules
  name: consul-redis-exporter-rules
  namespace: kubesphere-monitoring-system
spec:
  groups:
    - name: REDIS-Alert
      rules:
        # Severity values are kept as in the original setup: 紧急 = critical, 警告 = warning.
        - alert: RedisDown
          expr: redis_up == 0
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis down (instance {{ $labels.instance }})
            description: "Redis instance is down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisMissingMaster
          expr: (count(redis_instance_info{role="master"}) by (instance)) < 1
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis missing master (instance {{ $labels.instance }})
            description: "Redis cluster has no node marked as master.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisTooManyMasters
          expr: count(redis_instance_info{role="master"}) by (instance) > 1
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis too many masters (instance {{ $labels.instance }})
            description: "Redis cluster has too many nodes marked as master.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisDisconnectedSlaves
          expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis disconnected slaves (instance {{ $labels.instance }})
            description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisReplicationBroken
          expr: delta(redis_connected_slaves[2m]) < 0
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis replication broken (instance {{ $labels.instance }})
            description: "Redis instance lost a slave\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisClusterFlapping
          expr: changes(redis_connected_slaves[1m]) > 1
          for: 2m
          labels:
            severity: 紧急
          annotations:
            summary: Redis cluster flapping (instance {{ $labels.instance }})
            description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisMissingBackup
          expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis missing backup (instance {{ $labels.instance }})
            description: "Redis has not been backed up for 24 hours\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        # The exporter must be started with --include-system-metrics flag or REDIS_EXPORTER_INCL_SYSTEM_METRICS=true environment variable.
        - alert: RedisOutOfSystemMemory
          expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
          for: 2m
          labels:
            severity: 警告
          annotations:
            summary: Redis out of system memory (instance {{ $labels.instance }})
            description: "Redis is running out of system memory (> 90%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        #- alert: RedisOutOfConfiguredMaxmemory
        #  expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
        #  for: 2m
        #  labels:
        #    severity: 警告
        #  annotations:
        #    summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
        #    description: "Redis is running out of configured maxmemory (> 90%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisTooManyConnections
          expr: redis_connected_clients > 100
          for: 2m
          labels:
            severity: 警告
          annotations:
            summary: Redis too many connections (instance {{ $labels.instance }})
            description: "Redis instance has too many connections\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisNotEnoughConnections
          expr: redis_connected_clients < 1
          for: 2m
          labels:
            severity: 警告
          annotations:
            summary: Redis not enough connections (instance {{ $labels.instance }})
            description: "Redis instance should have at least one connection\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisRejectedConnections
          expr: increase(redis_rejected_connections_total[2m]) > 0
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis rejected connections (instance {{ $labels.instance }})
            description: "Some connections to Redis have been rejected\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
EOF
## Deploy the alert rules
kubectl apply -f consul-redis-exporter-rules.yaml
DingTalk alert integration
Alert example:
Alert recovery example: