Deploying redis exporter on k8s to monitor all Redis instances

Date: 2023-01-09 16:04:27

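Overview

Monitoring of Redis instances falls into two categories:

  • Self-hosted Redis instances

  • Cloud-hosted Redis

This article briefly describes the basic workflow of collecting metrics, displaying dashboards, and sending alerts for self-hosted Redis, using redis exporter + Prometheus + Consul + Grafana + PrometheusAlert.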

Deploying redis exporter

This deployment uses k8s; the YAML manifest is as follows:

cat > redis-exporter-selfbuild.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-10-x-x-x-master
  name: redis-10-x-x-x-master
  namespace: kubesphere-monitoring-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-10-x-x-x-master
  template:
    metadata:
      labels:
        app: redis-10-x-x-x-master
    spec:
      containers:
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        env:
        - name: TZ
          value: "Asia/Shanghai"
        - name: REDIS_ADDR
          value: "redis://10.x.x.x:6379"
        - name: REDIS_PASSWORD
          value: "test@2023"
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - name: http-metrics
          containerPort: 9121
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: redis-10-x-x-x-master
  name: redis-10-x-x-x-master
  namespace: kubesphere-monitoring-system
spec:
  ports:
  - name: http-metrics
    protocol: TCP
    port: 9121
    targetPort: 9121
  selector:
    app: redis-10-x-x-x-master
EOF
## Deploy the redis exporter
kubectl apply -f redis-exporter-selfbuild.yaml
## Check the resources
kubectl get pod -n kubesphere-monitoring-system | grep redis
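
Once the pod is running, it helps to confirm that the exporter can actually reach Redis. The sketch below is a minimal check, assuming the exporter prints an unlabelled `redis_up` sample in the Prometheus text format; the helper name `check_redis_up` is made up for illustration.

```shell
#!/bin/sh
# Reads Prometheus text-format metrics on stdin and succeeds only if
# a "redis_up" sample is present and its value is 1.
check_redis_up() {
  awk '$1 == "redis_up" { found = 1; ok = ($2 == 1) }
       END { exit !(found && ok) }'
}

# Possible usage (assumes a port-forward to the exporter Service is active):
#   kubectl -n kubesphere-monitoring-system port-forward svc/redis-10-x-x-x-master 9121:9121 &
#   curl -s http://127.0.0.1:9121/metrics | check_redis_up && echo "redis is reachable"
```

A non-zero exit status means either that `redis_up` is 0 or that the sample is missing from the output.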

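Notes

  • For self-hosted Redis instances, this setup uses one exporter per Redis instance;

  • For cloud Redis instances, a single exporter can scrape multiple instances with different passwords; a YAML file is provided below for reference;

  • For different instances with different passwords, the "Multiple instances with different passwords" approach can also be used, but it is not recommended because it exposes the Redis passwords.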
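
With one exporter per self-hosted instance, the same manifest has to be repeated for every Redis address. The following is a rough sketch of how that could be scripted, assuming the manifest keeps the literal placeholders `10.x.x.x` (address) and `10-x-x-x` (resource name) used in this article; the function names and the sample IP are hypothetical.

```shell
#!/bin/sh
# Turn an IP address and a role into the resource name used in this article,
# e.g. 10.0.0.5 + master -> redis-10-0-0-5-master.
exporter_name() {
  printf 'redis-%s-%s\n' "$(printf '%s' "$1" | tr '.' '-')" "$2"
}

# Read a manifest template on stdin and fill in both placeholders for one IP.
render_manifest() {
  ip=$1
  dashed=$(printf '%s' "$ip" | tr '.' '-')
  sed -e "s/10\.x\.x\.x/$ip/g" -e "s/10-x-x-x/$dashed/g"
}

# Possible usage:
#   render_manifest 10.0.0.5 < redis-exporter-selfbuild.yaml > redis-exporter-10-0-0-5.yaml
```

The password is left untouched here; instances with different passwords would need an extra substitution or a separate Secret.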


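Collecting metrics from cloud Redis instances

A single exporter can scrape multiple Redis instances that use different passwords.

cat > redis-exporter-cloud.yaml <<EOF
---
apiVersion: v1
# redis_passwd.json maps each Redis URI to its password;
# use an empty string for an instance without a password.
data:
  redis_passwd.json: |
    {
      "redis://xxxxxxxx:6379":"test@2023",
      "redis://xxxxxxxx:6379":""
    }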
kind: ConfigMap
metadata:
  name: redis-passwd-cm
  namespace: kubesphere-monitoring-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter-prod
  template:
    metadata:
      labels:
        app: redis-exporter-prod
    spec:
      containers:
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        env:
        - name: TZ
          value: "Asia/Shanghai"
        command:
        - "/redis_exporter"
        args:
        - '-redis.password-file=/redis_passwd.json'
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - name: http-metrics
          containerPort: 9121
          protocol: TCP
        volumeMounts:
        - name: redis-passwd-conf-map
          mountPath: /redis_passwd.json
          subPath: redis_passwd.json
      volumes:
      - name: redis-passwd-conf-map
        configMap:
          name: redis-passwd-cm
          items:
          - key: redis_passwd.json
            path: redis_passwd.json
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  ports:
  - name: http-metrics
    protocol: TCP
    port: 9121
    targetPort: 9121
  selector:
    app: redis-exporter-prod
EOF
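
When one exporter serves several instances, each instance is scraped through the exporter's `/scrape` endpoint with a `target` query parameter instead of the plain `/metrics` path. The sketch below only builds those URLs. It assumes the target string must be written the same way as the key in `redis_passwd.json` so that the password lookup succeeds, and the two host names are hypothetical.

```shell
#!/bin/sh
# Build the multi-target scrape URL for one Redis instance.
#   $1: host:port of the exporter Service
#   $2: Redis URI, written the same way as the key in redis_passwd.json
scrape_url() {
  printf 'http://%s/scrape?target=%s\n' "$1" "$2"
}

exporter="redis-exporter-prod.kubesphere-monitoring-system.svc:9121"

# Hypothetical targets; replace them with the URIs from the password file.
for target in redis://redis-a.example.com:6379 redis://redis-b.example.com:6379; do
  scrape_url "$exporter" "$target"
done
# From inside the cluster, each printed URL can be fetched with: curl -s "<url>"
```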

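Dynamic discovery with Consul

Deploying Consul is outside the scope of this article; search for a Helm-based installation guide.

It is recommended to also deploy ConsulManager, which makes it easy to add, delete, modify, and query Consul entries from a web UI.

Consul-related deployment manifests (personal private repository)

kubectl apply -f yamls/consulmanager.yaml

To push monitoring targets to Consul or ConsulManager, either add them directly in the ConsulManager web UI or use the command line: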

cat > redis-10-x-x-x-master.sh <<'EOF'
#!/bin/sh
# "Tags" carries the marker used by Prometheus service discovery;
# "Meta" carries the custom labels.
curl --location --request PUT 'https://consul.test.com/v1/agent/service/register?replace-existing-checks=1' \
 --header 'Content-Type: application/json' \
 --data '{
  "ID": "redis-10-x-x-x-master",
  "Name": "redis-prod-outk8s",
  "Tags": [
    "devops-ci"
  ],
  "Address": "redis-10-x-x-x-master.kubesphere-monitoring-system.svc",
  "Port": 9121,
  "Meta": {
    "namespace": "outk8s-redis",
    "project": "prod",
    "vendor": "self-built",
    "account": "outk8s",
    "group": "global",
    "name": "redis-10-x-x-x-master",
    "instance": "10-x-x-x",
    "iid": "20230106"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://redis-10-x-x-x-master.kubesphere-monitoring-system.svc:9121/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}'
EOF
sh redis-10-x-x-x-master.sh
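
Writing one script per instance becomes repetitive as the number of exporters grows. As a rough sketch, the registration body can be generated from the Service name and namespace instead. The function name `consul_payload` and the sample instance are hypothetical, and only a subset of the fields shown earlier is emitted.

```shell
#!/bin/sh
# Print a Consul service registration body for one exporter Service.
#   $1: service ID, which is also the Kubernetes Service name
#   $2: Kubernetes namespace of that Service
consul_payload() {
  name=$1
  addr="$1.$2.svc"
  cat <<EOF
{
  "ID": "$name",
  "Name": "redis-prod-outk8s",
  "Tags": ["devops-ci"],
  "Address": "$addr",
  "Port": 9121,
  "Check": {
    "HTTP": "http://$addr:9121/metrics",
    "Interval": "10s"
  }
}
EOF
}

# Possible usage (same endpoint as the script above):
#   consul_payload redis-10-0-0-5-master kubesphere-monitoring-system |
#     curl --request PUT --header 'Content-Type: application/json' --data @- \
#       'https://consul.test.com/v1/agent/service/register?replace-existing-checks=1'
```

The body is plain JSON without inline comments, since a JSON parser would reject them.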

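After logging in to the Consul or ConsulManager UI, the registered metrics endpoints are visible.

Prometheus service discovery

For the Prometheus Consul service discovery configuration, see "Automatic service discovery with Prometheus Operator via Consul".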
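
For orientation, a scrape job based on Consul discovery might look like the sketch below. It is not the configuration from the referenced article: the job name, the file name, and the `consul.test.com:443` address are assumptions, and with Prometheus Operator such a fragment would typically be supplied through the `additionalScrapeConfigs` setting.

```shell
#!/bin/sh
# Write a hypothetical scrape job that discovers exporters registered in Consul.
# The delimiter is quoted so the file content is written exactly as shown.
cat > prometheus-consul-redis.yaml <<'EOF'
- job_name: consul-redis-exporter
  consul_sd_configs:
  - server: consul.test.com:443
    scheme: https
    # Only services carrying this tag are discovered.
    tags:
    - devops-ci
  relabel_configs:
  # Copy every Consul "Meta" key (project, vendor, group, ...) onto the target as a label.
  - action: labelmap
    regex: __meta_consul_service_metadata_(.+)
EOF
```

The `labelmap` rule is what turns the `Meta` entries from the registration into Prometheus labels, which the dashboard and alert rules can then filter on.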

PrometheusAlert integration

Project home: PrometheusAlert. For deployment, see:

Grafana dashboard integration

Redis Exporter Dashboard (Chinese version)


Alert rules

cat > consul-redis-exporter-rules.yaml << 'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: consul-redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.3.1
    prometheus: k8s
    role: alert-rules
  name: consul-redis-exporter-rules
  namespace: kubesphere-monitoring-system
spec:
  groups:
  - name: REDIS-Alert
    rules:
    - alert: RedisDown
      expr: redis_up == 0
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis down (instance {{ $labels.instance }})
        description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

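    # count() by (instance) returns no series at all when nothing matches, so this
    # comparison cannot be true as written; adding "or vector(0)" to an unaggregated
    # count() is one possible fix.
    - alert: RedisMissingMaster
      expr: (count(redis_instance_info{role="master"}) by (instance)) < 1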
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis missing master (instance {{ $labels.instance }})
        description: "Redis cluster has no node marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisTooManyMasters
      expr: count(redis_instance_info{role="master"}) by (instance) > 1
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis too many masters (instance {{ $labels.instance }})
        description: "Redis cluster has too many nodes marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisDisconnectedSlaves
      expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis disconnected slaves (instance {{ $labels.instance }})
        description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisReplicationBroken
      expr: delta(redis_connected_slaves[2m]) < 0
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis replication broken (instance {{ $labels.instance }})
        description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisClusterFlapping
      expr: changes(redis_connected_slaves[1m]) > 1
      for: 2m
      labels:
        severity: 紧急
      annotations:
        summary: Redis cluster flapping (instance {{ $labels.instance }})
        description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisMissingBackup
      expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis missing backup (instance {{ $labels.instance }})
        description: "Redis has not been backed up for 24 hours\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    # The exporter must be started with --include-system-metrics flag or REDIS_EXPORTER_INCL_SYSTEM_METRICS=true environment variable.
    - alert: RedisOutOfSystemMemory
      expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
      for: 2m
      labels:
        severity: 警告
      annotations:
        summary: Redis out of system memory (instance {{ $labels.instance }})
        description: "Redis is running out of system memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    #- alert: RedisOutOfConfiguredMaxmemory
    #  expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
    #  for: 2m
    #  labels:
    #    severity: 警告
    #  annotations:
    #    summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
    #    description: "Redis is running out of configured maxmemory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisTooManyConnections
      expr: redis_connected_clients > 100
      for: 2m
      labels:
        severity: 警告
      annotations:
        summary: Redis too many connections (instance {{ $labels.instance }})
        description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisNotEnoughConnections
      expr: redis_connected_clients < 1
      for: 2m
      labels:
        severity: 警告
      annotations:
        summary: Redis not enough connections (instance {{ $labels.instance }})
        description: "Redis instance should have at least 1 connection\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisRejectedConnections
      expr: increase(redis_rejected_connections_total[2m]) > 0
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis rejected connections (instance {{ $labels.instance }})
        description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
EOF
## Deploy the alert rules
kubectl apply -f consul-redis-exporter-rules.yaml
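
The rule annotations contain `$labels` and `$value`, which look like shell variables, so the quoting of the heredoc delimiter decides whether they reach the file intact. A minimal sketch of the difference, with `labels` set to an empty value to mimic a shell where it is not defined:

```shell
#!/bin/sh
labels=""

# Unquoted delimiter: the shell expands $labels before the text is written.
unquoted=$(cat <<EOF
summary: Redis down (instance {{ $labels.instance }})
EOF
)

# Quoted delimiter: the body is kept exactly as typed.
quoted=$(cat <<'EOF'
summary: Redis down (instance {{ $labels.instance }})
EOF
)

printf '%s\n' "$unquoted"   # summary: Redis down (instance {{ .instance }})
printf '%s\n' "$quoted"     # summary: Redis down (instance {{ $labels.instance }})
```

A quick `grep -F '$labels' consul-redis-exporter-rules.yaml` after generating the file shows whether the templates survived.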

DingTalk alert integration

Alert example:


Alert recovery example:


References