说明
之前文章有聊过使用Prometheus监控自建Mysql,感兴趣的同学可以看下。本次我们简单聊下结合consulmanager
使用一个 mysqld exporter 监控所有的MySQL实例的具体实现方法。
主要步骤:
-
数据库授权
-
部署 mysql exporter
-
对接 Prometheus 监控
-
对接 consulmanager 服务发现
-
对接 grafana 数据看板
-
监控报警规则
-
对接钉钉告警机器人
数据库授权
创建 mysql 监控专用账号:
mysql> CREATE USER 'monitoring'@'%' IDENTIFIED BY 'Test#123';
Query OK, 0 rows affected (0.09 sec)
mysql> GRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'monitoring'@'%';
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.02 sec)
登入测试:
mysql -h 192.168.1.1 -umonitoring -pTest#123
部署 mysql exporter
这里提供两种部署方式,具体文件详见下文。
docker-compose 部署 exporter
version: "3.2"
services:
mysqld_exporter:
image: swr.cn-south-1.myhuaweicloud.com/starsl.cn/mysqld_exporter:latest
container_name: mysqld_exporter
hostname: mysqld_exporter
restart: always
ports:
- "9104:9104"
volumes:
- /usr/share/zoneinfo/PRC:/etc/localtime
environment:
MYSQLD_EXPORTER_PASSWORD: Test#123
entrypoint:
- /bin/mysqld_exporter
- --collect.info_schema.innodb_metrics
- --collect.info_schema.tables
- --collect.info_schema.processlist
- --collect.info_schema.tables.databases=*
- --mysqld.username=monitoring
说明:
-
docker-compose中有2个变量:监控专用的mysql账号和密码,注意修改掉后再启动。
-
docker-compose配置方式是所有的mysql实例都配置了一样的mysql监控账号和密码。
-
如果你有不同mysql实例需要配置不同监控账号密码的需求,请参考官方readme使用配置文件的方式启动。
启动 mysql exporter 服务:
docker-compose up -d
k8s 部署 exporter
mysql-exporter.yaml
文件内容如下:
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: mysql-exporter
name: mysql-exporter
namespace: kubesphere-monitoring-system
spec:
replicas: 1
selector:
matchLabels:
app: mysql-exporter
template:
metadata:
labels:
app: mysql-exporter
spec:
containers:
- name: mysql-exporter
image: swr.cn-south-1.myhuaweicloud.com/starsl.cn/mysqld_exporter:latest
command:
- /bin/mysqld_exporter
args:
- --collect.info_schema.innodb_metrics
- --collect.info_schema.tables
- --collect.info_schema.processlist
- --collect.info_schema.tables.databases=*
- --mysqld.username=monitoring
env:
- name: TZ
value: "Asia/Shanghai"
- name: MYSQLD_EXPORTER_PASSWORD
value: "Test#123"
resources:
requests:
cpu: 100m
memory: 100Mi
ports:
- name: http-metrics
containerPort: 9104
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
labels:
app: mysql-exporter
name: mysql-exporter
namespace: kubesphere-monitoring-system
spec:
ports:
- name: http-metirc
protocol: TCP
port: 9104
targetPort: 9104
selector:
app: mysql-exporter
启动 mysql exporter 服务:
kubectl apply -f mysql-exporter.yaml
对接 Prometheus 监控
prometheus 自动发现 mysql 服务配置如下:
- job_name: multi_mysqld_exporter
scrape_interval: 15s
scrape_timeout: 5s
metrics_path: /probe
consul_sd_configs:
## consul 服务地址
- server: '192.168.10.60:8500'
token: 'fe48c9a4-364e-af23-81df-9f28303012af'
refresh_interval: 30s
services: ['selfrds_exporter']
relabel_configs:
- source_labels: [__meta_consul_tags]
regex: .*OFF.*
action: drop
- source_labels: [__meta_consul_service_address,__meta_consul_service_port]
regex: ([^:]+)(?::\d+)?;(\d+)
target_label: __param_target
replacement: $1:$2
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
## mysql exporter 服务地址和端口
replacement: mysql-exporter.kubesphere-monitoring-system.svc:9104
- source_labels: ['__meta_consul_service_metadata_vendor']
target_label: vendor
- source_labels: ['__meta_consul_service_metadata_region']
target_label: region
- source_labels: ['__meta_consul_service_metadata_group']
target_label: group
- source_labels: ['__meta_consul_service_metadata_account']
target_label: account
- source_labels: ['__meta_consul_service_metadata_name']
target_label: name
- source_labels: ['__meta_consul_service_metadata_iid']
target_label: iid
- source_labels: ['__meta_consul_service_metadata_exp']
target_label: exp
- source_labels: ['__meta_consul_service_metadata_cpu']
target_label: cpu
- source_labels: ['__meta_consul_service_metadata_mem']
target_label: mem
- source_labels: ['__meta_consul_service_metadata_disk']
target_label: disk
- source_labels: ['__meta_consul_service_metadata_itype']
target_label: itype
对接 consulmanager 服务发现
登入consulmanager
点击“ 云资源管理 ”,“ MySQL管理 ”,“ 自建MySQL管理 ”,点击新增或导入来添加待监控mysql服务信息。
添加成功后,登入 prometheus Targets 控制台,可以看到multi_mysqld_exporter
下新增的 mysql 服务信息。
对接 grafana 数据看板
Grafana 看板详情, Grafana 看板ID:17320:
监控报警规则
consul-mysql-exporter-rules.yaml
文件内容如下:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: consul-mysql-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.3.1
prometheus: k8s
role: alert-rules
name: consul-mysql-exporter-rules
namespace: kubesphere-monitoring-system
spec:
groups:
- name: MySQL-Alert
rules:
- alert: MySQL_is_down
expr: mysql_up == 0
for: 3m
labels:
severity: 紧急
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:MySQL database is down. \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_慢查询过多
expr: delta(mysql_global_status_slow_queries[1m]) > 60
for: 1m
labels:
severity: 警告
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:每分钟慢查询:{{ $value }} \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_当前活跃的连接数过多
expr: mysql_global_status_threads_running > 100
for: 1m
labels:
severity: 紧急
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:当前活跃的连接数:{{ $value }} \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_当前updating状态的线程过多
expr: mysql_info_schema_processlist_threads{state=~"updating"} > 100
for: 1m
labels:
severity: 警告
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:当前updating状态的线程:{{ $value }} \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_High_QPS
expr: irate(mysql_global_status_questions[3m]) > 30000
for: 2m
labels:
severity: 警告
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:Mysql QPS:{{ $value | humanize }} \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_Too_Many_Connections
expr: irate(mysql_global_status_threads_connected[3m]) > 1000
for: 2m
labels:
severity: 警告
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:Mysql Connections:{{ $value | humanize }} \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_主从IO线程运行状态异常
expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_io_running == 0
for: 1m
labels:
severity: 紧急
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:MySQL Slave IO thread not running \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_主从SQL线程运行状态异常
expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_sql_running == 0
for: 1m
labels:
severity: 紧急
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:MySQL Slave SQL thread not running \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_主从复制延迟过高
expr: mysql_slave_status_seconds_behind_master > 3
for: 1m
labels:
severity: 紧急
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:主从复制延迟当前:{{ $value | humanize }}s \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
- alert: MySQL_is_Restart
expr: mysql_global_status_uptime <600
for: 2m
labels:
severity: 紧急
annotations:
description: "{{ $labels.group }}_{{ $labels.name }}:MySQL database is Restart. \n> {{ $labels.instance }}\n> {{ $labels.iid }}"
部署监控报警规则:
kubectl apply -f consul-mysql-exporter-rules.yaml
对接钉钉告警机器人
这里简单写下路由:
- "match_re":
"region": "mysql.*"
"receiver": "dd-mysql-prod"