Notes on deploying Prometheus on Kubernetes (Part 5)

Date: 2021-05-21 13:06:46

Deploying Alertmanager

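Because Prometheus needs the Alertmanager listen address and port in its configuration file, I deployed Alertmanager in the same pod as Prometheus. Alertmanager could also run in its own pod and be addressed through a Service and port, but for some reason I could not get that setup to work in my tests. Add the corresponding settings to prometheus.yml: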

  prometheus.yml: |-
    global:
      scrape_interval: 90s
      evaluation_interval: 90s
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"]
          # - alertmanager:9093   # target to use instead if Alertmanager runs in its own pod behind a Service named alertmanager
    rule_files:
      - /etc/prometheus/rules.yml
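Containers in the same pod share one network namespace, which is why Prometheus can reach Alertmanager at localhost:9093. As a quick sanity check, the Python sketch below probes Alertmanager's /-/healthy endpoint. It assumes that endpoint exists in the Alertmanager version you run and that port 9093 is reachable from wherever the script runs (for example through kubectl port-forward); the helper names are my own.

```python
import urllib.request


def health_url(host: str = "localhost", port: int = 9093) -> str:
    """Build the Alertmanager health-check URL (assumes the /-/healthy endpoint)."""
    return f"http://{host}:{port}/-/healthy"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False
```

Usage: print(is_healthy(health_url())) prints True once the alertmanager container is up and the port is reachable.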

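Add the alerting rules as a separate rules.yml key in the same ConfigMap (it is loaded through rule_files above). Prometheus evaluates these rules and sends firing alerts to Alertmanager. The metric names below are those of node_exporter releases before 0.16; newer releases rename them, for example to node_filesystem_size_bytes, node_memory_MemTotal_bytes and node_cpu_seconds_total, so adjust the expressions to your node_exporter version: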

  rules.yml: |-
    groups:
    - name: test-rule
      rules:
      - alert: NodeFilesystemUsage
        expr: (node_filesystem_size{device="rootfs"} - node_filesystem_free{device="rootfs"}) / node_filesystem_size{device="rootfs"} * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Filesystem usage detected"
          description: "{{$labels.instance}}: Filesystem usage is above 80% (current value is: {{ $value }})"
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: NodeCPUUsage
        expr: (100 - (avg by (instance) (irate(node_cpu{job="kubernetes-node-exporter",mode="idle"}[5m])) * 100)) > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High CPU usage detected"
          description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
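To make the three expressions concrete, the Python sketch below reproduces their arithmetic on hypothetical sample values (not real measurements). PromQL's > is strict, so a value of exactly 80 does not trigger an alert, and the CPU rule derives usage from the per-CPU idle rate averaged per instance.

```python
from typing import Sequence

THRESHOLD = 80  # all three rules fire above 80%


def filesystem_usage_pct(size: float, free: float) -> float:
    # NodeFilesystemUsage: (size - free) / size * 100
    return (size - free) / size * 100


def memory_usage_pct(total: float, free: float, buffers: float, cached: float) -> float:
    # NodeMemoryUsage: (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100
    return (total - (free + buffers + cached)) / total * 100


def cpu_usage_pct(idle_rates: Sequence[float]) -> float:
    # NodeCPUUsage: 100 - avg(per-CPU idle rate) * 100
    # Each idle rate is the fraction of each second a CPU spent idle (0..1),
    # which is what irate(node_cpu{mode="idle"}[5m]) yields per CPU.
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


def fires(value: float) -> bool:
    # Strict comparison, matching "> 80" in the rules.
    return value > THRESHOLD
```

For example, a 64 GiB root filesystem with 8 GiB free is 87.5% used and fires, while 16 GiB of memory with 1 GiB free, 1 GiB buffers and 2 GiB cache is 75% used and does not.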


Update prometheus-deployment.yaml so that the pod runs Alertmanager as a second container:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: kube-system
  # annotations:
  #   # scrape the metrics of the app deployed in this pod
  #   prometheus.io/scrape: 'true'
  #   # scrape path, defaults to /metrics
  #   prometheus.io/path: '/metrics'
  #   # prometheus.io/port: the port to scrape
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 0
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.0
        args:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
        ports:
          - containerPort: 9090
            protocol: TCP
        volumeMounts:
          - name: gluster-volume
            mountPath: /prometheus
          - name: config-volume
            mountPath: /etc/prometheus
      - name: alertmanager
        image: x.x.x.x/library/prom/alertmanager:latest
        args:
          - '--config.file=/etc/alertmanager/config.yml'
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: alert-volume
          mountPath: /etc/alertmanager
      imagePullSecrets:
      - name: my-secret
      volumes:
        - name: gluster-volume
          persistentVolumeClaim:
            claimName: gluster-prometheus
        - name: config-volume
          configMap:
            name: prometheus-server-conf
        - name: alert-volume
          configMap:
            name: alertmanager

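Prepare the mail settings Alertmanager uses to send alert notifications (the alertmanager ConfigMap mounted at /etc/alertmanager above):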

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: kube-system
data:
  config.yml: |-
    global:
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxxx@163.com'
      smtp_auth_username: 'xxxx@163.com'
      smtp_auth_password: 'xxxx'


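    templates:
      # only /etc/alertmanager is mounted in this pod, so this glob matches no files
      # and Alertmanager uses its built-in default templates
      - '/root/alertmanager/template/*.tmpl'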

    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 10m
      receiver: default-receiver


    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: 'xxxx@xxx.com'
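How long does it take from a threshold being crossed to the first email? The Python sketch below is a rough back-of-the-envelope model, assuming the expression first evaluates true at t=0 and ignoring scrape and network delays: the alert stays pending until an evaluation finds it has been active for at least the rule's for duration, and Alertmanager then holds a new group for group_wait before notifying.

```python
import math


def first_notification_delay(evaluation_interval_s: float, for_s: float,
                             group_wait_s: float) -> float:
    """Approximate seconds from the first true evaluation to the first notification."""
    # The alert becomes pending at t=0 and turns firing at the first evaluation
    # where it has been pending for at least `for` seconds.
    evals_needed = math.ceil(for_s / evaluation_interval_s)
    firing_at = evals_needed * evaluation_interval_s
    # Alertmanager waits group_wait before sending the first notification of a new group.
    return firing_at + group_wait_s
```

With evaluation_interval: 90s, for: 2m and group_wait: 30s, first_notification_delay(90, 120, 30) gives 210, so the first mail goes out roughly three and a half minutes after the condition is first seen. While the alert keeps firing, it is repeated roughly every repeat_interval (10m).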

Note: SMTP must be enabled in the 163 mailbox settings; otherwise Alertmanager logs errors like the following:

level=error ts=2018-04-03T03:39:32.793284112Z caller=notify.go:303 component=dispatcher msg="Error on notify" err="*notify.loginAuth failed: 550 User has no permission"
level=error ts=2018-04-03T03:39:32.793463167Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="*notify.loginAuth failed: 550 User has no permission"
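To check the mailbox settings without going through Alertmanager, the Python sketch below tries the same login with the standard smtplib module. It assumes the server offers STARTTLS on port 25 (Alertmanager also requires TLS by default through smtp_require_tls); the helper names are hypothetical and the addresses are the placeholders from the config above. A 550 "User has no permission" reply at login points to the same SMTP-not-enabled problem.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text test mail (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Alertmanager SMTP test"
    msg.set_content("SMTP login and relay are working.")
    return msg


def send_test_message(host: str, port: int, username: str, password: str,
                      msg: EmailMessage, timeout: float = 10.0) -> str:
    """Log in and send msg; return 'ok' or the SMTP error code and text."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.starttls()  # upgrade to TLS, as Alertmanager does by default
            smtp.login(username, password)
            smtp.send_message(msg)
        return "ok"
    except smtplib.SMTPResponseException as err:
        # e.g. 550 User has no permission when SMTP is not enabled for the mailbox
        detail = err.smtp_error
        if isinstance(detail, bytes):
            detail = detail.decode(errors="replace")
        return f"{err.smtp_code} {detail}"
```

Usage: send_test_message("smtp.163.com", 25, "xxxx@163.com", "xxxx", build_test_message("xxxx@163.com", "xxxx@xxx.com")).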

Finally, apply the updated ConfigMaps and the Deployment to create the resources.
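Once the pod is running, you can confirm that Prometheus has picked up the Alertmanager container by querying its /api/v1/alertmanagers endpoint. The Python sketch below parses that response; the payload used to exercise it is illustrative of the response shape rather than captured output, and it assumes Prometheus is reachable at localhost:9090 (for example through kubectl port-forward).

```python
import json
import urllib.request


def active_alertmanagers(payload: str) -> list:
    """Return the URLs of active Alertmanagers from a /api/v1/alertmanagers response body."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise ValueError(f"unexpected status: {data.get('status')}")
    return [am["url"] for am in data.get("data", {}).get("activeAlertmanagers", [])]


def fetch_active_alertmanagers(base_url: str = "http://localhost:9090",
                               timeout: float = 5.0) -> list:
    """Query a running Prometheus and return its active Alertmanager URLs."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alertmanagers", timeout=timeout) as resp:
        return active_alertmanagers(resp.read().decode("utf-8"))
```

If the list is empty, Prometheus has not loaded the alerting section of prometheus.yml, or it cannot reach localhost:9093.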