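1. Role Assignment
- Prometheus: scrapes and stores the metrics
- Grafana: displays the metrics as charts
- redis_exporter: collects Redis metrics
- node-exporter: collects operating system and hardware metrics
- cadvisor: collects Docker container metrics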
2. Install Docker
See: https://ximeneschen.blog.csdn.net/article/details/104923157
3. Install Docker-Compose
See: https://ximeneschen.blog.csdn.net/article/details/125651027
4. Deploy Prometheus and Grafana
- Add the Prometheus configuration file
First create the /data/prometheus/ directory, then create prometheus.yml in it with the following content:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.3.250:9093']
        # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['192.168.3.250:9094']

  - job_name: 'redis'
    static_configs:
      - targets: ['192.168.3.250:9121']
        labels:
          instance: redis

  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['192.168.3.250:9100']
        labels:
          instance: node

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.3.250:8088']
        labels:
          instance: cadvisor

  # Automatically load new scrape targets from files
  - job_name: 'file_ds'
    file_sd_configs:
      - files: ['/etc/prometheus/reload/*.yml']
        refresh_interval: 5s
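The `file_ds` job watches /etc/prometheus/reload/*.yml inside the container, which the compose file later in this article maps from /data/prometheus/reload on the host. The article does not show what such a file looks like, so here is a minimal sketch. The helper name `write_file_sd_target`, the second host 192.168.3.251 and the label `node-251` are hypothetical and only serve as an illustration.

```shell
#!/usr/bin/env bash
# Write a file_sd target file that Prometheus can pick up without a restart.
# Arguments: output directory, target host:port, instance label.
write_file_sd_target() {
  local dir="$1" target="$2" instance="$3"
  mkdir -p "$dir"
  # A file_sd file is a YAML list of target groups, each with optional labels.
  cat > "$dir/${instance}.yml" <<EOF
- targets: ['${target}']
  labels:
    instance: ${instance}
EOF
}

# Example (hypothetical second host running node-exporter):
# write_file_sd_target /data/prometheus/reload 192.168.3.251:9100 node-251
```

With `refresh_interval: 5s`, a file dropped into the directory should show up as a new target under the `file_ds` job within a few seconds.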
- Next, create node_down.yml with the following content:
groups:
  - name: node_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: test
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
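Because both files are indentation-sensitive YAML, it can help to validate them before starting anything. The sketch below uses `promtool check config`, which also checks the rule files referenced by `rule_files`. It assumes the prom/prometheus image ships promtool at /bin/promtool. The function name and the `CONF_DIR` and `DOCKER` variables are illustrative, not part of the original setup.

```shell
#!/usr/bin/env bash
# Validate prometheus.yml and the rule file it references with promtool
# from the prom/prometheus image (assumed to contain /bin/promtool).
# CONF_DIR is an assumption matching the directory used in this article.
CONF_DIR="${CONF_DIR:-/data/prometheus}"

check_prometheus_config() {
  # DOCKER can be overridden (for example DOCKER=echo) to preview the command.
  "${DOCKER:-docker}" run --rm --entrypoint /bin/promtool \
    -v "$CONF_DIR/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
    -v "$CONF_DIR/node_down.yml:/etc/prometheus/node_down.yml:ro" \
    prom/prometheus check config /etc/prometheus/prometheus.yml
}

# Usage:
# check_prometheus_config
```

A non-zero exit status means promtool found a problem in one of the files.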
- Create the docker-compose file
Still in the /data/prometheus/ directory, create docker-compose-prometheus.yml with the following content:
version: '2'

networks:
  monitor:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/prometheus/reload:/etc/prometheus/reload
      - /data/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /data/prometheus/node_down.yml:/etc/prometheus/node_down.yml
    ports:
      - "9094:9090"
    networks:
      - monitor

  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    ports:
      - "3000:3000"
    networks:
      - monitor

  redis-exporter:
    image: oliver006/redis_exporter
    container_name: redis_exporter
    hostname: redis_exporter
    restart: always
    ports:
      - "9121:9121"
    networks:
      - monitor
    command:
      - '--redis.addr=redis://192.168.3.250:6379'
      - '--redis.password=password'

  node-exporter:
    image: quay.io/prometheus/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor

  mysql-exporter:
    image: prom/mysqld-exporter
    container_name: mysql-exporter
    hostname: mysql-exporter
    restart: always
    ports:
      - "9104:9104"
    networks:
      - monitor
    environment:
      DATA_SOURCE_NAME: "user:password@(192.168.3.250:3306)/"

  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    hostname: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8088:8080"
    networks:
      - monitor
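A compose file of this size is easy to break with a single misplaced indent, so a quick syntax check before the first start may save some time. The sketch below relies on `docker-compose config -q`, which parses the file and prints nothing when it is valid. The function name and the `COMPOSE_FILE_PATH` and `COMPOSE` variables are hypothetical helpers for this example.

```shell
#!/usr/bin/env bash
# Validate the compose file without starting any container.
# COMPOSE_FILE_PATH is an assumption matching the path used in this article.
COMPOSE_FILE_PATH="${COMPOSE_FILE_PATH:-/data/prometheus/docker-compose-prometheus.yml}"

validate_compose() {
  # "config -q" only validates the file; it is silent on success and
  # returns a non-zero exit status on a syntax or schema error.
  "${COMPOSE:-docker-compose}" -f "$COMPOSE_FILE_PATH" config -q
}

# Usage:
# validate_compose && echo "compose file looks valid"
```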
- Start the containers defined in the compose file with the following command:
docker-compose -f /data/prometheus/docker-compose-prometheus.yml up -d
Output like the following means the startup succeeded:
Creating network "prometheus_monitor" with driver "bridge"
Creating cadvisor ... done
Creating prometheus ... done
Creating node-exporter ... done
Creating redis_exporter ... done
Creating grafana ... done
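You can also run docker ps to check whether the containers started. To stop and remove these containers, run the following command:
docker-compose -f /data/prometheus/docker-compose-prometheus.yml down
It prints a log like the following: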
Stopping cadvisor ... done
Stopping node-exporter ... done
Stopping grafana ... done
Stopping redis_exporter ... done
Stopping prometheus ... done
Removing cadvisor ... done
Removing node-exporter ... done
Removing grafana ... done
Removing redis_exporter ... done
Removing prometheus ... done
Removing network prometheus_monitor
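Open http://192.168.3.250:9094/targets (the compose file maps host port 9094 to port 9090 in the Prometheus container). If every State is UP, Prometheus is working correctly, as shown in the figure below.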
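The same check can be made from the command line through the Prometheus HTTP API endpoint /api/v1/targets. What follows is a rough text-based sketch, assuming the compact JSON that Prometheus returns and that no target carries a label literally named `health`. A real JSON parser such as jq would be more robust. The function name `count_unhealthy` and the `PROM_URL` variable are made up for this example.

```shell
#!/usr/bin/env bash
# Count scrape targets that Prometheus does not report as "up".
# PROM_URL is an assumption: host port 9094 as mapped in the compose file.
PROM_URL="${PROM_URL:-http://192.168.3.250:9094}"

# Reads the JSON body of /api/v1/targets on stdin and prints the number of
# targets whose "health" field is anything other than "up".
count_unhealthy() {
  # Put every JSON key/value pair on its own line, then inspect "health".
  tr ',{}' '\n\n\n' |
    awk -F'"' '$2 == "health" && $4 != "up" { n++ } END { print n + 0 }'
}

# Usage:
# curl -s "$PROM_URL/api/v1/targets?state=active" | count_unhealthy
```

A result of 0 corresponds to every target showing UP on the targets page.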
On CentOS 7, add the corresponding ports to the firewalld policy:
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=9121/tcp --permanent
firewall-cmd --zone=public --add-port=3000/tcp --permanent
firewall-cmd --zone=public --add-port=9094/tcp --permanent
firewall-cmd --reload
Check that the port rules have taken effect with the following command:
firewall-cmd --permanent --zone=public --list-ports
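5. Configure Grafana
Open http://192.168.3.250:3000 and log in with the default credentials admin/admin. After you change the password, you land on the page for creating a data source; select Prometheus there, as shown in the figure below.
After you select it, a new page opens. In the URL field under HTTP, enter the Prometheus address http://192.168.3.250:9094, then click Save & Test.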
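The data source can also be registered without the web form by sending a POST request to Grafana's HTTP API at /api/datasources. This is a minimal sketch, assuming basic authentication with the admin account is still enabled. The function names, the data source name "Prometheus" and the `GRAFANA_URL` and `PROM_URL` variables are illustrative choices, not something the original setup prescribes.

```shell
#!/usr/bin/env bash
# Register Prometheus as a Grafana data source through the HTTP API.
# Both URLs are assumptions taken from this article's setup.
GRAFANA_URL="${GRAFANA_URL:-http://192.168.3.250:3000}"
PROM_URL="${PROM_URL:-http://192.168.3.250:9094}"

# Print the JSON body for a Prometheus data source pointing at URL $1.
datasource_payload() {
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1"
}

# $1 = Grafana credentials in the form user:password
add_datasource() {
  curl -s -u "$1" -H 'Content-Type: application/json' \
    -X POST "$GRAFANA_URL/api/datasources" \
    -d "$(datasource_payload "$PROM_URL")"
}

# Usage (replace with the password you set at first login):
# add_datasource 'admin:your-new-password'
```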
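Use a Grafana template to display the data, taking node-exporter as an example
Search https://grafana.com/grafana/dashboards for the dashboard template you need and download its JSON file. Since I am mainly monitoring the node here, this one is enough: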
https://grafana.com/grafana/dashboards/17577-node-exporter-dashboard-22-04-17/
In the Grafana menu bar, click the first + icon and choose Import
Result: