Using Prometheus + Nightingale + Categraf

Installing Prometheus

Prometheus serves as the time-series database in this setup. The version used is prometheus-2.45.0.linux-amd64.tar.gz.

Download and extract

mkdir -p /opt/prometheus
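# Download prometheus-2.45.0.linux-amd64.tar.gz, for example with wget
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz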
tar xf prometheus-2.45.0.linux-amd64.tar.gz
cp -far prometheus-2.45.0.linux-amd64/*  /opt/prometheus/

Register as a systemd service

cat <<EOF >/etc/systemd/system/prometheus.service
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple

ExecStart=/opt/prometheus/prometheus  --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.enable-lifecycle --enable-feature=remote-write-receiver --query.lookback-delta=2m 

Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus


[Install]
WantedBy=multi-user.target
EOF

# Reload the systemd manager configuration
systemctl daemon-reload
# Enable the service so that it starts at boot
systemctl enable prometheus
# Restart the service
systemctl restart prometheus
# Check the status
systemctl status prometheus

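Explanation

Multi-line text

cat <<EOF starts a here-document: the shell reads the following lines until it reaches a line containing only the terminator (EOF), then passes all of that text to cat on standard input.

The > redirection then writes the output to the prometheus.service file.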
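A minimal sketch of the here-document behavior described above, written against a throwaway temp file rather than the real unit file. The variable svc is hypothetical and exists only for illustration. One detail worth knowing: with an unquoted terminator (EOF) the shell expands variables in the body, while a quoted terminator ('EOF') keeps the body literal.

```shell
#!/bin/sh
# Demonstrates here-document expansion rules on a temporary file.
svc="prometheus"            # hypothetical variable, for illustration only
tmp=$(mktemp)

# Unquoted terminator: $svc is expanded before cat receives the text.
cat <<EOF >"$tmp"
Description=$svc
EOF

# Quoted terminator: the body is passed through literally.
cat <<'EOF' >>"$tmp"
Literal=$svc
EOF

cat "$tmp"
```

The unit file in this article contains no $ characters, so either form would write the same file there.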

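--config.file=/opt/prometheus/prometheus.yml
Path to the Prometheus configuration file.

--storage.tsdb.path=/opt/prometheus/data
Directory on disk where Prometheus stores its time-series data.

--web.enable-lifecycle
Enables the lifecycle management API; calling the /-/reload endpoint, for example, requires this flag.

--enable-feature=remote-write-receiver
Enables the endpoint that receives remote write data. With it enabled, agents such as categraf and grafana-agent can push data to Prometheus through /api/v1/write.

--query.lookback-delta=2m
When an instant query asks for the current value of a series, Prometheus returns the newest sample found within this window; if the window contains no sample, nothing is returned for that series. The default is 5m.

--web.enable-admin-api
Enables the administrative API, such as the /api/v1/admin/tsdb/delete_series endpoint for deleting time-series data. This flag is not part of the ExecStart line above; add it only if that API is needed.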
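Because --web.enable-lifecycle is set, a configuration change can be applied without restarting the service. The following is a hedged sketch: the function names are made up for this example, and the default address http://localhost:9090 and the /opt/prometheus path are assumptions taken from this article's layout. promtool is the checker binary shipped in the same Prometheus tarball.

```shell
#!/bin/sh
# Assumed defaults; override through the environment if the setup differs.
PROM_URL="${PROM_URL:-http://localhost:9090}"
PROM_HOME="${PROM_HOME:-/opt/prometheus}"

# Validate prometheus.yml before asking the server to load it.
check_config() {
    "$PROM_HOME/promtool" check config "$PROM_HOME/prometheus.yml"
}

# POST to the lifecycle endpoint; this needs --web.enable-lifecycle.
reload_config() {
    curl -fsS -X POST "$PROM_URL/-/reload"
}

# Reload only when the configuration passes validation.
safe_reload() {
    check_config && reload_config
}
```

After editing prometheus.yml, calling safe_reload applies the change only if promtool accepts the file, which avoids pushing a broken configuration to a running server.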

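Verify after startup

If a firewall is running, open port 9090.

Open http://192.168.80.3:9090/ to reach the web console.

On the home page, click the small globe icon to the right of the search box, type prometheus, pick any metric, and click Execute. Results appear below.

Results exist because the default configuration has Prometheus scrape itself as a monitoring job, as /opt/prometheus/prometheus.yml shows: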

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

Any target that Prometheus should scrape directly is configured here.
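The self-scrape job can also be checked from the command line through the Prometheus HTTP API. This is a small sketch, not part of the original setup: prom_query is a hypothetical helper name and the server address is assumed to be http://localhost:9090.

```shell
#!/bin/sh
PROM_URL="${PROM_URL:-http://localhost:9090}"   # assumed address

# Run an instant query against the HTTP API and print the raw JSON reply.
# -G sends the data as a GET query string, and --data-urlencode lets curl
# escape the braces and quotes that PromQL expressions contain.
prom_query() {
    curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example (needs a running server):
#   prom_query 'up{job="prometheus"}'
```

For a scrape job, the up metric has the value 1 when the last scrape succeeded and 0 when it failed, so it is a quick health check for every target added under scrape_configs.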

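Installing Nightingale

Nightingale handles dashboards and alert management in this setup. The version used is n9e-5.8.0.tar.gz.

Nightingale V5 depends on MySQL and Redis and uses Prometheus as its time-series database.

Prometheus must be started with --enable-feature=remote-write-receiver.

Download and extract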

mkdir -p /opt/n9e && cd /opt/n9e
tarball=n9e-5.8.0.tar.gz
urlpath=https://github.com/didi/nightingale/releases/download/v5.8.0/${tarball}
wget $urlpath || exit 1
tar zxvf ${tarball}

Configuration

Import /opt/n9e/docker/initsql/a-n9e.sql into MySQL.
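One way to do the import is with the mysql command-line client. The sketch below is an assumption about the environment, not a step from the original article: the host 127.0.0.1, the user root, and the function name import_n9e_schema are placeholders to adjust.

```shell
#!/bin/sh
# Path of the schema file shipped in the n9e tarball (from the text above).
SQL_FILE="${SQL_FILE:-/opt/n9e/docker/initsql/a-n9e.sql}"

# Feed the schema file to the mysql client.
# -p makes the client prompt for the password instead of taking it
# from the command line, which keeps it out of the shell history.
import_n9e_schema() {
    [ -r "$SQL_FILE" ] || { echo "cannot read $SQL_FILE" >&2; return 1; }
    mysql -h "${MYSQL_HOST:-127.0.0.1}" -u "${MYSQL_USER:-root}" -p <"$SQL_FILE"
}
```

Running import_n9e_schema fails early with a clear message if the file path is wrong, before any connection to the database is attempted.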

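cd /opt/n9e/etc

Edit the MySQL and Redis addresses, usernames, and passwords in server.conf and webapi.conf.

Start

cd /opt/n9e
nohup ./n9e server &> server.log &
nohup ./n9e webapi &> webapi.log &

The Nightingale V5 server listens on port 19000 and the web console on port 18000.

Open http://192.168.80.3:18000/ to reach the home page. Run a query as in Prometheus; if results come back, the setup works.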

Installing Categraf

Categraf is the collector. The version used is categraf-v0.2.38-linux-amd64.tar.gz.

Download and extract

mkdir -p /opt/categraf && cd /opt/categraf
tar xf categraf-v0.2.38-linux-amd64.tar.gz
cp -far categraf-v0.2.38-linux-amd64/*  /opt/categraf/

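Register as a systemd service

cp categraf.service /etc/systemd/system/
systemctl daemon-reload
# Start as a service
systemctl start categraf
# Stop the service
systemctl stop categraf

Test

./categraf --test --inputs mem:system

Separate multiple plugins with colons. If a plugin reports an error, check its configuration in conf/input.xxx/xxx.toml under the installation directory.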

Host monitoring

In the categraf configuration file /opt/categraf/conf/config.toml:

[[writers]]
url = "http://127.0.0.1:19000/prometheus/v1/write"

This pushes data to nightingale, which forwards it to prometheus.

Data can also be pushed straight to prometheus by changing the IP, port, and path (http://127.0.0.1:9090/api/v1/write).

Restart:

systemctl restart categraf

Test:

Search for CPU-related metrics; results should appear.

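Import the dashboard

https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/system/dashboard.json

Click Dashboards, choose Import Dashboard under More Actions, select Import Dashboard JSON, paste the JSON from the URL above, and click Import.

After the import, the new dashboard (linux host) appears in the list; click its name to open it.

Import alert rules

https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/system/alerts-linux.json

Click Alert Management, then Alert Rules, choose Import Alert Rules under More Actions, paste the JSON from the URL above into the import dialog, click OK, and then click Close.

After the import, many alert rules appear, such as machine load. Open a rule and select an alert receiver group. If the members configured under the people and organization settings have contact details, notifications can go out by email, WeCom, DingTalk, and so on.

To test, change the CPU usage rule from cpu_usage_idle{cpu="cpu-total"} < 25 to cpu_usage_idle{cpu="cpu-total"} < 99. The rule will then almost certainly fire, which confirms that the contact channel receives notifications.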

Network monitoring

ping

conf/input.ping/ping.toml

[[instances]]
targets = [ "192.168.80.3" ]

Change this to the IP addresses to monitor.

Dashboard: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/ping/dashboard.json

Alert rules: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/ping/alerts.json

Servers often have ping disabled, so use this plugin only where it applies.

TCP

conf/input.net_response/net_response.toml

[[instances]]
targets = [
    "10.2.3.4:22",
    "localhost:6379",
    ":9090"
]

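10.2.3.4:22 probes whether port 22 on host 10.2.3.4 accepts connections.

localhost:6379 probes whether port 6379 on the local machine accepts connections.

:9090 probes whether port 9090 on the local machine accepts connections; a target without an IP or hostname defaults to localhost.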
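The target rules above can be sketched in shell for quick manual checks. This is an illustration only, not how categraf implements the plugin: normalize_target and probe_tcp are hypothetical names, and the probe assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are installed.

```shell
#!/bin/sh
# Expand a target the way the text describes: an empty host means localhost.
normalize_target() {
    case "$1" in
        :*) printf 'localhost%s\n' "$1" ;;
        *)  printf '%s\n' "$1" ;;
    esac
}

# Try a TCP connection through bash's /dev/tcp pseudo-device.
# Returns 0 when the port accepts a connection within 3 seconds.
# The body runs in a subshell so its variables do not leak.
probe_tcp() (
    target=$(normalize_target "$1")
    host=${target%:*}
    port=${target##*:}
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
)

# Show how each target from the configuration is interpreted.
for t in "10.2.3.4:22" "localhost:6379" ":9090"; do
    normalize_target "$t"
done
```

A manual check such as probe_tcp ":9090" && echo open is a handy way to confirm connectivity from the collector host before blaming the plugin configuration.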

Dashboard: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/net_response/dashboard.json

Alert rules: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/net_response/alerts.json

HTTP

conf/input.http_response/http_response.toml

[[instances]]
targets = [
    "http://localhost",
    "https://www.baidu.com"
]

Dashboard: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/http_response/dashboard.json

Alert rules: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/http_response/alerts.json

nginx monitoring

The nginx server must have the http_stub_status_module module enabled.

The status page http://192.168.80.3/basic_status must be reachable.
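Both prerequisites can be checked from the shell before configuring the plugin. The sketch below uses hypothetical helper names; the sample text follows the stub_status output format and stands in for a live reply, with made-up numbers.

```shell
#!/bin/sh
# Succeeds if the installed nginx binary was built with stub_status.
# nginx -V prints its configure arguments on stderr, hence 2>&1.
has_stub_status() {
    nginx -V 2>&1 | grep -q -- '--with-http_stub_status_module'
}

# Print the active-connection count from stub_status text read on stdin.
active_connections() {
    awk '/^Active connections:/ { print $3 }'
}

# Stand-in reply in the stub_status format (numbers are illustrative).
sample='Active connections: 2
server accepts handled requests
 16 16 31
Reading: 0 Writing: 1 Waiting: 1'

printf '%s\n' "$sample" | active_connections
```

Against the live server the same filter would be fed by curl, for example curl -fsS http://192.168.80.3/basic_status | active_connections.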

conf/input.nginx/nginx.toml

[[instances]]
## An array of Nginx stub_status URI to gather stats.
urls = [
        "http://192.168.80.3/basic_status"
]

Test: ./categraf --test --inputs nginx

Dashboard: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/nginx/dashbaords.json

MySQL monitoring

conf/input.mysql/mysql.toml

[[instances]]
address = "127.0.0.1:3306"
username = "root"
password = "1234"

extra_status_metrics = true
extra_innodb_metrics = true
gather_processlist_processes_by_state = false
gather_processlist_processes_by_user = false
gather_schema_size = false
gather_table_size = false
gather_system_table_size = false
gather_slave_status = true

Dashboard: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/mysql/dashboard-by-ident.json

Alert rules: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/mysql/alerts.json

Redis monitoring

conf/input.redis/redis.toml

[[instances]]
address = "127.0.0.1:6379"
# username = ""
# password = ""
pool_size = 2

Dashboard: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/redis/dashboard.json

Alert rules: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/redis/alerts.json

RabbitMQ monitoring

Starting with version 3.8, RabbitMQ has built-in Prometheus support: once the Prometheus plugin is enabled, RabbitMQ exposes a metrics endpoint directly.

In the RabbitMQ installation directory (sbin), run rabbitmq-plugins enable rabbitmq_prometheus

If the plugin is enabled successfully, RabbitMQ listens on port 15692 by default, and http://localhost:15692/metrics returns monitoring data in the Prometheus exposition format.
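To see at a glance which metrics the endpoint exposes, the exposition text can be reduced to a list of metric names. A minimal sketch: metric_names is a hypothetical helper, and the sample input is illustrative rather than captured from a real broker.

```shell
#!/bin/sh
# List distinct metric names from Prometheus exposition text on stdin.
# Comment lines (# HELP, # TYPE) and blank lines are skipped; the name is
# everything before the first "{" or space on a sample line.
metric_names() {
    grep -v -e '^#' -e '^$' | sed 's/[{ ].*//' | sort -u
}

# Stand-in input; names and values are illustrative, not captured output.
sample='# TYPE rabbitmq_connections gauge
rabbitmq_connections 3
rabbitmq_queue_messages{queue="a"} 7
rabbitmq_queue_messages{queue="b"} 1'

printf '%s\n' "$sample" | metric_names
```

With the plugin running, curl -fsS http://localhost:15692/metrics | metric_names gives the real list, which helps when matching dashboard panels to available metrics.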

Go to the Prometheus installation directory, open prometheus.yml with vim, and add the following:

- job_name: 'rabbitmq'
  static_configs:
    - targets: ['192.168.80.1:15692']

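Add this after the prometheus job shown earlier and mind the YAML indentation; check in a text editor that the entries line up.

Dashboard: https://github.com/flashcatcloud/categraf/blob/v0.2.39/inputs/rabbitmq/dashboard.json

Spring Boot service monitoring

Add these dependencies to the Spring Boot project: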

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Expose the endpoints:

management:
  endpoints:
    web:
      exposure:
        include: '*'

After the Spring Boot project starts, http://192.168.80.1:8080/actuator/prometheus returns the metrics.

Go to the Prometheus installation directory, open prometheus.yml with vim, and add the following:

- job_name: 'springboot'
  metrics_path: '/actuator/prometheus'
  static_configs:
    - targets: ['192.168.80.1:8080']

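No ready-made dashboard is available for this.

Summary

Nightingale is still not widely used, so little information about it is available online, and it offers fewer dashboards than Grafana, whose official site has a searchable dashboard catalog. Nightingale's alerting, however, works well.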

http://hanqichuan.com/2023/09/06/监控/prometheus+nightingale+categraf使用/

Author: 韩启川

Published: September 6, 2023
