Consul is a service discovery tool: services or VMs register with it, and it exposes DNS and HTTP interfaces for querying the state of the registered services and VMs.

Prometheus is a pull-based monitoring tool: it collects metrics by querying (scraping) each target defined in its configuration.
Start Consul
touch /etc/consul/server/config.json
{
  "telemetry": {
    "prometheus_retention_time": "480h",
    "disable_hostname": true
  }
}
This enables Prometheus-format metrics through the telemetry settings. Without it, the errors below will be observed:
415 Unsupported Media Type
Prometheus is not enabled since its retention time is not positive
If the configuration is correct, Prometheus-format metrics will be returned:
curl http://127.0.0.1:8500/v1/agent/metrics\?format\=prometheus
Configuration in Prometheus
Add a job under scrape_configs:
  - job_name: consul
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: '/v1/agent/metrics'
    scheme: http
    params:
      format: ["prometheus"]
    static_configs:
      - targets:
          - <consulserver1>:8500
          - <consulserver2>:8500
Import Grafana dashboard ID 10642 to get a dashboard for Consul.
However, the native metrics are not intuitive to write Prometheus rules against, so we reroute the metrics through consul_exporter:
docker pull prom/consul-exporter
docker run -d -p 9107:9107 prom/consul-exporter --consul.server=consulserver:8500
## test
curl -s http://localhost:9107/metrics | grep consul_catalog_service_node_healthy
Add the prometheus rule
groups:
- name: consul.rules
  rules:
  - alert: ConsulServiceHealthcheckFailed
    expr: consul_catalog_service_node_healthy == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Consul service healthcheck failed (instance {{ $labels.instance }})
      description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  - alert: ConsulMissingMasterNode
    expr: consul_raft_peers < 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Consul missing master node (instance {{ $labels.instance }})
      description: "Numbers of consul raft peers should be 3, in order to preserve quorum.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  - alert: ConsulAgentUnhealthy
    expr: consul_health_node_status{status="critical"} == 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Consul agent unhealthy (instance {{ $labels.instance }})
      description: "A Consul agent is down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
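For Prometheus to evaluate the rules above, the file that holds them has to be listed under rule_files in prometheus.yml. A minimal fragment, assuming the rules were saved as consul.rules.yml (a hypothetical filename, not given in this post) in a location Prometheus can read:

```yaml
# prometheus.yml (fragment)
rule_files:
  - consul.rules.yml   # hypothetical filename; adjust the path to where the rules file actually lives
```

Prometheus only picks up the rules after a restart or a configuration reload.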
Add the exporter job to prometheus.yml, and we are ready to receive alerts from Consul!
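The exporter job itself is not shown in this post; a sketch of what it could look like follows. The job name and the <exporterhost> placeholder are assumptions, while port 9107 is the one published by the docker run command earlier. The exporter serves its metrics on the default /metrics path, so no metrics_path or params are needed here.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: consul-exporter      # hypothetical job name
    scrape_interval: 15s
    static_configs:
      - targets:
          - <exporterhost>:9107    # placeholder: host running the prom/consul-exporter container
```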