14.3.2.4 Compose Monitoring Service
A focused guide to Compose Monitoring Service, connecting core concepts with practical Docker and container operations.
A Compose monitoring service is a dedicated metrics, logging, or alerting component defined as part of the same stack it observes, giving operators visibility into the health and behavior of every other service without depending on tooling external to the deployment itself.
Why monitoring belongs in the stack definition
Treating monitoring as a service within the Compose stack, rather than as separately managed infrastructure, keeps observability tied to the application's own deployment lifecycle: when the stack is brought up in a new environment, monitoring comes up with it automatically, already configured to watch the specific services defined alongside it.
services:
  api:
    image: my-api:latest
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana
volumes:
  grafana-data:
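One way to make the "already configured" part concrete is Grafana's file-based provisioning, which lets the stack ship its Prometheus data source instead of relying on someone adding it by hand. The following is a minimal sketch: the file path in the repository is an assumption, and the file would need to be mounted under /etc/grafana/provisioning/datasources/ in the grafana service.

```yaml
# ./grafana/provisioning/datasources/datasource.yml (hypothetical path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus:9090   # Compose service name and container port
    isDefault: true
```

With a bind mount such as ./grafana/provisioning:/etc/grafana/provisioning:ro on the grafana service, the data source would exist as soon as the container starts.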
Scraping metrics from application services
For Prometheus-style monitoring, application services need to expose a metrics endpoint that the monitoring service is configured to scrape on an interval, addressed through the same Compose service name resolution used everywhere else in the stack:
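# prometheus.yml
scrape_configs:
  - job_name: 'api'
    static_configs:
      - targets: ['api:8080']
    metrics_path: /metrics

On the application side, the service exposes that endpoint. The example below assumes a Node.js service using Express and prom-client:

const express = require('express');
const promClient = require('prom-client');

const app = express();
const register = new promClient.Registry();

// Collect default process metrics (CPU, memory, event loop lag) into the registry
promClient.collectDefaultMetrics({ register });

// Serve the registry in Prometheus text format
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(8080); // matches the api:8080 scrape target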
Because Prometheus resolves api through Docker's internal DNS the same way any other service-to-service communication in the stack would, no separate service discovery mechanism is needed for a single-host Compose deployment.
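The scrape interval mentioned above is set in the same file. As a sketch, a global default can be combined with a per-job override; the interval values here are illustrative assumptions, not recommendations.

```yaml
# prometheus.yml (sketch; interval values are illustrative)
global:
  scrape_interval: 15s      # default interval for every job

scrape_configs:
  - job_name: 'api'
    metrics_path: /metrics
    scrape_interval: 5s     # per-job override of the global default
    static_configs:
      - targets: ['api:8080']

  - job_name: 'prometheus'  # Prometheus scraping its own metrics
    static_configs:
      - targets: ['localhost:9090']
```

Adding another service to the stack then means adding one more job, or one more entry in targets, using its Compose service name.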
Persisting monitoring data across restarts
Metrics and dashboard configuration should survive the monitoring service's own container being recreated, the same way application data needs to survive application container recreation:
services:
  prometheus:
    volumes:
      - prometheus-data:/prometheus
  grafana:
    volumes:
      - grafana-data:/var/lib/grafana
volumes:
  prometheus-data:
  grafana-data:
Without these volumes, restarting the monitoring stack for a routine image update would discard historical metrics and any dashboards configured directly through Grafana's interface rather than provisioned as code.
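Dashboards provisioned as code avoid that dependency on the volume altogether. A minimal sketch of a Grafana dashboard provider follows; the provider name and both paths are assumptions, and the dashboard JSON files would need to be bind-mounted at the configured path.

```yaml
# ./grafana/provisioning/dashboards/provider.yml (hypothetical path)
apiVersion: 1

providers:
  - name: stack-dashboards          # illustrative name
    type: file
    disableDeletion: true           # keep dashboards even if a file disappears
    options:
      path: /etc/grafana/dashboards # directory holding dashboard JSON files
```

Dashboards loaded this way are recreated from the repository on every start, so only ad hoc dashboards built in the interface still depend on the grafana-data volume.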
Centralized log aggregation as monitoring
A monitoring service does not have to be metrics-focused; a log aggregation service serving a similar centralizing role for every other service's output is an equally common pattern within a Compose stack:
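services:
  api:
    logging:
      driver: gelf
      options:
        # Resolved by the Docker daemon on the host, not inside the stack network
        gelf-address: "udp://localhost:12201"
  logging:
    image: graylog/graylog
    ports:
      - "9000:9000"
      - "12201:12201/udp"   # GELF input, published so the daemon can reach it
    depends_on:
      - mongo
      - elasticsearch
Configuring every other service's logging driver to ship output to the centralized service keeps log aggregation consistently configured across the entire deployment. One caveat: the logging driver runs in the Docker daemon rather than in the container, so gelf-address cannot use a Compose service name and must point at an address reachable from the host, such as the published GELF port.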
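To keep that logging configuration identical across services without repeating it, Compose extension fields and a YAML anchor can hold it in one place. This is a sketch under assumptions: the worker service is hypothetical, and the address assumes the GELF port is published on the host, because the logging driver connects from the Docker daemon rather than from inside the stack network.

```yaml
# Shared logging settings declared once and reused through a YAML anchor
x-logging: &default-logging
  driver: gelf
  options:
    gelf-address: "udp://localhost:12201"   # assumes the GELF port is published on the host
    tag: "{{.Name}}"                        # use the container name as the log tag

services:
  api:
    image: my-api:latest
    logging: *default-logging
  worker:                                   # hypothetical second service
    image: my-worker:latest
    logging: *default-logging
```

Changing the aggregation address or driver options then becomes a single edit that applies to every service referencing the anchor.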
Alerting as a connected, not standalone, concern
A monitoring service that collects metrics but never alerts anyone provides visibility only to someone actively looking at a dashboard; a production-appropriate monitoring setup should also define alerting rules and a notification path:
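# Compose file: the Alertmanager service
services:
  alertmanager:
    image: prom/alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    ports:
      - "9093:9093"

# Prometheus alerting rules file
groups:
  - name: api-alerts
    rules:
      - alert: HighErrorRate
        # Fires when a series exceeds 0.05 5xx responses per second for 5 minutes
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m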
Defining the alert rule directly alongside the monitoring service's own configuration, version-controlled with the rest of the stack, keeps alerting logic auditable and reviewable through the same process as any other change to the stack.
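For the rule to reach anyone, Prometheus has to load the rules file and know where Alertmanager is, and Alertmanager needs a receiver. The sketch below shows both files as separate YAML documents; the rules directory, the receiver name, and the webhook URL are all hypothetical placeholders.

```yaml
# prometheus.yml additions (sketch)
rule_files:
  - /etc/prometheus/rules/*.yml     # assumes rule files are mounted here

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # Compose service name and port
---
# alertmanager.yml (sketch)
route:
  receiver: team-webhook            # hypothetical receiver name
  group_by: ['alertname']
  repeat_interval: 4h               # illustrative value

receivers:
  - name: team-webhook
    webhook_configs:
      - url: http://notifier:8080/alerts   # hypothetical endpoint
```

A webhook receiver is used here only because it needs no credentials in the example; email, chat, or paging receivers follow the same route and receiver structure.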
Resource considerations for the monitoring service itself
A monitoring service, particularly one retaining metrics history or large volumes of logs, can consume meaningful disk and memory on its own, and should have explicit resource limits and retention settings rather than being allowed to grow unbounded:
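services:
  prometheus:
    command:
      # Overriding command replaces the image's default flags, so restate them
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
    deploy:
      resources:
        limits:
          memory: 1G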
Prometheus keeps 15 days of data by default and applies no size cap unless one is set, and many log stores have no default limit at all, so unconfigured retention can quietly consume disk space until it becomes a production incident in its own right, unrelated to whatever the monitoring service was originally meant to help diagnose.
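A time limit alone does not bound disk usage if the volume of metrics grows, so a size-based cap can be added alongside it. The following is a sketch with illustrative values; the 10GB cap and the Grafana limits are assumptions to be sized against the actual host.

```yaml
services:
  prometheus:
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
      - "--storage.tsdb.retention.size=10GB"   # whichever limit is hit first applies
    deploy:
      resources:
        limits:
          memory: 1G
  grafana:
    deploy:
      resources:
        limits:
          memory: 512M     # illustrative value
          cpus: "0.50"     # illustrative value
```

Keeping the size cap below the capacity of the disk backing the prometheus-data volume leaves headroom for compaction and for other services on the same host.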
Network access for the monitoring dashboard
The monitoring dashboard itself, often the only part of the monitoring service genuinely meant for human access, should be reachable through the same authenticated, proxied path as any other internal tool rather than published directly and unauthenticated:
services:
  grafana:
    networks:
      - internal
    # reachable only through the proxy, not published directly
  proxy:
    networks:
      - internal
      - public
Publishing a monitoring dashboard directly to the public internet without authentication is a common and easily avoidable exposure, since dashboards frequently reveal internal service names, error rates, and infrastructure details that should not be available to an unauthenticated visitor.
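A fuller version of that layout might look like the sketch below. The proxy image, the hostname, and the choice to rely on Grafana's own login are assumptions; the point is that grafana has no ports entry and sits only on a network marked internal.

```yaml
services:
  grafana:
    image: grafana/grafana
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "false"                    # require a login
      GF_SERVER_ROOT_URL: https://monitoring.example.com/   # hypothetical hostname
    networks:
      - internal            # no ports entry, so nothing is published on the host
  proxy:
    image: nginx            # placeholder; any authenticating reverse proxy works
    ports:
      - "443:443"
    networks:
      - internal
      - public

networks:
  internal:
    internal: true          # containers on this network get no external connectivity
  public:
```

The proxy is the only service attached to both networks, so it is the single place where TLS and authentication for the dashboard need to be enforced.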
Common mistakes
- Running a monitoring service without persisting its own data volume, losing historical metrics and manually configured dashboards on every container recreation.
- Collecting metrics and logs without ever defining alerting rules, leaving monitoring as a passive dashboard rather than an active part of incident detection.
- Leaving retention settings unconfigured, allowing the monitoring service's own storage to grow unbounded until it becomes its own operational problem.
- Publishing a monitoring dashboard directly to the public internet without authentication, exposing internal service details unnecessarily.
- Treating monitoring as separate infrastructure managed outside the stack, losing the benefit of having it come up automatically, already configured, whenever the stack itself is deployed.
A Compose monitoring service earns its place in the stack by being deployed, configured, and version-controlled the same way as every other service it observes, persisting its own data, alerting actively rather than only displaying passively, and remaining behind the same access controls as any other internal tool within the stack.