15.2 Container Metrics
A focused guide to Container Metrics, connecting core concepts with practical Docker and container operations.
Container metrics are numeric, time-series measurements of a container's resource consumption and runtime behavior. The most fundamental are CPU usage, memory usage, network throughput, and disk I/O. They are the quantitative complement to logs: they tell an operator not just what happened but how much resource it took and how that has changed over time.
The built-in docker stats interface
Docker exposes basic resource metrics for any running container directly through the daemon, without requiring any additional tooling to be installed:
docker stats my-api
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
3f29a8c1d8e2 my-api 12.4% 210MiB / 512MiB 41.0% 1.2MB / 850kB 0B / 4.1MB
This live view is useful for immediate, interactive inspection, but it is not retained as history; the moment the command stops running, that data point is gone, which is why production observability requires a persistent metrics pipeline rather than relying on docker stats alone.
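As a stopgap before a real pipeline exists, snapshots can at least be appended to a file. The sketch below is a minimal illustration, not a substitute for a metrics system: the function names and the output path are hypothetical, and it assumes a POSIX shell with the docker CLI on the PATH.

```shell
# Prefix every line read from stdin with a UTC timestamp and a tab.
stamp_lines() {
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  while IFS= read -r line; do
    printf '%s\t%s\n' "$ts" "$line"
  done
}

# Append one timestamped JSON record per running container to a file.
# $1 = output path (hypothetical; any writable location works).
snapshot_stats() {
  docker stats --no-stream --format '{{json .}}' | stamp_lines >> "$1"
}
```

Calling snapshot_stats /var/tmp/docker-stats.log from a scheduled job would build up a crude history, but with no retention policy, querying, or alerting, which is the gap a metrics pipeline fills.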
What the cgroups subsystem actually measures
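Docker's container metrics are sourced from the Linux kernel's cgroups subsystem, the same mechanism that enforces resource limits, so the reported figures directly reflect the accounting the kernel keeps for that container's cgroup. On a host using cgroup v1 with the cgroupfs driver, the raw counters can be read directly (cgroup v2 hosts use a unified hierarchy with different file names, such as memory.current and cpu.stat):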
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes
cat /sys/fs/cgroup/cpu/docker/<container-id>/cpuacct.usage
Understanding that container metrics are cgroups accounting, not an approximation or a separate monitoring layer, clarifies why they are highly accurate but also why they sometimes surprise operators used to host-level metrics: cgroups memory accounting, for instance, includes page cache usage in ways that can look different from what a process inside the container might report about its own memory consumption.
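Where the accounting files live depends on the cgroup version and on the cgroup driver Docker is configured with. The helper below is a sketch that probes layouts commonly seen with the cgroupfs and systemd drivers; the function name and the list of candidate paths are assumptions for illustration, and a given host may use a different layout.

```shell
# Print the path of the memory-usage file for a container's cgroup.
# $1 = full container ID
# $2 = cgroup root (optional, defaults to /sys/fs/cgroup)
# Returns 1 if none of the assumed layouts exists.
cgroup_mem_file() {
  id=$1
  root=${2:-/sys/fs/cgroup}
  for f in \
    "$root/memory/docker/$id/memory.usage_in_bytes" \
    "$root/memory/system.slice/docker-$id.scope/memory.usage_in_bytes" \
    "$root/docker/$id/memory.current" \
    "$root/system.slice/docker-$id.scope/memory.current"
  do
    # The first two candidates are cgroup v1, the last two cgroup v2.
    if [ -r "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}
```

For example, cat "$(cgroup_mem_file "$(docker inspect --format '{{.Id}}' my-api)")" prints the current usage in bytes on hosts that match one of these layouts.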
CPU metrics and the multi-core normalization question
docker stats reports container CPU usage as a percentage of a single core, so 100% means one fully used core and a multi-core container can exceed 100%. The figure therefore has to be interpreted relative to how many cores the container may use: a single-threaded process pegging one core shows about 100% either way, but that is half the capacity of a container limited to 2 CPUs and a sixteenth of the capacity of one allowed 16:
docker run -d --cpus=2 my-api
docker stats my-api
CPU %: 95.0%
docker stats reports CPU % relative to a single core, so a container limited to 2 CPUs tops out near 200%. A reported 95% therefore means the container is using just under one core, roughly half of its 2-CPU allotment, not nearly all of it. The same raw percentage would mean something very different under a different CPU limit, which is why CPU metrics should always be interpreted alongside the container's configured limit, not in isolation.
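The arithmetic can be made explicit. This sketch assumes the Linux docker stats convention that 100% equals one fully used core; cpu_pct_of_limit is a hypothetical helper name.

```shell
# Convert a docker stats CPU % (100% = one full core) into a percentage
# of the container's configured CPU limit.
# $1 = CPU % as printed by docker stats, with or without the trailing "%"
# $2 = CPU limit in cores, e.g. 2 or 0.5
cpu_pct_of_limit() {
  awk -v pct="${1%\%}" -v cpus="$2" 'BEGIN {
    if (cpus <= 0) {
      print "cpu limit must be greater than 0" > "/dev/stderr"
      exit 1
    }
    # pct is already "percent of one core", so dividing by the number
    # of allowed cores gives "percent of the allotment".
    printf "%.1f\n", pct / cpus
  }'
}

# Example: cpu_pct_of_limit 95.0% 2 prints 47.5
```

The configured limit can be read with docker inspect --format '{{.HostConfig.NanoCpus}}' my-api, which reports billionths of a CPU (2000000000 for --cpus=2, and 0 when no limit is set).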
Memory metrics and what counts as usage
Memory usage reporting for containers includes both the application's actual working memory and, depending on the metric, page cache used for file I/O, which can make memory usage appear higher than what the application itself would report through its own internal accounting:
docker stats --no-stream --format "{{.MemUsage}}" my-api
docker exec my-api cat /proc/meminfo
A container approaching its memory limit because of page cache behaves differently from one exhausting memory through application allocations: the kernel can usually reclaim page cache before it resorts to an OOM kill, whereas application allocations cannot simply be dropped. That distinction is useful when investigating an unexpected restart.
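One way to separate reclaimable cache from memory the kernel cannot easily free is to subtract the inactive file cache from the cgroup's total usage, which gives a "working set" style figure. The sketch below assumes the caller supplies the usage value and the contents of the cgroup's memory.stat file; mem_working_set is a hypothetical helper name.

```shell
# Print total usage minus inactive file cache, in bytes.
# $1    = total usage in bytes (memory.current on cgroup v2,
#         memory.usage_in_bytes on cgroup v1)
# stdin = contents of the same cgroup's memory.stat
mem_working_set() {
  awk -v usage="$1" '
    # Exact match only, so v1 keys like total_inactive_file are ignored.
    $1 == "inactive_file" { inactive = $2 }
    END {
      ws = usage - inactive
      if (ws < 0) ws = 0
      printf "%.0f\n", ws
    }'
}
```

A result far below the raw usage suggests that much of the reported memory is file cache the kernel can reclaim under pressure.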
Network and disk I/O metrics
Network I/O reflects traffic through the container's own network namespace, and block I/O reflects reads and writes to block devices issued by the container's processes. Together they give a per-container view that a host-level network or disk monitoring tool, which sees only aggregate traffic across every process and container, cannot provide on its own:
docker stats --no-stream --format "{{.NetIO}} {{.BlockIO}}" my-api
This per-container breakdown is particularly useful for identifying which specific service, among many running on the same host, is responsible for an unusual spike in overall host network or disk activity.
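To compare containers numerically, the human-readable figures first have to be converted to bytes. The sketch below assumes docker stats prints decimal units (kB, MB, GB) for network and block I/O and binary units (MiB, GiB) for memory; to_bytes and rank_by_net_io are hypothetical helper names.

```shell
# Convert a size such as 1.2MB or 210MiB to a whole number of bytes.
to_bytes() {
  awk -v s="$1" 'BEGIN {
    m["B"] = 1
    m["kB"] = 1e3;   m["MB"] = 1e6;     m["GB"] = 1e9;     m["TB"] = 1e12
    m["KiB"] = 1024; m["MiB"] = 1024^2; m["GiB"] = 1024^3; m["TiB"] = 1024^4
    num = s;  sub(/[A-Za-z]+$/, "", num)   # keep the numeric part
    unit = s; sub(/^[0-9.]+/, "", unit)    # keep the unit suffix
    if (!(unit in m)) exit 1
    printf "%.0f\n", num * m[unit]
  }'
}

# Read "name rx / tx" lines on stdin and print "total_bytes name",
# largest total first.
rank_by_net_io() {
  while read -r name rx sep tx; do
    printf '%s %s\n' "$(( $(to_bytes "$rx") + $(to_bytes "$tx") ))" "$name"
  done | sort -rn
}
```

Piping docker stats --no-stream --format '{{.Name}} {{.NetIO}}' into rank_by_net_io lists the containers with the most cumulative network traffic first.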
Exposing container metrics to a metrics pipeline
For anything beyond ad hoc inspection, container metrics need to be exported to a system that retains history and supports alerting and dashboards. cAdvisor is the most common tool built specifically for this: it reads the same cgroups data Docker itself uses and exposes it in a format Prometheus and similar systems can collect. A minimal setup runs cAdvisor as a Compose service and adds a Prometheus scrape job that targets it:
# Compose file
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

# Prometheus configuration
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
Running cAdvisor once per host, rather than instrumenting every individual application container separately for infrastructure-level metrics, is the standard approach, since this category of metric is about the container's resource consumption as observed by the runtime, not something the application itself needs to be aware of or expose on its own.
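Before wiring up dashboards, it can help to check what cAdvisor is actually exposing. The sketch below filters Prometheus text-format output for one metric and one container; it assumes cAdvisor's name label carries the container name and that the sample value follows the closing brace of the label set. metric_for_container is a hypothetical helper name.

```shell
# Print the sample value(s) of one metric for one container.
# stdin = Prometheus text exposition output
# $1    = metric name, e.g. container_memory_working_set_bytes
# $2    = container name as it appears in the name label
metric_for_container() {
  grep "^$1{" \
    | grep "[{,]name=\"$2\"" \
    | sed 's/^.*} //' \
    | awk '{ print $1 }'
}
```

With the port mapping shown above, curl -s http://localhost:8080/metrics | metric_for_container container_memory_working_set_bytes my-api prints the current value for my-api, or nothing if no matching series exists.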
Container metrics versus application metrics
Container resource metrics answer "how much CPU, memory, and I/O did this container consume," while application-level metrics, instrumented separately inside the application code, answer "what did the application actually do": request counts, latencies, error rates, and business-specific measurements. Both are necessary and complementary. Container metrics alone cannot tell you why memory usage spiked, only that it did, while application metrics alone cannot tell you whether the underlying container is actually resource-constrained.
const promClient = require('prom-client');
const httpRequestDuration = new promClient.Histogram({ name: 'http_request_duration_seconds', help: 'Duration of HTTP requests in seconds' });
Common mistakes
- Interpreting CPU percentage without accounting for the container's actual configured CPU limit, leading to a misleading sense of how close to capacity it actually is.
- Treating memory usage metrics as solely reflecting application allocations, without accounting for page cache inclusion that can inflate the reported figure relative to what the application itself believes it is using.
- Relying only on docker stats for production observability, with no persisted history available once the interactive session ends.
- Collecting only infrastructure-level container metrics without also instrumenting application-level metrics, leaving no visibility into what the container was actually doing when a resource spike occurred.
- Running cAdvisor or an equivalent metrics exporter inconsistently across hosts, leaving gaps in container-level visibility for whichever hosts were missed.
Container metrics, grounded directly in the kernel's own cgroups accounting, provide accurate, per-container resource visibility that complements rather than replaces application-level metrics. Turning that visibility into something actionable in production requires exporting it to a persistent, queryable metrics pipeline rather than relying on the interactive, non-retained view that docker stats provides on its own.