15.2 Container Metrics

A focused guide to Container Metrics, connecting core concepts with practical Docker and container operations.

Container metrics are the numeric, time-series measurements of a container's resource consumption and runtime behavior. CPU usage, memory usage, network throughput, and disk I/O are the most fundamental. They are the quantitative complement to logs: they tell an operator not just what happened, but how much resource it took and how that has changed over time.

The built-in docker stats interface

Docker exposes basic resource metrics for any running container directly through the daemon, without requiring any additional tooling to be installed:

docker stats my-api
CONTAINER ID   NAME     CPU %   MEM USAGE / LIMIT   MEM %   NET I/O          BLOCK I/O
3f29a8c1d8e2   my-api   12.4%   210MiB / 512MiB     41.0%   1.2MB / 850kB    0B / 4.1MB

This live view is useful for immediate, interactive inspection, but it is not retained as history; the moment the command stops running, that data point is gone, which is why production observability requires a persistent metrics pipeline rather than relying on docker stats alone.
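One lightweight way to keep a reading beyond the interactive session is to ask docker stats for a single machine-readable sample and timestamp it. The sketch below assumes the output of `docker stats --no-stream --format '{{json .}}'`, which emits one JSON object per container with string fields such as `Name`, `CPUPerc`, and `MemPerc`. The function name and the output field names are illustrative, and the sample line is hand-written rather than captured from a real host.

```python
import json
from datetime import datetime, timezone


def parse_stats_line(line: str, sampled_at: datetime) -> dict:
    """Turn one JSON line from `docker stats --format '{{json .}}'` into numbers."""
    raw = json.loads(line)
    return {
        "time": sampled_at.isoformat(),
        "name": raw["Name"],
        # docker stats formats percentages as strings such as "12.40%".
        "cpu_percent": float(raw["CPUPerc"].rstrip("%")),
        "mem_percent": float(raw["MemPerc"].rstrip("%")),
    }


# Hand-written sample line; a real one would come from the docker CLI.
sample = '{"Name": "my-api", "CPUPerc": "12.40%", "MemPerc": "41.02%"}'
record = parse_stats_line(sample, datetime(2024, 1, 1, tzinfo=timezone.utc))
print(record)
```

Appending records like this to a file gives a crude history, but it is no substitute for a real metrics pipeline with retention, querying, and alerting.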

What the cgroups subsystem actually measures

Docker's container metrics are sourced from the Linux kernel's cgroups subsystem, the same mechanism that enforces resource limits, which means the metrics reported are a direct reflection of the limits and accounting the kernel itself is tracking for that container's cgroup:

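# cgroup v1 layout; the exact path depends on the host's cgroup driver
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes
cat /sys/fs/cgroup/cpu/docker/<container-id>/cpuacct.usage
# cgroup v2 uses a single unified hierarchy with files such as memory.current and cpu.stat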

Understanding that container metrics are cgroups accounting, not an approximation or a separate monitoring layer, clarifies why they are highly accurate but also why they sometimes surprise operators used to host-level metrics: cgroups memory accounting, for instance, includes page cache usage in ways that can look different from what a process inside the container might report about its own memory consumption.
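Because the accounting lives in plain files, reading it programmatically takes very little code. The following sketch assumes the caller already knows the container's cgroup directory (its location varies by host configuration) and checks for the cgroup v2 file name `memory.current` before falling back to the v1 name `memory.usage_in_bytes`. The function name is hypothetical, and a temporary directory stands in for a real cgroup so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def read_memory_usage_bytes(cgroup_dir: Path) -> int:
    """Read current memory usage from a container's cgroup directory.

    cgroup v2 exposes `memory.current`; cgroup v1 exposes
    `memory.usage_in_bytes`. Both hold a single integer in bytes.
    """
    for filename in ("memory.current", "memory.usage_in_bytes"):
        candidate = cgroup_dir / filename
        if candidate.exists():
            return int(candidate.read_text().strip())
    raise FileNotFoundError(f"no memory usage file found in {cgroup_dir}")


# Demonstration against a temporary directory standing in for a cgroup.
with tempfile.TemporaryDirectory() as tmp:
    fake_cgroup = Path(tmp)
    (fake_cgroup / "memory.usage_in_bytes").write_text("220200960\n")
    usage = read_memory_usage_bytes(fake_cgroup)

print(usage)
```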

CPU metrics and the multi-core normalization question

Container CPU usage is reported as a percentage, but that percentage needs to be interpreted relative to how many CPU cores the container has access to, since a single-threaded process pegging one core at 100% looks very different on a host with 2 cores available to the container versus a host with 16:

docker run -d --name my-api --cpus=2 my-api
docker stats my-api
CPU %: 95.0%

docker stats reports CPU on a per-core scale, where 100% is one fully used core, so a container limited to 2 CPUs can report up to roughly 200%. A reading of 95% therefore means the container is using just under one core, a little less than half of its 2-CPU allotment; the same reading on a container limited to 1 CPU would mean it is close to being throttled. This is why CPU metrics should always be interpreted alongside the container's configured limit, not in isolation.
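The arithmetic behind that interpretation can be sketched in a few lines. The first function follows the calculation the Docker CLI is understood to use: the container's CPU time delta divided by the host's total CPU time delta, scaled by the number of online CPUs, so that 100.0 means one full core. The second function, whose name is hypothetical, normalizes that figure against a `--cpus` limit. The input numbers are invented for illustration.

```python
def cpu_percent(cpu_delta_ns: int, system_delta_ns: int, online_cpus: int) -> float:
    """CPU % on the per-core scale used by docker stats (100.0 = one full core)."""
    if cpu_delta_ns <= 0 or system_delta_ns <= 0:
        return 0.0
    return cpu_delta_ns / system_delta_ns * online_cpus * 100.0


def percent_of_limit(cpu_pct: float, cpu_limit: float) -> float:
    """Express a per-core percentage as a share of the container's --cpus limit."""
    return cpu_pct / (cpu_limit * 100.0) * 100.0


# Invented sample: over one second on a 16-core host, the host accumulated
# 16 s of CPU time in total and the container used 0.95 s of it.
usage = cpu_percent(950_000_000, 16_000_000_000, online_cpus=16)
share = percent_of_limit(usage, cpu_limit=2.0)
print(round(usage, 1), round(share, 1))
```

Alerting on the normalized share rather than the raw percentage keeps thresholds meaningful across containers with different limits.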

Memory metrics and what counts as usage

Memory usage reporting for containers includes both the application's actual working memory and, depending on the metric, page cache used for file I/O, which can make memory usage appear higher than what the application itself would report through its own internal accounting:

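docker stats --no-stream --format "{{.MemUsage}}" my-api
# /proc/meminfo inside a container normally shows host-wide figures, so read the cgroup breakdown instead (cgroup v2 path)
docker exec my-api cat /sys/fs/cgroup/memory.stat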

A container that approaches its memory limit because of page cache behaves differently from one that is exhausting memory through application allocations: the kernel can usually reclaim page cache before it resorts to an OOM kill, whereas anonymous application memory cannot simply be dropped. This distinction is useful when investigating an unexpected restart.
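To separate reclaimable cache from the rest, the total can be adjusted using the cgroup's `memory.stat` file. This sketch parses the file's `key value` lines and subtracts the inactive file cache, similar in spirit to the working-set figure that tools such as cAdvisor report. The field is named `inactive_file` on cgroup v2 and `total_inactive_file` on v1; the function names and sample numbers are invented for illustration.

```python
def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the `key value` lines of a cgroup memory.stat file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def usage_without_file_cache(usage_bytes: int, stats: dict[str, int]) -> int:
    """Subtract inactive file cache, the most readily reclaimable portion."""
    # cgroup v2 names the field `inactive_file`; v1 uses `total_inactive_file`.
    inactive = stats.get("inactive_file", stats.get("total_inactive_file", 0))
    return max(usage_bytes - inactive, 0)


# Invented sample contents of a memory.stat file.
sample_stat = "anon 150000000\nfile 60000000\ninactive_file 40000000\nactive_file 20000000\n"
stats = parse_memory_stat(sample_stat)
working_set = usage_without_file_cache(220200960, stats)
print(working_set)
```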

Network and disk I/O metrics

Network I/O metrics count traffic on the interfaces inside the container's own network namespace, and block I/O metrics count reads and writes to block devices that the kernel attributes to the container's cgroup. Together they provide a per-container view that a host-level network or disk monitoring tool, which sees only aggregate traffic across every process and container, cannot provide on its own:

docker stats --no-stream --format "{{.NetIO}} {{.BlockIO}}" my-api

This per-container breakdown is particularly useful for identifying which specific service, among many running on the same host, is responsible for an unusual spike in overall host network or disk activity.
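Finding the responsible container is mostly a matter of turning the human-readable figures back into numbers and ranking them. The sketch below assumes the `received / sent` format shown by docker stats, with decimal units such as kB and MB; the container names and readings are hypothetical.

```python
# docker stats prints NET I/O and BLOCK I/O using decimal (SI) units.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a size such as '850kB' into a byte count."""
    text = text.strip()
    # Check longer suffixes first so 'kB' is not mistaken for plain 'B'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return round(float(text[: -len(suffix)]) * UNITS[suffix])
    raise ValueError(f"unrecognized size: {text!r}")


def parse_io_pair(text: str) -> tuple[int, int]:
    """Split 'received / sent' (or 'read / written') into two byte counts."""
    first, second = text.split("/")
    return parse_size(first), parse_size(second)


# Hypothetical readings for two containers on the same host.
net_io = {"my-api": "1.2MB / 850kB", "worker": "48.5MB / 2.1MB"}
totals = {name: sum(parse_io_pair(value)) for name, value in net_io.items()}
busiest = max(totals, key=totals.get)
print(busiest, totals[busiest])
```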

Exposing container metrics to a metrics pipeline

For anything beyond ad hoc inspection, container metrics need to be exported to a system capable of retaining history and supporting alerting and dashboards. cAdvisor is a widely used tool built for this purpose: it reads the same cgroups data Docker itself uses and exposes it in a format that Prometheus and similar systems can scrape:

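# docker-compose.yml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

# prometheus.yml (assumes Prometheus can resolve the "cadvisor" hostname, for example on the same Compose network)
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']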

Running cAdvisor once per host, rather than instrumenting every individual application container separately for infrastructure-level metrics, is the standard approach, since this category of metric is about the container's resource consumption as observed by the runtime, not something the application itself needs to be aware of or expose on its own.
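Many of the series cAdvisor exposes are cumulative counters, such as `container_cpu_usage_seconds_total`, so the pipeline has to turn two samples into a rate before the numbers mean anything. The function below is a simplified sketch of that idea, not a reimplementation of PromQL's `rate()`; its name and the sample values are invented for illustration.

```python
def counter_rate(prev_value: float, prev_time: float,
                 curr_value: float, curr_time: float) -> float:
    """Per-second rate of increase between two samples of a cumulative counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if curr_value < prev_value:
        # The counter went backwards, which usually means the container
        # restarted; treat the current value as the increase since the reset.
        return curr_value / elapsed
    return (curr_value - prev_value) / elapsed


# Two invented samples of CPU seconds consumed, taken 60 seconds apart.
cores_used = counter_rate(100.0, 0.0, 130.0, 60.0)
print(cores_used)  # 0.5, i.e. half a core on average over the interval
```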

Container metrics versus application metrics

Container resource metrics answer "how much CPU, memory, and I/O did this container consume," while application-level metrics, instrumented separately inside the application code, answer "what did the application actually do": request counts, latencies, error rates, and business-specific measurements. Both are necessary and complementary: container metrics alone cannot tell you why memory usage spiked, only that it did, while application metrics alone cannot tell you whether the underlying container is actually resource-constrained.

const promClient = require('prom-client');
const httpRequestDuration = new promClient.Histogram({ name: 'http_request_duration_seconds', help: 'HTTP request duration in seconds' });

Common mistakes

  • Interpreting CPU percentage without accounting for the container's configured CPU limit, which gives a distorted picture of how close to capacity it actually is.
  • Treating memory usage metrics as solely reflecting application allocations, without accounting for page cache inclusion that can inflate the reported figure relative to what the application itself believes it is using.
  • Relying only on docker stats for production observability, with no persisted history available once the interactive session ends.
  • Collecting only infrastructure-level container metrics without also instrumenting application-level metrics, leaving no visibility into what the container was actually doing when a resource spike occurred.
  • Running cAdvisor or an equivalent metrics exporter inconsistently across hosts, leaving gaps in container-level visibility for whichever hosts were missed.

Container metrics, grounded directly in the kernel's own cgroups accounting, provide accurate, per-container resource visibility that complements rather than replaces application-level metrics. Turning that visibility into something actionable in production requires exporting it to a persistent, queryable metrics pipeline rather than relying on the interactive, non-retained view that docker stats provides on its own.
