15.1.1.2 Container Log Collection

A focused guide to Container Log Collection, connecting core concepts with practical Docker and container operations.

Container log collection is the architecture and tooling used to gather log output from many containers across one or more hosts and deliver it to a centralized destination. It is a distinct concern from how an individual container's logs are captured or formatted: the focus here is the collection topology itself, meaning where collector agents run, how they discover new containers, and how reliably they deliver what they collect.

Node-level collection agents

The most common collection architecture runs one collector agent per host, reading the log files Docker's json-file driver already writes to disk, rather than requiring every individual container to push its own logs to a remote destination directly:

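services:
  log-collector:
    image: fluent/fluent-bit
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Assumes fluent-bit.conf sits next to this Compose file; without this mount the image's default config is used.
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
    command: ["/fluent-bit/bin/fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.conf"]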

[INPUT]
    Name tail
    Path /var/lib/docker/containers/*/*.log
    Parser docker

This pattern means individual application containers need no logging configuration of their own beyond writing to stdout and stderr; the collector running once per host is solely responsible for finding, reading, and forwarding every container's log file on that host.
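
To make the collector's job concrete, the following minimal Python sketch shows what reading one json-file log involves: deriving the container ID from the file path and decoding each line. It illustrates the idea only and is not how Fluent Bit is implemented. The function names and the sample path and line are hypothetical, and the sketch assumes the driver's usual layout of one JSON object per line with log, stream, and time fields.

```python
import json
from pathlib import PurePosixPath


def container_id_from_path(path: str) -> str:
    """Return the container ID encoded in a json-file log path.

    Assumes the layout <root>/<container-id>/<container-id>-json.log.
    """
    return PurePosixPath(path).parent.name


def parse_json_file_line(line: str) -> dict:
    """Decode one json-file log line into a flat record."""
    entry = json.loads(line)
    return {
        "message": entry["log"].rstrip("\n"),
        "stream": entry["stream"],
        "time": entry["time"],
    }


# Hypothetical sample data for illustration only.
sample_path = "/var/lib/docker/containers/abc123/abc123-json.log"
sample_line = (
    '{"log":"listening on :8080\\n","stream":"stdout",'
    '"time":"2024-01-01T00:00:00.000000000Z"}'
)

record = parse_json_file_line(sample_line)
record["container_id"] = container_id_from_path(sample_path)
print(record)
```

The container ID is the only identity the file path carries, which is why the enrichment step described next matters.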

Container metadata enrichment

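A node-level collector reading raw json-file logs from disk sees only the log content and the container ID embedded in the file path, with no context about which container name, image, or Compose service produced it, unless that metadata is attached explicitly. Fluent Bit has no built-in Docker metadata filter, and Docker_Mode is an option of the tail input that rejoins long lines split by the json-file driver rather than an enrichment setting. One workable approach is to carry the container ID in the tag and have Docker write selected labels into each log line through the json-file driver's labels option, which adds them under an attrs key:

[INPUT]
    Name tail
    Path /var/lib/docker/containers/*/*.log
    Parser docker
    Docker_Mode On
    Tag docker.<container_id>
    Tag_Regex (?<container_id>[a-f0-9]{64})-json\.log$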

docker inspect --format '{{.Config.Labels}}' my-api

Enrichment filters that attach container name, image, and any Compose-assigned labels to each log line before forwarding are what make centralized logs genuinely searchable by service or deployment, rather than requiring an operator to separately cross-reference container IDs against docker ps output to figure out what produced a given log line.
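
As a rough illustration of what enrichment does to each record, the sketch below builds a lookup table from docker inspect JSON output and merges the matching entry into a log record. This is a simplified stand-in, not a real collector plugin: the function names, the trimmed inspect output, and the image tag are hypothetical. A production collector would also refresh the table as containers start and stop.

```python
import json


def build_metadata_index(inspect_json: str) -> dict:
    """Map container ID -> metadata taken from `docker inspect` JSON output."""
    index = {}
    for container in json.loads(inspect_json):
        labels = container["Config"].get("Labels") or {}
        index[container["Id"]] = {
            # docker inspect reports names with a leading slash.
            "container_name": container["Name"].lstrip("/"),
            "image": container["Config"]["Image"],
            "compose_service": labels.get("com.docker.compose.service"),
        }
    return index


def enrich(record: dict, container_id: str, index: dict) -> dict:
    """Return a copy of record with container metadata attached, if known."""
    enriched = dict(record, container_id=container_id)
    enriched.update(index.get(container_id, {}))
    return enriched


# Trimmed, hypothetical `docker inspect my-api` output.
inspect_output = json.dumps([{
    "Id": "abc123",
    "Name": "/my-api",
    "Config": {
        "Image": "my-api:1.4",
        "Labels": {"com.docker.compose.service": "api"},
    },
}])

index = build_metadata_index(inspect_output)
enriched = enrich({"message": "listening on :8080"}, "abc123", index)
print(enriched)
```

Records from containers missing from the table pass through with only the container ID, so a stale table degrades searchability without dropping log lines.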

Native logging drivers as an alternative collection path

Rather than reading log files from disk, Docker's logging drivers can push log output directly to a remote destination as it is produced, removing the node-level collector entirely from the path for containers configured this way:

services:
  api:
    logging:
      driver: syslog
      options:
        syslog-address: "udp://logs.example.com:514"

services:
  api:
    logging:
      driver: fluentd
      options:
        fluentd-address: "fluentd-host:24224"
        tag: "docker.{{.Name}}"

This approach avoids the dependency on log files persisting on local disk, which matters in particular for ephemeral or frequently recreated hosts. On Docker Engine versions before 20.10 it also meant that docker logs stopped working for that container, because output went only to the remote driver. From 20.10 onward the engine keeps a local cache of recent output (dual logging), so docker logs continues to work unless that cache is turned off with the cache-disabled log option.

Sidecar collection for per-application customization

Instead of a single, shared node-level collector, a sidecar container running alongside each application container provides per-application customization of collection behavior, at the cost of running one additional container for every application container being observed:

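services:
  api:
    image: my-api
    volumes:
      - api-logs:/var/log/app
  log-shipper:
    # Official Vector image; pin a specific release tag in practice.
    image: timberio/vector:latest-alpine
    volumes:
      - api-logs:/var/log/app:ro
      # Assumed location of the per-application config on the host.
      - ./vector/api.toml:/etc/vector/api.toml:ro
    # The image's entrypoint is already the vector binary, so pass only its arguments.
    command: ["--config", "/etc/vector/api.toml"]

volumes:
  api-logs: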

This pattern is more resource-intensive than a single shared node-level collector, but is useful when different applications genuinely need very different parsing rules, retention policies, or destinations that a single shared collector configuration could not reasonably express.

Handling collection reliability and backpressure

A collection pipeline needs to handle the case where the downstream destination is temporarily unreachable, buffering output locally rather than dropping it or, worse, blocking the application container's own execution while waiting for a slow or unavailable log destination:

[OUTPUT]
    Name forward
    Match *
    Host logging-aggregator
    Retry_Limit 5
    storage.total_limit_size 1G

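A collector configured with bounded local buffering and a retry policy can absorb a brief outage of the downstream aggregation system without losing log data or putting backpressure on the applications being observed, up to the configured limits. In Fluent Bit, storage.total_limit_size caps the filesystem-buffered chunks queued for that output, so it only takes effect when filesystem buffering is enabled (storage.path in the [SERVICE] section and storage.type filesystem on the input), and a chunk that exhausts Retry_Limit is discarded.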
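
The trade-off between bounded buffering and retries can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Fluent Bit's implementation: the BoundedBuffer class and its byte accounting are hypothetical, it evicts the oldest lines when the cap is exceeded, and unlike the Retry_Limit behavior described above it keeps the data after a failed flush so a later attempt can succeed.

```python
from collections import deque


class BoundedBuffer:
    """FIFO buffer of log lines capped by total bytes; evicts oldest first."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.size = 0
        self.dropped = 0
        self._items = deque()

    def append(self, line: str) -> None:
        self._items.append(line)
        self.size += len(line.encode("utf-8"))
        # Evict the oldest entries instead of blocking the producer.
        while self.size > self.max_bytes and self._items:
            old = self._items.popleft()
            self.size -= len(old.encode("utf-8"))
            self.dropped += 1

    def flush(self, send, retry_limit: int = 5) -> bool:
        """Try to deliver everything; keep the data if every attempt fails."""
        batch = list(self._items)
        for _ in range(retry_limit):
            try:
                send(batch)
            except ConnectionError:
                continue  # a real agent would back off between attempts
            self._items.clear()
            self.size = 0
            return True
        return False


def unreachable(batch):
    """Stand-in for a destination that is currently down."""
    raise ConnectionError("aggregator down")


buf = BoundedBuffer(max_bytes=10)
for line in ["aaaa", "bbbb", "cccc"]:
    buf.append(line)  # the third line pushes the total to 12 bytes

delivered = []
during_outage = buf.flush(unreachable)
after_recovery = buf.flush(delivered.extend)
print(buf.dropped, during_outage, after_recovery, delivered)
```

The dropped counter is the number worth exporting as a metric: it shows when an outage has outlasted the buffer.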

Collecting logs from short-lived containers

Containers that run briefly, such as a one-off migration job or a scheduled batch task, can exit and be removed before a polling-based collector has a chance to read their full log output. This is a particular risk for collection architectures that scan for new log files on an interval rather than reacting to container lifecycle events directly:

set -o pipefail  # report the migration's exit status instead of tee's
docker run --rm my-api npm run migrate 2>&1 | tee -a "/var/log/migrations/migrate-$(date +%Y%m%d).log"

Explicitly capturing output from short-lived, one-off container runs through a method that does not depend on the standard collection pipeline's polling interval, such as redirecting output directly during the run itself, is a reasonable safeguard for this specific case.
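
When such runs are launched from a script rather than a shell, the same safeguard can be sketched in Python: stream the combined output to a file as it is produced and return the real exit code. The run_and_capture helper and the log path are hypothetical, and the demo uses a stand-in Python command so it runs without Docker. In practice the command list would be something like ["docker", "run", "--rm", "my-api", "npm", "run", "migrate"].

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture(cmd: list[str], log_path: Path) -> int:
    """Run cmd, append its combined stdout/stderr to log_path, echo it, return the exit code."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible to the operator
            log_file.write(line)    # and persisted independently of the collector
        return proc.wait()


# Stand-in for the short-lived container run; exits with a nonzero status.
demo_log = Path(tempfile.mkdtemp()) / "migrations" / "migrate.log"
exit_code = run_and_capture(
    [sys.executable, "-c", "print('migration ok'); raise SystemExit(3)"],
    demo_log,
)
print("exit code:", exit_code)
```

Because the exit code is returned rather than swallowed, a scheduler calling this helper can still fail the job when the migration fails.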

Scaling collection across many hosts

As the number of hosts grows, a tiered collection architecture generally scales better and is easier to operate. In this arrangement, lightweight per-host agents forward to a smaller number of regional or central aggregators, rather than every individual agent connecting directly to the final storage backend:

[ host agents ] → [ regional aggregator ] → [ central log storage ]

This tiering also provides a natural point for filtering, sampling, or pre-aggregating high-volume logs before they reach the most expensive, long-term storage tier, which matters for controlling cost as log volume grows with the number of containers and hosts being observed.
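
A minimal sketch of the kind of sampling an aggregator tier might apply is shown below. It assumes each record carries a level field, which is not something Docker adds on its own, and the Sampler class and severity names are hypothetical. High-severity records always pass, and one in every N of the rest is kept.

```python
ALWAYS_KEEP = {"warn", "error", "fatal"}


class Sampler:
    """Keep every high-severity record and one in `keep_one_in` of the others."""

    def __init__(self, keep_one_in: int):
        if keep_one_in < 1:
            raise ValueError("keep_one_in must be at least 1")
        self.keep_one_in = keep_one_in
        self._seen = 0

    def keep(self, record: dict) -> bool:
        if record.get("level", "info").lower() in ALWAYS_KEEP:
            return True
        self._seen += 1
        # Keeps the 1st, (N+1)th, (2N+1)th ... low-severity record.
        return (self._seen - 1) % self.keep_one_in == 0


# Hypothetical traffic: 100 debug lines followed by 5 errors.
records = [{"level": "debug", "message": f"tick {i}"} for i in range(100)]
records += [{"level": "error", "message": f"failure {i}"} for i in range(5)]

sampler = Sampler(keep_one_in=10)
kept = [r for r in records if sampler.keep(r)]
print(len(kept), "of", len(records), "records forwarded")
```

Counter-based sampling like this is predictable but keeps no relationship between related lines, so a real pipeline may prefer sampling per request or per container instead.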

Common mistakes

Running a node-level collector without enriching log lines with container metadata, leaving centralized logs difficult to attribute to a specific service or deployment.
Choosing a native push-based logging driver without realizing that, on Docker Engine versions before 20.10 or with the dual-logging cache disabled, it removes local docker logs access for that container.
Deploying per-application sidecar collectors uniformly when a single shared node-level collector would have been sufficient and considerably less resource-intensive.
Configuring a collector with no local buffering, causing log loss during any brief outage of the downstream aggregation destination.
Relying entirely on polling-based collection for short-lived, one-off container runs that may exit before the next poll interval captures their output.

Container log collection architecture should be chosen based on host count, the diversity of applications being observed, and reliability requirements. A shared node-level collector reading enriched, locally buffered log files is the right default for most deployments, with sidecars and native push drivers reserved for cases with genuinely different per-application needs or constraints that the shared default cannot accommodate.