15.1.3.2 Excessive Log Size

A focused guide to excessive log size, connecting core concepts with practical Docker and container operations.

Excessive log size, in a Docker context, means a container's log output has grown large enough to consume meaningful disk space, network bandwidth, or aggregation system cost. Addressing it starts with deciding whether the volume is a genuine signal worth investigating or simply unbounded growth that needs to be capped, because the right fix differs depending on which is actually happening.

Identifying where the volume is coming from

Before applying a fix, confirming which container and which specific log lines are actually responsible for excessive volume avoids treating a narrow problem with a broad, unnecessary solution:

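sudo sh -c 'du -sh /var/lib/docker/containers/*' | sort -rh | head -10   # dirs are full container IDs; glob must expand as root
docker ps -a --no-trunc --format '{{.ID}}  {{.Names}}'                  # map those IDs to names

docker logs --since 1h my-api 2>&1 | wc -l   # 2>&1: the container's stderr is replayed on stderr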

Comparing line counts across containers over the same time window quickly shows whether one service is disproportionately responsible, which is the common case, or whether every container contributes roughly equally to overall volume.
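Once a line count has been collected for each container over the same window (for example, by running the docker logs command above once per container), a few lines of JavaScript can rank the results and show each container's share of the total. This is a sketch: the rankLogVolume name, the plain { name: lineCount } input shape, and the 50% dominance threshold are illustrative assumptions, not part of any Docker tooling.

```javascript
// Hypothetical helper: rank containers by log line count over a shared time window
// and flag any single container whose share of the total exceeds a threshold.
function rankLogVolume(counts, shareThreshold = 0.5) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  return Object.entries(counts)
    .map(([name, lines]) => {
      const share = total === 0 ? 0 : lines / total;
      return { name, lines, share, dominant: share > shareThreshold };
    })
    .sort((a, b) => b.lines - a.lines); // largest producer first
}

// Illustrative counts for a one-hour window.
const ranked = rankLogVolume({ 'my-api': 182000, worker: 9000, nginx: 9000 });
for (const { name, lines, share, dominant } of ranked) {
  const pct = (share * 100).toFixed(1);
  console.log(`${name}\t${lines}\t${pct}%${dominant ? '\t<- dominant' : ''}`);
}
```

A single dominant container narrows the investigation to one codebase; an even spread suggests looking at something the services share, such as a common base configuration.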

Distinguishing useful verbosity from noise

A high volume of useful, structured log lines is a different problem from a high volume of repetitive, low-value noise, and each calls for a different fix: reduce noise at the source, but manage volume that is actually worth retaining:

logger.debug({ key }, 'Cache miss'); // potentially very high volume, low individual value

logger.info({ orderId, amount }, 'Payment processed'); // lower volume, high individual value

Auditing which log statements fire most frequently, and questioning whether each one genuinely needs to exist at its current level, is usually more effective than uniformly suppressing volume across the board, which risks losing genuinely valuable signal along with the noise.
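One way to run that audit is to count how often each (level, message) pair appears in a captured sample of output. The sketch below assumes the application writes pino's default newline-delimited JSON, with a numeric level field and a msg field, and that a sample has been captured, for example with docker logs --since 1h my-api > sample.log 2>&1. The topLogStatements name is hypothetical, and lines that are not JSON objects are skipped.

```javascript
// Hypothetical audit: count (level, message) pairs in newline-delimited JSON logs,
// such as pino's default output, and return the most frequent ones.
const LEVEL_NAMES = { 10: 'trace', 20: 'debug', 30: 'info', 40: 'warn', 50: 'error', 60: 'fatal' };

function topLogStatements(text, limit = 10) {
  const counts = new Map();
  for (const line of text.split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // not JSON, e.g. a startup banner
    }
    if (entry === null || typeof entry !== 'object') continue;
    const level = LEVEL_NAMES[entry.level] ?? String(entry.level);
    const key = `${level}|${entry.msg}`;
    const existing = counts.get(key);
    if (existing) existing.count += 1;
    else counts.set(key, { level, msg: entry.msg, count: 1 });
  }
  return [...counts.values()].sort((a, b) => b.count - a.count).slice(0, limit);
}

const sample = [
  '{"level":20,"msg":"Cache miss","key":"a"}',
  '{"level":20,"msg":"Cache miss","key":"b"}',
  '{"level":30,"msg":"Payment processed","orderId":1}',
  'Server listening on port 3000',
].join('\n');
console.table(topLogStatements(sample));
```

Grouping on the message string works best when messages are constant and variable data lives in the merged object, as in logger.info({ orderId }, 'Payment processed'); messages with interpolated values would each count as a separate statement.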

Reducing volume at the application level

The most direct fix for excessive log size is reducing what the application logs in the first place, particularly debug-level statements left enabled in production or per-item logging inside loops that scale with data volume rather than with request volume:

const logger = require('pino')({ level: process.env.LOG_LEVEL || 'info' });

logger.info({ itemCount: items.length }, 'Batch processed'); // pino: merge object first, then message; one line instead of one per item

Lowering the default log level in production, while keeping debug-level logging available through a configuration change rather than a code change, retains the ability to temporarily increase verbosity during an actual investigation without paying the cost of that verbosity continuously.
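pino applies the level option itself, so the following is only a sketch of the underlying idea rather than pino's implementation: resolve LOG_LEVEL once at startup, fall back to a safe default when it is missing or unrecognized, and compare numeric severities on each call. The numeric values match pino's standard levels; resolveLevel and shouldLog are hypothetical names, and falling back instead of failing on an unknown value is a design choice of this sketch.

```javascript
// Sketch of environment-driven level gating using pino's standard numeric levels.
// Not pino's own code; function names are illustrative.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Normalize LOG_LEVEL and fall back to a default if it is unset or unknown.
function resolveLevel(envValue, fallback = 'info') {
  const name = String(envValue ?? '').trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(LEVELS, name) ? name : fallback;
}

// A call is emitted only if its severity is at or above the active level.
function shouldLog(callLevel, activeLevel) {
  return LEVELS[callLevel] >= LEVELS[activeLevel];
}

const active = resolveLevel(process.env.LOG_LEVEL);
console.log(`active level: ${active}, debug enabled: ${shouldLog('debug', active)}`);
```

With pino itself, the level of a running logger can also be changed by assigning to its level property (for example logger.level = 'debug'), provided the application exposes some way to trigger that assignment during an investigation.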

Sampling high-volume, low-severity output

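For log volume that is useful in aggregate but too costly to retain in full, sampling or rate-limiting a specific high-volume stream, while leaving higher-severity output untouched, preserves diagnostic value while controlling size. Fluent Bit's throttle filter, for example, caps the rate of records on a matching tag (here assuming debug output is routed under its own api.debug tag) and drops the excess rather than keeping a representative sample: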

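# Drop api.debug records once they exceed an average of 100 per second,
# measured over a sliding window of 60 one-second intervals
[FILTER]
    Name     throttle
    Match    api.debug
    Rate     100
    Window   60
    Interval 1s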

Applying sampling selectively to specific, identified high-volume streams, rather than uniformly across all log levels, ensures error and warning-level output, which is usually lower volume and higher value, is never affected by a sampling rule intended for routine, high-frequency events.
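The throttle filter drops records above a rate regardless of their content. When a representative sample is preferable, the selection can instead be made in the application before a line is written. The sketch below assumes pino-style level names and keeps every record at warn or above, plus the first of every N occurrences of each lower-severity message. createSampler and its options are hypothetical, and counting is used instead of randomness so the result is reproducible.

```javascript
// Hypothetical severity-aware sampler: records at or above `alwaysKeepLevel` always pass;
// lower-severity records are kept 1-in-N, counted separately for each message.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

function createSampler({ keepOneIn = 100, alwaysKeepLevel = 'warn' } = {}) {
  const seen = new Map(); // message -> occurrences seen so far at low severity
  return function shouldKeep(level, msg) {
    const rank = LEVELS[level];
    // Unknown levels are kept rather than risk dropping something important.
    if (rank === undefined || rank >= LEVELS[alwaysKeepLevel]) return true;
    const n = (seen.get(msg) ?? 0) + 1;
    seen.set(msg, n);
    return (n - 1) % keepOneIn === 0; // keeps the 1st, (N+1)th, (2N+1)th, ...
  };
}

const keep = createSampler({ keepOneIn: 10 });
let kept = 0;
for (let i = 0; i < 1000; i++) {
  if (keep('debug', 'Cache miss')) kept++;
}
console.log(`debug records kept: ${kept} of 1000`);
console.log(`error record kept: ${keep('error', 'Payment failed')}`);
```

Counting per message means a rare debug message is still seen at least once; the map stays small only if messages are constant strings with variable data kept in the merged object.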

Bounding local storage regardless of root cause

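Independent of fixing the underlying volume, bounding local log storage protects against disk exhaustion while the cause is being addressed. Rotation can be set per container with the --log-opt flag, or as a daemon-wide default in /etc/docker/daemon.json; the daemon-wide setting takes effect after the daemon is restarted and applies only to containers created afterwards: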

docker run -d --log-opt max-size=10m --log-opt max-file=3 my-api

{
  "log-driver": "local",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
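With rotation in place, the worst-case footprint is easy to estimate: each container keeps at most max-file files of up to max-size each. The sketch below turns log-opts values into a host-wide upper bound. It treats k, m and g as powers of 1000, which is adequate for a rough disk budget, and the 40-container count is an illustrative figure. Rotated files may be compressed (the local driver does this by default), so actual usage is usually lower than this bound.

```javascript
// Rough upper bound on disk used by rotated container logs:
// containers x max-file x max-size. Units k/m/g are treated as powers of 1000 (an approximation).
const UNIT_BYTES = { k: 1e3, m: 1e6, g: 1e9 };

function parseSize(value) {
  const match = /^(\d+)([kmg])?$/i.exec(String(value).trim());
  if (!match) throw new Error(`unrecognized size: ${value}`);
  const multiplier = match[2] ? UNIT_BYTES[match[2].toLowerCase()] : 1;
  return Number(match[1]) * multiplier;
}

function worstCaseLogBytes(logOpts, containerCount) {
  if (logOpts['max-size'] === undefined || logOpts['max-file'] === undefined) {
    throw new Error('this sketch needs both max-size and max-file to be set');
  }
  return parseSize(logOpts['max-size']) * Number(logOpts['max-file']) * containerCount;
}

// Same values as the daemon.json example; 40 containers is an illustrative figure.
const logOpts = { 'max-size': '10m', 'max-file': '3' };
const bytes = worstCaseLogBytes(logOpts, 40);
console.log(`upper bound: ${(bytes / 1e9).toFixed(1)} GB`);
```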

Rotation is a necessary safeguard, but on its own it is not a fix for excessive size: it limits the damage of unbounded growth without addressing why the volume is high. Worse, it can mask a problem, such as an application logging far more than expected, that would otherwise have been a visible signal worth investigating.

Cost implications for centralized aggregation

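For logs forwarded to a centralized aggregation system, which is often priced by usage, excessive volume translates directly into cost, so fixing the root cause is worth more there than in a setup that only stores logs locally. A retention policy, as in this CloudWatch Logs example, limits how long stored data keeps accruing charges, but it does not reduce ingestion charges, which scale with the volume sent: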

aws logs put-retention-policy --log-group-name my-app-logs --retention-in-days 14

Periodically reviewing per-service log volume against the aggregation system's billing model shows whether a recent change (a new feature, a misconfigured retry loop, an accidentally enabled debug flag) introduced an unintended volume increase that is now driving unnecessary cost.
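A periodic review can be as simple as comparing each service's volume for the current period with a baseline period. In the sketch below, flagVolumeIncreases is a hypothetical helper, the per-service figures are assumed to come from the aggregation system's usage reporting in any consistent unit, and the 1.5x threshold is an arbitrary starting point.

```javascript
// Hypothetical review helper: flag services whose log volume grew by at least
// `ratioThreshold` relative to a baseline period, largest absolute increase first.
function flagVolumeIncreases(baseline, current, ratioThreshold = 1.5) {
  const flagged = [];
  for (const [service, now] of Object.entries(current)) {
    const before = baseline[service] ?? 0;
    if (now === 0) continue; // nothing sent this period
    const ratio = before === 0 ? Infinity : now / before; // new services are always flagged
    if (ratio >= ratioThreshold) {
      flagged.push({ service, before, now, ratio, increase: now - before });
    }
  }
  return flagged.sort((a, b) => b.increase - a.increase);
}

// Illustrative weekly volumes in GB.
const baseline = { 'my-api': 12, worker: 3, nginx: 5 };
const current = { 'my-api': 41, worker: 3.2, nginx: 5, 'report-job': 2 };
for (const f of flagVolumeIncreases(baseline, current)) {
  console.log(`${f.service}: ${f.before} -> ${f.now} (+${f.increase})`);
}
```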

Excessive volume as an early warning signal

A sudden, unexplained spike in log volume is sometimes the first visible symptom of a different problem entirely: a retry loop retrying far more aggressively than intended, an error condition being hit repeatedly, or a failing dependency triggering continuous error logging:

docker logs --since 10m my-api 2>&1 | grep -c ERROR

Treating an unexpected log volume spike as worth investigating in its own right, rather than only as a storage problem to suppress, occasionally surfaces a real, ongoing issue that the volume increase was actually reporting.
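A single grep over the last ten minutes gives one number; spotting a spike needs a comparison over time. The sketch below assumes output captured with docker logs --timestamps --since 1h my-api 2>&1, where each line begins with an RFC 3339 timestamp. It buckets lines per minute and flags minutes whose line count exceeds a multiple of the median. findVolumeSpikes and the factor of 5 are illustrative choices, and matching the literal string ERROR mirrors the grep above rather than any particular log format.

```javascript
// Hypothetical spike check over `docker logs --timestamps` output, where each line
// starts with an RFC 3339 timestamp such as 2024-05-01T12:34:56.789012345Z.
const MINUTE_PREFIX = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})/;

function findVolumeSpikes(text, factor = 5) {
  const perMinute = new Map(); // "YYYY-MM-DDTHH:MM" -> { lines, errors }
  for (const line of text.split('\n')) {
    const match = MINUTE_PREFIX.exec(line);
    if (!match) continue;
    const bucket = perMinute.get(match[1]) ?? { lines: 0, errors: 0 };
    bucket.lines += 1;
    if (line.includes('ERROR')) bucket.errors += 1;
    perMinute.set(match[1], bucket);
  }
  if (perMinute.size === 0) return [];
  // The upper median of per-minute line counts serves as the baseline.
  const counts = [...perMinute.values()].map((b) => b.lines).sort((a, b) => a - b);
  const median = counts[Math.floor(counts.length / 2)];
  return [...perMinute.entries()]
    .filter(([, b]) => b.lines > factor * median)
    .map(([minute, b]) => ({ minute, ...b }));
}

// Synthetic input: five quiet minutes, then a minute dominated by errors.
const lines = [];
for (let m = 0; m < 6; m++) {
  const count = m === 5 ? 200 : 10;
  for (let i = 0; i < count; i++) {
    const msg = m === 5 && i < 150 ? 'ERROR upstream timeout' : 'INFO request handled';
    lines.push(`2024-05-01T12:0${m}:00.000000000Z ${msg}`);
  }
}
console.log(findVolumeSpikes(lines.join('\n')));
```

Minutes with no output at all never become buckets, so for very bursty services a fixed expected rate may be a better baseline than the median.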

Common mistakes

Applying rotation or sampling to mask excessive volume without ever investigating whether the volume itself indicates an underlying problem worth fixing.
Leaving debug-level logging enabled in production by default, rather than making verbosity adjustable without a code change.
Sampling uniformly across all severity levels instead of selectively targeting specific, identified high-volume, low-value log statements.
Not reviewing aggregation system costs against per-service log volume, missing an unintentional and ongoing cost increase caused by a volume spike.
Treating a sudden volume increase purely as a storage nuisance rather than as a potential early signal of a retry loop, repeated error, or other ongoing issue.

Addressing excessive log size effectively starts with identifying the actual source and nature of the volume. From there, it means reducing low-value output deliberately at the application level or sampling it in the pipeline, and treating storage bounds as a safeguard rather than a substitute for understanding, and where appropriate fixing, whatever is generating an unexpectedly large amount of log output.