16.3.2.4 Large Log Bloat
A focused guide to Large Log Bloat, connecting core concepts with practical Docker and container operations.
Large log bloat is the disk consumption pattern in which one or more containers' log files grow far beyond their expected size. It is frequently the single largest contributor to a sudden disk exhaustion incident. Resolving it means finding the responsible container quickly, reclaiming space safely, and fixing the underlying cause, which in turn requires understanding the risks of manipulating Docker's log files outside of Docker's own tooling.
Finding the largest log files on a host
Container log files for the default json-file driver live in predictable locations under Docker's data directory, so a direct filesystem search is the fastest way to identify the worst offenders across an entire host at once. The directory is normally readable only by root, so run the search with root privileges:
find /var/lib/docker/containers -name "*-json.log" -exec du -h {} \; | sort -rh | head -10
4.2G /var/lib/docker/containers/3f29a8c1.../3f29a8c1...-json.log
180M /var/lib/docker/containers/a8b9c2d3.../a8b9c2d3...-json.log
A single log file reporting several gigabytes, especially on a host with otherwise modest log volume across its other containers, is an immediate, strong signal of exactly where to focus attention first.
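The same search can be wrapped in a small helper that reports exact byte counts and also picks up rotated files. This is only a sketch: the function name `largest_logs` is hypothetical, it relies on GNU `find -printf`, and it assumes rotated files carry a numeric suffix after `-json.log`.

```shell
# List the N largest container log files under a Docker containers directory.
# Output format: "<size in bytes> <path>", largest first.
# The trailing wildcard also matches rotated files such as *-json.log.1.
largest_logs() {
  local root="${1:-/var/lib/docker/containers}"
  local count="${2:-10}"
  find "$root" -type f -name '*-json.log*' -printf '%s %p\n' 2>/dev/null \
    | sort -rn \
    | head -n "$count"
}
```

Calling `largest_logs /var/lib/docker/containers 5` as root would print the five largest files; passing the directory as an argument also makes the helper easy to try against a scratch directory first.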
Identifying which container a log file belongs to
The directory name in the path corresponds to the container's full ID, which can be matched back to a human-readable container name directly:
docker ps -a --no-trunc --format "{{.ID}} {{.Names}}" | grep 3f29a8c1
3f29a8c1d8e2... my-api
This quickly connects an oversized log file discovered through a filesystem search back to the specific, named container responsible for it, which is necessary before deciding on either an emergency mitigation or a proper, underlying fix.
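The path-to-name lookup can also be scripted rather than copied by hand. The sketch below uses two hypothetical helper names; the first only parses the path, while the second assumes access to the Docker daemon and uses `docker inspect`, which prints the container name with a leading slash.

```shell
# Extract the container ID from a json-file log path.
# The ID is the name of the directory that holds the log file.
container_id_from_log() {
  basename "$(dirname "$1")"
}

# Resolve a log path to a container name (needs access to the Docker daemon).
# The leading slash that docker inspect prints is stripped.
container_name_from_log() {
  local id
  id=$(container_id_from_log "$1") || return 1
  docker inspect --format '{{.Name}}' "$id" | sed 's|^/||'
}
```

Because `docker inspect` accepts a full container ID directly, this avoids filtering `docker ps` output with `grep`.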
Confirming the cause before truncating
Before truncating anything, a brief check of the log's actual recent content clarifies whether this is a genuine, ongoing problem (a repeating error loop, runaway debug logging) or whether it reflects legitimate, if unusually high, log volume from a brief but intense period of activity that may have already subsided:
tail -50 /var/lib/docker/containers/3f29a8c1.../3f29a8c1...-json.log
A log dominated by the exact same repeated error message, appearing many times per second, is the clearest signature of a runaway logging loop specifically, as opposed to a large but varied, legitimate log history accumulated gradually over a long uptime period.
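Counting repeated messages makes that signature easier to see than reading raw output. The following is a rough sketch with a hypothetical function name; it assumes each line of the file has the usual json-file shape, `{"log":"...","stream":"...","time":"..."}`, and extracts the message with `sed` rather than a real JSON parser, so unusual entries may be miscounted.

```shell
# Print the five most frequently repeated messages among the last N lines
# of a json-file log (default: last 1000 lines).
# Assumes lines look like: {"log":"message\n","stream":"stdout","time":"..."}
top_repeated_messages() {
  local file="$1"
  local lines="${2:-1000}"
  tail -n "$lines" "$file" \
    | sed -E 's/^\{"log":"(.*)","stream":"[a-z]+".*$/\1/' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n 5
}
```

If one message accounts for nearly all of the sampled lines, a runaway loop is the likely cause; a flat distribution points toward legitimately accumulated history.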
The risk of truncating a json-file log directly
Truncating the log file directly with standard shell tools recovers disk space immediately, but doing so outside of Docker's own awareness can, depending on the specific Docker version and how docker logs tracks its read position within the file, produce confusing or inconsistent behavior the next time docker logs is used against that container:
truncate -s 0 /var/lib/docker/containers/3f29a8c1.../3f29a8c1...-json.log
This is a reasonable emergency measure when disk space must be recovered immediately, but it is a blunt intervention outside Docker's normal tooling, not a routine practice. Truncate the file in place rather than deleting it: the daemon keeps the file open, so removing it with rm does not free the space until Docker closes its handle on it. If docker logs behaves unexpectedly after truncation, restarting the affected container is a worthwhile follow-up.
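One way to keep an emergency truncation deliberate is to guard it with a size threshold, so that a mistyped path or a small file is left alone. This is a minimal sketch, not a recommended routine: the function name is hypothetical, and it assumes GNU `stat` and `truncate`.

```shell
# Truncate a log file in place, but only if it is larger than max_bytes.
# Prints what it did so the action shows up in a shell history or incident log.
truncate_if_larger() {
  local file="$1" max_bytes="$2"
  local size
  size=$(stat -c %s "$file") || return 1
  if [ "$size" -le "$max_bytes" ]; then
    echo "skipped: $file ($size bytes)"
    return 0
  fi
  truncate -s 0 "$file"
  echo "truncated: $file (was $size bytes)"
}
```

For example, `truncate_if_larger <path-to-log> 1073741824` would empty the file only if it exceeds 1 GiB.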
Recreating the container as a cleaner alternative
Recreating the container, rather than truncating its log file directly, generally produces a cleaner result. Note that docker restart alone is not enough: a restarted container keeps its ID and continues appending to the same log file. Only removing the container and creating it again gives it a fresh log file through Docker's own normal lifecycle:
docker rm -f my-api
docker run -d --name my-api <original options> <image>
This has the obvious trade-off of stopping and replacing the running application, with whatever brief disruption that causes, and it discards the old container's logs and writable layer. Whether that is acceptable depends on the urgency of the disk space recovery versus the cost of replacing that specific service at that specific moment.
Addressing the actual underlying cause
Whichever immediate mitigation is used, the underlying cause, almost always either a missing rotation configuration or an application bug producing excessive, repetitive logging, needs to be addressed directly to prevent the exact same situation from recurring shortly afterward:
docker inspect my-api --format '{{.HostConfig.LogConfig}}'
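docker run -d --name my-api --log-opt max-size=10m --log-opt max-file=3 <image>
Logging options are fixed when a container is created, and docker update cannot change them, so applying limits to an existing container means removing it and recreating it with the --log-opt flags. More comprehensively, setting the same limits under log-opts in /etc/docker/daemon.json makes them the daemon-wide default for every container created afterward; existing containers keep their original settings until they are recreated. Either route closes the structural gap that allowed unlimited log growth in the first place.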
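A daemon-wide default is set through the `log-driver` and `log-opts` keys in Docker's daemon configuration file. The sketch below writes a sample file with the same 10m and 3-file limits; the function name is hypothetical, and the target path is an argument so that an existing configuration is not overwritten by accident. In practice these keys would be merged into the existing file by hand.

```shell
# Write a sample daemon-wide logging default to the given path.
# Review and merge the result into /etc/docker/daemon.json manually,
# since a live daemon.json may already contain other settings.
write_log_defaults() {
  local target="$1"
  cat > "$target" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
}
```

The daemon has to be restarted to pick up the change, and the new defaults apply only to containers created after that point.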
sum(rate({container="my-api"} |= "ERROR" [5m]))
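If the cause was an application-level error loop, fixing that bug is the only way to stop the excessive log volume at its source, rather than continuing to manage its disk consequences after the fact. Where logs are shipped to a system such as Grafana Loki, an error-rate query like the LogQL expression above helps confirm whether the loop is still active and whether a fix has actually reduced the volume.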
Monitoring to catch this earlier next time
Establishing monitoring specifically for individual container log file sizes, or for sudden, sharp increases in log volume, catches this category of problem while it is still a minor, easily addressed issue rather than waiting for it to escalate into a full disk exhaustion emergency:
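# Run from root's crontab; mail is sent only when at least one log exceeds 1G.
*/15 * * * * out=$(find /var/lib/docker/containers -name "*-json.log" -size +1G -exec echo "Large log: {}" \;); [ -n "$out" ] && echo "$out" | mail -s "Large container log alert" ops@example.com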
A simple, scheduled check like this, alerting whenever any container's log file crosses a defined size threshold, provides an early warning specifically for this pattern well before it grows large enough to threaten overall host disk capacity.
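For setups where a one-line cron entry becomes hard to maintain, the check can be moved into a small function whose exit status tells the caller whether anything was found. This is a sketch under stated assumptions: the function name is hypothetical, the threshold is given in bytes, and it relies on GNU `find -printf`.

```shell
# Report json-file logs larger than max_bytes under the given directory.
# Returns 0 when nothing exceeds the threshold, 1 when at least one file does,
# so a scheduler or wrapper script can decide whether to send an alert.
check_large_logs() {
  local root="$1" max_bytes="$2"
  local report
  report=$(find "$root" -type f -name '*-json.log' -size +"${max_bytes}c" \
    -printf 'Large log: %p (%s bytes)\n' 2>/dev/null)
  [ -z "$report" ] && return 0
  printf '%s\n' "$report"
  return 1
}
```

A wrapper could then pipe the output to whatever alerting channel is in use only when the function returns a non-zero status.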
Common mistakes
- Searching for general disk usage culprits without specifically checking individual container log file sizes, missing the most common single largest contributor to a sudden disk space crisis.
- Truncating a json-file log directly without understanding that this is an outside-of-normal-tooling intervention that can occasionally produce confusing docker logs behavior afterward.
- Treating an emergency truncation or container restart as a complete fix, without separately addressing the underlying cause (missing rotation configuration or a runaway error loop) that actually produced the oversized log in the first place.
- Not establishing any proactive monitoring specifically for log file size growth, only discovering the problem reactively once it has already escalated into a broader disk exhaustion incident.
- Restarting or truncating logs for every large log file found without first checking whether the size reflects a genuine ongoing problem versus simply a long-running container with legitimately accumulated history.
Large log bloat is one of the fastest-moving and most common causes of severe disk pressure. Resolving it completely requires locating the responsible container quickly through a direct filesystem search, choosing between truncating the log and recreating the container based on the situation's urgency, and, critically, fixing the underlying cause (missing rotation limits or a runaway logging bug) rather than only addressing the immediate disk space symptom.