16.3 Storage Troubleshooting
A focused guide to Storage Troubleshooting, connecting core concepts with practical Docker and container operations.
Storage troubleshooting covers problems with disk space consumption, volume behavior, and the storage driver Docker uses to manage image and container filesystem layers. These problems often surface as confusing, seemingly unrelated symptoms: a failed build, a container that cannot start, or a write that fails. All of them can trace back to the host running out of usable disk space or to a storage driver behaving unexpectedly.
Starting with an overall disk usage check
Before investigating anything Docker-specific, confirm whether the host itself is low on disk space. This single check rules in or out the most common underlying cause of a wide range of seemingly unrelated symptoms:
df -h
Filesystem Size Used Avail Use%
/dev/sda1 50G 49G 500M 99%
A host at or near full disk capacity can produce failures that look entirely unrelated to storage: a container failing to start, a build failing partway through, or an application unable to write its own logs. All of these trace back to the same underlying cause.
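The check above can be turned into a small filter that flags only the filesystems worth worrying about. The following is a minimal sketch, not a definitive monitoring tool: the function name full_filesystems and the default threshold of 90 percent are assumptions made for illustration, and the parsing assumes POSIX-format df -P output with mount points that contain no spaces.

```shell
# Hypothetical helper: list mount points at or above a usage threshold.
# Reads `df -P` output on stdin; $1 is the threshold in percent (default 90).
# Assumes mount points contain no spaces.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                        # skip the df header line
            use = $5
            sub(/%/, "", use)           # "99%" -> "99"
            if (use + 0 >= limit + 0) print $6, use "%"
        }'
}

# On a live host:
#   df -P | full_filesystems 90
```

Reading from standard input rather than calling df inside the function keeps the filter easy to try against saved output from another host.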
Breaking down Docker's own disk usage
Once host-level disk pressure is confirmed, docker system df shows how much of that usage is attributable to Docker itself and which category it falls under (images, containers, local volumes, or build cache):
docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 47 12 18.2GB 14.1GB (77%)
Containers 8 3 1.2GB 0.4GB (33%)
Local Volumes 15 5 22.8GB 9.3GB (40%)
Build Cache 142 0 8.7GB 8.7GB (100%)
This breakdown immediately identifies where the bulk of reclaimable space sits. Build cache and unused images frequently dominate on a host that has been building and iterating on many image versions over time without periodic cleanup.
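When the same breakdown needs to be read by a script, the categories can be ranked by reclaimable size. This is a sketch under stated assumptions: rank_reclaimable is a hypothetical helper, the "|" separator is an arbitrary choice passed through --format, and the unit table assumes Docker's decimal size suffixes (B, kB, MB, GB, TB).

```shell
# Hypothetical helper: rank categories by reclaimable bytes, largest first.
# Reads lines of the form "Type|Reclaimable", e.g. "Images|14.1GB (77%)".
rank_reclaimable() {
    awk -F'|' '
        BEGIN { mult["B"] = 1; mult["kB"] = 1e3; mult["MB"] = 1e6; mult["GB"] = 1e9; mult["TB"] = 1e12 }
        {
            size = $2
            sub(/ .*/, "", size)            # drop a trailing " (77%)" if present
            num = size; unit = size
            sub(/[A-Za-z]+$/, "", num)      # keep the numeric part
            sub(/^[0-9.]+/, "", unit)       # keep the unit suffix
            printf "%.0f\t%s\t%s\n", num * mult[unit], $1, size
        }' | sort -rn | cut -f2-
}

# On a live host:
#   docker system df --format '{{.Type}}|{{.Reclaimable}}' | rank_reclaimable
```

The first line of output names the category to clean up first; the byte count is used only for sorting and is cut from the final output.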
Reclaiming space safely
Several pruning commands target specific categories of unused Docker resources, and using the more targeted ones first, before reaching for a broader, more aggressive cleanup, avoids accidentally removing something still needed:
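docker image prune -a               # all images not used by any container
docker container prune              # all stopped containers
docker volume prune                 # unused volumes (anonymous only on Docker 23.0+; add -a for named)
docker builder prune                # dangling build cache
docker system prune -a --volumes    # all of the above plus unused networks, in one pass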
The combined system prune command with both -a and --volumes is the most aggressive option. It removes all unused images, stopped containers, unused networks, build cache, and unused volumes in a single pass. The --volumes flag deserves particular care: on Docker releases before 23.0 it deletes every unused volume, named ones included, along with any data in them, while on 23.0 and later it is limited to anonymous volumes, and named volumes are removed only by docker volume prune -a. Running the command without first understanding what it will remove risks permanently losing data that was not safe to discard.
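One way to understand what a prune will touch is to list the candidates first. The sketch below is an approximation, not an exact dry run: preview_prune is a hypothetical helper, the three status filters are an assumption about what counts as a stopped container, and there is no list filter that exactly matches the unused-image set that image prune -a removes.

```shell
# Hypothetical helper: show what the targeted prunes would consider.
# Every command here only lists resources; nothing is deleted.
preview_prune() {
    echo "== Stopped containers (docker container prune) =="
    docker container ls --all \
        --filter status=exited --filter status=created --filter status=dead \
        --format '{{.ID}}  {{.Names}}  {{.Status}}'
    echo "== Dangling images (docker image prune, without -a) =="
    docker image ls --filter dangling=true
    echo "== Volumes not referenced by any container (docker volume prune) =="
    docker volume ls --filter dangling=true
}

# On a live host:
#   preview_prune
```

Any volume in the last list that holds data worth keeping should be backed up or attached to a container before a volume prune is run.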
Storage driver and overlay filesystem issues
Docker's default storage driver on modern Linux systems, overlay2, occasionally exhibits its own specific issues, particularly related to layer count limits, inode exhaustion, or interactions with certain underlying filesystems that do not fully support all of overlay2's expected features:
docker info --format '{{.Driver}}'
df -i
Checking inode usage separately from raw disk space catches a less commonly checked but real failure mode: a filesystem that has plenty of free space but has exhausted its inodes because it holds an extremely large number of small files. This produces "no space left on device" errors even though df -h shows available capacity.
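Once df -i shows inodes running out, the next question is which directory holds all the small files. A minimal sketch of one approach follows; count_entries_per_dir is a hypothetical helper, it counts directory entries rather than reading inode tables directly, it skips hidden subdirectories, and a file name containing a newline would be counted more than once.

```shell
# Hypothetical helper: count filesystem entries under each immediate
# subdirectory of $1 (default: current directory), largest first.
count_entries_per_dir() {
    dir="${1:-.}"
    for d in "$dir"/*/; do
        [ -d "$d" ] || continue         # handles the no-subdirectory case
        # -xdev keeps find on one filesystem, so mounted overlays are not crossed
        n=$(find "$d" -xdev | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "${d%/}"
    done | sort -rn
}

# On a live host (as root, since Docker's data directory is not world-readable):
#   count_entries_per_dir /var/lib/docker | head -n 5
```

Each count includes the subdirectory itself, so an empty subdirectory reports 1.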
Volume-specific storage issues
Some problems are specific to named or bind-mounted volumes rather than general host disk pressure. One is a volume's underlying storage backend running low on capacity independently of the host's own root filesystem, which is particularly relevant for volumes backed by separate, dedicated storage devices or networked storage:
docker volume inspect pgdata --format '{{ .Mountpoint }}'
df -h "$(docker volume inspect pgdata --format '{{ .Mountpoint }}')"
Checking disk usage specifically at a volume's actual mountpoint, rather than only at the host's root filesystem, is necessary when a volume is backed by separate storage that could be under its own independent capacity pressure even while the rest of the host has ample free space.
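The same mountpoint check can be repeated for every volume on the host. The following is a sketch, assuming the local volume driver and mount points without spaces; volume_capacity is a hypothetical helper, and reading paths under Docker's data directory typically requires root.

```shell
# Hypothetical helper: for each volume, print its name followed by the
# usage percentage and mount point of the filesystem that backs it.
volume_capacity() {
    docker volume ls --quiet | while IFS= read -r vol; do
        mp=$(docker volume inspect --format '{{ .Mountpoint }}' "$vol") || continue
        # df -P prints one header line, then one line for the backing filesystem
        df -P "$mp" 2>/dev/null | awk -v v="$vol" 'NR == 2 { print v, $5, $6 }'
    done
}

# On a live host (as root):
#   volume_capacity
```

Volumes that report a mount point other than the one holding Docker's data directory are the ones backed by separate storage, and their percentages need watching independently.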
Container writable layer growth
A container that writes a substantial amount of data directly into its own writable layer, rather than to a mounted volume, can grow considerably larger over its lifetime than its base image size alone would suggest, which is worth checking directly when a specific container appears to be consuming disproportionate disk space:
docker ps -s
CONTAINER ID SIZE
3f29a8c1d8e2 1.2GB (virtual 1.4GB)
A container whose writable layer is unexpectedly large relative to its image size is worth investigating for what is actually being written there: logs, temporary files, or application data that should have been directed to a volume instead. Moving those writes to a volume would both control growth going forward and make the data properly persistent.
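To see where in the writable layer the changes are concentrated, the output of docker diff can be grouped by top-level directory. This is a sketch rather than a size report: summarize_diff is a hypothetical helper, it counts added and changed paths (not bytes), and it assumes top-level directory names contain no spaces.

```shell
# Hypothetical helper: count added (A) and changed (C) paths per top-level
# directory. Reads `docker diff` output on stdin, e.g. "A /var/log/app.log".
summarize_diff() {
    awk '
        $1 == "A" || $1 == "C" {
            split($2, part, "/")        # "/var/log/x" -> "", "var", "log", "x"
            count["/" part[2]]++
        }
        END { for (d in count) print count[d], d }' | sort -rn
}

# On a live host, using the container ID from the listing above:
#   docker diff 3f29a8c1d8e2 | summarize_diff
```

A directory with a high count is a reasonable first place to look with du inside the container, and a likely candidate for a volume mount.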
Diagnosing a sudden storage-related failure
When a previously working host suddenly begins experiencing storage-related failures with no obvious recent configuration change, the next step is to check what changed: recent image pulls, a long-running container that has been writing extensively, or a backup or log rotation process that recently stopped running. This narrows down what pushed a previously stable disk usage trend across a critical threshold:
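# Run as root. -print0 and -0 keep paths with spaces intact; -xdev skips mounted overlay views.
find /var/lib/docker -xdev -type f -mtime -1 -print0 2>/dev/null | xargs -0 du -h 2>/dev/null | sort -rh | head -10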
Identifying the most recently modified, largest files within Docker's own data directory often points directly at whatever specific activity is responsible for a sudden change in disk usage.
Establishing routine maintenance to prevent recurrence
Rather than only addressing storage issues reactively once a host runs critically low on space, scheduling routine, automated pruning, along with monitoring and alerting on disk usage trends before they become critical, prevents the more disruptive reactive scenario from recurring:
0 2 * * 0 docker system prune -af --filter "until=168h"
A scheduled prune with an until filter removes only unused resources created more than a week ago; the filter applies to creation time, not to when a resource was last used. Run on a routine schedule (here, Sundays at 02:00), it balances reclaiming space against the risk of removing something recent that is still needed.
Common mistakes
- Investigating an unrelated-looking failure, such as a build error or a container startup failure, without first checking overall host disk usage as a potential underlying cause.
- Running an aggressive system prune --volumes without first understanding exactly what it will remove, risking permanent loss of data in unused but still-needed named volumes.
- Checking only raw disk space usage and not inode usage, missing a less common but real cause of "no space left on device" errors despite apparently available capacity.
- Not checking disk usage at a volume's actual mountpoint when that volume is backed by separate, dedicated storage with its own independent capacity.
- Addressing storage issues only reactively after a host becomes critically low on space, rather than establishing routine, scheduled maintenance and proactive monitoring.
Storage troubleshooting starts with confirming overall host disk and inode usage, then narrows down through the category breakdown from docker system df to identify exactly what is consuming space: images, containers, volumes, or build cache. Only then is the appropriately targeted cleanup applied. Establishing routine, scheduled maintenance prevents the same reactive crisis from recurring on a predictable cycle.