14.2.3 Production Data Persistence
A focused guide to Production Data Persistence, connecting core concepts with practical Docker and container operations.
Production data persistence in Docker concerns how application state, databases, uploaded files, and other data that must outlive a container are stored. The goal is that replacing, restarting, or rescheduling a container never causes that data to disappear. A container's own writable layer is deliberately ephemeral, so it is unsuitable for anything that needs to survive.
Why the container layer is the wrong place for data
Every container has its own thin writable layer on top of its image's read-only layers, and that writable layer is destroyed when the container is removed. Removing a container and creating a replacement with docker run, rather than restarting the same container with docker start, produces a brand new writable layer with none of the previous one's contents:
# POSTGRES_PASSWORD is required by the official image; change-me is a placeholder.
docker run -d --name my-db -e POSTGRES_PASSWORD=change-me postgres:16
# /tmp is part of the writable layer, not a volume.
docker exec my-db sh -c 'echo "test" > /tmp/marker'
docker rm -f my-db
docker run -d --name my-db -e POSTGRES_PASSWORD=change-me postgres:16
docker exec my-db cat /tmp/marker
The final command fails, because the second my-db container is an entirely new container with a fresh writable layer; nothing written to the first container's layer carried over.
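One way to see what currently lives only in a container's writable layer is docker diff, which lists paths added (A), changed (C), or deleted (D) relative to the image. The sketch below wraps it in a helper called writable_layer_changes; the helper name is hypothetical and chosen only for illustration. Data under mounted volumes is not reported by docker diff, so anything the helper prints would be lost on container removal.

```shell
# Hypothetical helper: print paths added or changed in a container's
# writable layer. These paths disappear when the container is removed.
writable_layer_changes() {
  # $1 = container name or ID
  # docker diff prints one "<A|C|D> <path>" entry per line; keep A and C only.
  docker diff "$1" | awk '/^[AC] / { sub(/^[AC] /, ""); print }'
}
```

Calling writable_layer_changes my-db before replacing a container gives a quick list of files that should probably live on a volume instead.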
Named volumes as the primary persistence mechanism
Docker-managed named volumes exist independently of any single container's lifecycle and are the standard mechanism for persisting application data:
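docker volume create pgdata
# POSTGRES_PASSWORD is required by the official image; change-me is a placeholder.
# Pin the image tag so the replacement container runs the same major version.
docker run -d --name my-db -e POSTGRES_PASSWORD=change-me -v pgdata:/var/lib/postgresql/data postgres:16
docker rm -f my-db
docker run -d --name my-db -e POSTGRES_PASSWORD=change-me -v pgdata:/var/lib/postgresql/data postgres:16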
Because both containers mount the same named volume, data written by the first container is still present when the second one starts, even though the containers themselves are entirely separate, disposable instances. To see where Docker stores the volume's data on the host, inspect the volume:
docker volume inspect pgdata --format '{{ .Mountpoint }}'
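To confirm that a data directory really is backed by a volume rather than the writable layer, the container's mount list can be checked with docker inspect. The following is a minimal sketch; mount_type_for is a hypothetical helper name, and it assumes the path is given exactly as the mount destination and contains no spaces.

```shell
# Hypothetical helper: print the mount type ("volume", "bind", ...) that backs
# an exact destination path in a container, or "none" if nothing is mounted
# there (in which case the path lives in the writable layer).
mount_type_for() {
  # $1 = container name or ID, $2 = absolute path inside the container
  docker inspect \
    --format '{{ range .Mounts }}{{ println .Type .Destination }}{{ end }}' "$1" |
    awk -v dest="$2" '
      $2 == dest { print $1; found = 1 }
      END { if (!found) print "none" }'
}
```

For the container above, mount_type_for my-db /var/lib/postgresql/data would be expected to report a volume, while a path such as /tmp would report none.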
Bind mounts for host-managed persistence
A bind mount maps a specific host directory into the container, which is useful when the data needs to be directly accessible to other host-level tooling, such as a backup agent that does not understand Docker volumes:
docker run -d -e POSTGRES_PASSWORD=change-me -v /srv/postgres-data:/var/lib/postgresql/data postgres:16
Bind mounts trade Docker's volume drivers and management commands for direct visibility into exactly where the data lives on the host filesystem, which is sometimes the deciding factor when host-level tooling needs uncomplicated access to the raw files.
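One practical difference worth guarding against: with -v, Docker creates a missing host directory as an empty directory, so a typo in the path can silently start a database on empty storage. The sketch below checks the directory first and uses the --mount syntax, which reports an error instead of creating the path. The function name run_db_on_bind_mount is hypothetical, and the image tag and target path are assumptions carried over from the examples in this section.

```shell
# Hypothetical helper: start Postgres on a bind mount, refusing to run if the
# host directory does not already exist.
run_db_on_bind_mount() {
  # $1 = host directory holding the database files
  host_dir="$1"
  if [ ! -d "$host_dir" ]; then
    echo "host directory $host_dir does not exist" >&2
    return 1
  fi
  # "-e POSTGRES_PASSWORD" with no value passes the variable through from the
  # calling environment, keeping the secret out of the command line.
  docker run -d \
    --mount "type=bind,source=$host_dir,target=/var/lib/postgresql/data" \
    -e POSTGRES_PASSWORD \
    postgres:16
}
```

A call such as run_db_on_bind_mount /srv/postgres-data then fails fast when the directory is missing, rather than starting with an empty data directory.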
Volume drivers for networked and distributed storage
In multi-host production deployments, a local named volume is tied to the specific host it was created on, which becomes a problem if the orchestrator reschedules a container onto a different host. Volume drivers that back onto networked storage (NFS, cloud block storage, distributed filesystems) solve this by making the volume's data accessible regardless of which host the container lands on:
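docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=nfs-server.example.com,rw \
  --opt device=:/exports/pgdata \
  pgdata-nfs
With a storage plugin installed, the same idea can be declared in a Compose file. Here rexray/ebs is one example of a cloud block storage driver, and size is a driver-specific option:
volumes:
  pgdata:
    driver: rexray/ebs
    driver_opts:
      size: 100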
Without a networked or distributed backing store, scheduling a stateful service onto a different host effectively orphans the data left behind on the original host.
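Before relying on rescheduling, it can help to check what actually backs a volume. The sketch below reads the driver and driver options from docker volume inspect and applies a deliberately simple heuristic: the local driver with an NFS type option counts as networked, the local driver otherwise counts as host-local, and any other driver is reported as plugin-managed. The function names and the three labels are hypothetical, and the heuristic does not recognize other networked options such as CIFS.

```shell
# Heuristic classifier. $1 = volume driver, $2 = driver options as the compact
# JSON printed by `docker volume inspect --format '{{ json .Options }}'`.
classify_volume() {
  case "$1:$2" in
    local:*'"type":"nfs'*) echo "networked" ;;
    local:*)               echo "host-local" ;;
    *)                     echo "plugin-managed" ;;
  esac
}

# Print "<volume> <classification>" for a named volume.
volume_locality() {
  # $1 = volume name
  driver="$(docker volume inspect --format '{{ .Driver }}' "$1")" || return 1
  opts="$(docker volume inspect --format '{{ json .Options }}' "$1")" || return 1
  echo "$1 $(classify_volume "$driver" "$opts")"
}
```

A result of host-local for a stateful service in a multi-host cluster is the situation described above: the data stays on whichever host created the volume.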
Database-specific persistence considerations
Databases have stricter requirements for the consistency of what is on disk than a typical stateless application's files do. A volume backing a database needs storage with adequate write durability and, depending on the database engine, specific filesystem behavior around fsync and write ordering:
docker run -d \
  -e POSTGRES_PASSWORD=change-me \
  -v pgdata:/var/lib/postgresql/data \
  --shm-size=256mb \
  postgres:16
Increasing shared memory with --shm-size is a common production adjustment for database containers, because the default /dev/shm allocation of 64 MB is often too small for the workload. The setting is unrelated to the volume itself, but it frequently comes up alongside persistence configuration for database containers.
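To check what a running container was actually given, the configured shared memory size can be read back from docker inspect, which reports it in bytes. A small sketch, with shm_size_mib as a hypothetical helper name:

```shell
# Hypothetical helper: print a container's configured /dev/shm size in MiB.
shm_size_mib() {
  # $1 = container name or ID
  # HostConfig.ShmSize is reported in bytes.
  bytes="$(docker inspect --format '{{ .HostConfig.ShmSize }}' "$1")" || return 1
  echo "$(( bytes / 1024 / 1024 ))"
}
```

For a container started with --shm-size=256mb, the helper would be expected to print 256.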
Backup as a complement to persistence, not a substitute
A named or bind-mounted volume protects data against container removal and replacement, but not against the underlying host's disk failing entirely, nor against accidental deletion of the volume itself. Persistence and backup solve different problems and both are needed:
docker run --rm -v pgdata:/data:ro -v "$(pwd)":/backup alpine \
  tar czf /backup/pgdata-$(date +%Y%m%d).tar.gz -C /data .
A volume that has never been backed up is one host failure away from total data loss, regardless of how reliably it has persisted data across ordinary container restarts up to that point.
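The archive command above can be wrapped in a small script that also checks the result is readable. The sketch below is one way to do that; the function names and the archive naming scheme are hypothetical. Note that a file-level archive of a database that is still running may capture an inconsistent state, so this approach assumes the database container is stopped first or that the engine's own dump tool is used instead.

```shell
# Hypothetical helper: build an archive file name for a volume.
backup_name() {
  # $1 = volume name, $2 = optional date stamp (defaults to today, YYYYMMDD)
  echo "$1-${2:-$(date +%Y%m%d)}.tar.gz"
}

# Archive a named volume into a host directory and print the archive path.
backup_volume() {
  # $1 = volume name, $2 = absolute host directory that receives the archive
  archive="$(backup_name "$1")"
  docker run --rm -v "$1":/data:ro -v "$2":/backup alpine \
    tar czf "/backup/$archive" -C /data . &&
    echo "$2/$archive"
}

# A backup that cannot be listed cannot be restored, so check it right away.
verify_backup() {
  # $1 = path to a .tar.gz archive on the host
  tar tzf "$1" > /dev/null
}
```

A typical sequence would be to run backup_volume pgdata /srv/backups, pass the printed path to verify_backup, and then copy the archive to storage on a different host.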
Avoiding accidental volume deletion
docker system prune and similar cleanup commands can remove volumes that appear unused, which is a real risk for a volume whose container has been stopped or removed for maintenance:
docker volume prune
docker system prune --volumes
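The two commands differ in an important way. docker volume prune removes only volumes that no container references, and a stopped container still counts as a reference. docker system prune removes all stopped containers first, so with --volumes a volume used only by a stopped production database container loses its last reference and can be deleted in the same run if the cleanup happens to coincide with planned maintenance. On Docker Engine 23.0 and later, both commands prune only anonymous volumes by default, and named volumes are removed only by docker volume prune --all; on older engines, named volumes are pruned as well.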
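A quick check before any cleanup is to list every container, running or stopped, that still references a given volume. The sketch below uses the volume filter of docker ps for that; volume_users and safe_to_remove are hypothetical helper names.

```shell
# Hypothetical pre-prune check: list every container (running or stopped)
# that references a volume, with its current status.
volume_users() {
  # $1 = volume name
  docker ps -a --filter "volume=$1" --format '{{ .Names }} ({{ .Status }})'
}

# Succeeds only if no container at all references the volume.
safe_to_remove() {
  # $1 = volume name
  [ -z "$(volume_users "$1")" ]
}
```

If volume_users pgdata prints a stopped container, a system-wide prune would remove that container and leave the volume unreferenced, which is the case to rule out before running the cleanup.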
Common mistakes
- Writing application data into the container's writable layer with no volume at all, then losing it the next time the container is replaced for an unrelated reason such as a routine image update.
- Using a local named volume for a stateful service in a multi-host cluster without a networked storage backend, leaving the data effectively pinned to one host.
- Treating volume persistence as equivalent to a backup, with no separate process protecting against host-level disk failure or accidental volume deletion.
- Running docker system prune --volumes on a production host without first confirming which stopped containers still reference volumes that appear unused.
- Underestimating database-specific storage requirements, such as shared memory or fsync behavior, and attributing the resulting instability to the volume mechanism itself rather than the underlying configuration gap.
Production data persistence in Docker is achieved by keeping anything that must survive a container's replacement entirely out of the writable layer, choosing a volume mechanism that matches the deployment's host topology, and treating backup as a distinct, additional layer of protection rather than something the persistence mechanism alone already provides.