16.3.1 Data Loss Problems

A focused guide to Data Loss Problems, connecting core concepts with practical Docker and container operations.

Data loss problems in Docker environments almost always trace back to a small number of well-understood causes: data written somewhere other than a persistent volume, a volume or container removed without realizing it held the only copy of something, or a backup process that was assumed to be working but was not. Recognizing which of these patterns occurred is the first step toward both recovering what can still be recovered and preventing a recurrence.

Data written to the container's writable layer instead of a volume

The most common root cause of unexpected data loss is an application writing its persistent state into the container's own writable layer rather than to a mounted volume. This works perfectly well right up until the container is removed and replaced, which is a routine, expected event during any deployment or image upgrade:

docker run -d --name my-db -e POSTGRES_PASSWORD=example postgres
docker rm -f my-db
docker run -d --name my-db -e POSTGRES_PASSWORD=example postgres

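The second my-db container starts with a completely fresh writable layer, so nothing the first container wrote there survives. No volume was configured to persist the data independently of the specific container instance. (The official postgres image declares a VOLUME for its data directory, so each docker run without -v gets a new anonymous volume; the old one is orphaned rather than reattached, and the replacement container still starts empty.)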

docker run -d --name my-db -e POSTGRES_PASSWORD=example -v pgdata:/var/lib/postgresql/data postgres

The most important preventive check for this cause is to confirm, immediately after deploying any stateful service, that its actual data directory is mounted to a named volume or bind mount rather than left on the default writable layer.
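
One way to make this check repeatable is to ask Docker which paths inside a container are actually mount points. The sketch below is a minimal example, not a definitive tool: the helper name check_data_mount is hypothetical, and it relies on the Mounts list that docker inspect reports for a container.

```shell
# check_data_mount CONTAINER PATH
# Succeeds only if PATH is a mount destination inside CONTAINER.
# (check_data_mount is a hypothetical helper name used for illustration.)
check_data_mount() {
  container=$1
  data_dir=$2
  # One line per mount: "<destination> <type> <volume-name-or-empty>"
  mounts=$(docker inspect \
    --format '{{range .Mounts}}{{.Destination}} {{.Type}} {{.Name}}{{"\n"}}{{end}}' \
    "$container") || return 2
  # Pick the mount whose destination matches the data directory exactly.
  match=$(printf '%s\n' "$mounts" |
    awk -v d="$data_dir" '$1 == d { print $2, $3; exit }')
  if [ -z "$match" ]; then
    echo "WARNING: $data_dir in $container is not on a volume or bind mount" >&2
    return 1
  fi
  echo "OK: $data_dir is mounted ($match)"
}
```

For the example above this would be called as check_data_mount my-db /var/lib/postgresql/data. A volume name that is a long hexadecimal string rather than something like pgdata indicates an anonymous volume, which will not be reattached to a replacement container.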

Volume removed inadvertently

A named volume can be removed explicitly, accidentally, or through an overly broad cleanup command. Unlike the loss of a container's writable layer, removing a volume is always a distinct operation, but it is easy to trigger unintentionally with cleanup commands that affect more than was intended:

docker volume rm pgdata
docker compose down -v

The -v flag on docker compose down removes the named volumes declared in the Compose file's volumes section, along with any anonymous volumes attached to the containers; volumes marked as external are left alone. This is an easy mistake to make when the intent was only to stop and remove containers temporarily, because the same command also discards the persistent data those containers were relying on.

docker system prune --volumes

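This broader cleanup command first removes all stopped containers and then removes volumes that no remaining container references. A service that happens to be stopped for unrelated maintenance at the moment the prune runs therefore loses its containers and, with them, the references that were protecting its volumes. On Docker Engine 23.0 and later the volume step is limited to anonymous volumes by default, while older releases also removed unused named volumes, so confirm the engine version before relying on either behavior.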
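
Before running any of these cleanup commands, it can help to list which containers, running or stopped, still reference each volume. The following is a minimal sketch under that assumption; volume_report is a hypothetical helper name, and it uses only docker volume ls and the volume filter of docker ps.

```shell
# volume_report: print every volume with the containers (running or
# stopped) that reference it. Hypothetical helper name for illustration.
volume_report() {
  docker volume ls -q | while IFS= read -r vol; do
    # -a includes stopped containers, which still hold a reference.
    users=$(docker ps -a --filter "volume=$vol" --format '{{.Names}}' \
      </dev/null | tr '\n' ' ')
    users=${users% }    # drop the trailing space left by tr
    printf '%s: %s\n' "$vol" "${users:-<unused>}"
  done
}
```

A volume reported as <unused> is a candidate for pruning; a volume referenced only by a stopped container becomes unused as soon as that container is removed.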

Recovering from an accidental volume removal

Once a volume has actually been removed, recovery depends entirely on whether a backup exists. Docker itself provides no recovery mechanism or recycle bin for a deleted volume; the underlying data is gone the moment the volume removal completes:

docker run --rm -v pgdata:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/pgdata-latest.tar.gz -C /data

If a backup exists, restoring it into a freshly recreated volume, as the command above does, is the path forward. If no backup exists, the data is permanently lost, which is precisely why backup strategy and restore testing are not optional for any data a team would be upset to lose.
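
The restore command assumes an archive such as pgdata-latest.tar.gz already exists. A minimal sketch of the matching backup step follows; backup_volume is a hypothetical helper name, and the archive name simply mirrors the one used in the restore example. A file-level archive of a database that is still running may not be internally consistent, so this sketch assumes the container has been stopped first (or that the database's own dump tool is used instead).

```shell
# backup_volume VOLUME [DEST_DIR]
# Archive the contents of VOLUME into DEST_DIR/VOLUME-latest.tar.gz.
# (backup_volume is a hypothetical helper name used for illustration.)
backup_volume() {
  vol=$1
  dest=${2:-$(pwd)}
  # Mount the volume read-only so the backup job cannot modify the data.
  docker run --rm \
    -v "$vol":/data:ro \
    -v "$dest":/backup \
    alpine tar czf "/backup/${vol}-latest.tar.gz" -C /data .
}
```

Called as backup_volume pgdata, this writes pgdata-latest.tar.gz into the current directory, which is the file the restore command expects.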

Bind mount path mistakes destroying host data

A bind mount targeting an unintended host path, due to a typo, an incorrect variable substitution, or a misunderstanding of the current working directory, can result in a container writing over or deleting existing host files that had nothing to do with the container's intended data at all:

docker run -v /:/app/data my-cleanup-script

Here the entire host root filesystem is bind-mounted, perhaps because a narrower path was mistyped or a variable expanded to nothing. Combined with a script that deletes or overwrites files within its mounted path, this can cause extensive and difficult-to-reverse host-level data loss. Given the severity of this failure mode, bind mount source paths deserve a careful double check, especially for any container running destructive or cleanup-oriented logic.
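
Part of that double check can be automated by validating the source path before it ever reaches docker run. The sketch below is one possible guard, not a complete policy: safe_bind_source is a hypothetical helper name, and the list of paths it rejects is an illustrative assumption that should be adapted to the host.

```shell
# safe_bind_source PATH
# Print the resolved absolute path, or fail if PATH is empty, missing,
# or dangerously broad. Hypothetical helper; the deny list is an example.
safe_bind_source() {
  src=$1
  if [ -z "$src" ]; then
    echo "refusing: bind mount source is empty (unset variable?)" >&2
    return 1
  fi
  # Resolve symlinks and relative paths to a physical absolute path.
  resolved=$(cd "$src" 2>/dev/null && pwd -P) || {
    echo "refusing: $src is not an existing directory" >&2
    return 1
  }
  case "$resolved" in
    /|/etc|/home|/root|/usr|/var|"$HOME")
      echo "refusing: $resolved is too broad to bind-mount" >&2
      return 1 ;;
  esac
  printf '%s\n' "$resolved"
}
```

A typical use is src=$(safe_bind_source "$DATA_DIR") && docker run -v "$src":/app/data my-cleanup-script, where DATA_DIR is whatever variable holds the intended path; the container starts only if the guard accepts it.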

Database corruption from improper shutdown

Data loss can also occur without any volume or file being removed at all. If a database container is forcibly killed in the middle of a write operation, either by docker kill (which sends SIGKILL immediately by default) or by docker stop once its graceful shutdown timeout expires, its data files can be left in a corrupted or inconsistent state:

docker stop --time=30 my-db
docker kill my-db

Giving database containers an adequate stop timeout, long enough for the database's own graceful shutdown procedure to complete, reduces the risk of this category of data loss. It manifests not as missing files but as files that are present yet damaged, and potentially unusable without a successful repair or a restore from backup.
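
Whether a stop was actually graceful can be checked afterwards from the container's exit code. The sketch below assumes the container was not started with --rm (so it still exists after stopping) and treats exit code 137, which is 128 plus signal 9, as evidence that the process ended with SIGKILL; graceful_stop is a hypothetical helper name and the 60-second default is an arbitrary illustration.

```shell
# graceful_stop CONTAINER [TIMEOUT_SECONDS]
# Stop CONTAINER and warn if it had to be killed instead of exiting itself.
# (graceful_stop is a hypothetical helper name used for illustration.)
graceful_stop() {
  container=$1
  timeout=${2:-60}
  docker stop -t "$timeout" "$container" >/dev/null || return 2
  code=$(docker inspect --format '{{.State.ExitCode}}' "$container") || return 2
  # 137 = 128 + 9: the main process was ended by SIGKILL.
  if [ "$code" -eq 137 ]; then
    echo "WARNING: $container was killed with SIGKILL after ${timeout}s" >&2
    return 1
  fi
  echo "$container stopped with exit code $code"
}
```

A warning here suggests the timeout is too short for the database's shutdown procedure and should be raised before the next routine restart.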

Detecting data loss before it becomes a crisis

Routinely verifying that expected data is present and growing as expected catches data loss closer to when it occurred, instead of discovering it only when an application reports missing records. At that point recovery options are still more favorable: a more recent backup, and a shorter window of lost work.

docker exec my-db psql -U postgres -c "SELECT count(*) FROM orders;"

A scheduled check comparing record counts or key metrics against expected trends, alerting when an unexpected drop occurs, surfaces data loss far faster than waiting for a user or downstream process to notice missing data on their own.
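
A scheduled check of this kind can be sketched as two small functions: one that decides whether a count has dropped too far, and one that fetches the current count and compares it with the last recorded value. The function names, the state file path, and the 5 percent default threshold are all assumptions made for illustration; the container, user, and table are the ones from the query above.

```shell
# count_dropped PREVIOUS CURRENT [MAX_DROP_PCT]
# Succeeds when CURRENT is more than MAX_DROP_PCT percent below PREVIOUS.
count_dropped() {
  prev=$1
  cur=$2
  max_drop=${3:-5}
  [ "$prev" -gt 0 ] || return 1          # no baseline yet, nothing to compare
  drop=$(( (prev - cur) * 100 / prev ))  # integer percentage, negative on growth
  [ "$drop" -gt "$max_drop" ]
}

# check_orders [STATE_FILE]
# Compare the current row count with the last recorded one.
# (Hypothetical helper; the state file location is an assumption.)
check_orders() {
  state_file=${1:-/var/tmp/orders.count}
  # -A and -t make psql print the bare number without headers or padding.
  cur=$(docker exec my-db psql -U postgres -At \
    -c "SELECT count(*) FROM orders;") || return 2
  prev=$(cat "$state_file" 2>/dev/null || true)
  prev=${prev:-0}
  if count_dropped "$prev" "$cur"; then
    echo "ALERT: orders fell from $prev to $cur" >&2
    return 1
  fi
  echo "$cur" > "$state_file"            # record the new baseline
}
```

Run from cron or another scheduler, a non-zero exit status from check_orders is the signal to alert. The baseline is deliberately not updated on an alert, so the check keeps failing until someone investigates.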

The role of backup and restore testing

Every category of data loss described here is recoverable, fully or partially, if a tested, working backup exists; none of them are recoverable if no backup exists or if the existing backup process turns out, upon attempting a restore, to have been broken all along. This is why restore testing, not just backup creation, is the actual safeguard against this entire category of problem.

docker run --rm -v pgdata-restore-test:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/pgdata-latest.tar.gz -C /data
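
Extracting the archive is only half of a restore test; something also has to check the result. The sketch below restores into a throwaway volume, verifies that a marker file exists and is non-empty, and then removes the scratch volume. restore_test is a hypothetical helper name, and using PG_VERSION as the marker assumes the archive holds a PostgreSQL data directory at its top level. A file check like this is a first gate only; starting the database on the restored data and querying it is a stronger test.

```shell
# restore_test BACKUP_DIR ARCHIVE_NAME [MARKER_FILE]
# Restore an archive into a scratch volume and check a marker file exists.
# (restore_test is a hypothetical helper name used for illustration.)
restore_test() {
  backup_dir=$1
  archive=$2
  marker=${3:-PG_VERSION}
  scratch=restore-test-$$      # throwaway volume, never the live one
  status=0
  docker run --rm -v "$scratch":/data -v "$backup_dir":/backup:ro alpine \
    tar xzf "/backup/$archive" -C /data || status=1
  if [ "$status" -eq 0 ]; then
    # test -s: the marker must exist and be non-empty.
    docker run --rm -v "$scratch":/data:ro alpine \
      test -s "/data/$marker" || status=1
  fi
  docker volume rm "$scratch" >/dev/null
  if [ "$status" -eq 0 ]; then
    echo "restore test passed: $archive"
  else
    echo "restore test FAILED: $archive" >&2
  fi
  return "$status"
}
```

Scheduling restore_test "$(pwd)" pgdata-latest.tar.gz after each backup run turns a broken backup into an immediate failure instead of a surprise during an incident.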

Common mistakes

  • Running a stateful service without confirming its actual data directory is mounted to a volume, leaving it vulnerable to total data loss on the next routine container replacement.
  • Using docker compose down -v or docker system prune --volumes without fully understanding which volumes will actually be removed.
  • Bind-mounting an overly broad or mistyped host path into a container with destructive logic, risking unintended host-level data loss.
  • Forcibly killing a database container without an adequate graceful shutdown timeout, risking data corruption rather than data being merely absent.
  • Maintaining a backup process without ever testing whether a restore from it actually succeeds, discovering it was broken only after data loss has already occurred.

Data loss problems are nearly always traceable to one of a small number of specific, recognizable patterns, and the defense against all of them is the same. Ensure persistent data lives in a properly mounted volume, treat any command that removes volumes or touches broad host paths with explicit care, and maintain a backup and restore process that has been verified to work rather than only assumed to.
