14.1.3.5 Host Backup Strategy

A focused guide to Host Backup Strategy, connecting core concepts with practical Docker and container operations.

A host backup strategy for a Docker production environment defines which state on the host must survive a hardware failure, accidental deletion, or catastrophic misconfiguration, and how that state is captured, stored, and restored. It is designed independently of the containers themselves, which are treated as disposable and rebuildable from images.

What actually needs backing up

Containers are designed to be ephemeral; the container layer itself, by convention, should not be the target of a backup strategy. What does need protection is everything that represents state the system cannot regenerate from a Dockerfile or a registry pull:

Named volumes containing databases, uploaded files, or other persistent application state.
Bind-mounted host directories that hold configuration, certificates, or data outside the Docker-managed volume store.
The Docker daemon configuration itself (/etc/docker/daemon.json).
Compose files, Swarm stack definitions, or other deployment manifests that describe how to reconstruct the running topology.
Secrets and environment files required to bring services back up with the correct credentials.

Container images are intentionally excluded from this list under normal circumstances, since they should always be reproducible, either rebuilt from source or pulled from a registry; if an image cannot be rebuilt or repulled, that is a build pipeline gap to fix rather than a backup gap to paper over.

Locating Docker-managed volumes on the host

Named volumes live under the daemon's data root, by default /var/lib/docker/volumes/, and can be enumerated and inspected directly:

docker volume ls

docker volume inspect pgdata

docker volume inspect pgdata --format '{{ .Mountpoint }}'

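For volumes that use the default local driver, the reported mountpoint (typically /var/lib/docker/volumes/pgdata/_data for this example) is the actual path on the host where the volume's data lives, and is the path that a host-level backup tool needs to capture. Reading it usually requires root privileges.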
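
To hand every volume path to a host-level backup tool, the listing and inspection commands can be combined. The following is a minimal sketch, assuming volumes on the default local driver; the function name list_volume_mountpoints is illustrative and not part of Docker:

```shell
# Print "<volume name> <host mountpoint>" for every volume known to the daemon.
# list_volume_mountpoints is a hypothetical helper name, not a Docker command.
list_volume_mountpoints() {
  docker volume ls -q | while IFS= read -r vol; do
    # --format takes a Go template; Name and Mountpoint are fields of the inspect output.
    docker volume inspect --format '{{ .Name }} {{ .Mountpoint }}' "$vol"
  done
}
```

Calling list_volume_mountpoints prints one line per volume, which can be piped into whatever tool consumes a path list.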

Backing up a volume with a throwaway container

A common, portable approach to volume backup avoids touching the host filesystem path directly and instead uses a temporary container to archive the volume's contents:

docker run --rm \
  -v pgdata:/data:ro \
  -v "$(pwd)":/backup \
  alpine \
  tar czf /backup/pgdata-$(date +%Y%m%d).tar.gz -C /data .

This pattern works identically regardless of the volume driver or the location of the daemon's data root, since it operates through the volume mount rather than assuming a particular on-disk layout.
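
When several volumes need the same treatment, the throwaway-container command can be wrapped in a small function. This is a sketch under stated assumptions: backup_volume is a hypothetical name, and the destination directory must be an absolute host path, because Docker would otherwise interpret it as a volume name:

```shell
# backup_volume VOLUME DEST_DIR
# Archives VOLUME into DEST_DIR/VOLUME-YYYYMMDD.tar.gz via a throwaway alpine
# container and prints the archive path. Hypothetical helper, not a Docker command.
backup_volume() {
  vol=$1
  dest=$2
  archive="$vol-$(date +%Y%m%d).tar.gz"
  mkdir -p "$dest" || return 1
  # Mount the volume read-only so the backup container cannot modify the data.
  docker run --rm \
    -v "$vol":/data:ro \
    -v "$dest":/backup \
    alpine \
    tar czf "/backup/$archive" -C /data . || return 1
  printf '%s/%s\n' "$dest" "$archive"
}
```

For example, backup_volume pgdata /backups would write /backups/pgdata-YYYYMMDD.tar.gz and print that path for a later upload step.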

Restoring a volume from a backup archive

Restoring follows the inverse of the backup pattern, extracting the archive into a freshly created (or existing, emptied) volume while no container is using it:

docker volume create pgdata
docker run --rm \
  -v pgdata:/data \
  -v "$(pwd)":/backup \
  alpine \
  tar xzf /backup/pgdata-20240101.tar.gz -C /data

For database volumes specifically, a filesystem-level backup of the data directory while the database is running can produce an inconsistent snapshot unless the storage layer guarantees crash-consistent point-in-time copies. Application-aware backups (pg_dump, mysqldump, or the database's native backup tooling) are generally more reliable than raw filesystem copies for databases, and are often layered on top of, rather than replacing, the volume-level backup.

docker exec my-postgres pg_dump -U postgres mydb > mydb-$(date +%Y%m%d).sql
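
A plain shell redirect creates the output file before pg_dump has produced anything, so a failed dump can leave a truncated file that looks like a valid backup. One way to guard against that, sketched here with a hypothetical function name (dump_database) and the same postgres user as the command above, is to dump to a temporary name and rename only on success:

```shell
# dump_database CONTAINER DBNAME DEST_DIR
# Writes DEST_DIR/DBNAME-YYYYMMDD.sql only if pg_dump exits successfully.
# dump_database is a hypothetical helper name.
dump_database() {
  container=$1
  db=$2
  dest=$3
  final="$dest/$db-$(date +%Y%m%d).sql"
  partial="$final.partial"
  # Dump to a temporary name first; rename only when pg_dump reports success.
  if docker exec "$container" pg_dump -U postgres "$db" > "$partial"; then
    mv "$partial" "$final"
  else
    rm -f "$partial"
    return 1
  fi
}
```

A call such as dump_database my-postgres mydb /backups then either leaves a complete dump behind or leaves nothing and returns a non-zero status.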

Backing up bind mounts

Bind-mounted directories are ordinary host paths and can be captured with standard host backup tooling without any Docker-specific handling, since Docker is not managing their storage location:

tar czf /backups/app-config-$(date +%Y%m%d).tar.gz /srv/app/config

rsync -a /srv/app/data/ backup-host:/backups/app-data/
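
Before pointing tar or rsync at host paths, it helps to confirm which paths a container actually bind-mounts. A minimal sketch, assuming the Mounts section of docker inspect output with its Type and Source fields; list_bind_mounts is an illustrative name:

```shell
# list_bind_mounts CONTAINER
# Prints the host path of every bind mount attached to CONTAINER, one per line.
# list_bind_mounts is a hypothetical helper name.
list_bind_mounts() {
  # Keep only mounts of type "bind"; named volumes report type "volume".
  docker inspect --format \
    '{{ range .Mounts }}{{ if eq .Type "bind" }}{{ println .Source }}{{ end }}{{ end }}' \
    "$1" | sed '/^$/d'
}
```

The resulting list can be compared against the paths the host backup job already covers.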

Backing up the Compose or stack definition

The deployment manifest itself should be version-controlled rather than only backed up as a file on disk, since version control provides both backup and a history of how the running configuration evolved over time:

git add docker-compose.yml .env.example
git commit -m "Update production compose definition"

Actual secret values should not be committed to version control; they belong in a secrets manager or an encrypted backup separate from the manifest that references them.

Automating the backup schedule

A scheduled job, typically driven by cron or a systemd timer on the host, should run the backup commands on a defined interval and ship the resulting archives off the host entirely, since a backup stored only on the same disk as the data it protects does not protect against host-level failure:

0 2 * * * /usr/local/sbin/backup-docker-volumes.sh >> /var/log/docker-backup.log 2>&1

aws s3 cp /backups/pgdata-20240101.tar.gz s3://my-backup-bucket/docker/
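
The script named in the cron entry is not shown in this section. The following is one possible shape for its core, offered as a sketch only: BACKUP_DIR, BUCKET, and backup_and_ship are assumed names, and the bucket path reuses the example above.

```shell
# Hypothetical core of /usr/local/sbin/backup-docker-volumes.sh.
# BACKUP_DIR and BUCKET are assumed names; adjust them to the environment.
BACKUP_DIR=${BACKUP_DIR:-/backups}
BUCKET=${BUCKET:-s3://my-backup-bucket/docker/}

# backup_and_ship VOLUME...
# Archives each volume, copies the archive off-host, and returns non-zero
# if any volume failed.
backup_and_ship() {
  stamp=$(date +%Y%m%d)
  failed=0
  for vol in "$@"; do
    archive="$vol-$stamp.tar.gz"
    # Archive through a throwaway container, then ship the result off the host.
    if docker run --rm -v "$vol":/data:ro -v "$BACKUP_DIR":/backup alpine \
         tar czf "/backup/$archive" -C /data . &&
       aws s3 cp "$BACKUP_DIR/$archive" "$BUCKET"; then
      echo "ok: $vol"
    else
      # Continue with the remaining volumes, but remember the failure.
      echo "FAILED: $vol" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

In the script itself the last line would be a call such as backup_and_ship pgdata. Returning a non-zero status when any volume fails lets the cron log and any monitoring built on it distinguish a partial run from a clean one.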

Verifying restorability

A backup that has never been restored in practice is unverified, and the failure mode for an unverified backup is typically discovered at the worst possible time. A periodic restore drill, ideally automated, should provision a fresh volume from the latest backup archive and confirm the application starts correctly against it:

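docker volume create pgdata-restore-test
docker run --rm -v pgdata-restore-test:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/pgdata-latest.tar.gz -C /data
# Start a server on the restored data; pg_isready only probes, it does not start one.
docker run -d --name pg-restore-test \
  -v pgdata-restore-test:/var/lib/postgresql/data postgres:16
sleep 10  # crude startup wait; a polling loop is more robust
docker exec pg-restore-test pg_isready
docker rm -f pg-restore-test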
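
A readiness probe is only meaningful once a server is running against the restored volume, and startup time varies with data size. The helper below is a minimal sketch of a polling check, assuming a Postgres container has been started detached on the restored volume; the name wait_for_postgres and the container name in the usage note are hypothetical:

```shell
# wait_for_postgres CONTAINER [ATTEMPTS] [INTERVAL_SECONDS]
# Polls pg_isready inside CONTAINER until it reports ready or attempts run out.
# wait_for_postgres is a hypothetical helper name.
wait_for_postgres() {
  container=$1
  attempts=${2:-30}
  interval=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # pg_isready exits 0 once the server accepts connections; -q suppresses output.
    if docker exec "$container" pg_isready -q; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

A drill could call wait_for_postgres pg-restore-test 60 1 and treat a non-zero status as a failed restore. Readiness alone shows that the server started; a query against a known table gives stronger evidence that the data itself came back.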

Common mistakes

Backing up the container's writable layer instead of its volumes, capturing transient state while missing the actual persistent data, or capturing nothing useful at all if the application correctly stores state only in a volume.
Taking raw filesystem snapshots of a live database's data directory without using the database's own consistent-backup mechanism, producing archives that may fail to restore cleanly.
Storing backup archives on the same host and disk as the original data, so a single disk failure destroys both the data and its backup simultaneously.
Never test-restoring a backup, leaving the actual recoverability of the strategy unverified until a real incident forces the first attempt.
Omitting the Compose file, environment configuration, or secrets from the backup plan, leaving an operator with data but no record of how to reconstruct the services that used it.

A sound host backup strategy for Docker production treats named volumes and bind-mounted state as the assets requiring protection, uses container-based or application-native tooling to produce consistent backups, ships those backups off the host on a schedule, and periodically proves restorability rather than assuming it.