20.3.2.5 Backup Rollback Planning

A focused guide to Backup Rollback Planning, connecting core concepts with practical Docker and container operations.

Backup and rollback planning in Docker environments covers two distinct but complementary concerns: preserving the data that containers produce (database contents, uploaded files, application state stored in volumes), and being able to quickly restore a previous version of the application when a deployment goes wrong. Without explicit planning for both, a failed deployment can be difficult to reverse and a data loss event may be unrecoverable.

The Two Dimensions of Recovery

Application rollback means reverting the running containers to the previously deployed image version. Because Docker images are immutable and versioned, this is fundamentally simple — stop the current containers, start them from the previous image tag. The complexity lies in managing the transition safely, especially when database schema changes are involved.

Data backup means preserving the contents of Docker volumes that hold persistent state: database files, uploaded content, application-generated data. Volumes persist independently of containers, but they are not automatically backed up. A host failure, an accidental docker volume rm, or docker compose down -v destroys volume data, and without a backup that loss is permanent.

Image Versioning as Rollback Foundation

The prerequisite for application rollback is that every deployed image has a unique tag that is never reassigned. Using latest as the deployment tag makes rollback unreliable: the tag is overwritten on every push, so it no longer identifies the previous image.

The standard pattern is to tag images with the Git commit SHA or a semantic version:

# Tag with commit SHA (computed once so every command uses the same value)
TAG=$(git rev-parse --short HEAD)
docker build -t registry.example.com/my-api:"$TAG" .
docker push registry.example.com/my-api:"$TAG"

# Also tag as latest for convenience
docker tag registry.example.com/my-api:"$TAG" registry.example.com/my-api:latest
docker push registry.example.com/my-api:latest

With commit-SHA tags, earlier deployments remain addressable in the registry for as long as its retention policy keeps them:

registry.example.com/my-api:a1b2c3d  ← current deployment
registry.example.com/my-api:e4f5a6b  ← previous deployment
registry.example.com/my-api:c7d8e9f  ← two deployments ago
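
One way to enforce the rule against deploying latest is a small guard in the deployment script. The following is a minimal sketch; the function name is_pinned_ref is hypothetical, and it only inspects the reference string, it does not contact a registry.

```shell
# Hypothetical helper: succeed only if the image reference carries a
# sha256 digest or an explicit tag other than "latest".
is_pinned_ref() {
  ref=$1
  case "$ref" in
    *@sha256:*) return 0 ;;   # digest references are immutable
  esac
  last=${ref##*/}             # drop registry host (and port) and path
  case "$last" in
    *:latest) return 1 ;;     # mutable tag
    *:*)      return 0 ;;     # explicit tag
    *)        return 1 ;;     # no tag at all means "latest"
  esac
}

# Usage (not run here):
#   is_pinned_ref "$IMAGE" || { echo "refusing to deploy $IMAGE" >&2; exit 1; }
```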

Rolling Back the Application

To roll back to the previous version:

# Stop and remove the current containers
docker stop my-api
docker rm my-api

# Start the previous image (repeat the ports, volumes, and environment flags used by the original run)
docker run -d \
  --name my-api \
  --restart unless-stopped \
  registry.example.com/my-api:e4f5a6b
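
If the previous image is no longer cached on the host and the registry is unreachable, stopping the current container first leaves nothing running. A hedged sketch of a safer ordering follows: it pulls before it stops anything. The function name rollback_container is hypothetical, and the docker run line carries only the flags shown above.

```shell
# Hypothetical helper: roll back container $1 to image $2.
# Pulls first so a missing tag or unreachable registry fails
# before the running container is touched.
rollback_container() {
  name=$1
  image=$2
  if ! docker pull "$image"; then
    echo "pull failed; $name left running" >&2
    return 1
  fi
  docker stop "$name" || return 1
  docker rm "$name" || return 1
  # Add the ports, volumes, and environment flags of the original run here.
  docker run -d --name "$name" --restart unless-stopped "$image"
}

# Usage (not run here):
#   rollback_container my-api registry.example.com/my-api:e4f5a6b
```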

In Docker Compose, change the image tag in compose.yml and redeploy:

services:
  api:
    image: registry.example.com/my-api:e4f5a6b  # rolled back

docker compose up -d

The volume data is unaffected — only the application container image changes.
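
Editing compose.yml by hand during an incident is error-prone. One alternative, sketched below under assumptions, is to let Compose interpolate the tag from a variable and change only that variable. The variable name API_TAG and the helper set_api_tag are hypothetical; the sketch assumes the service is declared as image: registry.example.com/my-api:${API_TAG} and that the .env file sits next to compose.yml.

```shell
# Assumes compose.yml references the tag through a variable, e.g.
#   image: registry.example.com/my-api:${API_TAG}
# API_TAG is a hypothetical name; Compose reads it from the .env file.
set_api_tag() {
  tag=$1
  env_file=${2:-.env}
  touch "$env_file"
  # Drop any existing API_TAG line, then append the new value.
  grep -v '^API_TAG=' "$env_file" > "$env_file.tmp" || true
  echo "API_TAG=$tag" >> "$env_file.tmp"
  mv "$env_file.tmp" "$env_file"
}

# Usage (not run here):
#   set_api_tag e4f5a6b && docker compose up -d
```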

Tracking Deployment History

Maintain a deployment log that records which image tag is running in each environment at any given time. This can be as simple as a file in the repository updated by the CI/CD pipeline, or as formal as a deployment tracking system. Without this record, identifying "the previous version" under time pressure during an incident requires searching commit history and build logs.
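
The file-based variant can be very small. The sketch below assumes a hypothetical log format of one line per deployment (timestamp, environment, tag) and two hypothetical helpers; a real pipeline would commit or upload the file after each deployment.

```shell
# Hypothetical log format, one line per deployment:
#   <UTC timestamp> <environment> <image tag>
record_deploy() {   # usage: record_deploy LOGFILE ENV TAG
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" >> "$1"
}

# Print the tag deployed to ENV immediately before the current one.
# Prints nothing if ENV has fewer than two recorded deployments.
previous_tag() {    # usage: previous_tag LOGFILE ENV
  awk -v env="$2" '
    $2 == env { prev = cur; cur = $3 }
    END       { if (prev != "") print prev }
  ' "$1"
}

# Usage (not run here):
#   record_deploy deployments.log production a1b2c3d
#   previous_tag deployments.log production
```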

Database Backup: Core Principle

Volume data is the application's ground truth. The image can always be rebuilt from source; volume data cannot be regenerated if lost. Back up the volume data, not the container.

For PostgreSQL:

# Backup — run pg_dump against the running database container
docker exec my-postgres pg_dump -U postgres mydb > backup-$(date +%Y%m%d-%H%M%S).sql

# Restore — pipe the backup into psql
docker exec -i my-postgres psql -U postgres mydb < backup-20240315-142231.sql

The backup file is a plain SQL dump stored on the host. This file should be copied to a remote location (object storage, a backup server) since it is only as safe as the host disk.
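
Because the shell creates the output file before pg_dump runs, a failed dump still leaves a file that looks like a backup. The sketch below, with the hypothetical helper name backup_postgres, writes to a temporary name and renames it only when the dump command exits successfully.

```shell
# Hypothetical wrapper: dump database $2 from container $1 into directory $3.
# Writes to a .partial name first so a failed dump never looks like a backup.
backup_postgres() {
  container=$1
  db=$2
  dir=$3
  out="$dir/backup-$(date +%Y%m%d-%H%M%S).sql"
  if docker exec "$container" pg_dump -U postgres "$db" > "$out.partial"; then
    mv "$out.partial" "$out" && echo "$out"   # print the final path
  else
    rm -f "$out.partial"
    return 1
  fi
}

# Usage (not run here):
#   backup_postgres my-postgres mydb /var/backups/postgres
```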

Volume-Level Backup

For databases that do not support hot logical backups, or for file storage volumes, back up the volume directory directly:

# Stop the container to ensure filesystem consistency
docker stop my-postgres

# Back up the volume contents to a tar archive
docker run --rm \
  -v my_postgres_data:/data \
  -v "$(pwd)":/backup \
  alpine \
  tar czf /backup/postgres-backup-$(date +%Y%m%d).tar.gz /data

# Restart the container
docker start my-postgres

This runs a temporary Alpine container that mounts the volume and writes a tar archive to the current host directory. Stopping the container first makes this a cold backup. For databases that support hot backups (most modern databases do), it can be replaced with a hot backup taken with the database's own tools while the service keeps running.
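
In a scripted cold backup, a failure in the tar step should not leave the database stopped. The following is a sketch under assumptions: the helper name cold_backup_volume is hypothetical, the destination must be an absolute host path, and the volume is mounted read-only as an extra precaution.

```shell
# Hypothetical helper: cold-backup volume $2 used by container $1 into
# host directory $3 (absolute path), restarting the container even if
# the archive step fails.
cold_backup_volume() {
  container=$1
  volume=$2
  dest=$3
  docker stop "$container" || return 1
  status=0
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest":/backup \
    alpine \
    tar czf "/backup/$volume-$(date +%Y%m%d).tar.gz" /data || status=$?
  docker start "$container" || return 1
  return "$status"   # non-zero if the archive step failed
}

# Usage (not run here):
#   cold_backup_volume my-postgres my_postgres_data "$(pwd)"
```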

Restore from Volume Backup

# Stop the container
docker stop my-postgres

# Restore the tar archive into the volume
# (files already in the volume that are not in the archive are left in place)
docker run --rm \
  -v my_postgres_data:/data \
  -v "$(pwd)":/backup \
  alpine \
  sh -c "cd / && tar xzf /backup/postgres-backup-20240315.tar.gz"

# Restart the container
docker start my-postgres
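
Before stopping a running database for a restore, it is worth confirming that the archive is intact. A minimal sketch, assuming archives created by the tar command in the previous section (whose entries are stored under data/); the helper name check_volume_archive is hypothetical.

```shell
# Succeed only if the gzip stream is intact and the archive contains
# entries under data/ (tar strips the leading "/" from /data on backup).
check_volume_archive() {
  archive=$1
  gzip -t "$archive" 2>/dev/null || return 1
  tar tzf "$archive" 2>/dev/null | grep -q '^data/'
}

# Usage (not run here):
#   check_volume_archive postgres-backup-20240315.tar.gz || exit 1
```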

Schema Migration and Rollback

Application rollbacks become complicated when the new version includes database schema migrations. A migration that adds a new column is generally safe to roll back past, provided the column is nullable or has a default: the old application version simply ignores it. A migration that removes a column or changes a column's type is destructive: rolling back the application may fail because the database no longer has the schema the old version expects.

The defensive practice is to maintain backward-compatible migrations:

Deploy migration that adds the new column (old and new app versions both work).
Deploy new application version (uses the new column).
Deploy cleanup migration that removes old column (only after the new version is confirmed stable).

This expand-contract pattern makes every database state compatible with both the current and previous application version, enabling safe rollback at any point.
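
As an illustration of the two migrations in this pattern, the sketch below prints the SQL for a hypothetical rename of users.full_name to users.display_name. The table, the column names, and the helper migration_sql are all assumptions made for the example.

```shell
# Hypothetical example: renaming users.full_name to users.display_name.
# Table and column names are illustrative, not taken from a real schema.
migration_sql() {
  case "$1" in
    expand)    # first step: additive, works with old and new app versions
      echo "ALTER TABLE users ADD COLUMN display_name text;"
      echo "UPDATE users SET display_name = full_name WHERE display_name IS NULL;"
      ;;
    contract)  # last step: destructive, run only after the new version is stable
      echo "ALTER TABLE users DROP COLUMN full_name;"
      ;;
    *)
      echo "usage: migration_sql expand|contract" >&2
      return 2
      ;;
  esac
}

# Usage (not run here):
#   migration_sql expand | docker exec -i my-postgres psql -U postgres -d mydb
```

A real rename also needs the application or a trigger to keep both columns in sync while either version may still be running.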

Verifying Backup Integrity

A backup that has never been tested is not a backup — it is an assumption. Regularly test restores:

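# Start an empty temporary database container to restore into
docker run -d \
  --name restore-test \
  -e POSTGRES_PASSWORD=test \
  postgres:15-alpine

# Wait until the server accepts TCP connections (a fixed sleep can be too short)
until docker exec restore-test pg_isready -h 127.0.0.1 -U postgres; do sleep 1; done

# Restore the backup, stopping at the first SQL error
docker exec -i restore-test psql -U postgres -v ON_ERROR_STOP=1 < backup-20240315-142231.sql

# Verify data is present
docker exec restore-test psql -U postgres -c "SELECT count(*) FROM users;"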

# Clean up
docker rm -f restore-test

Automated weekly restore tests in a staging environment provide confidence that the backup process actually produces restorable data.

Backup Storage

A backup stored on the same host as the running container provides no protection against host failure. Backups must be stored remotely:

# Copy backup to S3 (using AWS CLI)
aws s3 cp backup-20240315-142231.sql s3://my-backups/postgres/

# Or to a remote server via SSH
scp backup-20240315-142231.sql user@backup-server:/backups/postgres/

Remote storage also enables cross-region recovery — restoring a database from a backup in a different data center after a regional failure.
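
A remote copy is only useful if it arrives intact. One hedged approach is to store a checksum file next to each backup and verify it after download. The sketch assumes GNU sha256sum is available (on macOS, shasum -a 256 is the usual substitute); both helper names are hypothetical.

```shell
# Write <file>.sha256 next to the backup so a downloaded copy can be checked.
write_checksum() {
  ( cd "$(dirname "$1")" && \
    sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Succeed only if the file still matches its recorded checksum.
verify_checksum() {
  ( cd "$(dirname "$1")" && \
    sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}

# Usage (not run here): upload both files, then after a later download run
#   verify_checksum backup-20240315-142231.sql
```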

Rollback Decision Criteria

Define rollback criteria before deployment, not during an incident. Example criteria:

Error rate above threshold (e.g., >1% 5xx responses) within 5 minutes of deployment.
Health endpoint returning unhealthy for more than 2 consecutive checks.
Database connection errors in application logs.
P99 latency above the defined SLA threshold.

When a criterion is met, execute the documented rollback procedure immediately rather than waiting to diagnose the root cause first. Diagnosis can happen on the rolled-back stable version.
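
The health-check criterion above can be automated with a short watcher. This is a sketch, not a monitoring system: the function name should_roll_back is hypothetical, and the probe is whatever command the caller supplies.

```shell
# Hypothetical watcher: run PROBE up to ATTEMPTS times and return 0
# ("roll back") as soon as more than MAX_FAILS consecutive probes fail.
# Returns 1 if the observation window ends without reaching that streak.
should_roll_back() {  # usage: should_roll_back ATTEMPTS MAX_FAILS INTERVAL PROBE...
  attempts=$1
  max_fails=$2
  interval=$3
  shift 3
  fails=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      fails=0                 # a healthy probe resets the streak
    else
      fails=$((fails + 1))
    fi
    if [ "$fails" -gt "$max_fails" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage (not run here; the URL is a placeholder):
#   should_roll_back 30 2 10 curl -fsS http://localhost:8080/health \
#     && echo "criterion met: execute the rollback procedure"
```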