14.2.3.5 Migration Data Handling

A focused guide to Migration Data Handling, connecting core concepts with practical Docker and container operations.

Migration data handling for Docker production deployments covers how database schema changes are applied safely to running, containerized services, ensuring that a migration runs exactly once, in the correct order, before the application code depending on it starts serving traffic, and without corrupting data if the migration itself fails partway through.

Why migrations need a different lifecycle than the application

An application container is designed to be started and stopped repeatedly, often with multiple replicas running simultaneously. A database migration, by contrast, must run exactly once per schema version, regardless of how many application replicas are starting up at the same time. Running migration logic directly inside every application container's startup sequence creates a race condition where multiple replicas could attempt to apply the same migration concurrently.

// Risky: every replica's startup tries to run this
await runMigrations();
await startServer();

If several replicas start simultaneously during a rolling deployment, each one racing to run the same migration can produce conflicting schema changes, duplicate data, or a migration tool reporting an error because another replica's transaction already holds a lock on the same table.

Running migrations as a separate, one-off task

The more reliable pattern runs the migration as a distinct, single-execution step, separate from the long-running application containers entirely, and only starts (or rolls out) the application containers after that step completes successfully:

docker run --rm --env-file production.env my-api npm run migrate
docker compose -f docker-compose.yml -f docker-compose.production.yml up -d

deploy_production:
  script:
    - docker run --rm --env-file production.env my-api npm run migrate
    - docker compose -f docker-compose.yml -f docker-compose.production.yml up -d

Because this migration step runs as a single container invocation rather than as part of every replica's own startup, there is no concurrency to coordinate; exactly one process attempts the migration, and the deployment only proceeds to the application rollout if that process exits successfully.

Migration locking for additional safety

Even with a separate migration step, defense in depth against accidental concurrent execution (a second deployment triggered manually while one is already in progress, for instance) is worth having. Most migration tools support or can be wrapped with an advisory lock that prevents two migration runs from executing simultaneously against the same database:

SELECT pg_advisory_lock(123456);
-- run migration
SELECT pg_advisory_unlock(123456);

docker run --rm my-api npx node-pg-migrate up --lock-timeout 30000

A migration tool that acquires a database-level lock before applying changes, and fails fast if it cannot acquire that lock within a reasonable timeout, converts an accidental double-run from a data corruption risk into a clean, visible failure.

Backward-compatible migrations for zero-downtime deployment

In a rolling deployment, there is a window where old and new application code run simultaneously against the same database, since replicas are updated one at a time rather than all at once. A migration applied before the rollout completes must not break the old code that is still running against the same schema during that window:

-- Safe: adding a nullable column does not break code unaware of it
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);

-- Unsafe to apply before old code is fully retired: removes a column old code still reads
ALTER TABLE users DROP COLUMN legacy_name;

The common pattern for handling a column removal safely across a rolling deployment is splitting it into multiple deploys: first deploy code that stops using the column while the column still exists, then in a later, separate deployment, drop the now-unused column.

Handling migration failure mid-run

A migration that fails partway through should leave the database in a known, recoverable state rather than a partially applied one. Wrapping schema changes in a transaction, where the database engine supports transactional DDL, ensures a failed migration rolls back cleanly rather than leaving some changes applied and others not:

BEGIN;
ALTER TABLE orders ADD COLUMN fulfillment_status VARCHAR(50);
UPDATE orders SET fulfillment_status = 'pending' WHERE fulfillment_status IS NULL;
COMMIT;

docker run --rm my-api npm run migrate || (echo "Migration failed, aborting deployment" && exit 1)

The deployment pipeline should treat a non-zero exit from the migration step as a hard stop, never proceeding to roll out application code that expects a schema change which did not actually complete.

Large data migrations and locking concerns

A migration that needs to backfill or transform a large volume of existing data presents a different risk than a pure schema change: a long-running UPDATE across millions of rows can hold locks that block ordinary application traffic for the duration:

UPDATE large_table SET status = 'archived' WHERE created_at < '2023-01-01';

-- Process in smaller batches to avoid holding a long-running lock
UPDATE large_table SET status = 'archived'
WHERE id IN (SELECT id FROM large_table WHERE status IS NULL AND created_at < '2023-01-01' LIMIT 1000);

Batching a large backfill into many smaller transactions, run as a separate, monitored job rather than as part of the deployment's own migration step, keeps individual lock durations short enough that ordinary application queries are not blocked for an extended period.

Testing migrations against a production-representative dataset

A migration that runs cleanly against a small development database can behave very differently against production data volume and shape. Running migrations against a staging environment seeded with a realistic, representative dataset before they ever reach production catches performance and locking issues early:

docker run --rm --env-file staging.env my-api npm run migrate

Common mistakes

Running migration logic inside every application replica's own startup sequence, creating a race condition during rolling deployments with multiple replicas starting concurrently.
Applying a migration that removes or renames a column still used by the previous version of the application code still running during a rolling deployment.
Running a large data backfill as a single, unbatched transaction that holds locks long enough to noticeably degrade application performance.
Allowing the deployment pipeline to proceed to the application rollout even when the migration step itself failed.
Testing migrations only against small development datasets, missing locking or performance problems that only appear at production data volume.

Reliable migration data handling treats the migration as a distinct, single-execution step gated ahead of the application rollout, designs schema changes to remain backward-compatible during the rolling deployment window, and batches large data transformations to avoid holding locks long enough to affect live traffic.