14.2.2.5 Production Secret Rotation
A focused guide to Production Secret Rotation, connecting core concepts with practical Docker and container operations.
Production secret rotation is the process of replacing a credential, such as a database password, API key, or signing key, with a new value, either on a recurring schedule or in response to a suspected compromise. The new value must reach every container that relies on it without interrupting service during the transition.
Why rotation matters even without a known leak
A credential that never changes becomes more valuable to an attacker the longer it remains valid, since a single historical leak, whether from a misconfigured log, a former employee's access, or a compromised backup, stays exploitable indefinitely if the value behind it is never replaced. Rotation limits the usable lifetime of any given leak, turning a permanent exposure into a temporary one bounded by the rotation interval.
The core challenge: rotation without downtime
The difficulty in production rotation is not generating a new credential; it is moving every consumer from the old credential to the new one without a window in which some consumers hold the old value, others hold the new value, and the underlying system (a database, an external API) accepts only one of them. Two general strategies address this: supporting both the old and new credential simultaneously for a transition period, or performing a coordinated, fast cutover.
Dual-credential rotation
Many systems support having two valid credentials active at once specifically to make rotation safe. A database user with two valid passwords, or an API key system that allows two active keys per account, lets the rotation happen gradually:
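-- MySQL 8.0.14+ syntax; the previous password is kept as a secondary password
ALTER USER app_user IDENTIFIED BY 'new-password' RETAIN CURRENT PASSWORD;
-- old password remains valid until explicitly discarded (DISCARD OLD PASSWORD)
-- PostgreSQL's ALTER USER ... WITH PASSWORD replaces the password immediately instead
# printf avoids storing the trailing newline that a here-string would append
printf '%s' 'new-password' | docker secret create db_password_v2 -
docker service update --secret-add db_password_v2 my-api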
During the transition window, some replicas read the old secret and some read the new one, and both continue to work against the database until every replica has been confirmed running with the new value, at which point the old password and the old secret object are revoked and removed.
docker service update --secret-rm db_password_v1 my-api
docker secret rm db_password_v1
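On the consumer side, the transition window can be handled by trying the newest mounted secret first and falling back to the previous one. The following is a minimal Python sketch under stated assumptions: `AuthError` and the `connect` callable are hypothetical stand-ins for whatever the real database driver provides, and the secret paths are passed in by the caller.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical error a database driver raises on a rejected password."""


def read_secret(path: str) -> str:
    # Strip a trailing newline in case one was stored with the secret.
    return Path(path).read_text().rstrip("\n")


def connect_with_fallback(
    secret_paths: Iterable[str],
    connect: Callable[[str], T],
) -> T:
    """Try each mounted secret in order (newest first) until one authenticates."""
    last_error: Optional[Exception] = None
    for path in secret_paths:
        try:
            password = read_secret(path)
        except FileNotFoundError:
            continue  # this replica does not have that secret mounted
        try:
            return connect(password)
        except AuthError as exc:
            last_error = exc  # rejected; try the next candidate
    raise AuthError("no mounted credential was accepted") from last_error
```

A caller might pass `["/run/secrets/db_password_v2", "/run/secrets/db_password_v1"]` during the transition and drop the second path once the old credential has been revoked.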
Coordinated cutover rotation
When dual-credential support is not available, rotation requires a coordinated, fast replacement: update the underlying system and every consumer's configuration in immediate succession, accepting a brief window where requests using the stale credential will fail until the rolling update completes:
docker compose -f docker-compose.yml -f docker-compose.production.yml up -d --force-recreate
Minimizing the duration of this window matters; a fast rolling update across replicas, combined with a retry policy in dependent services for transient authentication failures, keeps the user-visible impact small even without dual-credential support.
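A retry policy of this kind could look like the sketch below. It is illustrative only: `AuthError` is a hypothetical exception standing in for the driver's real authentication error, and the attempt count and delays are arbitrary defaults. Re-reading the file on each attempt only helps where the file can change under a running container (for example a bind mount or an agent-rendered file); a Swarm secret stays fixed for the life of a task, so there the retry simply bridges the gap until the task is replaced.

```python
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical error raised when the backend rejects a credential."""


def call_with_auth_retry(
    operation: Callable[[str], T],
    secret_path: str,
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(password), re-reading the secret file before every attempt."""
    for attempt in range(attempts):
        password = Path(secret_path).read_text().rstrip("\n")
        try:
            return operation(password)
        except AuthError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the failure
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
    raise ValueError("attempts must be at least 1")
```

Only authentication failures are retried here; other exceptions propagate immediately so that unrelated errors are not masked by the rotation logic.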
Automating rotation with a secrets manager
A dedicated secrets manager can automate much of the rotation process, including generating the new credential, updating the underlying system, and notifying or directly updating consumers:
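# Rotates the root credential Vault itself uses for this database connection
vault write -f database/rotate-root/my-postgres-db
# Policy allowing a consumer to read credentials for its role
path "database/creds/my-api-role" {
  capabilities = ["read"]
}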
Dynamic secrets, where the secrets manager issues a short-lived, uniquely generated credential per request rather than a long-lived static one, push rotation even further: instead of periodically rotating a single shared credential, every consumer receives its own credential that expires automatically, removing the need for a coordinated rotation event entirely.
vault read database/creds/my-api-role
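Because a dynamic credential expires with its lease, a consumer has to request a replacement before that happens. The sketch below shows one way to track this in Python, assuming the credential was fetched with `vault read -format=json` so that `lease_id`, `lease_duration`, and `data.username`/`data.password` are available; the class name and the two-thirds refresh threshold are illustrative choices, not Vault requirements.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LeasedCredential:
    username: str
    password: str
    lease_id: str
    issued_at: float     # seconds since epoch when the lease was obtained
    lease_duration: int  # seconds, as reported by the secrets manager

    def refresh_at(self, fraction: float = 2 / 3) -> float:
        """Time at which a replacement should be requested, before expiry."""
        return self.issued_at + self.lease_duration * fraction

    def needs_refresh(self, now: float) -> bool:
        return now >= self.refresh_at()


def parse_vault_json(raw: str, issued_at: float) -> LeasedCredential:
    """Parse JSON output of `vault read -format=json database/creds/<role>`."""
    doc = json.loads(raw)
    return LeasedCredential(
        username=doc["data"]["username"],
        password=doc["data"]["password"],
        lease_id=doc["lease_id"],
        issued_at=issued_at,
        lease_duration=doc["lease_duration"],
    )
```

Refreshing well before expiry leaves room for a failed request to be retried while the current credential is still valid.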
Detecting which containers still use a stale credential
Before revoking an old credential, confirming that no running container still depends on it avoids an unplanned outage:
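# docker ps lists only containers on the current node; on a multi-node swarm, run this on each node
for c in $(docker ps --filter "name=my-api" --format "{{.Names}}"); do
  echo "$c: $(docker exec "$c" sha256sum /run/secrets/db_password)"
done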
Comparing the hash of the mounted secret file across all replicas (rather than the value itself, to avoid printing it) confirms whether every instance has actually picked up the new credential before the old one is revoked.
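The comparison itself is easy to automate once the digests have been collected. A minimal Python sketch, assuming the per-container lines come from `sha256sum` as in the loop above; the function names are illustrative:

```python
import hashlib
from typing import Dict, List


def secret_digest(value: str) -> str:
    """SHA-256 hex digest, matching what sha256sum prints for identical bytes."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def parse_sha256sum(line: str) -> str:
    """Extract the digest from a sha256sum output line such as '<hex>  <path>'."""
    return line.split()[0]


def stale_containers(observed: Dict[str, str], expected_digest: str) -> List[str]:
    """Return the names of containers whose mounted secret does not match."""
    return sorted(
        name for name, digest in observed.items() if digest != expected_digest
    )
```

The expected digest must be computed over exactly the bytes stored in the secret, so a trailing newline in one place but not the other will show up as a mismatch. The old credential is safe to revoke only when `stale_containers` returns an empty list.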
Rotation triggers beyond a fixed schedule
In addition to a routine schedule, certain events should trigger an immediate, out-of-cycle rotation regardless of when the next scheduled rotation was due:
- A credential was visible in a log file, error report, or terminal output.
- An employee or contractor with access to the credential's value leaves the organization or changes roles.
- A related system is compromised, even if the credential itself was not directly observed in the incident.
- A third-party vendor reports a breach affecting credentials they issued.
vault lease revoke -prefix database/creds/my-api-role/
Revoking active leases immediately, rather than waiting for natural expiration, is the appropriate response when a credential's exposure is suspected rather than merely scheduled for routine replacement.
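The triggers listed above can be encoded next to the schedule so that both feed the same decision. This Python sketch is only an illustration of that idea; the enum members mirror the list in this section, and the function name and return format are assumptions.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Iterable, Optional


class Trigger(Enum):
    SEEN_IN_OUTPUT = "credential visible in a log, error report, or terminal"
    ACCESS_CHANGE = "person with access left or changed roles"
    RELATED_COMPROMISE = "related system compromised"
    VENDOR_BREACH = "vendor reported a breach"


def rotation_reason(
    last_rotated: datetime,
    now: datetime,
    interval: timedelta,
    events: Iterable[Trigger] = (),
) -> Optional[str]:
    """Return why rotation is due, or None if nothing requires it yet."""
    events = list(events)
    if events:
        # Any trigger event overrides the schedule and demands rotation now.
        return "immediate: " + "; ".join(e.value for e in events)
    if now - last_rotated >= interval:
        return "scheduled: rotation interval elapsed"
    return None
```

Event triggers are checked first so that a recent scheduled rotation never suppresses an out-of-cycle one.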
Rotation testing
A rotation process that has never actually been executed outside of an emergency carries real risk of failing at the worst possible time. Periodically rehearsing rotation, ideally as an automated, scheduled drill rather than only a documented runbook, confirms the mechanism still works as the system evolves:
./scripts/rotate-db-password.sh --dry-run
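What a dry-run mode does depends on the script, which is not shown here. One common shape, sketched in Python with hypothetical names, is a list of ordered steps that are either executed or merely reported:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    action: Callable[[], None]


def run_rotation(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Execute steps in order; in dry-run mode only report what would happen."""
    log: List[str] = []
    for number, step in enumerate(steps, start=1):
        if dry_run:
            log.append(f"[dry-run] {number}. {step.description}")
            continue
        step.action()  # an exception here stops the run before later steps
        log.append(f"[done] {number}. {step.description}")
    return log
```

Defaulting `dry_run` to `True` means a real rotation has to be requested explicitly, which suits a drill that runs on a schedule. A dry run confirms the plan and its ordering but not that each step still works, so occasional full rehearsals against a non-production environment remain worthwhile.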
Common mistakes
- Rotating a credential without verifying every consumer has picked up the new value before revoking the old one, causing an avoidable outage for any replica still running stale configuration.
- Treating rotation as a manual, infrequent, undocumented process, making it slow and error-prone exactly when speed matters most, during a suspected compromise.
- Never rotating credentials that have no obvious expiration pressure, leaving long-lived static secrets as the weakest link in an otherwise well-secured system.
- Failing to revoke an old credential after rotation completes, leaving it valid and exploitable indefinitely even though it is no longer in active use.
- Relying entirely on a fixed rotation schedule without also rotating immediately in response to a suspected leak, which treats rotation only as routine maintenance and not also as an incident response tool.
Production secret rotation works best when the underlying system supports overlapping valid credentials during a transition window, when a secrets manager automates the generation and propagation steps, and when the rotation process itself is rehearsed often enough to be trusted under the time pressure of an actual incident.