14.2.1.4 Production Config Validation
A focused guide to Production Config Validation, connecting core concepts with practical Docker and container operations.
Production configuration validation is the practice of verifying, before and immediately after a deployment, that the values, secrets, and settings a container actually received are complete, correctly typed, and consistent with what the running service requires. The goal is to catch a missing or malformed value before it causes a failure in front of real traffic, not after.
Why validation has to happen at multiple points
Configuration for a production container can silently break in several ways: the value can be missing from the deployment manifest, present but malformed, present and well-formed but pointing at the wrong target, or correct at deploy time but later changed without the running container being restarted to pick it up. A validation strategy that checks for only one of these failure modes will still let the others through.
Pre-deployment validation
Before a container starts, the values intended for it can be checked without ever running the application itself, first for presence in a throwaway container and then against a schema:
docker run --rm --env-file production.env alpine:3 \
  sh -c '[ -n "$DATABASE_URL" ] && [ -n "$JWT_SECRET" ] && [ -n "$LOG_LEVEL" ] && echo OK || (echo MISSING; exit 1)'
# config-schema.yaml
required:
  - DATABASE_URL
  - JWT_SECRET
  - LOG_LEVEL
pattern:
  DATABASE_URL: "^postgres://"
  LOG_LEVEL: "^(debug|info|warn|error)$"
A small schema file checked by a validation script in the deployment pipeline catches a missing or malformed value before it ever reaches a running container, which is considerably cheaper than catching it after the container has already started serving requests incorrectly.
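The schema check described above can be sketched in a few lines of Node.js. This is an illustration under stated assumptions, not a definitive implementation: the names `parseEnvFile` and `validateConfig` are hypothetical, the schema is written as a plain object mirroring `config-schema.yaml` to avoid a YAML dependency, and the parser handles only simple `KEY=value` lines.

```javascript
// Schema mirroring config-schema.yaml, written as a plain object (assumption:
// a real pipeline would load the YAML file with a parser of its choice).
const schema = {
  required: ['DATABASE_URL', 'JWT_SECRET', 'LOG_LEVEL'],
  pattern: {
    DATABASE_URL: '^postgres://',
    LOG_LEVEL: '^(debug|info|warn|error)$',
  },
};

// Parse simple KEY=value lines; blank lines and # comments are skipped.
function parseEnvFile(text) {
  const env = {};
  for (const line of text.split(/\r?\n/)) {
    if (/^\s*(#|$)/.test(line)) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // not a KEY=value line
    env[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return env;
}

// Return a list of problems; an empty list means the config passed.
// Messages name the key only, never the value, so secrets stay out of logs.
function validateConfig(env, schema) {
  const problems = [];
  for (const key of schema.required) {
    if (!env[key]) problems.push(`${key}: missing or empty`);
  }
  for (const [key, pattern] of Object.entries(schema.pattern)) {
    if (env[key] && !new RegExp(pattern).test(env[key])) {
      problems.push(`${key}: does not match ${pattern}`);
    }
  }
  return problems;
}
```

A pipeline step would read `production.env` from disk, pass its contents through `parseEnvFile` and `validateConfig`, print the problems, and exit with a non-zero status if the list is not empty.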
Startup-time validation inside the application
The application itself is the most reliable place to validate its own configuration, since it knows exactly what it needs and can fail immediately and loudly rather than starting in a partially functional state:
const required = ['DATABASE_URL', 'JWT_SECRET', 'LOG_LEVEL'];
const missing = required.filter((key) => !process.env[key]);
if (missing.length > 0) {
  console.error(`Missing required configuration: ${missing.join(', ')}`);
  process.exit(1);
}
if (!/^postgres:\/\//.test(process.env.DATABASE_URL)) {
  console.error('DATABASE_URL must be a postgres connection string');
  process.exit(1);
}
Exiting immediately with a non-zero status on invalid configuration, rather than logging a warning and continuing, ensures the container orchestrator's restart and health-check logic treats the failure correctly instead of letting a misconfigured instance sit in rotation.
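Startup validation can also cover the "correctly typed" part of the definition and report every problem at once, so a single failed start reveals all mistakes. The following is a minimal sketch: `loadConfig` is a hypothetical helper, and the 32-character minimum for `JWT_SECRET` is an assumed threshold, not a requirement stated by this guide.

```javascript
const LOG_LEVELS = ['debug', 'info', 'warn', 'error'];

// Hypothetical helper: validate everything first, then report all problems
// in a single error instead of stopping at the first one.
function loadConfig(env) {
  const problems = [];

  let databaseUrl = null;
  try {
    databaseUrl = new URL(env.DATABASE_URL);
    if (databaseUrl.protocol !== 'postgres:') {
      problems.push('DATABASE_URL must use the postgres:// scheme');
    }
  } catch {
    problems.push('DATABASE_URL is missing or not a valid URL');
  }

  if (!LOG_LEVELS.includes(env.LOG_LEVEL)) {
    problems.push(`LOG_LEVEL must be one of ${LOG_LEVELS.join(', ')}`);
  }

  // Assumed minimum length; pick a threshold that fits your signing scheme.
  if (!env.JWT_SECRET || env.JWT_SECRET.length < 32) {
    problems.push('JWT_SECRET is missing or shorter than 32 characters');
  }

  if (problems.length > 0) {
    throw new Error(`Invalid configuration:\n- ${problems.join('\n- ')}`);
  }
  // Freeze the result so nothing mutates configuration after startup.
  return Object.freeze({
    databaseUrl: databaseUrl.href,
    logLevel: env.LOG_LEVEL,
    jwtSecret: env.JWT_SECRET,
  });
}
```

The entry point would call `loadConfig(process.env)` inside a `try` block, print the error message, and call `process.exit(1)` on failure, which preserves the fail-fast behavior described above.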
Verifying the effective configuration of a running container
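After deployment, the configuration a container actually received can be inspected directly, which catches mistakes in how the deployment tooling assembled the final value set. Listing only the variable names keeps secret values such as JWT_SECRET out of terminal output and CI logs:
docker exec my-api env | cut -d= -f1 | sort
docker inspect my-api --format '{{json .Config.Env}}' | jq -r '.[] | split("=")[0]' | sort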
Comparing this output against the expected list, ideally automated as part of a post-deployment check rather than performed manually, confirms that the override layering (base file, environment-specific file, secrets) produced the intended final result rather than an unintended one.
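The comparison can be automated with a small function over the array that `docker inspect` reports for `.Config.Env`. This is a sketch only: `diffEnvNames` is a hypothetical name, and it compares variable names, not values.

```javascript
// envEntries is the array reported for .Config.Env by docker inspect,
// for example ["DATABASE_URL=postgres://db/app", "LOG_LEVEL=info"].
function diffEnvNames(expectedNames, envEntries) {
  const actual = new Set(envEntries.map((entry) => entry.split('=')[0]));
  const expected = new Set(expectedNames);
  return {
    // Names the deployment should have provided but did not.
    missing: [...expected].filter((name) => !actual.has(name)).sort(),
    // Names present in the container that nobody listed as expected.
    unexpected: [...actual].filter((name) => !expected.has(name)).sort(),
  };
}
```

Variables defined by the image itself, such as `PATH`, will typically appear under `unexpected` unless they are added to the expected list, so a post-deployment check would usually fail only on `missing` and report `unexpected` for review.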
Validating secrets without exposing them
Secret values should be validated for presence and basic shape without ever printing their actual contents into logs or terminal output:
docker exec my-api sh -c '[ -s /run/secrets/db_password ] && echo "secret present" || echo "secret missing"'
docker exec my-api sh -c 'wc -c < /run/secrets/jwt_signing_key'
Checking that a secret file exists and has a plausible length is usually sufficient to catch the common failure mode, an empty or missing secret mount, without ever needing to reveal the secret's actual value during validation.
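The same presence-and-length check can run inside the application at startup. The sketch below assumes secrets are mounted as files, as in the commands above; `checkSecretFile` is a hypothetical helper, and it inspects file metadata only, so the secret's contents are never read or logged.

```javascript
const fs = require('fs');

// Report presence and size only; the file contents are never read.
function checkSecretFile(path, minBytes = 1) {
  let size;
  try {
    size = fs.statSync(path).size;
  } catch {
    return { path, ok: false, reason: 'missing or inaccessible' };
  }
  if (size < minBytes) {
    return { path, ok: false, reason: `only ${size} bytes, expected at least ${minBytes}` };
  }
  return { path, ok: true, reason: 'present' };
}
```

Calling `checkSecretFile('/run/secrets/jwt_signing_key', 32)` would flag an empty or truncated mount; the threshold of 32 bytes is an assumption to adjust per secret.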
Health checks as continuous configuration validation
A health check endpoint that performs a real dependency check, rather than only confirming the process is alive, effectively performs configuration validation continuously rather than only at deploy time:
app.get('/healthz', async (req, res) => {
  try {
    await db.query('SELECT 1');
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'degraded', reason: 'database unreachable' });
  }
});
# Dockerfile (the image must include curl for this check to work)
HEALTHCHECK --interval=15s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1
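If a configuration value such as a database hostname is wrong, this kind of health check starts failing on its next probe. With the settings above, Docker marks the container unhealthy after three consecutive failures, roughly 45 seconds, rather than the problem surfacing only the first time a real user request touches that dependency.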
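A wrong hostname often produces a hanging connection instead of an immediate error, so a dependency check benefits from its own deadline, shorter than the probe's `--timeout`. The sketch below is one way to do that; `withTimeout` and `checkDependency` are hypothetical helpers, and the 2000 ms default is an assumed value chosen to stay under the 5-second probe timeout.

```javascript
// Reject if the dependency check does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run a dependency check and map the outcome to an HTTP status and body.
// The reason is generic on purpose so connection details are not exposed.
async function checkDependency(name, check, ms = 2000) {
  try {
    await withTimeout(check(), ms);
    return { code: 200, body: { status: 'ok' } };
  } catch (err) {
    return { code: 503, body: { status: 'degraded', reason: `${name} unreachable` } };
  }
}
```

Inside the `/healthz` handler, the result of `checkDependency('database', () => db.query('SELECT 1'))` can be passed straight to `res.status(result.code).json(result.body)`.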
Validating across replicas
In a multi-replica deployment, configuration validation should confirm every replica received the same effective configuration, since a partial or staggered rollout can leave some replicas running with stale or inconsistent values:
for c in $(docker ps --filter "name=my-api" --format "{{.Names}}"); do
  echo "$c: $(docker exec "$c" printenv APP_VERSION)"
done
A divergence in this output across replicas, beyond what a deliberate rolling update would briefly produce, is a strong signal of a configuration propagation problem worth investigating immediately.
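Comparing a single variable such as `APP_VERSION` can miss differences elsewhere in the environment. One possible extension, sketched below, hashes each replica's full environment so replicas can be compared without printing any values. The names `fingerprintEnv` and `groupByFingerprint` are hypothetical, and ignoring `HOSTNAME` is an assumption based on it normally differing per container.

```javascript
const { createHash } = require('crypto');

// Hash a container's environment so replicas can be compared without
// printing any values. Entries are sorted so ordering does not matter.
function fingerprintEnv(envEntries, ignore = ['HOSTNAME']) {
  const stable = envEntries
    .filter((entry) => !ignore.includes(entry.split('=')[0]))
    .sort()
    .join('\n');
  return createHash('sha256').update(stable).digest('hex').slice(0, 12);
}

// Group replica names by fingerprint; more than one group means divergence.
function groupByFingerprint(envByReplica) {
  const groups = new Map();
  for (const [name, entries] of Object.entries(envByReplica)) {
    const key = fingerprintEnv(entries);
    groups.set(key, [...(groups.get(key) || []), name]);
  }
  return groups;
}
```

The input for each replica is the `.Config.Env` array from `docker inspect`; a result with a single group indicates that every replica received the same effective environment.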
Automating validation as a deployment gate
The most effective configuration validation strategy treats it as a gate the deployment pipeline cannot proceed past, rather than a check an operator remembers to run manually:
deploy_production:
  script:
    - ./scripts/validate-config.sh production.env
    - docker compose -f docker-compose.yml -f docker-compose.production.yml up -d
    - ./scripts/post-deploy-validate.sh
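The contents of the post-deployment script are not shown in this guide. One plausible core for it is a bounded retry loop that waits for the health endpoint and fails the pipeline if the service never becomes healthy. The sketch below assumes exactly that; `waitUntilHealthy`, its default of 10 attempts, and its 3-second delay are all hypothetical.

```javascript
// Call `probe` until it resolves to true or the attempts run out.
// `probe` is any async function, for example one that requests /healthz.
async function waitUntilHealthy(probe, { attempts = 10, delayMs = 3000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      if (await probe()) return attempt;
    } catch {
      // A refused connection while the container starts counts as "not yet".
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`service not healthy after ${attempts} attempts`);
}
```

On Node.js 18 or later, the probe could be `async () => (await fetch('http://localhost:3000/healthz')).ok`. A script built on this would exit with a non-zero status when the function throws, so the gate stops the pipeline.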
Common mistakes
- Validating configuration only at deploy time and never re-checking the effective configuration of containers that have been running for a long period, missing drift introduced by manual changes.
- Logging a warning for missing configuration and continuing to start anyway, allowing a misconfigured instance to serve traffic in a degraded or incorrect state.
- Printing secret values during validation for convenience, defeating the purpose of keeping them out of logs and terminal history.
- Writing a health check that only confirms the process is alive without checking the dependencies that configuration values actually point at, missing the exact class of failure configuration validation exists to catch.
- Performing validation manually and inconsistently rather than as an automated, mandatory step in the deployment pipeline.
Production configuration validation works best as a layered defense: a pre-deployment schema check, startup-time validation inside the application that fails fast and loudly, and a continuous, dependency-aware health check that keeps verifying the configuration remains correct for as long as the container keeps running.