15.3.2.5 Health Restart Interaction
A focused guide to Health Restart Interaction, connecting core concepts with practical Docker and container operations.
Health restart interaction is the relationship between a container's health check status and Docker's restart policy mechanism. People often assume this connection is more direct and automatic than it is. Understanding how little these two systems interact by default is essential for building reliable, automated recovery from an unhealthy container. Docker does not make that connection on its own.
Restart policies are triggered by process exit, not health status
Docker's restart policies (no, on-failure, always, and unless-stopped) all base their behavior on the container's main process exiting. They do not look at the health check status field:
docker run -d --restart=on-failure my-api
Consider a container that becomes unhealthy while its main process keeps running. The health check is failing, but nothing has caused the process itself to exit. This policy will never restart that container, because an unhealthy status does not cause a process exit on its own. The daemon tracks the two signals independently.
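The decision a restart policy makes after an exit can be pictured as a function. The following is a simplified model, not Docker's actual implementation. It ignores daemon restarts and backoff, and `shouldRestartOnExit` is a hypothetical name. Note that its inputs are the policy, the exit code, and whether the user stopped the container. Health status is not among them.

```javascript
// Simplified model of how a restart policy reacts to one container exit.
// Not Docker's implementation: it ignores daemon restarts and backoff.
// There is deliberately no health-status input.
function shouldRestartOnExit(policy, exit) {
  const { exitCode, stoppedByUser = false, restartCount = 0 } = exit;
  if (stoppedByUser) return false; // docker stop does not trigger a policy restart
  switch (policy.name) {
    case 'no':
      return false;
    case 'on-failure':
      if (exitCode === 0) return false;
      // on-failure[:max-retries]; an unset or zero limit means no limit
      return !policy.maxRetries || restartCount < policy.maxRetries;
    case 'always':
    case 'unless-stopped':
      return true;
    default:
      throw new Error(`unknown restart policy: ${policy.name}`);
  }
}

// An unhealthy container whose process is still running never exits,
// so this decision is never even reached for it.
console.log(shouldRestartOnExit({ name: 'on-failure' }, { exitCode: 1 })); // true
console.log(shouldRestartOnExit({ name: 'on-failure' }, { exitCode: 0 })); // false
```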
How an application can bridge this gap itself
The most direct way to connect health status to an actual restart is having the application itself exit when it detects a sustained internal failure condition, converting the health problem into a process exit that the restart policy can then act on directly:
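// checkDependencies() stands in for the application's own dependency probe
// (database ping, upstream request, and so on); it is not defined here.
let consecutiveHealthFailures = 0;
async function checkHealth() {
  // A probe that rejects counts as a failure instead of an unhandled rejection.
  const ok = await checkDependencies().catch(() => false);
  consecutiveHealthFailures = ok ? 0 : consecutiveHealthFailures + 1;
  if (consecutiveHealthFailures >= 5) {
    console.error('Sustained health failure, exiting for restart');
    process.exit(1); // non-zero exit code, so --restart=on-failure applies
  }
}
setInterval(checkHealth, 10_000); // run the check every 10 seconds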
docker run -d --restart=on-failure my-api
This pattern duplicates some of the health check's failure-detection logic inside the application, but it closes the gap directly. The restart policy is triggered by the application's own decision to exit. The setup does not rely on Docker's health status to cause that exit, because it never does.
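Moving the counting logic out of the timer makes the exit decision easy to unit-test. This is a minimal sketch that assumes the threshold of 5 used above. `createFailureTracker`, `startSelfCheck`, and `probe` are hypothetical names, with `probe` standing in for the application's own dependency check.

```javascript
// Hypothetical helper: counts consecutive failed probes and reports when the
// streak reaches the threshold. Any successful probe resets the streak.
function createFailureTracker(threshold) {
  let consecutive = 0;
  return {
    record(ok) {
      consecutive = ok ? 0 : consecutive + 1;
      return consecutive >= threshold; // true means "exit so the policy restarts us"
    },
    get consecutive() {
      return consecutive;
    },
  };
}

// Wiring sketch: `probe` is the application's own check (sync or async).
function startSelfCheck(probe, { threshold = 5, intervalMs = 10_000 } = {}) {
  const tracker = createFailureTracker(threshold);
  const timer = setInterval(async () => {
    // A probe that throws or rejects is treated as a failed check.
    const ok = await Promise.resolve().then(probe).catch(() => false);
    if (tracker.record(ok)) {
      console.error('Sustained health failure, exiting for restart');
      process.exit(1); // non-zero exit code, so --restart=on-failure applies
    }
  }, intervalMs);
  return () => clearInterval(timer); // lets tests or shutdown code stop the loop
}
```

Keeping `record` pure means the threshold behavior can be checked without timers or a real dependency.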
Using an external watcher process
An alternative that does not require modifying the application itself is a separate process or script that watches Docker's health status and explicitly issues a restart command when a container transitions to unhealthy:
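docker events --filter event=health_status --format '{{.Actor.ID}} {{.Action}}' |
while read -r container action; do
  # $action is "health_status: healthy" or "health_status: unhealthy"
  [ "$action" = "health_status: unhealthy" ] || continue
  # Re-check the live status in case the container recovered meanwhile.
  status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null)
  if [ "$status" = "unhealthy" ]; then
    docker restart "$container"
  fi
done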
Running a watcher like this as its own long-lived process (itself ideally monitored for its own health, to avoid the watcher becoming an unmonitored single point of failure) provides the missing link between Docker's health status reporting and an actual remediation action, without requiring any change to the applications being watched.
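The same watcher can also be written in the application's own language, which makes the event parsing testable. This is a sketch under a few assumptions: the docker CLI is on PATH, and the process is allowed to talk to the daemon. It reads `docker events --format '{{json .}}'`, where health events carry an Action such as "health_status: unhealthy". `parseHealthEvent` and `watch` are hypothetical names.

```javascript
// Sketch of an external watcher in Node.js (assumes the docker CLI is on PATH).
const { spawn, execFile } = require('node:child_process');
const readline = require('node:readline');

// Parse one JSON line of `docker events` output into a health event, or null.
function parseHealthEvent(line) {
  let ev;
  try {
    ev = JSON.parse(line);
  } catch {
    return null; // ignore anything that is not a JSON event
  }
  const action = typeof ev.Action === 'string' ? ev.Action : '';
  if (!action.startsWith('health_status:')) return null;
  const actor = ev.Actor || {};
  return {
    id: actor.ID,
    name: actor.Attributes && actor.Attributes.name,
    status: action.slice('health_status:'.length).trim(),
  };
}

function watch() {
  const events = spawn('docker', [
    'events', '--filter', 'event=health_status', '--format', '{{json .}}',
  ]);
  const rl = readline.createInterface({ input: events.stdout });
  rl.on('line', (line) => {
    const ev = parseHealthEvent(line);
    if (!ev || ev.status !== 'unhealthy' || !ev.id) return;
    console.error(`restarting unhealthy container ${ev.name || ev.id}`);
    execFile('docker', ['restart', ev.id], (err) => {
      if (err) console.error(`restart failed: ${err.message}`);
    });
  });
  // If the event stream fails or ends, exit non-zero so whatever supervises
  // the watcher (for example a restart policy on its own container) notices.
  events.on('error', (err) => {
    console.error(err.message);
    process.exit(1);
  });
  events.on('close', () => process.exit(1));
}

// Only start watching when asked explicitly, e.g. `node watcher.js --watch`.
if (process.argv.includes('--watch')) watch();
```

Exiting when the event stream closes keeps the watcher from silently stopping. That is one way to keep it from becoming an unmonitored single point of failure.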
Orchestrator-level handling
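Higher-level orchestration layers built on top of plain Docker often make this connection natively. They treat a sustained unhealthy status as a first-class trigger for replacing the affected instance, with no need for application-level exit logic or a custom watcher script. Docker Swarm mode is one example. When a file like the one below is deployed with docker stack deploy, Swarm treats a task whose container becomes unhealthy as failed and replaces it according to the restart policy. Running the same file with plain docker compose up does not restart a container merely because it became unhealthy: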
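services:
  api:
    image: my-api:latest
    healthcheck:
      # hypothetical endpoint; assumes curl is installed in the image
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      retries: 3
    deploy:
      restart_policy:
        condition: on-failure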
The specific behavior and configuration for this varies meaningfully between orchestration platforms, and it is worth verifying directly, for whichever specific layer is in use, whether and how an unhealthy container actually triggers a restart, rather than assuming a particular behavior based on experience with plain Docker, Compose, or a different orchestrator that may handle this connection differently.
Restart loops caused by a persistent health problem
Sometimes a restart-on-unhealthy mechanism is in place but the underlying problem is not transient, such as an unreachable database or a missing required configuration value. In that case, restarting the container repeatedly accomplishes nothing except a rapid restart loop, because the condition that caused the original failure persists across every restart:
docker events --filter event=restart --filter container=my-api --since 10m
A restart count that climbs rapidly within a short window is a signal to investigate the root cause directly. A persistent, non-transient problem needs an actual fix, not automated restarts that keep failing for the same underlying reason.
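A watcher can run this check itself instead of relying on someone to notice the count. This is a minimal sketch: the caller feeds in each restart it performs or observes. `createLoopDetector` is a hypothetical name, and the window and threshold are assumptions to tune per service.

```javascript
// Hypothetical restart-loop detector: remembers restart timestamps and reports
// a loop once more than `maxRestarts` of them fall inside `windowMs`.
function createLoopDetector({ windowMs, maxRestarts }) {
  const times = [];
  return function recordRestart(nowMs = Date.now()) {
    times.push(nowMs);
    // Drop restarts that have aged out of the sliding window.
    while (times.length && nowMs - times[0] > windowMs) times.shift();
    return times.length > maxRestarts; // true: stop restarting and investigate
  };
}

// Example: more than 3 restarts within 10 minutes counts as a loop.
const record = createLoopDetector({ windowMs: 10 * 60_000, maxRestarts: 3 });
console.log([0, 60_000, 120_000, 180_000].map((t) => record(t)));
// [ false, false, false, true ]
```

When the detector reports a loop, a sensible reaction is to stop restarting and alert. Another restart would fail for the same reason.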
Backoff for restart-on-unhealthy mechanisms
Docker's built-in restart policy adds an increasing delay between automatic restart attempts (doubling the previous delay, starting at 100 milliseconds). A custom or scripted restart-on-unhealthy mechanism has no such backoff unless one is explicitly implemented. Without it, the script risks a tight restart loop that adds load and gives the underlying problem no chance to resolve between attempts:
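RESTART_COUNT=0
docker events --filter event=health_status --filter container=my-api \
    --format '{{.Action}}' | while read -r action; do
  case "$action" in
    *unhealthy)
      RESTART_COUNT=$((RESTART_COUNT + 1))
      delay=$((RESTART_COUNT * 5))
      [ "$delay" -gt 300 ] && delay=300   # cap the backoff at 5 minutes
      sleep "$delay"
      docker restart my-api ;;
    *healthy)
      RESTART_COUNT=0 ;;                  # recovered: reset the backoff
  esac
done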
Adding an explicit, increasing backoff to a custom watcher script, rather than restarting immediately and unconditionally on every unhealthy transition, mirrors the protection Docker's own native restart policy already provides for process-exit-triggered restarts.
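The linear delay in the script above grows slowly. An alternative closer in spirit to Docker's own doubling delay is exponential backoff with a ceiling. In this sketch, the 5-second base and 5-minute cap are assumptions, not Docker defaults. `backoffDelayMs` and `RestartBackoff` are hypothetical names.

```javascript
// Exponential backoff with a ceiling: attempt 1 waits baseMs, each further
// attempt doubles the wait, and no wait ever exceeds capMs.
function backoffDelayMs(attempt, { baseMs = 5_000, capMs = 300_000 } = {}) {
  if (attempt < 1) return 0;
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// Tracks attempts for one container; a healthy event resets the sequence so a
// later, unrelated failure starts again from the base delay.
class RestartBackoff {
  constructor(options) {
    this.options = options;
    this.attempt = 0;
  }
  onUnhealthy() {
    this.attempt += 1;
    return backoffDelayMs(this.attempt, this.options); // ms to wait before restarting
  }
  onHealthy() {
    this.attempt = 0;
  }
}

const b = new RestartBackoff();
console.log([b.onUnhealthy(), b.onUnhealthy(), b.onUnhealthy()]); // [ 5000, 10000, 20000 ]
```

Doubling reaches long waits quickly during a sustained outage while keeping the first retry fast. The cap keeps a long outage from pushing the next attempt arbitrarily far out.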
Verifying the actual configured behavior
The relationship between health status and restart behavior depends on which bridging mechanism, if any, has been configured. Verify directly what happens when a container becomes unhealthy rather than inferring it from the restart policy alone. This avoids a false sense of automated recovery that does not actually exist:
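# A signal sent from outside the container reaches the main process even when
# it runs as PID 1 (SIGSTOP sent from inside to PID 1 is ignored by the kernel).
docker kill --signal=STOP my-api   # freeze the process so health probes time out
sleep 60                           # wait longer than the health check's interval x retries
# StartedAt changes on any restart, including one issued with docker restart.
docker inspect --format='{{.State.Health.Status}} {{.RestartCount}} {{.State.StartedAt}}' my-api
docker kill --signal=CONT my-api   # resume the process if nothing restarted it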
Deliberately inducing an unhealthy condition in a test environment and observing whether an actual restart occurs is the most reliable way to confirm the assumed health-to-restart connection genuinely works as expected.
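Judging the outcome of such a test can also be scripted. This sketch assumes the container defines a health check, since the template fails without one. It compares two docker inspect snapshots taken before and after freezing the main process. The start-time comparison also catches restarts issued by an external watcher through docker restart, which the restart count may not reflect. `parseSnapshot`, `classifyOutcome`, and `runTest` are hypothetical names, and the 60-second wait is an assumption.

```javascript
const { execFileSync } = require('node:child_process');

const FORMAT = '{{.State.Health.Status}} {{.RestartCount}} {{.State.StartedAt}}';

// Parse "<health> <restartCount> <startedAt>" as printed by FORMAT.
function parseSnapshot(text) {
  const [health, restartCount, startedAt] = text.trim().split(/\s+/);
  return { health, restartCount: Number(restartCount), startedAt };
}

// Decide what happened between the two snapshots.
function classifyOutcome(before, after) {
  const restarted =
    after.restartCount > before.restartCount || after.startedAt !== before.startedAt;
  if (restarted) return 'restarted';
  if (after.health === 'unhealthy') return 'unhealthy-not-restarted';
  return after.health; // e.g. "starting" or "healthy": the fault may not have taken effect
}

function snapshot(container) {
  return parseSnapshot(execFileSync('docker', ['inspect', '--format', FORMAT, container]).toString());
}

// waitMs should exceed the health check's interval multiplied by its retries.
async function runTest(container, waitMs = 60_000) {
  const before = snapshot(container);
  execFileSync('docker', ['kill', '--signal=STOP', container]); // freeze main process
  await new Promise((resolve) => setTimeout(resolve, waitMs));
  const after = snapshot(container);
  try {
    execFileSync('docker', ['kill', '--signal=CONT', container]); // thaw if still frozen
  } catch {
    // the container may be mid-restart; there is nothing left to resume
  }
  return classifyOutcome(before, after);
}

// Usage (test environments only): `node verify.js --run my-api`
if (process.argv[2] === '--run' && process.argv[3]) {
  runTest(process.argv[3]).then(console.log, (err) => {
    console.error(err.message);
    process.exit(1);
  });
}
```

An outcome of "unhealthy-not-restarted" means no bridging mechanism acted on the status. That is exactly the gap this section warns about.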
Common mistakes
- Assuming Docker's restart policy reacts to health status directly, without verifying that some explicit mechanism (application-level exit logic, an external watcher, or an orchestrator's native support) actually bridges the two.
- Implementing a custom restart-on-unhealthy watcher with no backoff, risking a tight restart loop during a sustained, non-transient failure.
- Restarting a container repeatedly for a persistent root cause that a restart cannot actually fix, rather than treating repeated restarts as a signal to investigate the underlying problem directly.
- Not testing the assumed health-to-restart connection deliberately, only discovering during a real incident that no actual remediation mechanism was ever in place.
- Building a watcher script with no monitoring of its own, creating a new, unmonitored single point of failure for the entire remediation mechanism.
Health restart interaction is not automatic by default in plain Docker, and reliably converting an unhealthy status into an actual remediation action requires explicitly building or configuring one of several available bridging mechanisms, then verifying directly, through deliberate testing rather than assumption, that the intended connection actually behaves as expected.