15.3.2 Container Health States
A focused guide to Container Health States, connecting core concepts with practical Docker and container operations.
Container health states are the three values Docker assigns to a container's health check status: starting, healthy, and unhealthy. Understanding exactly when and why a container moves between them clarifies behavior that otherwise looks inconsistent, particularly during the startup period and once a container transitions to unhealthy.
The three states
A container with a configured health check is always in exactly one of three states at any given moment, and docker inspect reports the current one:
docker inspect --format='{{.State.Health.Status}}' my-api
The command prints a single word, one of:
starting
healthy
unhealthy
A container with no HEALTHCHECK defined at all simply has no health status. This is a distinct case from the three states: docker inspect omits the Health object from such a container's State entirely, so the format string above fails with an error instead of printing one of these values.
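Scripts that read health status need to handle the no-HEALTHCHECK case separately rather than treating it as a failure. The sketch below assumes the full JSON printed by docker inspect my-api is passed in as a string; the function name health_status and the trimmed sample documents are illustrative, not part of any Docker tooling.

```python
import json
from typing import Optional


def health_status(inspect_output: str) -> Optional[str]:
    """Return the health status from `docker inspect <container>` JSON output.

    Returns None when the container has no HEALTHCHECK, in which case Docker
    leaves the Health object out of State entirely.
    """
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    state = containers[0].get("State", {})
    health = state.get("Health")
    if health is None:
        return None
    return health.get("Status")


# Trimmed-down samples of the two shapes (illustrative, not real output).
with_check = '[{"State": {"Status": "running", "Health": {"Status": "starting", "FailingStreak": 0, "Log": []}}}]'
without_check = '[{"State": {"Status": "running"}}]'

print(health_status(with_check))     # starting
print(health_status(without_check))  # None
```

Returning None rather than a fourth string keeps "no health check" visibly different from the three real states.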
The starting state
Immediately after a container starts, its health status is starting. It remains in this state until either a check succeeds (transitioning to healthy) or the number of consecutive counted failures reaches the --retries threshold (transitioning to unhealthy). If a start-period is configured, failures during that grace window are not counted, so a container that never passes a check cannot become unhealthy until the window has elapsed:
HEALTHCHECK --start-period=30s --interval=10s --retries=3 \
CMD curl -f http://localhost:3000/healthz || exit 1
During the start-period, failed checks do not count toward the retry threshold at all. This accommodates an application's legitimate startup time, so expected early failures (the server not yet listening, a database connection not yet established) do not prematurely flip the container to unhealthy before it has had a fair chance to finish starting. A check that succeeds during the start-period ends the grace window early: the container becomes healthy, and any later failures count normally.
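The transition rules can be modeled as a small state machine. This is a simplified sketch, not Docker's own implementation: it assumes check n runs exactly n * interval seconds after the container starts, ignores check duration and --timeout, and uses hypothetical names (HealthModel, simulate). It replays the HEALTHCHECK above (start-period 30s, interval 10s, retries 3) for an application that never comes up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HealthModel:
    """Simplified model of Docker's health-state rules (a sketch, not Docker's source)."""
    start_period: float      # seconds, like --start-period
    retries: int             # like --retries
    status: str = "starting"
    failing_streak: int = 0

    def record(self, seconds_since_start: float, succeeded: bool) -> str:
        if succeeded:
            # A single success is enough, from starting or from unhealthy.
            self.failing_streak = 0
            self.status = "healthy"
        elif self.status == "starting" and seconds_since_start < self.start_period:
            # Failure inside the start period before any success: not counted.
            pass
        else:
            # Counted failure; enough consecutive ones flip the state.
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


def simulate(outcomes: List[bool], interval: float, start_period: float,
             retries: int) -> List[Tuple[float, str]]:
    """Replay check outcomes (True = exit 0); check n is assumed to run at n * interval."""
    model = HealthModel(start_period=start_period, retries=retries)
    return [(n * interval, model.record(n * interval, ok))
            for n, ok in enumerate(outcomes, start=1)]


# HEALTHCHECK --start-period=30s --interval=10s --retries=3, app never comes up.
for t, status in simulate([False] * 6, interval=10, start_period=30, retries=3):
    print(t, status)  # t=10..40 -> starting, t=50 and t=60 -> unhealthy
```

Under these assumptions the failures at 10s and 20s fall inside the grace window and are ignored, so the container is only marked unhealthy at the third counted failure (t=50). Replaying simulate([False, True, False], 10, 30, 3) instead yields healthy at t=20, and the single failure at t=30 leaves it healthy.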
Transitioning to healthy
A container moves from starting (or from unhealthy, if it recovers) to healthy the moment a single check succeeds. Unlike the failure side, there is no "successes required" threshold for entering the healthy state:
docker events --filter event=health_status --filter container=my-api
health_status: healthy
This asymmetry (several consecutive failures to become unhealthy, but a single success to become healthy) means recovery is detected quickly while degradation requires sustained evidence. In effect, the model favors fast recovery detection and tolerance of transient failures over fast failure detection.
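The asymmetry can be put in rough numbers. A minimal sketch of the worst case, assuming the container is currently healthy and past its start period, that the fault begins or clears just after a check, and that checks complete instantly (check duration and --timeout ignored); worst_case_detection is a hypothetical helper:

```python
from typing import Tuple


def worst_case_detection(interval: float, retries: int) -> Tuple[float, float]:
    """Rough worst-case delays (seconds) before a fault or a recovery shows up.

    Assumes a healthy container past its start period, a change that happens
    just after a check, and checks that finish instantly (--timeout ignored).
    """
    to_unhealthy = retries * interval  # needs `retries` consecutive failing checks
    to_healthy = interval              # the next single passing check is enough
    return to_unhealthy, to_healthy


print(worst_case_detection(interval=10, retries=3))  # (30, 10)
```

With the interval and retries used earlier, a failure can take about 30 seconds to surface as unhealthy, while recovery surfaces within about 10 seconds.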
Transitioning to unhealthy
A container moves to unhealthy only after the configured number of consecutive check failures (the --retries threshold) is reached. What happens next depends on what else is watching. On a standalone Docker host, nothing happens automatically: restart policies (--restart) react only to the container's main process exiting, not to its health status, so the container keeps running and simply reports its new state for external systems (a load balancer, an orchestrator) to act on. Docker Swarm mode is the built-in exception, since it replaces service tasks whose health check reports unhealthy. Otherwise, the status just stays visible:
docker inspect --format='{{.State.Health.Status}}' my-api
unhealthy
This is a common point of confusion: marking a container unhealthy is a status report, not by itself an action. Whatever should happen in response (restarting, removing it from a load balancer's rotation, alerting) must be implemented by something consuming that status, whether that is an orchestration layer, a custom script watching Docker's event stream, or an external monitoring system.
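A custom script watching the event stream might look like the following sketch. It assumes the JSON shape that docker events --format '{{json .}}' emits on recent Docker versions, in particular an Action value of "health_status: unhealthy" and the container name under Actor.Attributes.name; verify both against your Docker version. The helper names are hypothetical, and the response itself is left as a placeholder print. Only the offline demonstration runs here; watch() needs a live Docker daemon.

```python
import json
import subprocess
from typing import Iterable, Iterator


def unhealthy_containers(event_lines: Iterable[str]) -> Iterator[str]:
    """Yield container names from health_status events that report unhealthy.

    Expects one JSON object per line, as from `docker events --format '{{json .}}'`.
    Field names (Action, Actor.Attributes.name) are assumptions about that output.
    """
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        action = event.get("Action", "")
        if action.startswith("health_status") and action.endswith("unhealthy"):
            yield event.get("Actor", {}).get("Attributes", {}).get("name", "")


def watch() -> None:
    """Stream live health events and report unhealthy containers (needs Docker)."""
    proc = subprocess.Popen(
        ["docker", "events", "--filter", "event=health_status",
         "--format", "{{json .}}"],
        stdout=subprocess.PIPE, text=True,
    )
    for name in unhealthy_containers(proc.stdout):
        # Placeholder: restart, drain from a load balancer, or alert here.
        print(f"{name} is unhealthy")


# Offline demonstration with trimmed, made-up sample events.
sample = [
    '{"Type": "container", "Action": "health_status: healthy", "Actor": {"Attributes": {"name": "my-api"}}}',
    '{"Type": "container", "Action": "health_status: unhealthy", "Actor": {"Attributes": {"name": "my-worker"}}}',
]
print(list(unhealthy_containers(sample)))  # ['my-worker']
```

Keeping the parsing separate from the subprocess call makes the decision logic testable without a daemon, and leaves the choice of response to whoever wires up watch().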
Inspecting the health check log
Docker retains a short history of individual check results (the five most recent), including each one's output and exit code. This is valuable for diagnosing exactly why a container is in its current state without reconstructing the check's behavior indirectly:
docker inspect --format='{{json .State.Health.Log}}' my-api | jq .
[
{ "Start": "2024-06-01T14:32:01Z", "ExitCode": 1, "Output": "curl: (7) Failed to connect" }
]
This log is the most direct evidence available for understanding a specific health transition, often more useful than re-running the check manually after the fact, since it captures exactly what happened at the moment of each actual check attempt rather than the (potentially different) conditions present at a later, manual retry.
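For scripted diagnosis, the log can be reduced to its failing entries. This sketch assumes the output of docker inspect --format='{{json .State.Health}}' my-api, an object carrying Status, FailingStreak, and a Log array whose entries have Start, ExitCode, and Output fields; summarize_health is a hypothetical name and the sample input is made up for illustration.

```python
import json


def summarize_health(health_json: str) -> str:
    """Condense `docker inspect --format='{{json .State.Health}}'` output.

    Assumes Status, FailingStreak, and Log fields, with each log entry
    carrying Start, ExitCode, and Output. Only failing entries are listed.
    """
    health = json.loads(health_json)
    lines = [f"status={health['Status']} failing_streak={health.get('FailingStreak', 0)}"]
    for entry in health.get("Log", []):
        if entry.get("ExitCode") != 0:
            # Keep just the first line of output; probe output can be long.
            output = (entry.get("Output") or "").strip().splitlines()
            first = output[0] if output else ""
            lines.append(f"{entry.get('Start')} exit={entry.get('ExitCode')} {first}")
    return "\n".join(lines)


sample = json.dumps({
    "Status": "healthy",
    "FailingStreak": 1,
    "Log": [
        {"Start": "2024-06-01T14:31:51Z", "ExitCode": 0, "Output": ""},
        {"Start": "2024-06-01T14:32:01Z", "ExitCode": 1,
         "Output": "curl: (7) Failed to connect\n"},
    ],
})
print(summarize_health(sample))
```

The sample shows why the log matters: the container still reports healthy, yet the log already records one failing check, an early sign of trouble well before the retry threshold is reached.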
Health states and container removal
Health status has no direct bearing on whether a container can be stopped or removed; an operator or automated process can stop, restart, or remove a container regardless of its current health state, since health status is purely an informational and decision-input signal, not an access control or lifecycle gate enforced by Docker itself:
docker stop my-api
docker rm -f my-api
These commands succeed regardless of whether the container was healthy, unhealthy, or still starting; nothing in the health check mechanism prevents an operator from intervening manually at any point.
Health state visibility through docker ps
The container listing itself surfaces a summary of health status directly, which is a quick way to scan for unhealthy containers across a host without needing to inspect each one individually:
docker ps --format 'table {{.ID}}\t{{.Image}}\t{{.Status}}'
CONTAINER ID   IMAGE       STATUS
3f29a8c1d8e2   my-api      Up 2 hours (healthy)
a8b9c2d3e4f5   my-worker   Up 10 minutes (unhealthy)
After the uptime, the STATUS column shows (health: starting), (healthy), or (unhealthy), and no health suffix at all for a container without a HEALTHCHECK. That makes it the fastest at-a-glance view of host-wide health, with no need to script a query against each container's detailed health object.
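For scripting, a narrower format is easier to parse, for example docker ps --format '{{.Names}}\t{{.Status}}'. The sketch below assumes that tab-separated output and extracts the health suffix from each STATUS value; health_by_container is a hypothetical name and the sample lines are illustrative. For the simple case of listing unhealthy containers, docker ps --filter health=unhealthy does the filtering in Docker itself.

```python
import re
from typing import Dict, Optional

# Health suffix Docker appends to STATUS, e.g. "Up 2 hours (healthy)"
# or "Up 5 seconds (health: starting)".
HEALTH_SUFFIX = re.compile(r"\((?:health: )?(starting|healthy|unhealthy)\)\s*$")


def health_by_container(ps_output: str) -> Dict[str, Optional[str]]:
    r"""Map name -> health from `docker ps --format '{{.Names}}\t{{.Status}}'`.

    None means STATUS had no health suffix (typically: no HEALTHCHECK).
    """
    result: Dict[str, Optional[str]] = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        match = HEALTH_SUFFIX.search(status)
        result[name] = match.group(1) if match else None
    return result


sample = "my-api\tUp 2 hours (healthy)\nmy-worker\tUp 10 minutes (unhealthy)\nmy-cache\tUp 3 days"
health = health_by_container(sample)
print([name for name, state in health.items() if state == "unhealthy"])  # ['my-worker']
```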
Common mistakes
- Assuming a container automatically restarts the moment it is marked unhealthy. Plain Docker never does this: restart policies respond only to the container exiting, so something else (Swarm mode, an orchestrator, or a custom watcher) must be configured to act on that status.
- Not accounting for the asymmetry between the failure threshold (requiring multiple consecutive failures) and recovery (requiring only a single success) when reasoning about expected health transition timing.
- Overlooking the start-period setting and seeing containers prematurely marked unhealthy during legitimate startup time.
- Not checking the health check log directly when diagnosing a transition, relying instead on a manual re-run of the check that may not reflect the actual conditions present at the time of the original failure.
- Assuming health status restricts what lifecycle operations (stop, restart, remove) can be performed on a container, when it is purely an informational signal with no such enforcement.
Container health states form a simple three-value model, but the transition rules around it (the start-period grace window, the asymmetric failure-versus-recovery thresholds, and the fact that an unhealthy status is a report rather than an automatic action) materially affect how the signal should be interpreted and what, if anything, needs to be built to respond to it.