15.3.2.3 Health Unhealthy State

A focused guide to Health Unhealthy State, connecting core concepts with practical Docker and container operations.

The unhealthy state is Docker's signal that a container has failed enough consecutive health checks to reach its configured retry threshold. The status itself is purely a report: what actually happens once a container reaches this state depends entirely on what other mechanisms (external orchestration, a watcher script, a health-aware proxy, alerting) are configured to respond to it.

What triggers the transition

A container enters the unhealthy state only after the number of consecutive check failures configured through --retries is reached; a single failure, or even several non-consecutive failures interspersed with successes, does not trigger this transition:

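# Dockerfile: three consecutive failures, 10 seconds apart, mark the container unhealthy
HEALTHCHECK --interval=10s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1

# Shell: stream health transitions for one container (output abbreviated to the status)
docker events --filter event=health_status --filter container=my-api
health_status: unhealthy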

The events stream provides a precise timestamp for exactly when this transition occurred, which is valuable for correlating against logs and metrics from the same moment during an investigation.
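
The consecutive-failure rule can be illustrated with a small simulation. This is a simplified sketch, not Docker's implementation: the function name simulate_health is hypothetical, it takes one check exit code per line on standard input, and it ignores the start period and the initial "starting" status.

```shell
# Simplified model of the retry threshold (illustrative only).
# Reads one check exit code per line (0 = pass, non-zero = fail) and prints
# the resulting status after each check.
# Usage: simulate_health RETRIES < exit_codes
simulate_health() {
  retries=$1
  streak=0
  status=healthy
  while read -r code; do
    if [ "$code" -eq 0 ]; then
      # Any success resets the failing streak and restores healthy.
      streak=0
      status=healthy
    else
      streak=$((streak + 1))
      # Only an unbroken run of failures reaching the threshold flips the status.
      if [ "$streak" -ge "$retries" ]; then
        status=unhealthy
      fi
    fi
    echo "$status"
  done
}
```

Feeding it two failures, a success, and then three failures with a threshold of 3 reports unhealthy only after the final check, because the success in the middle resets the streak.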

Docker's own default behavior on unhealthy

Critically, a standalone Docker Engine does not restart a container because it becomes unhealthy. The standard restart policies (on-failure, unless-stopped, always) trigger on the exit of the container's main process, not on the health status field, so configuring one does not by itself make Docker react to an unhealthy status. Swarm mode is the exception: it replaces service tasks whose containers become unhealthy. A restart policy on a plain container looks like this:

docker run -d --restart=on-failure my-api

A container that is marked unhealthy but whose main process continues running (the health check is failing, but the process itself has not exited) will not be restarted by a standard restart policy alone, since nothing about that policy is triggered by the health status field. Something else needs to act on the unhealthy signal directly: an external watcher, an orchestration layer, or the application itself choosing to exit upon detecting its own sustained unhealthy condition.
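
One way to let a restart policy take effect is for the container's entrypoint to exit once its own check has failed repeatedly. The helper below is a minimal sketch of that idea; the name watch_until_unhealthy and its arguments are hypothetical, and the check command is whatever the image already uses for its health check.

```shell
# Hypothetical helper: run a check command repeatedly and return non-zero
# once it has failed MAX_FAILURES times in a row. A passing check resets
# the count, so the function only returns on a sustained failure.
# Usage: watch_until_unhealthy MAX_FAILURES INTERVAL_SECONDS CHECK_CMD [ARGS...]
watch_until_unhealthy() {
  max_failures=$1
  interval=$2
  shift 2
  failures=0
  while [ "$failures" -lt "$max_failures" ]; do
    if "$@" >/dev/null 2>&1; then
      failures=0
    else
      failures=$((failures + 1))
    fi
    sleep "$interval"
  done
  return 1
}

# Example entrypoint usage (assumed, not taken from a real image):
#   my-server &
#   watch_until_unhealthy 3 10 curl -f http://localhost:3000/healthz
#   exit 1   # the main process exits non-zero, so --restart=on-failure applies
```

The design choice here is to turn a health failure into a process exit, which is the only event the standard restart policies respond to.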

Building a response to the unhealthy state

Because Docker itself does not automatically act on unhealthy status beyond reporting it, a practical response mechanism typically needs to be built or configured explicitly, watching for the transition and taking action:

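# Start the container with a health check so that it emits health_status events
docker run -d --health-cmd="curl -f http://localhost:3000/healthz || exit 1" my-api

# Watch for transitions and restart any container that turns unhealthy.
# --format prints "<container id> health_status: <status>" for each event.
docker events --filter event=health_status --format '{{.Actor.ID}} {{.Action}}' |
while read -r id action; do
  case "$action" in
    *unhealthy*) docker restart "$id" ;;
  esac
done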

A monitoring script like this, watching Docker's event stream and restarting a container the moment it transitions to unhealthy, is a simple, direct way to convert the unhealthy signal into an actual remediation action when nothing more sophisticated, like an orchestrator with native support for this, is in place.

Unhealthy and traffic routing

For a container behind a reverse proxy or load balancer, the more immediate and important consequence of unhealthy status is typically removal from active traffic routing, provided the proxy itself is configured to check and respect health status:

http:
  services:
    api:
      loadBalancer:
        healthCheck:
          path: /healthz

If the proxy is not configured to check health independently and does not consume Docker's own health status, a container can remain unhealthy and continue receiving traffic indefinitely, since nothing in that configuration actually removes it from rotation; the unhealthy status, in this case, would be purely informational, visible through inspection but with no automatic effect on the traffic it continues to receive.
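
Where the proxy cannot probe backends itself, a script can derive the rotation from Docker's own status. The sketch below assumes input lines of the form "name status"; the function name routable_backends and the policy of keeping containers without a health check are both assumptions for illustration.

```shell
# Read "name status" lines and print the names that should stay in rotation.
# Containers reporting "unhealthy" or "starting" are left out; "healthy" and
# "none" (no health check defined) are kept. That policy is an assumption.
routable_backends() {
  while read -r name status; do
    case "$status" in
      healthy|none) echo "$name" ;;
    esac
  done
}

# Possible input source (assumes docker is available and containers are running):
#   docker inspect --format \
#     '{{.Name}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
#     $(docker ps -q) | routable_backends
```

The resulting list would still need to be pushed into the proxy's configuration, which depends on the proxy in use.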

Investigating why a container became unhealthy

The health check log, retained by Docker for recent check attempts, is the most direct source of evidence for understanding exactly what caused the transition:

docker inspect --format='{{json .State.Health.Log}}' my-api | jq .
[{ "Start": "2024-06-01T14:32:01Z", "ExitCode": 1, "Output": "curl: (7) Failed to connect" }]

Correlating these check failure timestamps and outputs against application logs and resource metrics from the same window typically narrows down the actual root cause considerably faster than starting an investigation from scratch without this specific, precisely timestamped evidence.
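
For that correlation, the useful timestamp is usually the start of the current run of failures rather than the latest failure. The following is a sketch under two assumptions: the log entries arrive oldest first, and each input line has the form "exit_code timestamp". The function name failing_streak_start is hypothetical.

```shell
# Read "exit_code timestamp" lines, oldest first, and print the timestamp of
# the first check in the trailing run of failures. Prints nothing if the most
# recent check passed.
failing_streak_start() {
  streak_start=""
  while read -r code ts; do
    if [ "$code" -eq 0 ]; then
      # A pass ends any earlier streak.
      streak_start=""
    elif [ -z "$streak_start" ]; then
      # First failure after a pass (or at the start of the log).
      streak_start=$ts
    fi
  done
  if [ -n "$streak_start" ]; then
    echo "$streak_start"
  fi
}

# Possible input source (assumes docker is available):
#   docker inspect --format \
#     '{{range .State.Health.Log}}{{.ExitCode}} {{.Start}}{{println}}{{end}}' my-api \
#     | failing_streak_start
```

Putting the exit code first keeps the parsing simple even if the timestamp contains spaces, since read assigns the remainder of the line to the last variable.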

A container that recovers on its own

Because a single successful check is sufficient to transition a container back to healthy, a container that was briefly unhealthy due to a transient issue (a brief network blip, a momentary database connection problem) can recover entirely on its own without any external intervention, simply by passing its next scheduled check:

docker events --filter event=health_status --filter container=my-api --since 10m
health_status: unhealthy
health_status: healthy

A health event history showing a brief unhealthy period followed promptly by a return to healthy, with no restart or manual intervention in between, indicates the underlying issue was transient and self-resolved, which is still worth investigating to understand the root cause, but does not necessarily require the same urgency as a sustained, unrecovered unhealthy state.
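
Telling a transient blip from a sustained problem comes down to how long each unhealthy episode lasted. The sketch below assumes input lines of the form "epoch_seconds status" in time order; the function name unhealthy_episodes is hypothetical, and the event format string in the comment is an assumption about how such lines could be produced.

```shell
# Read "epoch_seconds status" lines in time order and report each unhealthy
# episode: its length if the container recovered, or that it is still open.
unhealthy_episodes() {
  began=""
  while read -r ts status; do
    case "$status" in
      unhealthy)
        # Remember only the first timestamp of an episode.
        if [ -z "$began" ]; then
          began=$ts
        fi
        ;;
      healthy)
        if [ -n "$began" ]; then
          echo "recovered after $((ts - began))s"
          began=""
        fi
        ;;
    esac
  done
  if [ -n "$began" ]; then
    echo "still unhealthy since $began"
  fi
}

# Possible input source (assumes docker is available):
#   docker events --filter event=health_status --filter container=my-api \
#     --since 10m --until "$(date +%s)" --format '{{.Time}} {{.Action}}' \
#     | sed 's/health_status: //' | unhealthy_episodes
```

A short "recovered after" line points to a transient issue, while a "still unhealthy since" line marks the sustained case that needs prompt attention.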

A container that never recovers

A container that remains unhealthy indefinitely, never returning to a successful check, represents a genuinely persistent problem. If no orchestrator or external mechanism is set up to act on this, the container can simply sit in this state, continuing to consume resources and, depending on proxy configuration, potentially continuing to receive traffic it cannot serve correctly:

docker inspect --format='{{.State.Health.Status}} {{.State.StartedAt}}' my-api

This scenario underscores why relying on health status alone, without a corresponding action plan for what should happen when it is reached, leaves a meaningful gap: the signal exists, but nothing changes as a result unless something has been explicitly built to respond to it.

Common mistakes

  • Assuming Docker automatically restarts a container that becomes unhealthy, without configuring or building a mechanism that actually acts on that status.
  • Not configuring the load balancer or proxy in front of a container to check health independently, leaving an unhealthy container continuing to receive routed traffic indefinitely.
  • Investigating an unhealthy transition without first checking the retained health check log, missing the most direct, precisely timestamped evidence available.
  • Treating every unhealthy transition with the same urgency regardless of whether it self-resolved quickly or has persisted, rather than distinguishing transient blips from genuinely sustained problems.
  • Leaving a persistently unhealthy container running indefinitely with no automated remediation in place, relying entirely on a human noticing it manually.

The unhealthy state is a report, not an action. Its practical value depends entirely on whatever mechanism (an orchestrator, an external watcher, a proxy's own independent health checking) has been explicitly built or configured to respond to it; without one, a container can remain unhealthy indefinitely with no consequence beyond what is visible through manual inspection.