15.3.2.1 Health Starting State

A focused guide to Health Starting State, connecting core concepts with practical Docker and container operations.

The starting state is the initial health status of every container that has a health check configured. It gives the application a defined window to finish its startup sequence before the normal failure-counting rules apply. Sizing that window correctly is one of the more consequential, and more easily overlooked, details of health check configuration.

Why a dedicated starting state is necessary

Consider a container that needs real time to initialize: establishing database connections, loading a large in-memory cache, or warming up a JIT-compiled runtime. Without a distinct starting state and grace period, its first few health checks would simply fail and, depending on the configured retry threshold, could flip the container to unhealthy before it had a realistic chance to finish starting:

HEALTHCHECK --interval=10s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1

With no start-period, a container that takes 45 seconds to become ready, checked every 10 seconds with a retry threshold of 3, is marked unhealthy after roughly 30 seconds (three consecutive failures at about 10, 20, and 30 seconds). That is well before it has finished starting, purely because nothing exempts this expected, legitimate early-failure window.
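The arithmetic behind that 30-second figure can be sketched as two small shell functions. This is a simplified model rather than Docker's implementation: it assumes the first check runs one interval after the container starts and that each failing check returns almost instantly. The names `unhealthy_after` and `flagged_before_ready` are made up for this illustration.

```shell
# Simplified model (assumption): checks run at INTERVAL, 2*INTERVAL, ... seconds
# and each failing check returns immediately, so failure N lands at N*INTERVAL.

# unhealthy_after INTERVAL RETRIES -> seconds until the retry threshold is hit
unhealthy_after() {
  echo $(( $1 * $2 ))
}

# flagged_before_ready INTERVAL RETRIES STARTUP_SECONDS
# Succeeds (exit 0) when the threshold is reached before the app is ready.
flagged_before_ready() {
  [ "$(unhealthy_after "$1" "$2")" -lt "$3" ]
}

unhealthy_after 10 3          # prints 30
if flagged_before_ready 10 3 45; then
  echo "marked unhealthy before startup completes"
fi
```

Real checks take some time to run, so actual timings will drift slightly later than this model suggests.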

How start-period changes the behavior

Setting start-period exempts failures inside that window from the retry count, so the container can fail repeatedly during startup without penalty. A check that succeeds inside the window still marks the container healthy immediately, and from that point on failures count normally, even if the window has not yet elapsed:

HEALTHCHECK --start-period=60s --interval=10s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1

With this configuration, a container can fail every check for up to 60 seconds without being marked unhealthy. If it becomes ready 35 seconds in, the next check (at about 40 seconds with a 10-second interval) succeeds and moves it to healthy. The rest of the grace period simply goes unused, because the start period changes how failures are counted, not whether a success is recognized.
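The counting rule can be illustrated with a small simulation. It is a sketch of the behavior described here, not Docker's code: it assumes checks run at exact multiples of the interval and finish instantly, and `simulate_health` and its `READY_AT` argument are hypothetical names introduced for the example.

```shell
# simulate_health START_PERIOD INTERVAL RETRIES READY_AT   (all in seconds)
# Assumption: checks run at INTERVAL, 2*INTERVAL, ... and finish instantly.
# READY_AT is the second at which the app starts passing checks;
# pass -1 for an app that never becomes ready.
simulate_health() {
  start_period=$1 interval=$2 retries=$3 ready_at=$4
  [ "$interval" -gt 0 ] || return 2   # guard against a loop that never advances
  t=0 streak=0
  while :; do
    t=$((t + interval))
    if [ "$ready_at" -ge 0 ] && [ "$t" -ge "$ready_at" ]; then
      echo "healthy at ${t}s"         # a success is recognized immediately
      return 0
    fi
    if [ "$t" -lt "$start_period" ]; then
      continue                        # failure inside the grace window: ignored
    fi
    streak=$((streak + 1))            # failure after the window: counted
    if [ "$streak" -ge "$retries" ]; then
      echo "unhealthy at ${t}s"
      return 0
    fi
  done
}

simulate_health 60 10 3 35   # prints "healthy at 40s"
simulate_health 0 10 3 45    # prints "unhealthy at 30s"
```

The second call repeats the 45-second startup with no grace period, which is why the same application ends up unhealthy instead.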

What happens if a container never becomes healthy during start-period

If the start-period elapses and the container still has not passed a single check, normal retry counting begins from that point forward, and continued failures will count toward the threshold and can flip the container to unhealthy shortly after the grace period ends:

docker events --filter event=health_status --filter container=my-api
# abbreviated output once the threshold is reached:
# ... health_status: unhealthy ...

This means a container that is fundamentally broken, rather than merely slow to start, is still eventually marked unhealthy. The start-period delays that verdict to allow for legitimate startup time; it does not grant indefinite tolerance to a container that never becomes ready.
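One way to watch counting resume is to read the failing streak that `docker inspect` reports next to the status. The helper names below are hypothetical, and the sketch assumes a container with a configured health check, so that `.State.Health` is present.

```shell
# health_summary CONTAINER -> prints "<status> <failing-streak>"
# Uses the Status and FailingStreak fields reported under .State.Health.
health_summary() {
  docker inspect \
    --format '{{.State.Health.Status}} {{.State.Health.FailingStreak}}' "$1"
}

# still_in_grace CONTAINER
# Succeeds while the container is "starting" and no failure has been counted.
still_in_grace() {
  [ "$(health_summary "$1")" = "starting 0" ]
}
```

For a container that keeps failing, `health_summary my-api` would be expected to print `starting 0` during the grace window, then a climbing streak such as `starting 1` and `starting 2` once the window has elapsed, until the status flips to unhealthy.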

Sizing the start-period correctly

The start-period should be set based on the application's actual, measured worst-case startup time under realistic conditions, including a cold cache, a freshly provisioned database connection, or other startup-time work that might not be present during a quick local test:

docker run -d --rm --name my-api-timing my-api:1.4.0
time sh -c 'until docker exec my-api-timing curl -sf http://localhost:3000/healthz >/dev/null; do sleep 1; done'

Measuring this directly, including under conditions that approximate a worst-case cold start rather than only a typical, already-warm scenario, gives a more reliable basis for the start-period value than guessing or copying a number used for an unrelated service with different startup characteristics.
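Turning several measurements into a start-period value might look like the following sketch. The function name `suggest_start_period` and the 50% headroom are assumptions chosen for illustration, not a recommendation from Docker; the samples are startup times in whole seconds.

```shell
# suggest_start_period SAMPLE...
# Prints the slowest measured startup time plus 50% headroom, rounded up.
# The 50% margin is an arbitrary assumption; adjust it to taste.
suggest_start_period() {
  worst=0
  for sample in "$@"; do
    if [ "$sample" -gt "$worst" ]; then
      worst=$sample
    fi
  done
  echo $(( (worst * 3 + 1) / 2 ))
}

suggest_start_period 31 38 44   # prints 66
```

Basing the result on the slowest sample rather than the average keeps the value tied to worst-case behavior, which is the point of measuring cold starts in the first place.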

Interaction with deployment automation

Deployment pipelines that wait for a container to report healthy before declaring a rollout successful need to account for the start-period in their own timeouts. A pipeline timeout shorter than the time the application actually needs to become healthy, which can legitimately approach the full start-period plus one more check interval, reports a deployment failure even for an application that was always going to succeed, just not within the pipeline's assumed window:

timeout 90 sh -c 'until [ "$(docker inspect --format="{{.State.Health.Status}}" my-api)" = "healthy" ]; do sleep 2; done'

Setting the deployment pipeline's own wait timeout meaningfully longer than the configured start-period, rather than shorter or only marginally longer, avoids a class of false deployment failure that has nothing to do with the application actually being broken.
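A pipeline-side wait along these lines can be sketched as a shell function. It is only an illustration: the name `wait_healthy`, the 2-second poll, and the exit codes are choices made here, not part of Docker. Unlike a bare polling loop, it stops early when the container reports unhealthy instead of waiting out the full deadline.

```shell
# wait_healthy CONTAINER DEADLINE_SECONDS
# Exit codes (chosen for this sketch):
#   0 = healthy, 1 = unhealthy, 2 = inspect failed, 3 = deadline reached
wait_healthy() {
  name=$1 deadline=$2 waited=0
  while [ "$waited" -lt "$deadline" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name") || return 2
    case $status in
      healthy)   return 0 ;;
      unhealthy) return 1 ;;   # fail fast; more waiting will not help
    esac
    sleep 2
    waited=$((waited + 2))     # approximate: ignores time spent in inspect
  done
  return 3
}
```

With a 60-second start-period, a call such as `wait_healthy my-api 120 || echo "rollout failed"` leaves room for the grace window plus the checks that follow it.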

A container stuck in starting

A container that stays in the starting state well past its expected startup time, without moving to healthy or unhealthy, is itself a diagnosable symptom. Failures inside the start-period are not counted, so a generous start-period keeps a genuinely broken container in starting until the window ends and the retry threshold is then reached, leaving it undecided for longer than ideal:

docker inspect --format='{{.State.Health.Status}} {{.State.StartedAt}}' my-api

Comparing how long the container has been running against its configured start-period shows whether it is still inside its grace window or has passed it and is now accumulating failures toward the unhealthy threshold. A long stay in either phase suggests that the start-period and retry threshold are looser than the application's real startup behavior warrants.
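That comparison can be sketched as a small classifier. It is a rough estimate under stated assumptions: each check may take up to the full timeout, and the next check starts one interval after the previous one finishes. The name `grace_state` and its three labels are invented for this example, and the uptime is passed in as seconds because converting `.State.StartedAt` depends on the platform's date tooling.

```shell
# grace_state UPTIME START_PERIOD INTERVAL TIMEOUT RETRIES   (all in seconds)
grace_state() {
  uptime_s=$1 start_period=$2 interval=$3 timeout_s=$4 retries=$5
  # Rough upper bound for when a never-passing container should have reached
  # the retry threshold: one check straddling the end of the grace window,
  # then RETRIES counted checks, each INTERVAL apart and up to TIMEOUT long.
  decided_by=$(( start_period + timeout_s + retries * (interval + timeout_s) ))
  if [ "$uptime_s" -lt "$start_period" ]; then
    echo "within-grace"
  elif [ "$uptime_s" -le "$decided_by" ]; then
    echo "counting-failures"
  else
    echo "overdue"
  fi
}

grace_state 30 60 10 5 3    # prints "within-grace"
grace_state 90 60 10 5 3    # prints "counting-failures"
grace_state 200 60 10 5 3   # prints "overdue"
```

A container still reported as starting in the overdue range is worth investigating directly, since under these assumptions the health check should already have reached a verdict.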

Default behavior without an explicit start-period

If no start-period is specified, it defaults to 0 seconds, meaning no grace period at all. Any application with meaningful startup time and no explicit start-period is therefore at real risk of being marked unhealthy purely because of normal, expected startup latency rather than any genuine problem:

HEALTHCHECK --interval=10s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1

Explicitly setting start-period for any application with non-trivial startup time, rather than relying on the zero-second default, is the safer and more deliberate choice.
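A quick audit for a missing start-period could look like the sketch below. It rests on an assumption about the inspect output: that `{{json .Config.Healthcheck}}` prints durations as integer nanoseconds and leaves `StartPeriod` out entirely when it is zero. The function name `has_start_period` is hypothetical.

```shell
# has_start_period CONTAINER_OR_IMAGE
# Exit codes: 0 = non-zero start period configured, 1 = none, 2 = inspect failed
# Assumption: the JSON looks like {"Test":[...],"StartPeriod":60000000000,...}
# and the StartPeriod key is omitted when no start period is set.
has_start_period() {
  cfg=$(docker inspect --format '{{json .Config.Healthcheck}}' "$1") || return 2
  case $cfg in
    *'"StartPeriod":'[1-9]*) return 0 ;;
    *)                       return 1 ;;
  esac
}
```

Running `has_start_period my-api || echo "no start-period set"` before a rollout gives an early warning for services that still rely on the default.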

Common mistakes

  • Omitting start-period entirely for an application with meaningful startup time, risking premature unhealthy status during normal, expected initialization.
  • Setting start-period based on a quick, warm local test rather than measuring actual worst-case cold-start behavior.
  • Configuring deployment pipeline timeouts shorter than the configured start-period, or shorter than the realistic time to healthy, producing false deployment failures.
  • Assuming a container will leave starting on its own, without checking whether it has already outlasted its configured grace window without becoming healthy.
  • Treating the start-period as providing indefinite tolerance, when in reality normal failure counting resumes the moment the grace window elapses.

The starting state and its start-period exist to separate legitimate startup latency from genuine application failure. Basing the duration on measured, realistic worst-case startup behavior rather than assumption prevents both premature unhealthy reports during normal startup and unnecessarily prolonged ambiguity for a container that is actually broken.