17.2.3.2 Request Draining Practice

A focused guide to Request Draining Practice, connecting core concepts with practical Docker and container operations.

Request draining extends graceful shutdown beyond the application's internal shutdown sequence. It coordinates with whatever sits in front of the container (a load balancer, a reverse proxy, or a service mesh) so that traffic has actually stopped arriving before the container begins shutting down. The application's shutdown handler is then not left to absorb requests that keep arriving throughout the process.

The race between deregistration and termination

If a container receives SIGTERM and begins shutting down at the same moment the load balancer is only starting to notice that it should stop routing there, there is a window in which new requests keep arriving at a container that is already shutting down:

T+0s   container receives SIGTERM, begins shutdown
T+0s   load balancer health check still passing, continues routing new traffic
T+2s   load balancer's next check finally detects unhealthy, stops routing

In this example, requests that arrive during the two-second window reach a container that is already shutting down. The application's own logic then has to handle them, ideally by rejecting genuinely new requests explicitly and immediately rather than trying to process them. Better coordination avoids this window entirely.
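A minimal sketch of that explicit rejection, assuming an Express-style middleware signature and a hypothetical shared `state.draining` flag that the SIGTERM handler would set. It answers new requests with 503 and tells the client not to reuse the connection:

```javascript
// Reject new requests once the process has begun shutting down.
// `state.draining` is a hypothetical flag set by your SIGTERM handler.
function rejectWhileDraining(state) {
  return function (req, res, next) {
    if (!state.draining) {
      return next(); // normal operation: hand the request on
    }
    // Fail fast instead of starting work the process may not get to finish.
    res.setHeader('Connection', 'close'); // do not reuse this connection
    res.setHeader('Retry-After', '1');    // suggest retrying after one second
    res.statusCode = 503;
    res.end('shutting down');
  };
}

module.exports = { rejectWhileDraining };
```

Registered with `app.use(rejectWhileDraining(state))` ahead of the other routes, it also makes a health endpoint return 503 once draining starts. In a coordinated drain the flag would only be set after the load balancer has stopped routing; set earlier, it rejects traffic the balancer is still sending.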

A two-phase drain sequence

A more deliberate sequence separates deregistration from termination: first tell the load balancer or service mesh to stop routing new traffic, wait for that change to propagate, and only then send the termination signal to the container:

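# Hypothetical admin endpoint; substitute your load balancer's real deregistration API
curl -X POST "http://load-balancer-admin/deregister?target=my-api-3"
sleep 10              # must exceed the load balancer's actual propagation delay
docker stop my-api-3  # SIGTERM, then SIGKILL after 10 s by default (-t to change)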

The pause between deregistration and termination gives the load balancer or service mesh time to finish propagating the change before the container begins shutting down. As long as the pause is longer than that propagation actually takes for the specific balancer or mesh in use, the race disappears instead of being left for the application to absorb.
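One way to avoid a blind fixed pause is to poll until the load balancer confirms the target no longer receives traffic, with an upper bound on the wait. The sketch below assumes three caller-supplied step functions: `deregister` and `isStillRouted` are hypothetical hooks into whatever admin and status API your load balancer offers, and `stop` could be the `dockerStop` helper, which runs the real `docker stop -t` command through Node's `child_process.execFile`:

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Real command: `docker stop -t <seconds> <name>` (SIGTERM, then SIGKILL after the timeout).
const dockerStop = (name, timeoutSec = 10) =>
  execFileAsync('docker', ['stop', '-t', String(timeoutSec), name]);

// steps.deregister and steps.isStillRouted are hypothetical load balancer hooks;
// steps.stop actually terminates the container.
async function drainThenStop(target, steps, { pollMs = 1000, maxWaitMs = 30000 } = {}) {
  await steps.deregister(target);

  // Poll for confirmation instead of trusting a fixed pause, up to maxWaitMs.
  const deadline = Date.now() + maxWaitMs;
  let confirmed = false;
  while (Date.now() < deadline) {
    if (!(await steps.isStillRouted(target))) {
      confirmed = true;
      break;
    }
    await sleep(pollMs);
  }

  // Stop either way, like the fixed-sleep version; report whether the drain was confirmed.
  await steps.stop(target);
  return confirmed;
}

module.exports = { drainThenStop, dockerStop };
```

Stopping even when confirmation times out mirrors the behaviour of the fixed `sleep 10`; a stricter variant could refuse to stop and raise an alert instead.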

Health check failure as the deregistration trigger

For load balancers and orchestrators that route traffic by health check status, the application can report itself unhealthy the moment it begins shutting down, instead of waiting to be removed by some slower external process. Deregistration then happens as quickly as the health check's polling interval and failure threshold allow:

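const express = require('express');

const app = express();
let shuttingDown = false;

// Health endpoint polled by the load balancer; 503 means "stop routing here"
app.get('/healthz', (req, res) => {
  res.status(shuttingDown ? 503 : 200).send();
});

const server = app.listen(3000);

process.on('SIGTERM', () => {
  shuttingDown = true; // immediately fail health checks
  setTimeout(() => {
    // Stop accepting new connections; exit once in-flight requests complete
    server.close(() => process.exit(0));
  }, 5000); // give the load balancer a few seconds to notice and stop routing
});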

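This pattern deliberately keeps the server running for a few seconds after the health check starts failing, so the load balancer has time to notice and deregister the target before the server stops accepting connections. The delay must be at least as long as the balancer needs to mark the target unhealthy (its check interval multiplied by the number of consecutive failures it requires), and the delay plus the time to finish in-flight requests must fit inside the platform's grace period before SIGKILL, which is 10 seconds by default for docker stop.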
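The delay can be sized from the health check settings instead of guessed. A rough sketch, assuming the balancer probes every `intervalMs` and needs `unhealthyThreshold` consecutive failures (both to be taken from your balancer's real configuration), with `propagationMs` as an optional allowance for balancers that run on several nodes; the function names are illustrative:

```javascript
// Worst-case time a load balancer keeps routing after the health check starts failing.
function worstCaseDeregistrationMs({ intervalMs, unhealthyThreshold, propagationMs = 0 }) {
  // The first failing probe can come up to one full interval after the flag flips,
  // then (unhealthyThreshold - 1) more intervals pass before the threshold is reached.
  return intervalMs * unhealthyThreshold + propagationMs;
}

// Delay plus in-flight draining must finish before the platform sends SIGKILL
// (docker stop waits 10 s by default).
function fitsGracePeriod({ delayMs, inFlightDrainMs, gracePeriodMs = 10000 }) {
  return delayMs + inFlightDrainMs <= gracePeriodMs;
}

module.exports = { worstCaseDeregistrationMs, fitsGracePeriod };
```

With a 5-second interval and a threshold of 2 the estimate is 10 seconds: longer than the 5000 ms used in the handler above, and on its own as long as docker stop's whole default grace period, so the grace period (`docker stop -t`) would need raising as well.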

Orchestrator-specific pre-stop hooks

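Some orchestration platforms support an explicit pre-stop hook that runs before the termination signal is sent to the container's main process. This is the structurally cleanest way to implement the two-phase sequence, because the wait lives in the platform's deployment configuration instead of being estimated and hardcoded in the application's shutdown handler. In Kubernetes it is a preStop lifecycle hook on the container (the container name here is illustrative):

spec:
  terminationGracePeriodSeconds: 30  # must cover the preStop sleep plus in-flight draining
  containers:
    - name: my-api
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]

Kubernetes runs the preStop hook to completion before delivering SIGTERM to the main process, while it begins removing the Pod from Service endpoints; the sleep is what gives that removal time to reach load balancers and proxies. The sleep still has to match the real propagation time, and the grace period counts down from the moment the hook starts, so terminationGracePeriodSeconds must cover both the sleep and the time needed to finish in-flight requests. The exec form also assumes the image contains /bin/sh and sleep.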
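When a pre-stop hook already provides the wait, the application's own delay can drop to zero. A minimal sketch that reads the delay from configuration instead of hardcoding it; `SHUTDOWN_DELAY_MS` is a hypothetical environment variable, `state` is a hypothetical object the health endpoint reads, and `server` is anything with a Node-style `close(callback)`, such as the value returned by `app.listen()`:

```javascript
// SHUTDOWN_DELAY_MS is a hypothetical setting: 0 when a pre-stop hook already waits.
function shutdownDelayMs(env, fallbackMs = 5000) {
  const parsed = Number.parseInt(env.SHUTDOWN_DELAY_MS ?? '', 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallbackMs;
}

// Returns a SIGTERM handler: fail health checks at once, wait delayMs, then close.
function createShutdownHandler({ server, state, delayMs, exit = (code) => process.exit(code) }) {
  let started = false;
  return function onSigterm() {
    if (started) return; // ignore repeated signals
    started = true;
    state.shuttingDown = true; // the health endpoint reads this and returns 503
    setTimeout(() => {
      // Stop accepting new connections; exit once existing ones have finished.
      server.close(() => exit(0));
    }, delayMs);
  };
}

// Wiring, for example:
// const state = { shuttingDown: false };
// process.on('SIGTERM', createShutdownHandler({ server, state, delayMs: shutdownDelayMs(process.env) }));

module.exports = { shutdownDelayMs, createShutdownHandler };
```

With a preStop sleep in place, `SHUTDOWN_DELAY_MS=0` makes the handler close right after SIGTERM; without a hook, the 5000 ms fallback keeps the behaviour of the earlier health check handler.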

Connection draining versus request draining

It is worth distinguishing two levels of draining. Connection draining lets already-established TCP or HTTP keep-alive connections continue to be used for a period after deregistration; request draining lets only the individual requests currently in flight complete. How long a load balancer keeps reusing existing connections after a target is marked unhealthy is a setting of its own, separate from how the application handles in-flight requests. With open-source nginx, for instance, a target is typically removed from the upstream block and the configuration reloaded; the old worker processes stop accepting new connections and finish their current ones, for at most worker_shutdown_timeout:

worker_shutdown_timeout 30s;  # caps how long old workers drain after a reload
http {
    upstream backend {        # api-3 removed here, then `nginx -s reload`
        server api-1:3000;
        server api-2:3000;
    }
}

Setting an appropriate connection draining timeout at the proxy or load balancer (on an AWS target group, for example, this is the deregistration delay), alongside the application's own in-flight request handling, keeps both layers coordinated instead of only one of them accounting for graceful termination.
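Inside a Node application the two levels can be handled separately. A minimal sketch, assuming Express-style middleware and an illustrative `DrainTracker` name: it counts in-flight requests so shutdown can wait for them (request level), and once draining begins it sends `Connection: close` so keep-alive connections, including a proxy's pooled upstream connections, are not reused (connection level):

```javascript
class DrainTracker {
  constructor() {
    this.inFlight = 0;      // requests started but not yet finished
    this.draining = false;
    this.idleWaiters = [];  // resolvers waiting for inFlight to reach 0
  }

  // Express-style middleware; `res` must emit 'finish' and/or 'close'.
  middleware() {
    return (req, res, next) => {
      // Connection level: responses started after drain() close their connection.
      if (this.draining) res.setHeader('Connection', 'close');
      // Request level: track this request until its response ends.
      this.inFlight += 1;
      let done = false;
      const finish = () => {
        if (done) return; // 'finish' and 'close' can both fire for one response
        done = true;
        this.inFlight -= 1;
        if (this.inFlight === 0) this.idleWaiters.splice(0).forEach((resolve) => resolve());
      };
      res.on('finish', finish);
      res.on('close', finish);
      next();
    };
  }

  // Start draining; resolves once no requests remain in flight.
  drain() {
    this.draining = true;
    if (this.inFlight === 0) return Promise.resolve();
    return new Promise((resolve) => this.idleWaiters.push(resolve));
  }
}

module.exports = { DrainTracker };
```

A shutdown handler could `await tracker.drain()` before calling `server.close()`. Connections left idle by earlier keep-alive responses still need closing: `server.closeIdleConnections()` (Node 18.2 and later) does that, and from Node 19 `server.close()` does it automatically.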

Verifying the drain sequence end to end

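Test the complete drain sequence (deregistration, propagation wait, and termination) together, rather than the application's shutdown handler in isolation, to confirm the coordination works as intended:

# Steady traffic for 30 s in the background, one response code per line
timeout 30 sh -c 'while :; do curl -s -o /dev/null -w "%{http_code}\n" http://load-balancer/api/test; done' > codes.txt &
# The full drain sequence runs while that traffic is in flight (admin endpoint is hypothetical)
curl -X POST "http://load-balancer-admin/deregister?target=my-api-3" && sleep 10 && docker stop my-api-3
wait
grep -vc '^200$' codes.txt   # non-200 responses (000 = no response at all); expect 0

Keeping traffic flowing through the load balancer for the whole sequence, and confirming that no request fails, gives direct, empirical confidence that the drain is coordinated correctly across both layers. A single short burst fired at the same moment as docker stop proves little, because those requests usually complete before the container has even begun shutting down.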
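Counting failures by eye gets tedious once a test runs for any length of time. A small sketch that summarizes response codes recorded one per line, as curl's `-w "%{http_code}\n"` prints them; treating every non-2xx code (including curl's 000 for "no response") as a failure is an assumption that suits most APIs:

```javascript
// Summarize status codes recorded one per line by a load-generation loop.
function summarizeStatusCodes(text) {
  const byCode = {};
  let total = 0;
  for (const line of text.split('\n')) {
    const code = line.trim();
    if (code === '') continue; // skip blank lines
    byCode[code] = (byCode[code] || 0) + 1;
    total += 1;
  }
  // Anything outside 2xx counts as a failure for this check.
  const failures = Object.entries(byCode)
    .filter(([code]) => !/^2\d\d$/.test(code))
    .reduce((sum, [, count]) => sum + count, 0);
  return { total, failures, byCode };
}

// Usage, e.g. with the codes saved to a file (file name is illustrative):
// const { readFileSync } = require('node:fs');
// console.log(summarizeStatusCodes(readFileSync('codes.txt', 'utf8')));

module.exports = { summarizeStatusCodes };
```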

Common mistakes

Sending the termination signal to a container without first deregistering it and waiting for the load balancer to actually stop routing new traffic to it.
Relying entirely on the application's internal shutdown handler to absorb the new requests that arrive while load balancer deregistration is still propagating.
Not configuring an appropriate connection draining timeout at the proxy or load balancer, leaving that layer uncoordinated with the application's in-flight request handling.
Hardcoding an arbitrary, unverified delay between deregistration and termination instead of confirming that it covers the specific load balancer's real propagation time.
Testing the application's shutdown handler in isolation instead of the full drain sequence, including the load balancer or service mesh layer.

Request draining extends graceful shutdown beyond the application's internal handling into explicit coordination with whatever routes traffic to it. Making sure deregistration has completed, with enough time allowed for propagation, before the container begins terminating closes a race that the application's shutdown handler cannot fully address on its own.