14.3.3.4 Blue Green Traffic Cutover

A focused guide to Blue Green Traffic Cutover, connecting core concepts with practical Docker and container operations.

A blue-green traffic cutover is the step that moves live request volume from the previously active environment to the newly validated one. The underlying mechanism is a proxy or load balancer configuration change, but the cutover itself can be executed instantly or gradually. Each approach carries different risk and observability trade-offs, so the choice is worth making deliberately rather than defaulting to whichever is simplest to implement.

Instant cutover

The simplest cutover moves all traffic at once, the moment the new environment is considered validated:

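# nginx configuration: route all traffic to green
upstream backend {
    server green.internal:3000;
}

# check the configuration, then reload the proxy only if the check passes
docker exec proxy nginx -t && docker exec proxy nginx -s reload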

Instant cutover is fast and simple to reason about, but it also means that if a problem exists that pre-switch validation did not catch, every single request after the switch is immediately affected, with no smaller exposure window to limit the impact while the problem is detected.

Gradual, weighted cutover

A gradual cutover shifts traffic incrementally, starting with a small percentage routed to the new environment and increasing that percentage over a defined period while watching for problems at each step:

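# Step 1: roughly 10% of requests go to green
upstream backend {
    server blue.internal:3000 weight=9;
    server green.internal:3000 weight=1;
}

# Step 2: an even split (replaces the step 1 block, followed by a proxy reload)
upstream backend {
    server blue.internal:3000 weight=5;
    server green.internal:3000 weight=5;
}

# Step 3: all traffic goes to green
upstream backend {
    server green.internal:3000 weight=10;
}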

This approach narrows the blast radius of an undetected problem to whatever fraction of traffic was routed to the new environment at the time the problem is noticed, at the cost of a longer overall cutover process and the operational complexity of monitoring and adjusting weights at each step.
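
The weight steps above can be generated rather than hand-edited at each stage. What follows is a minimal sketch, assuming the hostnames and port from the examples; the `renderUpstream` name and its percentage-based input are illustrative and not part of any nginx tooling. A server whose share is zero is left out of the block, because nginx does not accept `weight=0`.

```javascript
// Hypothetical helper: render an nginx upstream block for a given green traffic share.
// greenPercent is an integer from 0 to 100; hostnames and port mirror the examples above.
function renderUpstream(greenPercent) {
  if (!Number.isInteger(greenPercent) || greenPercent < 0 || greenPercent > 100) {
    throw new RangeError('greenPercent must be an integer between 0 and 100');
  }
  const servers = [
    { host: 'blue.internal:3000', weight: 100 - greenPercent },
    { host: 'green.internal:3000', weight: greenPercent },
  ];
  const lines = servers
    .filter((s) => s.weight > 0) // nginx does not accept weight=0, so drop the server instead
    .map((s) => `    server ${s.host} weight=${s.weight};`);
  return ['upstream backend {', ...lines, '}'].join('\n');
}

console.log(renderUpstream(10));
```

The rendered text would still need to be written into the proxy's configuration and followed by a reload, as in the instant cutover example.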

Monitoring during the cutover window

Whichever cutover approach is used, the period during and immediately after the cutover deserves closer-than-usual monitoring attention, since this is precisely when a problem missed during pre-switch validation is most likely to first become visible:

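# quote the URL so the shell run by watch does not treat & as a background operator
watch -n 5 'curl -s "https://metrics.example.com/api/error_rate?service=my-api&window=1m"'

# docker logs replays the container's stderr on stderr, so merge it before counting
docker logs --since 2m myapp-green-api-1 2>&1 | grep -c ERROR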

A deliberate, focused monitoring period immediately following the cutover, rather than treating the switch as the end of the deployment process, catches problems faster than waiting for them to surface through ordinary, lower-attention monitoring.
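
One way to make that focused monitoring period concrete is to turn the polled error rate into an explicit decision. The sketch below assumes error rates are collected as fractions between 0 and 1 and compared against a baseline measured before the switch; the function name, the tolerance, and the three-sample window are illustrative choices, not recommendations.

```javascript
// Hypothetical gate: compare post-cutover error rates against a pre-cutover baseline.
// samples: error rates (0..1) observed since the switch, oldest first.
// baseline: error rate observed before the switch; tolerance: allowed absolute increase.
function evaluateCutover(samples, baseline, tolerance = 0.01, minSamples = 3) {
  if (samples.length < minSamples) return 'wait'; // not enough data to judge yet
  const recent = samples.slice(-minSamples);
  const breaches = recent.filter((rate) => rate > baseline + tolerance).length;
  if (breaches === minSamples) return 'rollback'; // every recent sample is above the threshold
  if (breaches === 0) return 'healthy';
  return 'wait'; // mixed signal: keep watching
}

console.log(evaluateCutover([0.05, 0.06, 0.04], 0.002));
```

Requiring several consecutive breaches before recommending a rollback trades a slightly slower reaction for fewer false alarms from a single noisy sample.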

Handling long-lived connections during cutover

WebSocket connections, long-polling requests, and Server-Sent Events streams present a specific challenge during cutover: a connection already established with the old environment does not automatically move to the new one when the proxy's upstream configuration changes, since the proxy typically only applies the new routing decision to new connections:

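import { io } from 'socket.io-client';

// Socket.IO retries by itself after network drops, but not after a server-initiated disconnect
const socket = io('wss://api.example.com');
socket.on('disconnect', (reason) => {
  if (reason === 'io server disconnect') {
    setTimeout(() => socket.connect(), 1000); // reconnects through the proxy to the live environment
  }
});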

A client-side reconnection strategy, with automatic retry on disconnect, is generally necessary to handle this gracefully, because moving these clients means the old environment eventually has to close their connections. Until that happens, clients connected before the cutover continue talking to the old environment, which can leave a long tail of traffic there well after the cutover was supposed to be complete.
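
A fixed one-second retry, as in the snippet above, means that clients disconnected at the same moment also reconnect at the same moment. A common mitigation is exponential backoff with random jitter. The helper below is a minimal sketch of that calculation; its name and default values are assumptions for illustration. Socket.IO's own automatic reconnection exposes comparable settings (`reconnectionDelay`, `reconnectionDelayMax`, `randomizationFactor`), which may be preferable to a hand-rolled timer where they apply.

```javascript
// Hypothetical helper: delay in milliseconds before reconnect attempt number `attempt` (0-based).
// The ceiling doubles with each attempt up to maxMs, and the actual delay is a random
// value below that ceiling ("full jitter"), so clients spread their reconnects out.
// `random` is injectable so the calculation can be checked deterministically.
function reconnectDelay(attempt, baseMs = 1000, maxMs = 30000, random = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

for (let attempt = 0; attempt < 5; attempt += 1) {
  console.log(`attempt ${attempt}: wait ${reconnectDelay(attempt)} ms`);
}
```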

Session affinity and cutover

If the application relies on server-side session state tied to a specific environment, a traffic cutover can disrupt active user sessions unless that state is either shared between both environments or externalized entirely:

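app.use(session({
  store: new RedisStore({ client: sharedRedisClient }), // same Redis instance for blue and green
  secret: process.env.SESSION_SECRET, // must be identical in both environments so cookies verify
  resave: false,
  saveUninitialized: false,
}));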

Externalizing session state to a shared store reachable by both the blue and green environments removes session continuity as a concern during cutover, since either environment can serve a request for any active session regardless of which environment originally created it.
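
The effect can be illustrated without any real infrastructure. In the sketch below, a `Map` stands in for the shared store and two hypothetical handlers stand in for the blue and green environments; every name here is an assumption made for illustration.

```javascript
// In-memory stand-in for a shared store such as Redis (assumption for illustration only).
const sharedStore = new Map();

// Hypothetical environment: a handler that reads and writes sessions through the store it is given.
function makeEnvironment(name, store) {
  return {
    handle(sessionId) {
      const session = store.get(sessionId) ?? { visits: 0, createdBy: name };
      session.visits += 1;
      store.set(sessionId, session);
      return { servedBy: name, ...session };
    },
  };
}

const blue = makeEnvironment('blue', sharedStore);
const green = makeEnvironment('green', sharedStore);

blue.handle('abc');               // session created before the cutover
console.log(green.handle('abc')); // served by green after the cutover; visits continues at 2
```

If each environment were given its own `Map` instead, green would see the same session ID as brand new, which is the disruption described above.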

Database connection pool warm-up

Immediately after a cutover, the newly live environment may experience a sudden spike in connections to shared resources like the database, as every replica that was previously idle starts actively serving its full share of production traffic at once:

const pool = new Pool({ min: 5, max: 20 });
// a single query opens only one connection, so issue several at once to warm more of the pool
await Promise.all(Array.from({ length: 5 }, () => pool.query('SELECT 1')));

Pre-warming connection pools in the idle environment shortly before the cutover, rather than letting them establish connections lazily on the first real request after traffic arrives, avoids a brief period of elevated latency immediately following the switch.

Reversing a cutover partway through

A gradual cutover has a meaningful advantage during rollback: if a problem is detected while only a fraction of traffic has been shifted, reversing the weight back to the previous environment removes the new environment from the traffic path immediately, without needing to wait for or coordinate anything beyond the same weight adjustment used to shift traffic forward:

upstream backend {
    server blue.internal:3000;
    server green.internal:3000 down;  # nginx rejects weight=0; "down" takes green out of rotation
}

This is one of the strongest arguments for gradual cutover over instant cutover for services where the cost of a partial, contained incident is meaningfully lower than the cost of a full, all-traffic incident.
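
The forward and reverse paths can share one control loop. The following is a minimal sketch under simplifying assumptions: the `applyWeight` and `isHealthy` callbacks are hypothetical and supplied by the caller, and the loop is synchronous, whereas a real cutover would wait and observe between steps.

```javascript
// Hypothetical controller: walk a schedule of green percentages, checking health after each step.
// applyWeight(percent) pushes the new split to the proxy; isHealthy(percent) reports whether
// the step looked good. Both are assumptions of this sketch.
function runCutover(schedule, applyWeight, isHealthy) {
  for (const percent of schedule) {
    applyWeight(percent);
    if (!isHealthy(percent)) {
      applyWeight(0); // send all traffic back to blue
      return { completed: false, failedAt: percent };
    }
  }
  return { completed: true, failedAt: null };
}

const applied = [];
const result = runCutover([10, 50, 100], (p) => applied.push(p), (p) => p < 50);
console.log(applied, result);
```

In this run the health check fails at the 50% step, so the recorded weights are 10, 50, and then 0, and the incident is limited to the traffic share that had been shifted at that point.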

Common mistakes

Defaulting to instant cutover without considering whether the service's risk profile would benefit from a gradual, weighted approach instead.
Failing to account for long-lived connections like WebSockets, leaving a confusing long tail of traffic still hitting the old environment well after the cutover was considered complete.
Relying on environment-local session state without externalizing it, disrupting active user sessions at the moment of cutover.
Not monitoring with extra attention immediately after the cutover, missing a problem that takes a few minutes of real traffic to manifest.
Allowing connection pools and caches in the newly live environment to warm up lazily under full production load instead of pre-warming them just before the cutover.

A blue-green traffic cutover should be designed around the service's actual risk tolerance, choosing between instant and gradual approaches deliberately, accounting explicitly for long-lived connections and session state, and treating the period immediately following the cutover as deserving the most focused monitoring attention of the entire deployment.