18.2.2.1 Swarm Replicated Services
A focused guide to Swarm Replicated Services, connecting core concepts with practical Docker and container operations.
Swarm replicated services run a declared number of task instances of the same image. Beyond the basic distinction from global services, this topic covers replicated-job mode for finite, run-to-completion workloads, load distribution across replicas through Swarm's routing mesh, and the design question of whether a given service is suited to running as multiple simultaneous replicas at all.
Replicated-job mode for finite workloads
Unlike an ordinary replicated service, which runs continuously, replicated-job mode runs a specified number of tasks to completion and then stops. It suits batch processing, migrations, or any workload with a natural endpoint:
docker service create --name my-batch-processor --mode replicated-job --replicas 10 my-batch-processor
docker service ps my-batch-processor
Each of the ten tasks runs once and either exits successfully or fails, and the job is considered complete once all ten have finished. This differs from an ordinary replicated service, which is expected to run indefinitely and whose tasks are, by default, restarted when they exit.
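As a rough sketch of checking on such a job, the helper below summarizes task states read from standard input. It assumes the states arrive one per line in the form printed by `docker service ps --format '{{.CurrentState}}'`, for example `Complete 2 minutes ago`. The function name `summarize_job` is hypothetical and not part of Docker.

```shell
# Hypothetical helper: count task states read from stdin, one state per line.
# Intended input (an assumption about your setup):
#   docker service ps --format '{{.CurrentState}}' my-batch-processor | summarize_job
summarize_job() {
  awk '
    $1 == "Complete" { complete++ }   # task ran to completion
    $1 == "Failed"   { failed++ }     # task exited with an error
    END { printf "complete=%d failed=%d total=%d\n", complete, failed, NR }
  '
}
```

Because `docker service ps` may also list earlier attempts of a task, the counts are best read as a quick overview rather than an exact job status.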
Load distribution across replicas through the routing mesh
For a published service port, Swarm's routing mesh accepts incoming connections on every node in the cluster and distributes them across the service's available replicas, regardless of which specific node a given replica happens to be running on:
docker service create --name my-api --publish 80:80 --replicas 3 my-api
This means a client can connect to the published port on any cluster node, even one not running a replica of that service, and still be routed to one of the running replicas elsewhere in the cluster. This is the mechanism that makes a multi-node, multi-replica service appear externally as a single endpoint, regardless of where its replicas are placed.
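One way to observe this distribution is to send repeated requests to a single node and count which replica answered each one. The sketch below only does the counting: it assumes each response is a single line identifying the replica, such as the container hostname. The `/hostname` endpoint and the `node1.example.com` address in the comment are hypothetical and would need to exist in your own service and cluster.

```shell
# Hypothetical helper: tally identical lines from stdin, sorted by name.
# Possible use, assuming my-api exposes a /hostname endpoint (an assumption):
#   for i in $(seq 1 20); do curl -s http://node1.example.com/hostname; echo; done | tally_responders
tally_responders() {
  # NF skips blank lines; each distinct line is treated as one replica name.
  awk 'NF { count[$0]++ } END { for (name in count) print name, count[name] }' | sort
}
```

If the routing mesh is spreading connections, several distinct replica names should appear in the tally even though every request targeted the same node.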
Stateless services as the natural fit for replication
A service suited to running as multiple interchangeable replicas needs to be stateless at the request level: no client's subsequent request may depend on reaching the same replica that handled an earlier one. This is the same design principle that applies to horizontal scaling generally:
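app.get('/api/orders/:id', async (req, res) => {
  // Assumes db is a node-postgres (pg) Pool, so query() resolves to { rows }.
  // All state lives in the shared database, so any replica can serve this request.
  const { rows } = await db.query('SELECT * FROM orders WHERE id = $1', [req.params.id]);
  if (rows.length === 0) return res.status(404).json({ error: 'Order not found' });
  res.json(rows[0]);
});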
A service that keeps session state in local memory, and so depends on a client consistently reaching the same replica, breaks under Swarm's routing mesh, which gives no guarantee that a client's subsequent request lands on the replica that handled an earlier one.
Scaling replicated services in response to load
Adjusting a replicated service's replica count directly addresses changing load. Swarm does not include built-in automatic scaling based on observed metrics, as some more elaborate orchestration platforms do, but an external script or scheduled check can implement basic threshold-based scaling with the same scale command:
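# Quote the URL so the shell does not treat "?" as a glob character.
CURRENT_LOAD=$(curl -fsS 'http://metrics.example.com/api/load?service=my-api')
if (( $(echo "$CURRENT_LOAD > 80" | bc -l) )); then
  # Read the current replica count, then scale up by two.
  CURRENT_REPLICAS=$(docker service inspect my-api --format '{{.Spec.Mode.Replicated.Replicas}}')
  docker service scale my-api=$((CURRENT_REPLICAS + 2))
fi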
External, scripted scaling of this kind is a reasonable, lightweight approach for a cluster whose scaling needs do not warrant a dedicated autoscaling system. Swarm's own scheduling and reconciliation still handle the placement of whatever replica count the script decides on.
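A scripted scaler usually also needs a floor, a ceiling, and a way to scale back down, otherwise repeated runs keep adding replicas. The sketch below isolates that decision in a pure function. The 80 percent threshold and the step of two mirror the script above; the scale-down threshold of 20, the integer load value, and the `next_replicas` name with its minimum and maximum arguments are assumptions made for illustration.

```shell
# Hypothetical helper: next_replicas CURRENT LOAD MIN MAX
# LOAD is assumed to be an integer percentage (0-100).
next_replicas() {
  current=$1; load=$2; min=$3; max=$4
  if [ "$load" -gt 80 ]; then
    target=$((current + 2))      # scale up under high load
  elif [ "$load" -lt 20 ]; then
    target=$((current - 1))      # scale down slowly when idle
  else
    target=$current              # inside the band: leave unchanged
  fi
  # Clamp the result to the allowed range.
  if [ "$target" -lt "$min" ]; then target=$min; fi
  if [ "$target" -gt "$max" ]; then target=$max; fi
  echo "$target"
}
```

The result could then be passed to the scale command, for example `docker service scale my-api=$(next_replicas 3 90 1 10)`, which keeps the docker invocation separate from the policy and makes the policy easy to check on its own.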
Avoiding excessive replicas for stateful, single-writer services
A database or similarly stateful service generally should not run as a multi-replica Swarm service at all. Most such systems are not designed to operate correctly as multiple independent writers without clustering technology built specifically for that purpose, which is a separate concern from ordinary stateless application scaling:
docker service create --name my-db --replicas 1 my-db
A single replica is the deliberate, correct default for this kind of stateful component. Database high availability should be addressed through the database's own clustering or replication technology rather than by increasing the Swarm replica count.
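Where scaling is scripted, one possible safeguard is a small wrapper that refuses to scale known single-writer services beyond one replica. This is only a sketch: the `safe_scale` function, the `SINGLE_WRITER_SERVICES` list, and the `DOCKER` override variable are hypothetical names introduced here, not Docker features.

```shell
# Hypothetical guard around "docker service scale".
SINGLE_WRITER_SERVICES="my-db"     # space-separated list, maintained by hand
DOCKER="${DOCKER:-docker}"         # overridable, e.g. DOCKER=echo for a dry run

safe_scale() {
  service=$1; replicas=$2
  for name in $SINGLE_WRITER_SERVICES; do
    if [ "$name" = "$service" ] && [ "$replicas" -gt 1 ]; then
      echo "refusing to scale single-writer service $service to $replicas replicas" >&2
      return 1
    fi
  done
  "$DOCKER" service scale "$service=$replicas"
}
```

With this in place, `safe_scale my-api 5` passes through to the normal scale command, while `safe_scale my-db 3` fails with a message instead of starting additional database instances.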
Common mistakes
- Using ordinary replicated mode for a finite, run-to-completion workload rather than replicated-job mode, which is specifically designed for that distinct use case.
- Not understanding that Swarm's routing mesh distributes connections across replicas regardless of which node they run on, leading to confusion about how external traffic actually reaches the right instance.
- Designing a service with replica-specific, local state that breaks once a client's requests are distributed unpredictably across multiple replicas by the routing mesh.
- Implementing no scaling response to changing load at all, relying entirely on a fixed replica count regardless of actual, observed demand.
- Scaling a stateful, single-writer service like a database to multiple replicas naively, without the underlying database technology's own clustering support to make that genuinely correct.
Swarm replicated services cover both continuously running, scalable application workloads and finite, run-to-completion jobs through replicated-job mode, with the routing mesh distributing load transparently across whatever replicas exist. Suitability for replication, however, depends on the service being stateless at the request level, a design requirement that applies regardless of which orchestrator does the scheduling.