18.2.1.3 Swarm Worker Nodes

A focused guide to Swarm Worker Nodes, connecting core concepts with practical Docker and container operations.

Swarm worker nodes are where the application workload actually runs. Their operational concerns are labeling for scheduling control, adding capacity as the cluster grows, and gracefully removing a node for maintenance or decommissioning, all of which are distinct from the consensus and quorum concerns specific to manager nodes.

Labeling nodes for scheduling constraints

Labels on worker nodes let a service be scheduled only onto nodes that meet a particular requirement (a hardware capability, a geographic location, a designated role) rather than onto any available node in the cluster:

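# On a manager: attach labels to the nodes
docker node update --label-add region=us-east node-3
docker node update --label-add gpu=true node-4

# In the stack file: restrict the service to nodes carrying the label
services:
  ml-inference:
    deploy:
      placement:
        constraints:
          - node.labels.gpu == true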

Labels and constraints are the primary way to express scheduling requirements beyond Swarm's default, unconstrained placement. A constraint is a hard requirement: if no node satisfies it, the service's tasks stay pending instead of being placed elsewhere.
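Because a constraint that no node satisfies leaves tasks pending, it can help to check which nodes carry a label before deploying. The following is a minimal sketch, assuming it runs on a manager; `nodes_with_label` is a hypothetical helper name, not a Docker command.

```shell
# List the hostnames of nodes whose label KEY equals VALUE.
# Run on a manager. nodes_with_label is a hypothetical helper name.
nodes_with_label() {
  local key="$1" want="$2" id value
  for id in $(docker node ls -q); do
    # Labels set with --label-add are stored under .Spec.Labels.
    value=$(docker node inspect --format "{{ index .Spec.Labels \"$key\" }}" "$id")
    if [ "$value" = "$want" ]; then
      docker node inspect --format '{{ .Description.Hostname }}' "$id"
    fi
  done
}

# Example: nodes_with_label gpu true
```

An empty result for the label used in a constraint suggests the constrained service would have nowhere to run.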

Adding worker capacity as a cluster grows

Joining additional worker nodes is straightforward with the worker join token. Swarm does not move already running tasks onto a new node on its own (docker service update --force redistributes a service's existing replicas), but it considers the node for new tasks immediately, with no configuration beyond the join itself:

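# On a manager: print the join command, including the worker token
docker swarm join-token worker
# On the new node: join using that token and a manager's address
docker swarm join --token SWMTKN-1-xxxxx 10.0.1.5:2377
# Back on a manager: list the nodes
docker node ls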

A newly joined node that shows STATUS Ready and AVAILABILITY Active in docker node ls has joined successfully and is available to receive scheduled work. Checking this is more reliable than assuming success because the join command finished without an obvious error.
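The same check can be scripted rather than read off the table by eye. This is a minimal sketch, assuming it runs on a manager; `node_is_schedulable` and the node name `node-5` are hypothetical.

```shell
# Succeed only if the node reports state "ready" and availability "active".
# Run on a manager. node_is_schedulable is a hypothetical helper name.
node_is_schedulable() {
  local node="$1" status
  status=$(docker node inspect --format '{{ .Status.State }} {{ .Spec.Availability }}' "$node") || return 1
  [ "$status" = "ready active" ]
}

# Example: node_is_schedulable node-5 && echo "node-5 can receive tasks"
```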

Resource reservations and limits per service

Beyond node-level labeling, services support explicit resource reservations and limits. Swarm's scheduler uses the reservations to make placement decisions, so that a node is not committed beyond its available capacity when new replicas are scheduled onto it:

services:
  api:
    deploy:
      resources:
        reservations:
          cpus: "0.5"
          memory: 256M
        limits:
          cpus: "1"
          memory: 512M

The reservation is what the scheduler considers when deciding whether a node has enough unreserved capacity for a new replica. The limit plays no part in placement; it caps what the replica can consume once it is running. Setting both gives the scheduler accurate information for placement and keeps a single replica from starving its neighbors on the same worker.
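To make the role of reservations concrete, the sketch below estimates how many replicas fit on a single node when only reservations are counted. It is a back-of-the-envelope model, not Swarm's scheduler: it assumes an otherwise empty node and ignores reservations held by other services. `replicas_that_fit` is a hypothetical helper.

```shell
# Estimate how many replicas fit on one otherwise empty node, judged by reservations only.
# Arguments: node CPUs, node memory (MB), reserved CPUs per replica, reserved memory (MB) per replica.
# replicas_that_fit is a hypothetical helper, not part of Docker.
replicas_that_fit() {
  awk -v node_cpu="$1" -v node_mem="$2" -v res_cpu="$3" -v res_mem="$4" 'BEGIN {
    by_cpu = int(node_cpu / res_cpu + 1e-9)   # replicas allowed by CPU reservations
    by_mem = int(node_mem / res_mem + 1e-9)   # replicas allowed by memory reservations
    print (by_cpu < by_mem ? by_cpu : by_mem)
  }'
}

# Example: a 4-CPU, 8192 MB node with the reservations above (0.5 CPU, 256 MB)
#   replicas_that_fit 4 8192 0.5 256
```

For that example the CPU reservation is the binding resource, so the helper prints 8 even though memory alone would allow 32.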

Draining a worker for planned maintenance

Setting a worker's availability to drain causes Swarm to stop the tasks running on it and start replacements on other available nodes before the maintenance work begins. This avoids the abrupt disruption that rebooting or shutting down an active node without draining it first would cause:

docker node update --availability drain node-3
docker node ls

Before proceeding with the maintenance, confirm through docker service ps for the affected services that the drained node's tasks have actually been rescheduled elsewhere, so that no workload is disrupted by the maintenance itself. Afterwards, docker node update --availability active node-3 returns the node to service; Swarm does not move tasks back onto it automatically.
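Waiting for a drained node to empty can be scripted. The sketch below polls until no task on the node has a desired state of running; it assumes it runs on a manager, and `tasks_remaining` and `wait_for_drain` are hypothetical helper names. It only shows that the node is empty, so checking docker service ps for healthy replacements is still worthwhile.

```shell
# Count tasks that Swarm still wants running on a node. Run on a manager.
# tasks_remaining and wait_for_drain are hypothetical helper names.
tasks_remaining() {
  local names
  names=$(docker node ps --filter desired-state=running --format '{{.Name}}' "$1") || return 1
  # grep -c exits non-zero when the count is 0, so do not treat that as a failure.
  printf '%s' "$names" | grep -c . || true
}

# Poll until the node has no running tasks, or give up after a number of attempts.
# Arguments: node name, attempts (default 30), delay in seconds (default 2).
wait_for_drain() {
  local node="$1" attempts="${2:-30}" delay="${3:-2}" i=0 remaining
  while [ "$i" -lt "$attempts" ]; do
    remaining=$(tasks_remaining "$node") || return 1
    if [ "$remaining" -eq 0 ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example:
#   docker node update --availability drain node-3
#   wait_for_drain node-3 && echo "node-3 is empty, safe to start maintenance"
```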

Decommissioning a worker node permanently

A worker that is being removed permanently, rather than drained temporarily for maintenance, should leave the swarm cleanly from the node itself, after which its record is removed from the manager's node list. This keeps the cluster's bookkeeping accurate and avoids a stale node lingering in docker node ls output indefinitely:

# On the worker being removed
docker swarm leave
# On a manager, once the node shows as Down
docker node rm node-3

Running docker swarm leave on the worker itself is the clean, cooperative removal path. Once the manager shows the node as Down, docker node rm removes its record; if the node is already gone or unreachable, docker node rm --force from a manager removes the stale record without the node's cooperation.
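A small guard can keep the cooperative path the default by refusing to remove a node that the manager does not yet see as down. This is a minimal sketch, assuming it runs on a manager; `remove_if_down` is a hypothetical helper name.

```shell
# Remove a node's record only if the manager already sees the node as down.
# Run on a manager. remove_if_down is a hypothetical helper name.
remove_if_down() {
  local node="$1" state
  state=$(docker node inspect --format '{{ .Status.State }}' "$node") || return 1
  if [ "$state" = "down" ]; then
    docker node rm "$node"
  else
    echo "$node is '$state', not down; run 'docker swarm leave' on it first" >&2
    return 1
  fi
}

# Example: remove_if_down node-3
```

The guard deliberately never passes --force, so forced removal stays an explicit, manual decision.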

Balancing workload distribution across workers

Swarm's default scheduling strategy distributes replicas reasonably evenly across available nodes. For services that should also be spread across failure domains such as zones or racks, explicit placement preferences express a more deliberate distribution than the default would produce. Preferences are best-effort hints rather than strict anti-affinity rules:

services:
  worker-task:
    deploy:
      placement:
        preferences:
          - spread: node.labels.zone

This spread preference distributes replicas evenly across the groups of nodes that share a value of the given label, so replicas of a service end up spread across distinct zones or racks rather than concentrated, by chance, on a small subset of nodes. Nodes without the label still receive tasks and are treated together as one additional group.
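Whether a spread preference produced the intended distribution can be checked by counting a service's running tasks per label value. The sketch below assumes it runs on a manager and that node hostnames are unique; `tasks_per_label` is a hypothetical helper name, and `zone` is simply the label used in the example above.

```shell
# Print "<label value> <count>" for the running tasks of a service, grouped by a node label.
# Run on a manager. tasks_per_label is a hypothetical helper name.
tasks_per_label() {
  local service="$1" key="${2:-zone}" node value
  docker service ps --filter desired-state=running --format '{{.Node}}' "$service" |
    while read -r node; do
      # Look up the label on the node each task runs on; keep stdin for the loop.
      value=$(docker node inspect --format "{{ index .Spec.Labels \"$key\" }}" "$node" < /dev/null)
      echo "${value:-unlabeled}"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example: tasks_per_label worker-task zone
```

Roughly equal counts per zone indicate the preference is taking effect; an `unlabeled` row points to nodes that are missing the label.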

Common mistakes

  • Not using node labels and placement constraints when a service genuinely needs to run only on nodes meeting a specific requirement, relying instead on the scheduler's unconstrained default placement.
  • Omitting resource reservations from service definitions, leaving the scheduler without accurate information to make sound placement decisions across available worker capacity.
  • Rebooting or shutting down a worker node without first draining it, causing an avoidable disruption to whatever workload was actively running on it at that moment.
  • Leaving a permanently decommissioned worker's stale record in the cluster's node list rather than cleanly removing it after the node itself has left.
  • Relying entirely on default scheduling behavior for services with genuine anti-affinity or distribution requirements, rather than expressing those requirements explicitly through placement preferences.

Swarm worker nodes are managed primarily through labeling and placement constraints for scheduling control, accurate resource reservations for sound scheduler decisions, and graceful draining before any planned maintenance or permanent decommissioning, each of which keeps the cluster's actual workload distribution deliberate and predictable rather than left entirely to default scheduling behavior.