18.2.2.4 Swarm Service Discovery

A focused guide to Swarm Service Discovery, connecting core concepts with practical Docker and container operations.

Swarm service discovery resolves a service name to a reachable address for any other service that needs to talk to it. It supports two endpoint modes: a virtual IP (VIP) that load-balances transparently across replicas, and DNS round-robin (DNSRR), which returns every replica's address directly. The two modes differ in who decides how connections are distributed: Swarm, or the client.

Virtual IP mode as the default

By default, Swarm assigns each service a single virtual IP (VIP) on every network the service is attached to, and its internal load balancer distributes traffic sent to that VIP across the service's running replicas. The client never sees, and never needs to know, the individual replica addresses. For a standalone container to look up the service by name, both must be on the same overlay network, and that network must be created as attachable:

docker network create --driver overlay --attachable my-network
docker service create --name my-api --replicas 3 --network my-network my-api-image
docker run --rm --network my-network alpine nslookup my-api

Name:    my-api
Address: 10.0.9.4

The single address returned is the service's VIP, not the address of any one replica. Swarm's networking layer, which uses IPVS for this, spreads connections sent to the VIP across whichever replicas are currently available behind it.
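From a client's point of view, the VIP is just an ordinary DNS answer. The Python sketch below uses the standard library's socket.getaddrinfo to collect the IPv4 addresses a name resolves to; the helper name resolve_ipv4 is our own and is not part of any Docker API. Run inside a container attached to my-network, resolve_ipv4("my-api") would be expected to return a one-element list for a VIP-mode service, matching the nslookup output above.

```python
import ipaddress
import socket
from typing import List


def resolve_ipv4(name: str) -> List[str]:
    """Return the distinct IPv4 addresses `name` resolves to, in numeric order.

    resolve_ipv4 is a hypothetical helper for illustration, not a Docker API.
    """
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = {info[4][0] for info in infos}
    # Sort numerically so "10.0.9.10" comes after "10.0.9.9".
    return sorted(addresses, key=ipaddress.ip_address)
```

For a VIP-mode service the list has one element however many replicas are running, because the load balancing happens behind that address rather than in DNS.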

DNS round-robin mode as an alternative

Setting a service's endpoint mode to DNS round-robin (DNSRR) makes DNS return each replica's own address instead. No VIP is allocated, so the client's DNS resolution and connection logic decide which replica it actually connects to:

docker service create --name my-api --replicas 3 --network my-network --endpoint-mode dnsrr my-api-image

docker run --rm --network my-network alpine nslookup my-api

Name:    my-api
Address: 10.0.9.5
Address: 10.0.9.6
Address: 10.0.9.7

DNSRR is sometimes preferred for protocols or client libraries that need to see individual replica addresses rather than rely on Swarm's load balancer. The cost is that even distribution now depends on the client's DNS behavior, which, as discussed in other DNS-related sections, is not always even or predictable: a client may cache a single answer, always connect to the first address returned, or receive the list already reordered by its resolver.
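Under DNSRR, spreading connections evenly becomes the client's job. The sketch below is a hypothetical RoundRobinPicker, not part of Docker or any client library, that hands out every resolved replica address in turn and re-resolves after each full pass. The resolve callable is injected as an assumption; inside a container it could be a function built on socket.getaddrinfo, and in tests it can be a stub.

```python
import ipaddress
from typing import Callable, List


class RoundRobinPicker:
    """Hand out the addresses behind a DNSRR name in turn (client-side balancing).

    Hypothetical class for illustration. `resolve` takes a name and returns a
    list of IP address strings.
    """

    def __init__(self, name: str, resolve: Callable[[str], List[str]]) -> None:
        self.name = name
        self.resolve = resolve
        self._addresses: List[str] = []
        self._index = 0

    def next_address(self) -> str:
        # Re-resolve at the start of each pass so replicas that Swarm adds or
        # removes are noticed instead of one answer being cached forever.
        if self._index >= len(self._addresses):
            self._addresses = sorted(set(self.resolve(self.name)), key=ipaddress.ip_address)
            self._index = 0
            if not self._addresses:
                raise LookupError(f"{self.name!r} resolved to no addresses")
        address = self._addresses[self._index]
        self._index += 1
        return address
```

A client would call next_address() before opening each new connection. Re-resolving once per pass is a simplifying assumption: it costs one DNS query per cycle but picks up replica changes without restarting the client.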

Choosing between the two modes deliberately

VIP mode is the better default for most services: it provides Swarm-managed load balancing with no special client behavior required. Choose DNSRR when a service's protocol or client library genuinely needs to see each replica, for example database client libraries that manage their own connection pool across a known set of backend addresses. In a stack file, the mode is set per service under deploy:

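version: "3.8"              # endpoint_mode requires Compose file format 3.2 or later
services:
  api:
    image: my-api-image
    deploy:
      endpoint_mode: vip     # default: one virtual IP, Swarm load-balances
  db-cluster:
    image: my-db-image       # hypothetical image name
    deploy:
      endpoint_mode: dnsrr   # DNS returns every replica's address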

Task-level DNS entries for individual replica addresses

Independently of the service-level VIP or round-robin entry, Swarm also publishes a tasks.<service-name> DNS entry that always resolves to every replica's individual address, whatever endpoint mode the service uses. This gives individual replica visibility without changing the service's endpoint mode:

docker run --rm --network my-network alpine nslookup tasks.my-api

Name:    tasks.my-api
Address: 10.0.9.5
Address: 10.0.9.6
Address: 10.0.9.7

Because tasks.<service-name> behaves the same in either mode, it is useful for diagnostics, or for one client that needs per-replica addresses, without changing how every other client discovers the service.
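As an example of the diagnostic use, the Python sketch below TCP-probes every replica listed under tasks.<service-name>, which works the same whether the service runs in VIP or DNSRR mode. The function names probe_replicas and resolve_ipv4 are hypothetical, and so is the port you pass; it assumes the replicas listen on a TCP port and that the script runs in a container attached to the same overlay network.

```python
import ipaddress
import socket
from typing import Callable, Dict, List


def resolve_ipv4(name: str) -> List[str]:
    """Distinct IPv4 addresses for `name`, in numeric order (hypothetical helper)."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos}, key=ipaddress.ip_address)


def probe_replicas(service: str, port: int,
                   resolve: Callable[[str], List[str]] = resolve_ipv4,
                   timeout: float = 2.0) -> Dict[str, bool]:
    """Map each replica address under tasks.<service> to whether its port accepts TCP."""
    results: Dict[str, bool] = {}
    for address in resolve(f"tasks.{service}"):
        try:
            # A completed TCP handshake means something is listening on that replica.
            with socket.create_connection((address, port), timeout=timeout):
                results[address] = True
        except OSError:
            # Refused, unreachable, or timed out: report this replica as down.
            results[address] = False
    return results
```

Called as probe_replicas("my-api", 8080), where 8080 stands in for whatever port the service actually listens on, it would report each replica separately. Probing the VIP instead would only show whichever replica the load balancer happened to pick.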

Comparison to plain Compose DNS resolution

This discovery mechanism is specific to Swarm mode and more sophisticated than the name resolution plain Compose provides on a single host. The distinction matters when reasoning about why a service behaves differently under plain Compose than when the same definition is deployed as a Swarm stack across multiple hosts, because the underlying discovery and load-balancing mechanisms differ:

docker compose up -d

docker stack deploy -c docker-compose.yml my-stack

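The same Compose file works with both commands, but discovery behaves differently. Under docker stack deploy, each service gets a VIP (unless configured for DNSRR) that is load-balanced across replicas on multiple hosts. Under docker compose up, Docker's embedded DNS on the single host resolves the service name directly to the addresses of that service's containers, with no VIP in front of them.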

Diagnosing service discovery issues

When one service cannot reach another by name, first confirm that both are attached to the same overlay network, then check DNS resolution from inside the calling container. This separates a network-attachment problem from an endpoint mode behaving unexpectedly, or from something else entirely. The container's image must include nslookup for this check to work:

docker exec $(docker ps -q -f name=worker | head -n 1) nslookup my-api
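The comparison you would otherwise make by eye between the service name and its tasks. entry can be scripted. The sketch below is a hypothetical helper, classify_discovery, that resolves both names and reports what the answers suggest. It assumes an injected resolver that raises socket.gaierror for unknown names (as socket.getaddrinfo does), and that Python is available in the container where it runs. It infers the mode from DNS answers only; the service's own configuration remains authoritative.

```python
import socket
from typing import Callable, List


def classify_discovery(service: str, resolve: Callable[[str], List[str]]) -> str:
    """Compare <service> with tasks.<service>; return "unresolved", "vip", "dnsrr" or "unclear"."""
    try:
        service_addrs = set(resolve(service))
    except socket.gaierror:
        service_addrs = set()
    if not service_addrs:
        # Frequently means this container is not on the service's overlay network.
        return "unresolved"
    try:
        task_addrs = set(resolve(f"tasks.{service}"))
    except socket.gaierror:
        task_addrs = set()
    if len(service_addrs) == 1 and not service_addrs & task_addrs:
        # One address that belongs to no task: the service's virtual IP.
        return "vip"
    if task_addrs and service_addrs == task_addrs:
        # The service name lists the task addresses themselves: DNS round-robin.
        return "dnsrr"
    return "unclear"
```

An "unresolved" result points back at network attachment first, which is the check that should come before suspecting the endpoint mode.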

Common mistakes

Assuming a resolved single address for a VIP-mode service represents one specific replica, rather than understanding it as the service's own virtual IP with load balancing happening transparently behind it.
Choosing DNSRR mode without confirming the client or protocol in question actually needs and correctly handles direct, individual replica address visibility.
Not knowing about the tasks.<service-name> DNS convention when individual replica visibility is needed without changing a service's primary endpoint mode configuration.
Assuming service discovery behavior is identical between plain Compose on a single host and the same Compose file deployed as a Swarm stack, when the underlying mechanism genuinely differs.
Not checking whether two services are genuinely attached to the same overlay network before assuming a name resolution failure is related to endpoint mode configuration specifically.

Swarm's two endpoint modes address different needs: VIP for transparent, Swarm-managed load balancing, and DNSRR for direct, client-managed visibility of each replica. The tasks. prefix provides individual replica addresses regardless of which mode a service uses. Understanding this mechanism explains why behavior can differ between a single-host Compose deployment and the same definition deployed as a multi-host Swarm stack.