16.4.2.5 Compose DNS Alias Confusion
A focused guide to Compose DNS Alias Confusion, connecting core concepts with practical Docker and container operations.
Compose DNS alias confusion frequently centers on one easily overlooked interaction. An application caches the result of a DNS lookup, a dependency it talks to is restarted and comes back with a new IP address, and the application keeps connecting to the old one. The resulting connectivity failure looks like a Compose networking or alias misconfiguration, but its root cause is how long the application itself, not Docker, chose to remember a previous lookup result.
Docker's embedded DNS server and per-lookup freshness
On user-defined networks, which Compose creates by default, Docker runs an embedded DNS server reachable at 127.0.0.11 inside each container. It resolves service names to the containers' current IP addresses, so a fresh query made at any moment returns whatever address the target container has right now:
docker compose exec api cat /etc/resolv.conf
nameserver 127.0.0.11
docker compose exec api nslookup db
A direct, fresh lookup like this (assuming the image includes nslookup) always reflects current reality. The confusion described here arises because many application runtimes do not perform a fresh lookup for every connection; instead, they cache a previous result for some duration.
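Application code can make a comparable direct query. The following minimal Node.js sketch assumes the service runs on a user-defined Compose network where the embedded server is at 127.0.0.11. The helper names `embeddedResolver` and `freshAddresses` are hypothetical. Node's `dns.promises.Resolver` sends queries to the configured server itself rather than going through the operating system's getaddrinfo path, which is roughly what nslookup does.

```javascript
const { Resolver } = require('node:dns').promises;

// Hypothetical helper: a resolver pinned to Docker's embedded DNS server
// (127.0.0.11 is an assumption that holds on user-defined networks).
function embeddedResolver(server = '127.0.0.11') {
  const resolver = new Resolver();
  resolver.setServers([server]); // query this server directly, not via getaddrinfo
  return resolver;
}

// Hypothetical helper: the IPv4 addresses the service name maps to right now.
async function freshAddresses(serviceName, resolver = embeddedResolver()) {
  return resolver.resolve4(serviceName);
}

// Usage inside the api container (not run here):
// freshAddresses('db').then((ips) => console.log(ips));
```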
Why a restarted dependency causes stale connections
When a dependency container is restarted, recreated, or replaced, it can come back with a different internal IP address (recreation and replacement commonly produce one). Its service name stays the same, and Docker's DNS resolves that name to the new address:
docker compose restart db
docker compose logs --tail 20 api
If api's runtime or networking library cached db's pre-restart address, the api process keeps trying to reach an address that no longer belongs to any running container. Its log then shows connection failures or timeouts, even though a fresh lookup for db at this moment returns the new address. A separate process started with docker compose exec, such as curl or nslookup, performs its own lookup and does not share the application's cache, so it can succeed while the application keeps failing.
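The stale window can be modeled without a running stack. The sketch below is a hypothetical simulation, not any runtime's actual cache. `cachingLookup` wraps a stand-in resolver with a fixed TTL, the clock is injected so the timeline is deterministic, and the IP addresses are illustrative.

```javascript
// Hypothetical simulation of a TTL-based lookup cache of the kind some runtimes
// and libraries keep in front of DNS. `resolve` stands in for a real lookup.
function cachingLookup(resolve, ttlMs, now = Date.now) {
  const cache = new Map(); // name -> { address, expiresAt }
  return function lookup(name) {
    const hit = cache.get(name);
    if (hit && now() < hit.expiresAt) return hit.address; // cached, possibly stale
    const address = resolve(name); // fresh answer
    cache.set(name, { address, expiresAt: now() + ttlMs });
    return address;
  };
}

// Simulated scenario: db moves from 172.18.0.3 to 172.18.0.5 when it restarts.
let dbAddress = '172.18.0.3';
let clock = 0;
const lookup = cachingLookup(() => dbAddress, 30000, () => clock);

const before = lookup('db');    // '172.18.0.3', now cached for 30 s
dbAddress = '172.18.0.5';       // docker compose restart db
clock = 5000;
const duringTtl = lookup('db'); // still '172.18.0.3': the stale window
clock = 31000;
const afterTtl = lookup('db');  // '172.18.0.5': entry expired, fresh lookup
```

Passing `Infinity` as `ttlMs` models a runtime that never expires successful lookups: the stale address is then returned until the process itself restarts.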
Runtime-specific DNS caching behavior
Language runtimes and networking libraries differ in their default DNS caching behavior. Knowing which behavior is in use tells you whether, and for how long, a stale result can be retained:
const dns = require('dns');
dns.lookup('db', (err, address) => console.log(address));
Node.js's dns.lookup() delegates to the operating system resolver (getaddrinfo), and Node.js itself keeps no DNS cache by default. Libraries or connection-pooling layers built on top of it can still introduce their own caching independently.
java.security.Security.setProperty("networkaddress.cache.ttl", "30");
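The JVM, by contrast, keeps its own cache of successful lookups. When a security manager is installed and networkaddress.cache.ttl is not set, successful lookups are cached forever. Without a security manager, the default is implementation-specific (30 seconds in OpenJDK). Because older JVMs and some deployment configurations cached indefinitely, Java-based applications are a particularly common source of this class of stale-connection issue after a dependency restarts. Set the property before the application performs its first lookup.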
Diagnosing whether stale DNS caching is the actual cause
A connectivity failure that disappears after only the dependent service is restarted, with no other change, is a strong, specific signal that application-level DNS caching is the cause. Restarting the dependent service discards whatever cached resolution it held, so its next connection attempt performs a fresh lookup:
docker compose restart api
docker compose logs --tail 20 api
If the application's connections recover without any change to db itself, the original failure was very likely caused by api holding a stale cached address for db from before db's restart, rather than by a problem with Docker's networking or the alias configuration.
Reducing application-level DNS cache duration
For runtimes where DNS caching duration is configurable, reducing it to a short, reasonable value balances the performance benefit caching provides against the staleness risk this scenario illustrates:
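java.security.Security.setProperty("networkaddress.cache.ttl", "10"); // Java: cache successful lookups for at most 10 seconds
const http = require('http');
const agent = new http.Agent({ keepAlive: false }); // Node.js: no connection reuse, so each request opens a new connection and looks up the host again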
For connection-pooling libraries, make sure pooled connections are validated or retired periodically instead of being held indefinitely once established. This addresses a related but distinct issue: an already-established pooled connection to a now-defunct address can remain in the pool after the dependency it pointed to has restarted, whatever the DNS cache settings are.
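The validate-or-retire idea can be sketched as a tiny pool. This is a hypothetical illustration, not the API of any particular pooling library. `create`, `validate`, and `destroy` are injected callbacks, and `maxAgeMs` caps how long a connection may live before it is replaced, regardless of validation.

```javascript
// Hypothetical minimal pool: an idle connection is reused only if it is younger
// than maxAgeMs and passes validate(); otherwise it is destroyed and replaced by
// a new connection, which performs a new DNS lookup when it connects.
class ValidatingPool {
  constructor({ create, validate, destroy, maxAgeMs, now = Date.now }) {
    Object.assign(this, { create, validate, destroy, maxAgeMs, now });
    this.idle = [];             // idle connections, most recently released last
    this.createdAt = new Map(); // connection -> creation timestamp
  }

  async acquire() {
    while (this.idle.length > 0) {
      const conn = this.idle.pop();
      const young = this.now() - this.createdAt.get(conn) < this.maxAgeMs;
      if (young && (await this.validate(conn))) return conn;
      this.createdAt.delete(conn);
      await this.destroy(conn); // too old or failed validation: retire it
    }
    const conn = await this.create(); // new connection, new lookup
    this.createdAt.set(conn, this.now());
    return conn;
  }

  release(conn) {
    this.idle.push(conn);
  }
}
```

The age cap matters even when validation is cheap or skipped: it bounds how long any pooled connection can outlive a dependency restart.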
Designing for inevitable dependency restarts
Because dependency containers restarting, being rescheduled, or being replaced is a normal and frequent event in a containerized environment, the more robust long-term fix is connection and retry logic that handles a connection failure by performing a fresh lookup and retrying. It should not assume that a dependency's address, once resolved, stays valid indefinitely:
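// Assumes `db` is a client object exposing an async connect() method.
async function connectWithRetry(db, retries = 3, delayMs = 1000) {
  for (let i = 0; i < retries; i++) {
    try {
      return await db.connect();
    } catch (err) {
      if (i === retries - 1) throw err; // out of attempts: surface the last error
      await new Promise((resolve) => setTimeout(resolve, delayMs)); // wait before retrying
    }
  }
}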
A retry only closes this gap if each attempt genuinely opens a new connection with a fresh lookup. If db.connect() hands back a pooled connection that was never invalidated, or reuses an address the runtime has cached, retrying just repeats the failure. When each attempt is genuinely fresh, the retry works reliably regardless of the runtime's DNS caching defaults.
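The following sketch makes the fresh lookup explicit. It is hypothetical: `resolve` and `connect` are injected functions, so the same loop works with any client. It also replaces the fixed one-second delay of connectWithRetry with exponential backoff.

```javascript
// Hypothetical sketch: every attempt re-resolves the service name and connects to
// whatever address that fresh lookup returns, so a restarted dependency is
// reached at its new address on the next try.
async function connectFresh({ host, resolve, connect, retries = 3, baseDelayMs = 500 }) {
  let lastError;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const address = await resolve(host); // resolved anew on every attempt
      return await connect(address);
    } catch (err) {
      lastError = err;
      if (attempt < retries - 1) {
        // Exponential backoff: baseDelayMs, then 2x, 4x, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Example wiring in Node.js (not run here):
// const dns = require('node:dns').promises;
// const net = require('node:net');
// connectFresh({
//   host: 'db',
//   resolve: async (h) => (await dns.lookup(h)).address,
//   connect: (ip) => new Promise((ok, fail) => {
//     const s = net.connect(5432, ip, () => ok(s));
//     s.once('error', fail);
//   }),
// });
```

The key design choice is keeping resolution inside the loop, so no address carries over from one attempt to the next.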
Distinguishing this from a genuine alias misconfiguration
It is worth confirming this specific cause directly rather than assuming it, since a genuine alias or network configuration mistake produces a consistent, every-time failure rather than one that resolves itself simply by restarting the dependent service:
docker compose exec api nslookup db
A fresh lookup that succeeds immediately after the application's connection attempt failed strongly suggests a stale cached result rather than a genuine resolution problem, since Docker's DNS is clearly resolving the name correctly at the moment of the lookup.
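The same comparison can be automated from inside the application. The helper below is hypothetical and assumes the address the application is currently using is known, for example from its error logs. It performs a fresh `dns.lookup` with `{ all: true }` and reports whether that address is still among the current answers.

```javascript
const dns = require('node:dns').promises;

// Hypothetical diagnostic: compare the address the application is using
// with what a fresh lookup returns right now.
async function checkForStaleAddress(host, addressInUse) {
  const current = (await dns.lookup(host, { all: true })).map((r) => r.address);
  return {
    current,
    stale: !current.includes(addressInUse), // true: the app is using an outdated address
  };
}

// Usage inside the api container (not run here):
// checkForStaleAddress('db', '172.18.0.3').then(console.log);
```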
Common mistakes
- Assuming a connectivity failure after a dependency restart indicates a Compose networking or alias misconfiguration, without first checking whether application-level DNS caching is the actual, more likely cause.
- Not knowing the specific DNS caching default behavior of the language runtime or networking library in use, particularly for JVM-based applications with historically aggressive default caching.
- Relying entirely on DNS caching configuration adjustments without also ensuring connection retry logic performs a genuinely fresh lookup and connection attempt on failure.
- Not testing whether restarting the dependent service alone resolves an apparent networking issue, which is a fast, direct way to confirm or rule out stale caching as the cause.
- Treating a connection pool's already-established connections as immune to staleness, when a dependency restart can leave pooled connections pointing at a now-defunct address just as readily as a cached DNS lookup can.
Compose DNS alias confusion frequently traces back not to Docker's own DNS resolution, which stays accurate on every fresh lookup, but to application-level caching of a previous lookup result that goes stale the moment a dependency restarts with a new address. The reliable fix combines reasonable cache-duration tuning with retry logic that makes a genuinely fresh attempt instead of assuming a previously resolved address remains valid indefinitely.