16.1.3.2 Cache Invalidation Miss
A focused guide to Cache Invalidation Miss, connecting core concepts with practical Docker and container operations.
A cache invalidation miss is the failure mode in which Docker's layer cache reuses a previous build's result even though the content an instruction depends on has changed. It is a true blind spot in the caching model rather than a misunderstanding of how the model works, and it most commonly arises from RUN instructions that fetch content over the network, because Docker has no way to inspect what such a command downloads.
Why RUN instructions are uniquely vulnerable
Docker's cache key for a RUN instruction is based on the literal command text and the state of the filesystem layer it builds on top of, not on anything the command might fetch externally while it executes:
RUN curl -sf https://example.com/install.sh | sh
If the script at that URL changes upstream but the RUN instruction's text does not, Docker has no mechanism to detect the change. It reuses the cached layer from a previous build, so the image silently keeps the result of the old script instead of fetching and running the new one.
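The blind spot can be illustrated with a deliberately simplified model. The sketch below is not Docker's actual algorithm: the `run_cache_key` function and the `sha256:parent` identifier are hypothetical stand-ins, and the sketch assumes `sha256sum` from GNU coreutils is available. It only shows that when the key's inputs are the parent layer and the command text, a change on the remote server cannot alter the key.

```shell
# Simplified stand-in for a RUN cache key: a hash of the parent layer ID
# plus the literal command text. This is an illustration, not Docker's code.
run_cache_key() {
    parent_layer=$1
    command_text=$2
    printf '%s\n%s\n' "$parent_layer" "$command_text" | sha256sum | cut -d' ' -f1
}

# The remote script may change between these two builds, but neither input
# to the key does, so both builds compute the same key and the second build
# would reuse the cached layer.
monday_key=$(run_cache_key "sha256:parent" "curl -sf https://example.com/install.sh | sh")
friday_key=$(run_cache_key "sha256:parent" "curl -sf https://example.com/install.sh | sh")
[ "$monday_key" = "$friday_key" ] && echo "cache hit: remote change is invisible"
```

Nothing fetched over the network ever enters the hash in this model, which is the property that makes the miss undetectable.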
Why COPY and ADD do not have this problem for local files
This is specifically a RUN-instruction problem; COPY and ADD calculate their cache key from the actual content hash of the files being copied from the build context, which means a genuine local file change is reliably detected and correctly invalidates the cache:
COPY install.sh .
RUN ./install.sh
Restructuring a remote fetch into a local file, a COPY, and a separate execution step converts an undetectable remote dependency into a content-hashed, cache-aware local one. In practice this means downloading the script ahead of time and committing it to the repository, or fetching it through a build step that is itself deliberately refreshed on a schedule.
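One way to keep such a committed copy current is a small refresh step that runs outside the Dockerfile. The following is a minimal sketch under stated assumptions: `refresh_vendored` is a hypothetical helper, and the paths in the usage comment are placeholders. It replaces the committed file only when the freshly downloaded one differs, so the COPY instruction's content hash changes exactly when the upstream script does.

```shell
# Replace the committed copy only when the freshly downloaded file differs.
# Prints "updated" or "unchanged" so a scheduled job can decide whether to commit.
refresh_vendored() {
    downloaded=$1   # freshly fetched file (hypothetical path)
    vendored=$2     # copy committed in the repository
    if [ -f "$vendored" ] && cmp -s "$downloaded" "$vendored"; then
        echo "unchanged"
    else
        cp "$downloaded" "$vendored"
        echo "updated"
    fi
}

# Usage (not run here; requires network access):
#   curl -sfL https://example.com/install.sh -o /tmp/install.sh.new
#   refresh_vendored /tmp/install.sh.new ./install.sh
```

Because the download happens before the build rather than during it, the change shows up in version control and in the build context, where Docker's content hashing can see it.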
ADD with remote URLs
ADD can fetch directly from a URL and, unlike a RUN curl invocation, this usage has at least some content-awareness. Its exact caching behavior around remote URL freshness has varied across Docker versions, however, and should be verified directly rather than assumed:
ADD https://example.com/file.tar.gz /tmp/file.tar.gz
docker build --no-cache -t my-api .
Given this version-dependent behavior, the more predictable approach wherever remote content freshness matters is to treat every remote fetch as requiring an explicit, deliberate cache-busting mechanism, rather than trusting either RUN curl or ADD with a URL to detect upstream changes on its own.
Explicit cache-busting for intentional re-fetches
When a build needs to re-fetch remote content on a schedule or under specific conditions, an explicit cache-busting argument forces the relevant instruction to execute fresh instead of relying on accidental or unreliable cache invalidation:
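ARG CACHE_BUST=1
# Referencing CACHE_BUST ties this layer's cache key to the argument's value.
RUN echo "cache bust: ${CACHE_BUST}" && curl -sf https://example.com/install.sh -o install.sh && cat install.sh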
docker build --build-arg CACHE_BUST=$(date +%s) -t my-api .
Passing a value that changes on every build, a timestamp or a build number, as a build argument referenced within the relevant instruction forces that specific layer (and everything after it) to be treated as changed, deliberately bypassing the cache for exactly the instruction that needs fresh, current content every time.
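For the scheduled case, the build argument does not have to change on every build. As a sketch, the hypothetical `cache_bust_token` helper below produces a value that changes once per day or once per ISO week, so builds inside the same window still share the cached layer while the first build of a new window re-fetches. It assumes a `date` implementation that supports the `%G` and `%V` format specifiers.

```shell
# Emit a cache-busting token whose value changes on the chosen schedule.
cache_bust_token() {
    case "$1" in
        daily)  date -u +%Y-%m-%d ;;   # changes once per UTC day
        weekly) date -u +%G-W%V ;;     # changes once per ISO week
        *)      date -u +%s ;;         # changes every second (always bust)
    esac
}

# Usage (not run here; requires Docker):
#   docker build --build-arg CACHE_BUST="$(cache_bust_token weekly)" -t my-api .
```

A coarser token trades freshness for build speed: the remote content can be at most one window old, and every other build in that window stays fully cached.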
Package manager update steps and the same underlying issue
The same blind spot applies to package manager update commands, where the actual set of available packages and their versions changes upstream over time, but the RUN apt-get update && apt-get install -y curl instruction's text never changes, meaning a cached layer can continue installing a stale package version indefinitely:
RUN apt-get update && apt-get install -y curl
docker build --no-cache -t my-api .
Building with --no-cache periodically, as routine scheduled maintenance rather than only when a problem is suspected, is the most direct way to force these staleness-prone instructions to re-execute against current upstream state.
Detecting that a cache invalidation miss has occurred
A cache invalidation miss produces no error or warning: by definition, the build succeeds and reports nothing unusual. Detecting it therefore requires either deliberately comparing the output of a fresh, no-cache build against the cached version, or noticing the consequence indirectly, for example when a security scan flags an outdated package that a RUN apt-get install step should have updated but that was in fact silently served from a stale cached layer:
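# Build from cache first: a --no-cache build repopulates the cache, so running it first would make both images identical.
docker build -t my-api:cached .
docker build --no-cache -t my-api:fresh .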
docker run --rm my-api:fresh dpkg -l | md5sum
docker run --rm my-api:cached dpkg -l | md5sum
A difference between the two confirms that caching was indeed serving stale content for at least one instruction somewhere in the Dockerfile, which is the clearest, most direct evidence that this specific category of problem was actually occurring.
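A checksum comparison shows that the two images differ but not where. As a sketch, the hypothetical `changed_packages` helper below compares two saved package lists and prints the lines unique to each side. The file names in the usage comment are placeholders, and the lists are assumed to come from `dpkg-query -W`, which prints one package name and version per line.

```shell
# Print package lines that appear in only one of the two saved lists.
changed_packages() {
    cached_list=$1
    fresh_list=$2
    tmp_dir=$(mktemp -d)
    # comm needs sorted input; use the same collation for sort and comm.
    LC_ALL=C sort "$cached_list" > "$tmp_dir/cached"
    LC_ALL=C sort "$fresh_list" > "$tmp_dir/fresh"
    LC_ALL=C comm -23 "$tmp_dir/cached" "$tmp_dir/fresh" | sed 's/^/cached only: /'
    LC_ALL=C comm -13 "$tmp_dir/cached" "$tmp_dir/fresh" | sed 's/^/fresh only: /'
    rm -rf "$tmp_dir"
}

# Usage (not run here; requires Docker and the two images):
#   docker run --rm my-api:cached dpkg-query -W > cached.txt
#   docker run --rm my-api:fresh  dpkg-query -W > fresh.txt
#   changed_packages cached.txt fresh.txt
```

A package that appears on both sides with different versions is the typical signature of a stale apt-get layer.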
Designing Dockerfiles to minimize this risk
The most reliable mitigation is structuring Dockerfiles so that anything depending on external, mutable state (a package repository's current contents, a script fetched from a URL) is either pinned to a specific, immutable version or digest, or deliberately and explicitly cache-busted. Relying on accidental invalidation does not work, because the caching model was never designed to detect changes in this category of dependency:
RUN curl -sf https://example.com/releases/v2.1.3/install.sh -o install.sh
Pinning to a specific, versioned URL, rather than one that always points at "latest" regardless of when it is fetched, makes the content effectively immutable for a given build, provided the publisher never replaces a released file. This sidesteps the cache invalidation miss problem rather than requiring it to be detected or worked around after the fact.
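A versioned URL can be backed by a checksum, so that the build fails loudly if the file at that URL ever changes. The sketch below uses a hypothetical `verify_sha256` helper and assumes `sha256sum` from GNU coreutils is available. The expected digest would be recorded once, when the pinned version is first reviewed.

```shell
# Succeed only if the file's SHA-256 digest matches the expected value.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "checksum mismatch for $file: expected $expected, got $actual" >&2
    return 1
}

# Usage (not run here; <expected-sha256> is a placeholder for the recorded digest):
#   curl -sfL https://example.com/releases/v2.1.3/install.sh -o install.sh
#   verify_sha256 install.sh "<expected-sha256>" && sh install.sh
```

Inside a Dockerfile, the same check can be chained onto the download in a single RUN instruction, for example by piping the expected digest and file name into sha256sum -c.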
Common mistakes
- Relying on a RUN curl instruction to automatically reflect upstream content changes, when Docker's cache key for that instruction is based purely on the command text, not on what it actually fetches.
- Treating ADD with a remote URL as reliably cache-aware of upstream content changes without verifying the specific behavior for the Docker version actually in use.
- Never building with --no-cache as a routine practice, allowing package manager update and install steps to silently serve stale, outdated versions indefinitely.
- Not detecting a cache invalidation miss until its downstream consequence, an outdated and vulnerable package version, is flagged by an unrelated security scan.
- Fetching from a URL that always points at the latest version of something rather than a specific, immutable, versioned reference, leaving freshness entirely dependent on unreliable cache behavior rather than deliberate, explicit control.
A cache invalidation miss is a genuine blind spot in Docker's caching model, specific to instructions that depend on external state the cache key calculation cannot see, and the reliable fix is either pinning that external dependency to something genuinely immutable or deliberately busting the cache for it explicitly, rather than hoping the caching mechanism will somehow detect a change it was never designed to be aware of.