19.1.3.5 Pull Layer Reuse

A focused guide to Pull Layer Reuse, connecting core concepts with practical Docker and container operations.

Pull layer reuse lets Docker skip downloading any layer that is already in the local image store, no matter which image first caused that layer to be stored. This is why pulling several images that share a common base gets faster after the first pull, and why understanding this content-addressed behavior explains pull speed differences that would otherwise seem surprising.

How the daemon checks before downloading

Before downloading a layer, the daemon checks whether a layer with that exact digest already exists locally. If it does, the download is skipped, regardless of which image or pull originally stored the layer:

docker pull node:20
docker pull node:20-slim

3f29a8c1: Already exists
8a1f3c9b: Pulling fs layer

The "Already exists" line in the output of the second pull means that a layer required by node:20-slim was already present locally, left over from the earlier node:20 pull; only the layers not yet in the local store need to be downloaded. The output shown here is illustrative: whether two particular tags actually share any layers depends on how each image was built, and variants built from different base images may share none.
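The check described above can be modeled in a few lines of shell. This is only a sketch of the decision, not the daemon's implementation: the function name plan_pull and the two input files (one digest per line) are assumptions made for illustration.

```shell
# plan_pull LOCAL_FILE MANIFEST_FILE
# LOCAL_FILE:    digests already in the (hypothetical) local store, one per line
# MANIFEST_FILE: digests the image being pulled requires, one per line
# For each required digest, report whether it would be reused or downloaded.
plan_pull() {
    while IFS= read -r digest; do
        # -F fixed string, -x whole-line match, -q quiet (exit status only)
        if grep -Fxq "$digest" "$1"; then
            echo "$digest: Already exists"
        else
            echo "$digest: Pulling fs layer"
        fi
    done < "$2"
}
```

The lookup key is the digest alone, which is why it does not matter which image originally stored the layer.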

Why this benefits pulling many related images

An organization whose services are all built from the same base image, or from the same family of base image variants, sees this benefit directly. The first pull of any image in the family downloads the shared base layers in full; each later pull of a different image on the same base downloads only the layers unique to that image:

docker pull my-api:1.4.2
docker pull my-worker:1.2.0
docker pull my-scheduler:1.1.0

If all three images share an identical set of base layers, only the first pull downloads that shared portion. The second and third pulls reuse it and download only their own application layers.
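The effect on total download volume is simple arithmetic, sketched below. The function name pull_totals and the sizes in the usage comment are hypothetical and are not measurements of any real image.

```shell
# pull_totals BASE_MB APP_MB...
# Prints the total MB downloaded for a set of images that share one base,
# first as if every pull fetched the base again, then with the base reused.
pull_totals() {
    base=$1; shift
    without=0        # every pull downloads base + app layers
    with=$base       # base downloaded once, then only app layers
    for app in "$@"; do
        without=$((without + base + app))
        with=$((with + app))
    done
    echo "without_reuse=${without}MB with_reuse=${with}MB"
}

# Example with made-up sizes: a 350 MB base and app layers of 40, 25 and 15 MB.
#   pull_totals 350 40 25 15
#   prints: without_reuse=1130MB with_reuse=430MB
```

The saving grows with each additional image on the same base, since the shared portion is counted only once.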

Content addressing is what makes this possible

This reuse works because layers are identified by a hash of their content rather than by an image name or tag. Two layers with byte-for-byte identical content always produce the same digest, regardless of which Dockerfile or image they came from, and that property is what makes cross-image deduplication possible:

docker inspect node:20 --format '{{.RootFS.Layers}}'
docker inspect node:20-slim --format '{{.RootFS.Layers}}'

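Comparing these two lists shows exactly which layers, if any, the images share, which is a concrete way to confirm the degree of overlap rather than assuming a common base. Note that RootFS.Layers lists digests of the uncompressed layer content, so the values will not match the short IDs printed during a pull, which are derived from the compressed layer digests; the lists are still directly comparable with each other.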
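For images with many layers, the comparison is easier to do mechanically. The sketch below assumes each image's layer list has been saved to a file, one digest per line; the helper name shared_layers and the file names are hypothetical.

```shell
# shared_layers FILE_A FILE_B
# Prints every digest from FILE_B that also appears in FILE_A.
# Each file holds one layer digest per line.
shared_layers() {
    # -F fixed strings, -x whole-line match, -f read patterns from FILE_A
    grep -Fx -f "$1" "$2"
}

# Producing the input files (requires Docker and both images present locally):
#   docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' node:20 > a.txt
#   docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' node:20-slim > b.txt
#   shared_layers a.txt b.txt
```

An empty result means the two images have no layers in common, however closely related their tags appear.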

Why a rebuilt base layer breaks reuse

If a base image is rebuilt, even with no meaningful functional change, any layer whose content differs at the byte level from the cached version gets a new digest. Reuse is lost for that layer until the rebuilt version has itself been pulled and cached locally:

docker pull node:20
# upstream rebuilds node:20 with a security patch
docker pull node:20

The second pull re-downloads the layers that changed as part of the security patch, because their content, and therefore their digest, no longer matches what was cached. This is expected, correct behavior rather than a failure of the reuse mechanism: the content really is different.
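The sensitivity of a digest to its content can be seen without Docker at all. The sketch below hashes plain strings with sha256sum (assumed to be available, as on most Linux systems); real layer digests are computed over layer archives rather than strings, and the function name digest_of is hypothetical.

```shell
# digest_of STRING
# Prints the SHA-256 digest of the given bytes in the "sha256:<hex>" form
# used for image layer digests.
digest_of() {
    printf '%s' "$1" | sha256sum | awk '{ print "sha256:" $1 }'
}

# Identical input always gives the identical digest:
#   digest_of 'base layer'
#   digest_of 'base layer'
# A single extra byte gives an unrelated digest:
#   digest_of 'base layer '
```

Because any byte-level difference changes the digest, a rebuilt layer can never be mistaken for the cached one, which is also what makes reuse safe.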

Interaction with multi-stage build cache

The same content-addressing principle also underlies BuildKit's build cache, which can recognize when the result of a build step, including an intermediate stage of a multi-stage build, matches something already cached; the details are covered in dedicated build cache content. Both mechanisms rely on the property that identical content always produces an identical digest, whether a pull or a build step produced it. For example, a build can import cache that was previously exported to a registry:

docker buildx build --cache-from=type=registry,ref=my-api:cache .

Verifying layer reuse is actually happening

A pull that finishes quickly does not by itself prove reuse. The "Already exists" lines in the pull output show specifically which layers were reused and which were freshly downloaded, and they can be counted directly:

docker pull my-worker:1.2.0 2>&1 | grep -c "Already exists"

A count close to the image's total number of layers confirms substantial reuse. A low or zero count where shared layers were expected suggests the images have less content in common than assumed, which is worth checking with the layer digest comparison shown earlier.
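To see the count in proportion, reused and downloaded layers can be tallied together. This sketch assumes the plain, non-interactive pull output in which each finished layer ends in either "Already exists" or "Pull complete"; the function name summarize_pull is hypothetical.

```shell
# summarize_pull
# Reads captured `docker pull` output on stdin and prints how many layers
# were reused and how many were downloaded.
summarize_pull() {
    awk '
        /: Already exists/ { reused++ }
        /: Pull complete/  { downloaded++ }
        END { printf "reused=%d downloaded=%d\n", reused, downloaded }
    '
}

# Usage (requires a running Docker daemon):
#   docker pull my-worker:1.2.0 2>&1 | summarize_pull
```

Unlike grep -c alone, this gives both numbers at once, so the share of reused layers is visible without separately looking up the image's layer count.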

Common mistakes

Assuming layer reuse applies only between repeated pulls of the same image, when it applies equally across different images that share underlying layers.
Mistaking the fresh download of a rebuilt base image's changed layers for a failure of the reuse mechanism, when it is expected behavior.
Assuming two images share more layers than they do, without comparing their layer digest lists to confirm the actual overlap.
Not checking pull output for "Already exists" lines when trying to confirm that expected layer reuse is occurring for a given pull.
Treating pull layer reuse and build cache as unrelated mechanisms, when both rely on the same content-addressing principle.

Pull layer reuse delivers real, often substantial speed benefits for any organization pulling multiple images that share base layers, and content-addressed layer identification is what makes it possible. Checking "Already exists" output and comparing layer digest lists provides concrete evidence of how much overlap exists between two images, rather than relying on assumption.