20.2.2.2 Layer Cache Practice

A focused guide to Layer Cache Practice, connecting core concepts with practical Docker and container operations.

Layer caching is Docker's mechanism for reusing previously built image layers instead of re-executing Dockerfile instructions whose inputs have not changed. Understanding how Docker decides whether a cached layer can be reused, and arranging Dockerfile instructions to maximize cache hits, is often the single most effective way to cut build times in active development and CI pipelines.

How Docker Evaluates Cache Validity

Docker checks each Dockerfile instruction against its local cache using a chain of rules:

Docker computes a cache key for the instruction from the instruction text and, for COPY and ADD, a checksum of the files being copied. For RUN, only the command string is compared; Docker does not check whether running the command again would produce a different result.
If the cache holds a layer built on the same parent layer with a matching key, the instruction is a cache hit: Docker reuses the stored layer and moves to the next instruction.
If no match is found, or if the parent layer changed (because an earlier instruction was a cache miss), Docker executes the instruction, creates a new layer, and adds it to the cache.

The critical implication: a cache miss at any instruction invalidates all subsequent layers. Once a miss occurs, Docker does not look for cache hits further down the Dockerfile, because every later instruction now has a new parent layer.
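A minimal sketch of what the lookup compares at each step, assuming a small Python service (app.py and requirements.txt are illustrative names). The comments sit on their own lines because Dockerfile comments are only recognized at the start of a line.

```dockerfile
# Parent for everything below: the local image the python:3.12-slim tag
# currently resolves to. Pulling a newer image for this tag changes the
# parent, so every later instruction misses the cache.
FROM python:3.12-slim

# Key: the instruction text only.
WORKDIR /app

# Key: the instruction text plus a checksum of requirements.txt.
# Editing requirements.txt is a miss here and for every instruction below.
COPY requirements.txt .

# Key: the command string only. Docker does not contact PyPI to see
# whether newer releases exist, so an unchanged line is a cache hit.
RUN pip install --no-cache-dir -r requirements.txt

# Key: the instruction text plus checksums of every file in the build context.
# Editing any of those files makes this a miss.
COPY . .

CMD ["python", "app.py"]
```

Read top to bottom, the rules explain why order matters here: a miss at COPY . . leaves the expensive pip install layer untouched, while a miss at COPY requirements.txt . forces it to re-run.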

The Cascade Effect

FROM node:20-alpine          # cache hit
WORKDIR /app                 # cache hit
COPY . .                     # cache miss (source file changed)
RUN npm install              # re-executed — cache is invalidated
CMD ["node", "server.js"]    # re-executed

Even though package.json and package-lock.json have not changed, npm install must re-execute because the COPY . . step above it was a cache miss. This is the most common source of slow builds in naive Dockerfiles: a source code change triggers a full dependency reinstall.

The Dependency Manifest Pattern

Split the copy into two steps: copy the dependency manifest files first, install dependencies, then copy the source code:

FROM node:20-alpine
WORKDIR /app
# Dependency manifests: rarely change
COPY package.json package-lock.json ./
# Slow, but cached while the manifests are unchanged
RUN npm ci
# Changes on every source edit
COPY . .
CMD ["node", "server.js"]

When only source files change (not package.json):

FROM node:20-alpine          # cache hit
WORKDIR /app                 # cache hit
COPY package.json ...        # cache hit (files unchanged)
RUN npm ci                   # cache hit
COPY . .                     # cache miss (source changed)
CMD ["node", "server.js"]    # re-executed

npm ci is now a cache hit whenever only source code changes. If the dependency install takes 30 seconds, for example, that step shrinks to a near-instant cache lookup.

The Same Pattern for Other Package Managers

Python (pip):

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]

Python (Poetry):

FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --no-root
COPY . .
CMD ["poetry", "run", "python", "app.py"]

Go:

FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /app/server .

Java (Maven):

FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ ./src/
RUN mvn package -DskipTests

In each case, the dependency resolution files (requirements.txt, pyproject.toml/poetry.lock, go.mod/go.sum, pom.xml) are copied first, dependencies are downloaded in their own cached layer, and source code is copied afterward.

Instruction Order Within RUN

For RUN instructions that install packages and clean up, both the order of the steps and keeping them in one instruction matter, for correctness as well as caching:

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       curl \
       git \
    && rm -rf /var/lib/apt/lists/*

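Combining the update, install, and cleanup into a single RUN keeps the downloaded package lists in /var/lib/apt/lists out of every stored layer. Split across three RUN instructions, the lists would be committed in the apt-get update layer, and the later rm -rf would only hide them in a new layer without shrinking the image. Keeping apt-get update in the same instruction as apt-get install also matters for correctness: a standalone RUN apt-get update layer stays cached, so a later edit to the install line would run against stale package lists.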
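For contrast, a sketch of the split form described above, assuming a Debian-based image (debian:bookworm-slim is an illustrative choice). It is shown only to make the failure modes concrete, not as something to copy.

```dockerfile
FROM debian:bookworm-slim

# Anti-pattern: this layer is cached after the first build and reused
# even after the upstream package index has moved on.
RUN apt-get update

# Adding a package to this line misses the cache here, but the stale
# apt-get update layer above is still reused, so apt may try to fetch
# package versions that are no longer available on the mirror.
RUN apt-get install -y --no-install-recommends curl git

# Too late: /var/lib/apt/lists/* is already stored in the apt-get update
# layer. Deleting it here hides the files but does not shrink the image.
RUN rm -rf /var/lib/apt/lists/*
```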

What Invalidates a COPY Cache Entry

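For COPY and ADD instructions, Docker computes a checksum over the files being copied, covering their contents and metadata such as names and permissions. Any change to file contents, names, or permissions causes a cache miss. Last-modified and last-accessed timestamps are not part of the checksum, so merely touching a file does not invalidate the cache.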

This means:

Editing a source file causes a cache miss on the COPY . . that includes it.
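Running npm install on the host, which can rewrite package-lock.json, causes a cache miss on the COPY package.json package-lock.json ./ step whenever the lockfile's contents change.
Regenerating auto-generated files causes cache misses whenever the regenerated bytes differ (for example, because of an embedded build timestamp), even if the logical content is the same.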
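A related consequence: COPY . . misses the cache when any file in the build context changes, including files the application never uses. A sketch that narrows the final copy, assuming the Node.js sources live in a src/ directory (a hypothetical layout, mirroring the COPY src/ ./src/ step in the Maven example):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
# Copying only the directory the app needs means edits to README.md,
# tests, or CI configuration no longer cause a miss at this step.
COPY src/ ./src/
# Entry point path assumes the hypothetical src/ layout.
CMD ["node", "src/server.js"]
```

The trade-off is that any new top-level file the application does need must be added to the Dockerfile explicitly.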

Forcing a Cache Bypass

To rebuild all layers from scratch:

docker build --no-cache -t my-image .

This is useful when a cached RUN apt-get install layer still contains an old package version and you want the current one, or when troubleshooting a build issue where caching might be masking a problem.

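To invalidate the cache from a specific instruction onward without editing the Dockerfile on every rebuild, a common trick is to declare a build argument just before that instruction:

ARG CACHE_BUST=1
RUN apt-get update && apt-get install -y ...

Then pass a different value on each build:

docker build --build-arg CACHE_BUST=$(date +%s) -t my-image .

Every RUN instruction after an ARG receives the argument as an environment variable, so a new CACHE_BUST value causes a cache miss at the first RUN following the declaration. RUN apt-get update and everything after it are re-executed, while instructions above the ARG stay cached.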
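A slightly fuller sketch, assuming a Debian-based image with curl as a stand-in package: placing the ARG immediately before the step you want to refresh keeps everything above it cached. Referencing ${CACHE_BUST} explicitly is optional, since RUN instructions after an ARG already see it, but it makes the intent visible to readers.

```dockerfile
FROM debian:bookworm-slim

# Declared before the ARG, so a new CACHE_BUST value does not affect it.
RUN mkdir -p /opt/app

# Every RUN from here on is rebuilt whenever a different CACHE_BUST
# value is passed with --build-arg.
ARG CACHE_BUST=1

# Echoing the value documents the dependency; apt-get update then fetches
# fresh package lists instead of reusing a cached layer.
RUN echo "cache bust: ${CACHE_BUST}" \
    && apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

Built with the same docker build --build-arg CACHE_BUST=$(date +%s) command, only the final RUN misses the cache; the mkdir layer is reused.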

Visualizing Cache Behavior

With the dependency-manifest order, a source-code-only change re-executes only the final COPY . . (plus the metadata-only CMD) instead of also re-running the dependency install. In a project with a 30-second npm install, the well-ordered Dockerfile typically rebuilds in a few seconds, while the naive order pays the full 30+ seconds on every source change.

Cache in CI Pipelines

Build caches are stored locally on the machine running the build. In CI environments where each build runs on a fresh ephemeral runner, the local cache is empty on every build. To benefit from caching in CI:

docker build --cache-from my-image:latest -t my-image:latest .

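--cache-from tells Docker to use the specified image's layers as a cache source, even if they were built on a different machine. With BuildKit, the default builder since Docker Engine 23.0, the source image must also carry inline cache metadata, which the BUILDKIT_INLINE_CACHE=1 build argument embeds when the image is built. Pull the previous image before the build, use it as a cache source, then push the new image:

docker pull my-registry/my-image:latest || true
docker build \
  --cache-from my-registry/my-image:latest \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  -t my-registry/my-image:latest \
  .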
docker push my-registry/my-image:latest

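Modern Docker Buildx supports registry-based cache with dedicated cache manifests, which is more efficient than using the full image as a cache source. Exporting cache this way generally requires a builder that uses the docker-container driver (for example, one created with docker buildx create --driver docker-container --use) or the containerd image store, because the default docker driver cannot otherwise write registry cache: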

docker buildx build \
  --cache-from type=registry,ref=my-registry/my-image:cache \
  --cache-to type=registry,ref=my-registry/my-image:cache,mode=max \
  -t my-registry/my-image:latest \
  .
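The mode=max option matters most for multi-stage builds. With the default mode=min, only layers that end up in the final image are exported, so a builder stage's dependency layer would not be available to the next CI run; mode=max exports intermediate stages as well. A sketch of such a build, extending the Go example with an assumed runtime stage (gcr.io/distroless/static-debian12 and CGO_ENABLED=0 are illustrative choices):

```dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
# Dependency layer: it never appears in the final image, so it is
# exported to the registry cache only with mode=max.
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a static binary that runs on the minimal base below.
RUN CGO_ENABLED=0 go build -o /app/server .

FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]
```

With mode=min, a fresh CI runner would reuse the final-stage layers but would have to re-run go mod download whenever it rebuilt the builder stage.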