16.1 Build Troubleshooting

A focused guide to Build Troubleshooting, connecting core concepts with practical Docker and container operations.

Build troubleshooting covers the Docker problems that occur while an image is being constructed, before any container from it runs. Effective diagnosis depends on reading the build output carefully: docker build generally reports the exact instruction and step at which a failure occurred, which narrows the search considerably compared to a runtime failure.

Reading build output carefully

Modern Docker builds, particularly with BuildKit, report progress with a clear step-by-step breakdown, and the failure point is usually marked explicitly:

docker build --progress=plain -t my-api .
#8 [4/6] RUN npm ci
#8 ERROR: process "/bin/sh -c npm ci" did not complete successfully: exit code: 1

The --progress=plain flag produces more verbose, complete output than the default condensed progress display, which is particularly useful when a step's output is being truncated or summarized in a way that hides the actual underlying error message.
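When the build log is long, it can help to save the plain output to a file and pull out only the failing steps. The following is a minimal sketch, assuming the log was produced with --progress=plain; show_build_errors and build.log are hypothetical names, not part of Docker.

```shell
# Hypothetical helper: print every BuildKit error line in a saved log,
# with line numbers and the five lines of step output that precede it.
show_build_errors() {
    log_file="$1"
    grep -n -B 5 -E '^#[0-9]+ ERROR' "$log_file"
}

# Usage (requires Docker; build.log is an assumed file name):
#   docker build --progress=plain -t my-api . > build.log 2>&1
#   show_build_errors build.log
```

Keeping the preceding lines matters because the package manager's own message usually appears just above the ERROR line rather than on it.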

Stale build cache producing unexpected results

A frequent source of confusing build behavior, where a change does not seem to take effect, is a cached layer from an earlier build being reused. Docker invalidates a COPY or ADD layer when the contents of the copied files change, but it caches a RUN instruction by its command text alone, so a step that downloads or installs something external can be reused even though running it again would now produce a different result:

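# Rebuild without reusing any cached layers
docker build --no-cache -t my-api .

# Dockerfile: copy the dependency manifests first, then install, then copy the rest
COPY package*.json ./
RUN npm install
COPY . .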

Building with --no-cache rules stale caching in or out immediately. If the result differs without the cache, review the layer ordering in the Dockerfile: dependency manifests should be copied before the install step and the rest of the source after it, so that a manifest change invalidates the install layer while ordinary source edits do not.
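Before discarding the whole cache, it can be useful to see which steps were actually reused. In BuildKit's plain output, a step served from cache is followed by a CACHED line. The sketch below lists those steps from a saved log; list_cached_steps and build.log are hypothetical names used only for illustration.

```shell
# Hypothetical helper: print the description of every step that BuildKit
# reported as CACHED in a saved --progress=plain log.
list_cached_steps() {
    awk '
        # Remember the description line of each step, keyed by its "#N" id
        /^#[0-9]+ \[/     { step[$1] = $0 }
        # When the same step id is later marked CACHED, print its description
        /^#[0-9]+ CACHED/ { if ($1 in step) print step[$1] }
    ' "$1"
}

# Usage (requires Docker; build.log is an assumed file name):
#   docker build --progress=plain -t my-api . > build.log 2>&1
#   list_cached_steps build.log
```

If a step that should have been rerun appears in this list, that step or one before it is where the cache ordering needs attention.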

Dependency installation failures

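Failures during dependency installation are common and often have a root cause unrelated to Docker itself: lost network connectivity to a package registry, an incompatible version pin, or a missing system-level dependency that the package manager assumes is already present. Filtering the build output for the package manager's error lines, and installing any missing system packages in the Dockerfile, covers most cases: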

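# Show npm's error lines with five lines of context (older npm prints "npm ERR!", newer "npm error")
docker build --progress=plain -t my-api . 2>&1 | grep -E -A 5 "npm (ERR!|error)"
# Dockerfile (Debian-based image): install the compiler toolchain that native modules need
RUN apt-get update && apt-get install -y build-essential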

Many dependency installation failures inside a minimal base image stem from a missing system library or compiler that the package being installed assumes is available; installing the specific missing system dependency, often discoverable directly from the error message, resolves this category of failure more often than attempting to work around it in some other way.

Context size and unexpected file inclusion

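A build that is unexpectedly slow to start, or that fails because an unrelated large file is being copied, often points to the build context (the files and directories made available to the builder at the start of a build) including more than intended. The legacy builder prints the context size as its first line of output, as in the example here; BuildKit reports it instead as a "transferring context" line in its "load build context" step, which --progress=plain makes easy to find: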

docker build -t my-api . 2>&1 | head -5
Sending build context to Docker daemon  1.2GB

A build context this large for a typical application is a strong signal that .dockerignore is missing or incomplete, allowing directories like node_modules, .git, or large local data files to be included unnecessarily:

node_modules
.git
*.log

Adding a .dockerignore file excluding these directories both speeds up the build significantly and avoids accidentally including files that should never end up inside the image at all.
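To decide what belongs in .dockerignore, it helps to see which top-level entries make the context large. The sketch below uses only du, find, and sort; largest_context_entries is a hypothetical helper name, and it does not apply .dockerignore rules itself, so it shows what would be sent if nothing were excluded.

```shell
# Hypothetical helper: list the ten largest top-level entries of a directory,
# biggest first, with sizes in KiB. Hidden entries such as .git are included.
largest_context_entries() {
    context_dir="${1:-.}"
    find "$context_dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + \
        | sort -rn \
        | head -n 10
}

# Usage: run against the directory passed to docker build
#   largest_context_entries .
```

Entries near the top of the list that the image does not need are the first candidates for .dockerignore.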

Multi-stage build confusion

For multi-stage builds, a common source of confusion is referencing a file or artifact from an earlier stage incorrectly, either misnaming the stage or forgetting that each stage starts from a clean filesystem unless explicitly copying from a previous one:

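FROM node:20 AS build
WORKDIR /app
# Copy the source in before building; a stage contains only its base image until files are copied
COPY . .
RUN npm ci && npm run build

FROM nginx:alpine
# Assumes the build script writes its output to /app/dist
COPY --from=build /app/dist /usr/share/nginx/html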

A build failing with a "file not found" error during a COPY --from= instruction usually means that the source path within the referenced stage does not contain the expected file, or that the step meant to produce it in the earlier stage either did not run successfully or wrote its output somewhere else. A misspelled stage name tends to fail differently: Docker treats a name that matches no stage as an image reference, so the error typically mentions a failed attempt to pull or resolve that image.
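Stage-name typos can be caught before building with a quick static check. The following is a rough sketch, not a Dockerfile parser: check_stage_refs is a hypothetical helper, it compares names case-insensitively (an assumption of this sketch), and it skips numeric stage indexes. A reference to a genuine external image will also be reported, so each line it prints is a prompt to look, not a confirmed error.

```shell
# Hypothetical helper: report every --from= reference that does not match
# a stage name declared earlier with "FROM ... AS <name>".
check_stage_refs() {
    dockerfile="${1:-Dockerfile}"
    awk '
        # Record stage names from lines such as: FROM node:20 AS build
        toupper($1) == "FROM" {
            for (i = 2; i < NF; i++)
                if (toupper($i) == "AS") stages[tolower($(i + 1))] = 1
        }
        # Check each --from=<ref> token against the stages seen so far
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^--from=/) {
                    ref = substr($i, 8)
                    if (ref ~ /^[0-9]+$/) continue   # numeric stage index
                    if (!(tolower(ref) in stages))
                        print FILENAME ":" FNR ": no stage named " ref " (typo or external image?)"
                }
        }
    ' "$dockerfile"
}

# Usage:
#   check_stage_refs Dockerfile
```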

Platform and architecture mismatches

Building an image intended for a different CPU architecture than the build machine, or pulling a base image that defaults to the wrong architecture, can produce a build that succeeds but a resulting image that fails or behaves unexpectedly when actually run on the target architecture:

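# Build for a single, explicit target platform
docker build --platform=linux/amd64 -t my-api .
# Build for several platforms at once (the active builder must support multi-platform output)
docker buildx build --platform=linux/amd64,linux/arm64 -t my-api .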

Explicitly specifying the target platform, rather than relying on the build machine's own default architecture, avoids a class of subtle failure that only manifests once the image is deployed to infrastructure with a different CPU architecture than where it was built.
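A small pre-build check can make the mismatch visible before anything is deployed. The sketch below maps the output of uname -m to the architecture names used in --platform values and warns when the host differs from the intended target; docker_arch and check_platform are hypothetical helpers, and the mapping covers only a few common architectures.

```shell
# Hypothetical helper: translate `uname -m` names into Docker's architecture names.
docker_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        armv7l)        echo arm/v7 ;;
        *)             echo "$1" ;;   # unknown: pass through unchanged
    esac
}

# Hypothetical helper: print a warning when the host architecture differs
# from the target platform (for example linux/amd64).
check_platform() {
    target="$1"
    host="linux/$(docker_arch "$(uname -m)")"
    if [ "$host" != "$target" ]; then
        echo "host is $host, target is $target: pass --platform=$target explicitly"
    fi
}

# Usage:
#   check_platform linux/amd64
```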

Build arguments and secrets not being passed correctly

A build step that depends on a build argument or secret that was never supplied at build time fails in a way that can look like an unrelated configuration problem, unless the missing value is the first thing checked:

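# Pass the value on the command line
docker build --build-arg NODE_ENV=production -t my-api .
# Dockerfile: declare the argument after FROM, inside the stage that uses it
ARG NODE_ENV
RUN echo "Building for $NODE_ENV"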

Confirming the exact build arguments and secrets actually passed to a specific build invocation, rather than assuming they were supplied correctly based on the command that was intended to be run, is a useful first check when a build step behaves as though a value it depends on is missing or empty.
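One way to make that check automatic is to refuse to start the build when a required value is empty in the calling shell. This is a minimal sketch assuming the values are held in shell variables; require_vars is a hypothetical helper, not a Docker feature.

```shell
# Hypothetical helper: return non-zero if any of the named shell variables
# is unset or empty, naming each missing one on stderr.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (empty if unset)
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required build value: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (requires Docker):
#   require_vars NODE_ENV && docker build --build-arg NODE_ENV="$NODE_ENV" -t my-api .
```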

Reproducing a build failure interactively

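For a build step that fails in a way that is not clear from the error output alone, starting an interactive container from an image built up to the point just before the failure, and running the failing command manually inside it, often reveals more detail than the build output provides. In a multi-stage Dockerfile, --target builds only up to the named stage, which works when the failing instruction is in a later stage; otherwise, temporarily comment out the failing instruction and everything after it before building the debug image: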

docker build --target build -t my-api:debug .
docker run -it --rm my-api:debug sh

Running the specific failing command manually inside this interactive session, with the ability to inspect the filesystem state and try variations, is frequently the fastest way to understand a build failure that the automated build output describes only briefly.
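One way to get an image that stops just before the failing instruction, without editing the real Dockerfile, is to build from a truncated copy. The sketch below assumes the failing instruction starts on a known line number; dockerfile_before_line and Dockerfile.debug are hypothetical names, and for an instruction continued over several lines the number should be that of its first line.

```shell
# Hypothetical helper: write a copy of a Dockerfile containing only the
# lines before the given line number.
dockerfile_before_line() {
    src="$1"
    failing_line="$2"
    out="$3"
    head -n "$((failing_line - 1))" "$src" > "$out"
}

# Usage (requires Docker; line 4 and the file names are assumed):
#   dockerfile_before_line Dockerfile 4 Dockerfile.debug
#   docker build -f Dockerfile.debug -t my-api:debug .
#   docker run -it --rm my-api:debug sh
```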

Common mistakes

  • Not using --progress=plain when the default condensed output is hiding or truncating the actual underlying error message.
  • Assuming a build failure is caused by something in the Dockerfile without first ruling out a stale build cache with --no-cache.
  • Missing a .dockerignore file, leading to an unnecessarily large and slow build context that can also cause unrelated files to be inadvertently copied into the image.
  • Misnaming or mismatching stage references in a multi-stage build, producing an error that initially looks unrelated to the actual cause.
  • Building on one CPU architecture without explicitly specifying the target platform when the image is intended to run on a different one.

Build troubleshooting is generally more tractable than runtime troubleshooting because the build process reports failures at a specific, identifiable step. Working through the build output carefully, ruling out cache and context-size issues early, and reproducing a stubborn failure interactively from the last successful layer resolves the large majority of build problems without guessing at the cause.
