17.1.2.5 Clean Source Builds

A focused guide to Clean Source Builds, connecting core concepts with practical Docker and container operations.

A clean source build is an image built from a working directory that reflects only committed, version-controlled source code. The directory must not also contain uncommitted local changes, stray artifacts from a previous build, or files that exist only on one developer's machine. A build context contaminated by any of these can silently produce an image that corresponds to nothing traceable in version control.

Why an uncommitted local change is a hidden risk

A build that succeeds against a working directory containing uncommitted changes produces an image that cannot be reproduced from version control alone, since the actual content built includes something that was never committed and exists nowhere except that one developer's local filesystem at that exact moment:

git status

Changes not staged for commit:
  modified:   config/production.json

docker build -t my-api:1.4.2 .

This build incorporates the uncommitted modification to production.json, but nothing about the resulting image indicates this; anyone later checking out the exact commit tagged 1.4.2 and rebuilding would get a different result than what was actually built and potentially deployed, since the uncommitted change is invisible to anyone without access to that specific developer's local working directory at that specific moment.

Verifying a clean working directory before building

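A simple, direct check before any build intended for anything beyond casual local iteration confirms that the working directory contains no uncommitted changes and no untracked files, since git status --porcelain prints one line for each and prints nothing when the tree is clean: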

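# Prints one line per modified, staged, or untracked file; prints nothing when the tree is clean.
git status --porcelain

# Refuse to build unless that output is empty.
if [ -n "$(git status --porcelain)" ]; then
  echo "ERROR: uncommitted changes or untracked files present, refusing to build" >&2
  exit 1
fi
docker build -t my-api .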

Wrapping this check into the build process itself, rather than relying on a developer remembering to check manually, makes clean source builds the enforced default rather than something dependent on individual discipline.
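When the check refuses a build, it can help to report why. The following is a minimal sketch, assuming porcelain v1 output, in which untracked entries begin with ??. The function name summarize_porcelain is hypothetical and not part of git.

```shell
#!/bin/sh
# summarize_porcelain (hypothetical helper): read `git status --porcelain`
# output on stdin and print how many entries are changes to tracked files
# and how many are untracked files.
summarize_porcelain() {
  tracked=0
  untracked=0
  # IFS= and -r keep the leading status columns (for example " M") intact.
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    case "$line" in
      '?? '*) untracked=$((untracked + 1)) ;;
      *)      tracked=$((tracked + 1)) ;;
    esac
  done
  echo "tracked=$tracked untracked=$untracked"
}

# Possible usage:
#   git status --porcelain | summarize_porcelain
```

Separating the two counts makes it clearer whether the fix is a commit or a cleanup of stray files.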

CI environments and the structural advantage they provide

A CI pipeline that performs a fresh checkout of the exact, specific commit before building inherently produces a clean source build, since there is no possibility of accumulated local state, uncommitted changes, or stray files from a previous, unrelated build ever being present in a freshly cloned working directory:

build:
  script:
    - git clone --depth 1 --branch "$CI_COMMIT_TAG" "$CI_REPOSITORY_URL" /tmp/build
    - cd /tmp/build
    - docker build -t my-api:"$CI_COMMIT_TAG" .

This is one of the strongest practical arguments for treating CI as the authoritative build environment for anything deployed beyond casual local testing, rather than ever deploying directly from a developer's local build: a fresh checkout in CI cannot include the kind of accidental local contamination that a developer's own working directory might.
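The fresh-checkout property can be approximated outside CI by exporting a commit's tracked content into an empty directory and building from there. This is a sketch under stated assumptions rather than a replacement for CI: the function name export_commit is hypothetical, and git archive does not include submodule content and omits paths marked export-ignore in .gitattributes.

```shell
#!/bin/sh
# export_commit (hypothetical helper): write the tracked files of a commit
# into a directory, ignoring whatever is in the current working directory.
export_commit() {
  commit=$1
  dest=$2
  # Fail early if the argument does not name a commit.
  git rev-parse --verify --quiet "${commit}^{commit}" >/dev/null || return 1
  mkdir -p "$dest" || return 1
  # git archive streams only committed content; tar unpacks it into dest.
  git archive --format=tar "$commit" | tar -xf - -C "$dest"
}

# Possible usage:
#   build_dir=$(mktemp -d)
#   export_commit HEAD "$build_dir" && docker build -t my-api "$build_dir"
```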

Stray files left over from a previous build or unrelated process

Even without any uncommitted change to a tracked file, a working directory can accumulate files that version control does not track at all, such as build output from a previous, different build configuration, downloaded dependencies cached outside the project's normal dependency directory, or editor-generated temporary files. Unless explicitly excluded, these still get included in the build context:

git clean -ndx

Would remove dist/
Would remove .cache/
Would remove debug.log

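Running git clean in dry-run mode (-n), with -d to include untracked directories and -x to include files that .gitignore would otherwise hide, surfaces exactly what untracked content exists in the working directory. That is useful for identifying anything that should either be added to .gitignore and .dockerignore or genuinely removed before the build proceeds. Note that .gitignore alone does not keep a file out of the Docker build context; only .dockerignore does.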
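The dry-run output can also be cross-checked against .dockerignore. The sketch below is deliberately naive: it looks only for literal, whole-line entries and does not evaluate .dockerignore glob patterns or ! exceptions, so anything it prints deserves a manual look rather than being treated as a definitive finding. The function name stray_not_dockerignored is hypothetical.

```shell
#!/bin/sh
# stray_not_dockerignored (hypothetical helper): read `git clean -ndx` output
# on stdin and print each path that has no literal, whole-line entry in the
# given ignore file. Patterns and negations are NOT interpreted.
stray_not_dockerignored() {
  ignore_file=$1
  # Keep only "Would remove <path>" lines and strip the prefix.
  sed -n 's/^Would remove //p' | while IFS= read -r path; do
    # Compare with and without a trailing slash so "dist/" matches "dist".
    bare=${path%/}
    if [ -f "$ignore_file" ] && grep -Fxq -e "$bare" -e "$bare/" "$ignore_file"; then
      continue
    fi
    printf '%s\n' "$path"
  done
}

# Possible usage:
#   git clean -ndx | stray_not_dockerignored .dockerignore
```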

The relationship to reproducible builds

Clean source builds are a necessary precondition for the reproducible builds covered elsewhere. A build that incorporates uncommitted or stray, untracked content cannot be reproduced from version control alone, regardless of how carefully dependency versions and base images are otherwise pinned, since the source content itself is the part that diverges from what version control can reconstruct.

git log -1 --format="%H"
docker inspect my-api:1.4.2 --format '{{index .Config.Labels "org.opencontainers.image.revision"}}'

Embedding the exact commit hash as a label during the build, and confirming it matches the commit actually checked out, provides a direct, verifiable link between the built image and version control, but this link is only meaningful if the build genuinely reflected that exact commit's content with nothing additional or different mixed in.
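The comparison between the label and the checked-out commit can be scripted. This is a minimal sketch, assuming the image carries the org.opencontainers.image.revision label shown above; the function names image_revision and check_revision are hypothetical.

```shell
#!/bin/sh
# image_revision (hypothetical helper): print the revision label of an image.
image_revision() {
  docker inspect --format '{{index .Config.Labels "org.opencontainers.image.revision"}}' "$1"
}

# check_revision (hypothetical helper): compare the label with an expected
# commit hash. Returns 0 on match, 1 on mismatch, 2 if no label can be read.
check_revision() {
  image=$1
  expected=$2
  actual=$(image_revision "$image") || return 2
  if [ -z "$actual" ]; then
    echo "no revision label found on $image" >&2
    return 2
  fi
  if [ "$actual" != "$expected" ]; then
    echo "revision mismatch: image has $actual, expected $expected" >&2
    return 1
  fi
  echo "$image matches $expected"
}

# Possible usage:
#   check_revision my-api:1.4.2 "$(git rev-parse HEAD)"
```

A match only shows that the label and the commit agree; it says nothing about whether the working directory was clean when the image was built.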

Tagging and labeling to capture build provenance

Recording the exact commit, branch, and build environment directly as image labels creates a durable, traceable record tying a specific image back to its exact source, which is valuable for any later investigation needing to confirm precisely what was built and from where:

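# Dockerfile: declare the ARG after FROM so it is visible to LABEL in that stage
ARG GIT_COMMIT
LABEL org.opencontainers.image.revision="${GIT_COMMIT}"

# Shell: pass the current commit hash at build time
docker build --build-arg GIT_COMMIT="$(git rev-parse HEAD)" -t my-api .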

This provenance information is only trustworthy if the build that produced it was genuinely a clean source build in the first place; embedding an accurate commit hash into an image that was actually built from a working directory containing additional, uncommitted changes creates a misleading, false sense of traceability.
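One way to keep the label honest is to mark it whenever the working directory is not clean, so a contaminated build can never carry a bare commit hash. This is a sketch, not an established convention: the -dirty suffix and the function name revision_label are assumptions made for illustration.

```shell
#!/bin/sh
# revision_label (hypothetical helper): print the commit hash, with "-dirty"
# appended when the supplied `git status --porcelain` output is non-empty.
revision_label() {
  commit=$1
  status=$2
  if [ -n "$status" ]; then
    printf '%s-dirty\n' "$commit"
  else
    printf '%s\n' "$commit"
  fi
}

# Possible usage:
#   label=$(revision_label "$(git rev-parse HEAD)" "$(git status --porcelain)")
#   docker build --build-arg GIT_COMMIT="$label" -t my-api .
```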

Local development builds versus deployment builds

It is reasonable and expected for local, interactive development builds to include uncommitted, in-progress changes, since that is precisely the point of local iteration; the clean source build requirement applies specifically to any build whose result might actually be deployed, shared, or relied upon beyond that one developer's own immediate, local testing session.

docker build -t my-api:dev-local .

Clearly distinguishing, through tagging convention or process, between these casual local iteration builds and anything intended for actual deployment prevents the clean source requirement from becoming unreasonable friction during normal day-to-day development work.
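Such a convention can be enforced with a small guard in a build wrapper. The sketch below assumes that tags beginning with dev- denote local iteration builds, following the dev-local example above; that prefix rule and the function name allow_build are hypothetical.

```shell
#!/bin/sh
# allow_build (hypothetical helper): permit any build tagged dev-*, and
# permit other tags only when the supplied `git status --porcelain` output
# is empty.
allow_build() {
  tag=$1
  status=$2
  case "$tag" in
    dev-*) return 0 ;;   # local iteration: uncommitted changes are expected
  esac
  if [ -z "$status" ]; then
    return 0             # clean tree: any tag is acceptable
  fi
  echo "ERROR: tag '$tag' requires a clean working directory" >&2
  return 1
}

# Possible usage:
#   allow_build "$TAG" "$(git status --porcelain)" && docker build -t "my-api:$TAG" .
```

Keeping the rule in one function means local builds stay frictionless while anything with a deployable tag goes through the clean-tree check.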

Common mistakes

Building and deploying directly from a local working directory without verifying it contains no uncommitted changes, producing an image that cannot be reproduced from version control alone.
Not running CI from a fresh, clean checkout of the exact target commit, allowing accumulated local CI runner state to contaminate the build context.
Overlooking untracked, stray files that are not part of any git change but still get included in the build context unless explicitly excluded.
Embedding commit hash labels into an image without first confirming the build was genuinely a clean source build, creating misleading, inaccurate provenance information.
Applying the clean source build requirement uniformly to casual local development iteration, creating unnecessary friction where it provides little corresponding benefit.

Clean source builds ensure that an image's content can be traced back to, and reproduced from, version control alone. That requires verifying that no uncommitted changes or stray, untracked files are present before any build intended for deployment. CI environments provide this guarantee structurally through fresh checkouts; local developer machines generally cannot without deliberate, explicit verification.