17.1.1.5 Package Cache Cleaning
A focused guide to Package Cache Cleaning, connecting core concepts with practical Docker and container operations.
Package cache cleaning is the practice of removing the downloaded package archives and index metadata that system and language package managers keep after an installation completes. Every major ecosystem needs it, but each uses a different command and cleanup target. In every case, the cleanup must run in the same layer that performed the installation, or the final image does not get any smaller.
Why the same-layer requirement applies universally
Whatever the package manager, Docker's layer model means that deleting a file reduces image size only if the deletion happens in the same RUN instruction that created the file. A removal in a later instruction only hides the file: the earlier layer, including the cache that was supposedly cleaned up, stays intact and still counts toward the image's total size:
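# First form: cleanup in a separate RUN instruction (does not shrink the image)
RUN apt-get update && apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# Second form: cleanup in the same RUN instruction (shrinks the image)
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*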
The second form reduces the final image size because the cache is removed in the same layer that created it. The first form only appears to clean up: by the time the separate RUN instruction executes, the layer containing the cache has already been committed.
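One way to catch the first form in an existing image is to look for history entries that run a cleanup command but add almost no bytes. A separate cleanup layer records only deletions, so it typically reports 0B. The sketch below is a heuristic, not a Docker feature. The function name flag_detached_cleanup, its 1024-byte default threshold, and the list of cleanup patterns are assumptions. It relies on docker history --human=false printing sizes as raw byte counts, and it can also flag harmless commands.

```shell
# Hypothetical helper (heuristic sketch). Reads lines of the form
#   <size-in-bytes><TAB><created-by command>
# as printed by:
#   docker history --human=false --no-trunc \
#     --format '{{.Size}}\t{{.CreatedBy}}' IMAGE
# and prints commands that look like cache cleanup but added at most
# $1 bytes (default 1024). That usually means the cleanup ran in its own
# RUN instruction and could not shrink the earlier layer.
flag_detached_cleanup() {
  awk -F '\t' -v limit="${1:-1024}" '
    # Field 1 is the layer size in bytes; field 2 is the instruction.
    $1 + 0 <= limit + 0 && $2 ~ /(rm -rf|apt-get clean|yum clean|dnf clean|cache clean)/ {
      print "possible detached cleanup: " $2
    }
  '
}
```

For example: docker history --human=false --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' my-api:1.4.2 | flag_detached_cleanup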
Debian and Ubuntu-based images (apt)
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
apt-get clean removes the downloaded .deb archives from the local cache under /var/cache/apt/archives, while rm -rf /var/lib/apt/lists/* deletes the package index files that apt-get update fetched. Both add to the layer's size, so remove them together.
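To check that neither apt location survived in a built image, one option is to list the image's filesystem with docker export and count what is left. A minimal sketch, assuming the listing comes from tar -t over the exported stream; apt_cache_residue is a hypothetical name. It skips directory entries and apt's lock files, since apt-get clean deliberately leaves the lock file and the partial/ directory in place. Note that docker export shows only the final merged filesystem, so a clean result confirms the right locations were targeted but does not prove the cleanup ran in the same layer; docker history remains the check for that.

```shell
# Hypothetical helper: reads a tar listing of an image's filesystem on
# stdin and prints how many apt cache files are still present, as
# "debs=<n> lists=<n>".
apt_cache_residue() {
  awk '
    {
      path = $0
      sub(/^\.\//, "", path)                      # tolerate a leading "./"
      if (path ~ /\/$/ || path ~ /\/lock$/) next  # skip directories and lock files
      if (path ~ /^var\/cache\/apt\/archives\/.*[.]deb$/) debs++
      else if (path ~ /^var\/lib\/apt\/lists\//) lists++
    }
    END { printf "debs=%d lists=%d\n", debs, lists }
  '
}
```

Usage, assuming the image defines a default command so docker create succeeds:
cid=$(docker create my-api:1.4.2)
docker export "$cid" | tar -t | apt_cache_residue
docker rm "$cid"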
Alpine-based images (apk)
RUN apk add --no-cache curl
Alpine's apk package manager supports a --no-cache flag that fetches the package index for that command without storing it locally. Because the cache is never created, no separate cleanup step is needed, which is simpler than removing the cache afterward.
RHEL, CentOS, and Fedora-based images (yum/dnf)
# yum (for example CentOS 7)
RUN yum install -y curl && \
    yum clean all && \
    rm -rf /var/cache/yum

# dnf (for example Fedora, RHEL 8 and later)
RUN dnf install -y curl && \
    dnf clean all && \
    rm -rf /var/cache/dnf
Both yum and its successor dnf provide a clean all subcommand that removes cached packages and metadata. Deleting the cache directory afterward catches anything clean all leaves behind, so no residual cache files end up in the layer.
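When one Dockerfile has to build on both older and newer RHEL-family bases, a single RUN step can pick whichever tool is installed. A minimal sketch; pick_rpm_manager and rpm_install_clean are hypothetical names, and the sketch assumes each tool keeps its cache under /var/cache/<tool>. Newer dnf releases may use a different cache directory, so clean all is the step to rely on and the rm -rf is only a backstop.

```shell
# Hypothetical helper: prints "dnf" when a dnf binary is on PATH,
# otherwise falls back to "yum".
pick_rpm_manager() {
  if command -v dnf >/dev/null 2>&1; then
    echo dnf
  else
    echo yum
  fi
}

# Hypothetical helper: installs the given packages and cleans the chosen
# manager's cache in the same shell invocation, mirroring what a single
# RUN instruction has to do for the cleanup to shrink the layer.
rpm_install_clean() {
  pm=$(pick_rpm_manager)
  "$pm" install -y "$@" &&
    "$pm" clean all &&
    rm -rf "/var/cache/$pm"
}
```

Inlined into a Dockerfile, the same logic could read:
RUN if command -v dnf >/dev/null 2>&1; then pm=dnf; else pm=yum; fi && $pm install -y curl && $pm clean all && rm -rf /var/cache/$pm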
Node.js (npm, yarn, pnpm)
RUN npm ci && npm cache clean --force
RUN yarn install --frozen-lockfile && yarn cache clean
Node package manager caches usually matter less for final image size than system package manager caches, especially in a multi-stage build whose final image excludes the entire dependency installation stage. Explicit cache cleaning still matters for single-stage builds and for the size of the install stage itself during the build.
Python (pip)
RUN pip install --no-cache-dir -r requirements.txt
Python's pip supports a --no-cache-dir flag directly on the install command, which, similar to Alpine's apk --no-cache, avoids creating the cache in the first place rather than requiring a separate removal step afterward.
Go modules
RUN go mod download
Go's module cache is rarely a concern for final image size, because compiled Go binaries are usually shipped from the final stage of a multi-stage build, which contains only the resulting binary and never the module cache. Unlike interpreted language ecosystems, Go therefore seldom needs explicit cache cleaning.
Ruby (gem and bundler)
RUN bundle config set --local without 'development test' && \
    bundle install && \
    rm -rf /usr/local/bundle/cache
Configuring bundler to skip the development and test dependency groups, and deleting the gem cache after installing, addresses both dependency pruning and cache cleaning for Ruby-based images.
Java (Maven and Gradle)
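# Build stage: "mvn clean" runs Maven's clean lifecycle (deletes target/);
# it does not touch the ~/.m2 dependency cache.
RUN mvn dependency:go-offline && \
    mvn clean package -DskipTests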
Maven's and Gradle's dependency caches (by default under ~/.m2 and ~/.gradle) are typically large. Like Go's module cache, they are usually kept out of the final image by a multi-stage build rather than cleaned within a single stage: the final image for a Java application typically contains only the compiled JAR or WAR artifact and a JRE, without the build tooling or its dependency cache.
Verifying cache cleanup actually reduced layer size
After adding cache cleaning for any package manager, check the actual size impact with docker history instead of assuming it from the Dockerfile's structure; this confirms that the cleanup took effect within the same layer:
docker history my-api:1.4.2 --no-trunc
A layer's reported size reflects the net effect of the installation and any cleanup in the same instruction. If a layer that includes a cleanup step still reports a large size, the cleanup probably did not target the correct cache location for that package manager and version.
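To see which instructions dominate an image, the same docker history output can be sorted by size. A minimal sketch; largest_layers is a hypothetical name, and it assumes the raw-byte output of docker history --human=false with a tab between the size and the command. Sizes are shown in decimal megabytes (10^6 bytes).

```shell
# Hypothetical helper: reads "<bytes><TAB><command>" lines on stdin and
# prints the N largest layers (default 5), biggest first, in megabytes.
largest_layers() {
  n=${1:-5}
  sort -t "$(printf '\t')" -k1,1nr |
    head -n "$n" |
    awk -F '\t' '{ printf "%8.1f MB  %s\n", $1 / 1000000, $2 }'
}
```

For example: docker history --human=false --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' my-api:1.4.2 | largest_layers 3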
Common mistakes
- Performing cache cleanup in a separate RUN instruction from the installation that created the cache, which leaves the final image size unchanged even though the cleanup appears to run.
- Using a generic cleanup approach copied from a different package manager's documentation without confirming the correct cache location for the specific package manager and image in use.
- Not using a package manager's own no-cache installation flag where available, performing unnecessary cleanup work for a cache that did not need to be created in the first place.
- Assuming language ecosystem package caches matter as much as system package manager caches, when many language-specific caches are already excluded from the final image entirely through standard multi-stage build patterns.
- Not verifying actual layer size impact directly through docker history after applying cache cleaning, assuming success based on the Dockerfile's apparent correctness alone.
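For the mistake of copying a cleanup path from another tool's documentation, several language package managers can report their own cache location instead. A minimal sketch meant to run in a shell inside the image (for example with docker run --rm -it --entrypoint sh my-api:1.4.2); report_cache_dirs is a hypothetical name. Tools that are not installed are skipped; yarn cache dir assumes Yarn Classic (1.x), pip cache dir needs pip 20.1 or newer, and go env GOMODCACHE needs Go 1.15 or newer.

```shell
# Hypothetical helper: asks each installed package manager where its
# cache lives and prints "<tool><TAB><path>". Missing tools are skipped.
report_cache_dirs() {
  if command -v npm >/dev/null 2>&1; then
    printf 'npm\t%s\n' "$(npm config get cache)"
  fi
  if command -v yarn >/dev/null 2>&1; then
    printf 'yarn\t%s\n' "$(yarn cache dir)"            # Yarn Classic (1.x)
  fi
  if command -v pip >/dev/null 2>&1; then
    printf 'pip\t%s\n' "$(pip cache dir 2>/dev/null)"  # empty if the cache is disabled
  fi
  if command -v go >/dev/null 2>&1; then
    printf 'go\t%s\n' "$(go env GOMODCACHE)"
  fi
  if command -v gem >/dev/null 2>&1; then
    printf 'gem\t%s/cache\n' "$(gem env gemdir)"       # .gem files live in <gemdir>/cache
  fi
  return 0
}
```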
Each package manager ecosystem needs its own cleanup command, but one requirement applies identically to all of them: the cleanup must run in the same layer that created the cache. Checking the actual size impact with docker history confirms that the cleanup worked rather than only appearing to.