
19.1.3.1 Pull Image Download

A focused guide to Pull Image Download, connecting core concepts with practical Docker and container operations.

Pull image download covers the lower-level mechanics of how layers are retrieved during a pull. This includes daemon-configured concurrency limits, bandwidth control outside Docker, per-layer digest verification, the storage driver's role during extraction, and the temporary disk space a pull requires beyond the final image's reported size.

Daemon-level concurrency configuration

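The daemon's concurrent download and upload limits are set in /etc/docker/daemon.json and control how many layers the daemon transfers in parallel. The values below are the defaults (3 downloads, 5 uploads). Change them as needed, then restart the daemon so the new values take effect: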

{
  "max-concurrent-downloads": 3,
  "max-concurrent-uploads": 5
}
systemctl restart docker

Raising the download limit can speed up pulls of many-layered images on a host with ample bandwidth. Lowering it can make sense on a host whose limited network capacity is shared with other critical traffic, where a single large pull saturating the link would be undesirable.
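The following Python sketch illustrates what a cap on concurrent downloads means by modeling it with an asyncio.Semaphore. It is not how the daemon is implemented. The names pull and fake_download are hypothetical, and the sleep stands in for network transfer time. The point is only that no more than limit layers are ever in flight while the rest wait their turn.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_download(layer: str, sem: asyncio.Semaphore,
                        stats: Dict[str, int]) -> str:
    """Hypothetical stand-in for fetching one layer blob."""
    async with sem:  # waits here while `limit` downloads are already running
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # placeholder for network transfer time
        stats["in_flight"] -= 1
    return layer


async def pull(layers: List[str], limit: int) -> Tuple[List[str], int]:
    """Start every layer at once; the semaphore enforces the concurrency cap."""
    sem = asyncio.Semaphore(limit)  # analogous to max-concurrent-downloads
    stats = {"in_flight": 0, "peak": 0}
    done = await asyncio.gather(*(fake_download(name, sem, stats) for name in layers))
    return list(done), stats["peak"]


if __name__ == "__main__":
    names = [f"layer{i}" for i in range(8)]
    finished, peak = asyncio.run(pull(names, limit=3))
    print(f"pulled {len(finished)} layers, at most {peak} at a time")
```

With eight layers and limit=3, the peak never exceeds three. Raising the limit lets more layers share the link at once, which only helps while the link is not already saturated.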

Bandwidth throttling considerations

Docker has no built-in option for throttling pull bandwidth. Limiting it requires host-level traffic shaping tools such as Linux tc or, in some environments, a network policy applied at the infrastructure layer. Neither is configured in Docker's own daemon settings.

# Token bucket filter as eth0's root qdisc; this shapes outgoing (egress) traffic on eth0
tc qdisc add dev eth0 root tbf rate 10mbit burst 32kbit latency 50ms

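Host-level traffic control is lower-level and more involved than anything Docker exposes, and it affects all traffic on the interface, not only pulls. A root qdisc like the one above shapes outgoing traffic only. Because a pull is mostly incoming data, limiting download speed on the host itself requires an ingress policer or redirecting ingress traffic through an IFB device. This is worth the effort mainly in environments where pull bandwidth needs deliberate management rather than being left uncontrolled.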
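The tbf qdisc is a token bucket. Tokens accumulate at rate up to a maximum of burst, and a packet may leave only when enough tokens are available. The simplified Python model below computes when the last byte of a transfer would be released. It ignores tbf's latency/queue-length limit and the kernel's timer details. The function finish_time and its parameters are hypothetical, and every packet is assumed to fit within the bucket, since a packet larger than the bucket could never be sent.

```python
def finish_time(total_bytes: int, packet_bytes: int,
                rate: float, burst: int) -> float:
    """Seconds until the last packet of a transfer leaves a token bucket.

    rate: tokens (bytes) added per second; burst: bucket capacity in bytes.
    The sender is assumed to always have data ready to send.
    """
    if packet_bytes > burst:
        raise ValueError("a packet larger than the bucket can never be sent")
    tokens = float(burst)  # the bucket starts full, so the first burst is free
    now = 0.0
    sent = 0
    while sent < total_bytes:
        size = min(packet_bytes, total_bytes - sent)
        if tokens < size:
            now += (size - tokens) / rate  # wait until enough tokens accrue
            tokens = float(size)           # never exceeds burst, since size <= burst
        tokens -= size
        sent += size
    return now


if __name__ == "__main__":
    rate = 10_000_000 / 8  # 10 megabits per second, expressed in bytes per second
    print(finish_time(10_000_000, 1500, rate=rate, burst=4000))
```

In this model, once the initial burst is spent the transfer proceeds at exactly rate. For large transfers the burst therefore barely matters: 10 MB at 1,250,000 bytes per second takes about 8 seconds whether or not a 4,000-byte burst was available up front.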

Per-layer digest verification during download

Each layer is verified against the digest declared in the image manifest as soon as its download completes, and only then counts as successfully retrieved. This is the same content-addressed integrity guarantee described in the OCI image content section. It applies automatically on every pull, without anyone requesting it. The progress output shows this step for each layer:

3f29a8c1: Verifying Checksum

A layer that fails verification is never silently accepted with potentially corrupted content. The download is rejected, and the daemon either retries it or fails the pull with a verification error. This is a structural property of how pulls work, not an optional, separately configured safety check.
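The same idea can be sketched in a few lines of Python: hash the bytes exactly as they arrive, and compare the result to the manifest's sha256:<hex> digest before trusting anything that was written. For a layer, the manifest digest covers the compressed blob as transferred, so the hash is computed before decompression. The names copy_and_verify and DigestMismatch are hypothetical. The sketch handles only sha256 and leaves retries to the caller.

```python
import hashlib
import io
from typing import BinaryIO


class DigestMismatch(Exception):
    """Raised when downloaded bytes do not hash to the manifest digest."""


def copy_and_verify(src: BinaryIO, dest: BinaryIO, expected: str,
                    chunk_size: int = 64 * 1024) -> int:
    """Stream src into dest while hashing; raise if the digest differs.

    expected uses the "algorithm:hex" form, e.g. "sha256:<64 hex chars>".
    Only sha256 is handled in this sketch.
    """
    algorithm, _, want = expected.partition(":")
    if algorithm != "sha256" or len(want) != 64:
        raise ValueError(f"unsupported digest: {expected!r}")
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)  # hash exactly the bytes received
        dest.write(chunk)
        total += len(chunk)
    got = hasher.hexdigest()
    if got != want:
        # Nothing written to dest should be trusted; the caller discards it.
        raise DigestMismatch(f"expected sha256:{want}, got sha256:{got}")
    return total


if __name__ == "__main__":
    blob = b"compressed layer bytes"
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    out = io.BytesIO()
    print(copy_and_verify(io.BytesIO(blob), out, digest))
```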

The storage driver's role during extraction

After download and verification, the active storage driver (commonly overlay2 on modern Linux systems) extracts each layer into the local image store. It does the filesystem-level work of unpacking the layer's contents into the location from which the union filesystem later presents them. To see which driver is active:

docker info --format '{{.Driver}}'

Extraction is CPU- and disk-I/O-intensive (decompression plus writing files) and is distinct from the network-bound download step. As a result, a pull's bottleneck can shift from the network during download to disk and CPU during extraction within the same pull, as covered in the progress-stage diagnosis for the pull command.
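The following minimal Python sketch of the extraction step assumes the layer is a gzip-compressed tar archive. Decompression is the CPU-bound part and writing files is the disk-bound part. The helpers make_layer and extract_layer are hypothetical. Unlike a real storage driver, the sketch ignores OCI whiteout entries, ownership, and extended attributes.

```python
import io
import os
import tarfile
import tempfile
from typing import Dict


def make_layer(files: Dict[str, bytes]) -> bytes:
    """Build a gzip-compressed tar in memory, standing in for a pulled blob."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()


def extract_layer(blob: bytes, dest: str) -> int:
    """Unpack a gzip'd tar layer into dest and return the extracted file bytes."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        members = tar.getmembers()  # reads through the archive, decompressing it
        if hasattr(tarfile, "data_filter"):
            # Where extraction filters exist, refuse unsafe paths and modes.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sum(m.size for m in members if m.isfile())


if __name__ == "__main__":
    blob = make_layer({"etc/motd": b"hello\n" * 1000})
    with tempfile.TemporaryDirectory() as root:
        extracted = extract_layer(blob, root)
        print(f"downloaded {len(blob)} bytes, extracted {extracted} bytes")
        print(os.listdir(root))
```

Comparing len(blob) with the returned byte count shows that the compressed download and the extracted content have different sizes, which matters when planning disk space for a pull.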

Temporary disk space requirements beyond final image size

A pull needs more temporary disk space than the final image's reported size. Compressed layer downloads need space before extraction. The extracted, uncompressed content then needs additional space while the compressed download is still present, until the temporary download data is cleaned up. To check free space before a large pull:

df -h "$(docker info --format '{{.DockerRootDir}}')"
docker pull large-image:latest

A common symptom of not accounting for this temporary requirement is a pull that fails with a disk space error, such as "no space left on device", on a host that seemed to have just enough free space for the image's reported size. A host should have meaningfully more free space than the final image size so that the whole pull can complete.
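This arithmetic can be made concrete with a conservative Python estimate. As an upper bound, it assumes that every compressed blob is still on disk when the last layer finishes extracting, and it adds a 10% safety margin. The layer sizes in the example are made up, and LayerSize, reported_size, peak_pull_bytes, and has_room are hypothetical names. In practice, compressed sizes are listed in the image manifest.

```python
import shutil
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LayerSize:
    compressed: int    # bytes transferred (the blob size listed in the manifest)
    uncompressed: int  # bytes on disk after extraction


def reported_size(layers: Sequence[LayerSize]) -> int:
    """Roughly what an image's reported size reflects: the extracted content."""
    return sum(layer.uncompressed for layer in layers)


def peak_pull_bytes(layers: Sequence[LayerSize]) -> int:
    """Upper bound: every compressed blob plus all extracted content at once."""
    return sum(layer.compressed + layer.uncompressed for layer in layers)


def has_room(layers: Sequence[LayerSize], free_bytes: int,
             headroom_pct: int = 10) -> bool:
    """True if free_bytes covers the peak estimate plus a safety margin."""
    need = peak_pull_bytes(layers)
    need += need * headroom_pct // 100  # integer math keeps the margin exact
    return free_bytes >= need


if __name__ == "__main__":
    # Hypothetical two-layer image.
    layers = [LayerSize(30_000_000, 80_000_000), LayerSize(50_000_000, 120_000_000)]
    # Assumption: point this at the filesystem holding Docker's data root.
    free = shutil.disk_usage("/").free
    print(reported_size(layers), peak_pull_bytes(layers), has_room(layers, free))
```

For the example layers, the reported size is 200 MB, but the estimate asks for 308 MB of free space. That gap is the kind that produces the disk space failures described above.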

Cleanup of temporary pull artifacts

Once a pull completes successfully, temporary artifacts from download and extraction are cleaned up automatically. A pull that fails partway through, however, can occasionally leave partial, orphaned data behind. A subsequent docker system prune or daemon restart typically resolves this:

docker system df
docker system prune

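After a failed or interrupted pull, check disk usage and run a prune if unexpectedly high usage persists. Note that docker system prune also removes stopped containers, unused networks, dangling images, and unused build cache, so review its confirmation prompt before proceeding.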

Common mistakes

  • Assuming a host with just enough free space for an image's reported size can complete the pull, without accounting for the temporary space needed during download and extraction.
  • Leaving daemon-level concurrency settings untouched when pull speed needs tuning, either to use ample bandwidth or to avoid saturating constrained capacity.
  • Expecting Docker to throttle pull bandwidth itself, rather than recognizing that this requires host-level traffic shaping or an infrastructure-level policy.
  • Not recognizing that a pull's bottleneck can shift between the network-bound download and the disk-and-CPU-bound extraction within the same pull.
  • Not checking for and cleaning up orphaned temporary data after a failed or interrupted pull, leaving unexpected disk usage unaddressed.

Pull image download mechanics involve four things: daemon-configurable concurrency, automatic per-layer digest verification built into the process, a storage-driver extraction step distinct from the network-bound download, and a temporary disk space requirement beyond the final image's reported size. All of these matter for diagnosing pull performance and disk space problems beyond what the CLI's flags and basic progress output reveal.