
20.3.1 Container Hardening Step

A focused guide to Container Hardening Step, connecting core concepts with practical Docker and container operations.

Container hardening reduces the blast radius of a compromised container. A default Docker container runs as root inside the container, has access to a broad set of Linux capabilities, and runs with no filesystem write restrictions. If an attacker achieves code execution inside such a container, they have significant leverage to escalate privileges, access the host filesystem through volume mounts, or pivot to other containers. Hardening applies a series of restrictions that limit what a compromised process can do, making containers meaningfully more secure without requiring changes to the application code.

Non-Root User

The most impactful single hardening measure is running the container process as a non-root user. Most container processes have no need for root privileges. When a process runs as root and the application is compromised, the attacker has full root access inside the container and to any mounted volumes.

In the Dockerfile:

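FROM node:20-alpine
WORKDIR /app
# Copy the dependency manifests first so the install layer can be cached
COPY --chown=node:node package.json package-lock.json ./
# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev
COPY --chown=node:node . .
# Everything after this line, including the container's main process, runs as the unprivileged node user
USER node
CMD ["node", "server.js"]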

The USER node instruction switches to the node user (UID 1000 in the official Node.js images). It applies to all subsequent RUN, CMD, and ENTRYPOINT instructions and to the container's runtime process. The --chown=node:node flag on COPY makes node the owner of the copied files. Without it, COPY creates them owned by root.
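You can check which user an image will run as before building it by reading the final stage's USER instruction straight from the Dockerfile. The sketch below is a hypothetical helper: final_stage_user is not a Docker command. It assumes the final stage starts FROM an external image rather than an earlier build stage, because a stage built FROM another stage inherits that stage's user.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the user named by the last USER instruction in
# the final build stage of a Dockerfile. Prints an empty line when the final
# stage has no USER instruction, meaning the user is inherited from the base
# image (root for node:20-alpine).
# Assumes the final stage starts FROM an external image, not an earlier stage.
final_stage_user() {
  awk '
    toupper($1) == "FROM" { user = "" }   # a new stage starts: forget earlier USER lines
    toupper($1) == "USER" { user = $2 }   # remember the most recent USER
    END { print user }
  ' "${1:-Dockerfile}"
}

# Example: final_stage_user Dockerfile   (prints "node" for the Dockerfile above)
```

An empty result is the signal to act on. The image will run as whatever user its base image defines, which for most official images is root.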

At runtime (if the Dockerfile does not set the user):

docker run -u 1000:1000 my-image

Verifying the running user:

docker exec my-container id

Expected output: uid=1000(node) gid=1000(node) groups=1000(node), not uid=0(root).
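A similar check can run before any container exists, by reading the user recorded in the image configuration with docker image inspect. The helper below, is_root_user (a hypothetical name), is a minimal sketch. It treats an empty user, root, or UID 0 as root, and it does not resolve other user names against the image's /etc/passwd.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a Docker user spec such as
# "node", "1000:1000" or "" resolves to root. An empty spec means the image
# sets no USER, so the process runs as root. Names other than "root" are
# treated as non-root; they are not looked up in the image's /etc/passwd.
is_root_user() {
  local user="${1%%:*}"   # keep only the user part of "user:group"
  case "$user" in
    ""|root|0) return 0 ;;
    *)         return 1 ;;
  esac
}

# Example (on the host):
#   user=$(docker image inspect --format '{{.Config.User}}' my-image)
#   if is_root_user "$user"; then echo "my-image runs as root" >&2; fi
```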

Read-Only Root Filesystem

Mounting the container's root filesystem as read-only prevents the compromised process from modifying system files, installing tools, or creating persistence mechanisms. Legitimate write activity can be directed to explicitly allowed paths via tmpfs mounts or volume mounts:

docker run -d \
  --read-only \
  --tmpfs /tmp \
  --tmpfs /run \
  -v app_data:/app/data \
  my-image

--read-only makes the root filesystem immutable. --tmpfs /tmp and --tmpfs /run provide writable in-memory directories for temporary files and runtime state. The named volume app_data provides a writable persistent location for application data.

In Docker Compose:

services:
  api:
    image: my-image
    read_only: true
    tmpfs:
      - /tmp
      - /run
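One way to confirm the setting took effect is to look at the container's mount table, where the root filesystem should carry the ro option. The sketch below assumes the Linux /proc/mounts format. root_is_readonly is a hypothetical helper, and you can point it at a copy of the container's mount table saved from the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the root filesystem ("/") is mounted
# read-only according to a mount table in /proc/mounts format
# (device mountpoint fstype options dump pass).
root_is_readonly() {
  awk '
    $2 == "/" { opts = $4 }   # the last entry for "/" is the one in effect
    END {
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0   # whole option "ro"
      exit 1
    }
  ' "${1:-/proc/mounts}"
}

# Example (on the host), without copying anything into the container:
#   docker exec my-container cat /proc/mounts > mounts.txt
#   root_is_readonly mounts.txt && echo "root filesystem is read-only"
```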

Capability Dropping

Linux capabilities divide root's privileges into independent units. A standard Docker container grants 14 capabilities to the container process by default, many of which most applications never use. Dropping all capabilities and adding back only those required follows the principle of least privilege:

docker run -d \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  my-image

--cap-drop ALL removes every capability. --cap-add NET_BIND_SERVICE adds back only the capability to bind ports below 1024, if needed. An application listening on an unprivileged port (1024 or above, such as 3000) does not need it and typically runs with no capabilities at all:

docker run -d \
  --cap-drop ALL \
  my-image

Common capabilities to re-add when needed:

Capability          When needed
NET_BIND_SERVICE    Binding ports below 1024
CHOWN               Changing file ownership
DAC_OVERRIDE        Bypassing file permission checks
SETUID / SETGID     Changing the process UID/GID

Capabilities not in this list are almost never needed by application containers.
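To confirm which capabilities a container actually holds, you can decode the hex masks in /proc/<pid>/status. This sketch uses the bit positions from the Linux capability header, and its function names are hypothetical. With Docker's 14 default capabilities, the bounding set (CapBnd) works out to 00000000a80425fb. After --cap-drop ALL it should be all zeros.

```bash
#!/usr/bin/env bash
# Bit positions (from linux/capability.h) of the capabilities in the table above.
cap_bit() {
  case "$1" in
    CHOWN)            echo 0 ;;
    DAC_OVERRIDE)     echo 1 ;;
    SETGID)           echo 6 ;;
    SETUID)           echo 7 ;;
    NET_BIND_SERVICE) echo 10 ;;
    *)                return 1 ;;
  esac
}

# Hypothetical helper: print a capability mask (hex) from a /proc/<pid>/status file.
# CapBnd is the bounding set that --cap-drop/--cap-add shape; CapEff is what the
# process can use right now (all zeros for a non-root user without file capabilities).
cap_mask() {
  awk -v field="$1:" '$1 == field { print $2 }' "${2:-/proc/self/status}"
}

# Hypothetical helper: succeed if hex mask $1 includes capability $2.
has_cap() {
  local bit
  bit=$(cap_bit "$2") || return 2   # unknown capability name
  [ $(( (0x$1 >> bit) & 1 )) -eq 1 ]
}

# Example (on the host), checking the container's PID 1:
#   docker exec my-container cat /proc/1/status > status.txt
#   has_cap "$(cap_mask CapBnd status.txt)" NET_BIND_SERVICE && echo "NET_BIND_SERVICE present"
```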

Seccomp Profiles

Seccomp (secure computing mode) filters the system calls a process can make. Docker applies a default seccomp profile that blocks around 44 of the 300+ available system calls, including dangerous ones such as kexec_load, mount, and reboot. The default profile is sufficient for most applications, and a custom profile can restrict the allowed set further.

To apply a custom profile:

docker run -d \
  --security-opt seccomp=/path/to/custom-seccomp.json \
  my-image

To disable seccomp filtering entirely (this reduces security, so use it only for debugging):

docker run -d --security-opt seccomp=unconfined my-image
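The kernel records each process's seccomp mode in the Seccomp field of /proc/<pid>/status, which gives a quick way to confirm that a profile is active. In this sketch (seccomp_mode is a hypothetical helper), a container running under Docker's default or a custom profile should report filter. One started with seccomp=unconfined should report disabled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the seccomp mode recorded in a
# /proc/<pid>/status file. The kernel uses 0 for disabled, 1 for strict
# mode and 2 for a BPF filter, which is how Docker profiles are applied.
seccomp_mode() {
  case "$(awk '$1 == "Seccomp:" { print $2 }' "${1:-/proc/self/status}")" in
    0) echo disabled ;;
    1) echo strict ;;
    2) echo filter ;;
    *) echo unknown ;;
  esac
}

# Example (on the host), checking the container's PID 1:
#   docker exec my-container cat /proc/1/status > status.txt
#   seccomp_mode status.txt
```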

No New Privileges

The no-new-privileges flag prevents the container process from gaining additional privileges through setuid binaries or file capabilities:

docker run -d --security-opt no-new-privileges:true my-image

In Compose:

services:
  api:
    security_opt:
      - no-new-privileges:true

The flag has no measurable performance cost and should be applied to every container that does not need setuid binaries such as sudo or su.
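Recent Linux kernels also expose the no_new_privs bit in the NoNewPrivs field of /proc/<pid>/status, so you can verify the flag from the host. Here is a minimal sketch using a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the no_new_privs bit is set in a
# /proc/<pid>/status file (the NoNewPrivs field reads 1).
no_new_privs_set() {
  [ "$(awk '$1 == "NoNewPrivs:" { print $2 }' "${1:-/proc/self/status}")" = "1" ]
}

# Example (on the host), checking the container's PID 1:
#   docker exec my-container cat /proc/1/status > status.txt
#   no_new_privs_set status.txt || echo "no-new-privileges is not in effect"
```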

Avoiding Privileged Mode

--privileged grants the container all Linux capabilities, disables the default seccomp and AppArmor profiles, and gives it access to all of the host's devices. It is almost never appropriate for application containers:

# Never in production application containers:
docker run --privileged my-image   # DO NOT DO THIS

--privileged is occasionally needed for containers that manage infrastructure (building images with Docker-in-Docker, running container management agents). Even in those cases, consider using specific capability additions instead of full privilege escalation.
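Privileged containers are easy to find, because docker inspect records the flag in HostConfig.Privileged. The filter below is a hypothetical helper that reads the name/flag pairs produced by the template in its usage comment. The -r option of xargs is a GNU and BusyBox extension that skips the docker inspect call when no containers are running.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<name> <privileged>" lines on stdin and print
# the names of containers whose HostConfig.Privileged is true.
find_privileged() {
  awk '$2 == "true" { sub(/^\//, "", $1); print $1 }'   # strip the leading "/" Docker adds to names
}

# Example (on the host): list running containers started with --privileged.
#   docker ps -q | xargs -r docker inspect --format '{{.Name}} {{.HostConfig.Privileged}}' | find_privileged
```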

Resource Limits

Hardened containers include resource limits to prevent a compromised or misbehaving container from consuming all host resources (a denial-of-service against other containers):

docker run -d \
  --memory 256m \
  --cpus 0.5 \
  --pids-limit 100 \
  my-image

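--pids-limit 100 caps the number of processes and threads the container can run at once. A fork bomb inside it therefore cannot spawn unlimited processes and exhaust the host's process table. This is a simple and effective protection against that class of resource-exhaustion attack.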
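Inside the container, the limits Docker applied are visible in the cgroup files. This sketch assumes a cgroup v2 host, where Docker mounts the container's own cgroup at /sys/fs/cgroup by default; cgroup v1 hosts use different file names. show_limits is a hypothetical helper. With the flags above, you would expect it to report memory.max=268435456 (256 MiB), pids.max=100, and cpus=0.50.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the limits recorded in a cgroup v2 directory.
# Assumes the cgroup v2 file names memory.max, pids.max and cpu.max.
show_limits() {
  local dir="${1:-/sys/fs/cgroup}"
  echo "memory.max=$(cat "$dir/memory.max")"   # bytes, or "max" if unlimited
  echo "pids.max=$(cat "$dir/pids.max")"       # task (process + thread) cap, or "max"
  # cpu.max holds "<quota> <period>" in microseconds; quota / period = CPUs.
  awk '{ if ($1 == "max") print "cpus=unlimited"; else printf "cpus=%.2f\n", $1 / $2 }' "$dir/cpu.max"
}

# Example (inside the container, e.g. after docker exec -it my-container sh):
#   show_limits
```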

A Hardened Docker Run Command

Combining all measures:

docker run -d \
  --name my-hardened-api \
  --user 1000:1000 \
  --read-only \
  --tmpfs /tmp \
  --cap-drop ALL \
  --security-opt no-new-privileges:true \
  --memory 256m \
  --cpus 0.5 \
  --pids-limit 100 \
  -p 3000:3000 \
  my-image:latest

A Hardened Compose Service

services:
  api:
    image: my-image:latest
    user: "1000:1000"
    read_only: true
    tmpfs:
      - /tmp
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    mem_limit: 256m
    cpus: 0.5
    pids_limit: 100
    ports:
      - "3000:3000"
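Whichever way you start the container, its effective settings are recorded in the docker inspect output, so you can check the hardening measures mechanically. The sketch below is a hypothetical audit_hardening helper. It consumes key=value lines from the docker inspect template in its usage comment, which relies on bash's $'...' quoting. It assumes the CPU limit was set with --cpus (or cpus: in Compose), which is expected to be recorded as HostConfig.NanoCpus. It treats any PidsLimit that is not a positive number as unset.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "key=value" lines describing a container and
# print one FAIL line per missing hardening measure. Returns 1 if any check
# fails, 0 otherwise. Keys match the docker inspect template in the example.
audit_hardening() {
  local key value failed=0
  # "|| [ -n "$key" ]" also processes a final line without a trailing newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      user)
        case "${value%%:*}" in
          ""|root|0) echo "FAIL: runs as root"; failed=1 ;;
        esac ;;
      readonly)
        [ "$value" = "true" ] || { echo "FAIL: root filesystem is writable"; failed=1; } ;;
      capdrop)
        case "$value" in
          *ALL*) ;;
          *) echo "FAIL: capabilities not dropped"; failed=1 ;;
        esac ;;
      secopt)
        case "$value" in
          *no-new-privileges:false*|*no-new-privileges=false*)
            echo "FAIL: no-new-privileges disabled"; failed=1 ;;
          *no-new-privileges*) ;;
          *) echo "FAIL: no-new-privileges not set"; failed=1 ;;
        esac ;;
      memory|nanocpus|pids)
        # Non-numeric values such as "<nil>" make the test error out, which counts as unset.
        [ "$value" -gt 0 ] 2>/dev/null || { echo "FAIL: no $key limit"; failed=1; } ;;
      privileged)
        if [ "$value" = "true" ]; then echo "FAIL: privileged mode"; failed=1; fi ;;
    esac
  done
  return "$failed"
}

# Example (on the host; $'...' is bash quoting for embedded newlines):
#   docker inspect --format $'user={{.Config.User}}\nreadonly={{.HostConfig.ReadonlyRootfs}}\ncapdrop={{.HostConfig.CapDrop}}\nsecopt={{.HostConfig.SecurityOpt}}\nmemory={{.HostConfig.Memory}}\nnanocpus={{.HostConfig.NanoCpus}}\npids={{.HostConfig.PidsLimit}}\nprivileged={{.HostConfig.Privileged}}' my-hardened-api | audit_hardening
```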

Hardening Checklist

Measure                   Dockerfile                      Runtime
Non-root user             USER instruction                -u 1000:1000
Read-only filesystem      n/a                             --read-only
All capabilities dropped  n/a                             --cap-drop ALL
No new privileges         n/a                             --security-opt no-new-privileges:true
Memory limit              n/a                             --memory
CPU limit                 n/a                             --cpus
PID limit                 n/a                             --pids-limit
Minimal base image        FROM alpine / FROM distroless   n/a

Applying all of these measures to every application container is the baseline for production Docker security. None of them require changes to the application code, provided the application writes only to the tmpfs and volume paths it has been given. The performance impact of all measures combined is negligible.
