20.3.1 Container Hardening Step
A focused guide to container hardening, connecting core security concepts with practical Docker operations.
Container hardening reduces the blast radius of a compromised container. By default, a Docker container runs its process as root (unless the image sets another user), holds a broad set of Linux capabilities, and has a writable root filesystem. If an attacker achieves code execution inside such a container, they have significant leverage to escalate privileges, tamper with the host filesystem through volume mounts, or pivot to other containers. Hardening applies a series of restrictions that limit what a compromised process can do. This makes containers meaningfully more secure, usually without changes to the application code.
Non-Root User
The most impactful single hardening measure is running the container process as a non-root user. Most container processes have no need for root privileges. When a process runs as root and the application is compromised, the attacker has full root access inside the container. Unless user namespace remapping is enabled, UID 0 in the container is also UID 0 on the host, so the attacker can read and modify any mounted volume as host root.
In the Dockerfile:
```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY --chown=node:node package.json package-lock.json ./
# --omit=dev replaces the deprecated --only=production flag in current npm
RUN npm ci --omit=dev
COPY --chown=node:node . .
USER node
CMD ["node", "server.js"]
```
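The USER node instruction switches to the node user (UID 1000 in the official Node.js images) for all subsequent instructions and for the container's runtime process. The --chown=node:node on COPY makes the node user the owner of the copied files. This matters only if the application must write to its own files. Files copied as root with default permissions are still readable by node, and leaving them root-owned prevents a compromised process from modifying the application code.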
At runtime (if the Dockerfile does not set the user):
```shell
docker run -u 1000:1000 my-image
```
Verifying the running user:
```shell
docker exec my-container id
```
Expected output: uid=1000(node) gid=1000(node) groups=1000(node), not uid=0(root).
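The same check can be enforced when the container starts. The sketch below assumes the image runs an entrypoint shell script before the application. require_non_root is a hypothetical helper. It takes an optional UID argument (defaulting to the output of id -u) so it can be exercised without switching users.

```shell
# require_non_root: hypothetical entrypoint guard that refuses to start as root.
# Takes an optional UID; defaults to the current process's UID from `id -u`.
require_non_root() {
  uid=${1:-$(id -u)}
  if [ "$uid" -eq 0 ]; then
    echo "refusing to start: running as root (uid 0)" >&2
    return 1
  fi
  echo "ok: running as uid $uid"
}
```

In an entrypoint script, calling require_non_root || exit 1 before exec node server.js makes a misconfigured deployment fail fast instead of silently running as root.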
Read-Only Root Filesystem
Mounting the container's root filesystem as read-only prevents the compromised process from modifying system files, installing tools, or creating persistence mechanisms. Legitimate write activity can be directed to explicitly allowed paths via tmpfs mounts or volume mounts:
```shell
docker run -d \
  --read-only \
  --tmpfs /tmp \
  --tmpfs /run \
  -v app_data:/app/data \
  my-image
```
--read-only makes the root filesystem immutable. --tmpfs /tmp and --tmpfs /run provide writable in-memory directories for temporary files and runtime state. The named volume app_data provides a writable persistent location for application data.
In Docker Compose:
```yaml
services:
  api:
    image: my-image
    read_only: true
    tmpfs:
      - /tmp
      - /run
```
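To confirm which paths remain writable after these settings are applied, a process inside the container can try to create a file in each candidate directory. This is a minimal sketch. probe_writable is a hypothetical helper, and the paths passed to it are whatever the deployment intends to check.

```shell
# probe_writable: hypothetical helper that reports, for each directory given,
# whether a new file can be created there. The probe file is removed afterwards.
probe_writable() {
  for dir in "$@"; do
    probe="$dir/.write-probe.$$"
    if touch "$probe" 2>/dev/null; then
      rm -f "$probe"
      echo "$dir: writable"
    else
      echo "$dir: not writable"
    fi
  done
}
```

Running probe_writable / /app /tmp /run inside the container should show only the tmpfs and volume paths as writable. Any other writable path suggests the read-only setting is missing. A volume path can also report not writable for a non-root user when the volume is owned by root. That is a permissions issue, not a read-only mount.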
Capability Dropping
Linux capabilities divide root's privileges into independent units. A standard Docker container grants 14 capabilities to the container process by default, many of which most applications never use. Dropping all capabilities and adding back only those required follows the principle of least privilege:
```shell
docker run -d \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  my-image
```
--cap-drop ALL removes every capability. --cap-add NET_BIND_SERVICE then restores only the ability to bind ports below 1024, if the application needs it. An application listening on an unprivileged port (1024 or above, such as 3000) typically needs no capabilities at all:
```shell
docker run -d \
  --cap-drop ALL \
  my-image
```
Common capabilities to re-add when needed:
| Capability | When needed |
|---|---|
| NET_BIND_SERVICE | Binding ports below 1024 |
| CHOWN | Changing file ownership |
| DAC_OVERRIDE | Bypassing file permission checks |
| SETUID / SETGID | Changing the process UID/GID |
Capabilities not in this list are almost never needed by application containers.
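To see which capabilities a process actually holds, read the hex masks in /proc/self/status inside the container and decode them. This is a minimal POSIX sh sketch. decode_caps is a hypothetical helper. Its name table follows the bit numbering in the kernel's linux/capability.h for bits 0 through 40.

```shell
# Capability names in bit order (bit 0 = CHOWN ... bit 40 = CHECKPOINT_RESTORE),
# following the numbering in linux/capability.h.
CAP_NAMES="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE
AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM
BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE"

# decode_caps: print the names of the capabilities set in a hex mask, such as
# the CapEff or CapBnd value from /proc/<pid>/status.
decode_caps() {
  mask=$(( 0x$1 ))
  bit=0
  names=""
  for name in $CAP_NAMES; do
    if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
      names="$names $name"
    fi
    bit=$(( $bit + 1 ))
  done
  echo "${names# }"
}
```

For example, decode_caps "$(awk '/^CapEff:/ {print $2}' /proc/self/status)" prints an empty line when a root process runs with --cap-drop ALL. Under Docker's defaults, it prints the 14 default capabilities. For a non-root user, CapEff is usually empty even without --cap-drop, so decoding the CapBnd (bounding set) line is the more reliable way to confirm what --cap-drop removed.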
Seccomp Profiles
Seccomp (secure computing mode) filters the system calls a process can make. Docker applies a default seccomp profile that blocks around 44 of the 300+ available syscalls, including dangerous ones such as kexec_load, mount, and reboot. The default profile is sufficient for most applications. A custom profile can restrict further, but it replaces the default profile rather than adding to it, so custom profiles are usually derived from Docker's default.
To apply a custom profile:
```shell
docker run -d \
  --security-opt seccomp=/path/to/custom-seccomp.json \
  my-image
```
Disabling the seccomp filter entirely removes this protection and should be reserved for debugging:
```shell
docker run -d --security-opt seccomp=unconfined my-image
```
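From inside a container, the Seccomp field of /proc/self/status shows which mode is active (0 = disabled, 1 = strict, 2 = filter). seccomp_mode is a hypothetical helper. It takes an optional status-file path so it can be tested against sample input.

```shell
# seccomp_mode: report the seccomp mode recorded in a /proc status file.
# With the default path, awk reads its own /proc/self/status; seccomp filters
# are inherited across fork and exec, so the result matches the calling shell.
seccomp_mode() {
  status_file=${1:-/proc/self/status}
  mode=$(awk '/^Seccomp:/ {print $2}' "$status_file")
  case $mode in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}
```

Under Docker's default profile this typically reports filter. With --security-opt seccomp=unconfined it reports disabled.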
No New Privileges
The no-new-privileges flag prevents the container process from gaining additional privileges through setuid binaries or file capabilities:
```shell
docker run -d --security-opt no-new-privileges:true my-image
```
In Compose:
```yaml
services:
  api:
    security_opt:
      - no-new-privileges:true
```
This is inexpensive (no performance impact) and should be applied to every container that does not explicitly need to use setuid binaries.
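The flag can be verified from inside the container through the NoNewPrivs field of /proc/self/status, which is present on Linux 4.10 and later. This sketch shows a startup guard. require_no_new_privs is a hypothetical helper that takes an optional status-file path so it can be tested against sample input.

```shell
# require_no_new_privs: hypothetical startup guard that fails unless the
# kernel's no_new_privs flag is set (NoNewPrivs: 1). The flag is inherited
# across fork and exec and cannot be cleared, so awk's own
# /proc/self/status reflects the calling shell.
require_no_new_privs() {
  status_file=${1:-/proc/self/status}
  nnp=$(awk '/^NoNewPrivs:/ {print $2}' "$status_file")
  if [ "$nnp" = "1" ]; then
    echo "ok: no_new_privs is set"
    return 0
  fi
  echo "no_new_privs is not set" >&2
  return 1
}
```

Calling require_no_new_privs || exit 1 in an entrypoint script refuses to start the application when the option was left off.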
Avoiding Privileged Mode
--privileged grants the container all Linux capabilities and access to all of the host's devices, and it runs the container without seccomp or AppArmor confinement. It is almost never appropriate for application containers:
```shell
# Never in production application containers:
docker run --privileged my-image  # DO NOT DO THIS
```
--privileged is occasionally needed for containers that manage infrastructure, such as building images with Docker-in-Docker or running container management agents. Even in those cases, prefer adding specific capabilities (--cap-add) and specific devices (--device) over full privilege.
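Because --privileged grants every capability, CAP_SYS_ADMIN (bit 21) is always present in a privileged container's capability bounding set. Docker's default set does not include it. Checking that bit in the CapBnd value from /proc/self/status is therefore a cheap in-container test. If the bit is clear, the container is not privileged. If it is set, the container is either privileged or was started with --cap-add SYS_ADMIN. has_sys_admin is a hypothetical helper.

```shell
# has_sys_admin: succeed if CAP_SYS_ADMIN (bit 21) is set in a hex capability
# mask, such as the CapBnd value from /proc/self/status.
has_sys_admin() {
  [ $(( (0x$1 >> 21) & 1 )) -eq 1 ]
}
```

Inside a container, has_sys_admin "$(awk '/^CapBnd:/ {print $2}' /proc/self/status)" should fail for a normal application container. From the host, docker inspect --format '{{.HostConfig.Privileged}}' my-container prints true or false directly.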
Resource Limits
Hardened containers include resource limits to prevent a compromised or misbehaving container from consuming all host resources (a denial-of-service against other containers):
```shell
docker run -d \
  --memory 256m \
  --cpus 0.5 \
  --pids-limit 100 \
  my-image
```
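--memory 256m caps the container's memory, and the kernel OOM-kills processes that exceed it. --cpus 0.5 limits the container to half of one CPU's time. --pids-limit 100 caps the number of processes, so a fork bomb cannot exhaust the host's process table. Together these are a simple and effective defense against resource-exhaustion attacks.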
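Inside the container, the applied limits can be read back from the cgroup v2 interface files under /sys/fs/cgroup (memory.max, cpu.max, pids.max). This sketch assumes a cgroup v2 host where the container has its own cgroup namespace, which is Docker's default on such hosts. The two helper names are hypothetical.

```shell
# cpus_from_cpu_max: convert a cgroup v2 cpu.max value ("<quota> <period>",
# or "max <period>" when unlimited) into a CPU count.
cpus_from_cpu_max() {
  echo "$1" | awk '{ if ($1 == "max") print "unlimited"; else printf "%.2f\n", $1 / $2 }'
}

# mib_from_memory_max: convert a cgroup v2 memory.max value (bytes, or "max")
# into MiB.
mib_from_memory_max() {
  if [ "$1" = "max" ]; then
    echo "unlimited"
  else
    echo $(( $1 / 1048576 ))
  fi
}
```

With the flags above, cpus_from_cpu_max "$(cat /sys/fs/cgroup/cpu.max)" typically prints 0.50 (cpu.max reads 50000 100000). mib_from_memory_max "$(cat /sys/fs/cgroup/memory.max)" prints 256, and pids.max reads 100.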
A Hardened Docker Run Command
Combining all measures:
```shell
docker run -d \
  --name my-hardened-api \
  --user 1000:1000 \
  --read-only \
  --tmpfs /tmp \
  --cap-drop ALL \
  --security-opt no-new-privileges:true \
  --memory 256m \
  --cpus 0.5 \
  --pids-limit 100 \
  -p 3000:3000 \
  my-image:latest
```
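To avoid retyping these flags for every container, they can be wrapped in a shell function. This is a minimal sketch. hardened_run and the DOCKER override variable are hypothetical. Setting DOCKER=echo turns a call into a dry run that only prints the arguments.

```shell
# hardened_run: hypothetical wrapper that applies the baseline hardening flags
# to `docker run`. Extra options, the image, and its command come from "$@".
# Set DOCKER=echo to print the arguments instead of starting a container.
hardened_run() {
  "${DOCKER:-docker}" run -d \
    --user 1000:1000 \
    --read-only \
    --tmpfs /tmp \
    --cap-drop ALL \
    --security-opt no-new-privileges:true \
    --memory 256m \
    --cpus 0.5 \
    --pids-limit 100 \
    "$@"
}
```

For example, hardened_run --name my-hardened-api -p 3000:3000 my-image:latest runs the same container as the full command. The hardening flags come first because docker run expects all options before the image name.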
A Hardened Compose Service
```yaml
services:
  api:
    image: my-image:latest
    user: "1000:1000"
    read_only: true
    tmpfs:
      - /tmp
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    mem_limit: 256m
    cpus: 0.5
    pids_limit: 100
    ports:
      - "3000:3000"
```
Hardening Checklist
| Measure | Dockerfile | Runtime |
|---|---|---|
| Non-root user | USER instruction | -u 1000:1000 |
| Read-only filesystem | — | --read-only |
| All capabilities dropped | — | --cap-drop ALL |
| No new privileges | — | --security-opt no-new-privileges:true |
| Memory limit | — | --memory |
| CPU limit | — | --cpus |
| PID limit | — | --pids-limit |
| Minimal base image | FROM alpine / FROM distroless | — |
Applying all of these measures to every application container is the baseline for production Docker security. Most of them require no changes to the application code, although a read-only root filesystem may require directing writes to tmpfs or volume paths. The security controls themselves add negligible overhead.
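Several checklist items can be audited from the host with docker inspect. The format string in this sketch uses these docker inspect fields: Config.User, HostConfig.ReadonlyRootfs, HostConfig.Privileged, and HostConfig.CapDrop. audit_inspect_fields is a hypothetical helper. It parses the pipe-separated result, prints one line per check, and returns the number of failed checks.

```shell
# audit_inspect_fields: hypothetical checker for a "user|readonly|privileged|capdrop"
# line, as produced by:
#   docker inspect --format '{{.Config.User}}|{{.HostConfig.ReadonlyRootfs}}|{{.HostConfig.Privileged}}|{{.HostConfig.CapDrop}}' <container>
# Prints one line per check and returns the number of failed checks.
audit_inspect_fields() {
  IFS='|' read -r user ro_rootfs privileged capdrop <<EOF
$1
EOF
  failures=0
  case $user in
    ""|0|0:*|root|root:*) echo "FAIL user: process runs as root"; failures=$((failures + 1)) ;;
    *) echo "ok   user: $user" ;;
  esac
  if [ "$ro_rootfs" = "true" ]; then echo "ok   read-only root filesystem"
  else echo "FAIL root filesystem is writable"; failures=$((failures + 1)); fi
  if [ "$privileged" = "false" ]; then echo "ok   not privileged"
  else echo "FAIL container is privileged"; failures=$((failures + 1)); fi
  case $capdrop in
    *ALL*|*all*) echo "ok   all capabilities dropped" ;;
    *) echo "FAIL capabilities not dropped: $capdrop"; failures=$((failures + 1)) ;;
  esac
  return $failures
}
```

An empty user field means the image sets no USER, so the process runs as root. The memory, CPU, and PID limits could be checked the same way by adding more inspect fields to the format string.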