20.3.1.2 Capability Drop Practice

A focused guide to Capability Drop Practice, connecting core concepts with practical Docker and container operations.

Linux capabilities divide the monolithic root privilege into distinct, individually grantable units. A standard Docker container runs with a subset of capabilities granted by default — not full root, but more than a typical application needs. Capability drop practice is the work of identifying which capabilities a container actually requires, dropping all others, and verifying that the application still functions correctly. This reduces the set of kernel operations the container process can perform and limits what an attacker can do with a compromised container.

What Linux Capabilities Are

Traditionally, a process is either unprivileged (UID non-zero) or fully privileged (UID 0, root). Linux capabilities split root's privileges into independently grantable units. A process can be granted specific capabilities without being root, or a root process can have specific capabilities removed.

Examples:

CAP_NET_BIND_SERVICE — bind to ports below 1024
CAP_CHOWN — change file ownership
CAP_DAC_OVERRIDE — bypass discretionary access control (read/write any file)
CAP_SYS_PTRACE — trace/debug other processes
CAP_NET_RAW — use raw sockets (used by ping)
CAP_SYS_ADMIN — a catch-all for many administrative operations (mounting filesystems, configuring namespaces, setting the hostname, etc.); loading kernel modules is governed separately by CAP_SYS_MODULE

CAP_SYS_ADMIN is the most dangerous capability — it is sometimes called "the new root" because it enables so many privileged operations that it effectively grants root-level access.

Docker's Default Capability Set

A standard Docker container is granted these capabilities by default:

AUDIT_WRITE
CHOWN
DAC_OVERRIDE
FOWNER
FSETID
KILL
MKNOD
NET_BIND_SERVICE
NET_RAW
SETFCAP
SETGID
SETPCAP
SETUID
SYS_CHROOT

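Many application containers use none of these. A Node.js API serving HTTP requests on port 3000 needs zero capabilities from this list. Capabilities such as CHOWN, SETUID, and SETGID matter mainly for images whose entrypoint starts as root and then changes file ownership or switches to another user.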

Dropping All Capabilities

The safest starting point is to drop everything and add back only what testing confirms is required:

docker run -d \
  --cap-drop ALL \
  -p 3000:3000 \
  my-api:latest

Test the application with all capabilities dropped. If it functions normally, no capabilities are needed. If it fails with a permission error, identify which capability is required and add it back.
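The test step can be scripted. The following is a minimal sketch rather than a definitive harness: the function name smoke_test_caps and the SMOKE_WAIT variable are hypothetical, and it assumes docker and curl are available on the host and that the application answers HTTP on the published port.

```shell
# Start an image with all capabilities dropped (plus any listed add-backs),
# probe it once over HTTP, then remove the container.
# Usage: smoke_test_caps IMAGE PORT PATH [CAPABILITY...]
smoke_test_caps() {
  image=$1; port=$2; path=$3
  shift 3
  # Remaining arguments are capabilities to add back, e.g. NET_BIND_SERVICE.
  flags="--cap-drop ALL"
  for cap in "$@"; do
    flags="$flags --cap-add $cap"
  done
  # $flags is intentionally unquoted so it splits into separate arguments.
  cid=$(docker run -d $flags -p "$port:$port" "$image") || return 1
  # Crude startup wait (assumption: 3 seconds is enough); a retry loop is more robust.
  sleep "${SMOKE_WAIT:-3}"
  curl -fsS -o /dev/null "http://localhost:$port$path"
  rc=$?
  docker rm -f "$cid" >/dev/null
  return "$rc"
}
```

For example, `smoke_test_caps my-api:latest 3000 /` checks the image with nothing added back, and `smoke_test_caps my-nginx:latest 80 / NET_BIND_SERVICE` retries with one capability restored.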

Verifying Current Capabilities

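To see what capabilities a container process has, read the capability fields of PID 1. This example starts a throwaway container with all capabilities dropped; for a container that is already running, docker exec <container> cat /proc/1/status returns the same fields:

docker run --rm --cap-drop ALL my-api:latest cat /proc/1/status | grep -i cap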

CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000

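All zeros in every field means the process holds no capabilities, the minimum possible. In a container with the default set running as root, CapPrm, CapEff, and CapBnd instead show the non-zero hexadecimal bitmask 00000000a80425fb, which encodes the granted capabilities. If the image runs as a non-root user, CapPrm and CapEff are already zero and only CapBnd reveals the set the container was started with.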
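The all-zero check is easy to automate, for instance in a CI job. This is a sketch under assumptions: the helper name caps_all_zero and the container name in the usage comment are hypothetical, and the helper only inspects text in the /proc/<pid>/status format shown above.

```shell
# Reads /proc/<pid>/status text on stdin. Succeeds only if at least one Cap*
# field is present and every Cap* field is all zeros.
caps_all_zero() {
  awk '
    /^Cap(Inh|Prm|Eff|Bnd|Amb):/ { seen = 1; if ($2 !~ /^0+$/) bad = 1 }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

# Usage (my-api-test is a hypothetical running container):
#   docker exec my-api-test cat /proc/1/status | caps_all_zero \
#     && echo "no capabilities held"
```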

The capsh tool (available in some base images) decodes the bitmask into human-readable capability names:

docker run --rm my-api:latest capsh --decode=00000000a80425fb
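Where capsh is not installed in the image, the bitmask can be decoded on the host with plain shell arithmetic. The sketch below is an approximation of capsh --decode, not a replacement for it: the function name decode_caps is hypothetical, the name table follows the bit order in linux/capability.h up to CAP_CHECKPOINT_RESTORE, and the output is one name per line rather than capsh's comma-separated form.

```shell
# Capability names in kernel bit order (bit 0 first).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK: print the name of every capability whose bit is set.
# HEXMASK is the value from /proc/<pid>/status, without a 0x prefix.
decode_caps() {
  mask=$((0x$1))
  bit=0
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      printf 'cap_%s\n' "$name"
    fi
    bit=$((bit + 1))
  done
}
```

Calling `decode_caps 00000000a80425fb` lists the fourteen capabilities in Docker's default set.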

Dropping All and Adding Back Selectively

If the application genuinely requires specific capabilities:

docker run -d \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  -p 80:80 \
  my-nginx:latest

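An nginx container that listens on port 80 inside the container needs NET_BIND_SERVICE to bind to a port below 1024. Everything else is dropped, so this works only if nginx does not also change users or file ownership at startup; an image whose master process starts as root and switches its workers to an unprivileged user typically needs SETUID and SETGID (and often CHOWN) as well.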

For a container that needs to send ICMP ping packets:

docker run -d \
  --cap-drop ALL \
  --cap-add NET_RAW \
  my-monitoring:latest

Diagnosing Capability Requirements

When an application fails after --cap-drop ALL, the error message often points to the operation that was refused, which in turn suggests the capability involved:

Error: EACCES: permission denied, open '/etc/ssl/certs/ca-certificates.crt'

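This is an ordinary file permission failure: without DAC_OVERRIDE, even root is subject to the file's permission bits. (FOWNER is not involved here; it covers operations such as chmod on files the process does not own.) Check whether the file is readable by the container's user first; correcting ownership or mode in the image is preferable to granting DAC_OVERRIDE.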

Error: listen EACCES: permission denied 0.0.0.0:80

This is a low-port binding error — the process needs NET_BIND_SERVICE or the application should listen on a high port instead.

Error: EPERM: operation not permitted, open '/proc/net/if_inet6'

This is a procfs access error, often seen in network monitoring tools that require raw socket access. Evaluate whether NET_ADMIN or NET_RAW is necessary, or reconsider whether the container needs this operation at all.

Using strace to Identify Required Capabilities

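On a development container, strace can log which system calls fail with EPERM (operation not permitted) or EACCES (permission denied), which helps identify which capabilities are being exercised. The container runs in the foreground so that its output reaches the pipe:

docker run --rm \
  --cap-drop ALL \
  --cap-add SYS_PTRACE \
  my-app:latest \
  strace -f my-binary 2>&1 | grep -E 'EPERM|EACCES'

This approach adds SYS_PTRACE temporarily so that strace can run, and it assumes strace is installed in the image. That defeats the purpose in production but is useful for capability auditing in a development environment. Failures caused by a missing capability surface as either EPERM or EACCES (a refused bind to a low port, for example, returns EACCES), so the filter matches both.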
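Raw strace output is noisy, so it can help to reduce it to a count of refused calls per system call. The following is a sketch: the function name failed_syscalls is hypothetical, and it assumes strace's default line format (optionally with the "[pid N]" prefix that -f adds).

```shell
# Reads strace output on stdin and prints "COUNT SYSCALL ERRNO" for every call
# that returned EPERM or EACCES, most frequent first.
failed_syscalls() {
  awk '
    / = -1 (EPERM|EACCES) / {
      line = $0
      sub(/^\[pid +[0-9]+\] /, "", line)   # drop the "[pid N] " prefix
      if (line ~ /^<\.\.\. /) {
        split(line, parts, " ")            # "<... bind resumed>" form
        name = parts[2]
      } else {
        name = line
        sub(/\(.*/, "", name)              # keep only the name before "("
      }
      errno = (line ~ / = -1 EPERM /) ? "EPERM" : "EACCES"
      count[name " " errno]++
    }
    END { for (key in count) print count[key], key }
  ' | sort -rn
}

# Usage: append "| failed_syscalls" to the strace pipeline in place of grep.
```

A chown or setuid entry in the summary suggests CHOWN or SETUID respectively, while a bind entry with EACCES suggests a low-port binding; the mapping from system call to capability still has to be checked against the capabilities(7) manual page.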

Capabilities in Docker Compose

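The same flags are set per service with cap_drop and cap_add:

services:
  api:
    image: my-api:latest      # listens on a high port, so nothing is added back
    cap_drop:
      - ALL

  nginx:
    image: my-nginx:latest    # binds port 80 inside the container
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE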
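After the services are started, it is worth confirming that the settings actually reached the containers. This sketch assumes the Docker Compose v2 plugin (docker compose) and is run from the project directory; the function name check_compose_caps is hypothetical.

```shell
# Prints the CapDrop/CapAdd settings of every container in the current Compose
# project and returns non-zero if any container does not drop ALL.
check_compose_caps() {
  rc=0
  for cid in $(docker compose ps -q); do
    info=$(docker inspect --format \
      '{{.Name}} drop={{json .HostConfig.CapDrop}} add={{json .HostConfig.CapAdd}}' \
      "$cid")
    printf '%s\n' "$info"
    # The drop list must contain "ALL"; an unset list is rendered as null.
    printf '%s\n' "$info" | grep -q 'drop=\[[^]]*"ALL"' || rc=1
  done
  return "$rc"
}
```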

Capability Impact on Common Container Types

Container type	Required capabilities
Web API (port ≥1024)	None — `--cap-drop ALL`
Web server (port 80/443, root user)	`NET_BIND_SERVICE`
ICMP-based monitoring	`NET_RAW`
Syslog forwarder that reads the kernel log	`SYSLOG`
Network packet capture	`NET_ADMIN`, `NET_RAW`
Kernel module loader	`SYS_MODULE` (avoid in production)
Container-in-container (DinD)	Many — avoid using this pattern

Why NET_RAW Is Worth Dropping

NET_RAW is in Docker's default set but enables raw socket access, that is, the ability to craft arbitrary network packets. If the container is compromised, this can be used for network-level attacks such as ARP spoofing against other containers on the same bridge network. Most applications do not need it, and --cap-drop ALL removes it along with everything else, which is safe for standard HTTP/HTTPS services.

If you are not ready to audit the full capability requirements, dropping only NET_RAW while keeping the rest of the default set still provides partial hardening:

docker run -d --cap-drop NET_RAW my-api:latest
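Whether NET_RAW is really gone can be checked from the capability bitmask. The sketch below relies on CAP_NET_RAW being capability number 13 in linux/capability.h; the function name has_net_raw and the container name in the usage comment are hypothetical.

```shell
# has_net_raw HEXMASK: succeeds if bit 13 (CAP_NET_RAW) is set in the mask.
# HEXMASK is a Cap* value from /proc/<pid>/status, without a 0x prefix.
has_net_raw() {
  [ $(( (0x$1 >> 13) & 1 )) -eq 1 ]
}

# Usage (my-api-test is a hypothetical running container):
#   mask=$(docker exec my-api-test cat /proc/1/status | awk '/^CapBnd:/ {print $2}')
#   has_net_raw "$mask" && echo "NET_RAW is still available"
```

With the default set the bounding mask is 00000000a80425fb, for which the check succeeds; clearing bit 13 gives 00000000a80405fb, for which it fails.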