
15.1.3 Log Troubleshooting

A focused guide to log troubleshooting, connecting core concepts with practical Docker and container operations.

Log troubleshooting in a Docker context covers the specific, recurring problems that keep logs from being available, complete, or useful exactly when they are needed most: output that is missing, history that has been truncated, or a logging pipeline that has silently stopped delivering. That is a separate question from what to do with logs once they are actually available.

Logs appear empty or missing entirely

The most common starting point for log troubleshooting is a container that produces no visible output through docker logs at all, which has a small number of likely causes worth checking in order:

docker logs my-api
docker inspect --format='{{.HostConfig.LogConfig.Type}}' my-api
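The driver check can also be scripted against the full JSON array that docker inspect my-api prints. The sketch below is a hypothetical helper, not part of any Docker tooling. It assumes that json-file, local, and journald are the drivers docker logs can read natively. It also assumes that other drivers can only be read back through the dual-logging cache of Docker Engine 20.10 and later, which the cache-disabled log option turns off.

```python
import json
from typing import Tuple

# Drivers whose own storage `docker logs` can read directly.
NATIVE_READ_DRIVERS = {"json-file", "local", "journald"}


def log_readback(inspect_output: str) -> Tuple[str, str]:
    """Explain whether `docker logs` should be able to read a container's logs.

    `inspect_output` is the JSON array printed by `docker inspect <container>`.
    Hypothetical helper: returns (driver name, explanation).
    """
    container = json.loads(inspect_output)[0]
    log_config = container["HostConfig"]["LogConfig"]
    driver = log_config["Type"]
    options = log_config.get("Config") or {}  # log-opt values are strings

    if driver == "none":
        return driver, "logging disabled: nothing is stored or forwarded"
    if driver in NATIVE_READ_DRIVERS:
        return driver, "readable: docker logs reads the driver's own storage"
    if options.get("cache-disabled") == "true":
        return driver, "not readable: dual-logging cache is disabled"
    return driver, "readable only via the dual-logging cache (Docker Engine 20.10+)"


if __name__ == "__main__":
    sample = json.dumps(
        [{"HostConfig": {"LogConfig": {"Type": "fluentd", "Config": {}}}}]
    )
    print(log_readback(sample))
```

In practice the input would be the saved output of docker inspect my-api rather than a hand-built sample.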

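If the configured driver cannot read logs back, docker logs reports an explicit error rather than empty output, so that is the first thing to rule out. Forwarding drivers such as syslog, fluentd, and gelf send entries elsewhere without keeping a local copy of their own. On Docker Engine 20.10 and later, docker logs can still read them through the dual-logging cache unless the cache-disabled log option has turned that cache off; older engines return the error. If the driver does support reading and the output is genuinely empty, the application itself may be buffering its output and never flushing it, or writing to a file inside the container instead of to stdout and stderr.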

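# Dockerfile (Python): disable stdout/stderr buffering so each line reaches docker logs immediately
ENV PYTHONUNBUFFERED=1

// Node.js: write a known test line straight to stdout to confirm that output is being captured
process.stdout.write('test\n');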
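For a Python application that writes its logs to a file inside the container, one option is to point the standard logging module at stdout instead. The sketch below assumes a logger named app, which is a hypothetical name. logging.StreamHandler flushes after every record it emits, so those lines reach Docker without waiting on a buffer even when PYTHONUNBUFFERED is not set. Plain print() calls elsewhere in the application are not affected.

```python
import logging
import sys


def configure_container_logging(level: int = logging.INFO) -> logging.Logger:
    """Route application logs to stdout so Docker's logging driver captures them."""
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.setLevel(level)

    # StreamHandler flushes after each record, so nothing sits in a buffer.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )

    # Replace any handlers (for example a FileHandler) attached earlier.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = configure_container_logging()
    log.info("service started")
```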

Logs stop appearing after a period of normal output

A container that logged normally for a while and then appears to stop is a different problem from one that never logged anything. It usually points either to the application itself becoming unresponsive (hung, deadlocked, or stuck in an infinite loop that produces no output) or to a downstream logging pipeline component failing silently:

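docker top my-api                 # process list gathered from the host; works even if the image has no ps
docker exec my-api ps aux         # the same view from inside, if ps is installed in the image
docker stats --no-stream my-api   # one-shot CPU and memory snapshot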

If the process is still consuming CPU but producing no new log lines, the application is probably alive but stuck, for example in a tight loop or blocked on a call that never returns, rather than suffering a logging infrastructure failure. If CPU usage has also dropped to near zero while the process is still technically running, the application may be blocked waiting on something. One such cause is the log path itself: Docker's default delivery mode is blocking, so when a driver cannot keep up with or reach its destination, the application's writes to stdout and stderr can stall instead of messages being dropped.
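The reasoning above can be written down as a rough rule of thumb. In this sketch the thresholds (60 seconds of silence, 50% CPU for "busy", 1% for "idle") are arbitrary assumptions to tune per service. The CPU value is the CPUPerc field that docker stats --no-stream --format '{{.CPUPerc}}' my-api prints, such as 97.85%.

```python
def parse_cpu_percent(value: str) -> float:
    """Convert a docker stats CPUPerc string such as '97.85%' to a float."""
    return float(value.strip().rstrip("%"))


def diagnose_silent_container(
    cpu_percent: float,
    seconds_since_last_log: float,
    quiet_after: float = 60.0,  # assumption: how long silence counts as suspicious
    busy_cpu: float = 50.0,  # assumption: CPU% that counts as busy
    idle_cpu: float = 1.0,  # assumption: CPU% that counts as idle
) -> str:
    """Rough classification of a container that has stopped logging."""
    if seconds_since_last_log < quiet_after:
        return "still logging: no stall yet"
    if cpu_percent >= busy_cpu:
        return "alive but stuck: look for a tight loop"
    if cpu_percent <= idle_cpu:
        return "blocked: waiting on I/O, a lock, or a blocking log driver"
    return "inconclusive: take several samples over time"


if __name__ == "__main__":
    print(diagnose_silent_container(parse_cpu_percent("97.85%"), 300))
```

A single sample is weak evidence; comparing a few readings taken some seconds apart gives a more reliable picture.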

Logs are present but incomplete or truncated

Missing chunks of expected log output, rather than a complete absence, often points to rotation removing older entries before they were read, or a collection agent failing to keep up with the volume being produced:

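docker inspect --format='{{.HostConfig.LogConfig.Config}}' my-api
# json-file driver: rotated files carry .1, .2, ... suffixes, and the directory is readable only by root
sudo sh -c "ls -la $(docker inspect --format='{{.LogPath}}' my-api)*"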

Comparing the configured max-size and max-file rotation settings against the actual log volume a container produces over a given period reveals whether rotation is removing content faster than expected; a collector reading from local files needs to keep pace with rotation, and a collector that has fallen behind can lose access to rotated-away content entirely.
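The comparison itself is simple arithmetic. With max-file files of at most max-size each, the driver keeps somewhere between (max-file minus 1) times max-size and max-file times max-size bytes, because the active file may be only partly full. The sketch below takes max-size already converted to bytes rather than parsing strings such as 10m. The per-line overhead parameter is an assumption: the json-file driver wraps each line in a small JSON object, so stored bytes exceed the raw output by an amount that depends on the line.

```python
from typing import Tuple


def retention_window_seconds(
    max_size_bytes: int,
    max_file: int,
    lines_per_second: float,
    avg_line_bytes: float,
    overhead_per_line: float = 0.0,  # assumption: extra bytes the driver adds per line
) -> Tuple[float, float]:
    """Estimate how long rotated logs survive before the oldest file is removed.

    Returns (lower, upper) bounds in seconds: the active file may be nearly
    empty (lower bound) or nearly full (upper bound).
    """
    bytes_per_second = lines_per_second * (avg_line_bytes + overhead_per_line)
    lower = (max_file - 1) * max_size_bytes / bytes_per_second
    upper = max_file * max_size_bytes / bytes_per_second
    return lower, upper


def collector_can_fall_behind(
    collector_lag_seconds: float, window: Tuple[float, float]
) -> bool:
    """True if the collector's lag reaches the shortest retention estimate."""
    return collector_lag_seconds >= window[0]


if __name__ == "__main__":
    # Example: max-size of 10,000,000 bytes, max-file 3, 100 lines/s of ~100 bytes.
    window = retention_window_seconds(10_000_000, 3, 100, 100)
    print(window)  # (2000.0, 3000.0)
    print(collector_can_fall_behind(2500.0, window))  # True
```

In this example a collector running more than about half an hour behind is already at risk of losing rotated content.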

A remote logging driver appears to be silently dropping messages

When a remote-forwarding driver runs in asynchronous (non-blocking) mode, whether through a driver-specific option such as fluentd-async or Docker's generic mode=non-blocking log option, it may silently discard messages once its local buffer fills. No error is visible through normal container operation, but the centralized log destination ends up with gaps:

docker run -d --log-driver=fluentd --log-opt fluentd-async=true --log-opt fluentd-buffer-limit=1048576 my-api
docker events --filter event=die --filter container=my-api

Checking the receiving endpoint's own health and connectivity from the host, and temporarily increasing buffer size or switching to synchronous mode during a controlled test, can help confirm whether dropped messages during a known period correlate with the destination being unreachable at that time.
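One way to confirm drops, offered here as an illustrative technique rather than a feature of any driver, is to have the application emit a numbered heartbeat line at a fixed interval and then look for missing numbers among the lines that reached the destination. The HEARTBEAT seq=N format is hypothetical, and gaps before the first or after the last received number cannot be detected this way.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical marker the application would print, e.g. "HEARTBEAT seq=42".
HEARTBEAT = re.compile(r"HEARTBEAT seq=(\d+)")


def missing_heartbeats(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) ranges of sequence numbers never received."""
    seen = set()
    for line in lines:
        match = HEARTBEAT.search(line)
        if match:
            seen.add(int(match.group(1)))

    ordered = sorted(seen)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


if __name__ == "__main__":
    received = [f"2024-05-01 HEARTBEAT seq={n}" for n in (1, 2, 3, 7, 8, 10)]
    print(missing_heartbeats(received))  # [(4, 6), (9, 9)]
```

Matching the reported gaps against the times the destination was known to be unreachable is what ties the drops to the outage.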

Timestamps appear inconsistent or out of order

Logs whose timestamps do not match the order in which entries actually occurred, or that appear offset from expected wall-clock time, usually trace back to one of three causes: clock drift on the host, a difference between when an application produced a line and when Docker's driver captured it, or a downstream aggregation system applying its own ingestion timestamp instead of preserving the original. An offset of a whole number of hours more often points to a time zone mismatch than to drift:

date -u
docker exec my-api date -u
docker logs --timestamps --tail 5 my-api   # prefixes each line with the time Docker captured it

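Comparing the host's and the container's clocks directly rules out drift between them. On a Linux host, containers read the same kernel clock as the host, so the two should agree; a mismatch is more plausible with Docker Desktop, where containers run inside a VM whose clock can fall behind the host's, for example after the machine sleeps. Checking whether a downstream log aggregation pipeline is configured to use the original message timestamp rather than its own ingestion time resolves the third cause, which is the more common source of apparent timestamp inconsistency in a centralized logging system.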
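With docker logs --timestamps, each line is prefixed with the time Docker captured it, in UTC RFC 3339 form such as 2024-05-01T12:00:00.123456789Z. The sketch below assumes a hypothetical application that starts each line with its own ISO 8601 timestamp and compares the two. It treats an application timestamp without an offset as UTC, which is itself an assumption worth checking. Python's datetime keeps only microseconds, so the nanosecond digits are trimmed.

```python
from datetime import datetime, timezone
from typing import Tuple


def parse_docker_timestamp(token: str) -> datetime:
    """Parse a prefix like 2024-05-01T12:00:00.123456789Z (UTC 'Z' suffix assumed)."""
    body = token.rstrip("Z")
    if "." in body:
        whole, frac = body.split(".", 1)
        body = f"{whole}.{frac[:6].ljust(6, '0')}"  # keep microseconds only
    return datetime.fromisoformat(body).replace(tzinfo=timezone.utc)


def split_timestamps(line: str) -> Tuple[datetime, datetime]:
    """Split '<docker ts> <app ts> <message>' into the two timestamps."""
    docker_token, app_token, _message = line.split(" ", 2)
    app_ts = datetime.fromisoformat(app_token)
    if app_ts.tzinfo is None:
        app_ts = app_ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parse_docker_timestamp(docker_token), app_ts


def capture_lag_seconds(line: str) -> float:
    """Seconds between the application stamping a line and Docker capturing it."""
    docker_ts, app_ts = split_timestamps(line)
    return (docker_ts - app_ts).total_seconds()


def looks_like_timezone_offset(lag: float, tolerance: float = 5.0) -> bool:
    """A lag close to a whole, non-zero number of hours suggests a time zone mismatch."""
    hours = round(lag / 3600)
    return hours != 0 and abs(lag - hours * 3600) <= tolerance


if __name__ == "__main__":
    line = "2024-05-01T12:00:00.250000000Z 2024-05-01T14:00:00 GET /health 200"
    lag = capture_lag_seconds(line)
    print(lag, looks_like_timezone_offset(lag))
```

A small positive lag is normal capture delay; a lag near a whole number of hours, as in the example line, points at time zone configuration rather than at the pipeline.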

A specific log line never appears even though the application claims to log it

When a log line from code that is believed to run never appears in docker logs or in the aggregation system, it is usually more efficient to first confirm that the code path actually executed than to assume the logging pipeline is at fault:

console.error('DIAGNOSTIC: reached this point', { timestamp: Date.now() });

Adding a temporary, unmistakable diagnostic log line directly at the suspected code path, then checking whether even that appears, quickly distinguishes between "the logging pipeline is broken" and "the code path never actually executed the way it was assumed to."
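A Python equivalent of the diagnostic line can also make each marker unique, so that it can be searched for unambiguously at every later stage of the pipeline. The function names and the DIAGNOSTIC ... id=... format are illustrative, not a convention of any tool. flush=True pushes the line out immediately regardless of how stderr is buffered.

```python
import sys
import time
import uuid


def make_marker(label: str) -> str:
    """Build a unique, easy-to-grep diagnostic line (hypothetical format)."""
    return f"DIAGNOSTIC {label} id={uuid.uuid4().hex} ts={time.time():.3f}"


def emit_marker(label: str) -> str:
    """Write the marker to stderr, which Docker captures alongside stdout."""
    marker = make_marker(label)
    print(marker, file=sys.stderr, flush=True)
    return marker


if __name__ == "__main__":
    emit_marker("reached-checkout-handler")
```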

Verifying the full pipeline end to end

For a multi-stage logging pipeline (container to local file or driver, then to a collector, then to an aggregation backend), troubleshooting a missing log often requires checking each stage independently rather than assuming the failure lies at whichever stage is easiest to inspect first:

docker logs --tail 5 my-api                                        # what Docker captured, stdout and stderr
sudo tail -n 5 "$(docker inspect --format='{{.LogPath}}' my-api)"  # the on-disk file (json-file driver)
docker logs --tail 50 collector-agent                              # the collector's own view

Working outward from the application through each stage of the pipeline isolates exactly which hop is failing, rather than guessing based on which stage happens to be most familiar or accessible.
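Once there is a distinctive line to search for, the hop-by-hop check reduces to finding the first stage whose output does not contain it. In this sketch the stage names and captured lines are hypothetical inputs, for example the saved output of the three commands above plus a query against the aggregation backend.

```python
from typing import Dict, List, Optional, Sequence


def first_failing_hop(
    marker: str,
    stages: Sequence[str],
    captured: Dict[str, List[str]],
) -> Optional[str]:
    """Return the first stage, in pipeline order, whose output lacks the marker.

    `stages` lists hops from the application outward; `captured` maps each
    stage name to the lines collected there. None means every hop saw it.
    """
    for stage in stages:
        if not any(marker in line for line in captured.get(stage, [])):
            return stage
    return None


if __name__ == "__main__":
    order = ["docker logs", "json-file on disk", "collector", "backend"]
    seen = {
        "docker logs": ["... DIAGNOSTIC probe id=abc123 ..."],
        "json-file on disk": ['{"log":"DIAGNOSTIC probe id=abc123\\n"}'],
        "collector": ["forwarded 0 records"],
    }
    print(first_failing_hop("id=abc123", order, seen))  # collector
```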

Common mistakes

  • Assuming a logging pipeline failure before confirming the application code actually executed the logging statement in question.
  • Not checking whether the configured driver supports docker logs at all (directly, or on Docker Engine 20.10 and later through the dual-logging cache) before concluding that logs are missing because of a deeper problem.
  • Overlooking that asynchronous remote drivers can silently drop messages under load or during destination outages, with no visible error at the container level.
  • Troubleshooting only the most accessible stage of a multi-stage logging pipeline instead of verifying each hop independently.
  • Mistaking clock drift or a time zone mismatch for a logging pipeline bug when investigating apparently inconsistent timestamps.

Effective log troubleshooting in Docker comes down to systematically isolating which stage of the logging path (application code, stdout capture, driver delivery, collection, or aggregation) is actually responsible for the observed symptom, rather than assuming that the most visible or most recently changed component is the cause.
