15.1.3.3 Wrong Log Level

A focused guide to wrong log levels, connecting core concepts with practical Docker and container operations.

A wrong log level is a logging defect distinct from missing or excessive logs. The information is present and captured correctly, but it is assigned a severity that does not match its operational significance. This quietly undermines severity-based filtering and alerting, however well the rest of the logging pipeline is configured.

Why level accuracy matters more than volume

A logging pipeline can be well configured (bounded rotation, centralized aggregation, structured fields) and still be ineffective if the severity levels attached to individual log lines do not reflect reality. Alerting and dashboards built on top of log levels assume that "error" means something an operator should care about and "debug" means something they generally should not. When that assumption is violated throughout a codebase, every tool built on top of it inherits the same unreliability.

logger.error('Cache miss for key', { key }); // routine, expected event logged as an error

logger.debug('Payment processing failed', { orderId, error: err.message }); // serious event logged too quietly

Both of these lines are wrong in the same underlying way: the severity does not match the actual significance of the event, just in opposite directions.

Over-severe logging trains operators to ignore alerts

Logging routine, expected conditions at error level is the more common mistake, and it directly degrades the signal-to-noise ratio of whatever alerting is built on error-level logs:

logger.error('Rate limit reached for user', { userId }); // expected, handled condition
logger.error('Retrying request', { attempt: 2 }); // expected, recoverable condition

If error-level alerts fire dozens of times per day for conditions the application already handles gracefully, operators learn, reasonably, to tune them out. A genuinely serious error arriving alongside this noise is then likely to be missed or dismissed along with everything else.

logger.warn('Rate limit reached for user', { userId }); // handled, but worth tracking in aggregate
logger.info('Retrying request', { attempt: 2 }); // routine operational detail

Reserving error level for conditions that represent an actual failure requiring attention (something broke, something is unavailable, an invariant was violated) restores the signal value of error-level alerting.

Under-severe logging hides real problems

The opposite mistake, logging a genuinely serious failure at debug or info level, is less common but more dangerous. Production deployments often run at a higher threshold for volume reasons, so a line logged below that threshold is never surfaced to anyone at all:

logger.info('Payment gateway returned error', { orderId, statusCode: 502 });

If this line is filtered out entirely in production because the deployment runs at warn level or higher, a genuine payment failure becomes invisible, discovered only when a customer complains rather than through any proactive monitoring.

logger.error('Payment gateway returned error', { orderId, statusCode: 502 });
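The filtering behaviour described above can be illustrated without any logging library. The following is a minimal sketch that assumes the conventional ordering debug < info < warn < error; `LEVELS` and `isEmitted` are hypothetical names for illustration, not part of any particular logger's API.

```javascript
'use strict';

// Conventional severity ordering, lowest to highest (an assumption of this sketch).
const LEVELS = ['debug', 'info', 'warn', 'error'];

// Returns true when a line logged at `lineLevel` survives a logger
// whose minimum level is `threshold`.
function isEmitted(lineLevel, threshold) {
  return LEVELS.indexOf(lineLevel) >= LEVELS.indexOf(threshold);
}

console.log(isEmitted('info', 'warn'));  // false: the payment failure is dropped
console.log(isEmitted('error', 'warn')); // true: the corrected line is kept
```

With a production threshold of warn, the info-level version of the payment gateway line never leaves the process, while the error-level version does.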

Establishing a shared definition of each level

The most durable fix for systemic level misuse is a clear, written, team-shared definition of what each level actually means, applied consistently rather than left to individual developer judgment at the moment a log line is written:

debug: detailed internal state, useful only during active investigation
info:  routine, expected events worth recording but not requiring attention
warn:  unexpected but handled conditions; worth noticing in aggregate, not urgent individually
error: a failure that affected the current operation and likely requires attention

Having this definition reviewed during code review, the same way naming conventions or error handling patterns are reviewed, catches level misuse before it reaches production rather than after an alert has already either fired too often or failed to fire when it should have.
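One way to keep the written definition from drifting is to encode it once and refer to it from call sites or review checklists. The sketch below is only an illustration under stated assumptions: `levelFor` and its option names (`operationFailed`, `anomalous`, `diagnosticOnly`) are hypothetical, and a team would adapt the questions to its own definitions.

```javascript
'use strict';

// Hypothetical helper: encodes the written level definitions in one place.
// The option names are illustrative assumptions, not a library API.
function levelFor({ operationFailed = false, anomalous = false, diagnosticOnly = false } = {}) {
  if (operationFailed) return 'error'; // the current operation did not succeed
  if (anomalous) return 'warn';        // handled, but worth noticing in aggregate
  if (diagnosticOnly) return 'debug';  // internal state for active investigation
  return 'info';                       // routine, expected event
}

// Classifying events similar to the ones discussed in this section.
const examples = {
  retryingRequest: levelFor(),
  rateLimitReached: levelFor({ anomalous: true }),
  paymentGatewayError: levelFor({ operationFailed: true }),
  cacheStateDump: levelFor({ diagnosticOnly: true }),
};

console.log(examples);
```

The value of a helper like this is less the code itself than the fact that the decision order (failure first, then anomaly, then diagnostic detail) is written down and reviewable.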

Auditing existing log level usage

For an existing codebase with accumulated level misuse, a periodic audit of which log statements fire most frequently at each level, particularly error level, surfaces miscategorized routine events that have likely been silently degrading alert quality for some time:

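# Assumes JSON log lines with a string "level" and a "message" field (pino's defaults are a numeric level, 50 for error, and "msg").
docker logs --since 24h my-api 2>&1 | jq -R -r 'fromjson? | select(.level == "error") | .message' | sort | uniq -c | sort -rn | head -20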

A list of the most frequent error-level messages over a 24-hour period often reveals several that are routine, expected conditions rather than genuine failures. Once surfaced this way, they are straightforward to reclassify.
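The same audit can be run from a script, for example as part of a scheduled job. The sketch below assumes newline-delimited JSON logs with string `level` and `message` fields; `topMessages` is a hypothetical helper and the sample lines are made up for illustration.

```javascript
'use strict';

// Counts messages at a given level in newline-delimited JSON log text and
// returns [message, count] pairs, most frequent first.
// Lines that are not JSON objects are skipped.
function topMessages(text, level, limit = 20) {
  const counts = new Map();
  for (const line of text.split('\n')) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // not JSON, e.g. a stack trace or startup banner
    }
    if (entry === null || typeof entry !== 'object' || entry.level !== level) continue;
    counts.set(entry.message, (counts.get(entry.message) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}

// Made-up sample input for illustration only.
const sample = [
  '{"level":"error","message":"Rate limit reached for user"}',
  '{"level":"error","message":"Rate limit reached for user"}',
  '{"level":"error","message":"Payment gateway returned error"}',
  '{"level":"info","message":"Retrying request"}',
  'plain text line that is not JSON',
].join('\n');

console.log(topMessages(sample, 'error'));
```

In this sample the most frequent "error" is a handled rate limit, which is the kind of entry the audit is meant to surface.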

Level misuse affects more than human readability

Beyond confusing a human reading logs directly, incorrect levels affect any automated system built on top of log severity: alerting rules, SLO calculations based on error rate, and dashboards that aggregate by level all silently inherit whatever inaccuracy exists in how levels were assigned at the point of logging:

sum(rate({service="api"} |= "level=error" [5m])) > 0.05

An alerting rule like this one is only as meaningful as the underlying error-level classification is accurate; if a third of what is logged as "error" is actually routine and expected, the rule's threshold has effectively been calibrated against noise rather than against genuine failure rate.
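The effect on the threshold can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a constant fraction of error-level lines is routine; `genuineRate` is a hypothetical helper, and in practice the routine share fluctuates, which makes the rule even harder to interpret.

```javascript
'use strict';

// Given the observed error-level log rate and the fraction of those lines that
// are actually routine, returns the rate of genuine failures.
// Assumes the routine fraction is constant, which is a simplification.
function genuineRate(observedRate, routineFraction) {
  return observedRate * (1 - routineFraction);
}

// At the rule's threshold of 0.05 lines per second, with a third of
// "errors" being routine, the genuine failure rate is only about 0.033.
console.log(genuineRate(0.05, 1 / 3).toFixed(4)); // 0.0333
```

Under that assumption the alert fires at roughly two thirds of the failure rate the threshold appears to describe, and any burst of routine "errors" can trigger it with no genuine failures at all.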

Adjusting levels without a full redeploy

Because level misuse is often discovered gradually rather than all at once, supporting per-component or runtime-adjustable log levels makes correcting it less disruptive than requiring a full code change and redeploy for every adjustment:

const logger = require('pino')({ level: process.env.LOG_LEVEL || 'info' });

docker kill --signal=SIGUSR2 my-api  # assumes the app installs a SIGUSR2 handler that toggles debug logging; Node.js reserves SIGUSR1 for its inspector

A mechanism for temporarily raising verbosity for investigation, without permanently changing the codebase's level assignments, reduces the temptation to default everything to a more verbose, noisier level "just in case" during normal operation.
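A signal-based toggle of this kind could look roughly like the following. This is a sketch under assumptions: `installDebugToggle` is a hypothetical helper, and it relies only on the logger exposing a writable `level` property, which pino loggers do. SIGUSR2 is used because Node.js reserves SIGUSR1 for starting its inspector. A plain object stands in for the logger so the example runs without dependencies.

```javascript
'use strict';

// Installs a signal handler that flips a logger between its configured level
// and 'debug'. Works with any logger that has a writable `level` property.
function installDebugToggle(logger, signal = 'SIGUSR2') {
  const baseLevel = logger.level; // remember the normal operating level
  process.on(signal, () => {
    logger.level = logger.level === 'debug' ? baseLevel : 'debug';
  });
}

// Stand-in for a real logger (for example, one created with pino).
const logger = { level: 'info' };
installDebugToggle(logger);
```

Sending the signal once switches the logger to debug, and sending it again restores the original level, so the change does not outlive the investigation unless someone forgets the second signal. A timer that reverts automatically would be a reasonable extension.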

Common mistakes

Logging expected, handled conditions at error level, eroding trust in error-level alerting over time.
Logging genuinely serious failures at a level low enough to be filtered out by the production logging configuration entirely.
Leaving level definitions undocumented and unenforced, resulting in inconsistent severity assignment across different parts of the same codebase.
Building alerting and SLO calculations directly on top of log level without first auditing whether the underlying level assignments are actually accurate.
Defaulting to a more verbose log level across the board to compensate for level misuse, rather than fixing the misclassification at its source.

A wrong log level is a quieter but more corrosive problem than missing or excessive logs, because it degrades the trustworthiness of every alert, dashboard, and severity-based filter built on top of it. The fix is a shared, enforced definition of what each level means, combined with periodic auditing of how that definition is applied in practice.