4.6 Information Quantity

Information quantity measures how much information a message carries, shaping how messages are structured and processed within cybernetic systems.

Information quantity is a precise numerical measure of the amount of information contained in a message or event, defined within the framework of information theory. Unlike everyday uses of the word "information," which often refer to meaning or semantic content, information quantity in the technical sense is determined entirely by probability: the less likely an event or message is, the more information it carries upon being observed. This counterintuitive definition captures the idea that a message reporting something already known or highly predictable conveys little new information, while a message reporting a surprising event conveys a great deal.

The foundational measure of information quantity for a single event x with probability p(x) is called self-information or surprisal, defined as:

$$I(x) = -\log_2 p(x)$$

When the logarithm is taken in base 2, self-information is measured in bits. If the natural logarithm is used, the unit is the nat, and if base 10 is used, the unit is the hartley. The choice of base corresponds to a choice of unit and does not affect the mathematical structure of the theory.
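As a worked illustration of the unit conversion (a sketch based on the standard change-of-base identity for logarithms, with the subscripted names introduced here only for clarity), the same self-information expressed in different units differs by a constant factor:

$$I_{\text{nats}}(x) = -\ln p(x) = (\ln 2)\, I_{\text{bits}}(x), \qquad I_{\text{hartleys}}(x) = -\log_{10} p(x) = (\log_{10} 2)\, I_{\text{bits}}(x)$$

Since ln 2 ≈ 0.693 and log10 2 ≈ 0.301, an event with probability 1/4 carries 2 bits, which is about 1.386 nats or about 0.602 hartleys.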

The properties of self-information follow directly from this definition. An event with probability 1 carries zero information, since observing it confirms something that was already certain. An event with probability 1/2 carries exactly 1 bit of information, equivalent to learning the outcome of a fair coin toss. An event with probability 1/4 carries 2 bits, and an event with probability 1/8 carries 3 bits. The information quantity thus grows without bound as probability approaches zero.
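The values above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `self_information_bits` is an illustrative choice, not something defined in the text.

```python
import math


def self_information_bits(p: float) -> float:
    """Return the self-information -log2(p) of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    # log2(1/p) equals -log2(p) and avoids printing a negative zero for p = 1.
    return math.log2(1.0 / p)


# Probabilities discussed in the text: certain event, fair coin, 1/4, 1/8.
for p in (1.0, 0.5, 0.25, 0.125):
    print(f"p = {p}: {self_information_bits(p)} bits")

# Smaller probabilities give ever larger values, with no upper bound.
print(self_information_bits(1e-9))
```

Halving the probability always adds exactly one bit, which is why the values for 1/2, 1/4, and 1/8 are 1, 2, and 3 bits.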

A key property of self-information is additivity for independent events. If two independent events x and y occur with probabilities p(x) and p(y), the combined event has probability p(x)·p(y), and its information content equals the sum of the individual self-information values:

$$I(x, y) = -\log_2\bigl(p(x)\,p(y)\bigr) = I(x) + I(y)$$

This additivity property, arising from the logarithmic form, makes information quantity a natural measure for communication purposes, since the information in a long independent sequence of symbols is simply the sum of the information in each symbol.
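The additivity property can be checked numerically. The sketch below assumes a hypothetical three-symbol source whose symbols are drawn independently; the probabilities and the message are made up for illustration.

```python
import math


def self_information_bits(p: float) -> float:
    """Self-information -log2(p), written as log2(1/p)."""
    return math.log2(1.0 / p)


# Two independent events with illustrative probabilities.
p_x, p_y = 0.5, 0.125
joint_bits = self_information_bits(p_x * p_y)
sum_bits = self_information_bits(p_x) + self_information_bits(p_y)
print(joint_bits, sum_bits)

# Hypothetical source: symbols are assumed independent of one another.
symbol_probs = {"a": 0.5, "b": 0.25, "c": 0.25}
message = "abacab"

# Information in the whole message, computed symbol by symbol.
total_bits = sum(self_information_bits(symbol_probs[s]) for s in message)

# The same quantity computed from the probability of the whole message.
message_prob = math.prod(symbol_probs[s] for s in message)
print(total_bits, self_information_bits(message_prob))
```

Both routes give the same number of bits because the logarithm turns the product of probabilities into a sum.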

[Figure: Self-information vs. Probability. Horizontal axis: probability p(x); vertical axis: self-information I(x) in bits.]

While self-information quantifies the information carried by a single outcome, the average information content over all possible outcomes of a random variable is captured by Shannon entropy. For a discrete random variable X taking values in a finite alphabet with probabilities p(x), the entropy is:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

Entropy measures the expected or average information quantity per observation of the random variable. It is maximized when all outcomes are equally probable: for an alphabet of n symbols the maximum is log2 n bits, and no outcome is more predictable than any other. It reaches its minimum of zero when one outcome has probability 1, because observing a certain outcome conveys no information.
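A short sketch of the entropy calculation follows. The function name `entropy_bits` and the three example distributions are illustrative assumptions chosen to show the maximum, an intermediate case, and the minimum.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Outcomes with probability 0 contribute nothing (p * log2(1/p) -> 0).
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # all outcomes equally probable
skewed = [0.7, 0.1, 0.1, 0.1]        # one outcome dominates
certain = [1.0, 0.0, 0.0, 0.0]       # one outcome is certain

print(entropy_bits(uniform))   # maximum for four outcomes: log2(4) = 2 bits
print(entropy_bits(skewed))    # strictly between 0 and 2 bits
print(entropy_bits(certain))   # minimum: 0 bits
```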

In cybernetic communication theory, information quantity connects directly to uncertainty and control. Norbert Wiener's conception of cybernetics treated information as the reduction of uncertainty: the more information has been received about a system, the lower the remaining uncertainty about its state. When a system receives a measurement or signal, the quantity of information received equals the reduction in entropy about the system's state. This framing makes information quantity a natural tool for analyzing how effectively feedback signals in a control loop communicate the system's current state to the controller.
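The idea of information as entropy reduction can be sketched with a toy example. Everything here is a hypothetical setup: a system with eight equally likely states and a noiseless sensor that reports only whether the state lies in the lower half of the range.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# Before the measurement: eight equally likely states (assumed prior).
prior = [1 / 8] * 8

# After a hypothetical noiseless sensor reports "state is one of 0..3":
# the four remaining candidates are equally likely, the others are ruled out.
posterior = [1 / 4] * 4 + [0.0] * 4

information_gained = entropy_bits(prior) - entropy_bits(posterior)
print(entropy_bits(prior), entropy_bits(posterior), information_gained)
```

Under these assumptions the uncertainty drops from 3 bits to 2 bits, so the sensor reading delivered 1 bit. With a noisy sensor the posterior would be less sharply concentrated and the reduction correspondingly smaller.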

The bit as a unit of information quantity deserves special attention. One bit is the amount of information required to distinguish between two equally likely possibilities, which is the information gained by learning the outcome of a fair binary choice. In computing, a binary digit is also called a bit, but the information-theoretic bit and the storage bit are related rather than identical concepts: a storage bit can hold at most one bit of information in the information-theoretic sense, and it holds less if its two values are not equiprobable.
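To make the gap between a storage bit and an information-theoretic bit concrete, the sketch below evaluates the entropy of a single binary digit for a few bias values. The function name `binary_entropy_bits` and the chosen biases are illustrative.

```python
import math


def binary_entropy_bits(p_one: float) -> float:
    """Entropy in bits of a binary digit that equals 1 with probability p_one."""
    if not 0.0 <= p_one <= 1.0:
        raise ValueError("p_one must be in the interval [0, 1]")
    total = 0.0
    for p in (p_one, 1.0 - p_one):
        if p > 0:
            total += p * math.log2(1.0 / p)
    return total


# A storage bit whose values are equiprobable carries one full bit.
print(binary_entropy_bits(0.5))

# A storage bit that is 1 in 90% of cases carries less than half a bit on average.
print(round(binary_entropy_bits(0.9), 3))

# A storage bit that is always 1 carries no information at all.
print(binary_entropy_bits(1.0))
```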

Information quantity also underlies the theory of data compression. The source coding theorem states that the entropy of a source sets a lower bound on the average number of bits per symbol required to represent the source without loss of information. No lossless compression scheme can achieve an average code length shorter than the source entropy. Codes such as Huffman coding and arithmetic coding approach this limit, achieving compression ratios that approach the theoretical minimum as message length grows.
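The relationship between entropy and code length can be illustrated with a small Huffman construction. This is a minimal sketch that computes only the code lengths, not the codewords themselves; the two example sources and all function names are illustrative assumptions.

```python
import heapq
import math
from itertools import count
from typing import Dict


def entropy_bits(probs: Dict[str, float]) -> float:
    """Shannon entropy in bits per symbol of the source."""
    return sum(p * math.log2(1.0 / p) for p in probs.values() if p > 0)


def huffman_code_lengths(probs: Dict[str, float]) -> Dict[str, int]:
    """Codeword length of each symbol in a binary Huffman code."""
    if len(probs) == 1:
        return {symbol: 1 for symbol in probs}
    tiebreak = count()  # unique counter so the heap never compares the lists
    heap = [(p, next(tiebreak), [symbol]) for symbol, p in probs.items()]
    heapq.heapify(heap)
    lengths = {symbol: 0 for symbol in probs}
    while len(heap) > 1:
        # Merge the two least probable groups; every symbol in them
        # moves one level deeper in the code tree.
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:
            lengths[symbol] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), group1 + group2))
    return lengths


def average_length(probs: Dict[str, float], lengths: Dict[str, int]) -> float:
    """Expected codeword length in bits per symbol."""
    return sum(probs[s] * lengths[s] for s in probs)


# Probabilities that are powers of 1/2: the code meets the entropy exactly.
dyadic = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
# Other probabilities: the code stays above the entropy, within one bit.
skewed = {"a": 0.7, "b": 0.2, "c": 0.1}

for source in (dyadic, skewed):
    lengths = huffman_code_lengths(source)
    print(lengths, average_length(source, lengths), entropy_bits(source))
```

For the second source the per-symbol code cannot reach the entropy because codeword lengths must be whole numbers; coding longer blocks of symbols together narrows the gap.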

For continuous random variables, the direct analog of entropy is differential entropy, which extends the self-information framework to probability density functions. However, differential entropy lacks some of the absolute meaning of discrete entropy; it can be negative and depends on the choice of units for the variable. Mutual information between continuous variables, which measures the information quantity shared between two random variables, remains a well-defined and useful quantity even in the continuous case and is central to the definition of channel capacity.
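A standard worked example, included here as an illustration rather than taken from the original text, shows both peculiarities of differential entropy. For a density f(x), differential entropy in bits is

$$h(X) = -\int f(x) \log_2 f(x)\, dx$$

For a variable uniformly distributed on the interval [0, a], the density is f(x) = 1/a on that interval, so

$$h(X) = -\int_0^a \frac{1}{a} \log_2 \frac{1}{a}\, dx = \log_2 a$$

With a = 1/2 this gives h(X) = -1 bit, a negative value. If the same quantity is measured in units 100 times smaller, the interval becomes [0, 50] and h(X) = log2 50 ≈ 5.64 bits, so the value shifts by log2 100 ≈ 6.64 bits purely because of the change of units.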