Information entropy: definition of concept, properties, system


Information entropy is built on the self-information of a value, defined as the negative logarithm of its probability under the probability mass function. Thus, when the data source produces a value with a lower probability (i.e., when a low-probability event occurs), the event carries more "information" ("surprise") than when the source produces a value with a higher probability.

The amount of information conveyed by each event, defined in this way, is itself a random variable whose expected value is the information entropy. In general, entropy refers to disorder or uncertainty, and its definition in information theory is directly analogous to the one used in statistical thermodynamics. The concept of information entropy was introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication"; this is where the term "Shannon entropy" comes from.
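
As a concrete illustration (a minimal sketch, not taken from the article; the helper name entropy_bits is arbitrary), the entropy of a discrete distribution can be computed directly from its probability mass function:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(X) = -sum p * log2(p) of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy; a heavily biased coin carries much less.
print(entropy_bits([0.5, 0.5]))   # 1.0
print(entropy_bits([0.9, 0.1]))   # ~0.469
```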

Information entropy graph

Definition and system

The basic model of a data transmission system consists of three elements: a data source, a communication channel and a receiver, and, as Shannon puts it, the "fundamental problem of communication" is for the receiver to identify what data was generated by the source from the signal it receives over the channel. Entropy provides an absolute limit on the shortest possible average length of a lossless encoding of the source data. If the entropy of the source is less than the capacity of the communication channel, the data it generates can be transmitted to the receiver reliably (at least in theory, neglecting practical considerations such as the complexity of the system required to transmit the data and the time the transmission may take).
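
To make the capacity condition concrete, here is a small hypothetical check (the symbol rate and capacity figures are invented for illustration): a source whose information rate stays below the channel capacity can, in principle, be transmitted reliably.

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a discrete source, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

source_probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical 4-symbol source
symbols_per_second = 1000                  # hypothetical symbol rate
channel_capacity_bps = 2000                # hypothetical channel capacity, bits/s

info_rate = entropy_bits(source_probs) * symbols_per_second  # 1750.0 bits/s
print("Source information rate:", info_rate, "bits/s")
print("Reliable transmission possible:", info_rate < channel_capacity_bps)
```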

Information entropy is usually measured in bits (also called "shannons"), or sometimes in "natural units" (nats) or decimal digits (called "dits", "bans" or "hartleys"). The unit of measurement depends on the base of the logarithm used to compute the entropy.
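
A brief sketch of how the choice of logarithm base changes only the unit, not the quantity being measured (the distribution and helper name are illustrative):

```python
import math

def entropy(probs, base):
    """Entropy of a distribution, expressed in the unit implied by the logarithm base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]

h_bits = entropy(probs, 2)        # shannons (bits)
h_nats = entropy(probs, math.e)   # nats
h_harts = entropy(probs, 10)      # hartleys (bans, dits)

# The three values describe the same entropy, differing only by a constant factor:
print(h_bits, h_nats, h_harts)
print(math.isclose(h_nats, h_bits * math.log(2)))  # True
```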

Information quality

Properties and logarithm

The logarithm of the probability distribution is useful as a measure of entropy because it is additive for independent sources. For example, the entropy of a fair coin toss is 1 bit, while the entropy of m tosses is m bits. In a straightforward representation, log2(n) bits are needed to represent a variable that can take one of n values if n is a power of 2. If these values are equally likely, the entropy (in bits) is equal to that number. If one of the values is more likely than the others, observing that this value occurs is less informative than if some less common outcome had occurred. Conversely, rarer events provide more information when observed.

Because less probable events are observed less frequently, the net effect is that the entropy (understood as average information) obtained from unevenly distributed data is always less than or equal to log2(n). Entropy is zero when one outcome is certain.
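
A small sketch (illustrative, not from the article) of the properties just described: the uniform distribution over n values attains the maximum log2(n), a certain outcome has zero entropy, and entropy adds up across independent sources.

```python
import math
from itertools import product

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n
skewed = [0.6, 0.2, 0.1, 0.04, 0.03, 0.01, 0.01, 0.01]

print(entropy_bits(uniform))   # 3.0 == log2(8), the maximum
print(entropy_bits(skewed))    # < 3.0
print(entropy_bits([1.0]))     # 0.0: a certain outcome carries no information

# Additivity for independent sources: H(X, Y) = H(X) + H(Y)
coin = [0.5, 0.5]
die = [1 / 6] * 6
joint = [p * q for p, q in product(coin, die)]
print(math.isclose(entropy_bits(joint), entropy_bits(coin) + entropy_bits(die)))  # True
```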

Shannon's information entropy quantifies these considerations when the probability distribution of the underlying data is known. The meaning of the observed events (the meaning of the messages) is irrelevant to the definition of entropy. Entropy takes into account only the probability of observing a particular event, so the information it encapsulates concerns the underlying distribution of possibilities, not the meaning of the events themselves. The properties of information entropy remain the same as described above.

Shannon's formula

Information theory

The basic idea of information theory is that the more one already knows about a topic, the less new information one can get about it. If an event is very likely, it is not surprising when it occurs and therefore provides little new information. Conversely, if the event is improbable, it is much more informative that it happened. Therefore, the information content is an increasing function of the inverse of the event's probability (1 / p).

When there are more possible events, entropy measures the average information content you can expect when one of them occurs. This means that rolling a die has more entropy than tossing a coin, because each outcome of the die has a lower probability than each outcome of the coin.
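
A quick numerical check of that claim (a sketch assuming a fair coin and a fair six-sided die):

```python
import math

def entropy_bits(probs):
    """Average information content (entropy) of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = [0.5] * 2     # fair coin
die = [1 / 6] * 6    # fair six-sided die

print(entropy_bits(coin))  # 1.0 bit
print(entropy_bits(die))   # ~2.585 bits: each outcome is less probable, so more surprising
```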

Entropy in the figure

Features

Thus, entropy is a measure of the unpredictability of a state or, equivalently, of its average information content. To get an intuitive understanding of these terms, consider the example of a political poll. Such polls are usually conducted because the outcome of, say, an upcoming election is not yet known.

In other words, the poll results are relatively unpredictable, and actually conducting the poll and examining the data provides some new information; these are just different ways of saying that the a priori entropy of the poll results is large.

Now consider the case where the same poll is conducted a second time shortly after the first. Since the result of the first poll is already known, the results of the second poll can be predicted well and should not contain much new information; in this case, the a priori entropy of the second poll's result is small compared to that of the first.

Entropy levels

Coin toss

Now consider the example of flipping a coin. Assuming that the probability of tails equals the probability of heads, the entropy of the coin toss is as high as it can be for two outcomes, which makes it a textbook example of the information entropy of a system.

This is because it is impossible to predict the outcome of a coin toss ahead of time: if we have to choose, the best we can do is predict that the coin will land on tails, and this prediction will be correct with probability 1 / 2. Such a coin toss has an entropy of one bit, since there are two possible outcomes that occur with equal probability, and learning the actual outcome yields one bit of information.

By contrast, flipping a coin with tails on both sides and no heads has zero entropy, since the coin will always land on tails and the outcome can be predicted perfectly.
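
The two cases above are the extremes of the binary entropy function; the short sketch below (an illustration, with an arbitrary helper name) traces how entropy falls from one bit for a fair coin to zero for a two-tailed coin:

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # the outcome is certain, so no information is gained
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.5, 0.75, 0.9, 0.99, 1.0):
    print(f"P(heads) = {p:4}: H = {binary_entropy(p):.3f} bits")
# 0.5 gives the maximum of 1 bit; 1.0 (a two-headed or two-tailed coin) gives 0 bits.
```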

Information entropy

Conclusion

If the compression scheme is lossless, meaning you can always recover the entire original message by decompressing, then the compressed message carries the same amount of information as the original but uses fewer characters. That is, it carries more information, i.e. higher entropy, per character. This means that the compressed message has less redundancy.

Roughly speaking, Shannon's source coding theorem states that a lossless compression scheme cannot, on average, compress messages to carry more than one bit of information per bit of the compressed message, but any value less than one bit of information per bit can be achieved with a suitable encoding scheme. The entropy of a message per bit, multiplied by the length of the message, is a measure of how much total information the message contains.
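
As an illustrative sketch (the sample string is arbitrary), the empirical per-character entropy of a message gives the theoretical lower bound on the average length of any lossless encoding of it:

```python
import math
from collections import Counter

message = "abracadabra abracadabra"   # arbitrary sample text
counts = Counter(message)
n = len(message)

# Empirical per-character entropy, in bits per character.
h = -sum((c / n) * math.log2(c / n) for c in counts.values())

print(f"Entropy: {h:.3f} bits/character")
print(f"Theoretical minimum size: {h * n:.1f} bits "
      f"(vs. {8 * n} bits as plain 8-bit characters)")
```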
