
It is explained that the logarithm in the entropy formula is there for additivity of information. But why the factor of $p$? It seems redundant, since $p$ already appears inside the $\log p$!

For a random variable $X$ having support $\{x_1,\dots,x_n\}$ with probabilities $\{p_1,\dots,p_n\}$, the amount of information (in bits) you gain from a specific realisation of $X$, i.e. $X=x_i$, is given by $\log_2(1/p_i)$. The average amount of information you gain from $X$, i.e. the entropy of $X$, is given by:

$$\mathbb{E}[\log_2(1/p_i)]=\sum_{j=1}^n p_j\log_2(1/p_j)=:H(X).$$

Does this help at all?

– David Simmons Oct 24 '13 at 21:48
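
For concreteness, here is a minimal Python sketch of the two quantities in the comment above: the per-outcome information $\log_2(1/p_i)$ and its average, the entropy $H(X)$. The example distribution is arbitrary and not taken from the question.

```python
# Minimal sketch: per-outcome information and its average (the entropy).
# The example probabilities are arbitrary, not from the question.
import math

def information_content(p):
    """Bits gained from observing an outcome that has probability p."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Average information: H(X) = sum_j p_j * log2(1/p_j)."""
    return sum(p * information_content(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]  # support {x1, x2, x3}
print([information_content(p) for p in probs])  # [1.0, 2.0, 2.0] bits
print(entropy(probs))                           # 1.5 bits on average
```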

1 Answer


From Shannon's original paper ["A Mathematical Theory of Communication", The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October, 1948].

If there is such a measure, say $H(p_1,p_2,\dots,p_n)$, it is reasonable to require of it the following properties:

  1. $H$ should be continuous in $p_i$

  2. If all the $p_i$ are equal, $p_i = 1/n$ , then $H$ should be a monotonic increasing function of $n$. With equally likely events there is more choice, or uncertainty, when there are more possible events.

  3. If a choice be broken down into two successive choices, the original $H$ should be the weighted sum of the individual values of $H$.

Requirement (3) is what forces the weighting factor $p_i$ in front of each $\log p_i$: when a choice is broken into successive choices, the entropy of each sub-choice must be weighted by the probability of reaching it. Moreover, it can be shown that $H(p_1,\dots,p_n)=-K\sum_i p_i \log p_i$, with $K>0$ a constant fixing the unit, is the only form that satisfies the three properties above.
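
To see how requirement (3) produces the $p$ factor, take the example from Shannon's paper: a choice among three outcomes with probabilities $1/2, 1/3, 1/6$ can be decomposed into a first fair choice between two alternatives and then, only in the second case (probability $1/2$), a further choice with probabilities $2/3, 1/3$. Requirement (3) demands

$$H\left(\tfrac12,\tfrac13,\tfrac16\right)=H\left(\tfrac12,\tfrac12\right)+\tfrac12\,H\left(\tfrac23,\tfrac13\right),$$

and the coefficient $\tfrac12$ on the second term is exactly the weight the question asks about: the probability that the second choice happens at all. The form $-\sum_i p_i\log_2 p_i$ satisfies this identity (both sides equal $\approx 1.459$ bits), whereas the unweighted sum $\sum_i \log_2(1/p_i)$ does not.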

Hope it helps.

jgyou