It is usually explained that the $\log$ in the entropy formula gives additivity of information. But why is there also a factor of $p$ in front? It seems redundant, since $p$ already appears inside $\log p$!
1 Answer
From Shannon's original paper ("A Mathematical Theory of Communication", The Bell System Technical Journal, Vol. 27, pp. 379–423 and 623–656, July and October 1948):
If there is such a measure, say $H(p_1,p_2,...,p_n)$ , it is reasonable to require of it the following properties:
1. $H$ should be continuous in the $p_i$.
2. If all the $p_i$ are equal, $p_i = 1/n$, then $H$ should be a monotonic increasing function of $n$. With equally likely events there is more choice, or uncertainty, when there are more possible events.
3. If a choice be broken down into two successive choices, the original $H$ should be the weighted sum of the individual values of $H$.
Requirement (3) is basically what forces the leading $p_i$: the entropy of each sub-choice must be weighted by the probability of actually reaching that sub-choice. Moreover, it can be shown that $H(\{p_i\}_{i=1,\dots,n})=-K\sum_i p_i \log p_i$, with $K>0$ a constant, is the only form that satisfies the three requirements above.
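To make requirement (3) concrete, here is a minimal sketch in Python (the helper `H` is just for this illustration) reproducing the example Shannon gives right after stating the properties: a choice among three outcomes with probabilities $1/2, 1/3, 1/6$ has the same entropy whether it is made directly or broken into a fair coin flip followed, half of the time, by a second choice with probabilities $2/3, 1/3$. The weight $1/2$ on the second-stage entropy is exactly the leading $p$ the question asks about.

```python
import math

def H(*ps):
    """Shannon entropy (in bits) of a finite probability distribution."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# The choice among three outcomes, made directly.
direct = H(1/2, 1/3, 1/6)

# The same choice broken into two successive choices: a fair coin flip,
# then (only in the branch reached with probability 1/2) a further
# choice with probabilities 2/3 and 1/3.
staged = H(1/2, 1/2) + 1/2 * H(2/3, 1/3)

print(direct, staged)                # both ≈ 1.4591 bits
assert abs(direct - staged) < 1e-12
```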
Hope it helps.
Entropy can also be read as the expected value of the surprisal $\log_2(1/p_j)$ of the outcome that actually occurs; taking that expectation is exactly what introduces the weighting factor $p_j$:
$$\mathbb{E}\left[\log_2(1/p_X)\right]=\sum_{j=1}^n p_j\log_2(1/p_j)=:H(X).$$
Does this help at all?
– David Simmons Oct 24 '13 at 21:48
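A small numerical sketch of this expectation view (assuming the distribution $p=(1/2,1/3,1/6)$ purely for illustration): sampling outcomes according to $p$ and averaging their surprisals converges to the same value as the weighted sum, which is where the $p_j$ factor comes from.

```python
import math
import random

p = [1/2, 1/3, 1/6]

# Exact entropy: the probability-weighted sum of surprisals.
H_exact = sum(pj * math.log2(1 / pj) for pj in p)

# Monte Carlo estimate of E[log2(1/p_X)]: draw outcomes with
# probabilities p and average their surprisals; the weighting by p_j
# arises automatically from how often each outcome is sampled.
random.seed(0)
samples = random.choices(range(len(p)), weights=p, k=200_000)
H_mc = sum(math.log2(1 / p[i]) for i in samples) / len(samples)

print(H_exact, H_mc)   # ≈ 1.4591 for both, up to sampling error
```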