I'm currently reading a Machine Learning textbook (by Tom Mitchell), and the presentation of the ID3 algorithm there uses a term called entropy.
The entropy of a collection whose classification is boolean is defined as
$$Entropy(S) \equiv -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus$$
Why is it defined this way? I understand that it has the desired properties: the entropy must be zero when $p_\oplus = 0$ or $1$, and it equals $1$ when $p_\oplus = 0.5$. But there are other functions that satisfy these properties as well, for example
$$Entropy(S) = -4(p_\oplus-0.5)^2 + 1$$
Why do we use logarithmic functions to calculate entropy? Is there a reason this form is preferred?
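For concreteness, here is a minimal Python sketch (the function names are mine, not from the textbook) that evaluates both candidate functions on a few values of $p_\oplus$. Both are $0$ at the endpoints and $1$ at $p_\oplus = 0.5$, yet they differ everywhere in between:

```python
import math

def shannon_entropy(p_pos):
    """Shannon entropy of a boolean collection with positive fraction p_pos."""
    p_neg = 1.0 - p_pos
    # By convention, 0 * log2(0) is treated as 0, so skip zero probabilities.
    terms = [p * math.log2(p) for p in (p_pos, p_neg) if p > 0]
    return -sum(terms)

def quadratic_alternative(p_pos):
    """The quadratic candidate from the question: -4(p - 0.5)^2 + 1."""
    return -4 * (p_pos - 0.5) ** 2 + 1

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p+={p:.2f}  Shannon={shannon_entropy(p):.4f}  quadratic={quadratic_alternative(p):.4f}")
```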
- There is a lot out there; look up Shannon entropy and Shannon information for a start. – Ian Sep 17 '17 at 05:47
- Look here: https://math.stackexchange.com/questions/330272/definition-of-the-entropy?rq=1 – Epiousios Sep 17 '17 at 06:18
- Shannon (the inventor of this concept) has a very short and accessible book, The Mathematical Theory of Communication, that explains where the $\log$ comes from and why it makes sense. – user3658307 Sep 17 '17 at 17:42