
I'm currently reading a machine learning textbook (by Tom Mitchell), and in the ID3 algorithm it uses a term called entropy.
The entropy of a collection whose classification is boolean is defined as $$Entropy(S) \equiv -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus$$ Why is it defined this way? I understand that it has properties such as the entropy being zero when $p_\oplus = 0$ or $1$, and the entropy being $1$ when $p_\oplus = 0.5$, but there are other functions that also satisfy these, for example $$Entropy(S) = -4(p_\oplus - 0.5)^2 + 1$$ Why do we use logarithmic functions to calculate entropy? Is there a reason why we prefer this?
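For concreteness, here is a minimal sketch (not from the textbook; the function names and the comparison loop are my own) that evaluates both the log-based entropy and the quadratic alternative over a few values of $p_\oplus$, so the two curves can be compared numerically:

```python
import math

def entropy(p_plus):
    """Shannon entropy of a boolean collection with fraction p_plus of positive examples."""
    p_minus = 1.0 - p_plus
    # By the usual convention 0 * log2(0) = 0, so zero-probability terms are skipped.
    terms = [p * math.log2(p) for p in (p_plus, p_minus) if p > 0]
    return -sum(terms)

def quadratic_impurity(p_plus):
    """The alternative function from the question: -4(p_plus - 0.5)^2 + 1."""
    return -4 * (p_plus - 0.5) ** 2 + 1

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p+ = {p:.2f}  entropy = {entropy(p):.4f}  quadratic = {quadratic_impurity(p):.4f}")
```

Both functions are $0$ at $p_\oplus \in \{0, 1\}$ and $1$ at $p_\oplus = 0.5$, but they disagree in between (e.g. at $p_\oplus = 0.25$ the entropy is about $0.81$ while the quadratic gives $0.75$), which is exactly the gap the question is asking about.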

B2VSi
  • There is a lot out there, look up Shannon entropy and Shannon information for a start. – Ian Sep 17 '17 at 05:47
  • Look here: https://math.stackexchange.com/questions/330272/definition-of-the-entropy?rq=1 – Epiousios Sep 17 '17 at 06:18
  • Shannon (the inventor of this concept) has a very short and accessible book, The Mathematical Theory of Communication, that explains where the $\log$ comes from and why it makes sense. – user3658307 Sep 17 '17 at 17:42

0 Answers