Shannon entropy is defined by: $H(X) = -\sum_{i} {P(x_i) \log_b P(x_i)}$, where $b$ can be 2, $e$, or 10 (giving bits, nats, or dits, respectively).
My interpretation of the formula is: $H(X)$ equals the negative sum, over all $i$, of the probability of $x_i$ multiplied by $\log_b$ of the probability of $x_i$.
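As a quick sanity check of that interpretation, a fair coin (two outcomes, each with probability 0.5) should come out to exactly one bit: $H = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$ bit.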
So far, this is my implementation of Shannon entropy in R (with an example):
mystring <- c(1,2,3,1,3,5,4,2,1,3,2,4,2,2,3,4,4)
# relative frequency of each symbol
myfreqs <- table(mystring)/length(mystring)
# vectorize
myvec <- as.data.frame(myfreqs)[,2]
# H in bit
-sum(myvec * log2(myvec))
[1] 2.183667
So for the string used in my example, $H(X)=2.183667$.
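For convenience, the same steps can be wrapped in a small helper function (just a sketch of the computation above; the name shannon_entropy is mine):
# Shannon entropy (in bits) of a vector of observations
shannon_entropy <- function(x) {
  p <- table(x) / length(x)   # empirical probability of each symbol
  -sum(p * log2(p))
}
shannon_entropy(mystring)
[1] 2.183667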
Now take a look at the entropy package. The function entropy.empirical computes the Shannon entropy:
mystring <- c(1,2,3,1,3,5,4,2,1,3,2,4,2,2,3,4,4)
entropy.empirical(mystring, unit="log2")
[1] 3.944667
If we look at the code, it seems that the formula used is:
freqs <- mystring / sum(mystring)
H <- -sum(freqs * log(freqs) / log(2))
H
[1] 3.944667
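Interestingly, if I pass the table of counts instead of the raw vector (just an experiment on my side, assuming entropy.empirical expects a vector of counts rather than raw observations), I get the same number as my manual computation:
entropy.empirical(table(mystring), unit="log2")
[1] 2.183667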
My simple question: who is wrong? Why does the entropy package use that code? Is that the Shannon entropy, or a different entropy calculation?