2

I have $n$ objects. Every object has a random value in $[0;k)$ (in $\mathbf{N}$). How high is the probability for every object to be unique within the set of $n$ objects?

This is obviously a case of the birthday paradox.

From there we know, that $u = \frac{k!}{(k-n)!}$ is the possible number of combinations in which all objects are unique and $m=k^n$ is the total number of possible combinations. Therefore, the probability that all objects are unique must be $P=u/m$.

My problem is I'm dealing with really big numbers. WolframAlpha cannot handle it and neither can my brain. It's hard for me to even estimate if P is small or high, i.e. $m >> n$?

In my case, I need to compute the formula for $n=3.2*10^7$ and $k=2^{32768}$.

Can anyone at least give an approximation for the result?

Cutaraca
  • 123
  • 4

2 Answers2

5

The birthday problem actually gets easier, not harder, as the numbers get bigger (or more precisely as $k$ gets bigger relative to $n$). That's because there's a very simple approximation that gets more accurate as the numbers gets bigger. That approximation is the following: write the uniqueness probability as

$$\mathbb{P} = \prod_{i=1}^{n-1} \left( 1 - \frac{i}{k} \right)$$

and use the inequality $1 - x \le \exp(-x)$, which is an excellent approximation with error $O(x^2)$ for $x$ small (meaning the bigger $k$ gets the better this approximation is) to conclude that

$$\mathbb{P} \le \exp \left( -\sum_{i=1}^{n-1} \frac{i}{k} \right) = \boxed{ \exp \left( - \frac{ {n \choose 2} }{k} \right) }.$$

The error in this approximation is a multiplicative factor of $1 + O \left( \frac{n^3}{k^2} \right)$ which in your case, with $k$ so large, is totally negligible.

We can see from here that it's not even close: $k$ is much, much larger than ${n \choose 2}$ so the exponent is extremely small and the probability of uniqueness is extremely close to $1$. ${n \choose 2}$ is about $5.1 \times 10^{14}$ while $k$ is the enormously larger $1.4 \times 10^{9864}$. $k$ is so much larger than any of the other numbers in this problem that the above approximation is effectively exact and $\mathbb{P}$ is for all practical purposes equal to $1$.

It is completely unnecessary to use Stirling's approximation here, which is much harder than the above very simple and very accurate approximation (try it on the original birthday problem!). The exponent $\frac{ {n \choose 2} }{k}$ has the intuitive interpretation of being the expected number of collisions, and the fact that it appears in the exponent like this reflects the fact that as long as $k$ is big enough compared to $n$, the number of collisions is asymptotically Poisson with $\lambda = \frac{ {n \choose 2} }{k}$.

Qiaochu Yuan
  • 419,620
1

a good, useful estimate, is that for big $n$,

$n! \sim n^n e^{-n} \sqrt{2\pi n}$

This is called Stirling's formula. Actually, those quantities are asymptotically equivalent (which means their ratio goes to $1$ as $n\to \infty$).

So in particular, you can use it to approximate big binomial coefficients :)

Azur
  • 2,194
  • 6
  • 18
  • You do not need to use Stirling's approximation here, it actually makes things harder. Did you actually try it? – Qiaochu Yuan Oct 28 '22 at 08:31
  • Haha, no, I did not try computing the probability, I just mentioned Stirling's cause I thought it might be relevant. But indeed, it is much less efficient than the solution you offered :) – Azur Oct 28 '22 at 09:31