
I'm going to try to rephrase my question more clearly:

I have a set of $N$ boxes, one of which is filled. I sample the boxes with uniform probability and without replacement until I find the filled box. What are the mean and variance of the number of empty boxes I've opened? For example, if $N = 2$ and I open two boxes to find the full box, I've opened one empty box.
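A quick Monte Carlo sketch of the process just described can serve as a check on any candidate formula. The helper names and the seed are hypothetical, and labelling the filled box `0` is an arbitrary convenience (by symmetry the label doesn't matter). Each draw picks uniformly among the boxes still unopened, so this is sampling without replacement.

```python
import random
import statistics


def empty_boxes_opened(n_boxes: int, rng: random.Random) -> int:
    """Open boxes uniformly at random, without replacement, until the filled
    one turns up; return how many empty boxes were opened before it."""
    # Hypothetical labelling: box 0 is the filled box.
    unopened = list(range(n_boxes))
    empties = 0
    while True:
        # Uniform choice among the boxes that are still unopened.
        box = unopened.pop(rng.randrange(len(unopened)))
        if box == 0:
            return empties
        empties += 1


def simulate(n_boxes: int, trials: int, seed: int = 0):
    """Estimate the mean and (population) variance of the empty-box count."""
    rng = random.Random(seed)
    samples = [empty_boxes_opened(n_boxes, rng) for _ in range(trials)]
    return statistics.fmean(samples), statistics.pvariance(samples)


if __name__ == "__main__":
    mean, var = simulate(n_boxes=10, trials=50_000)
    print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")
```

The estimates are only approximate; increasing `trials` tightens them.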

My guess is that I need to open $\mu = \frac{N}{2}$ boxes on average to find the full box, so I open $\frac{N}{2}-1$ empty boxes before opening the full one. However, because the draws are not independent (each one depends on the previous ones), I don't know how to calculate the variance.

I suppose I'd like something like the negative binomial distribution, but where we're sampling without replacement?


Update: Byron Schmuland answers the first part of my question here: Expectation of number of trials before success in an urn problem without replacement

We just need to recast my empty boxes as "red" balls and my full box as a "blue" ball, and ask how many red balls we need to draw before we draw a "blue" ball. However, how might we calculate the variance?
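In the urn form ($N-1$ red balls and one blue ball), the probability of drawing exactly $k$ red balls and then the blue one is a product of sequential draw probabilities, and it telescopes:

$$ \prod_{i=0}^{k-1}\frac{N-1-i}{N-i}\cdot\frac{1}{N-k}=\frac{N-k}{N}\cdot\frac{1}{N-k}=\frac1N,\qquad k=0,1,\ldots,N-1. $$

The sketch below (the function name `pmf_empty_before_filled` is hypothetical) evaluates that product exactly with `fractions.Fraction`, as a check that every count from $0$ to $N-1$ is equally likely.

```python
from fractions import Fraction


def pmf_empty_before_filled(n_boxes: int, k: int) -> Fraction:
    """P(exactly k empty boxes are opened before the filled one), built from
    sequential draw probabilities in an urn with n_boxes - 1 red balls
    (empty boxes) and 1 blue ball (the filled box)."""
    if not 0 <= k <= n_boxes - 1:
        return Fraction(0)
    p = Fraction(1)
    for i in range(k):
        # Draw i (0-based) is red: n_boxes - 1 - i red balls remain
        # among n_boxes - i balls in total.
        p *= Fraction(n_boxes - 1 - i, n_boxes - i)
    # The next draw is the single blue ball among the n_boxes - k left.
    return p * Fraction(1, n_boxes - k)


if __name__ == "__main__":
    n = 6
    print([str(pmf_empty_before_filled(n, k)) for k in range(n)])
```

For $N=6$ this should print a list of six `1/6` entries, i.e. the uniform distribution on $\{0,\ldots,N-1\}$.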

Jack
  • I would not "use the hypergeometric distribution"; rather, I would note that the number of empty boxes one must open before finding the filled box is uniformly distributed on 0..N. – Did May 05 '13 at 20:29
  • Mean: look better. Variance: try. – Did May 05 '13 at 21:02
  • If $X$ is uniformly distributed on $\{0,1,\ldots,N\}$, what is $E[X]$, say, when $N=2$? Is it $\frac{N+1}2-1$? – Did May 05 '13 at 21:34
  • Quote: the number of empty boxes I need to open. There are $N$ empty boxes hence $X=N+1$ is impossible. – Did May 05 '13 at 21:38
  • Why do you erase your comments after I answered them? Why do you modify your post after it was discussed in the comments? Such a heavy (and silent) rewriting is not how the site is supposed to function. – Did May 06 '13 at 05:07
  • @Did I felt I had written a poorly worded question that was confused in its intent, and I was embarrassed. I'm sorry if I offended you. – Jack May 06 '13 at 11:17
  • The point is that one is not supposed to erase one's footprints like you did, if only for the sake of future readers (and sorry, but "offense" is quite off-topic here). – Did May 06 '13 at 11:21
  • @Did Let me try that again: I'm sorry that I undermined comments that you spent your time writing by deleting mine and causing them to lose their context. I felt an apology was in order for that. – Jack May 06 '13 at 11:34

1 Answer


Each box is equally likely to be the filled box, so the number of empty boxes opened before you open the filled box is uniformly distributed from $0$ to $N-1$. Thus the expected number of empty boxes opened is $(N-1)/2$, and the variance is

$$ \frac1N\sum_{k=0}^{N-1}k^2-\left(\frac{N-1}2\right)^2=\frac16\frac{(N-1)N(2N-1)}N-\left(\frac{N-1}2\right)^2=\frac{N^2-1}{12}\;. $$
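A small exact check of these formulas follows. It is a sketch using Python's `fractions.Fraction` so that no rounding creeps in, and `uniform_mean_variance` is a hypothetical helper name. It computes the mean and the second moment minus the squared mean directly from the uniform distribution on $\{0,\ldots,N-1\}$ and compares them with $(N-1)/2$ and $(N^2-1)/12$.

```python
from fractions import Fraction


def uniform_mean_variance(n_boxes: int) -> tuple:
    """Exact mean and variance of K, uniform on {0, 1, ..., n_boxes - 1}."""
    ks = range(n_boxes)
    mean = Fraction(sum(ks), n_boxes)
    second_moment = Fraction(sum(k * k for k in ks), n_boxes)
    # Variance = E[K^2] - (E[K])^2
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        mean, var = uniform_mean_variance(n)
        # Computed values next to the closed forms (N-1)/2 and (N^2-1)/12.
        print(n, mean, var, Fraction(n - 1, 2), Fraction(n * n - 1, 12))
```

On each printed row the two computed values should match the two closed forms.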

joriki
  • The variance should be the same for the total number of boxes opened, right? – Jack May 05 '13 at 22:38
  • @Jack: Yes, only the mean is shifted by $1$ in that case. – joriki May 05 '13 at 22:42
  • Sorry, one more question if you have the time - how might the mean and variance change if we had multiple filled boxes in the population? – Jack May 05 '13 at 22:49
  • @Jack: On your first comment: No, I don't think there's much of a connection there. On your second comment: That's slightly more complicated; I suggest asking that as a separate question. – joriki May 05 '13 at 22:51
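joriki's reply to Jack's first comment can be written out explicitly. Writing $K$ for the number of empty boxes opened (a label introduced here only for illustration), the total number of boxes opened is $K+1$, and adding a constant shifts the mean but leaves the variance unchanged:

$$ E[K+1]=E[K]+1=\frac{N-1}{2}+1=\frac{N+1}{2},\qquad \operatorname{Var}(K+1)=\operatorname{Var}(K)=\frac{N^2-1}{12}. $$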