98

Hi all!

Google recently published questions that are asked of candidates in interviews. One of them caused a very heated debate in our company, and we're unsure where the truth lies. The question is:

In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?

Although the official answer is 50/50, I feel that something is wrong with it. When I started to solve the problem for myself, I found that the proportion of girls can be calculated with the following series:

$$\sum_{n=1}^{\infty}\frac{1}{2^n}\left (1-\frac{1}{n+1}\right )$$

This leads to the answer that about 61% of the children will be girls.

The official solution is:

This one caused quite the debate, but we figured it out following these steps:

  • Imagine you have 10 couples who have 10 babies. 5 will be girls. 5 will be boys. (Total babies made: 10, with 5 boys and 5 girls)
  • The 5 couples who had girls will have 5 babies. Half (2.5) will be girls. Half (2.5) will be boys. Add 2.5 boys to the 5 already born and 2.5 girls to the 5 already born. (Total babies made: 15, with 7.5 boys and 7.5 girls.)
  • The 2.5 couples that had girls will have 2.5 babies. Half (1.25) will be boys and half (1.25) will be girls. Add 1.25 boys to the 7.5 boys already born and 1.25 girls to the 7.5 already born. (Total babies: 17.5 with 8.75 boys and 8.75 girls).
  • And so on, maintaining a 50/50 population.
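
The bookkeeping in the steps above can be sketched in a few lines of Python. This only reproduces the arithmetic of expected counts per round; it says nothing yet about the expected proportion. The function name `expected_counts` and the choice of three rounds are assumptions made for this sketch.

```python
from fractions import Fraction

def expected_counts(couples, rounds):
    """Expected cumulative (boys, girls) after each round of births.

    Assumes each birth is a boy or a girl with probability 1/2 and that
    only couples whose latest child was a girl have another child.
    """
    boys = girls = Fraction(0)
    still_trying = Fraction(couples)
    history = []
    for _ in range(rounds):
        new_boys = still_trying / 2   # half of this round's births are boys
        new_girls = still_trying / 2  # the other half are girls
        boys += new_boys
        girls += new_girls
        still_trying = new_girls      # only these couples continue
        history.append((boys, girls))
    return history

for b, g in expected_counts(10, 3):
    print(float(b), float(g))
```

The expected numbers of boys and girls stay equal after every round, which is exactly what the official solution tracks.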

Where does the truth lie?

nkrkv
  • 1,107
  • 1
    This is the very first problem in a widely available book of mathematical problems. – Q.Q.J. Mar 12 '10 at 10:41
  • This type of problem is often passed on as a sort of urban legend, but I think this one is more interesting than most such problems because the usual explanation is wrong. – Douglas Zare Mar 12 '10 at 10:53
  • 7
    As is almost always the case with these sorts of things it's not that "the usual explanation is wrong" or anything, it's simply that the question is ambiguously stated. – Kevin Buzzard Mar 12 '10 at 11:10
  • Kevin, what's the ambiguity? – Tom Leinster Mar 12 '10 at 11:30
  • 42
    The answer to the question "what is the proportion...?" is "it's whatever the proportion is". The question should perhaps say something like "what is the average proportion" and this is ambiguous already. Imagine for example there were just 1 boy-girl family, and they produced offspring until they got a boy and then stopped. Then the proportion #boys/#girls is 2/1 with probability 1/2, 2/2 with probability 1/4, 2/3 with probability 1/8,... , meaning the "average proportion" is something like 1.386... . Now people will say "well that's not what the question meant" and that's exactly my point. – Kevin Buzzard Mar 12 '10 at 11:40
  • 9
    @tom: if you look at the question here, clearly the sum I've done is what the questioner does more formally above. The issue is how one works out an "average proportion"---is it "average # boys" / "average # girls" or is it "average of #boys/#girls". These are visibly going to be different. There may be other ambiguities too. It's always the same with these sorts of questions. I'm tempted to vote to close as not a real question. – Kevin Buzzard Mar 12 '10 at 11:47
  • 1
    @Kevin I'm not talking about ambiguities. I think there is a consensus for the natural formalization, compute the expected proportion of girls in the population of children in a generation, and yet people give an argument which relies on the multiplicativity of expectation, which does not hold in this setting when the number of families is finite. – Douglas Zare Mar 12 '10 at 14:02
  • 6
    @Kevin: Based on the answers and questions people have had about this question, it seems that it is still interesting. I am not voting to close and I don't think others should either. I think that sometimes an acceptable answer to a mathematical question is that the question is actually ill posed. When that happens and it is not obvious, it is interesting. Clearly this question is confusing, and it is a subtle point that the question is ambiguous. If I had asked the question and you had posted your replies to Tom as an answer, I would have accepted it. – Chris Schommer-Pries Mar 12 '10 at 14:02
  • 5
    @Chris: The question is "interesting" in the same way that the Monty Hall problem is "interesting", and the question is generating a lot of noise, just as Monty Hall always does. @Douglas: I will happily accept that you are highlighting a different issue with the question. – Kevin Buzzard Mar 12 '10 at 14:23
  • 1
    Do we assume monogamy and immortality? I think that the results may change otherwise. – Federico Poloni Dec 21 '10 at 09:15
  • 2
    @Rhett Butler: You are confused. You think everyone else is making trivial mistakes, and that we don't understand martingales. This is wrong. The proportion of the boys:girls ($B/(B+G)$) is a random variable. This random variable can be affected by the choices made, as shown by the fact that the choices can affect the expected value of the random variable. – Douglas Zare May 13 '13 at 21:06
  • @Rhett Butler: You are confused. As has been pointed out much earlier here, the payoff in roulette is not the proportion $B/(B+G)$. So, if you think there is an application to roulette of a method for changing the expected value of $B/(B+G)$, the burden is on you to show it. Your claims that my correct statements are equivalent to claiming to have a winning roulette strategy are wrong and insulting. I tried to help you to understand some math, and you lie and ridicule me. Any more lies claiming that I am trying to invent a winning roulette strategy will be flagged as spam. – Douglas Zare May 14 '13 at 16:46
  • Love all the work you guys are doing. Very interesting math. However, as English goes, there is no ambiguity here. The question does NOT ask for the expected proportion within the family--it says for the country. It requires a truly tortured reading of English to add a family weighting to your answer. – Codure Aug 17 '18 at 16:08

17 Answers

166

The proportion of girls in one family is a biased estimator of the proportion of girls in a population consisting of many families because you are underweighting the families with a large number of children.

If there were just 1 family, then your formula would be wrong, but the average of the percentage of girls you would observe would be

$$\sum_{n=0}^\infty \frac{1}{2^{n+1}} \bigg(\frac{n}{n+1}\bigg) = 1-\log2 = 30.69\%.$$

Half of the time, you would observe $0\%$ girls.
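
As a quick numerical sanity check of the single-family series above, one can sum the first few hundred terms and compare with $1-\log 2$. This is a minimal sketch; the function name and the truncation at 200 terms are assumptions chosen for illustration.

```python
import math

def single_family_expected_girl_fraction(terms=200):
    """Truncated sum of P(n girls, then a boy) * (fraction of girls).

    P(n girls then a boy) = 1 / 2**(n + 1), and the fraction of girls
    in such a family is n / (n + 1).
    """
    return sum(n / (n + 1) / 2 ** (n + 1) for n in range(terms))

approx = single_family_expected_girl_fraction()
print(approx, 1 - math.log(2))
```

The $n=0$ term contributes nothing to the sum but carries probability $1/2$, which is the "half of the time you observe $0\%$ girls" case.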

If you have multiple families, the average of the observed percentage of girls in the population will increase.

For 2 families, the average percentage of girls would be

$$\sum_{n=0}^\infty \frac{n+1}{2^{n+2}} \bigg(\frac{n}{n+2}\bigg) = \log 4 - 1 = 38.63\%.$$

More generally, the average percentage for $k$ families is

$$\sum_{n=0}^\infty \frac{n+k-1 \choose k-1}{2^{n+k}} \bigg(\frac{n}{n+k}\bigg) = \frac{k}{2}\bigg(\psi\left(\frac{k+2}2\right)-\psi\left(\frac{k+1}2\right)\bigg)$$

where $\psi$ is the digamma function which satisfies

$$ \psi(m) = -\gamma + \sum_{i=1}^{m-1} \frac1i = -\gamma + H_{m-1}$$ $$ \psi\left(m+\frac12\right) = -\gamma -2\log 2 + \sum_{i=1}^m \frac{2}{2i-1}.$$

With a little work, one can verify that this goes to $1/2$ as $k\to \infty$. So, for a large population such as a country, the official answer of $1/2$ is approximately correct, although the explanation is misleading. In particular, for $10$ couples, the expected percentage of girls is $10 \log 2 - 1627/252 = 47.51\%$ contrary to what the official answer suggests. With $k$ families, the expected proportion is about $1/2 - 1/(4k)$.
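
The $k$-family sum can also be evaluated numerically. The sketch below truncates the series, using the fact that the total number of girls in $k$ completed families has weight $\binom{n+k-1}{k-1}/2^{n+k}$ (a negative binomial distribution) while the number of boys is exactly $k$. The function name and the truncation length are assumptions for illustration, not part of the derivation.

```python
import math

def expected_girl_fraction(k, terms=600):
    """Truncated sum for E[G/(G+B)] with k completed families.

    G (total girls) has P(G = n) = C(n+k-1, k-1) / 2**(n+k), and B = k.
    The truncation at `terms` is an assumption that is adequate for
    moderate k, where the tail of the series is negligible.
    """
    total = 0.0
    for n in range(terms):
        weight = math.comb(n + k - 1, k - 1) / 2 ** (n + k)
        total += weight * n / (n + k)
    return total

for k in (1, 2, 10, 100):
    # second column: truncated series; third column: approximation 1/2 - 1/(4k)
    print(k, expected_girl_fraction(k), 0.5 - 1 / (4 * k))
```

For $k=1$ and $k=2$ this should agree with $1-\log 2$ and $\log 4 - 1$, and for $k=10$ with $10\log 2 - 1627/252$.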

It is not enough to argue that the expected number of boys equals the expected number of girls, since the quantity we want is $E[G/(G+B)]$, and in general $E[G/(G+B)] \ne E[G]/E[G+B].$ Expectation is linear, but not multiplicative for dependent variables, and $G$ and $G+B$ are not independent even though $G$ and $B$ are.
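
The difference between the two quantities is easy to see in a simulation. The following Monte Carlo sketch assumes $k$ families that each continue until their first boy, with every birth an independent fair coin flip; the function name `simulate`, the seed, and the trial counts are illustrative choices.

```python
import random

def simulate(k, trials, seed=0):
    """Return (mean of G/(G+B), ratio of mean G to mean G+B) for k families."""
    rng = random.Random(seed)
    sum_ratio = 0.0
    sum_girls = 0
    for _ in range(trials):
        girls = 0
        for _ in range(k):
            while rng.random() < 0.5:  # a girl is born: the family tries again
                girls += 1
        # every completed family has exactly one boy, so B = k
        sum_ratio += girls / (girls + k)
        sum_girls += girls
    mean_of_ratio = sum_ratio / trials
    ratio_of_means = sum_girls / (sum_girls + k * trials)
    return mean_of_ratio, ratio_of_means

for k in (1, 10):
    print(k, simulate(k, 20000))
```

The ratio of means should hover near $1/2$ for every $k$, while the mean of the ratio should sit noticeably below $1/2$ for small $k$.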

Douglas Zare
  • 27,806
  • 34
    This is really nice! I voted to close because this is an old chestnut, but you have found new life in it. – David E Speyer Mar 12 '10 at 14:10
  • 3
    Douglas, in your calculation, you assume (as instructed by the question) that every family continues having children until they have a boy, even if the mother has already had, say, hundreds of girls. But this is not realistic. After all, parents may die before ever having their first boy, or perhaps the child policy is recent. (This seems related to the stopping-time issue in the gamble you mention in comments below.) Could I kindly ask that you tell us the statistics of the situation if parents have only had N chances to have babies? I would be interested in your remarks. – Joel David Hamkins Mar 13 '10 at 18:38
  • 2
    Actually, I was planning to write my next GammonVillage column on that case, which resembles the win/loss record for someone who plays $k$ single-elimination tournaments with $N$ rounds. I haven't come up with an explicit formula yet, but the win/loss record after any finite number of tournaments is still a biased estimator of the player's chance to win each game. – Douglas Zare Mar 13 '10 at 18:49
  • 1
    A simpler expression for the average for a sample of $k$ families is $k*|\log 2 - (1 - 1/2 + 1/3 - ... + (-1)^k/k)|$. – Douglas Zare Apr 05 '10 at 16:43
  • 21
    You've changed the question by assuming that the population (i.e., the set of births whose G/(G+B) we are interested in) is a union of families that have stopped reproducing. All we know is that in $k$ families, some number $n$ of births have occurred through time $t$, each birth equivalent to a fair coin toss. This of course implies a symmetrical distribution of $(G,B)$ and consequently an expected value of exactly 1/2 for $G/(G+B)$ --- independent of $n, k$ and $t$. The difference in answers between the original problem and the completed-families problem is the "bias" you calculated. – T.. Jul 09 '10 at 00:28
  • 11
    @T, you are right that the bias depends on the formalization of the problem, but not that its existence depends on this particular one. If you assume that some families have not stopped reproducing, there is still a bias. If the size of the population is not constant, and larger populations tend to have more girls, then girls tend to be underweighted when you compute G/(G+B), and the expected value of G/(G+B) is under 1/2. I do not see a reasonable way to interpret the problem so that the population size is fixed or does not depend on the sexes of the children, but feel free to point one out. – Douglas Zare Jul 10 '10 at 03:53
  • 7
    To interpret the problem means to specify over what finite set of births (that were subject to the stopping rule) the expected value of G/(G+B) is to be calculated. The canonical choices are "all births", "all births of living persons", or "all births in some given (long) time interval". These lead to boy-girl symmetry and consequently to E[G/(G+B)]=1/2. If you know of any other specification of a set of births that is a reasonable interpretation of the problem but (under this or any other stopping rule) leads to an expected value different from 1/2, what is it? – T.. Jul 10 '10 at 07:28
  • 6
    @T: There isn't a boy-girl symmetry. The timing of the births depends on the sexes of the past children. The number of children after the second year depends on the sexes of the children born in the first year. The populations which are larger will be those which have more girls born in the first year. The result is that E[G/(G+B)] is less than 1/2, however counterintuitive that may be. Whether you use "all births in a generation" or "all births in the first 5 years" or "all births by 2010 from several generations" the expected proportion of girls is lower than 50%. – Douglas Zare Jul 11 '10 at 04:31
  • 2
    @DZ: The statement "larger populations tend to have more girls" is false; the proportion of girls in each family decreases as more children are born (either it stays constant at 0, or stays at 1 until dropping to 1/familysize). It is true only in your model that excludes families-in-progress, a model equivalent to computing the proportion of Heads in a series of coin tosses that ends with Heads. The latter condition does bias the proportion of Tails (girls) downward, but the problem about the whole population corresponds to computing the proportion with no such symmetry-breaking condition. – T.. Jul 11 '10 at 08:41
  • 2
    That should read "...dropping to 1 -(1/familysize)". Typo or no, the point should be clear that intuitions from a model computing expectation on a biased subset (i.e., one that excludes the incomplete families, which have 100% girls) do not reflect what happens in computing E[G/(B+G)] on an unbiased subset, such as the whole population. – T.. Jul 11 '10 at 08:55
  • 6
    @T: You have made a lot of false statements. Perhaps you should think about this problem more carefully. My statement that larger populations have more girls is true, and almost a tautology when you look at the whole generation since the number of boys equals the number of families. Larger populations mean more girls. It is also true if you assume that only enough time has passed for each family to have up to 3 children, or if you assume that the population is not broken into generations. I'm afraid it does not appear worthwhile to continue this exchange. – Douglas Zare Jul 11 '10 at 11:43
  • 2
    @DZ: Your comment "the number of boys equals the number of families" is clearly false. At any given time a positive proportion of families in the population will have only girls (and thus will eventually have more children under the stopping rule). As has been pointed out a number of times, your statements are true only for the union-of-($k$)-completed-families model, which pre-biases the calculation by excluding families with higher (100 percent) proportions of girls. Do you know of a precisely specified model that restores those families but gives calculable answers less than 1/2? – T.. Jul 11 '10 at 16:34
  • 3
    Also, the claim that "larger populations tend to have more girls" is not only false (populations of any size have E[G/(B+G)]=1/2, as the stopping rule can't break boy/girl symmetry in the population), but circular as a justification of the purported bias. Positive correlation between fraction of girls G/(B+G) and population size B+G is, by definition and in any model of the problem, equivalent to E[G/(B+G)] < E[G]/E[B+G]. The latter equals 1/2, also in any model. So the correlation statement is not evidence that the bias actually exists, but only a rephrasing of the alleged existence. – T.. Jul 13 '10 at 03:53
  • 2
    "... larger populations have more girls ..."

    No, since the population size is meaningful only if something is known about the number of families.

    In order to use the Strong Law of Large Numbers we assume infinitely many families #1,#2,#3,... with each one's sequence of births (G^n)B being concatenated in order of family #. Then the stochastic process generating this infinite sequence of G's and B's is isomorphic (up to measure 0) with repeated flips of a fair coin.

    Hence the SLLN implies the asymptotic fraction of B's or G's is 1/2, each with probability = 1.

    This is, to me, conclusive.

    – Daniel Asimov Jul 15 '10 at 09:26
  • 4
    re: "the bias depends on the formalization of the problem, but not its existence". Consider formalizations where the population size n is chosen first (such as fixing it at one million, or selecting n from a particular probability distribution), and then n births are a stream of random coin tosses served in some fashion to an unending queue of "families" eligible under the stopping rule. [Families not fixed in advance, but constructed dynamically to fit the births.] Boy/girl distribution in such populations will be symmetrical by construction, and in particular E[G/(B+G)] = 1/2. – T.. Jul 15 '10 at 18:04
  • 4
    Note also that if the answer is (exactly) 1/2 conditional on any fixed finite value of $n$, it would then be the same without the conditioning. The question appears to be whether there is any dynamical model of how a finite population evolves, not fixing $n$ or $k$ or stipulating the completion of any number of families, that leads to an expected proportion lower than 1/2, or a gender-asymmetrical finite population. – T.. Jul 15 '10 at 20:18
  • @Daniel Asimov: Again, I maintain that larger populations have more girls which means girls are weighted less in E[G/(G+B)]. The SLLN is an asymptotic result which does not tell you that the proportion is 50%, and you can check that it is not 50% by explicit calculations. Suppose there are 50 families, and they have time to have 1-5 children each. If there are 150 children, then there are more girls than if there are 70 children. The same phenomenon occurs if you take the union of several populations spread out in time. Once again, when the population size can vary, then there is a bias. – Douglas Zare Jul 16 '10 at 06:38
  • 2
    By the way, in the formalizations where there are a fixed number of boys, Jensen's inequality applied to $G/(G+B)$ implies that $E[G/(G+B)] \le 1/2$ with equality only if the number of girls is constant. Again, this applies even to more complicated models where the number of boys is not constant, such as asking about the expected value of $E[G/(G+B)]$ after 10 years of following the rule. By contrast, I do not think asking about the first 100 children is a reasonable interpretation of the problem. – Douglas Zare Jul 16 '10 at 07:12
  • 3
    Stipulating the number of families carries (artificially introduced, exogenous) information on the number of boys and this fully accounts for the "bias" in the models displayed thus far. The number of boys and girls is not constant in the gender-symmetric models that first determine the population size and then fill the families, so clearly some additional assumptions are needed to get $E < 1/2$, and the question is whether they are just another form of artificial conditioning. i.e., is the bias from the model selection or internal to the model itself. – T.. Jul 16 '10 at 08:09
  • 3
    Jensen's inequality does clarify the circularity of the argument, though. $E[G/(G+B)] \leq 1/2$ is equivalent to bivariate convexity of $f(G,B)= -G/(G+B)$. But $f$ is convex for fixed $B$ (fixing the number of boys biases $E$ downward), concave for fixed $G$ (setting the number of girls first biases $E$ upward, $\geq 1/2$), and linear for constant $x+y$ (no bias in models with population size determined prior to $B$ and $G$). To get $E \leq 1/2$ in any model you have to somehow build in an asymmetry that concentrates the number of boys and disperses the number of girls. – T.. Jul 16 '10 at 18:13
  • 8
    I just wanted to say that you assume that all children of one family are born instantaneously (with the last child a boy). If you take into account "unfinished" families, then the proportion is directly 50/50 (I think, because no matter what, the chance of a boy for a child is 50/50). – Lucas K. Dec 20 '10 at 23:27
  • 5
    @Lucas K. No, that simplification is not a necessary assumption. The expected value of $G/(G+B)$ is not $1/2$ even if you allow unfinished families. If the population size is not constant, and there is a higher proportion of girls when the population is larger, then girls tend to make up a smaller portion of the population, so $E[G/(G+B)] \lt 1/2$. I invite you to make explicit computations as I did instead of just stating your intuition. Many people find this counterintuitive. – Douglas Zare Dec 21 '10 at 10:22
  • 1
    Your last sentence says that G and G+B are not independent, even though G and B are. This strikes me as (ever so slightly) misleading, because the independence of G and B is a red herring, in the sense that all the same phenomena would occur whether or not we had this independence. (E.g. in a model where everybody stops after two children, this independence goes away but all the really interesting phenomena remain.)

    That teeny quibble aside, thanks for this extremely enlightening explanation.

    – Steven Landsburg Dec 21 '10 at 16:04
  • 2
    @Steven Landsburg: Thanks for pointing that out. That paragraph has been bugging me since it isn't even the independence of G with G+B which is needed. Expectation just doesn't commute with most operations. – Douglas Zare Dec 21 '10 at 20:11
  • 6
    I'm not entirely sure this comment is appropriate here, and I'll happily delete it (or let someone else delete it) if more experienced users tell me to, but my blogpost citing this response has stirred up a considerable firestorm of comment, some fraction of which is thoughtful:

    http://www.thebigquestions.com/2010/12/27/win-landsburgs-money/

    – Steven Landsburg Dec 28 '10 at 00:21
  • 4
    @DZ: the statement "[allowing unfinished families] ... there is a higher proportion of girls when the population is larger" is in general false. It is true only in your a priori asymmetrical model conditioned on the number of families. The asymmetry arises not from the stopping rule, but because the stopping rule allows phrasing of boy/girl asymmetric conditions ("the number of boys is at most $k$") in equivalent terms without direct reference to $B$ or $G$ (i.e., "the number of families is $k$", as in your model allowing unfinished families). This asymmetry is foreign to the Google puzzle. – T.. Jan 03 '11 at 14:17
  • 15
    @Steve Landsburg: There were some very interesting and thoughtful comments on your post. However, since each argument continues until the participants finally agree, I expected that the fraction of thoughtful comments would be just over half. Sadly this seemed not to be the case... – Tom Church Jan 05 '11 at 04:26
  • 2
    @Douglas: You have found a roulette winning strategy without raising the stakes. No doubts? –  May 13 '13 at 10:40
  • 4
    @Rhett: I'm not sure I understand your comment about the "winning roulette strategy," but maybe this will explain things. You seem to want to apply some martingale theory where it isn't appropriate. The fraction of girls in the population is not a martingale. That is, let $G_n$ and $B_n$ be the number of girls and boys respectively in the first $n$ births, and let $X_n = \frac{G_n}{B_n+G_n} = \frac{G_n}{n}$. It's easy to see that $E[X_n] = \frac{1}{2}$. However, $X_n$ is not a martingale, since $E[X_{n+1} | X_n ] = \frac{n}{n+1}X_n + \frac{1}{2(n+1)}$. – Jon Peterson May 13 '13 at 12:36
  • @Jon: The answer calculated the average of the percentage of girls. But the question asks for the percentage of all girls. Further what is an expectation value? It has no meaning for a single case but only for a big number of cases. If 31 % girl expectation for a single family would be correct, then the ensemble of all families of the country would get close to it. If you don't believe in my explanations, then play roulette. Always bet 5 bucks on black and stop a sequence after black has appeared. Within 3000 sequences you should have earned more than 1000 bucks. Good luck! –  May 13 '13 at 14:39
  • 1
    @Jon: The probability of a girl to be born is a martingale, completely independent of the number of girls and of the history of their mothers. The expectation value of additional girls within the next 200 births is 100 with an error margin of 10. Same holds for boys. Therefore the population will never deviate by more than the statistical fluctuations from the 50:50 equipartition. –  May 13 '13 at 14:50
  • 8
    @Rhett Butler: It sounds like you are confused about basic probability. I assure you that I am not. You ask what the definition is of expected value. You say expected value is only defined for a large number of trials. You incorrectly try to apply the optional stopping theorem. EV is a basic idea I have taught many times and which you can find explained in many introductions or my book. It is not restricted to large repeated samples. I explained multiple times that $B/(B+G)$ is not a martingale, unlike $B-G$, so the OST for martingales does not apply to $B/(B+G)$, and the conclusion fails. – Douglas Zare May 13 '13 at 18:10
  • @Douglas Zare: Whatever you did, you did not correctly answer the question whether family planning can influence the equilibrium between boys and girls. The answer is no. But you claim that this answer is correct only for large populations and that your answer is different and the only correct one for smaller populations. Once you will recognize that the sex of a child is in no way dependent on the intentions or history of the mother, you should see your error. Your calculation of the weighted average over the ratios is not what you claim it was, namely an answer to the original question. –  May 14 '13 at 06:56
  • 6
    @Rhett Butler: You never addressed the main point of my remark - that $\frac{G_n}{B_n + G_n}$ is not a martingale. My point is that agreeing that Douglas's answer is correct does not imply that one has a "winning roulette strategy."

    Secondly, I agree that when $n$ is large this fraction is very unlikely to deviate from $1/2$. However, you go too far when you say "the population will never deviate by more than the statistical fluctuations from the 50:50 equipartition." This is plainly false. In fact the law of the iterated logarithm shows that there will always be some such deviations.

    – Jon Peterson May 14 '13 at 10:54
  • 4
    @Rhett Butler: Do you really think I have no idea that the sex of a child is modeled as $1/2$ independently of what came before, even though this is used in my calculations? You think that triviality is what everyone else is missing, too? How stupid. The mathematically interesting thing to me is that when the families can choose how many children to have based on the previous sexes, the proportion $B/(B+G)$ is a biased estimator of that $1/2$, as I stated in my answer, which means the expected value is not $1/2$. And $E[B/(B+G)]$ is what the OP's summation tried to calculate. – Douglas Zare May 14 '13 at 16:35
  • 1
    @Jon Peterson: If for 10 rounds/families always stopping at b (for black or boy) will supply 47 % red/girl and 53 % black/boy, then you have a winning strategy. Always bet the same amount of money on black. You will win more frequently than you will lose. –  May 14 '13 at 18:11
  • 1
    @Jon Peterson: $\frac{G_n}{B_n + G_n}$ is not a martingale. But it is completely irrelevant to calculate it. $\frac{E(G)}{E(B + G)}$ is asked for. The latter is 1/2 for every country and every number of families. –  May 14 '13 at 18:20
  • 4
    @Rhett Butler: You misunderstand what it means to have a "winning strategy." A winning strategy is one which guarantees you win money in the long run. Using this as a roulette strategy only gives a strategy where the expected fraction of wins is 53% for a short period of time (up to the 10-th black). If you try this repeatedly, then you essentially have increased the number of "families" and the answer approaches 50%. Therefore, this strategy will (not surprisingly) not guarantee you make money. – Jon Peterson May 17 '13 at 10:35
  • 4
    @Rhett Butler: One more comment. The fraction of wins up until a random time isn't even the correct thing to look at in roulette. Under the strategy above, even though the expected fraction of wins would be 53% by the time of the 10-th win, the expected actual winnings would be 0 dollars (assuming the casino pays out even money, which is of course wrong). This is because $B_n-G_n$ is a martingale. – Jon Peterson May 17 '13 at 11:01
  • 10
    I want to make clear that the upvote for Rhett Butler's comment "You have found a roulette winning strategy..." is mine and was cast in error when I was trying to flag this comment as spam. – Steven Landsburg May 17 '13 at 21:51
  • 1
    @Jon Peterson: A winning strategy also is one which guarantees many different people to win money in short runs - in particular if these people cannot be distinguished. –  May 27 '13 at 06:10
  • 1
    A tiny correction, years later: In the fourth comment, $(-1)^k$ should be $(-1)^{k+1}$. – Steven Landsburg Apr 26 '15 at 02:42
  • With a bit of algebra, one can show that the expected girl ratio can be expressed in terms of the hypergeometric function: 2F1(k,1,k+1,−1). https://randycity.github.io/blog/girl-ratio.html – Randy Lai Sep 11 '17 at 16:07
  • @DouglasZare, could you explain your solution for 1 family? I got confused a bit considering the possibilities b, gb, ggb, gggb, etc, where g and b are girl and boy born in sequence. The probability of $g^nb$ would be $2^{-n}$, therefore the proportion of girls would be $\Sigma_{n=0}^{\infty}n2^{-n-1}$, which leads to an incorrect result. – Michael Jun 19 '19 at 18:38
  • @Michael: The probability of $g^nb$ is not $2^{-n}$ but rather $2^{-(n+1)}$, and the proportion of girls in that case is $\frac{n}{n+1}$, leading to the expected proportion $\sum_{0}^{\infty} \frac{n}{n+1}2^{-(n+1)}$. – Douglas Zare Jun 21 '19 at 11:50
  • @DouglasZare, oops, thanks, don't know what I was thinking. – Michael Jun 21 '19 at 15:29
  • @DouglasZare Could you please explain where you got the expression from for 2 families? I understand the expression for 1 family, which is simply just the formula for expected proportion of girls for a family. – anonuser01 Apr 15 '20 at 19:55
  • @DouglasZare I enumerated out a bunch of terms and it seems to simplify to what you have. For example, I enumerated the probability of family 1 having 1 girl and the probability of family 2 having 1, 2, 3, 4, etc. girls, found the joint probability and the values (proportion of girls) among the 2 families, and then did it for family 1 having 2 girls, and so on. Is that what you did, or were you able to come up with that formula by inspection? If it's the latter, could you go into some of your thought process on how you got that formula quickly? – anonuser01 Apr 15 '20 at 20:09
  • @lamanon The calculation is similar to the binomial probability mass function. See the negative binomial distribution. – Douglas Zare Apr 16 '20 at 19:55
  • This post is all correct except for the last paragraph. This property you describe is true in general, but this problem is a simpler case. It's trivial to show that E[G]/E[G+B] = 1/2 if G and B are i.i.d variables...

    E[G]/E[G+B] => 1/(E[G+B]/E[G]) => 1/(E[G]/E[G]+E[B]/E[G]) => 1/(1+1) => 1/2

    – user4722818 Nov 02 '21 at 05:34
37

There is a closely related puzzle about cards. I was told it by Vin de Silva, who said he was told it by Imre Leader, but I have no idea what the original source is.

An ordinary deck of cards, face down, is placed in front of you in a stack. A dealer turns the top card of the stack face up and puts it on a separate pile, and does this repeatedly until you say "now". At that point he turns over the next card and stops. You can say "now" at any time from the very beginning (before the first card is turned over) until almost the very end (just before the last card is turned over). You win if the last card turned over --- the one turned over just after you say "now" --- is red. What is the winning strategy?

You can get yourself into all sorts of convolutions trying to solve this. For example, you might think that it's good to wait until lots of the cards revealed so far are black, because then the probability that the next card will be red is relatively high.

But the solution is that it makes no difference at all what you do. Your probability of winning is always 0.5. To see this easily, imagine that after you say "now", the dealer turns over not the top card of the stack, but the bottom one. Clearly this game is equivalent to the original one, and clearly your probability of winning is 0.5 no matter what you do.
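
The claim that no strategy beats 0.5 can be checked empirically. The sketch below simulates the game for two hypothetical strategies (say "now" immediately, or wait until more black than red cards have been revealed); the names, the trial count, and the seed are assumptions for illustration. If the player never says "now", the sketch forces the call just before the last card, as the rules require.

```python
import random

def play(strategy, rng):
    """Play one game; return True if the card turned after "now" is red."""
    deck = ["R"] * 26 + ["B"] * 26
    rng.shuffle(deck)
    reds_seen = blacks_seen = 0
    for i, card in enumerate(deck):
        cards_left = len(deck) - i  # face-down cards, including this one
        if cards_left == 1 or strategy(reds_seen, blacks_seen):
            return card == "R"      # "now": this card decides the game
        if card == "R":
            reds_seen += 1
        else:
            blacks_seen += 1

def win_rate(strategy, trials=20000, seed=0):
    rng = random.Random(seed)
    return sum(play(strategy, rng) for _ in range(trials)) / trials

def say_now_immediately(reds, blacks):
    return True

def wait_for_black_lead(reds, blacks):
    return blacks > reds

print(win_rate(say_now_immediately), win_rate(wait_for_black_lead))
```

Both estimated win rates should land close to 0.5, up to sampling noise.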

I'd like to take this easy solution and translate it into an equally compelling solution to the boy/girl puzzle, but right now I can't see how.

Tom Leinster
  • 27,167
  • It's funny that you mention that. I just discussed that puzzle in the StoxPoker.com forums (private), and was thinking of posting here to ask for the source. I learned of it on the TwoPlusTwo.com poker forums. – Douglas Zare Mar 12 '10 at 13:57
  • I also posted a variant in the ProjectEuler forums. http://forum.projecteuler.net/viewtopic.php?f=4&t=1445 Besides the symmetry argument, the probability of success in that puzzle is a martingale. – Douglas Zare Mar 12 '10 at 14:19
  • 3
    For the history of this problem, you could try asking Peter Winkler (at Dartmouth), who calls the bottom card of the deck the "Predestination Card." – Timothy Chow Apr 05 '10 at 17:58
  • 3
    It is not so clear to me why the answer is 0.5. So I understand that the dealer could have pulled out any card from the remaining deck and the game is equivalent. However, if a lot of black cards had been pulled already from the deck, then the number of black cards remaining in the deck would be low so it would be sensible to say any card pulled from the deck would have a high probability of being red. – Sandeep Silwal Dec 13 '15 at 04:22
  • 3
    @Sandeep: I think you are biased toward the favourable scenario where you've just seen X (say 5) black cards in a row and are thinking you should yell 'now'. You are discounting the equally probable scenario where you start by seeing X red cards in a row. For every favourable scenario there is an equally unfavourable one to counter-balance the average. So when you start playing, you might as well take your 50-50 chances and yell 'now'. – Alexandros Dec 02 '16 at 01:10
  • 1
    @SandeepSilwal My first thought was the same as yours. I even considered the extreme scenario where the first 26 cards all happened to be black, so there are no more black cards in the deck. Surely now the probability of winning is 100%? Actually, the key is to realize that you could have stopped at any point before this. At the point where there was still 1 black card left in the deck, you chose to play on instead of saying "now". So that's the point where you took a decision which had a 50/50 success chance. When you see the last black card, you simply know that you have won. – hb20007 Sep 01 '20 at 13:23
  • Peter Winkler also wrote a paper on this and similar problems: Games People Don't Play – BlueRaja Dec 19 '20 at 03:43
  • @Alexandros ok, let's say the probability is 0.5 whatever I do, the question is: how is this implied by considering the alternative case when the cards are turned from the bottom instead of the top? – Marco Disce Apr 21 '21 at 20:44
23

For those who still don't get it, it might help to consider this ultrasimplified example:

A certain family has a 3/4 chance of having 1 girl and a 1/4 chance of having 3 boys.

What is the expected number of girls in this family? 3/4. What is the expected number of boys? 3/4. What is the expected difference between the number of girls and the number of boys? Zero.

But what is the expected fraction of girl-births? There's a 3/4 chance that it's 100%, and a 1/4 chance that it's 0%. Therefore the expected fraction is 75%. Which, notably, is not 50%.

Moral: Just because the expected difference is zero, you can't conclude that the expected ratio is one.
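The three expectations in this example can be verified with exact arithmetic. This is only an illustrative sketch; the variable names are mine.

```python
from fractions import Fraction

# (probability, girls, boys) for the two possible outcomes of this family
outcomes = [
    (Fraction(3, 4), 1, 0),
    (Fraction(1, 4), 0, 3),
]

expected_girls = sum(p * g for p, g, b in outcomes)
expected_boys = sum(p * b for p, g, b in outcomes)
# Expected fraction of girl-births: average the ratio, not the counts.
expected_girl_fraction = sum(p * Fraction(g, g + b) for p, g, b in outcomes)

print(expected_girls, expected_boys, expected_girl_fraction)  # 3/4 3/4 3/4
```

The expected counts agree (3/4 each), yet the expected fraction of girls is 3/4 rather than 1/2.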

(There is of course nothing new here beyond what Douglas Zare has already made crystal clear, but I'm thinking the starkness of the example might help.)

  • 6
    Ratios versus differences doesn't address the main point, which is whether the family reproduction rule can break boy/girl symmetry in the underlying distribution of $(B,G)$. [It does gender-asymmetrize the allocation of boys and girls into sets called "families", but this extra structure does not play a role in the calculation requested by Google.] If the distribution is symmetrical then the proportion of girls will have expected value 1/2, because the random variables "proportion of girls" and "proportion of boys" will have the same probability distribution, and their sum is equal to 1. – T.. Jan 04 '11 at 05:40
  • 2
    You ask: "But what is the expected fraction of girl-births?" Why should this number interest anyone who cares whether family planning can influence the population equilibrium, or who asks "what fraction of the population is female?" Even if it is made crystal clear that the average of fractions is not equal to the fraction of averages, this does not entitle anybody to choose the wrong number. –  May 13 '13 at 20:02
  • 4
    Rhett Butler: The question asks about the ratio of boys to girls, not about the ratio of expected boys to expected girls. You ask why the ratio should interest anyone. I suggest you address that query to the person who posed the question, not the people who answered it. PS. Since it's been well established elsewhere that you're not the least bit interested in the question that was posed, or any of the interesting subsidiary questions that it raises, but only in blustering and hurling insults, I won't be responding to any followups. – Steven Landsburg May 13 '13 at 20:19
  • 1
    The question asks about the ratio of boys and girls and not about the mean value of averages in families. My own treatment and my recognition of this fact should show anybody that I am very interested in this question. I attribute your insulting manner to the fact that you recognize to have lost against Lubos Motl (who, as a Harvard string theorist, is certainly not less than you able to understand the simple error made by Douglas and accepted by you). –  May 13 '13 at 20:47
21

Let $X$ be the number of daughters of a certain couple. The probability that the first son of this couple is the $n$-th kid is $\frac{1}{2^{n}}$ and so $\mathbb E (X)=\sum_{n=0}^{\infty}\frac{n}{2^{n+1}}=1$. On the other hand the couple will have exactly one son so the expected proportion is 50-50.
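For readers who want the series evaluated explicitly, here is one standard way to see that it sums to 1 (a sketch; $x$ is an auxiliary variable introduced only for this derivation). The event $X=n$ means $n$ girls followed by a boy, which has probability $1/2^{n+1}$. Differentiating the geometric series $\sum_{n\ge 0}x^n=1/(1-x)$ for $|x|<1$ and multiplying by $x$ gives

$$\sum_{n=0}^{\infty} n x^{n}=\frac{x}{(1-x)^{2}},$$

so at $x=1/2$

$$\mathbb E(X)=\frac12\sum_{n=0}^{\infty}\frac{n}{2^{n}}=\frac12\cdot\frac{1/2}{(1/2)^{2}}=\frac12\cdot 2=1.$$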

Gjergji Zaimi
  • 85,056
  • 5
    E[A/(A+B)] is not E(A)/(E(A+B)). Showing that E(A)/E(A+B) = 1/2 is not enough. You need another assumption. – Douglas Zare Mar 12 '10 at 09:42
  • 3
    I was finding E(A/B), since B=1 this reduces to E(A). – Gjergji Zaimi Mar 12 '10 at 09:49
  • 6
    aha! So there are really two questions here: expected proportion of girls to total population and expected proportion of boys to girls. Tricky! – zeb Mar 12 '10 at 09:56
  • 4
    I guess there is some ambiguity about what number is meant by a proportion A:B. I read it as A/(A+B), which has the nice property that B:A is the complement so that computing E[A:B] is essentially the same as E[B:A]. If you interpret the proportion A:B as A/B, then this may be infinite, and E[A:B] can be 1 while E[B:A] is not 1, and may not exist. This calculation shows E[A/B]=1, but it does not compute E[B/A]. – Douglas Zare Mar 12 '10 at 11:07
  • 3
    Of course I agree, both in the ambiguity of the question and in your argument, which is why I gave you a +1. Your answer is more complete, and I took the lazy man's approach :-) – Gjergji Zaimi Mar 12 '10 at 11:21
  • @Douglas Zare: "Showing that E(A)/E(A+B) = 1/2 is not enough. You need another assumption." No, you need nothing else. The original question is: Can the family planning of a set of families result in a set of children such that E(A) differs from E(B)? The answer is no, since probability of 1/2 per girl is a martingale. You have answered another question not relevant in the present context. –  May 14 '13 at 09:37
  • @GjergjiZaimi. I didn't understand how E(X)=∑n/2^(n+1)=1. It looks to me like an arithmetico-geometric series (1/2*0 + 1/4*1 + 1/8*2 + ...) with answer of 2.5 http://en.wikipedia.org/wiki/Arithmetico-geometric_sequence – David Jun 29 '13 at 06:28
  • @Gjergji Zaimi: When I try to find the expected number of boy children (Y) for a couple, I was hoping to get $E(Y)=1$, but surprisingly I get $\frac12$. Can you help me understand what mistake I'm making? My calculation is: $E(Y)= 0\cdot P(Y=0) + 1\cdot P(Y=1)+ 2\cdot P(Y=2)+\dots = 0 + 1\cdot \frac12 + 2\cdot 0 + 3\cdot 0 + \dots = \frac12$ – KGhatak Jul 12 '17 at 09:29
18

I think this is already implicit in the heavily up-voted answer, but it may be worth clarifying: there are two kinds of expectations that we can talk about.

The first is the distribution of G/B, G/(G + B), B/G, B/(B + G), values for the entire population (along with its expected value, standard deviation, etc.). Here, the distribution is over all possible "runs of history", so to speak, in the sense that we average over all possible ways history could turn out. If the population is large enough (thousands? millions?), then the expected values of all these quantities are what you would expect from a 50:50 split, and the standard deviations are near zero. Thus, as far as demographic estimations of the overall population are concerned, 50:50 is the way to go. In fact, at the population level, the ratio of girls to boys cannot be influenced by stopping strategies; any influence must either (i) affect the relative probability of conception of a male versus a female fetus or (ii) adopt a post-conception filtering mechanism, such as induced abortion or infanticide.

The second is the expected G/B, G/(G + B), B/G, B/(B + G), etc., values over families. More generally, we may be interested in the distribution of different (G,B) values for different families. If we are interested in understanding family dynamics more thoroughly, we may also be interested in the birth orders, i.e., in what order girls and boys arise. Here, family stopping strategies could affect the distribution of (G,B) values and also of the birth orders. In particular, the strategy here ("stop as soon as you have a boy") gives 50% of the families with a single boy, 25% with one (older) girl and one (younger) boy, 12.5% with two older girls and one younger boy, and so on (assuming the complication of twins and triplets does not arise). This could have important demographic implications in the long term, when mating is done for the next generation (since birth order and the age gaps between children and their parents all play a role in mating and the creation of children). However, that is getting beyond the current question.

For this second sense, it is not just the expected value per family that matters, but rather, the specific distribution of families. As already pointed out, since the variables are not independent, E[G/B] is not the same thing as E[G]/E[B], so what variable we choose to average over affects what answer we get. Looking at the whole distribution conveys more information.

When demographers are making short-term population estimates, it is the first sense (expected values for the population over runs of history) that is relevant, so stopping strategies can be discounted unless they are accompanied by post-conception selective strategies or strategies that affect conception probabilities. A deeper understanding of society would require knowing things in both the first and the second sense.
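The two senses can be compared in a small simulation. This is a rough sketch under assumptions that are mine, not the answer's: a fixed number of completed families, independent births with probability 1/2 each, a fixed random seed, and the hypothetical helper name `simulate_families`.

```python
import random
from collections import Counter


def simulate_families(num_families, seed=0):
    """Return the number of girls in each family under 'stop at the first boy'."""
    rng = random.Random(seed)
    girls_per_family = []
    for _ in range(num_families):
        girls = 0
        while rng.random() < 0.5:  # a girl is born, so the family continues
            girls += 1
        girls_per_family.append(girls)  # each family then ends with one boy
    return girls_per_family


girls = simulate_families(200_000)
total_girls = sum(girls)
total_boys = len(girls)  # exactly one boy per completed family

# First sense: share of girls in the whole population.
population_share = total_girls / (total_girls + total_boys)
# Second sense: average over families of each family's own share of girls.
mean_family_share = sum(g / (g + 1) for g in girls) / len(girls)
# Fraction of families with 0, 1, 2, ... girls.
composition = Counter(girls)

print(population_share, mean_family_share)
print([composition[g] / len(girls) for g in range(4)])
```

With this many families the population share comes out close to 1/2, while the per-family average is close to 1 - ln 2 ≈ 0.31, and the family compositions follow the 50%, 25%, 12.5% pattern described above.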

Vipul Naik
  • 7,240
13

Another way to look at the "official" solution is to notice that for statistical purposes it does not matter which couple gets the next child. You "request" children from the couples in whatever manner you want, you always get a 1:1 expected ratio in boys and girls, regardless of the pattern in which you choose the next couple to produce another child.

Thorny
  • 1,618
  • 6
    Suppose you make a financial instrument which pays the proportion heads/(heads+tails) for a sequence of fair coin flips whose stopping point I control, with at most 20 flips. I'll pay 0.6 for this. Deal? The expected payoff for my stopping rule will not be 1/2. – Douglas Zare Mar 12 '10 at 14:27
  • 2
    @Douglas: You answered the false question. The ratio B/G of boys to girls in a population is (nearly) 1. And by a birth that has same probabilities b = 0.5, g = 0.5, b/g = 1, it is impossible to change this ratio. It has been asked whether the ratio B/G can be subject to manipulations. It can not. And as Thorny says and as I also few minutes ago found myself: It is completely irrelevant which couple decides to cease fire and which will continue. Therefore the independent variables will remain equal within the statistical margin. –  May 12 '13 at 16:42
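The bet in the first comment above can be made concrete under one assumed stopping rule. The comment does not say which rule is intended; the sketch below assumes "stop at the first head, or after 20 flips if no head appears" (the analogue of stopping at the first boy) and computes the exact expected payoff heads/(heads+tails). The function name is hypothetical.

```python
from fractions import Fraction

MAX_FLIPS = 20


def expected_payoff_stop_at_first_head(max_flips=MAX_FLIPS):
    """Expected value of heads/(heads+tails) when stopping at the first head."""
    total = Fraction(0)
    for n in range(1, max_flips + 1):
        # First head on flip n: probability 2^-n, payoff 1 head out of n flips.
        total += Fraction(1, 2 ** n) * Fraction(1, n)
    # No head within max_flips flips: payoff 0, so nothing is added.
    return total


payoff = expected_payoff_stop_at_first_head()
print(float(payoff))
```

Under this assumed rule the value is within about 10^-7 of ln 2 ≈ 0.693, so it exceeds both 1/2 and the 0.6 offered in the comment.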
8

It doesn't make much sense to compute the expected proportion of girls per family. Take two families, one with just a single boy, and another with eight girls and a boy. The average number of girls (resp. boys) per family is four (resp. one); and indeed, four times as many girls as boys were born. But the average proportion, which is what you are calculating, is (0 + 8/9) / 2 = 4/9, which is less than 1/2! So although your calculation may be correct, the answer doesn't really mean very much.

TonyK
  • 2,191
8

When I posted this problem on my blog, one commenter (who prefers to remain anonymous but gave me permission to repost here) noted a cool way to estimate the expected value of $B/(B+G)$.

Write $f(G)=B/(B+G)$ and expand in a Taylor series around $B$:

$$f(G)=f(B)+f'(B)(G-B)+(1/2)f''(B)(G-B)^2+...$$

Now take expected values: We have $E(G-B)=0$ and $E[(G-B)^2]=2B$, so

$$E[f(G)]=f(B)+(1/2)f''(B)(2B)+...$$ $$=(1/2)+1/(4B)+...$$

Now the number of boys is equal to the number of families, so for $k$ families, the proportion of boys is well estimated by

$$\frac{1}{2}+\frac{1}{4k}$$

and of course it's easy to get better estimates by going to higher terms in the Taylor series.
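To see how good the second-order estimate is, one can compare it with a direct numerical evaluation. The sketch below rests on assumptions of mine: exactly $k$ completed families (so $B=k$), independent births with probability 1/2, and hence a negative binomial distribution for $G$. The names `expected_boy_share`, `taylor_estimate` and the truncation point `max_girls` are hypothetical.

```python
from math import comb


def expected_boy_share(k, max_girls=600):
    """Approximate E[B/(B+G)] for k completed families, truncating G at max_girls."""
    total = 0.0
    for g in range(max_girls + 1):
        # P(G = g): g girls in total before the k-th boy is born.
        prob = comb(g + k - 1, g) / 2 ** (g + k)
        total += prob * k / (k + g)
    return total


def taylor_estimate(k):
    """Second-order Taylor estimate 1/2 + 1/(4k) from the text."""
    return 0.5 + 1 / (4 * k)


for k in (1, 2, 10, 100):
    print(k, expected_boy_share(k), taylor_estimate(k))
```

For $k=1$ the direct sum is $\ln 2 \approx 0.693$ against an estimate of $0.75$, so the two-term expansion is rough for very few families; the gap shrinks quickly as $k$ grows.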

My commenter also adds the following (in my opinion, quite insightful) remarks:

Independence (or, more precisely, correlation) isn’t the only issue. Even for independent variables, the expected value of a ratio is not equal to the ratio of the expected values. (The expected value of a product of uncorrelated variables is the product of the expected values, though.) This is one of the most important keys to understanding this problem, I believe. And this is why I suggested the Taylor series to expand the ratio about its mean. I also think it is a little easier to find the expected proportion of boys because the random part (G) only appears in the denominator. Also, B is equal to the number of mothers, so I don’t believe B and G are independent because I don’t believe the number of girls is independent of the number of mothers.
  • 3
    Steven, that is incorrect. The issue is whether the proportion of boys, denoted $f(B,G)$ above, is convex as a function of two variables so that $E[f(B,G)] > f(E[B],E[G]) = 1/2$. It isn't convex, as simple calculations demonstrate. (See the comments on convexity and Jensen's inequality under Douglas Zare's posting). It is convex if you condition on B (i.e., restrict f to lines B=constant), and concave if you condition on G. Such conditioning is foreign to the Google problem and imposed artificially in Doug's model. – T.. Jan 04 '11 at 06:05
  • T: I can't tell what assumptions you're making. Surely if we assume a fixed finite number of families, all of whose children are infertile, the calculation above is correct (or if it isn't, I hope you can point to the exact place where it goes wrong). If you are making some alternative assumptions, it would be good for you to state them. – Steven Landsburg Jan 04 '11 at 06:20
  • 7
    I'm not assuming any specific model, but pointing out that differences between your answer and 1/2 arise from artificial (i.e., gender-asymmetric) conditioning of the problem. Assuming $k$ families as in Zare's model or your present suggestion, is equivalent to assuming "at most $k$ boys in population", or exactly $k$ boys if it is also assumed the families complete their reproduction. No such asymmetric conditioning was part of the Google problem. Your calculations show that a symmetrical distribution can be approximated by asymmetric ones, not that the Google problem is asymmetric. – T.. Jan 04 '11 at 07:20
  • 2
    T.: If you're not assuming any specific model, I don't see how you can be getting a specific answer. I think it would help me understand your point if you could give an explicit example (specifying number of families, mortality rates, fertility of the chidren, etc) in which the answer is 1/2. – Steven Landsburg Jan 04 '11 at 14:58
  • 2
    Dear Steven, I think that T. is saying something similar to the second para. of Vipul Naik's answer, namely, that in a large population, a random birth (happening somewhere in the population) will be a girl 1/2 the time, and a boy the other 1/2 of the time. So 1/2 the population will be boy, and half girl. As Vipul then goes on to discuss, the growth of the population, for example, may be affected by stopping strategies of the kind discussed here. T.'s point then is that, if you look at how things depend on the number of families, i.e. on $k$, you can find an apparent asymmetry, ... – Emerton Jan 05 '11 at 14:20
  • 3
    ... but this is only apparent, because $k$ is not a quantity which is independent of the stopping rule, but is in fact being influenced by the stopping rule (it is related to the growth of the population, which as Vipul notes will potentially be affected by the stopping rule). Does this make any sense? – Emerton Jan 05 '11 at 14:22
  • Emerton: I am having trouble following your argument.
    1. Certainly E(G) = E(B), whether the population is large or small.

    2. Certainly it does not follow from this that "1/2 the population will be boy and half girl". If this means G/G+B = 1/2, well, G/G+B could in fact be just about anything at all. If it means E(G/G+B) = 1/2, that's a better conjecture but it's still false.

    3. It's easy to write down a simple model where k is independent of the stopping rule and still E(G/G+B) != 1/2. In fact, it's quite difficult to construct a model in which E(G/G+B) is 1/2.

    – Steven Landsburg Jan 05 '11 at 14:37
  • 1
    Dear Steven, I don't see how the population is not 1/2 boy and 1/2 girl, but maybe I'm being dense. I imagine a population somewhere. Now children are being born. 1/2 the time the child is a boy, the other 1/2 its a girl. These children keep appearing over time. How does the population not (in the long run) stablize to be 1/2 boy and 1/2 girl? Isn't the same as imagining a big pile of marbles, some red, some black. Whatever is there to begin with, I start adding marbles, red half the time, black half the time. In the long run, the pile will be half red and half black. Of course, ... – Emerton Jan 05 '11 at 16:12
  • ... marbles don't die, so they are not being removed from the pile, whereas in a population, people are being removed. Is this what you are worried about? – Emerton Jan 05 '11 at 16:13
  • 1
    "Isn't the same as" was supposed to read "Isn't it the same as" (and sorry for the slightly ungrammatical sentences that follow, as well as the other sundry typos and misspellings). – Emerton Jan 05 '11 at 16:16
  • Emerton: Your argument proves that E(G)=E(B). It does not address the question of E(G/G+B). It will help a lot to clarify things, I think, if you can tell me which of the following statements is the first one you disagree with: a) For a single family, E(G/G+B) = 1-log(2). b) Therefore for a country with just one family, E(G/G+B)= 1-log(2). c) Therefore, there exists a model in which, for the country, E(G/G+B) != 1/2. d) Therefore it is not true that in every model, for the country, E(G/G+B) = 1/2. (CONTINUED...) – Steven Landsburg Jan 05 '11 at 18:13
  • e) Therefore to conclude that E(G/G+B) = 1/2, one must make at least one additional assumption that is not stated in the problem. – Steven Landsburg Jan 05 '11 at 18:14
  • Emerton: PS--- the example I've posted here might help:

    http://www.landsburg.org/alt.txt

    – Steven Landsburg Jan 05 '11 at 18:18
  • Dear Steve, I agree that in a single family the situation is different, but in a country, the dynamics of individual families might be very complicated. This is why I am think just in terms of the overall population. If we forget about children for a moment, and just think about marbles in a pile, am I right in arguing that if I add marbles to the pile with an even chance of red or black, that in the long run the proportion of red to black will be 1:1? Note: I am not asking about expected values, but about the actual proportions in the pile. – Emerton Jan 05 '11 at 18:23
  • 1
    (Also, I should add that I'm happy to keep discussing this, since I would like to clarify my misunderstandings, but if you would prefer not to, please just let me know.) – Emerton Jan 05 '11 at 18:24
  • Dear Steven, I looked at your example, and of course I don't disagree with the mathematics; I guess my disagreement is one of interpretation (which might be what you mean by model). The thing is, I don't think you can call it a country. E.g. in scenario 1, when the outcome is B/B, the country dies off, so in the long run the population is 0 and there are neither boys nor girls. I am imagining a large country in which there are many births happening over a long period of time (hence my analogy to the pile of marbles and my claims about what happens "in the long run"). – Emerton Jan 05 '11 at 18:47
  • Emerton: I'm happy to keep discussing, but let me know if you think there's a better forum than these comments. Here is what I see as the main point: If I call my example a "country", then it's a country in which E(G/G+B) is not 1/2. You say this doesn't count as a country. I think (and I hope this doesn't sound too argumentative) that at this point it's incumbent on you to give an explicit example of a "country" in which the answer is 1/2 --- specifying things like number of families, mortality rates, fertility of subsequent generations, etc. I suspect you won't find this easy. – Steven Landsburg Jan 05 '11 at 19:02
  • Dear Steven, Thanks for your reply. I think after reading your example and your last comment, I understand your point, and presumably you understand what I am arguing. I agree that my model is idealized (e.g. the ratio will presumably never be exactly 1:1, just because even in a large sample population, the actual value of a random variable won't precisely equal the expected value, just be very close to it), but I think it gives a good way to think about the question (and, although I can't be sure of course, I think it explains the sense in which T. was not "assuming any specific model"). – Emerton Jan 06 '11 at 00:39
  • I don't think I want to try to model an actual country with this stopping policy, since I don't think I could write down anything like a realistic model for any kind of country-sized population, whatever the policy on reproduction is! So I might bow out for now. Thanks very much for taking the time to explain your point of view. Best wishes, Matt – Emerton Jan 06 '11 at 01:56
  • @Emerton: It would be mathematically shocking if the expected proportion were less than 1/2 for the easily computable cases of k=1, k=2, k=10, but exactly 1/2 for country-sized numbers of families. And this doesn't happen, there is a bias for any k. If your intuition tells you that for large enough k, the expected value is not just close to 1/2 but equal to 1/2, then your intuition is wrong. Let me ask something which may help. Suppose there are unusually many girls born in one year. Is the population unusually large, or small, the next year? Ans: Large. That means girls get weighted less. – Douglas Zare Jan 06 '11 at 05:44
  • Another way to look at it is as follows: Suppose each year, $1 billion will be split equally among the members of the population. If you are a 2-year-old boy, you expect to get a slightly greater share of this than a 2-year-old girl does, since you expect to split it with fewer others since the population includes no younger siblings. That your younger siblings affect the population size, and boys have no younger siblings, means that boys get weighted more than girls. – Douglas Zare Jan 06 '11 at 05:52
  • 1
    Dear Douglas, As I wrote above, I've bowed out. Regards, Matthew – Emerton Jan 06 '11 at 07:56
  • 5
    Although Matthew has bowed out, I just want to say what a pleasure it is to read his comments generally, not only for their erudition but for their wonderful civility. Too bad not all comments under this question in particular are so civil... – Todd Trimble May 17 '13 at 13:05
7

Of course, in the real world the sex ratio of a couple's offspring is a random variable with mean near 0.5. If a couple continues to produce offspring until a boy is produced, the couples who tend to produce more girls will have larger families, and the proportion of girls will be higher than 50%.

Chris Godsil
  • 12,043
5

Caveat: This is not an entirely serious answer.

There has been some (heated?) discussion as to the sensitivity of the various answers to the particular model. I thought, for my own amusement, that I would do some Monte-Carlo experiments with a "plausible" model involving Pilgrims traveling to the New World. However, in thinking about possible models, I came up with the following issue. Suppose we assume that:

  1. Once Pilgrims marry, they stay married, and do not re-marry if their spouse dies.
  2. No living Pilgrim has a direct living ancestor older than their grandparents.

Under these assumptions, it follows that if there were N male Pilgrims in the first settlement, then, at any given time, there are at most 3N male Pilgrims. Moreover, the probability of the settlement dying out over several generations (because all the children are girls) is non-zero. By Kolmogorov's zero-one law (overkill), it follows that almost surely the Settlement will die out, and not become the kick-ass country it may well have been.

  • 2
    Assuming current cosmological models are reasonably accurate, we can confidently claim that the USA will die out in finite time. It seems the end result is consistent with your model. – S. Carnahan Jan 06 '11 at 06:24
4

A colleague, Eugene Salamin, came up with what I would consider the "Book" solution:

Phooey, this isn't at all a mathematical puzzle. A social convention cannot override biology, so the proportion of boys and girls is the biologically determined one, nominally 1/2, 1/2.

I didn't immediately understand his reasoning. But if all families are enumerated 1,2,3,... and you imagine each family's sequence of children placed in numerical order to make one infinite (or very long) sequence, then the resulting sequence of B's and G's is statistically identical to one you would get by repeatedly flipping a fair coin.

Viewed this way, the rule for stopping when the first B is reached is clearly a red herring! And clearly the proportion of boys and girls will be equal. (At least asymptotically, with probability 1, by the Strong Law of Large Numbers.)

(Likewise, if the original question is varied so that Prob(B) = p and Prob(G) = q, p+q=1, then by the same reasoning the ultimate proportions of boys and girls are p and q, respectively.)
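The large-population version of this claim, including the variant with unequal probabilities, can be checked with a quick simulation. This is a sketch under my own assumptions: a fixed number of completed families, independent births, a fixed random seed, and the hypothetical function name `boy_share`. It illustrates only the proportion in one large simulated population, not the exact expected value for a small number of families.

```python
import random


def boy_share(num_families, p_boy, seed=0):
    """Share of boys when every family stops at its first boy."""
    rng = random.Random(seed)
    girls = 0
    for _ in range(num_families):
        # Each birth is a girl with probability 1 - p_boy; stop at the first boy.
        while rng.random() >= p_boy:
            girls += 1
    boys = num_families  # one boy per completed family
    return boys / (boys + girls)


for p in (0.5, 0.3):
    print(p, boy_share(100_000, p))
```

In both cases the simulated share of boys lands close to the per-birth probability `p_boy`, in line with the reasoning above.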

P.S. On the other hand, this does not work for each possible stopping rule. Say we're back to the usual assumption of each birth having an equal chance of being a boy or girl. In an imaginary world, suppose each family stopped having children when the proportion of the girls in their family first exceeded 2/3. Then the ratio of girls to boys in the population will clearly be greater than 2.

  • The stopping rule matters because the ratio G/(G+B) for the population becomes a biased estimator of the probability that each child is a girl. Just as it is biased for 1 family, it is biased for 2, 10, or any finite number of families. Each boy or girl is weighted by 1/(population size), and girls tend to belong to larger populations.

    I'm not sure what you mean by "a social convention cannot override biology" or "clearly a red herring."

    If a family stops when there are at least twice as many girls as boys, then with positive probability (I believe 1/2 phi^-1) the family will not stop.

    – Douglas Zare Jul 08 '10 at 19:33
  • If you modify that example, then each family can stop with at least as many boys as girls with probability 1. That means with probability 1, the country's population will have at least as many boys as girls, and some populations will have more boys than girls. – Douglas Zare Jul 08 '10 at 19:37
  • 1. The words "a social convention cannot override biology" (not mine) mean just that the ultimate proportions of boys and girls are the same as the proportions in which boys and girls are born.

    2. The stopping rule (in question) is a red herring because any stopping rule of the form "Stop as soon as a certain consecutive string of B's and G's occurs" will result in the same ratio of 1:1 (or more generally p:q) as the probabilities of B vs. G (or H vs. T) are in.

    3. You are right that the 2:1 stopping rule is not almost certain to occur. Oops.

    4. What modification are you thinking of?

    – Daniel Asimov Jul 08 '10 at 23:43
  • 3
    Douglas, if every family uses a stopping rule that enforces B > G (with probability 1, such as "reproduce until B > G") this obviously does not and cannot enforce a surplus of boys in the population. It is especially clear if you replace families/children by the isomorphic setup of gamblers/cointosses. If every gambler follows a "play until ahead" strategy that in no way implies a loss for a casino offering fair games. It does alter the allocation of tosses (children) to gamblers (families) but for the casino (population) the allocation is irrelevant. – T.. Jul 09 '10 at 16:46
  • @Daniel Asimov: 1 and 2) In that case I disagree with the statement. The proportion of girls in the population is not a martingale, and so it should not be a surprise that there are stopping rules which change the expected value. 4) A stopping rule suggested by your modification is that each family stops with at least one child and at least as many boys as girls. Another is that each family stops with 1 more boy than girl. With a finite number of families, the expected proportion of boys in the generation is not 50%. E[B] would not exist, but E[B/(B+G)] would. – Douglas Zare Jul 10 '10 at 04:04
  • 1
    @T: What follows "obviously" in your response is false. With probability 1, the families can stop with a surplus of boys. The optional stopping theorem has a finiteness assumption which is necessary and which is violated in this case, so the conclusion does not hold. It is possible to choose a stopping rule on a stream of fair coin flips so that you stop with probability 1, and when you stop, there are more heads than tails. The expected number of tosses for this stopping rule must be infinite. Again, the proportion of girls is not a martingale, so you can't expect the OST to apply anyway. – Douglas Zare Jul 10 '10 at 04:12
  • 6
    @Douglas: what followed "obviously" was the observation that it is not possible for stopping to enforce a surplus of boys in the population, although it can certainly enforce this, eventually, within each family. If you define "population" (i.e., the set of coin flips over which G/(G+B) is calculated) in terms of the stopping rule, as you did with your model with completed families only, then of course what is true for the stopping rule can be true of the population, but it would be hard to argue that the problem corresponds to any such model. – T.. Jul 11 '10 at 17:42