5

I was given the following question in a course on the probabilistic method. I have now been struggling with it for a couple of days and didn't make any real progress. This is the question:

Prove that for sufficiently large $n$, if $S_1, S_2, \ldots, S_k \subseteq \{1, 2, \ldots, n\}$ and $k\le \frac{1.99n}{\log_2{n}}$ then we can find some $X,Y \subseteq \{1, 2, \ldots, n\}$ so that $X\ne Y$ and for all $1\le i \le k$ we have $|X\cap S_i| = |Y\cap S_i|$.

I noted that we can assume that $\bigcup_{i=1}^{k}S_i = \{1, 2, \ldots, n\}$ because if not we can simply take $X = \emptyset, Y = \{1, 2, \ldots, n\} \setminus \bigcup_{i=1}^{k}S_i$ and then $|X\cap S_i| = |Y\cap S_i|=0$.

If we choose at random some $X,Y \subseteq \{1, 2, \ldots, n\}$ then for a fixed $i$ we have $$\mathrm{Pr}(|X\cap S_i| = |Y\cap S_i|) = {\binom{2|S_i|}{|S_i|} \over 2^{2|S_i|}} $$ But I could not calculate, or at least bound from below the probability that it happens for all $i$ since I do not know how the $S_i$ intersect.

One thing I thought of is that from $k\le \frac{1.99n}{\log_2{n}}$ we know that $2^{2n/k} \ge n^{2/1.99} > n$ which suggests we might divide ${1, 2, \ldots, n}$ into disjoint sets of size $k$: $I = \{A_1, A_2, \ldots, A_{n/k}\}$, and then look at some function $f\colon\{1, 2, \ldots, n\} \to \mathcal P(I) \times \mathcal P(I)$ which cannot be onto, but I don't know what $f$ to take.

So basically I did not make any progress with this problem. Any help will be appreciated.

Smylic
  • 6,715

1 Answers1

5

We would like to apply the pigeonhole principle to this problem. Unfortunately, at the moment, the number of possible values of $(|S_1 \cap X|, |S_2 \cap X|, \dots, |S_n \cap X|)$ is bounded at best by $$(|S_1|+1) \cdot (|S_2|+1) \cdot \dots \cdot (|S_k| + 1)$$ which could be much larger than $2^n$.

To fix this, we first apply the probabilistic method to show that for most choices of $X$, the value of $|S_i \cap X|$ lies in a much narrower interval for all $i$. Then we can discard lots of unlikely "pigeonholes" while losing only a few "pigeons", and after this, the pigeonhole principle will apply.

The general idea at work is a very useful trick to learn, and you will learn better if you stop reading this answer and try this yourself right now.


If we choose $X$ at random, $|S_i \cap X|$ is a binomial random variable with mean $|S_i|/2$, and by a standard large deviation bound, $$\Pr\left[\Big||S_i \cap X| - |S_i|/2\Big| \ge t\right] \le 2 \exp(-2t^2/|S_i|) \le 2 \exp(-2t^2/n).$$ Set $t = \sqrt{n \log n}$, and we get a bound of $2/n^2$. So with probability $1 - k \cdot \frac{2}{n^2} > \frac12$, $|S_i \cap X|$ lies in an interval of size $2 \sqrt{n \log n}$ for every $i$. From here on, we restrict ourselves to just those subsets of $\{1,2,\dots,n\}$ for which this bound holds for each $i$: there are at least $2^{n-1}$ of these.

If $X$ is such a subset, then the vector of values $(|S_1 \cap X|, |S_2 \cap X|, \dots, |S_n \cap X|)$ can have at most $(2 \sqrt{n \log n})^k$ different values. So we will have more pigeons than pigeonholes if we show that $$(2 \sqrt{n \log n})^k < 2^{n-1} \quad \Longleftrightarrow \quad k\, \log_2 (2 \sqrt{n \log n}) < n-1.$$ If we forget about all the less-significant terms of this, this is saying that $\frac12 k \log_2 n < n$ or $k \le \frac{2n}{\log_2 n}.$ All the less-significant terms can be rounded up at the cost of making the $2$ into the $1.99$, so this inequality follows from the given bound of $k$.

Once we have this inequality, the pigeonhole principle applies, so we can choose two subsets $X, Y \subseteq \{1,2,\dots,n\}$ that fall into the same pigeonhole, and therefore have $|X \cap S_i| = |Y \cap S_i|$ for all $i$.

Misha Lavrov
  • 142,276
  • Wow, thank you for the detailed answer. I didn't think of applying a probabilistic method first to narrow the range of possibilities and then apply a counting argument. Very useful technique. – Alex Ermolin Apr 14 '17 at 18:37
  • @user91015 The one I actually checked to make sure the constants were right was Hoeffding's inequality, but there's many inequalities that give a similar result so I wanted to be vague. Lots of people call these "Chernoff-type bounds" in the collective, but crediting Chernoff over Bernstein, Hoeffding, or Rubin (who came up with similar inequalities first) seems unfair. – Misha Lavrov Apr 15 '17 at 23:02