
Let $S_1, \ldots, S_k$ be subsets of $\{1, \ldots, n\}$ with $k \leq 1.99 \frac{n}{\log_2(n)}$. Prove that there are two distinct subsets $X, Y \subseteq \{1, \ldots, n\}$ such that $|X \cap S_i| = |Y \cap S_i|$ for all $1 \leq i \leq k$.

I am sure this is supposed to be done with some concentration argument, but I have no idea what the right probability space to look at is. So far I have tried to pick $X$ and $Y$ independently with the same distribution, including each element $j$ in the set independently with some probability $p_j$. The random variable $|X \cap S_i| - |Y \cap S_i|$ then has mean zero, and I could use some concentration inequality followed by a union bound. To make sure that the sets $X$ and $Y$ are also distinct, I need to pick the parameters $p_j$ close enough to $1/2$; the $p_j$ should also somehow depend on the size of the smallest set containing the element $j$. But the analysis got too messy, and I doubt that this approach is even feasible. Does anybody know what the right space to look at is?
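For concreteness, here is a small Monte Carlo sketch of the setup I have in mind, with every $p_j = \frac12$ and a random hypothetical family $S_1,\dots,S_k$ (it is only meant to illustrate the probability space, not to prove anything):

```python
import random

# Illustrative sketch only: sample X and Y independently, including each element
# with probability 1/2, and estimate how often X != Y while all intersection
# sizes |X ∩ S_i| and |Y ∩ S_i| agree.  The family S_1, ..., S_k is random and
# purely hypothetical.

def random_instance(n, k, rng):
    """Pick k random subsets S_i of {0, ..., n-1}."""
    return [frozenset(j for j in range(n) if rng.random() < 0.5) for _ in range(k)]

def random_subset(n, rng):
    """Include each element independently with probability 1/2."""
    return frozenset(j for j in range(n) if rng.random() < 0.5)

def estimate_collision_probability(n, k, trials=100_000, seed=0):
    rng = random.Random(seed)
    sets = random_instance(n, k, rng)
    hits = 0
    for _ in range(trials):
        X, Y = random_subset(n, rng), random_subset(n, rng)
        if X != Y and all(len(X & S) == len(Y & S) for S in sets):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # n = 20 is far from "sufficiently large"; this only illustrates the setup.
    print(estimate_collision_probability(n=20, k=4))
```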

Edit: $n$ is supposed to be taken sufficiently large, as Mike Earnest pointed out.

  • This is false. Let $n=2$, let $k=2$, and let $S_1=\{1\}$ and let $S_2=\{2\}$. Note that $k\le 1.99 \frac{n}{\log_2 n}$ is satisfied, but $X$ and $Y$ do not exist. – Mike Earnest Feb 10 '23 at 18:04
  • This would be true if $1.99$ was replaced with $0.99$. Is this what you intended? – Mike Earnest Feb 10 '23 at 18:04
  • @MikeEarnest you are right, $n$ is supposed to be sufficiently large. The constant $1.99$ is right though. – zimba bwe Feb 10 '23 at 18:37

1 Answer


I do not have a concentration argument, but I can prove this using entropy. We just need three results about entropy:

  1. $H(X, Y)\le H(X)+H(Y)$.
  2. The entropy of the binomial random variable $\text{Bin}(m,\frac12)$ is $\frac12\log_2 m+O(1)$ as $m\to\infty$ (a quick numerical check of this is sketched just after this list).
  3. If $f$ is a deterministic function, then $H(f(X))\le H(X)$.
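As a numerical sanity check of fact $2$ (not needed for the proof; the script below is just an illustration), one can compute the exact entropy of $\text{Bin}(m,\frac12)$ and observe that its difference from $\frac12\log_2 m$ stays bounded, approaching $\frac12\log_2\frac{\pi e}{2}\approx 1.047$:

```python
from math import comb, log2

def binomial_entropy(m):
    """Exact entropy in bits of Bin(m, 1/2)."""
    total = 0.0
    for j in range(m + 1):
        p = comb(m, j) / 2**m
        if p > 0.0:  # skip terms whose probability underflows for very large m
            total -= p * log2(p)
    return total

for m in (10, 100, 1000, 2000):
    # difference from (1/2) log2(m) stays bounded, tending to about 1.047
    print(m, binomial_entropy(m) - 0.5 * log2(m))
```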

Suppose $X$ is chosen uniformly at random from all $2^n$ subsets of $\{1,\dots,n\}$. Consider the entropy of the random vector ${\bf x}$ defined by $$ {\bf x}:=(|S_1\cap X|,|S_2\cap X|,\dots,|S_k\cap X|).\tag1 $$ Since $|S_i\cap X|$ is distributed like $\text{Bin}(|S_i|,\frac12)$ and $\tfrac12 \log_2|S_i|\le \tfrac12 \log_2 n$, properties $1$ and $2$ give $$ H({\bf x})\le H(|S_1\cap X|)+\dots+H(|S_k\cap X|)\le k(\tfrac12\log_2 n+O(1)). $$
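For a small exact illustration of this chain of inequalities (the instance below is an arbitrary hypothetical example and plays no role in the proof), one can enumerate all $2^n$ subsets, compute the distribution of ${\bf x}$, and compare $H({\bf x})$ with the sum of the marginal entropies:

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(counter, total):
    """Entropy in bits of the empirical distribution given by `counter`."""
    return -sum((c / total) * log2(c / total) for c in counter.values())

# Hypothetical small instance, chosen only for illustration.
n = 12
sets = [frozenset({0, 1, 2, 3, 4, 5}),
        frozenset({3, 4, 5, 6, 7, 8}),
        frozenset({0, 2, 4, 6, 8, 10})]

joint = Counter()
marginals = [Counter() for _ in sets]
for bits in product((0, 1), repeat=n):           # every subset X of {0, ..., n-1}
    X = frozenset(j for j in range(n) if bits[j])
    vec = tuple(len(X & S) for S in sets)
    joint[vec] += 1
    for i, v in enumerate(vec):
        marginals[i][v] += 1

total = 2**n
print("H(x)                =", entropy(joint, total))
print("sum_i H(|S_i ∩ X|)  =", sum(entropy(m, total) for m in marginals))
# 1.05 is a numerical stand-in for the O(1) term: the difference
# H(Bin(m,1/2)) - 0.5*log2(m) stays below about 1.047 for all m.
print("k*(0.5*log2(n)+1.05) =", len(sets) * (0.5 * log2(n) + 1.05))
```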

Now, let us prove the contrapositive. Assume that there do not exist distinct subsets $X$ and $Y$ for which $(|S_1\cap X|,\dots,|S_k\cap X|)=(|S_1\cap Y|,\dots,|S_k\cap Y|)$. This means that the correspondence from subsets to vectors defined by $(1)$ is injective, so one can recover the subset from its vector alone. That is, there exists a function $f$ from vectors to subsets such that $f({\bf x})=X$. Since $X$ is uniform over $2^n$ subsets, $H(X)=n$, and property $3$ gives $$ n=H(X)=H(f({\bf x}))\le H({\bf x})\le k(\tfrac12\log_2 n+O(1)). $$ We have proved that $$k\ge \frac{n}{\frac12\log_2 n+O(1)}=2\cdot \frac{n}{\log_2 n+O(1)},$$ which for large enough $n$ implies $k>1.99\frac{n}{\log_2 n}$.
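A toy illustration of the counting behind this injectivity step (again with a hypothetical family, not part of the argument): there are only $\prod_i(|S_i|+1)\le (n+1)^k$ possible vectors, so as soon as $2^n$ exceeds that number, two distinct subsets must share a vector, and a collision can be found by grouping subsets by their vector:

```python
from collections import defaultdict
from itertools import product

# Hypothetical small family; the code finds an explicit pair X != Y with the
# same intersection-size vector by bucketing all 2^n subsets by their vector.
n = 10
sets = [frozenset({0, 1, 2, 3, 4}), frozenset({5, 6, 7, 8, 9})]

by_vector = defaultdict(list)
for bits in product((0, 1), repeat=n):
    X = frozenset(j for j in range(n) if bits[j])
    vec = tuple(len(X & S) for S in sets)
    by_vector[vec].append(X)

# 2^10 = 1024 subsets map to at most 6*6 = 36 vectors, so a bucket must collide.
X, Y = next(subs for subs in by_vector.values() if len(subs) >= 2)[:2]
print("shared vector:", tuple(len(X & S) for S in sets))
print("X =", sorted(X), " Y =", sorted(Y))
```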

Mike Earnest
  • I actually did find the same question I had (link). The solution there did use concentration and the pigeonhole principle. I am not familiar with information-theoretic tools, so I cannot confirm, but the solutions feel like they should be equivalent in some sense. – zimba bwe Feb 10 '23 at 20:49