
Consider a sequence of fixed non-singular $n \times n$ matrices $A_i$ whose entries lie in $\{0,1\}$, and a sequence of independent random $n$-dimensional vectors $x_i$ whose entries are chosen independently and uniformly from $\{0,1\}$. Assume $n$ is large.

We know $H(A_ix_i) = n$. This is because $A_i$ is invertible and so $A_ix_i$ tells us precisely what the values of $x_i$ are.
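
A quick sanity check of the invertibility argument, in Python (the particular $3\times 3$ matrix below is just one arbitrary non-singular $0$-$1$ example, not anything from the question):

```python
import itertools
import numpy as np

# An arbitrary non-singular 0-1 matrix (upper triangular, so det = 1);
# any invertible choice would do.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
n = A.shape[0]

# Map every x in {0,1}^n through A over the integers; injectivity means
# A x determines x, so H(A x) = H(x) = n bits for uniform x.
images = {tuple(A @ np.array(x)) for x in itertools.product((0, 1), repeat=n)}
print(len(images) == 2 ** n)  # True: the map is injective
```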

I am interested in $$y=H\left(\sum_{i=1}^{\ell} A_ix_i\right)$$ and in particular, under what circumstances is $y$ much larger than $n$?

We know that if every $A_i$ is identical and each is simply the identity matrix, then $y = nh_B(\ell)$, where $h_B(t) = H(B(t,1/2))$ is the entropy of a Binomial$(t,1/2)$ random variable. Therefore, under these circumstances $y \approx C_1n\log_2{\ell}$ for some constant $C_1 >0$.
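
To make the constant concrete, $h_B(\ell)$ can be evaluated numerically; a short sketch (purely illustrative) comparing it with the entropy $\tfrac12\log_2(\pi e\ell/2)$ of a Gaussian of matching variance $\ell/4$:

```python
import math

def binom_entropy_bits(l):
    """Shannon entropy (in bits) of a Binomial(l, 1/2) random variable."""
    probs = [math.comb(l, k) / 2 ** l for k in range(l + 1)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# h_B(l) approaches (1/2) log2(pi e l / 2) as l grows, which is
# consistent with y = n * h_B(l) = Theta(n log l) in the identity case.
for l in (4, 16, 64, 256):
    print(l, round(binom_entropy_bits(l), 3),
          round(0.5 * math.log2(math.pi * math.e * l / 2), 3))
```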

What properties do the matrices $A_i$ have to have for $y$ to be of the form $C_2n\ell$ for some constant $C_2>0$?

It seems plausible that at the least the matrices $A_i$ should be dense to ensure that the range of values each entry of $A_ix_i$ can take is large enough. But is this sufficient?

  • Let $w_i=A_i x_i$. If the $A_i$ are fixed and the $x_i$ are independent, then the $w_i$ are independent and $H(Y) = \sum H(w_i) = \lfloor\log_2{n}\rfloor n$. What am I missing? – leonbloy Dec 18 '15 at 14:23
  • In the above, by $H(Y)$ I meant $y$ – leonbloy Dec 18 '15 at 14:30
  • @leonbloy This can't be right. Look at the case where the $A_i$ are all the identity matrix. In general it's not always true that $H(X+Y) = H(X) + H(Y)$ if $X$ and $Y$ are independent. –  Dec 18 '15 at 14:49
  • Of course, my bad – leonbloy Dec 18 '15 at 14:53

1 Answer

A suggestion (sketch):

  1. The problem (leaving aside the non-singularity of the $A_i$, which should not be necessary - it should be a result) can be put in this equivalent form: let $x$ be the concatenation of the $\ell$ vectors $x_i$, so that $w=Bx$, where $B$ is a binary $n\times m$ matrix (the horizontal concatenation of the $A_i$) and $m=n\ell$ (in the original statement $\ell = \lfloor\log_2{n}\rfloor$, but we won't impose this). (It might be preferable to use $\{-1,1\}$ instead of $\{0,1\}$, for $x$ and/or for $B$.)

  2. For large $n$, $w$ tends to a multivariate Gaussian, and its entropy tends to $H(w)=\frac{1}{2}\log|2 \pi e\, B B^t|$. The task, then, is to maximize $|B B^t|$ subject to $B$ being binary. [*]

  3. See here (page 2, problem 2)
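
A minimal numerical sketch of the objective in step 2, for one arbitrary random $0$-$1$ matrix $B$ (the sizes and seed below are my own illustrative choices, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 6, 12          # illustrative sizes only; m = n * l
m = n * l
B = rng.integers(0, 2, size=(n, m))  # one arbitrary 0-1 matrix

# Differential entropy (in bits) of a Gaussian with covariance B B^T:
#   h = (1/2) log2 det(2 pi e B B^T)
# slogdet avoids overflow in the determinant for larger sizes.
sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * (B @ B.T))
h_bits = 0.5 * logdet / np.log(2)
print(sign, round(h_bits, 2))
```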

[*] Update: Perhaps this can be made more convincing in the original formulation: $w = Bx = \sum_{i=1}^\ell A_i x_i = \sum_{i=1}^\ell w_i$, where the $A_i$ are square $n\times n$. Then, assuming $x_i \in\{-1,1\}^n$, we have $E(w_i)=0$ and $Cov(w_i)=A_i A_i^t$. We can then apply the multivariate CLT to show that $w$ tends to a $N(0,\sum_i A_i A_i^t)$ distribution.
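
The covariance claim in the update can be checked empirically; a Monte Carlo sketch (the sizes, seed, and trial count are arbitrary assumptions):

```python
import numpy as np

# Check that Cov(w) = sum_i A_i A_i^T when w = sum_i A_i x_i and each x_i
# has i.i.d. entries uniform in {-1, 1} (so that Cov(x_i) = I).
rng = np.random.default_rng(1)
n, l, trials = 4, 3, 200_000
A = np.array([rng.integers(0, 2, size=(n, n)) for _ in range(l)])

X = rng.choice([-1, 1], size=(trials, l, n))   # x_{t,i} in {-1,1}^n
W = np.einsum('tic,irc->tr', X, A)             # w_t = sum_i A_i x_{t,i}

emp_cov = W.T @ W / trials                     # E[w] = 0, so this estimates Cov(w)
Sigma = sum(Ai @ Ai.T for Ai in A)
print(np.max(np.abs(emp_cov - Sigma)))         # small for large `trials`
```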

leonbloy
  • I would be equally interested in any answer that applied using $\{-1,1\}$ instead of $\{0,1\}$. –  Dec 18 '15 at 19:42
  • Your point 2. is fascinating but I don't understand it yet. a) Does $w$ tend to a multivariate Gaussian irrespective of what $B$ is, and in what sense is this true? b) How do we know that $H(w) = \frac{1}{2}\log|2\pi e B B^T|$? c) For finite $n$, the entropy $H(w)$ does not seem to be monotonic in $|BB^T|$. Is this to be expected? –  Dec 19 '15 at 07:27
  • @dorothy a) Wild guess, it seems that some CLT variation should apply when $B$ has "enough ones" - yes, this is very informal b) that's the entropy of a multivariate gaussian with covariance $\Sigma = B B^T$ c) I'd bet that's a small $n$ effect - but I wouldn't bet my life :-) – leonbloy Dec 19 '15 at 12:48
  • I am sorry if this is obvious but how do you get the covariance to be $BB^T$ even assuming it does converge to a multivariate Gaussian? –  Dec 19 '15 at 19:39
  • @felipa It's not necessary to assume convergence to a Gaussian: $w=Bx \implies C_w = B C_x B^t$. What I'm assuming is $x \in \{-1,1\}^m$, which implies $C_x=I$. Using $\{0,1\}$ changes only a multiplicative factor. – leonbloy Dec 19 '15 at 20:27
  • @felipa Of course, silly me. Fixed, thanks – leonbloy Dec 20 '15 at 21:47
  • I am worried your idea doesn't work. $w$ is a vector of Gaussians but we have no reason to believe this satisfies the criteria for a "multivariate Gaussian" do we? That is "[...] every linear combination of its [...] components has a univariate normal distribution." We need the covariance matrix to fully specify the distribution but it omits all dependencies that are more than pairwise (e.g. threewise etc.) and I don't see how to argue that is sufficient here. –  Dec 21 '15 at 16:04
  • @dorothy See my update. – leonbloy Dec 21 '15 at 16:27
  • One more (hopefully last) worry. If you just scale $B$ (i.e. multiply it by 1000) the entropy should be unchanged but the entropy of the continuous approximation we are using increases. Somehow any answer should be scale invariant which we don't have currently. –  Dec 22 '15 at 09:39
  • @dorothy That's a known issue of differential entropy: it's not scale invariant. The correct relationship between the (true) entropy of a discrete variable $H(X)$ and its continuous approximation $h(X)$ is $H(X)\approx h(X)-\log(\Delta)$, see e.g. http://thirdorderscientist.org/homoclinic-orbit/2013/5/8/bridging-discrete-and-differential-entropy The last correction term is only relevant for computing exact values, not for maximizing or finding asymptotics. – leonbloy Dec 22 '15 at 11:23
  • Thanks again. However, I plotted entropy versus log(det(BB^T +1)) for 2^14 different +-1 matrices with 6 rows and 9 columns (with +-1 vectors too). This is done by exponential time computation (nothing clever). See here imgur.com/BFbzdJZ . How can our analysis be consistent with this picture? –  Dec 22 '15 at 17:23
  • Well, there seems to be some hint of linearity, no? Anyway, the numbers are small (besides, I'd prefer to use 0-1 matrices). For the CLT to work, we need $m/n \gg 1$ – leonbloy Dec 22 '15 at 17:42
  • Is there any literature that would indicate how large $m/n$ has to be? Here is a picture of 2^17 0-1 matrices with +-1 vectors http://imgur.com/PdYUzb4 with 6 rows and 12 columns. –  Dec 22 '15 at 18:23
  • Re: your Update: The $w_i$ are not identically distributed so which version of the multivariate CLT works here? –  Dec 23 '15 at 20:42
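
For reference, the exhaustive-enumeration experiment described in these comments can be reproduced along the following lines (a sketch with deliberately tiny, arbitrary sizes; the `+1` inside the determinant proxy mirrors the plotted comparison, and the specific matrix is just one random draw):

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 6   # tiny on purpose so the 2^m enumeration is feasible
B = rng.integers(0, 2, size=(n, m))  # one arbitrary 0-1 matrix

# Exact entropy of w = B x with x uniform in {-1,1}^m, by enumerating
# all 2^m inputs and tallying the distribution of outputs.
counts = {}
for x in itertools.product((-1, 1), repeat=m):
    w = tuple(B @ np.array(x))
    counts[w] = counts.get(w, 0) + 1
total = 2 ** m
H = -sum(c / total * math.log2(c / total) for c in counts.values())

# The proxy from the Gaussian approximation (with +1 to dodge singular B B^T).
proxy = math.log2(np.linalg.det(B @ B.T) + 1)
print(round(H, 3), round(proxy, 3))
```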