50

Consider all $2^n$ different binary vectors of length $n$ and assume $n$ is an integer multiple of $3$. You are allowed to delete exactly $n/3$ bits from each of the binary vectors, leaving vectors of length $2n/3$ remaining. The number of distinct vectors remaining depends on which bits you delete. Assuming your aim is to leave as few remaining different vectors as possible, how few can you leave as a function of $n$?

Example, $n=3$. You can leave only the two vectors $11$ and $00$.

Following comments at the math.se site (in particular by Jack D'Aurizio), in general for larger values of $n$ you can replace any block of three consecutive bits by either $00$ or $11$. This gives an upper bound of $2^{n/3}$. Is this in fact the correct answer?

Now I have some code to solve small instances, we can start to fill in a table of optimal results. We use the notation $H(n,b)$ to indicate the smallest number of distinct vectors that results from starting with all vectors of length $n$ and removing $b$ from each.

$$12 \leq H(15,5) \leq17$$

$$H(12,4) = 10$$

For $n=10$ and $b = 1,2,3,4$ we have $\leq140,\leq 31,10, 4$

For $n=9$ and $b = 1,2,3,4$ we have $70,18,6,2$

For $n=8$ and $b = 1,2,3$ we have $40,10,4$

For $n=7$ and $b = 1,2,3$ we have $20,6,2$

For $n=6$ and $b = 1,2$ we have $12,4$

For $n=5$ and $b = 1,2$ we have $6,2$

$$H(4,1) = 4, H(3,1) = 2, H(2,1) = 2$$

If we only allow symmetric solutions then $H(10,2)=32$. This implies (assumnig no error in the calculations) that for some instances there may be no symmetric optimal solutions as we already have $H(10,2) \leq 31$.

Simd
  • 3,195
  • 1
    This is the first question I see whose answers look so much like a mini polymath project. – Benoît Kloeckner Sep 26 '13 at 13:53
  • Why is your bound for H(10,1) 141 when the bound for H(9,1) is 70? Just by appending 0 or 1 to the end of the substrings for H(9,1) you can obtain a bound of 140. – Thomas Sep 27 '13 at 08:01
  • Can you list the 31 substrings for H(10,2)? Also, what symmetries are you thinking of? Just complementary symmetry? – Thomas Sep 28 '13 at 02:11
  • Finding H(n,1) for large n is actually quite hard, so I will focus on proving some easier things. For example, it is easy to prove that H(2a+1,a)=2, because one of the digits has at most a occurences. – Thomas Sep 28 '13 at 03:31
  • Also, H(2a+2,a)=4, because you have to have the substrings 0^a+2 and 1^a+2, but then the two strings 0^(a+1)1^(a+1) and 1^(a+1)0^(a+1) do not have those as substrings and are incompatible with each other, so two more substrings are required. The substrings 0^(a+1)1 and 1^(a+1)0 work, because any string not containing one of the first two as a substring has a+1 0's and a+1 1's, and if the string ends in 1 you can just eliminate all other 1's to get 0^(a+1)1, and vice versa with 0. I will now work on finding a proof of H(2a+3,a)=6. – Thomas Sep 28 '13 at 03:33
  • @Thomas These are the 31 vectors I get from my code for $(10,2)$ 00000000,00000011,00000101,00000110,00001100,00011000,00011111,00110000,00110111,00111011,00111100,01001111,01100000,01100111,01110011,01111000,10001111,10010001,10010010,10101001,10101010,11000000,11000111,11011011,11011100,11100011,11101100,11110000,11111101,11111110,11111111 . On the other question, when I said symmetry I simply meant under the assumption that every vector in a solution appears with its complement. – Simd Sep 29 '13 at 14:52
  • Have you recognized the number of this question?! MO/142857...7^-1! – A_S Dec 28 '17 at 15:59

8 Answers8

26

I can, at least, answer your question 'Is this in fact the correct answer?' with an affirmative 'no'.

Specifically, we can replace the upper bound $2^{n/3} \approxeq 1.26^n$ with the slightly better bound $6^{n/9} \approxeq 1.22^n$ by applying the same 'separate into independent blocks' construction to the following (conjecturally optimal) covering set for $n = 9$:

$$\{000000, 111111, 111000, 000111, 001100, 110011\}$$

Clearly, $2^{n/3}$ is still optimal for $n = 3$ and indeed (by exhaustive search) $n = 6$.

  • 1
    Thank you for this. At this point any lower bound would be interesting. – Simd Sep 22 '13 at 20:55
  • Indeed, I haven't been able to establish a non-constant lower bound. Any covering set must necessarily contain the words $00\dots 0$ and $11 \dots 1$, and except in the case of $n=3$ these are insufficient. I suspect that we can asymptotically beat $(1 + \epsilon)^n$ for any $\epsilon > 0$. – Adam P. Goucher Sep 22 '13 at 21:01
  • 1
    If we change the problem so that we delete $n/5$ bits instead of $n/3$ then we can get a simple exponential lower bound. We remove $n/5$ bits from each vector leaving $K$ distinct vectors. So we have $K\binom{n}{n/5}2^{n/5} \geq 2^n$. Therefore $K\geq \left(\frac{2^{1-1/5}}{(5e)^{1/5}}\right)^n > 1.033^n$. Maybe this approach can be improved to work for $n/3$ as well? – Simd Sep 23 '13 at 16:20
  • I am trying to prove that the covering set is optimal for n=9. The way I am doing this is to first consider the forced strings 000000, and 111111. Then the first length 9 strings that don't work are 000001111 and 111110000. This forces two more strings (0^a)(1^(6-a)) and (0^b)1^(6-b) for some a and b at least 2 and at most 4 (because of 000011111 and 111100000). For each of the 6 cases (3 are removed by symmetry), I then check for two other strings which don't have those as a substring and are incompatible with each other. a=b=2 has 010000111 and 101111000, and I am onto the a=2, b=3 case now. – Thomas Sep 25 '13 at 04:14
  • Woops, actually, there is another case: having the set 000000, 000001, 011111, (1^a)(0^(6-a)), 111111, for a=2, 3, or 4. Although, that case is covered by 101010101, so we're ok. – Thomas Sep 25 '13 at 04:23
  • I found that 010000111, 011000011, 110000111, 110011100 do not have the substrings mentioned above for a=2, b=3. Also they are incompatible, there is no 6 bit substring of all of them simultaneously. That takes care of the case a=2, b=3. Also, two of the cases can be ruled out by swapping left and right, so that leaves only the a=2, b=4 case and the a=b=3 case. – Thomas Sep 25 '13 at 05:13
  • I've just done some more work on it, and in the a=2, b=4 case, 101011010 and 110000011 are incompatible with the substrings in the a=2, b=4 case, and are incompatible with each other. Also, 001100110 and 110000011 are incompatible with the a=b=3 case and each other. That is all the cases that need to be covered, the rest are covered by symmetry. Therefore, six is indeed the smallest size of the covering set. I will now work on the case n=12. – Thomas Sep 25 '13 at 05:45
  • 1
    Your covering set is actually very useful! I managed to extend it to the general H(2a+3,a) case (just adding digits to the left and right). – Thomas Sep 27 '13 at 03:05
  • The bound can be slightly improved to $10^{n/12} \approxeq 1.21153^n$ by the following ten-element covering set for $n = 12$: ${00000000, 11111111, 11000011, 00110011, 11001100, 00111100, 00001111, 11110000, 01101001, 10010110}$ We seem to be observing diminishing returns. – Adam P. Goucher Sep 27 '13 at 07:45
  • Oh, I see that $H(12,4) = 10$ has already been established in another answer, using almost the same construction. – Adam P. Goucher Sep 27 '13 at 08:28
19

This is not an answer but rather a long comment. I give an informal argument that suggests what the right answer should be. The proof of the lower bound is rigorous, the proof of the upper bound is not.

Denote $k=n/3$. Let us say that a binary word $y\in\{0,1\}^{2k}$ covers a word $x\in \{0,1\}^n$ if $y$ can be obtained from $x$ by removing $k$ digits. Our goal is to find a set $S \subset \{0,1\}^{2k}$ of smallest possible cardinality that covers all words in $\{0,1\}^n$.

A lower bound on the size of $S$. We will show that every word $y$ covers approximately $2^{nH(1/3)}$ words in $\{0,1\}^n$. Therefore, the size of $|S|$ is at least $2^{n(1-H(1/3))}\approx 2^{0.08\, n}$. Here $H(t)$ is the entropy function $$H(t) = -t \log_2 t - (1-t) \log_2(1-t).$$

Let us fix $y$ and count the number of words $x$ that $y$ covers. To this end, we consider an algorithm that checks whether $y$ covers $x$. This is just a simple greedy algorithm that scans $x$ from left to right and finds indices $i_1 < \dots < i_{2k}$ s.t. $x_{i_r} = y_r$ for $r\in\{1,\dots, 2k\}$: $i_1$ is the first index s.t. $x_{i_1} = y_1$, $i_2$ is the first index after $i_1$ s.t. $x_{i_2} = y_2$ and so on. The algorithm terminates when it defines $i_{2r}$. The algorithm succeeds and finds $i_1< \dots < i_{2r} \leq n$ if and only if $y$ covers $x$.

Let $I_y(x) = \{i_1, \dots, i_{2k}\}$ for given words $x$ and $y$. Note that if $I_y(x') = I_y(x'')$ then the algorithm performs exactly the same steps. In particular, the first $i_{2r}$ digits in $x'$ and $x''$ are equal. Also for every set $I\subset \{1,\dots, n\}$ of size $2k$, there is a word $x$ s.t. $I_y(x) = I$.

Therefore, the number of words $x$ covered by $y$ is equal to sum over all possible values of $j\equiv i_{2r}$ the number of subsets of $\{1,\dots, j\}$ of size $2k$ times the number of possibilities for digits at positions $j+1,\dots, n$. $$\sum_{j=2k}^{n} \binom{j}{2k} 2^{n-j} \approx \sum_{j=2k}^{n} 2^{jH(2k/j)} 2^{n-j}\approx 2^n \sum_{j=2k}^n 2^{(\frac{j}{2k} (H(2k/j)-1))\cdot 2k} = 2^n \sum_{j=2k}^n 2^{f(2k/j)\cdot 2k}.$$ where $f(t) = (H(t)-1)/t$. The function $f(t)$ attains its maximum on $[2/3,1]$ when $t=2/3$. Thus the number of words covered by $y$ is approximately $$2^{n + 2 f(2/3)k} = 2^{nH(1/3)}.$$ We conclude that the set $S$ must contain at least $2^{n}/2^{H(1/3)n}\approx 2^{0.08\, n}$ words.

An upper bound on the optimal size of $S$. Note that this problem is a version of the set cover problem. Thus the size of the optimal set cover (optimal size of $S$) is within a log-factor of the size of the optimal fractional cover. (The log factor is $\log 2^{3n} = O(n)$). So it suffices to get an upper bound on the size of a fractional cover to get an approximate upper bound on the size the optimal set $S$.

Warning: This is not a proof! Some statements below are not correct!

Consider the bipartite graph with words $\{0,1\}^{2k}$ on the left, and words $\{0,1\}^{n}$ on the right, in which $y$ is connected to $x$ if $y$ covers $x$. The graph is “more or less bipartite”. To be precise, it is not regular but it is very close to a regular graph (this is an informal statement that needs justification!). We will pretend nevertheless that the graph is regular. The degree of each vertex on the left is approximately $2^{H(1/3)n}$ as we computed above. Thus we get a fractional cover when we take every string of length $2k$ with weight $2^n / (2^{2k} 2^{H(1/3)n})$. The total weight of all words in the fractional cover is $2^n / (2^{H(1/3)n}) \approx 2^{0.08\, n}$.

Answer: $\approx 2^{(1-H(1/3))n}\approx 2^{0.08n}$.

Yury
  • 696
  • 1
    Thank you. I wonder what, if any, the relation is between these ideas and those for deletion codes (see e.g. http://www.eecs.harvard.edu/~michaelm/TALKS/DelSurvey.pdf ). – Simd Sep 23 '13 at 18:32
  • @Anush, I have not heard before about this paper, but it looks like it is related. – Yury Sep 25 '13 at 00:09
13

Here is an exponential lower bound. We begin by determining exactly how many strings $Y$ of length $n$ can be reduced to a given string $X=x_1x_2\cdots x_k$. In general $y$ might contain many copies of $X$, but it contains exactly one left-most copy of $X$; that is, the first $x_1$ then the first $x_2$ and so on. The strings with $X$ as a left-most substring form a regular language of simple form. For example, if $X=101$ then the language is $0^*1 \cdot 1^*0 \cdot 0^*1 \cdot (0+1)^*$. The generating function for $0^*1$ and $1^*0$ is $z/(1-z)$ while the generating function for $(0+1)^*$ is $1/(1-2z)$.

Therefore, the number of $Y$s that contain $X$ is independent of the structure of $X$ and is the coefficient of $z^n$ in $z^k (1-z)^{-k} (1-2z)^{-1}$, namely $$N(n,k) = \sum_{i=0}^{n-k} 2^i\binom{n-i-1}{k-1},$$ which I think doesn't have a closed form.

Therefore, a lower bound for the question is $2^n/N(n,k)$.

For $k=2n/3$, the largest term in the sum is the first one, and the following terms are close to a geometric progression with ratio $2/3$. This gives $$ N(n,2n/3) = (2 + O(1/n)) \binom{n}{2n/3},$$ giving a lower bound of $$ (1+o(1)) \frac{\sqrt{\pi n}}{3} \left(\frac{2^{5/3}}{3}\right)^n. $$ Note that $2^{5/3}/3\approx 1.05826737$.

I think it is most unlikely that this is best possible.

Brendan McKay
  • 37,203
  • 2
    I think your answer is essentially the same as Yury's nice answer from yesterday. Note that his $2^{0.08}$ is approximately your $1.058$. Yury suggested plausibly that this might be the right answer. – Lucia Sep 24 '13 at 13:50
  • 1
    An issue for this being the correct value is how disjoint the "radius $k$" sets can be. For example in the case $n=9$ the string $111000111$ can be reduced to $111111$ and to $111000$ and to $110011$, half the members of the six element covering set. – Aaron Meyerowitz Sep 24 '13 at 21:11
11

If a vector has $d$ one-entries, and $d\leq n/2$, then delete as many ones as you can, (and further zeros if necessary). Conversely, if a vector has more one entries than zero entries, delete as many zeros as you can, (and further ones if necessary).

Any remaining vector of the first type will have $\max(d-n/3,0)\leq n/6$ ones. The number of such vectors with exactly $n/6$ ones in $2n/3$ positions is $\binom{2n/3}{n/6}$. For the final sum, one needs to sum over bimomial coefficients $2 \sum_{i=0}^{n/6} \binom{2n/3}{i}$ and such a sum can be approximated: Note that the largest entry, with $d=n/6$ gives by far the greatest contribution. The binomial coefficient in this region can be approximated by $\binom{k}{l}=2^{k H(l/k)+o(k)}$, where $H$ denotes the entropy function $H(x)=\frac{-x \log x-(1-x)\log (1-x) }{\log 2}$, (for $x\in [0,1]$, and $\log $ is the natural logarithm).

Edit: In view of Yuri's comment, I correct this: (Thank you, Yuri!)

As $H(1/4)$ is about $0.811$, this is about $2^{2n/3 \times 0.811...+o(1)}=2^{0.54\ldots n}$.

This upper bound is certainly weaker than the bound $2^{n/3}$, but uses a quite different method. It would be interesting to see, whether the "optimum" uses a deterministic construction, or a random construction (like Shannon's bounds in coding theory), or a combination of methods.

Some explanation why the method above gives some saving over the trivial $2^{2n/3}$: most of the original vectors have about $n/2+ O(\sqrt{n})$ zero and one entries. Going away from this symmetric centre reduces (by the binomial distribution) the number of possibilities. In other words the tail of this distibution is small.

  • 2
    Shouldn't it be $\binom{2n/3}{n/6}\approx 2^{kH(1/4)} \approx 2^{(2n/3) \cdot 0.811} \approx 2^{0.54n} \gg 2^{n/3}$ ? – Yury Sep 22 '13 at 16:30
8

For $n=12$ it is possible to use $10$ strings of length $2k=8.$

There is, for $n=3k$, a smallest size $s(n)$ for a set $S$ of strings in $ \{0,1\}^{2k}$ which covers all the strings in $\{0,1\}^n$.

We have $s(n_1)s(n_2)\ge s(n_1+n_2)$ so, by Fekete's Lemma there is a constant $\alpha=\inf {s(n)^{1/n}}$ such that $\lim {s(n)^{1/n}}=\alpha.$

We so far have from the various answers that $$1.05827 \lt \alpha \lt 6^{1/9}\approx 1.2203.$$

I show below that $s(12) \le 10$ which improves the upper bound to $10^{1/12}\approx 1.2115.$

To improve on this upper bound we would need $s(15) \le 17$ (I'd bet on $16$, if anything, just because it seems nicer.) Taking into account that we must have $0^{10},1^{10}$ (in an obvious notation) and adding the optimistic condition that the set of strings is closed under complements and reversing the order, it might be possible to investigate this by somewhat intelligent brute force. An even more optimistic condition would be that each string of length $2k=10$ is either unchanged or replaced by its complement when reversed. Finally one could add the restriction that each of the non-constant words has five $0$'s and five $1$'s. I can think of some obvious further strings to include but that still leaves much to do.

A larger lower bound might (or might not) arise from considering how much overlap there has to be for various "radius $k$" balls. For example each of the six members from the set for $s(9)=6$ covers $130$ of the $512$ strings in $\{0,1\}^{9}$ so on average each string is covered a little over $1.5$ times. Some just once and others, such as $111000111$ as many as three times.

In general one might reasonably hope to have a covering set where every non-constant string has $k$ $0$'s and hence $k$ $1$'s. This has the advantage that one knows for any particular word how many $1$'s and $0$'s must be deleted.

For $n=12$ the eight strings below ( Walsh sequences ) are (nearly) enough $$00000000,00001111,00110011,00111100$$$$11111111,11110000,11001100,11000011.$$ They are enough with the addition of the two strings $$11101000,00010111$$ Here is a sketch where we avoid using the final two strings as long as possible:

Thanks to the two constant strings we need only consider length $12$ strings with $5,6$ or $7$ $1$'s. By symmetry and complements we can restrict to either $6$ or $7$ $1$'s with at least as many among the first $6$ locations as the last $6$. Observe that any length $6$ string with $3$ $1$'s can be reduced to $0011$ or $1100$ with two deletions.

Consider first the case of $6$ $1$'s. If $3$ are in the left half (and $3$ on the right) then the previous observation shows that we can get one of the four strings which are not constant on either half. If $5$ or $6$ are on the left then we can get $11110000$.

It remains to consider the case of $7$ $1$'s and $5$ $0$'s with at least $4$ of the $1$'s on the left side. We must delete a single $0$ and three $1$'s. Again, if there are $5$ or $6$ $1$'s on the left side then we can get $11110000$. All that is left is the subcase with $4$ $1$'s on the left. If the left side is $abcde0$ then we can delete the single $0$ among $abcde$ and continue to get $11110000.$ If the left side is $abcd11$ then we can delete the two $1$'s among $abcd$ to get $0011$. So now we have the subsubcase that the left side is $abcd01$ and the right side has $3$ $0$'s and $3$ $1$'s. The four possibilities on the left include $111001$,$110101$ (both of these can be reduced to $1100$ with two deletions). Finally we get the case that the left side is $101101$ or $011101$ and there are three $1$'s on the right side. In some cases we are still covered but $101101\ 010101$ (for example) seems hopeless. However for all $30$ of the length 12 strings left uncovered we can delete the first $0$ from the left half and all three $1$'s from the right to get $1110101000.$

  • 1
    A better choice for the "extra" two vectors for $n=12$ is $10010110,01101001$. Then one has the four vectors constant on each half and six more vectors each equal or complementary to its reverse (as before) AND with equally many $0$'s and $1$'s in each half. – Aaron Meyerowitz Sep 25 '13 at 04:45
  • And it is the unique minimal cover with all these properties. – Aaron Meyerowitz Sep 25 '13 at 04:57
  • I have checked a few relatively simple cases, and it seems that using 00011111 and 11100000 would be better than 00001111 and 11110000. I know that it lacks the symmetry of the original solution, but we should try it anyway. – Thomas Sep 25 '13 at 08:45
  • I tried taking those four along with 00000111 and 11111000 but I don't thik you can enlarge that to a covering set of size 8. The eight Walsh strings cover all but 32 out the 4096 length 12 strings so I think a size eight could well be possible. – Aaron Meyerowitz Sep 25 '13 at 16:57
  • My idea is that you have to have some other string of the form (0^a)1^(8-a) to cover 000000111111. I am curious, what are the 32 problematic cases? – Thomas Sep 26 '13 at 02:59
  • It would be nice if someone proved some symmetry properties of the covering sets. For example, that the complement of every string in the covering set is also in the string, but I guess that's too much to hope for. Also, do the 32 problematic elements have some form of symmetry other than the complement symmetry? – Thomas Sep 26 '13 at 03:50
  • If I had more time I would make this shorter. There are actually 64 12-strings not covered by any of the Walsh strings. These 32 and their complements: 010010001101,010010001110,010010010101,010010010110,010010101001,010010101010,010010110001,010010110010,010011010001,010011010010,010011101101,010011101110,010101010001,010101010010,010101101101,010101101110,011010010001,011010010010,011010101101,011010101110,011100010001,011100010010,011100101101,011100101110,011101001101,011101001110,011101010101,011101010110,011101101001,011101101010,011101110001,01110111001 – Aaron Meyerowitz Sep 26 '13 at 06:33
  • 1
    I have now proved by computer search that $10$ is optimal for $n=12$. Unfortunately it does not seem feasible to scale this up to $n=15$ without some more work. – Simd Sep 26 '13 at 08:47
  • Did you search every option, or only the symmetrical ones? – Thomas Sep 26 '13 at 14:17
  • @Thomas All options. In fact I formulated the problem as an integer programming problem and just solved it using http://scip.zib.de/. – Simd Sep 26 '13 at 14:23
  • Wow, cool. I am working on the problem for strings of length 6 and being able to remove one bit (6,1). I am having a bit of trouble getting a covering set of size less than the trivial 12 gotten by appending the last digit to the 6 optimal strings for the (5,1) case. I am beginning to think that there isn't one, can you check? Is this program open source so I can download it and use it to solve the problem? Using paper and pen only gets so far. – Thomas Sep 26 '13 at 14:39
  • Also, I have tried to use the walsh sequences in the (6,1) problem, by removing the first two digits and the last one, but that did not work, there were 8 left over sequences. – Thomas Sep 26 '13 at 14:43
  • I conjecture that H(a+2,b+1)<=H(a,b) if a>b. This works for the first few values. – Thomas Sep 26 '13 at 14:52
  • @Thomas $12$ is indeed optimal for $(6,1)$ and $20$ is optimal for $(7,1)$. I feel embarrassed releasing my cobbled together code but I am happy to run any tests or describe it in more detail. $6$ is optimal for $(7,2)$. – Simd Sep 26 '13 at 15:07
  • @Thomas The sequence for $H(n,1)$ is $2,2,4,6,12,20,40, \leq70 , \leq 141$. This has two different possibilities at http://oeis.org/search?q=2%2C+2%2C+4%2C+6%2C+12%2C+20%2C+40&sort=&language=english&go=Search . Do we already know a closed form for $H(n,1)$? – Simd Sep 26 '13 at 21:22
  • I believe that 6 is optimal for (2a+3,a) for all a, using the construction described above, but I haven't been able to prove it. – Thomas Sep 27 '13 at 01:19
  • I also think I know a closed form for H(n,1). If n is odd, H(n,1)=(n+1 (n+1)/2) ((a b) is a choose b), and if n is even, H(n,1)=2H(n-1,1). This is completely conjectural of course, but it lines up for the first few values. – Thomas Sep 27 '13 at 01:19
  • I am going to work on H(2a+4,a). I already know 2 of the values, H(6,1) and H(12,4) because of your program, I am going to work on the intermediate cases H(8,2) and H(10,3). I also think that, since for H(12,4), it was close to being 8 (only 32 problem cases out of 4096), I believe that 8 may be the limit value of H(2a+4,a). – Thomas Sep 27 '13 at 01:22
  • Oh, I didn't see your edits stating that H(8,2)=H(10,3)=10. I will work on extending the walsh sequences for H(14,5) and H(16,6). – Thomas Sep 27 '13 at 02:55
  • Actually, maybe I should try to prove my claim about H(n,1), or at least prove it is an upper bound. – Thomas Sep 27 '13 at 08:34
  • 1
    I managed to get a $17$ solution for $(15,5)$. They are 0000001111,0000111111,0001101100,0011100011,0011111000,0100111010,0110001‌001,0111001110,1000010000,1001100111,1100011100,1101000110,1110000011,1111‌100000,1111111000 plus the all 0s and all 1s. – Simd Oct 02 '13 at 17:13
  • Wow! How did you get it? Do you have a sense how rare it is? Do you think 16 is possible? – Aaron Meyerowitz Oct 03 '13 at 06:06
  • I have reformulated the problem as a "hitting set" problem and then coded that as a Mixed Integer Linear Program (MILP). I am using some software called Gurobi to solve the MILP. Unfortunately a disadvantage of my method is that I don't have any sense either for how rare it is or if one could do better. My encoding is also naive making it probably very wasteful. – Simd Oct 03 '13 at 18:24
4

I think it may be helpful and/or intersting to consider the problem for fractions other than $1/3$.

Specifically, if $n$ is a multiple of $q$, let $f(p/q,n)$ be the minimal size of a set of $n- (p/q)n$ bit strings such that deleting $(p/q)n$ bits from each $n$-bit string produces an element of the set. Then set

$$g(p/q) = \lim _{n\to \infty} \frac{\log f\left(\frac{p}{q},n\right)}{n}$$

The limit exists because, by the divide-into-independent-blocks argument, any particular value is a bound for the $\lim\sup$.

It is clear that $g$ is monotonic. It's not immediately obvious to me if we can prove that $g$ is continuous.

Clearly $g(x)=0$ for $x \geq 1/2$.

For a simple upper bound on $g$ for $x<1/2$, we can assume with the loss only of a constant that there are fewer $0$s than $1$s. Then divide the $n$ bits into $q$ equally sized parts, and delete all the $0$s in the $2p$ parts with the fewest ones. This takes $2^{ (1-2p/q)n }$, giving an upper bound $g(x) \leq (1-2x) \log 2$.

Thanks to Brendan McKay and Yury, we have a lower bound on $g$. If I understand this bound correctly, it is that $g(x) \geq \log 2 + x \log x + (1-x) \log(1-x)$. (We can easily check that the maximal term in the $x=1/3$ case remains maximal for all $0<x<1/2$.)

Specific examples, like, Adam Goucher's, can give us tighter upper bounds for specific values of $x$.

Will Sawin
  • 135,926
  • I think this is a good idea. But why not just consider $H(n,k)$ which is the number of strings needed if we are allowed to delete $k$ elements from an $n$ element string. As you note the methods of Yury and McKay giive lower bounds for $H(n,k)$. Note that $H(n,k) \le H(a,\ell)H(n-a,k-\ell)$ and so Goucher's example gives non-trivial upper bounds. – Lucia Sep 25 '13 at 20:09
  • The reason I did that is that I want to consider asymptotics as $n$ goes to $\infty$. The most obvious way to do this was to fix $n/k$. But there might be other interesting things to do! Yes, this bound on $H(n,k)$ gives an upper bound on $g$, forcing it to be convex. So currently our curve bounding $g$ is the convex hull of just $3$ points. A search could presumably find more. – Will Sawin Sep 25 '13 at 20:47
  • Your $g(x)$ is my $(\log H(n,nx))/n$ as $n$ goes to infinity. The upper bound on $H(n,k)$ noted above I think gives that $g$ is convex, and therefore continuous. – Lucia Sep 25 '13 at 20:48
  • Our last comments crossed, but at least they are in agreement. Note that the lower bound for $g$ is also convex, so that it is at least plausibly the right answer. – Lucia Sep 25 '13 at 21:04
  • I have calculated a few values of H(n,j), and I discovered that H(2a+1,a)=2, and H(2a+2,a)=4. I am trying to figure out H(2a+3,a) now. – Thomas Sep 26 '13 at 10:01
  • I think that H(2a+3,a)=6 for all a>=1. We can use a construction of similar type to Goucher's, 0^(a+3), 0^((a+3)/2)1^((a+3)/2), 0^((a+1)/2)110^((a+1)/2), and their complements when a is odd, and 0^(a+3), 0^((a+4)/2)1^((a+2)/2), 0^((a+2)/2)110^(a/2), and their complements when a is even. I will work on the H(2a+4,a) case now, though I think it will be a lot harder. – Thomas Sep 26 '13 at 12:01
1

This answer is an elaboration on the idea of Christian Elsholtz. Like that answer, this answer does not beat $2^{n/3}$.

As noted by Christian, by symmetry it suffices to deal with vectors that contain at most $n/2$ ones. By left-right symmetry, it suffices to deal with vectors of at most $n/4$ ones among the first $n/2$ positions. Remove the first $n/3$ ones. That leaves a vector of length $2n/3$ with at most $n/6$ ones that starts with a string of $n/4$ zeros. There are approximately $\binom{2n/3-n/4}{n/6}\approx 2^{5n/12\times H(2/5)}=2^{0.40455n}$, which is still worse than $2^{n/3}$.

One can improve this a bit. Namely, after deleting all the ones from the first half of string, we choose whether to remove ones from third or four quarter. It appears that in the worse case (non-rigorously!) we end up with a string of length $2n/3$ that contains $n/6$ ones among which $n/24$ are in the third quarter and $n/8$ are in the fourth quarter. There are approximately $\binom{n/6}{n/24}\binom{n/4}{n/8}\approx 2^{0.385n}$ of them.

Boris Bukh
  • 7,746
  • 1
  • 34
  • 71
  • 1
    You can do better. Split ones into three parts and delete two of them. In one of three ways you obtain two rows of zeroes at the left and at the right with the sum of lengths at least $n/3$. Thus you get at most ${n/3\choose n/6}\cdot O(n)=2^{n/3+o(1)}$. – Ilya Bogdanov Sep 23 '13 at 12:48
1

For each $\ n=9\cdot m + 3\cdot k\ $ one gets a bound of $\ 6^m\cdot 2^k\ =\ 6^{\frac n9}\cdot 2^k\ $ for every $\ m=0\ 1\ \ldots\ $ and $\ k\in\{0\ 1\ 2\}$.

A hand justification below of the result by @Adam P. Goucher (and a computer) indicates a further possible progress along a similar line. I'll explicitly associate binary sequences of length $\ 9\ $ with the respective Goucher's sequences.

I'll provide a simpler derivation below, and will leave the previous one at the bottom.


Let $\ b_0\ldots b_8\ $ be a bit string.

Case A:   Let $\ b_0b_1b_2b_3\ $ have (at least) three same bits, say $\ x$.   Then $\ b_4b_5b_6b_7b_8\ $ bits contain (at least) three bits say $\ y$ (the majority of five),   where values $\ x\ y\ $ are different or the same. In either case by leaving the two groups of three bits we get one of the four strings of length 6:

$$ 000000\quad 000111\quad 111000\quad 111111$$

Case A':   Consider $\ b_5b_6b_7b_8\ $ -- everything is symmetric.

From now on let's assume that the distribution of bits in $\ b_0b_1b_2b_3\ $ is two bits of each, and the same for $\ b_5b_6b_7b_8$.

Case B:   $b_3=b_5$,   and say $\ b_3=b_5=x$.   Then remove one of bits of value $\ 1-x\ $ from $\ b_0b_1b_2\ $ and from $\ b_6b_7b_8\ $ and remove also bit $\ b_4$.   We are left with one of the strings:

$$ 001100\qquad 110011$$

Case C:   $b_3=b_4\ne b_5$,   and say $\ b_3=x$.   Then remove the one bit of value $\ x\ $ from $\ b_0b_1b_2$,   and the two more bits $\ x\ $ from $\ b_6b_7b_8$.   We are left with one of the two 6-strings as the above.

Case C':   $b_3\ne b_4= b_5$ -- symmetry.

END of PROOF


(Back to the old argument)

Let's refer to the six Goucher's sequences as of the type $\ 6\ \ 3\!+\!3\ \ 2\!+\!2\!+\!2$,   where each type addresses the consecutive two sequences by Goucher.

Case 1: one of the bit values of a binary sequence $\ b_0\ldots b_8\ $ occurs at least $\ 6\ $ times. Then we may leave a six of them to produce a 6-sequence of type $\ 6$.   Now we may restrict ourselves to the cases when each bit value of a 9-sequence $\ b_0\ldots b_8\ $ occurs $\ 4\ $ or $\ 5\ $ times.

Let the bit value $\ x\ $ be the value of the majority of $\ b_6b_7b_8$,   and $\ y\ $ be the value of the majority of $\ b_0b_1b_2$. (Values $ x\ y\ $ can be equal or different).

Case 2:   $b_6=b_7=b_8=x\ $ or $\ b_0=b_1=b_2=y$. It's enough to consider just the earlier option, about $\ x$.   Then there are three bits among $\ b_0\ldots b_5\ $ which have the same value.   These three bits together with $\ b_6b_7b_8\ $ form a 6-sequence of type $\ 6\ $ or $\ 3+3$. The latter option, about $\ y$,   is proved similarly.

Now we may assume that exactly two bits of $\ b_6b_7b_8\ $ have value $\ x$,   and exactly two of $\ b_0b_1b_2\ $ have value $\ y$.

Case 3:   $x=y$.   Then if at least $\ 2\ $ of the bits of $\ b_3b_4b_5\ $ have value $\ x\ $ then we leave these two $x$-bits together with two of the $x$-bits of $\ b_0b_1b_2\ $ and another two $x$-bits of $\ b_6b_7b_8\ $ to produce a 6-sequence of type $\ 6$.   Otherwise two bits of $\ b_3b_4b_5\ $ are different from $\ x=y$.   Then two (middle) non-x bits together with the 2+2 bits from $\ b_0b_1b_2\ $ and $\ b_6b_7b_8\ $ form a 6-sequence of type $\ 2+2+2$.

Case 4:   $x\ne y$.   The three bits $\ b_3b_4b_5\ $ cannot have the same value or else there would be $\ 6\ $ bits of the same value in the whole 9-sequence. Next, if there are integers $\ r\ s\ $ such that $\ 3\le r<s\le 5\ $ and $\ b_r=y\ $ and $\ b_s=x\ $ then we would get a 6-sequence of type $\ 3+3$.   Otherwise $\ b_3=x\ $ and $\ b_5=y$. Let's assume that $\ b_4=x\ $ (the case $\ b_4=y\ $ is symmetric). The the two of $y$-bits of $\ b_0b_1b_2\ $ together with $\ b_3b_4b_5\ $ and the single $y$-bit of $\ b_6b_7b_8\ $ form a 6-sequence of type $\ 2+2+2$.

END of PROOF