I was listening to "Sphere Packing Solved in Higher Dimensions" (subtitle: "A Ukrainian mathematician has solved the centuries-old sphere-packing problem in dimensions eight and 24") and reading the transcript there.

Mathematicians have been studying sphere packings since at least 1611, when Johannes Kepler conjectured that the densest way to pack together equal-sized spheres in space is the familiar pyramidal piling of oranges seen in grocery stores. Despite the problem’s seeming simplicity, it was not settled until 1998, when Thomas Hales, now of the University of Pittsburgh, finally proved Kepler’s conjecture in 250 pages of mathematical arguments combined with mammoth computer calculations.

later:

Higher-dimensional sphere packings are hard to visualize, but they are eminently practical objects: Dense sphere packings are intimately related to the error-correcting codes used by cell phones, space probes and the Internet to send signals through noisy channels. A high-dimensional sphere is easy to define — it’s simply the set of points in the high-dimensional space that are a fixed distance away from a given center point.

and later:

The Leech lattice is similarly constructed by adding spheres to a less dense packing, and it was discovered almost as an afterthought. In the 1960s, the British mathematician John Leech was studying a 24-dimensional packing that can be constructed from the “Golay” code, an error-correcting code that was later used to transmit the historic photos of Jupiter and Saturn taken by the Voyager probes. Shortly after Leech’s article about this packing went to press, he noticed that there was room to fit additional spheres into the holes in the packing, and that doing so would double the packing density.

Question: How is stacking oranges in 24 dimensions related to receiving and decoding signals from the Voyagers? Is it possible to explain the connection in a relatively simple way to the Space SE community, or to find a source that explains it in a way suitable for this site?


Thomas Hales, pictured in 1998, used a computer to prove a famous conjecture about the densest way to stack spheres.

– uhoh

1 Answer

All the different data words a transmitter can send and a receiver can detect can be imagined as dots arranged in a large space.

Selecting a data encoding for error detection and correction is about keeping valid code words at a certain distance from each other. As a result, a slight change ("movement") of a valid word doesn't make it look like another valid word.
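
Throughout the rest of this answer, "distance" means the number of bit positions in which two words differ (the Hamming distance, see footnote [**]). As a minimal sketch, assuming nothing beyond plain Python, the metric looks like this:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("000", "011"))  # 2: two bits must flip to turn one word into the other
```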

As I'm not going to make any drawings of 24-dimensional spheres and hypercubes, I'm going to restrict this answer to three dimensions.

3-bit example

Imagine a data word consisting of three binary digits. We can arrange them in the form of a cube by treating each digit as the coordinate in one dimension:

Each transmission error, that is, a '0' mistaken for a '1' or vice versa, corresponds to moving one step along one of the edges of this cube.
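
The cube can also be written out in code instead of a drawing. A small sketch (my own illustration, not part of the original construction) lists the eight corners and the three neighbours one edge away from a given corner:

```python
from itertools import product

# The eight corners of the unit cube, one coordinate per bit.
corners = ["".join(bits) for bits in product("01", repeat=3)]
print(corners)  # ['000', '001', '010', '011', '100', '101', '110', '111']

def flip(word: str, i: int) -> str:
    """Flip bit i: one step along one edge of the cube."""
    return word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]

print([flip("000", i) for i in range(3)])  # ['100', '010', '001']: the three edges leaving '000'
```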

In normal, everyday communication with sufficiently low error rates, we can treat all code points as valid:

But every flipped bit then leads to another valid word. So, if we remove every second valid word, we get this:

Now, all the valid words are two edges apart. One flipped bit gets us to an invalid word and we know there was an error, but we are not able to correct it, because there are three possible bits that could have flipped. This is called a "0 error correcting, 1 error detecting" code.
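
One consistent way to "remove every second word" is to keep only the words with an even number of ones, i.e., a simple parity code; the concrete word list is my assumption, since the answer defines the code by its drawing. A short check of the two-edge spacing (with the hamming_distance helper repeated so the snippet runs on its own):

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

# The four words with an even number of ones.
valid = ["000", "011", "101", "110"]

# Every pair of valid words is two edges apart:
print(min(hamming_distance(a, b) for a, b in combinations(valid, 2)))  # 2

# A single bit flip always lands on an invalid word, so the error is detected:
print("001" in valid)  # False: '000' with its last bit flipped is recognisably wrong
```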

To improve robustness, remove another two valid code words:

Now, all valid words are three edges apart. If one bit flips we get to an invalid word, but we can still tell where we came from. If two bits flipped, we can't correct the word, because another valid word is closer to the wrong code than the correct word. Hence, this code is called "1 error correcting, 2 error detecting"[*]. This is the best we can get with our simple 3-bit code words.
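
Two words that are three edges apart are necessarily opposite corners of the cube, e.g. '000' and '111' (the three-fold repetition code). Assuming that pair, correction is simply a majority vote, as this sketch shows:

```python
def correct(word: str) -> str:
    """Nearest valid word of the 3-bit code {000, 111}, found by majority vote."""
    return "111" if word.count("1") >= 2 else "000"

print(correct("010"))  # '000': a single flipped bit is pulled back to the nearer valid word
print(correct("011"))  # '111': two bits flipped in '000' get mis-corrected to the other word
```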

4-bit code extension

If you add another bit to have 4-bit code words, you can increase the distance between valid points to four edges of the then 4-dimensional cube. This gives another level of error mitigation: If two bits flip, you reach an invalid word right "in the middle" of several valid words. You can't decide which one might be the correct one, but at least you know that two bits flipped. This type of code is called "1 error correcting, 3 error detecting".
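
As a sketch of that behaviour, take '0000' and '1111' as an example pair of 4-bit words four edges apart (my own choice; the answer doesn't spell out a word list):

```python
def decode(word: str):
    """Decode against the distance-4 code {0000, 1111}."""
    ones = word.count("1")
    if ones <= 1:
        return "0000"  # zero or one flipped bit: corrected
    if ones >= 3:
        return "1111"  # symmetric case
    return None        # exactly two ones: equidistant from both words, detected only

print(decode("0100"))  # '0000': one error corrected
print(decode("0110"))  # None: two errors detected, right "in the middle" of valid words
```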

Voyager

In the case of Voyager, this wouldn't be enough, so they went for a 24-bit long code word. Of the roughly 16.8 million possible words, only 4096 are defined as valid, i.e., 12 bits carry actual information and the other 12 are used for error correction. This results in a "3 error correcting, 7 error detecting" code: used for correction, it repairs any word in which up to 3 bits were wrongly received; used purely for detection, it guarantees that up to 7 flipped bits never turn a word into another valid one (it can do one or the other, not both at once, as discussed in the comments below). This code can be represented in the same way as above, as the corners of a 24-dimensional hypercube.
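
This is the binary Golay code mentioned in the quoted article. Without implementing the code itself, one can at least verify the counting that makes it so special: for the underlying 23-bit version, the Hamming spheres of radius 3 around the 4096 valid words fill the whole space exactly, which is why it is called a "perfect" code (see also the comments below). A sketch of that calculation:

```python
from math import comb

n, k, r = 23, 12, 3  # word length, data bits, correction radius of the binary Golay code

# Words inside one Hamming sphere of radius 3: the centre itself plus all
# words at Hamming distance 1, 2 or 3 from it.
sphere = sum(comb(n, i) for i in range(r + 1))
print(sphere)                 # 2048 = 2**11

# 2**12 = 4096 disjoint spheres of this size cover all 2**23 words exactly:
print(2**k * sphere == 2**n)  # True: a perfect packing, with no gaps at all
```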

Now, how does this relate to packing of spheres? In fact, the three images show the densest possible packing of spheres with diameters of $\sqrt 1$, $\sqrt 2$ and $\sqrt 3$, respectively, under the constraint that their centers need to be located on corners of the cube.[**]
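
The square roots come from a one-line calculation, spelled out here for completeness: two corners of the unit cube that differ in $N$ coordinates are at Euclidean distance

$$\sqrt{\underbrace{1^2 + 1^2 + \cdots + 1^2}_{N\text{ terms}}} = \sqrt{N},$$

so a Hamming distance of $N$ edges corresponds to a Euclidean distance of $\sqrt{N}$, the substitution also noted in the comments below.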

Obviously, these three-dimensional packings don't look too spectacular, but it gets a lot more challenging if we're not looking at digital, binary data, but use a transmitter that also supports values in between, e.g., by using not a simple on/off modulation, but adding amplitude modulation on top. By adding one more step (e.g., power off/low/high) for each of the three digits in our example, we don't have eight possible code words, but actually $3^3 = 27$ - start packing your spheres onto that grid!
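
As a final sketch (the levels 0, 0.5 and 1 are my own stand-ins for power off/low/high), the larger grid of candidate sphere centres can be enumerated directly:

```python
from itertools import product

# Three power levels per digit instead of two: the grid of candidate centres.
grid = list(product((0.0, 0.5, 1.0), repeat=3))
print(len(grid))  # 27 candidate code words, i.e. candidate sphere centres
print(grid[:3])   # (0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.0), ...
```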


[*] Note that "detecting" doesn't mean that you will be able to tell how many bit errors there are exactly. It refers to the number of bit errors that can occur before you end up with another valid code word.

[**] Strictly speaking, we don't deal with spheres in a regular Euclidean space here, but Hamming spheres - these are defined by the set of corners that are a given number of edges away from their center. This accounts for the fact that in a binary world only the corners of the cube represent valid points while any other point would have fractional coordinates and just doesn't exist. Practically, there is no difference between the two in the examples given here.

– asdfex
  • In the cube example, how can you error correct? You don't know whether one bit or two bits got flipped. I.e., receiving 101 could be 100 with a bit flipped, but it could also be 011 with two bits flipped. Even if the first is more likely it's still ambiguous. I can't see how any error correction can be performed here, only detection. – Innovine Jul 07 '21 at 21:16
  • @Innovine I believe you can correct any one-bit error in this example, but you cannot detect two-bit errors. All two-bit errors are indistinguishable from a one-bit error of the opposing word. So a two-bit error will be "corrected" to the opposite word of what was sent. – Wayne Conrad Jul 07 '21 at 21:49
  • By having all zeros and all ones as invalid codes (as shown above), this approach also detects that big error. – chux - Reinstate Monica Jul 07 '21 at 23:00
  • The answer illustrates a 3,1 Hamming code, which is capable of detecting up to 2 error bits, or correcting 1 error bit, but not both. Adding an additional parity bit allows it to correct 1-bit errors and detect 2-bit errors, but at the cost of an additional bit. https://en.wikipedia.org/wiki/Hamming_code#Hamming_codes_with_additional_parity_(SECDED) – Lawnmower Man Jul 08 '21 at 05:50
  • @LawnmowerMan Yes. Similarly, the binary Golay code can correct 3 errors or detect up to 6 errors, but not both. (This is a code with words of 23 bits.) The extended binary Golay code (with 24-bit words) can correct 3 errors and detect 4 errors. If you want only error detection, the extended code can detect up to 7 errors. – Will Orrick Jul 08 '21 at 06:24
  • In many real systems, the transmitter would transmit clean 0 and 1, but the receiver would read an analog value. Then e.g. (0.9, 0.3, 0.6) would be more likely 100 than 011. – jpa Jul 08 '21 at 07:35
  • To make the connection with "orange stacking", you will need to introduce the geometrical concept of "Hamming distance". This is how one can show that Hamming codes (and Golay codes) are "perfect" solutions of the mathematical "sphere packing" problem. – Ng Ph Jul 08 '21 at 10:16
  • @NgPh The answer does that by actually drawing the valid points in the codes. I guess it could add oranges to the drawings. I mean, green spheres? What does that have to do with oranges? (I kid) – Yakk Jul 08 '21 at 13:17
  • @OrganicMarble Well, use https://image.spreadshirtmedia.com/image-server/v1/mp/compositions/T210A1MPA3176PT17X104Y92D1019911352S100/views/1,width=550,height=550,appearanceId=1,backgroundColor=FFFFFF,noPt=true/n-equals-1-n1-sample-size-statistics-mens-t-shirt.jpg but replace 1 with 24. – Yakk Jul 08 '21 at 13:36
  • @Yakk, the said drawings can be confusing. Some readers may miss the point that the "distance" between the code points is the number of edges (NOT the usual Euclidean distance). For example, in the last drawing, a sphere of "size 2" centered on (100) covers (111), but not (011). You may try to draw "oranges" (spheres) in this Hamming geometry and let's see what you can achieve. – Ng Ph Jul 08 '21 at 14:49
  • @asdfex, you have warned that your answer above is just a beginning (kudos, anyway). Yet, it was sufficient to show how challenging uhoh's question is. And this type of challenge is explained right here. – Ng Ph Jul 08 '21 at 16:39
  • @NgPh Actually it doesn't matter if we talk Euclidean distance or Hamming here - as long as we stay binary, you only have to replace distance $N$ by $\sqrt{N}$. – asdfex Jul 08 '21 at 16:44
  • @asdfex, not really, if you are working in binary error correction ("hard" errors). You are restricted to move from one point to another in your "space" in integer steps only. When the decoder receives a message, it chooses the codeword closest to that message (in terms of Hamming distance). A distance of 1.5 bits in error does not have any meaning for this decoder. It matters a lot to specify which distance metric you are using in the general "sphere-packing" problem. – Ng Ph Jul 08 '21 at 17:02
  • @NgPh One should be clear that the sphere packing problem that is solved by the Leech lattice is the Euclidean version, not the Hamming one. It is true that the union of Hamming spheres of radius 3 surrounding the Golay codewords entirely covers the set of words of length 23 over a binary alphabet, making the code perfect. It is equally true that the extended code (of 24 dimensions) can be used to construct the Leech lattice and hence the optimal Euclidean packing. There are two distinct sphere packing problems, solved by two distinct but related objects (the Golay code and the Leech lattice). – Will Orrick Jul 09 '21 at 07:04
  • Excellent answer. I would suggest to add: the 24-bit code word has 12 data bits and 12 ECC bits. – Uwe Jul 09 '21 at 07:24
  • @Will Orrick, I may add that the demonstration by Maryna Viazovska for dimension 8 seems to be inspired likewise by the similarity between the Hamming code (8,4,4) and the lattice E8 (although she did not mention Hamming). Both mathematical objects reach their sphere-packing bound in their respective geometry (Hamming vs Euclidean). That's perhaps the connection? – Ng Ph Jul 09 '21 at 10:47
  • @asdfex, keep persevering. Nobody has solved the sphere-packing problem in Euclidean geometry yet for dimensions N = 4, 5, 6, 7, 9 to 23, and >24! – Ng Ph Jul 09 '21 at 10:50
  • @NgPh Is that greater than 24, or greater than 24 factorial? – Matt Dunn Jul 09 '21 at 13:26
  • @Matt Dunn, well spotted. Greater than 24. – Ng Ph Jul 09 '21 at 15:07
  • If a code is 3 error correcting and not more, then that implies that there is at least one pair of code points at a distance of exactly 6 or 7 apart, right? Otherwise the code would be either more or less error correcting. At least, that's how I understand this answer. But whereas such a code might be either 3 or 4 error detecting, I don't see how it could be 7 error detecting. Flipping more than 4 bits could produce an undetected misinterpretation of one element of the aforementioned pair as the other. – John Bollinger Jul 10 '21 at 15:03
  • @JohnBollinger Your understanding seems correct. Note that "7 error detecting" doesn't mean that you are able to get to know that 7 bits flipped - it only means you don't get another valid code by chance if 7 bits flipped. It can well be that a word with 7 flipped bits gets corrected to another, wrong, valid code word. – asdfex Jul 10 '21 at 15:56
  • Yes, I understand. The point is that this answer says, "This resulted in a "3 error correcting, 7 error detecting" code. I.e., if in any word 3 bit were wrongly received, it could still be corrected properly and if up to 7 bits flipped; at least it would be known something is wrong", but that seems inconsistent. – John Bollinger Jul 10 '21 at 16:49
  • @JohnBollinger If you have a better way to write that... up to 3 errors -> corrected to the correct word, 4 errors -> error, not correctable, 5-7 errors -> corrected to a wrong word, 8 errors -> valid word, no error detected. – asdfex Jul 10 '21 at 17:06
  • Wouldn't the terminology be a "3 error correcting, 4 error detecting code"? I take it to be of little relevance to the nomenclature that the error-correcting logic will kick in on 5-, 6-, and 7-bit errors (producing an overall word error), as these cannot be distinguished from 3-, 2-, and 1-bit errors without comparing to the source. Otherwise, why wouldn't your four-bit code be "1 error correcting, 3 error detecting"? – John Bollinger Jul 10 '21 at 18:25
  • @JohnBollinger The "detecting" number counts all cases, including those that can be corrected. In the four-bit case I messed up. – asdfex Jul 10 '21 at 19:04
  • THANK YOU, this is the most elegant explanation of error detection/correction & Hamming codes I've seen. Never understood it before now. Amazing. – Anton Hengst Jul 10 '21 at 20:46
  • @John Bollinger, interpret it as follows. If you use Golay as a CORRECTION code, up to 3 errors correction is guaranteed (and the number of errors is also given). ALTERNATIVELY, you can use it as a DETECTION code. Detection is guaranteed up to 7 errors (although it won't tell you how many). If you use it for correction, with 4 errors it will (systematically) return a wrong codeword. Same for 5, 6, 7 errors. If you use it for detection, with 8 errors, it will happily "certify" the codeword as valid (which is of course wrong). – Ng Ph Jul 12 '21 at 14:18
  • A slight correction: for the 4-bit code extension you should say that "you can increase the distance between valid points to four edges". – Will Orrick Jul 12 '21 at 17:45
  • @JohnBollinger Regarding terminology, Wikipedia is just one data point, but the article Block code says that a code with minimum distance $d$ can detect $d-1$ errors and that it can correct $\lfloor(d-1)/2\rfloor$ errors. I think the confusion arises in saying that the extended [8,4] Hamming code is "1 error correcting, 3 error detecting". The comma suggests that the code can do both of these things simultaneously, which it can't; it can only do one or the other. Better, I think, to replace the comma with the word "or". – Will Orrick Jul 12 '21 at 18:03