How can a CD still be read out even if some scratches adorn its back? How can a rover, which is supposed to make investigations on the moon, transmit its data to the base station on earth, even if there are surely some errors in the transmission? How can I efficiently store my data on three hard disks so that if one of them fails, all the data can still be reconstructed?
Coding theory deals with these and similar questions. The basic idea is the following: Given a finite alphabet (e.g. in the binary case) and a code length . The data to be encoded are now always stored/transmitted in blocks of length . However, such a block actually only contains the information of a smaller number of -bits. That is, during encoding always one word of length over the alphabet is converted into one word of length over the same alphabet. In this process, bits are redundant. Mathematically speaking, this encoding is nothing more than an injective mapping . Its image is called a –code ( is called the length of and the dimension).
Now the following can happen: In the block of length , some symbols are changed (by noise in the channel or the like). Let us say these are symbols. Thus a new disturbed block is created. Can the original information still be reconstructed? For this we define the Hamming distance of two codewords as (where and with ). The idea now is to modify the perturbed block to the codeword in that is closest to this block measured in Hamming distance. But is this codeword unique at all? In general not!
Therefore, in order to recognize “too disturbed” blocks as such and to always be able to reconstruct reconstructable blocks unambiguously, a ball of radius is drawn around each admissible codeword at the Hamming distance, so that no two of these balls overlap: for different. Now, if lies in one of these balls, it is reconstructed to the codeword . In this way is always unique due to the above condition.
But how is the radius to be chosen here? We want and not to intersect for from . However, this is true if and only if . This condition must now be satisfied for all and . Thus, if we set (the minimum distance from ), then is the best possible radius. Thus, the parameter determines the correction radius. In this situation is also called a code.
The Hamming codes are binary linear codes of length , dimension and with minimum distance developed by mathematician Richard Wesley Hamming (here is a parameter to be chosen which exactly measures the redundancy of the code). The word linear here means that our alphabet here is a body, namely the body with two elements and ( stands for Galois field after the mathematician Évariste Galois), and is a linear subspace of the vector space . Such a linear code can in principle be represented in two ways: 1.) via a so-called generator matrix ; or 2.) via a control matrix . A generator matrix is simply a matrix describing the above coding mapping in coordinates. A control matrix of , on the other hand, is a matrix whose kernel is exactly the code , i.e., to check whether a block belongs to the code , one simply computes the lying vector (we always use covariant notation in the following) and checks all places whether they are equal to zero, and is thus the zero vector. Hamming codes are most easily described by their control matrix. Namely, choose such that its rows are exactly all vectors of the -dimensional vector space except the zero vector (namely, that is exactly many). Thus is a matrix, as desired. The -Hamming code thus has exactly bits of redundancy, because one must perform exactly parity tests. So the actual encoded information has a length of bits (which is much larger than the redundancy for large ). Let us now consider how many errors a Hamming code can correct. To do this, we assume that are distinct codewords that have minimal Hamming distance from each other. By shifting (since this leaves the Hamming distance the same, we are allowed to do so), we can assume that is the zero vector (which, since is a linear code, is, after all, always contained in ). There are certainly three vectors that are linearly dependent, i.e. . If are the associated indices such that are the -th, -th, and -th columns of the control matrix, then the vector with if and surely satisfies the above control condition , because is then exactly , which was zero by assumption. This shows that the minimum distance (or, as this is also called for linear codes: the minimum weight) can be at most three. But if the vector has fewer than three ones as entries, then it cannot satisfy the control condition, because any two columns of the control matrix do not combine to zero (they are linearly independent). Thus . From this we see that a Hamming code can only ever correct a single error. This may seem a bit unsatisfactory at first. However, the big advantage that Hamming codes have is that they can be implemented very elegantly. For that, feel free to check out this video by YouTuber 3Blue1Brown.
But now to the exhibit “Hamming code”: here is shown the Hamming code for which and therefore and ; so the so-called –Hamming code. Therefore there are lamps and buttons. You can try here once yourself, which constellations of burning lamps can be set and which not. What is striking is that — if at least one lamp is burning — then always at least three lamps are burning. This exactly reflects the fact explained above that the minimum weight of any Hamming code is exactly . Also, the exhibit is a good way to practice what a control matrix and a generator matrix are. For example, the generator matrix can be easily set up by writing the base constellations (i.e., when only exactly one button is pressed) as row vectors in a matrix. Do you also succeed in finding three control conditions to check whether a constellation of burning lights is allowed? This seems to be much more difficult at first. If you would write your found three conditions into a matrix, you would get the searched control matrix of the dimension .
This exhibit also represents a linear code. You see nine lights. If you could turn each of these lights on and off, there would be possibilities. If you code the state that a lamp is on with and the state that it is not on with , then each of these possibilities corresponds to exactly one element of the -dimensional vector space over the body with the two elements and . Now you can’t control each lamp individually. For this, only the lamps of one row or column of the lamps arranged in a square of grid points can be switched all at once. So there can be at most possible constellations (for each column and row a factor ). But it turns out that every possible constellation can be set in exactly two ways: Namely, if you press all six buttons (i.e., those of the three rows and the three columns), you get the same constellation as before. The space of possible constellations now forms a linear code of length (because there are nine lamps) and dimension (because there are, as just explained, possibilities). This code is generated as a vector space from the basic constellations, i.e. from those constellations, where exactly the lamps of a row or column are burning (one of them being superfluous). Here again the concept of the generator matrix can be explained well. The rows of this matrix are exactly the basic constellations from which all others can be combined:
We don’t need the last column here as a base constellation, because it can be composed of the others.
But what does a geiegnete control matrix look like for the -dimensional code here? We are looking for linear constants which ensure that we are dealing with an admissible constellation. To do this, make the following observation: select one of the four corners of the square. Now consider those four lamps which are closest to this corner (they form a square). It can be observed that for all constellations you can set, always an even number of these four lamps is burning. But can this be proved? Yes! even very simple. For the basic constellations (i.e., when only one row or column of lamps is burning), either zero or two of these four lamps (i.e., an even number) are always burning. If we now combine basic constellations, this property must be preserved, since pressing a button always switches either zero or two of the four lamps. So we get in this way four control conditions (one for each corner), which must be fulfilled, so that a constellation of burning lamps can really be set. This is exactly the right number (see above). But are these four conditions also independent of each other? The answer is again: Yes! To see this, we write the above four conditions back into a matrix :
The four columns of this (control) matrix are now linearly independent, because the first, third, seventh, and ninth rows form a permutation matrix (which is regular). This is due to the fact that each of the four corner lamps occurs in exactly one of the four control conditions.
One last task: Can you also find five lamps that uniquely determine the state of all other lamps (this is called interpolation)?