COEN 180
Error Correcting Codes
The purpose of this module is to introduce the student to error control codes. This is not an overview of the rich mathematical theory of block and burst codes.

Transmission Channel

Error Correction Codes are used in a variety of settings. In general, their application can be schematized to the information transmission scenario in Figure 1. The sender sends a cleartext message, that is encoded by the encoder using an Error Correcting Code (ECC). The encoded message is placed into a transmission channel. During the transmission, some of the bits in the message are erroneously transmitted (flipped). The message is received at the other end of the channel and decoded back into the original cleartext message (successful error correction) or a corrupted text message, on which the receiver then acts. Error correction is commonly used on many data links including those linking storage devices, but also for storage itself. The channel is then the storage medium and data is not send through space but through time.

The physical characteristics of the channel determine the character of the errors. In principle, a noisy channel could add spurious bits or swallow client bits, but that behaviour is very unusual. The most common error source is misinterpretation of a received bit. In effect, the bit is flipped from a 0 to a 1 or vice versa. A common model for this behaviour is Gaussian or white noise. Every bit transmitted (or stored) is subject to an error with a fixed probability p < 1/2. If the error rate is above 1/2, then we just invert every bit received and obtain a channel with error rate < 1/2. If bits are flipped with error rate p = 1/2, then the channel is unable to conduct any information. While errors are always somewhat related (there is a greater chance for an error to occur if an error has occurred nearby) this simple model is typically adequate. However, in some applications, bursts, that is, sequences of bits that are in error with a much higher rate need to be modeled and error correcting codes need to be designed for them. Examples for bursts would be electro-magnetic field interference within a transmission channel such as cross-talk between components in a disk head or media surface errors on a hard drive. Typically, the error probability are very low, but the requirements on the probability of error in the decoded message is very high.

Figure 1: Transmission Channel.

Error Correcting Codes

In an algebraic block code, we typically encode a message by taking n message symbols (typically the symbols are bits) and encoding them in m (m > n) code bits. Some encodings are systematic, that is, we add m-n parity bits to the n information bits. Systematic codes have a great advantage, the data is easily extractable. We can use the parity bits per encoded block to decide whether there is a problem. If there is none, then we are done with decoding. If there is, we use the parity bits and the information bits to (hopefully) correct the block.

The quality of an error correcting code is measured in the capacity to correct errors and in the overhead necessitated by the encoding. In more detail, we have:

Parity: Simple Error Detection

A simple error detection code is the parity code. To each block of 7 (for example) message bits, we add an additional parity bit. The parity bit is set so that the parity in the code word is always e.g. even. For example, if the message is 0101 000, the encoded message is 0101 0000. The parity code detects a single, triple, etc. error in the word, but fails on two , four, etc. errors. There is no error correction capability.

Hamming Distance

Block message and code words form a vector space each. The space of message words consists of all bit (or more generally, symbol) strings of length n and the space of all code words consists of all bit strings of length m. The Hamming distance between two points is defined to be the number of positions in which the bit strings differ.

Maximum Likelihood Decoding

Some codes need to correct as many errors as possible, since there is no possibility for retransmission (e.g. if the data is stored and then retrieved or if the transmission comes from a Mars craft). The standard decoding technique is called maximum likelihood decoding. In maximum likelihood decoding, the received word is compared to all the possible code words. The one code word that is closest (ties are being broken randomly) is then the error corrected code word. The name describes the most important property of this decoding. If an error is detected, then we identify the most probable error and undo this error in order to correct the mistake. Sometimes of course we will make a mistake in guessing the error.

Figure 2: Maximum Likelihood Decoding.

As an example for maximum likelihood encoding we assume the following pairing between data words and code words:

00      00100
01      01110
10      10001
11      11000
Assume now that we receive the word 11111. By going through all code words in the list, we find that none matches. Hence, a transmission error has occurred. The Hamming distances to the code words are d(11111,00100) = 4, d(11111,01110) =2, d(11111,10001) = 3, and d(11111,11000) = 3. We therefore conclude that the first and last bit are in error and that we should have received 01110, which corresponds to message 01.

As you can see from this example, maximum likelihood encoding for larger message words is difficult to implement and time consuming to execute, since we need to compare the received, possible erroneous, code word with all possible code words. Also, to maximise maximum likelihood decoding's capacity to find errors, we need to find code words that are evenly spaced in the space of all possible code words.

Linear Codes

A large class of codes that can be more easily decoded are linear codes. Linear codes uses the normal arithmetic among the binary field {0,1}. The addition in the field is the normal XOR operation, the multiplication is the AND operation. Message words and code words are (row) vectors of length n and m with coefficients in this field, so that we can use the normal vector operations such as scalar multiplication and the vector addition. A code is linear if the sum of any two code words is again a code word. It follows from Linear Algebra that there exists a matrix P (the parity matrix) such that a vector x is a code word if and only if x · P = 0. The parity code is given by P = t(1,1,1,1,1,1,1,1), the transpose of the unit vector. A word (a,b,c,d,e,f,g,h) is in the code if and only if a+b+c+d+e+f+g+h = 0, that is, if there is an even number of ones among the eight variables. Thus, the code defined by this matrix is the parity code.

Returning to the general case, assume that we send a code word c. During transmission, and error e occurs and the word c+e is received. We check the received word for error freeness by multiplying the received word with the parity check matrix, i.e. we calculate (c+eH. Since (c+eH = c·H + e·H = 0 + e·H, we recognize that the received word is in error if and only if the error vector is not a codeword. Using maximum likelihood decoding, we can find a list of all words with minimum distance, which helps us localize the most probable error word e more quickly than in the general case.

Hamming Codes

Hamming codes are systematic, linear codes. That is, the code words consist of m bits message plus r parity bits to make up the m + r long code words. Figure 3 shows how to generate the parity check matrix for a Hamming code with 4 bits data and 3 parity bits. The rows of the matrix consists of all the binary representations of the numbers 1 - 7. If we do not care about the matrix being systematic, then we can just use the natural ordering, but otherwise we can arrange the rows so that the upper three by three matrix is the identity matrix.

Figure 3: Hamming Code.

A code word consists of four information bits and three parity bits. If the information bits are a, b, c, and d, and if the parity bits are e, f, and g, then the equation (a,b,c,d,e,f,g) · H = 0needs to hold. This translates to

As we can see, the parity bits are calculated from the information bits a, b, c, and d. As it turns out, Hamming codes are not only easy to encode, but they are easy to decode. Assume we send the data (0101). We encode this as (0101010). As we send this, we can make an error. Hamming codes can correct one bit errors (and detect any two bit errors), so lets make an arbitrary error (0001000). The received word is then (0100010). We notice this error by calculating (0100010)· H = (111). This vector can be found as the fourth row in H and this means that the error is made in position 4. Thus, the corrected word is (0101010), which decodes to the message (0101).

© 2004 Thomas Schwarz, S.J., COEN, SCU                                                                        SCU            COEN            T. Schwarz            COEN 180