COEN
180


Error Correcting Codes 
The purpose of this module is to introduce the student to error control codes. This is not an overview of the rich mathematical theory of block and burst codes.
Error Correction Codes are used in a variety of settings. In general, their application can be schematized to the information transmission scenario in Figure 1. The sender sends a cleartext message, that is encoded by the encoder using an Error Correcting Code (ECC). The encoded message is placed into a transmission channel. During the transmission, some of the bits in the message are erroneously transmitted (flipped). The message is received at the other end of the channel and decoded back into the original cleartext message (successful error correction) or a corrupted text message, on which the receiver then acts. Error correction is commonly used on many data links including those linking storage devices, but also for storage itself. The channel is then the storage medium and data is not send through space but through time.
The physical characteristics of the channel determine the character of the errors. In principle, a noisy channel could add spurious bits or swallow client bits, but that behaviour is very unusual. The most common error source is misinterpretation of a received bit. In effect, the bit is flipped from a 0 to a 1 or vice versa. A common model for this behaviour is Gaussian or white noise. Every bit transmitted (or stored) is subject to an error with a fixed probability p < 1/2. If the error rate is above 1/2, then we just invert every bit received and obtain a channel with error rate < 1/2. If bits are flipped with error rate p = 1/2, then the channel is unable to conduct any information. While errors are always somewhat related (there is a greater chance for an error to occur if an error has occurred nearby) this simple model is typically adequate. However, in some applications, bursts, that is, sequences of bits that are in error with a much higher rate need to be modeled and error correcting codes need to be designed for them. Examples for bursts would be electromagnetic field interference within a transmission channel such as crosstalk between components in a disk head or media surface errors on a hard drive. Typically, the error probability are very low, but the requirements on the probability of error in the decoded message is very high.
Figure 1: Transmission Channel.
The quality of an error correcting code is measured in the capacity to correct errors and in the overhead necessitated by the encoding. In more detail, we have:
A simple error detection code is the parity code. To each block of 7 (for example) message bits, we add an additional parity bit. The parity bit is set so that the parity in the code word is always e.g. even. For example, if the message is 0101 000, the encoded message is 0101 0000. The parity code detects a single, triple, etc. error in the word, but fails on two , four, etc. errors. There is no error correction capability.
Figure 2: Maximum Likelihood Decoding.
As an example for maximum likelihood encoding we assume the following pairing between data words and code words:
00  00100 
01  01110 
10  10001 
11  11000 
As you can see from this example, maximum likelihood encoding for larger message words is difficult to implement and time consuming to execute, since we need to compare the received, possible erroneous, code word with all possible code words. Also, to maximise maximum likelihood decoding's capacity to find errors, we need to find code words that are evenly spaced in the space of all possible code words.
Returning to the general case, assume that we send a code word c. During transmission, and error e occurs and the word c+e is received. We check the received word for error freeness by multiplying the received word with the parity check matrix, i.e. we calculate (c+e)·H. Since (c+e)·H = c·H + e·H = 0 + e·H, we recognize that the received word is in error if and only if the error vector is not a codeword. Using maximum likelihood decoding, we can find a list of all words with minimum distance, which helps us localize the most probable error word e more quickly than in the general case.
Hamming codes are systematic, linear codes. That is, the code words consist of m bits message plus r parity bits to make up the m + r long code words. Figure 3 shows how to generate the parity check matrix for a Hamming code with 4 bits data and 3 parity bits. The rows of the matrix consists of all the binary representations of the numbers 1  7. If we do not care about the matrix being systematic, then we can just use the natural ordering, but otherwise we can arrange the rows so that the upper three by three matrix is the identity matrix.
Figure 3: Hamming Code.
As we can see, the parity bits are calculated from the information bits a, b, c, and d. As it turns out, Hamming codes are not only easy to encode, but they are easy to decode. Assume we send the data (0101). We encode this as (0101010). As we send this, we can make an error. Hamming codes can correct one bit errors (and detect any two bit errors), so lets make an arbitrary error (0001000). The received word is then (0100010). We notice this error by calculating (0100010)· H = (111). This vector can be found as the fourth row in H and this means that the error is made in position 4. Thus, the corrected word is (0101010), which decodes to the message (0101).
© 2004 Thomas Schwarz, S.J., COEN, SCU SCU COEN T. Schwarz COEN 180 