|
||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||
|
|
![]() DSP Main | Archives | Feedback by Charles Constantine Gumas The basic Turbo Code structure utilizes common ECC building blocks, many of which have been known to practitioners for 30 years or more. At first glance, Turbo Codes appear to be a mere rearrangement of these building blocks, and it could be argued that their discovery was long overdue. On the other hand, a straightforward rearrangement of the traditional building blocks leads to mediocre performance at best. To achieve the dramatic improvements that Turbo Codes offer, several subtle and ingenious advances were needed, and these have only recently been discovered (Ref 2). This article looks at the basic features and common building blocks in ECC systems and shows how they apply to Turbo Codes. Later installments will delve more deeply into how and why Turbo Codes achieve superior performance and then discuss typical applications.
Fig 1 -- To understand the terms that make up a Turbo Code, examine its overall structure (a) as well as the details of a Turbo Code decoder (b). Let's begin by recalling from the previous article that Turbo Codes are systematic, linear block codes that (often) employ recursive, systematic convolutional encoding with two or more parallel-concatenated constituent encoders (Fig 1a). Turbo Code decoders incorporate two or more soft-input/soft-output decoders with a pseudorandom interleaver and iterative decoding (Fig 1b). Many of these terms might be unfamiliar, so let's take a look at some of them along with a few other concepts that are crucial to understanding ECCs in general and Turbo Codes in particular. Fundamental Concepts Code rate: Perhaps the most basic attribute of a channel code is its code rate. An (n,k) encoder accepts k input bits, codes them and generates n output bits. The code's rate is k/n, and because its purpose is to add enough redundancy to reduce transmission errors, the code rate is always < 1. Common rates are 1/4, 1/2, 2/3 and 3/4. And while engineers generally report code rates as an integer ratio such as those just given, the actual rates are usually slightly less due to additional coding overhead. Turbo Codes exist at all these rates as well as for almost any desired value less than unity. Linear block code: Many error-correcting schemes break up an input datastream into blocks (usually called data frames) and encode each one independently. A simple matrix operation y = xG expresses the action of linear block codes, where a binary input vector of information bits x multiplies a binary generation matrix G to produce a binary output vector y. You perform the matrix operations using modulo 2 arithmetic, where the dimensions of x, G and y are 1xk, kxn and 1xn, respectively. Typical values for k (input bits) and n (block size) range anywhere from 8 to 256. Hence, a Rate 1/2 code might consist of k = 128 and n = 256. Notice that the code transforms an input block of k bits into a block of n output bits through a linear operation (matrix multiplication) -- hence the name linear block codes. When x is an impulse vector consisting of a single unit value at the ith location {0,0,0,...0,1,0,...0,0,0}, the output vector y is an impulse response that simply equals the ith row of G. Further, as a consequence of linearity, when an input vector contains more than a single 1 bit, the output is simply the superposition of the respective rows of G. Many varieties of block codes exist, and among the most well known are Hamming, Golay, Bose-Chaudhuri-Hocquenghem (BCH) and Reed-Solomon. Turbo Codes are a new form of linear block code, but you should view them as more of a hybrid because they rely heavily upon convolutional codes, as described below. Systematic codes: Systematic codes pass an input vector x directly through to the output. In addition, they append parity-check bits that the decoder uses for error detection and correction. For systematic block codes, the output vector y consists of the k input bits from x followed by (n-k) parity-check bits, so y = {x p} where p is a 1 x (n-k) row vector of parity bits. With nonsystematic codes (NSC), all the output bits in y are a more complicated function of the input bits. Turbo Codes are systematic. Convolutional codes: Instead of dealing with a block of input bits, a Rate k/n convolutional encoder immediately generates n output bits for every k input bits that arrive. That is, both the coding and decoding can proceed without waiting for an entire block of data to be ready. Hence, convolutional codes lead to lower data-processing delays than block codes.
Fig 2 -- Convolutional encoders come in three similar but subtly different forms: (a) systematic, (b) nonsystematic (NSC) and (c) recursive systematic (RSC). Convolutional encoders come in three varieties: systematic codes, nonsystematic codes (NSC) and recursive systematic codes (RSC). Fig 2 shows examples of each for Rate 1/2 codes. The binary shift registers in the figure hold past values of the input stream, so the current output bits are linear combinations of the past input. Thus engineers describe convolutional codes by the number of memory registers (m) or the constraint length (K = m + 1). Constraint length denotes the maximum number of input values (past and present) that affect the current output. Systematic convolutional coders (Fig 2a) pass the input data directly through to the output and also generate a parity-bit stream from a linear combination of past bits. NSC coders (Fig 2b) generate two streams of linear combinations of data. RSC coders (Fig 2c) combine attributes of the other two: systematic output as in Fig 2a and two linear combinations as in Fig 2b. However, RSC coders feed back one of the two combinations and add it to the input stream. Each type of coder has its strengths and weaknesses, but NSC coders are generally superior to nonrecursive systematic codes. RSC codes typically equal NSC codes, but Turbo Codes perform best with recursive systematic codes (RSC) of the type in Fig 2c (Ref 2). All the coders in Fig 2 use three unit delays, so m = 3 and K = 4. It's common to use either binary or octal notation to describe the generator or register connections. In other words, the taps for Fig 2a, left to right, are G1 = 11012 = 158 . These taps are visible before and after each unit delay and contribute values for the calculation of yk. For Fig 2b, G1 = 01012 = 058 G2 = 01112 = 078 ; and for Fig 2c, G1 = 11012 = 158 and G2 = 01112 = 078 . Two generators are needed in each one because of the additional feed-forward computation (2b) and feedback computation (2c). The systematic codes of Figs 2a and 2c route the input stream directly to the output, and they also include a separate parity bit stream. Engineers usually multiplex the bits from each of the two streams, sending them in an alternating fashion, one at a time. Although convolutional codes are not block codes by nature, you can operate them as if they were. It's common to do so by flushing the encoder with zeros (instead of input data) to clear its registers and force the encoder into a known state (the all zeros state). With knowledge of this state and the block length, the decoder can identify the block boundaries, which serves as a convenient way to take advantage of the benefits of convolutional codes while still using data blocks. Turbo Codes do the same but for an even more important reason: so they can iterate decoder results for each block. (More on iterative decoding later.) Puncturing: In some cases you want to increase a code's rate without changing any of its basic attributes. Puncturing a code accomplishes this goal by simply dropping certain output values instead of transmitting them. For example, you can increase the Rate 1/2 code of Fig 2c to a Rate 2/3 code by puncturing its output -- dropping every other output bit of the parity stream. Turbo Codes often use puncturing, and Table 1 shows some examples of how puncturing affects their code rate. The table assumes Turbo Codes with two constituent encoders as in Fig 2, and that puncturing drops half the parity bits each encoder generates.
Table 1 -- Puncturing a Turbo Code increases its rate without changing any of its basic attributes. Interleavers: Interleavers permute one data sequence into another according to some predefined scheme. Rectangular interleavers are the simplest type and work on a fixed-size data block. An input stream fills an M x N rectangular array of registers, row by row. The shuffled output stream empties the array column by column. Of course, instead of emptying the array column by column, some other predefined pattern might be preferable, such as along diagonals. For Turbo Codes, pseudorandom interleavers are preferable because they usually result in the best error-correcting performance, which also improves with interleaver size. (Interleaver sizes of 16k bits or more are commonly reported.) Pseudorandom interleavers use a fixed but random-like pattern for shuffling the data. They're sometimes called uniform interleavers because, unlike rectangular interleavers, they spread the input data uniformly throughout the output. Hamming distance: Any two m-bit sequences have a Hamming distance between them that equals the number of bits (or symbols) that differ. For example, the Hamming distance between 0110100 and 0101010 is 4; obtain this result by X-ORing the two sequences (0011110) and counting the number of 1 bits. A major focus of ECC research has been to find codes that have a large minimum distance such that no two possible encoder outputs (codewords) are ever closer than that value. The larger the minimum distance, the easier it is to distinguish between any pair of codewords, and the better the code is likely to perform. However, it's now well known that a code's minimum-distance properties are critical only at a large S/N ratio. Turbo Codes have refocused attention on low S/N ratio performance, which depends much more on the distribution of codes than on the minimum free distance itself. A later installment of this series will discuss code distributions in more detail. Decoding algorithm: One nice attribute of convolutional codes is that you can decode them in a variety of ways. The most famous decoding method is the Viterbi algorithm (VA), which is a form of maximum likelihood sequence estimation (MLSE, Ref 3). That mouthful simply means that after a system receives an entire sequence of bits, MLSE methods select the sequence that was most likely sent based on some criteria. Using VA, the number of computations required to decode a memory m convolutional code (constraint length K = m+1) is >> m2 m , which is the most efficient means known for MLSE. Maximum a posteriori decoding (MAP) is an alternative that can reduce the BER (bit error rate) at a given S/N ratio even more than can MLSE. You might ask, "But how can MAP perform better than a technique that already selects the best sequence?" The difference is that MAP decoding makes decisions on a bit-by-bit basis rather than a sequence-by-sequence basis. When errors exist in the received sequence, the two techniques don't always yield the same results. MAP maximizes the probability that, given the values of the (possibly erroneous) bits just received, the current bit is decoded correctly. The trouble is that even the most efficient MAP decoding algorithms require roughly twice the computations of the VA. So even though MAP can produce better results, the VA is much more widely implemented. Turbo Codes can use either MLSE or MAP-based decoding algorithms (Ref 4). Soft-input/soft-output decoders: To take full advantage of Turbo Codes, researchers have devised special versions of the traditional MLSE and MAP algorithms termed soft input/ soft output. Until now, we've tacitly assumed that the input to the channel decoder is a binary vector generated by a demodulator (see Fig 1a in Ref 1). With binary modulation, such as BPSK, the demodulator decides whether the signal phase corresponds to a +1 (bit value 1) or a -1 (bit value 0). If the demodulator makes such a binary decision, it makes what engineers call a hard decision because it provides no additional information to the decoder. In contrast, soft-decision demodulators quantize their output to 2m levels, where m is usually near 3 or 4. For m-bit quantization, one quantization bit is devoted to the sign of the decision and m-1 bits are devoted to the signal's magnitude. The larger the magnitude, the more confidence that the sign bit is correct. Decoders that exploit soft decisions can reduce S/N ratio requirements by approximately 2 dB over those that use hard decisions alone. But there's little incentive for more than three bits of quantization because < 0.5 dB of additional gain is left to be achieved. Traditional MLSE and MAP decoder algorithms can accept either hard or soft decisions, but they almost always output hard decisions. Turbo Code applications use modified soft-input/soft-output versions of these algorithms that instead output soft decisions. Their key purpose is to allow them to feed the soft-decision output from one decoder into a second similar decoder. This capability opens up the possibility of iterative decoding, which feeds the output of the last decoder back to the first. This important innovation leads to much of the performance gains possible with Turbo Codes. With iterative decoding, systems can employ simpler constituent codes (such as K = 5) rather than extremely complex ones (an example being the Galileo code described in Part 1, with K = 15) to achieve near Shannon limit performance. Computational complexity: In light of the above, it's useful to understand the computational complexity of a Turbo Code decoder. Let's compare the complexity of one with the form in Fig 1b to a constraint length K, conventional decoder that uses the Viterbi algorithm. Because a Turbo Code decoder consists of two constituent decoders, it's approximately twice as complex as its near-equivalent conventional decoder. But this case holds when the Turbo Code decoder performs only one iteration. If it performs two iterations, then the Turbo Code decoder performs twice as many computations and is four times as complex as its conventional counterpart. Furthermore, if the Turbo Code decoder employs a MAP algorithm, it's another factor of two more complex than the conventional MLSE (Viterbi) decoder, in other words a total of eight times more complex.
Table 2 -- Increasing the number of iterations in a Turbo Code decoder raises its complexity, which this table compares to two other implementations. Table 2 summarizes these relationships. Its values are based on the fact that, for each increase in constraint length (K), a MLSE decoder doubles in complexity. Hence, a Turbo Code decoder that uses two MLSE constituents, say each with K = 5, is roughly equivalent to a single K = 6 MLSE decoder. If a Turbo Code decoder instead employs a MAP algorithm instead of MLSE, then it's nearly equivalent to a K = 7 MLSE. This comparison assumes a single decoder iteration, but for each doubling in iterations the equivalent value of K increases by 1, as well. For example, after 18 iterations the ground-breaking MAP-based K = 5 Turbo Code decoder described in Ref 5 is roughly equivalent in complexity to an MLSE decoder with constraint length of 11 = K + 6. But in terms of BER, this Turbo Code handily outperforms the much more complex K = 15 Galileo code! (Ref 1). Thus the Turbo Code is more than 16 = 2(15-11) times more efficient than the Galileo code. Of course, the Turbo Code decoder suffers from a number of complexities not found in MLSE decoders, so this value is probably overstated. Even so, the Turbo Code decoder is substantially more efficient and exhibits better performance. Concatenated and product codes: The idea of feeding the output of one encoder into another is hardly new. Such coding combinations are called concatenated codes, and serial concatenated codes have long been the favorite for many communications links. The typical configuration uses a Reed-Solomon (RS) M-bit block code (the outer code) followed by a binary convolutional code (the inner code). Fig 3 shows the typical configuration, which illustrates where the terms inner and outer arise from. The optional interleaver breaks up burst errors that often occur on fading channels. The convolutional decoder accepts soft input and provides hard decisions to the RS decoder, which outputs the final hard decisions. The scheme does not iterate any results.
Fig 3 -- A conventional serial-concatenated coding scheme uses a Reed-Solomon outer code, a convolutional inter code and an optional interleaver. The concatenation improves performance with simpler codes than you could achieve with a single more complex code. When an error gets by the convolutional decoder, it's often accompanied by several additional errors (approximately K of them all together). Every M output bits from the convolutional decoder form a single symbol that feeds into the RS decoder. It then decodes a block of symbols (say 255) at a time, and it can correct as many as N symbol errors among them. You select N based on the expected channel properties. For example, if a burst of L errors gets through the VA decoder, the RS decoder need correct only L/M erroneous symbols (assuming no other errors occur in the block). By estimating the maximum value of L, you can specify M and N and achieve a reliable system. Turbo Codes operate on a similar concept, but they're parallel concatenated codes because the two decoders (as in Fig 1b) each accept output directly from the channel. Turbo Codes are also a type of product code. Product codes, discovered more than 40 yrs ago (Ref 6), use two systematic codes to process a single data set. Fig 4 shows the traditional arrangement with reordering based on a rectangular scheme. The first encoder (n1 , k1 ) operates on the rows of the information bits and generates parity bits for each row. The algorithm then reorders the information bits and inputs the entire set of information bits plus parity bits to a second encoder (n2 , k2 ). Traditionally, the reordering is just a column-by-column rearrangement of the row-by-row stored information plus parity bits. The net result is an (n1 n2 , k1 k2 ) code, hence the name.
Fig 4 -- Turbo Codes are similar to product codes. Product codes encode rows of information bits by Code #1 (n1 , k1 ) and encode columns of bits by Code #2 (n2 , k2 ). An (n1 n2 , k1 k2 ) code results. This scheme closely resembles the operation of Turbo Codes. The difference is that Turbo Codes usually employ pseudorandom interleavers to rearrange the information bits, and they usually omit the checks on checks that appear in Fig 4. Also, product codes don't traditionally employ iterative decoding. Acknowledgement The author would like to thank to Scott Francis of Coleman Research (Chantilly, VA) for his review of this column and many helpful suggestions. References 1. Gumas, CC, "Turbo codes rev up error-correcting performance," Personal Engineering & Instrumentation
News, Jan 1998, pgs 61-66.
Charles (Chuck) Gumas serves as Project Lead/Communications Engineer at the MITRE Corp's Information Systems and Technology Div (Reston, VA). He received an MSEE from Lehigh Univ and a BS in physics from the Univ of Pennsylvania. He maintains a web site at www.dc.net/spartan/chuck.html and can be contacted via email at cgumas@mitre.org. This article originally appeared in Personal Engineering & Instrumentation News, February 1998, pgs 54-62. Reprinted with permission of PEC Inc; all rights reserved.
|
|||||||||||||||||||||||||||||||||
|
Copyright © 2003 ChipCenter-QuestLink About ChipCenter-Questlink |
||||||||||||||||||||||||||||||||||