
Advanced digital audio demands larger word widths in data converters and DSPs (Part 1 of 3)

by John Tomarakos, Analog Devices Inc

Since the introduction of the compact disc in the early 1980s, digital technology has become the standard for the recording and storage of high-fidelity audio. It's not difficult to see why. Digital signals are robust. We can transmit and copy them without distortion. We can play them back without degrading the carrier. Who'd want to go back to scraping a needle along a vinyl groove?

Another advantage of digital audio signals is the ease with which we can manipulate them. Signal-processing technology has advanced to such an extent that almost any audio product, from a mobile phone to a professional mixing console, contains a DSP chip. Here, too, the reasons for the success of DSP are simple: stability, reliability, enhanced performance and programmability. Engineers can implement signal-processing functions for a fraction of the cost, and in a fraction of the space, required by analog circuitry--and gain functionality that simply isn't possible in analog. In fact, so ubiquitous has digital technology now become that, for many people, the word "digital" has become synonymous with "high quality."

The ever-increasing performance and falling cost of DSP hardware have generated new applications and markets for digital audio in both the consumer and professional sectors. Digital Versatile Disk (DVD) and digital SurroundSound in the home along with digital radio and hands-free cell phones in the car are just a few of the DSP-based technologies that have appeared in the last few years. The demands on the quality, speed and flexibility of DSPs have also increased as designers add more functionality. A product might now require mixing, equalization, dynamic-range compression and data decompression, all implemented on signal-processing hardware.

Today, 16-bit, 44.1-kHz PCM digital audio continues to serve as the standard for high-quality audio in most current applications such as CD, DAT and high-end PC audio. Recent technological developments and improved knowledge of human hearing, however, have created a demand for greater data-word lengths. A/D converters now support 18, 20, and 24 bits and can thereby exceed the 96-dB dynamic range available using 16-bit words. Many recording studios now routinely master their recordings using 20- or 24-bit recorders. These technological developments are beginning to make their way into the consumer and "prosumer" audio applications. The most obvious impact in the consumer space is DVD, which can carry audio with resolution as high as 24 bits at sample rates well above 48 kHz. Another example is a 16-channel digital home-studio recorder sampling at 96 kHz with 24-bit resolution.

Given available technology and market demands, three trends are influencing the digital formats that are set to replace CD-grade audio:

  • Higher resolution -- either 20 or 24 bits per dataword
  • Higher sampling frequencies -- typically 96 kHz and sometimes 192 kHz
  • Additional audio channels for a more realistic 3D experience

A related enabling technology arises from the emergence of low-cost, higher-performance DSPs that satisfy the high-dynamic range requirements for processing or synthesizing audio signals.

Key questions to ask

But here's a good question: How much resolution do today's systems need to process audio signals adequately -- 16 bits? 20? 24? or even 32 bits? A system designer must investigate other related aspects. Does the audio application require fixed- or floating-point arithmetic? What undesirable side effects of quantization should you watch out for?

To help you answer those questions, this series of articles starts by briefly reviewing desirable characteristics of a DSP for use in audio applications. It then discusses differences in data formats for fixed- and floating-point processors. Next it examines the relationship of dynamic range to dataword size in processing audio signals. This knowledge aids in determining how many bits you need to design a system, whether it's a lower cost, low-fidelity consumer device or high-performance, high-fidelity professional audio gear. To design a system with either CD- or professional-quality audio, digital filters are becoming indispensable. However, for a digital-filter routine to operate transparently, the resolution of the processing system must be considerably greater than that of the input signal. For the highest-quality, professional audio systems, this author offers a 32-bit DSP as a suggested solution.

Benefits of DSP when processing audio

Before you can start thinking about word sizes, it's important to get a handle on just what a signal processor can do in the audio chain. A DSP has one purpose: to operate on quantized signal data as quickly and efficiently as possible. Compared to a typical CPU or microcontroller, a well-architected DSP usually contains the following desirable characteristics that help it perform realtime computations on audio signals:

Fast and flexible arithmetic -- A DSP implements single-cycle computation for certain operations including: multiplication with accumulation; arbitrary amounts of shifting; standard arithmetic operations; and logical operations.

Extended dynamic range to keep accuracy of extended sum-of-products calculations -- Multiply-accumulate units support the extended sums of products common in signal-processing algorithms. Extended precision in the multiplier's accumulator provides extra bits for protection against overflow in successive additions to ensure that no loss of data or range occurs.

Single-cycle fetch of two operands for sum-of-products calculations -- An extended sum of products needs two operands on each cycle to feed the calculation. The DSP should be able to sustain two-operand data throughput, whether the data are stored on or off the chip.

Hardware circular buffer for efficient storage and retrieval of samples -- A large class of DSP algorithms, including digital filters, requires circular data buffers. A circular buffer is generally a programmer-defined segment of DSP memory that stores samples for processing. Hardware implementations automatically wrap the address pointer back to the beginning of the buffer, which reduces overhead and improves performance. With the buffering in hardware, the DSP programmer needn't be concerned with the overhead of testing and resetting the address pointer to ensure it doesn't go beyond the buffer boundary.

Efficient looping and branching for repetitive operations -- DSP algorithms are repetitive, so you most logically express them as loops. For instance, digital-filter routines typically execute a running sum of MAC operations in efficient loop structures. A DSP's program sequencer, or control unit, should allow looping of code with minimal or zero overhead. Any loop branching, decrementing and termination tests are part of the control-unit hardware. Also, no overhead penalties should arise from conditional instructions that branch based on a computation unit's status bits.

All of the above architectural features are useful for implementing DSP-type operations. For example, as you can see in the following equation, written here in its standard finite-length form with input x, impulse response h of length N and output y,

    $y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$

convolution involves the multiplication of two sets of discrete data (an input multiplied with a shifted version of the impulse response to a system) and keeping a running sum of the outputs. Manufacturers architect their DSPs to perform these types of discrete mathematical operations as quickly as possible, usually within a single instruction cycle.

Examining the equation reveals the elements required for implementation. A DSP can store the filter coefficients and input samples in two memory arrays defined as circular buffers. The algorithm then multiplies each sample by its corresponding coefficient and adds the product to the running sum from previous iterations. To perform the operation shown above, the DSP should, in one cycle, perform a multiplication along with an addition to a previous result. Within the same cycle, the architecture should contain enough parallelism in the compute units to enable memory reads of the sample and filter coefficient for the next loop iteration.

Hardware circuitry in the architecture could allow for efficient looping through a number of iterations with zero overhead. When run in a zero-overhead loop, digital filters become highly efficient because they require no explicit software decrement, test or jump instructions. Thus, to implement the convolution operation, any device would require two circular buffers, multipliers, adders and a zero-overhead loop construct. A DSP contains these necessary building blocks.
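The building blocks just listed can be sketched in software. The following is only an illustrative model, not code for any particular DSP: the class name `CircularFIR` and its fields are hypothetical, and the modulo operation stands in for the pointer wraparound that a DSP would do in hardware at no cost.

```python
class CircularFIR:
    """FIR filter whose delay line is a circular buffer (software model)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)              # h(0) .. h(N-1)
        self.delay = [0.0] * len(self.coeffs)   # the last N input samples
        self.newest = 0                         # index of the most recent sample

    def process(self, sample):
        """Accept one input sample x(n) and return one output sample y(n)."""
        n = len(self.coeffs)
        # Advance the write pointer and overwrite the oldest sample.
        # The modulo is the software equivalent of hardware wraparound.
        self.newest = (self.newest + 1) % n
        self.delay[self.newest] = sample

        acc = 0.0                               # running sum (the accumulator)
        idx = self.newest
        for k in range(n):
            acc += self.coeffs[k] * self.delay[idx]   # one multiply-accumulate per tap
            idx = (idx - 1) % n                       # step back to the next older sample
        return acc


# Feeding a unit impulse returns the coefficients, i.e. the impulse response.
fir = CircularFIR([0.5, 0.25, 0.125])
impulse_response = [fir.process(x) for x in [1.0, 0.0, 0.0, 0.0]]
```

On a DSP with zero-overhead looping and hardware circular buffers, the loop counter, the two index updates and the two operand fetches in the inner loop would not cost separate instructions; here they are spelled out so the bookkeeping is visible.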

In performing these types of repetitive DSP calculations, however, it's crucial to recognize that quantization errors from truncation and rounding can accumulate and degrade the quality of the algorithmic result. The resolution in a filter's arithmetic computations along with its structure determine how robustly that filter can manipulate signals. The remainder of this article series discusses how many bits are potentially required for a particular audio application, a number you determine by examining the complexity of the processing and the desired quality of the target signal.

Fixed- or floating-point for audio apps?

Depending on the application's complexity, an audio-system designer must decide on the required level of computational accuracy and dynamic range. To do so you must first be familiar with common native DSP datatypes. Begin by noting that 16- and 24-bit fixed-point DSPs are designed to compute integer or fractional arithmetic. 32-bit DSPs typically perform floating-point arithmetic. Although the ADSP-2106x SHARC family was traditionally offered as 32-bit floating-point devices, this popular family can equally support either 32-bit floating-point, 32-bit integer or 32-bit fractional fixed-point arithmetic.

Let's focus for a moment on fixed-point math. DSPs that can perform fixed-point operations typically represent signals in binary, using either signed (2s complement) or unsigned notation, in integer or fractional format.

Most DSP operations, though, are optimized for signed-fractional notation. This format makes sense because a fractional representation corresponds directly to a ratio of the full range of sampled values of an analog signal. For example, the fractional binary value of a discrete signal can represent an electrical analog signal in the range of +/- 5 Volts, read from a 5V A/D converter (Fig 1). Why use fractional arithmetic for processing signals? It's difficult to overflow a fractional result because multiplying a fraction by another fraction results in a number of smaller magnitude, which the algorithm then either truncates or rounds off. The largest positive fractional number is just under 1.0 (about 0.99997 for a 16-bit word, or 0x7FFF), while the most negative number is exactly -1.0 (0x8000). Any value between these extremes is a fraction of the "loudest" signal the converter can represent. For example, the DSP interprets the fractional value halfway between 0 Volts (0x0) and full-scale 5 Volts (0x7FFF) to be 0x4000 (which equates to 2.5 Volts).

Fig 1 -- Signed 2s complement representation of sampled signals

Fig 2 -- Fractional and integer formats for an N-bit number

In the fractional format, we assume that the binary point is immediately to the right of the MSB (sign bit), whereas in the integer format, the binary point is to the right of the LSB (Fig 2). Fractional math is more intuitive for signal manipulation, and in these articles we'll examine the LSBs in a fractional result because these lower order bits suffer most from quantization errors due to finite word-length effects. As you'll see later, the more bits that represent a given audio signal, the more accurate the arithmetic result.
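As a rough illustration of the signed-fractional format for a 16-bit word (often called 1.15), the sketch below converts between real values and 16-bit codes and multiplies two fractions. The helper names (`float_to_q15`, `q15_to_float`, `q15_mul`) are hypothetical, and the saturation and rounding choices are assumptions made for this example; a given DSP may behave differently depending on its mode settings.

```python
Q15_ONE = 1 << 15      # scale factor 2**15 between a real value and its code
Q15_MAX = 0x7FFF       # largest positive code, just under +1.0
Q15_MIN = -0x8000      # most negative code, exactly -1.0


def float_to_q15(x):
    """Round x to the nearest 1.15 code, saturating outside [-1.0, 1.0)."""
    code = int(round(x * Q15_ONE))
    return max(Q15_MIN, min(Q15_MAX, code))


def q15_to_float(code):
    """Interpret a 1.15 code as a real number."""
    return code / Q15_ONE


def q15_mul(a, b, rounding=True):
    """Multiply two 1.15 codes and reduce the wide product back to 16 bits."""
    product = a * b                    # wide intermediate result (2.30 format)
    if rounding:
        product += 1 << 14             # add half an LSB before discarding bits
    result = product >> 15             # drop the 15 low-order bits
    return max(Q15_MIN, min(Q15_MAX, result))   # only -1.0 x -1.0 can saturate


half = float_to_q15(0.5)               # 0x4000, the half-scale value from the text
quarter = q15_mul(half, half)          # 0x2000, i.e. 0.25
truncated = q15_mul(1, half, rounding=False)    # half an LSB is lost entirely
rounded = q15_mul(1, half, rounding=True)       # half an LSB rounds up to 1
```

The last two lines show the finite word-length effect in miniature: the same product comes out as 0 when truncated and as 1 LSB when rounded, which is the kind of low-order-bit error that accumulates over repeated operations.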

Now take a look at floating-point math. IEEE-754/854 floating-point data (Fig 3) consists of a 32-bit format where 24 bits represent the mantissa (maintaining precision) and 8 bits represent the exponent (extending dynamic range). The use of the exponent increases the dynamic range of numbers this format can handle to more than 1500 dB. Thus programmers have much more flexibility in writing algorithms because they have far fewer worries about prescaling inputs to prevent an overflow or about intermediate results generating an overflow condition.

Fig 3 -- IEEE 754/854 32-bit single precision floating-point format

Let's take a closer look at this format. You represent a 32-bit floating-point number in decimal as:

                $n = m \times 2^{e-127}$

where m is the mantissa and e is the stored 8-bit exponent, while a 32-bit floating-point DSP stores its binary IEEE-format representation as:

                $n = (-1)^{S} \times 2^{e-127} \times (1.b_0 b_1 b_2 \ldots b_{22})$

It's important to know that the IEEE standard always refers to the mantissa in signed-magnitude format and not 2s complement format. The extra hidden bit effectively improves the precision to 24 bits and also ensures that the mantissa of any normalized number falls in the range from 1 (1.0000....00) to just under 2 (1.1111....11), because we assume the hidden bit is always a 1.

While the IEEE format limits you to 23 stored bits in the mantissa, another format has emerged to give even more precision. Specifically, Analog Devices has devised an extension to the IEEE format that keeps the exponent at the same size but adds eight bits to the mantissa, thereby creating a 40-bit format (Fig 4).

Fig 4 -- 40-bit extended precision floating-point format

Devices such as the ADSP-2106x family support this format, and they store this binary numeric format representation as:

                $n = (-1)^{S} \times 2^{e-127} \times (1.b_0 b_1 b_2 \ldots b_{30})$
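To make the 32-bit layout concrete, the sketch below uses Python's standard `struct` module to split a single-precision value into its sign, stored exponent and 23 stored mantissa bits, then rebuilds the value from those fields. It assumes the IEEE 754 single-precision exponent bias of 127 and handles normalized numbers only; the function names are hypothetical. It does not model the 40-bit extended format.

```python
import struct


def decode_float32(x):
    """Return (sign, stored exponent, stored fraction bits) of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    sign = bits >> 31                    # 1 bit
    exponent = (bits >> 23) & 0xFF       # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF           # 23 stored mantissa bits
    return sign, exponent, fraction


def rebuild_normalized(sign, exponent, fraction):
    """Evaluate (-1)^S x 2^(e-127) x 1.f for a normalized number (0 < e < 255)."""
    mantissa = 1.0 + fraction / (1 << 23)    # the hidden bit supplies the leading 1
    return (-1.0) ** sign * mantissa * 2.0 ** (exponent - 127)


fields = decode_float32(-2.5)            # -2.5 = -(1.25) x 2^1
value = rebuild_normalized(*fields)
```

For -2.5 the fields are sign 1, stored exponent 128 and fraction 0x200000 (binary .01, or 0.25), so the mantissa is 1.25 and the reconstructed value is -1.25 x 2^1.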

Floating point's dynamic range might be unnecessary for some audio processing, but the flexibility programmers enjoy in floating-point representations makes it desirable, especially for high-level languages such as C. Keep in mind that many of the fixed-point precision issues we'll discuss likewise apply to a DSP that supports floating-point arithmetic, at least in terms of dataword truncation and coefficient quantization. A chief reason is that the programmer must still convert fixed-point data coming from an A/D converter to its floating-point representation, and the system must likewise convert the floating-point result back to its fixed-point equivalent when sending the value out through a D/A converter.

Programmers have traditionally used floating-point arithmetic for applications with high dynamic-range requirements such as image processing, graphics and military/space. As you'll soon see, the dynamic range for 32-bit IEEE floating-point arithmetic is 1530 dB. In the past, engineers typically examined price vs performance tradeoffs when deciding whether to employ floating-point processors. Until recently, high cost made 32-bit floating-point DSPs unreasonable for use in audio. Today, with the introduction of lower-cost 32-bit processors such as the ADSP-21065L, designers can achieve high-quality audio using either 32-bit fixed- or floating-point processing at a cost comparable to 16- and 24-bit DSPs.

The relationship of dynamic range to dataword size

A key consideration when designing an audio system is determining acceptable signal quality. Table 1 shows some quality comparisons for some common audio applications, devices and equipment.

Table 1 -- Representative dynamic-range comparisons

    Audio Device/Application      Dynamic Range (dB)
    ------------------------      ------------------
    AM radio                      48
    Analog broadcast TV           60
    FM radio                      70
    Analog cassette player        73
    Video Camcorder               75
    ADI SoundPort codecs          80
    16-bit audio converters       90-95
    Digital broadcast TV          85
    Mini-disk player              90
    CD player                     92-96
    18-bit audio converters       104
    Digital audio tape (DAT)      110
    20-bit audio converters       110
    24-bit audio converters       110-122
    Analog microphone             120

Audio-equipment retailers and consumers often use the phrase "CD-quality sound" when referring to high dynamic-range audio, which is immediately apparent if you compare the sound of a CD player to an AM radio. Noise isn't audible in higher quality CD audio, so listeners can hear lower level signals such as quiet musical passages. In contrast, someone listening to an AM radio can hear low-level noise at levels audible enough to make it a distraction. Thus, by increasing an audio signal's dynamic range you can better distinguish low-level audio signals while lowering the noise floor to the point it becomes undetectable to the listener. (Note that "noise floor" is a term audio engineers use to describe the point where a listener can no longer distinguish the audio signal from white noise.)

To achieve CD-type signal quality, the trend in recent years has been to design systems that process audio signals digitally, using 16-bit data converters with a dynamic range of 90-93 dB. When processing these signals, the programmer should write algorithms with computation precision exceeding the 16-bit level found in CD signals. The same principle holds beyond this case: whatever the application, the designer must first determine an acceptable signal/noise ratio (SNR) and decide how much precision is required to produce acceptable results.

SNR and dynamic range for a DSP

For both analog and digital systems, engineers often use SNR (S/N ratio) and dynamic range interchangeably. In pure analog terms, SNR gives the ratio of the largest known signal to the noise present when no signal exists. In the digital realm, engineers often use either term to describe the ratio between the largest representable number and the quantization error (Ref 1). A well-designed digital filter should realize a maximum SNR exceeding that of the system's data converter. Thus, a DSP designer must be certain that the filter's noise floor doesn't rise above the smallest signal level that the A/D or D/A converter can resolve (Fig 5).

Fig 5 -- Audio signal-level (dBu) relationship between dynamic range, SNR and headroom

Now examine a summary of the terms in Fig 5 as defined by Davis and Jones (Ref 2). It's important to know exactly what they mean because we'll be referring to them frequently throughout this series.

Decibel -- describes a ratio of sound level (sound pressure level), power or voltage:

    $\mathrm{dB}_{\mathrm{Volts}} = 20\log_{10}(V_o/V_i)$,    $\mathrm{dB}_{\mathrm{Watts}} = 10\log_{10}(P_o/P_i)$,    $\mathrm{dB}_{\mathrm{SPL}} = 20\log_{10}(P_o/P_i)$

Dynamic Range -- the difference, measured in dB, between the loudest and quietest representable signal level; or if noise is present, the difference between the loudest (maximum level) signal to the noise floor.

    $\text{Dynamic Range}_{\mathrm{dB}} = (\text{Peak Level})_{\mathrm{dB}} - (\text{Noise Floor})_{\mathrm{dB}}$

SNR (Signal/Noise Ratio) -- the difference between the nominal level and the noise floor, measured in dB. Other authors define this value for analog systems as the ratio of the largest representable signal to the noise floor when no signal is present (Ref 3), which more closely parallels SNR for a digital system.

Headroom -- the difference between nominal line level and peak level where signal clipping occurs, measured in dB. The larger the headroom, the better the audio system handles very loud signal peaks before distortion occurs.

Peak Operating Level -- the maximum representable signal level where clipping starts.

Line Level -- nominal operating level (0 dB, or more precisely, between -10 dB and +4 dB)

Noise Floor -- the noise floor for human hearing is the average level of "just audible" white noise. Components in analog audio equipment can generate noise, whereas with a DSP, noise can arise from quantization errors. You might make the assumption that (headroom + SNR) of an analog signal equals the dynamic range, but that assumption isn't entirely accurate because signals can still be audible below the noise floor.
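The definitions above reduce to a few lines of arithmetic. The sketch below is a minimal illustration with hypothetical function names; the example levels (+24 dBu peak, +4 dBu nominal, -90 dBu noise floor) are made-up values chosen only to show the bookkeeping and are not taken from Fig 5.

```python
import math


def db_from_voltage_ratio(v_out, v_in):
    """Decibels for a voltage (or sound-pressure) ratio: 20 log10(Vo/Vi)."""
    return 20.0 * math.log10(v_out / v_in)


def db_from_power_ratio(p_out, p_in):
    """Decibels for a power ratio: 10 log10(Po/Pi)."""
    return 10.0 * math.log10(p_out / p_in)


def dynamic_range_db(peak_level_db, noise_floor_db):
    """Difference between the peak level and the noise floor."""
    return peak_level_db - noise_floor_db


def headroom_db(peak_level_db, nominal_level_db):
    """Difference between the peak (clipping) level and the nominal level."""
    return peak_level_db - nominal_level_db


def snr_db(nominal_level_db, noise_floor_db):
    """Difference between the nominal level and the noise floor."""
    return nominal_level_db - noise_floor_db


# Hypothetical levels in dBu, for illustration only.
peak, nominal, floor = 24.0, 4.0, -90.0
total_range = dynamic_range_db(peak, floor)     # 114 dB
margin = headroom_db(peak, nominal)             # 20 dB
ratio = snr_db(nominal, floor)                  # 94 dB
```

In this simple bookkeeping, headroom plus SNR adds up to the dynamic range; as the Noise Floor definition cautions, that identity is only approximate for real analog signals because content below the noise floor can still be audible.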

Before leaving this section of definitions, let's review a few that pertain to data conversion. Quantization is the process of approximating a continuous value with a number of finite precision. For example, an A/D conversion represents an infinitely variable voltage with a binary number. The difference between two consecutive binary values is the quantization step or quantization level, and it defines the sampled signal's effective noise floor. The word length for a given processor determines the number of available quantization levels. An n-bit dataword yields $2^n$ quantization levels.

Increasing the number of bits with which you represent a sample results in a better approximation of the audio signal and a reduction in quantization error (noise), which increases the SNR. In theoretical terms, each bit represents an increase in the signal-to-quantization noise or dynamic range of approximately 6 dB.

Fig 6 -- DSP or data-converter SNR and dynamic range

Note that this 6-dB/bit rule is only an approximation for determining dynamic range. To calculate the ratio of the actual maximum representable signal amplitude to the maximum quantization error for an ideal n-bit A/D converter or DSP-based digital system, use the equation

    $\mathrm{SNR}_{\mathrm{A/D\,(RMS)}}\ (\mathrm{dB}) = 6.02n + 1.76$

    $\text{Dynamic Range (dB)} = 6.02n + 1.76 \approx 6n$

The 1.76 dB term is based on a sinusoidal waveform and varies for other waveforms (Ref 4).
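One way to get a feel for this formula is to quantize a near-full-scale sine wave in software and measure the resulting signal-to-quantization-noise ratio. The sketch below is an idealized experiment under stated assumptions (a uniform rounding quantizer, no dither, no analog noise); the function names and the test-tone parameters are hypothetical choices for this example.

```python
import math


def theoretical_snr_db(bits):
    """Ideal SNR of an n-bit converter for a full-scale sine: 6.02n + 1.76 dB."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits, num_samples=16384, cycles=1021):
    """Quantize a sine to `bits` bits and return signal power / error power in dB."""
    step = 2.0 / (1 << bits)                       # 2**bits levels across [-1, 1)
    amplitude = ((1 << (bits - 1)) - 1) * step     # peak sits on the largest positive code
    signal_power = 0.0
    noise_power = 0.0
    for i in range(num_samples):
        # An odd cycle count in a power-of-two record visits every phase step once.
        x = amplitude * math.sin(2.0 * math.pi * cycles * i / num_samples)
        quantized = round(x / step) * step         # nearest quantization level
        error = quantized - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


ideal_16 = theoretical_snr_db(16)       # 98.08 dB from the formula
simulated_16 = measured_snr_db(16)      # should land close to the ideal figure
```

The simulated figure should agree with the formula to within a fraction of a dB for word lengths such as 16 bits, and each extra bit should add roughly 6 dB, which is the rule of thumb stated above.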

When working with this 6-dB/bit rule, keep in mind that it represents a theoretical maximum. In the real world, data converters spec dynamic ranges that are somewhat smaller. One reason is the dithering that most converters incorporate. Dithering is a process by which the data converter introduces additive random noise to the quantized signal so that any noise introduced in the quantization process is then statistically reduced to steady white noise. The scheme is equivalent to multiplying the lower bits by a random number generator to make the errors broadband in nature. Note, also, that dithering isn't applicable to signal processors and their word representations because no noise is present in the absence of a digital signal. That's the reason why, when working in the digital realm, engineers generally use dynamic range and SNR synonymously to describe the ratio of the largest representable signal to the quantization error or noise floor.

With this background, the next installment will start to address the key question of this series of articles: given today's advanced technologies and advanced consumer tastes, how many bits are really necessary to design a high-quality audio system?

Author's biography

John Tomarakos is a senior DSP applications engineer at Analog Devices (Norwood, MA). He received a BSEE from the Univ of Pittsburgh and is a member of the Audio Engineering Society and the IEEE Signal Processing Society. For the past five years, John helped support 16- and 32-bit DSPs and AD18xx computer audio codecs. Currently his primary focus is lead applications support for low-cost SISD and SIMD SHARC 32-bit architectures. He developed reference ADC, DAC, AC'97 codec, and I2S interface drivers for the SHARC, and he's also implemented various audio-effect reference examples for the ADSP-21065L. His audio-related application notes and DSP reference code are available on www.analog.com.

References

Note: an extensive set of references going beyond those specifically called out in the text will appear at the end of the final installment.

1. Wilson, R, "Filter Topologies," Jour Audio Eng Soc, Sept 1993, Vol 41, No 9.

2. Davis, G and Jones, R, Sound Reinforcement Handbook, 2nd Ed, 1989/1990, Ch 14, pgs 19-42, Yamaha Corp of America, ISBN 0-88188-900-8

3. Kloker, KL, Lindsley, BL and Thompson, CD, "VLSI Architectures for Digital Audio Signal Processing," Audio in Digital Times, Proc Audio Eng Soc, Toronto, Canada, May 1990, pgs 313-325. ISBN 0-937803-14-6

4. Fielder, LD, "Human Auditory Capabilities and Their Consequences in Digital-Audio Converter Design," Audio in Digital Times, Proc Audio Eng Soc, Toronto, Canada, May 1990, pgs 45-62. ISBN 0-937803-14-6

Key audio converter terms

Signal-to-noise Ratio (SNR, S/N ratio)

The ratio of the input signal S to the background noise N in a system. For an ideal A/D converter with a sinewave input, the SNR relates to the resolution n as

SNR(RMS) = 6.02n + 1.76 dB.

Thus, the resolution and quantization level establish the system noise floor. Random system noise reduces SNR.

Quantization Error

All A/Ds suffer a minimum error as a result of the discrete, finite steps that represent the analog input. This error is directly proportional to the quantization step size, so it shrinks as the resolution in bits increases.

Quantization uncertainty error = ±0.5 LSB

(Spurious Free) Dynamic Range

The ratio of the full-scale input or output signal to the highest harmonic or spurious input/output noise component amplitude. Essentially, this spec indicates how far it is possible to go below the full-scale input signal without hitting noise or distortion. Engineers usually measure it from 0-20 kHz and express it in dB. They calculate the dynamic range based on a -60 dB input signal as

    Dynamic Range = (S/[THD+N]) + 60 dB

The dynamic range of a digital signal is the ratio of the maximum full-scale signal representation to the smallest signal the DSP or data converter can represent. For an N-bit system, the ratio is theoretically equal to 6.02N dB.

Note: Spurious harmonics fall below the noise with a -60 dB input, so the noise level establishes the dynamic range. This is the recommendation of AES and EIAJ.

Total Harmonic Distortion (THD)

A very important specification in audio systems, THD is defined as the ratio of the RMS sum of all harmonic distortion components to the amplitude of the original full-scale input. It arises from A/D converter nonlinearities.

Total Harmonic Distortion + Noise (THD+N)

The ratio of the RMS value of a full-scale fundamental input signal to the RMS sum of all other spectral components in the passband, expressed in dB and percentages.


Copyright © 2003 ChipCenter-QuestLink