Processing Digital Signals With Programmable Logic Devices
Senior Design Engineer, Altera Corporation
There are many DSP systems for wireless communications. Many of these must execute large computational tasks that a single DSP processor may have difficulty performing. Very often, a designer will divide these tasks among several DSP processors. This scheme results in a flexible system that can be reconfigured by changing the ROM code.
Another approach is to use dedicated logic, either a custom application-specific integrated circuit (ASIC) or a function-specific chip available from a semiconductor vendor (e.g., company XYZ's 32-tap digital FIR filter), combined with a single DSP processor. The DSP processor is primarily used for command, control, and status monitoring, which may include certain "light duty" signal processing tasks such as power measurement. The dedicated hardware performs most of the computationally intensive DSP tasks, thus freeing the processor to work on the engineer's custom DSP functions.
A third alternative is to design your high-performance digital signal processing system with a programmable logic device (PLD) and a single DSP processor. A PLD can function very much like an ASIC or an ASSP. It can easily work with DSP processors and accelerate any computationally intensive processing (usually datapath oriented). Since a PLD can be custom coded, the end user can also implement his own custom DSP functions within this device. This solution maintains high flexibility, quick time-to-market, and low system cost.
Simulation vs. Prototype Hardware
The major benefit of simulation is that the end user can test his hardware before it exists. The designer can try out different scenarios and determine whether they will fly. By working at various levels of abstraction (e.g., algorithm-level floating-point models vs. fixed-point bit-accurate models), the designer can quickly put together a system, determine how well it works, and refine it into more accurate models of the hardware. Simulation models are relatively easy to debug (compared to lab test benches). Also, in simulation, it's possible to access internal nodes. Since the analysis platform is usually the same as the simulation platform, moving results from a Verilog-XL simulation into a MATLAB analysis package is not that difficult.
The largest downside of simulation is simulation speed. Even though computers have gotten faster, the amount of signal processing required in most communication systems has gone up even more. An often-overlooked aspect of simulation is the amount of time a designer spends setting up a complex stimulus for his hardware. The designer will generally spend a week or more generating channel models and various noise sources.
Working in the lab with a prototype also has many benefits. One can quickly set up complicated test environments, since real-world signal generators and real-world channels are readily available. Since the system is operating in real time, it's possible to obtain early bit error rates and determine whether your solution can work in the real world. The biggest downside to operating with a prototype has been the difficulty of debugging a system operating in the real world, especially if the problem occurs infrequently. Actually, finding problems in the lab with an early prototype is a good sign. Very often these problems could not have been found in simulation, and they represent real-world issues that must be overcome in your solution.
Using PLDs allows you to get to the lab faster and accelerate your time to market. Modern PLDs help you debug problems by providing access to buried nodes within your design.
Allocating the PLD Resources
At the highest level, the designer needs to budget the MIPS to perform the required function, and optimize the algorithm to obtain the lowest possible MIPS. The traditional hardware solution to obtaining higher system performance has been to throw more gates at the problem. This is the classic area vs. speed tradeoff. Modern PLDs have changed the rules. Very often, it's possible to clock the internal structures of PLDs faster than the available system clocks. Devices such as Altera's APEX family provide on-board phase-locked loops (PLLs) for this purpose. The PLLs let you boost internal performance, allowing the internal hardware to run as fast as the logic allows.
PLLs also save logic, since you can time-division multiplex. For example, let's say you've got a system clock of 25 MHz. You need to perform four (16x16) multiplications in one system clock period (40 ns). This would normally require four multipliers operating in parallel. By using a clock-doubling on-board PLL, it's possible to use two multipliers running at 50 MHz. Each multiplier then completes two multiplications per 40 ns period, eliminating the need for two of the multiplication structures.
Many applications require temporary storage of data. Most high-end PLDs have built-in memory today. This is typically very fast on-board memory (usually much faster than off-chip asynchronous SRAM). It can be difficult to find a PLD with exactly the amount of memory to fit an engineer's custom application. In some cases, the application will require 16 MB of memory. In other cases, 2 KB will suffice. If there is not enough memory on the device for your application, the data must be stored off-chip.
Many applications, however, require very fast access to data. In this case, the ideal solution would be a level-one memory cache. A cache takes advantage of the high speed of on-board memory and the low cost of off-chip external memory. The hardest part about creating a cache is keeping track of old data.
Typical cache designs utilize a Content Addressable Memory (CAM) in the cache controller to keep track of which locations have been updated in the built-in memory.
If your design doesn't need much internal memory at all, you don't want to waste this resource. It is possible to use any memory as a product term, just by using the memory as a truth table. This method of converting a memory into product-term logic can become expensive very quickly, because a truth table with n input bits needs 2^n locations. With 6 bits of input, you need 64 memory locations. With 16 bits of input, you need 64K locations. Altera provides memory in the form of Embedded System Blocks (ESBs) in their APEX devices. A single ESB can be configured as a 2K ROM or SRAM, a 32-bit product term, or a 32-bit CAM.
It is interesting to note that Altera APEX devices support a feature called SignalTap. SignalTap utilizes ESBs to record the states of internal nodes. In this manner it behaves as a built-in logic (or state) analyzer for use in debugging your design. This is the least invasive manner in which to observe the state of internal nodes. The internal nodes remain internal, but the end user can access them through a SignalTap port.
Design Tools and IP for PLDs
PLD vendors provide tools to help designers with their DSP development. These design tools help the end user create high-level DSP functions. The design tools and intellectual property are often available from the PLD vendor or a third party. One example of a DSP design tool for a PLD is Altera's FIR Compiler. This tool allows end users to generate their own custom FIR filters. It also automatically generates interpolation and decimation filters utilizing a polyphase decomposition. The designer can specify the type of filter desired (high pass, low pass, etc.) and the cutoff frequencies. The FIR Compiler automatically generates floating-point coefficients. These are converted into fixed-point numbers by the tool.
Alternatively, the FIR Compiler can read in coefficients generated by tools such as MATLAB. A fixed-point coefficient analyzer is available, which shows the response of the filter with fixed-point coefficients. The tool allows the designer to choose between serial and parallel architectures and perform his or her speed vs. area tradeoff. To facilitate the speed/area tradeoff, there is a resource estimator, which dynamically updates the logic cell/ESB count based on user parameters. The FIR Compiler will also generate cycle-accurate MATLAB/Simulink models of the FIR filter. This is important, since the end user may decide to incorporate pipelining in order to increase system throughput. Pipelining will, however, introduce latency into the system, and that latency must be verified to be compatible with your system-level simulations.
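The conversion of floating-point coefficients into fixed-point numbers, and the check of the resulting fixed-point response, can be illustrated with a short sketch. This is not the FIR Compiler's algorithm, which the article does not describe. It assumes a Hamming-windowed sinc low-pass design and a simple scale-and-round quantizer, and the function names (lowpass_taps, quantize, magnitude_db) are hypothetical names chosen for illustration.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass taps (assumed design method).
    cutoff is normalized to the sample rate, 0 < cutoff < 0.5."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def quantize(taps, bits):
    """Scale so the largest tap fills the signed range, then round.
    Returns the integer taps and the scale factor that was applied."""
    full_scale = 2 ** (bits - 1) - 1
    scale = full_scale / max(abs(t) for t in taps)
    return [int(round(t * scale)) for t in taps], scale

def magnitude_db(taps, freq):
    """Magnitude response in dB at a normalized frequency."""
    re = sum(t * math.cos(2 * math.pi * freq * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(2 * math.pi * freq * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))

if __name__ == "__main__":
    taps = lowpass_taps(32, 0.1)
    fixed, scale = quantize(taps, 10)
    # Undo the scaling so both responses share the same gain reference.
    restored = [v / scale for v in fixed]
    for f in (0.0, 0.05, 0.2, 0.3):
        print(f"f={f:.2f}  float={magnitude_db(taps, f):8.2f} dB  "
              f"fixed={magnitude_db(restored, f):8.2f} dB")
```

Running the sketch with different values for bits shows the kind of comparison a fixed-point coefficient analyzer makes: the passband barely changes, while stopband attenuation tends to degrade as the word length shrinks.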
Verification of System Performance
Even though the processing power available on the desktop is staggering, the amount of processing required to ensure that a communications system is operating correctly is even more staggering. To verify that a system works with a bit error rate (BER) of 10^-6, approximately 10^9 bits must run through a simulation to obtain a statistically relevant estimate. Each bit would require at least 100 simulation cycles to get from the transmitter, through the channel, and finally out of the receiver, for a total on the order of 10^11 simulation cycles.
There are ways to manage this problem. The simplest method is to extrapolate. Instead of working with a channel whose signal-to-noise ratio (SNR) would result in a BER of 10^-6, use a channel with a lower SNR. This will result in a higher BER, which takes far fewer bits to measure. If the measured data matches the theoretical prediction at this operating point, then mathematically a higher SNR should deliver the better target BER.
The downside to this approach is sensitivity. Circuits that operate well under low SNR conditions may not operate as well under high SNR conditions. For instance, a timing recovery circuit in the receiver may actually be introducing a small amount of phase noise into the system. This phase noise would be drowned out by the overall noise at an SNR of 10 dB. Raise the SNR to 30 dB, and the phase noise from the timing recovery circuit may become the limiting factor in your system. Unfortunately, this would never have been picked up in the 10 dB tests.
Conclusion
The best test, naturally, is a real-world test. Real-world tests generally require a large setup time. For instance, if your system is an ASIC, you cannot truly test the system until you've got the ASIC back from the fab. The solution to the problem comes from the programmable logic world. The traditional problems with PLDs have been lack of resources and limited maximum clock speed. Currently, however, there are PLDs with over 40,000 logic elements and over 512K bits of high-speed RAM.
Also, with improving device fabrication technology come higher clock frequencies. With today's PLDs, it's possible to obtain a system clock of over 70 MHz on a complex circuit that utilizes over 80% of device resources. Finally, PLDs offer speed of reconfiguration and testability. You don't have to wait two months for a fab to turn your ASIC. If you want to look inside your architecture, you can bring the signals to unused outputs. If you have some leftover resources in your PLD, you can even utilize tools such as SignalTap (Altera's built-in logic analyzer) to look inside your design with minimal timing impact. By incorporating PLDs into your signal processing solution, it's possible to significantly increase your system performance. Modern PLD architectures and design tools make utilizing PLDs easier than ever, and speed up your development time.
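The simulation budget and the extrapolation approach from the verification discussion can be made concrete with a small sketch. It rests on assumptions that are not in the article: BPSK over an additive white Gaussian noise channel, for which the theoretical BER is 0.5 * erfc(sqrt(Eb/N0)), and a target of about 1000 observed errors, chosen because it reproduces the article's figure of 10^9 bits at a BER of 10^-6. The function names are hypothetical, and Eb/N0 here is not the same quantity as the SNR values quoted in the article.

```python
import math

def bpsk_ber(ebn0_db):
    """Theoretical BER of BPSK over AWGN (assumed modulation and channel)."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def bits_to_simulate(ber, target_errors=1000):
    """Bits needed to observe roughly target_errors errors at a given BER."""
    return int(round(target_errors / ber))

def simulation_cycles(bits, cycles_per_bit=100):
    """Total simulation cycles, using the article's 100 cycles per bit."""
    return bits * cycles_per_bit

if __name__ == "__main__":
    # Lower Eb/N0 gives a higher BER, which needs far fewer bits to measure.
    for ebn0_db in (4.0, 6.0, 8.0, 10.5):
        ber = bpsk_ber(ebn0_db)
        bits = bits_to_simulate(ber)
        print(f"Eb/N0={ebn0_db:5.1f} dB  BER={ber:.2e}  "
              f"bits={bits:.2e}  cycles={simulation_cycles(bits):.2e}")
```

If the simulated BER at the low Eb/N0 points tracks this theoretical curve, that supports extrapolating to the target operating point, subject to the sensitivity caveat described in the verification discussion.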
Figure 1a. System-level design tools help to speed up development
Figure 1b. Design tools: the designer can perform the area/speed tradeoff interactively
Copyright © 2003 ChipCenter-QuestLink