ChipCenter Questlink

Designing for Performance: CPLDs vs. FPGAs

By Anita Schreiber, Philips Semiconductors

Traditionally, large designs have been targeted to FPGAs; however, designers have been challenged to achieve high performance from these devices. As the densities of CPLDs increase, applications that once were targeted only to FPGAs are now being targeted to CPLDs. Philips Semiconductors' CoolRunner 960 is an example of how larger, low-power devices are encroaching on the FPGA market. The CoolRunner 960 contains 960 macrocells and simultaneously delivers high performance and low power consumption. This device can run at system speeds well over 100 MHz while consuming less power (< 100 µA at standby and approximately 300 mA at 100 MHz) than an FPGA. The CoolRunner 960 also has deterministic timing and the ability to incorporate last-minute design changes without changing the pinout. Designers are finding that the architecture of CPLDs such as Philips Semiconductors' CoolRunner devices makes the implementation of high-performance applications easier, with a shorter development cycle, because device resources are used more efficiently and there are no routing or timing issues to deal with.

Architectural Differences between FPGAs and CPLDs
The basic logic cell of an FPGA typically consists of a 4- or 5-input look-up table (LUT) followed by a D-type register. This allows a logic function with 4 or 5 inputs to be implemented within a single logic cell. Horizontal and vertical routing channels then interconnect these building blocks.

[Figure: FPGA architecture (fpga_arch.jpg)]

Because the number of inputs to an FPGA logic cell is limited to 4 or 5, wide logic functions are spread across several of these cells. The delay through these additional cells increases the propagation time between registers and limits the maximum frequency of the design. The delay between registers also depends on the horizontal and vertical routing channels that connect the logic cells together, and it can vary greatly depending on the relative location of the logic cells and the different routing channels that can be used. The total delay of the implementation of a logic function is not known until the design has been placed and routed by the FPGA software.
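To make the cost of spreading a wide function across cells concrete, the following is a minimal Python sketch, not taken from the original article. It assumes the function decomposes as a simple tree (true of a wide AND or OR; an arbitrary function may need more cells), and the names `lut_levels` and `lut_cells` are hypothetical.

```python
def _ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def lut_levels(num_inputs: int, lut_inputs: int = 4) -> int:
    """Levels of LUTs between registers for a tree-decomposable function.

    Assumption: each LUT reduces up to `lut_inputs` signals to one signal.
    """
    if num_inputs < 1 or lut_inputs < 2:
        raise ValueError("need at least 1 input and a LUT with 2+ inputs")
    levels = 0
    signals = num_inputs
    while signals > 1:
        signals = _ceil_div(signals, lut_inputs)
        levels += 1
    return max(levels, 1)  # even a 1-input function occupies one cell


def lut_cells(num_inputs: int, lut_inputs: int = 4) -> int:
    """Total LUTs used by the same tree decomposition."""
    if num_inputs < 1 or lut_inputs < 2:
        raise ValueError("need at least 1 input and a LUT with 2+ inputs")
    cells = 0
    signals = num_inputs
    while signals > 1:
        signals = _ceil_div(signals, lut_inputs)
        cells += signals
    return max(cells, 1)


if __name__ == "__main__":
    for width in (4, 5, 16, 36):
        print(width, lut_levels(width), lut_cells(width))
```

Under these assumptions, a 36-input AND built from 4-input LUTs passes through three cell levels, each adding cell delay plus a placement-dependent routing delay.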

The basic building block of a CPLD is a macrocell, which typically consists of 4-5 product terms with up to 36 inputs followed by a D-type or T-type register. Macrocells are grouped into logic blocks, which are connected via a centralized interconnect array. Logic functions are synthesized into sum-of-products equations with up to 36 inputs.

[Figure: CPLD architecture (cpld_arch.jpg)]

A CPLD macrocell can implement a function with up to 36 inputs. Wide logic functions can be implemented in a single macrocell, so there are no additional delays through extra blocks. Since additional blocks are not necessary, relative placement of the macrocells and additional routing delays are not an issue. Thus, the delay of the logic is predictable and is known from the data sheet values before the design has been placed and routed in the CPLD.
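The sum-of-products structure of a macrocell can be modeled in a few lines of Python. This is an illustrative sketch only: the limits of 36 inputs and 5 product terms are taken from the "typical" figures quoted above, and the function names and the dictionary representation of a product term are assumptions made for this example.

```python
MAX_INPUTS = 36        # typical macrocell input limit quoted in the text
MAX_PRODUCT_TERMS = 5  # typical product-term limit quoted in the text


def product_term(inputs, literals):
    """AND of literals. `literals` maps an input index to its required bit."""
    return all(inputs[i] == want for i, want in literals.items())


def macrocell(inputs, product_terms):
    """OR of product terms: the combinational output ahead of the register."""
    if len(inputs) > MAX_INPUTS:
        raise ValueError("too many inputs for one macrocell")
    if len(product_terms) > MAX_PRODUCT_TERMS:
        raise ValueError("too many product terms for one macrocell")
    return any(product_term(inputs, term) for term in product_terms)


if __name__ == "__main__":
    # A 36-input AND is a single product term, so it needs one macrocell.
    wide_and = {i: 1 for i in range(36)}
    all_ones = [1] * 36
    one_zero = [1] * 35 + [0]
    print(macrocell(all_ones, [wide_and]))  # True
    print(macrocell(one_zero, [wide_and]))  # False
```

The point of the model is that the width of the AND does not change the number of logic levels: the function stays in one macrocell, which is why its delay can be read from the data sheet.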

Techniques for Implementing High Performance Designs
In order to compensate for the variable and sometimes quite large propagation delays that result from implementing wide logic functions in FPGAs, several design techniques are employed to obtain high performance.

Pipelining
Pipelining is a technique where a wide logic function is partitioned into smaller logic functions of 4 or 5 inputs to fit into FPGA logic cells. The outputs of these smaller functions are registered within the FPGA logic cells, and the results are strung together to implement the final function. This minimizes the delay between registers and increases the internal clock frequency of the design, but it also increases the latency from input to output of the application. Note that the delay between registers is still dependent on the relative location of the logic cells, because the output of one logic cell must be routed to the input of the next. Therefore, the internal clock frequency of the device is not known until the design has been placed and routed. Since each logic cell of the FPGA contains a register that would otherwise be wasted if the logic cell were purely combinatorial, this technique fully utilizes the resources within the cell. It does, however, significantly increase the number of registers in the design. An increase in the number of registers toggling at a fast clock rate in turn increases the power dissipation of the device.

Pipelining is not necessary with CPLDs. Because a wide logic function can be implemented in one macrocell, there is no need to break up the logic function. This decreases the latency of the design and leaves more registers available for implementing other logic functions within the device. The delay between registers within the design is based on the data book values, and the performance of the design is known before the design is placed and routed. Since registers are not spent implementing the pipeline, the CPLD implementation of the design uses fewer registers than the FPGA implementation. Therefore, power dissipation is lower in the CPLD.

[Figure: pipelining (pipeline.jpg)]
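The frequency-versus-latency trade-off of pipelining can be sketched numerically. The Python below is a rough model written for this discussion, not a timing tool: every delay value is a hypothetical placeholder rather than a data sheet figure, and the model assumes one uniform cell delay and one uniform routing delay per level, which real FPGA routing does not guarantee.

```python
def pipeline_tradeoff(levels, cell_delay_ns, route_delay_ns, reg_overhead_ns):
    """Compare an unpipelined and a fully pipelined chain of logic levels.

    levels          -- logic cells in series between input and output
    cell_delay_ns   -- delay through one logic cell (hypothetical value)
    route_delay_ns  -- routing delay between cells (hypothetical value)
    reg_overhead_ns -- register clock-to-out plus setup (hypothetical value)
    """
    per_level = cell_delay_ns + route_delay_ns
    # Unpipelined: the whole chain sits between two registers.
    unpiped_period = levels * per_level + reg_overhead_ns
    # Fully pipelined: a register after every level.
    piped_period = per_level + reg_overhead_ns
    return {
        "unpipelined_fmax_mhz": 1000.0 / unpiped_period,
        "pipelined_fmax_mhz": 1000.0 / piped_period,
        "unpipelined_latency_cycles": 1,
        "pipelined_latency_cycles": levels,
    }


if __name__ == "__main__":
    result = pipeline_tradeoff(levels=3, cell_delay_ns=2.0,
                               route_delay_ns=3.0, reg_overhead_ns=1.0)
    for name, value in result.items():
        print(name, value)
```

With these placeholder numbers the clock rate rises, but the result appears three clocks after the inputs instead of one, and each intermediate signal now costs a register.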

Replication of Logic to Reduce Fan-out
Another technique used to increase timing performance in FPGAs is to replicate the generation of high fan-out signals. Routing between logic cells in an FPGA contributes significantly to the overall delay of a logic function. As the fan-out of an output from a logic cell increases, the routing delay of the line increases. This is due to the additional routing required to reach all of the necessary inputs of other logic cells, as well as the additional capacitance of the line due to the increased number of loads. To counter this, the same logic function is replicated so that each copy of the signal has a reduced number of loads. This decreases the routing delay of these lines, which increases the internal clock frequency of the device, but it also increases the number of registers required to implement the high fan-out signal. This technique allows each copy of the logic function to be placed close to the logic cells that it feeds so that the routing delays can be further reduced, but the internal clock frequency of the design is again dependent on the placement and routing between blocks. Therefore, the performance of the design is not known until the design has been placed and routed. Also, as the number of registers increases, so will the power dissipation of the device.

In a CPLD architecture, all macrocells are interconnected through a centralized interconnect array. An output from a macrocell is routed back into the interconnect array and is then available to all other macrocells in the device. The delay for this route is deterministic and does not vary depending on the number of loads or the location of those loads. It is therefore unnecessary to replicate logic in order to increase the performance of the device.
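The register cost of replication in an FPGA can be estimated with simple arithmetic. The Python sketch below is illustrative only: the per-copy load limit is a designer-chosen, hypothetical parameter, and the helper names are invented for this example. It also assumes the load list is already ordered by placement proximity, so that neighboring loads land on the same copy.

```python
import math


def replicas_needed(fan_out, max_loads_per_copy):
    """Copies of the driving logic so that no copy exceeds the load limit."""
    if fan_out < 1 or max_loads_per_copy < 1:
        raise ValueError("fan_out and max_loads_per_copy must be positive")
    return math.ceil(fan_out / max_loads_per_copy)


def split_loads(loads, max_loads_per_copy):
    """Assign consecutive loads to copies, in the order given."""
    if max_loads_per_copy < 1:
        raise ValueError("max_loads_per_copy must be positive")
    return [loads[i:i + max_loads_per_copy]
            for i in range(0, len(loads), max_loads_per_copy)]


if __name__ == "__main__":
    loads = [f"cell_{n}" for n in range(40)]   # hypothetical load names
    copies = replicas_needed(len(loads), 8)
    groups = split_loads(loads, 8)
    print(copies, "copies;", copies - 1, "extra registers if registered")
    print([len(g) for g in groups])
```

In the CPLD case described above, the equivalent calculation is trivial: one copy serves every load, because the interconnect delay does not depend on the number or location of loads.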

"1-Hot" Encoding of State Machines
FPGAs have a very register-rich architecture. To increase the performance of state machines, designers often take advantage of the large number of available registers within the device by using a "1-hot" encoding style for state machines instead of a binary or Gray-code encoding method. The "1-hot" encoding method uses one register for every state of the machine. For example, a 16-state state machine would use 16 registers within the device. This encoding method improves the performance of the state machine in an FPGA because the active state of the state machine is represented with one signal, reducing the number of inputs to the logic that determines the active state. Binary encoding methods increase the inputs to the state logic because all of the state register outputs are required to determine the current state. For example, in a "1-hot" state machine, the only signal required to determine that state 3 is active is the output of the state 3 register. If the logic that determines whether state 4 should be active depends on whether state 3 is active, only the output from the state 3 register is needed. In contrast, with a binary-encoded state machine, the outputs from all state machine registers would be necessary to determine that state 3 was the active state. In this case, the logic to determine state 4 would need all of the state machine register outputs as inputs. If additional inputs are also required to determine state 4, the number of inputs to the logic may exceed the 4 or 5 inputs of the FPGA logic cell LUT. When this happens, this logic must be pipelined, which further complicates the implementation of the state machine at a high clock frequency.

The number of registers required to implement a "1-hot" state machine is larger than that of a binary-encoded state machine; therefore, the number of registers required to implement a state machine in an FPGA increases. As the number of registers increases, the power dissipation of the device increases. Also, large state machines tend to be hard to implement due to the large number of registers and routing resources required by this encoding method.

CPLDs typically use a binary or Gray-code encoding method for state machine implementation. The number of registers used in a binary-encoded state machine is log2 of the number of states, rounded up to the next whole number. A 16-state state machine would use only 4 registers. This encoding method works well in CPLDs because the logic to determine the active state can have a large number of inputs. Fewer registers are required to implement a state machine, and therefore CPLDs can implement large state machines quite easily.
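The register counts for the encodings discussed above can be checked with a short Python sketch. It is an illustration rather than part of any vendor tool flow; the function names are hypothetical, and the Gray code shown is the common reflected binary Gray code, which is an assumption since the article does not specify a variant.

```python
def one_hot_registers(num_states):
    """1-hot encoding uses one register per state."""
    return num_states


def binary_registers(num_states):
    """Binary or Gray encoding uses ceil(log2(num_states)) registers."""
    return max(1, (num_states - 1).bit_length())


def one_hot_code(state, num_states):
    """Bit pattern with only the bit for `state` set."""
    return format(1 << state, f"0{num_states}b")


def binary_code(state, num_states):
    """Plain binary state number, padded to the register count."""
    return format(state, f"0{binary_registers(num_states)}b")


def gray_code(state, num_states):
    """Reflected binary Gray code of the state number."""
    return format(state ^ (state >> 1), f"0{binary_registers(num_states)}b")


if __name__ == "__main__":
    n = 16
    print("1-hot registers :", one_hot_registers(n))
    print("binary registers:", binary_registers(n))
    print("state 3, 1-hot  :", one_hot_code(3, n))
    print("state 3, binary :", binary_code(3, n))
    print("state 3, Gray   :", gray_code(3, n))
```

Decoding "state 3 is active" needs one register output in the 1-hot case and all four register outputs in the binary or Gray case, which is the input-count difference the text describes.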

Effects of these Design Techniques on Logic Synthesis
More and more designers are moving to a high-level Hardware Description Language (HDL), such as VHDL or Verilog, as their method of design entry. The HDL description of the logic is then synthesized into logic functions that are compiled and fit into the target device. One of the advantages of using an HDL for design entry is that the design can be done independent of the target device and re-targeted to different devices without changing the design. Many FPGAs and high-density CPLDs are used to prototype ASIC designs. This is most successful when the HDL description of the design can be targeted to a CPLD or FPGA for prototyping and then to the ASIC without any changes to the HDL file.

To successfully synthesize a high-performance HDL design to an FPGA, the HDL description of the design must break up a wide logic function into pieces that fit within the basic building block of the FPGA and include the additional pipeline registers. This makes the HDL description of the design device specific and removes the capability of re-targeting the design without changes. It is also not intuitive, when describing the behavior of a design, to break up logic and insert registers that would otherwise be unnecessary.

For example, consider the HDL description of a 12-to-1 multiplexor. The device-independent and intuitive description of this function is to describe the decoding of the select lines to output the selected input signal. The implementation in an FPGA would require that this function be pipelined; therefore, the designer would have to decide how to break up the 12-to-1 multiplexor into smaller multiplexors that fit within the FPGA logic cell. The outputs of these smaller multiplexors are then multiplexed to form the 12-to-1 multiplexor. The HDL description would then be written to describe the smaller multiplexors with the insertion of the necessary pipeline registers followed by the final level of multiplexing. At this point, the HDL description is very device specific and is no longer intuitive.
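The two descriptions of the multiplexor can be contrasted in a small behavioral model. The sketch below is written in Python rather than an HDL and is purely illustrative. The grouping into three 4-to-1 multiplexors is an assumption chosen for readability: a 4-to-1 multiplexor has six inputs, so on a 4-input LUT it would itself need further splitting, and the real partition depends on the target logic cell. The model is combinational only; the comment marks where pipeline registers would go.

```python
from itertools import product


def mux12_flat(data, sel):
    """Intuitive, device-independent description: pick input `sel` of 12."""
    return data[sel]


def mux4(data4, sel2):
    """One small 4-to-1 multiplexor."""
    return data4[sel2]


def mux12_two_level(data, sel):
    """Hand-partitioned description: three 4-to-1 muxes, then a final mux."""
    low = sel & 0b11   # low select bits drive every first-level mux
    high = sel >> 2    # high select bits pick one first-level result (0..2)
    stage1 = [mux4(data[4 * g:4 * g + 4], low) for g in range(3)]
    # In a pipelined FPGA version, registers would hold `stage1` here,
    # and `high` would need a matching one-clock delay.
    return stage1[high]


def equivalent():
    """Exhaustively compare both descriptions for every input and select."""
    for data in product((0, 1), repeat=12):
        for sel in range(12):
            if mux12_flat(data, sel) != mux12_two_level(data, sel):
                return False
    return True


if __name__ == "__main__":
    print("descriptions match:", equivalent())
```

Both functions compute the same result, but only the first reads like the specification; the second encodes a partitioning decision that belongs to one device family.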

As mentioned above, there are many signals with high fan-out where the generation of these signals has to be replicated to achieve high performance within an FPGA. This replication of logic is again not intuitive to the basic description of the application and makes the HDL description of the design device dependent and not easily re-targetable to an ASIC or other device without design changes.

Because the routing between logic cells in an FPGA contributes significantly to the delay of the design, achieving high performance from an FPGA typically requires floorplanning the design to keep routing delays at a minimum. The need to control the placement of blocks in the design also makes the HDL description of the design device dependent and makes synthesis more difficult.

The combination of high speed, low power and routable architectures in CoolRunner CPLDs addresses some of the market's most challenging design considerations. Since CoolRunner CPLDs have the capability to implement wide logic at high performance without pipelining, synthesis to the CPLD is much more effective and allows the HDL description of the design to be intuitive and device independent. This HDL description is directly re-usable if the design needs to be re-targeted to an ASIC. Control of the placement of the logic is not necessary to ensure the performance of the design, nor is additional logic needed to replicate signals with high fan-out. The HDL description of the design does not need to include registers for the sake of pipelining or logic replication that are otherwise functionally unnecessary, but can instead describe the true desired behavior of the design. This enables designers to develop HDL descriptions of their designs more quickly and easily.

Summary
The techniques used to achieve high performance in FPGAs require the use of additional registers within the device. Additional registers within the device affect the power dissipation of the device and can, at times, force the design into a device larger than what is really needed. These design techniques also make the use of an HDL non-intuitive and device specific.

Because of the architectural differences between CPLDs and FPGAs, additional registers are not necessary to achieve high performance in a CPLD. This means that the registers within the device are utilized more effectively and that a design will not be forced into a larger device to accommodate functionally unnecessary registers. Since a high-performance design needs fewer registers in a CPLD than in an FPGA, the power dissipation of the CPLD will be lower. The use of an HDL for design entry remains intuitive and device independent.



Copyright © 2003 ChipCenter-QuestLink