by
Ingo Cyliax
EVOLVING TECHNOLOGY
A CPLD, as the name implies, supersedes
the PLD. Because PLDs like the 22V10 are fairly small devices, designers
typically had to wire several of them on a PCB to implement large
designs. When you increase the component count, you reduce reliability
and increase cost. Also, for high-speed designs, the chip interconnects
slow down the signals. A CPLD combines several PLD structures into
a single chip and adds a global programmable interconnect to connect
the inputs and outputs of the PLD cells to each other. Figure 4 shows
an example of a CPLD.
Figure 4: Complex PLDs (CPLDs)
contain several PLD structures and a global interconnect matrix
that can wire the inputs and I/O signals from each PLD block
to each other and to external pins.
CPLDs are a great innovation because
they make it possible to add a single chip that is totally programmable.
The more PLD cells and interconnects you can get in a chip, the bigger
the single-chip design can be.
On the other hand, FPGAs have a different
background. Although CPLDs have their roots in PLDs, which were primarily
used to reduce the component counts on PCBs, FPGAs are a programmable
version of gate arrays.
A gate array is an ASIC with a simple
gate replicated in a large array. The gate is typically a two-input
NAND gate. To implement a design, these silicon gates were connected
with metal traces. So, in a sense, the function is implemented solely
by routing. We can implement all of the basic functions (OR, AND,
INVERT, PASS) with a NAND gate depending on the voltage encoding of
the inputs and outputs.
Voltage level          Function implemented with NAND
A      B      O
L=0    L=0    L=0      !(A * B)    -> A NAND B
L=0    L=0    L=1      A * B       -> A AND B
L=0    L=1    L=0      !(A * !B)   -> invert A, when B = L
L=0    L=1    L=1      (A * !B)    -> pass A, when B = L
L=1    L=0    L=0      !(!A * B)   -> invert B, when A = L
L=1    L=0    L=1      (!A * B)    -> pass B, when A = L
L=1    L=1    L=0      A + B       -> A OR B
L=1    L=1    L=1      !(A + B)    -> A NOR B
Table 1: It's a brain teaser, but the
basic NAND gate can be used to implement all basic logic functions
depending on the input and output voltage conventions.
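The idea behind Table 1 can be checked with a small sketch. It models one
physical NAND gate and reinterprets its pins under a chosen voltage
convention. The function names and the 0/1 encoding of "which logic value
LOW stands for" are assumptions made for this illustration, not part of
the original column.

```python
from itertools import product

def nand(a_high: bool, b_high: bool) -> bool:
    """Physical two-input NAND: the output is low only when both inputs are high."""
    return not (a_high and b_high)

def logical_function(a_low_is: int, b_low_is: int, o_low_is: int) -> dict:
    """Truth table of the NAND gate seen through a voltage convention.

    Each argument says which logic value (0 or 1) a LOW voltage encodes on
    that pin. Returns a dict mapping (A, B) logic values to the logic output.
    """
    table = {}
    for a, b in product((0, 1), repeat=2):
        # A pin is HIGH when its logic value differs from the value
        # assigned to LOW on that pin.
        a_high = a != a_low_is
        b_high = b != b_low_is
        o_high = nand(a_high, b_high)
        # Translate the output voltage back into a logic value.
        table[(a, b)] = (1 - o_low_is) if o_high else o_low_is
    return table

# The same silicon gate, read with two different conventions.
as_nand = logical_function(0, 0, 0)  # L=0 on every pin
as_or = logical_function(1, 1, 0)    # L=1 on both inputs, L=0 on the output
```

Calling logical_function with the other rows of the table gives the AND,
NOR, and mixed-convention functions in the same way.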
With gate arrays, the actual design can
be implemented fairly late in the manufacturing process (when the
metal layers get deposited) so the lead time is short compared to
a truly custom ASIC. These types of gate arrays are called mask-programmed
gate arrays.
An adaptation of the mask-programmed
gate arrays is the programmable gate array. These devices had predefined
metal layers that connected traces to all of the gates. To program
the function, fuseable links would be burned off either with a large
current or with a laser.
It turns out that the metal traces actually
take up much of the chip area in a programmed gate array and much
of the metal layer isn't used. One way to enhance the density of the
chip is to increase the logic function of the gate in a gate array.
This was first done by using 4:1 MUXs as the basic logic element.
By programming the input levels of a 4:1 MUX, it can be used to generate
any function of two variables, independent of the voltage levels.
We can see that this is much denser than a two-input NAND gate. For
example, a single 4:1 MUX can generate either output of a half-adder
(the XOR sum or the AND carry).
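As a rough sketch of the MUX-as-function-generator idea, the two variables
drive the select lines and the four data inputs are tied to fixed levels.
The helper names mux4 and program_mux are hypothetical and only serve this
illustration.

```python
def mux4(data, s1, s0):
    """4:1 multiplexer: the select lines pick one of four data inputs."""
    return data[(s1 << 1) | s0]

def program_mux(func):
    """Return the four data-input levels that make a 4:1 MUX compute func(a, b).

    Variable a drives select line s1 and variable b drives s0, so the list
    is ordered as (a, b) = (0,0), (0,1), (1,0), (1,1).
    """
    return [func(a, b) & 1 for a in (0, 1) for b in (0, 1)]

# The two outputs of a half-adder, each programmed into its own MUX.
sum_levels = program_mux(lambda a, b: a ^ b)    # [0, 1, 1, 0]
carry_levels = program_mux(lambda a, b: a & b)  # [0, 0, 0, 1]

def half_adder(a, b):
    """Half-adder built from two programmed 4:1 MUXs."""
    return mux4(sum_levels, a, b), mux4(carry_levels, a, b)
```

Because there are 16 possible settings of the four data inputs, the same
structure covers all 16 functions of two variables.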
Instead of programming gate arrays by
burning, fusing, or cutting metal traces, you can use a small programmable
routing matrix to implement the routing of the chip. This matrix could,
for example, connect the input and outputs of each logic block to
the nearest neighbor or to a global interconnect. If this device can
be programmed by the user, then you have a basic FPGA architecture.
Figure 5 shows an example of a generic FPGA architecture.
Figure 5: The basic field programmable
gate array (FPGA) contains configurable logic blocks, small
routing matrices, and I/O blocks that can configure each I/O
pin for different functions.
Let's look at programming these devices.
I mentioned that early chips were programmed by burning out fusible links
or similar features. Of course, these chips are not reprogrammable
and are called one-time programmable (OTP) devices. There are applications
for OTP CPLDs and FPGAs. For example, Actel makes a line of OTP FPGAs
that are robust in the presence of radiation and thus are used in
military and space applications where you don't want your chips to
get reprogrammed. Also, fusible links propagate signals fast because
they are essentially just wires on the chip.
Although some OTP applications are interesting,
I'll primarily focus on reprogrammable architecture in this column.
There are two types of reprogrammable FPGA/CPLD technologies: flash-memory/EPROM
based and SRAM based. Flash-memory/EPROM-based CPLDs are easy to understand.
A small pass gate is wired to a flash memory or EPROM cell and that
enables us to program the terms, the macro cells, and the large interconnect.
Just like EPROMs, EPROM-based CPLDs have
pretty much been surpassed by flash memory-based devices. Flash-memory
devices are reliable and don't require expensive windowed packaging
to erase. Also, just like flash-memory devices, flash memory-based
CPLDs are in-circuit programmable, usually via a JTAG or other serial
interface.
On the other hand, reprogrammable FPGAs
tend to be primarily SRAM based. The idea is the same for the small
routing matrix, which is implemented using pass gates, each driven by the
value of the SRAM memory cell assigned to it. However, instead of using
pass gates to program the function of the basic logic block, most
SRAM-based FPGAs use look-up table (LUT) function generators, which
are small SRAM cells with four or five inputs. The inputs are the
address lines of the SRAM cell and the output of the SRAM cell is
the output of the function generator. Of course, the programmable
flip-flop after the logic block is programmed via pass gates. To program
the function of the logic block, you load up the contents of the SRAM
cell and configure the logic block to be either registered or combinatorial.
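A minimal sketch of such a logic block, assuming a four-input LUT and a
single optional flip-flop, might look like the following. The class and
method names are hypothetical; real devices differ in many details.

```python
class LogicBlock:
    """Four-input LUT function generator with an optional output register."""

    def __init__(self, lut_bits, registered=False):
        if len(lut_bits) != 16:
            raise ValueError("a four-input LUT holds exactly 16 bits")
        self.lut = [bit & 1 for bit in lut_bits]  # contents of the SRAM cell
        self.registered = registered              # registered or combinatorial
        self.q = 0                                # flip-flop state

    def lookup(self, inputs):
        """Use the four inputs as address lines; inputs[0] is the LSB."""
        addr = 0
        for i, bit in enumerate(inputs):
            addr |= (bit & 1) << i
        return self.lut[addr]

    def clock(self, inputs):
        """Rising clock edge: capture the LUT output in the flip-flop."""
        self.q = self.lookup(inputs)

    def output(self, inputs):
        """Block output: the flip-flop if registered, else the LUT directly."""
        return self.q if self.registered else self.lookup(inputs)

# Program the LUT as a four-input parity generator.
parity_bits = [bin(addr).count("1") & 1 for addr in range(16)]
comb = LogicBlock(parity_bits)
reg = LogicBlock(parity_bits, registered=True)
```

Loading a different 16-bit pattern changes the function without touching
the routing, which is the point of the LUT approach.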
SRAM-based FPGAs tend to be much denser
than flash memory-based CPLDs, but they lose their configuration once
the power is turned off. Because they lose their configuration, you
need some sort of external memory to store the configuration information.
Most FPGAs can read programming information from a serial or parallel
EPROM or flash memory. This mode is called the master mode. The FPGA
will provide all signals and addressing to read the data on its own.
No components other than the serial/parallel PROM are needed.
SRAM-based FPGAs can also be programmed
via an external source. In slave mode, the FPGA accepts a serial or
parallel data stream that represents the configuration data. The source
of the data can be a processor, computer, or an FPGA that is acting
as a master. Using this technique, it's possible for several FPGAs
to be programmed from a single memory. A master FPGA is wired to a
daisy chain of slave FPGAs. When the master FPGA has been programmed,
it will keep reading the data from the memory and pass it on to the
slave devices until all of the FPGAs are configured.
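The daisy-chain scheme can be pictured with a simplified model in which
the memory image is a byte stream and each device consumes its own share
before passing the rest along. The function name, the frame sizes, and the
image contents are made up for illustration; actual bitstream formats are
vendor specific.

```python
from itertools import islice

def configure_chain(memory, frame_sizes):
    """Feed one memory image through a master FPGA and its slaves.

    memory      -- bytes read from the shared PROM (hypothetical image)
    frame_sizes -- configuration size of each device, master first
    Returns the part of the image that each device ends up loading.
    """
    stream = iter(memory)
    loaded = []
    for size in frame_sizes:
        # Each device keeps the first `size` bytes that reach it ...
        frame = bytes(islice(stream, size))
        if len(frame) != size:
            raise ValueError("memory image is too short for the chain")
        loaded.append(frame)
        # ... and whatever remains in the stream goes to the next device.
    return loaded

image = bytes(range(10))
master, slave1, slave2 = configure_chain(image, [4, 3, 3])
```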
Configuring SRAM-based FPGAs is faster
than programming a flash memory-based CPLD, but it still takes some time
each time the system is started. It's important to take the FPGA configuration
time (at startup) into account when designing your system. If you
need instant power-on performance, you probably want to use a flash
memory-based device or an OTP device.
To recap, a CPLD is a device that has
several PLD-like blocks connected via a large connection matrix, and
an FPGA has a large number of logic blocks that are usually simple
lookup tables followed by a configurable register connected with smaller
routing matrixes in an array. This is the basic structural idea, but
of course, not everyone is happy, so let's look at some architectural
enhancements and features that can be found in many modern FPGAs and
CPLDs.
One of the features in a SRAM-based FPGA
is the SRAM-based LUT. Because they usually have four or five inputs,
they are essentially 16 × 1 or 32 × 1 memory blocks. Early FPGAs would
only let you use these LUTs as ROM cells. If you wanted to implement
registers or memory, you had to use the flip-flop in each logic block
as a single register bit. By making the LUT writable, you can now
use a LUT as a general-purpose memory or register block in your design.
So, you get 16 or 32 bits of storage for each logic block.
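To illustrate the writable-LUT idea, the sketch below treats each LUT as a
16 × 1 RAM and places several side by side to form a small register file.
The class names and the 8-bit width are assumptions chosen for the example.

```python
class LutRam:
    """A 16 x 1 LUT whose SRAM contents can be rewritten at run time."""

    def __init__(self):
        self.bits = [0] * 16

    def write(self, addr, bit):
        self.bits[addr & 0xF] = bit & 1

    def read(self, addr):
        return self.bits[addr & 0xF]

class RegisterFile:
    """16 words of `width` bits built from `width` LUT RAMs side by side."""

    def __init__(self, width=8):
        self.columns = [LutRam() for _ in range(width)]

    def write(self, addr, value):
        # Bit i of the word goes into LUT i at the same address.
        for i, lut in enumerate(self.columns):
            lut.write(addr, (value >> i) & 1)

    def read(self, addr):
        return sum(lut.read(addr) << i for i, lut in enumerate(self.columns))

regs = RegisterFile(width=8)
regs.write(3, 0xA5)
regs.write(15, 0x3C)
```

Eight LUTs here hold sixteen 8-bit words, where flip-flops alone would
provide only one bit per logic block.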
Occasionally it would be nice to have
larger memory blocks. Maybe you want a FIFO that can buffer up data.
Newer FPGAs now include dedicated large memory blocks that can be
used in this way. This is only one trend to combine more complex functions
with a general-purpose FPGA. There are FPGA architectures from Lucent
that include a whole PCI bus interface as dedicated logic on the chip.
Also, Triscend has an interesting architecture that combines a processor
core with an FPGA. Check out the links in the Sources section to
get more information on some of the chips available.
Besides registers and memory blocks,
math is important. The most common operation is the add. A full adder
can be implemented in a four-input LUT (or two blocks if you need
a carry out). This setup is ideal for implementing ripple-carry adders.
However, ripple-carry adders are slow when the word size increases
and you generally want to use carry-lookahead adders, which take more
logic to implement.
Because adders are so common (think
counters), current FPGAs and some CPLDs also include hardwired carry
chains in the logic blocks. These carry chains are dedicated carry
generators. If the adjacent bits of the adder or subtractor are connected
to adjacent logic blocks, you can use the carry logic to implement
a fast ripple-carry adder without using additional logic. The hardware
carry chain is so fast it can be used to efficiently implement 16-
or even 32-bit adders.
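A behavioral sketch of a ripple-carry adder shows where the carry chain
fits: each bit position computes its sum from a, b, and the incoming
carry, and the carry out feeds the next position. In hardware that
hand-off is what the dedicated chain accelerates; here it is just a loop
variable. The function names and the default width are illustrative.

```python
def full_adder(a, b, cin):
    """One bit position: the sum bit plus the carry passed up the chain."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(x, y, width=16):
    """Add two unsigned values bit by bit, returning (sum, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry ripples to bit i + 1
        result |= s << i
    return result, carry

total, carry_out = ripple_add(1234, 4321)
```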
FPGAs and CPLDs are register rich. Each
logic block has a flip-flop. FPGAs are good for implementing synchronous
circuits and have efficient dedicated global clock networks that distribute
clock signals to all of the flip-flops on the chip. Most FPGAs have
multiple global clock networks, making them well suited to implementing
multiclock domain circuits. These global clock networks can also be
used for logic signals that need to go every place on the chip. Clock-enable
signals or strobe signals can use these networks.
Some FPGAs also have tristate drivers
at each logic block that can drive bus-like networks running along
the rows or columns of the device. These drivers can be used just like
tristate buffers and buses in board-level designs. One common trick
is to use buses and tristate buffers to implement wide MUXs. Implementing
MUXs using tristate buffers is essentially free in FPGAs that have
this feature because there is a tristate buffer in every logic cell
output.
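The wide-MUX trick can be sketched as follows: every input drives a shared
bus line through its own tristate buffer, and a decoded select enables
exactly one of them. None stands in for the high-impedance state, and all
names are hypothetical.

```python
def tristate(value, enable):
    """Tristate buffer: drive the value when enabled, otherwise float."""
    return value if enable else None  # None models high impedance

def bus_resolve(drivers):
    """Resolve a shared bus line; more than one active driver is contention."""
    active = [v for v in drivers if v is not None]
    if len(active) > 1:
        raise RuntimeError("bus contention: multiple drivers enabled")
    return active[0] if active else None

def wide_mux(inputs, select):
    """N:1 MUX built from one tristate buffer per input on a shared bus."""
    return bus_resolve([tristate(v, i == select) for i, v in enumerate(inputs)])

data = [0, 1, 1, 0, 1, 0, 0, 1]
picked = wide_mux(data, 2)
```

The only logic needed besides the buffers is the select decoder, which is
why the approach is cheap on devices that already have a buffer at every
logic block output.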
I mentioned earlier that gate arrays
can be implemented with simple NAND gates, but the density is not
as high. However, it's much easier to synthesize high-level descriptions
of circuits into simple gates than to try to take advantage of high-level
functional blocks. This debate is similar to the argument about a
C compiler being able to optimize more easily to a RISC processor
than to a CISC processor. However, just as C compilers have gotten
more sophisticated and can target CISC processors better, high-level
logic synthesizers have gotten much better at targeting complex logic
functions. For example, the VHDL compiler that comes with Xilinx's
Foundation toolset can figure out when to use carry chains to implement
an adder automatically.
Because the tools are getting better
and complex logic functions can be implemented more densely, the trend
seems to be to implement more complex logic block functions.
Initially, FPGAs had only local and global
routing resources (i.e., a logic block could only connect to adjacent
logic blocks or to global networks). Newer FPGAs have multilevel routing
hierarchies, so logic blocks can connect to different levels in the
routing hierarchy. These FPGAs are complex, but luckily the design
software takes care of these issues for you.
Incidentally, routing performance is
one area in which CPLDs are more predictable because they have fewer
routing matrixes than FPGAs. Each routing matrix adds a little delay
to the signal, so the fewer routing matrixes a signal has to traverse,
the faster it gets there. FPGAs have many matrixes and the software
has to route the signals around the chip. So, depending on where the
logic blocks end up on the chip, the signals can be delayed significantly.
I mentioned that the logic complexity
is going up in FPGAs. Also, FPGAs tend to have higher logic densities
per chip than CPLDs. But all of this is changing. CPLDs are becoming
more dense, with more PLD blocks and more routing matrixes so, in
a sense, CPLDs are becoming more like FPGAs and FPGAs more like CPLDs.
Also, with more system-on-a-chip functionality (e.g., dedicated CPUs,
bus interfaces, and memory blocks), it will be interesting to see
where all this is going.
Lucky for us, the new advances and features
in the high-end devices have made last year's basic low-density FPGA
and CPLD architectures more economical to use in embedded systems.
Many production-volume products now ship with FPGAs and CPLDs in them,
never bothering to implement the function in an ASIC.
© Circuit Cellar, the Magazine for Computer Applications. Posted with
permission.