By Peter Alfke, Director of Applications Engineering, Xilinx Inc. (peter.alfke@xilinx.com)
IC manufacturers are under relentless pressure to increase performance and reduce cost. They can do that by migrating existing designs to smaller-geometry, thinner-oxide, more advanced processes. This evolution has gone on for more than 30 years. TTL migrated to LS-TTL, and then to various forms of Advanced Logic; CMOS migrated from 2 µm, 2-layer metal to 0.25 µm, 5-layer metal. Cost per function was reduced by several orders of magnitude and speed was increased at least 20-fold.
Fundamentally, everybody benefits, but this rapid evolution is also causing some problems. Equipment manufacturers often finish their designs and move them into production, but find out later that they can only buy components that are significantly faster than the ones used in the original designs. In most cases, the faster part works just fine, with lower power consumption, wider timing margins, and operation over a wider temperature range. The designs have simply become more robust.
But there are exceptions. The important aspect is device speed, not the user's clock rate. An old 5-MHz design, when run in modern CMOS devices, has flip-flops capable of toggling at 500 MHz, and it drives output edge rates of up to 4 V/ns. These fast output transitions can create reflections and ringing even on short PC traces. Crosstalk and ground bounce can become nastier. Faster circuits can respond to glitches and spikes that were ignored by the slower circuits.
Here is a list of potential problems. They can cause headaches and costly rework and redesigns of the PC-board or even the logic. Most problems are the result of careless PC-board layout.
Clock reflections and ringing causing double triggering
The faster edge rate on a clock output causes even a short interconnect to behave not as a lumped capacitance but rather as a transmission line, usually unterminated. Such a line will cause signal reflections and ringing whenever the round-trip (out and back) propagation delay is longer than the rise or fall time of the driving output. At the typical propagation speed of 6" per nanosecond and a 1 ns output transition time, all interconnects longer than 3" can thus exhibit reflections and ringing. This can cause multiple crossings of the receiving input threshold, resulting in multiple internal clock pulses.
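The 3" figure follows directly from comparing round-trip delay with transition time. A minimal Python sketch of that arithmetic, assuming the 6" per nanosecond propagation speed quoted above; the function names and parameters are illustrative and not part of any tool:

```python
def critical_length_inches(transition_ns, speed_in_per_ns=6.0):
    """Longest trace whose round-trip delay still fits in one output transition.

    Round-trip delay is 2 * length / speed, so the limit is
    speed * transition / 2.
    """
    return speed_in_per_ns * transition_ns / 2.0


def behaves_as_transmission_line(length_in, transition_ns, speed_in_per_ns=6.0):
    """True when the trace is longer than the critical length."""
    return length_in > critical_length_inches(transition_ns, speed_in_per_ns)
```

With a 1 ns transition, `critical_length_inches(1.0)` gives 3 inches; a 5 ns transition from an older, slower part would tolerate 15 inches, which is why the same board layout may have worked before.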
Check the clock signal at its origin and at all load points, using a very fast scope and scope probe. 500 MHz bandwidth is marginal, 1 GHz is good, 2 GHz is better. RC-termination at the far end of the clock line has proven effective. Use a resistor equal to the characteristic impedance of the line (50 to 100 Ohm) in series with a dc-blocking capacitor of about 150 pF, and connect this series RC combination from the clock line-end to ground. If the clock output drives only a single destination, series termination right at the source is very effective. Use a 27 to 47 Ohm resistor right at the driving source, but never use this termination method when the clock line drives multiple distributed loads.
Crosstalk on the PC-Board
Faster edges can couple capacitively and also inductively into adjacent traces. Avoid long, closely spaced parallel runs of sensitive signals, and minimize holes and breaks in the ground and Vcc planes. The signal current must always return to its origin, and any detours cause inductive voltage drops.
Series Inductance of the Power-Supply Decoupling Capacitors
In a single-clock synchronous system, all supply current must be delivered within a few nanoseconds after the clock. The dynamic current is thus several times higher than the average dc value, and this current surge can only be supplied by the decoupling capacitors. The unavoidable inductance in series with the capacitors must, therefore, be kept very low. Check the synchronous noise on the Vcc pins with a fast scope. Use ceramic, preferably surface-mount, capacitors located very close to the Vcc package pins and directly connected to the ground plane.
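The reason series inductance matters can be seen from V = L × di/dt. The following Python sketch evaluates that relation; the function name and the example numbers are assumptions chosen for illustration, not measurements from this paper:

```python
def inductive_drop_volts(inductance_nh, current_step_a, rise_time_ns):
    """Voltage across a series inductance during a current step.

    V = L * di/dt. With L in nanohenries and time in nanoseconds the
    1e-9 factors cancel, so the result is directly in volts.
    """
    return inductance_nh * current_step_a / rise_time_ns
```

Under these assumed values, a 1 A surge arriving in 2 ns through 5 nH of lead and trace inductance gives `inductive_drop_volts(5, 1, 2)`, that is 2.5 V, which is a large fraction of a 5-V or 3.3-V supply. Halving the inductance halves the drop.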
Slow input transitions pick up crosstalk and noise
Slow transitions are especially dangerous on clock inputs, where a single slow transition can easily be interpreted as three transitions in very fast succession. This may be difficult to detect, especially when the input transition is clean but is internally affected by ground bounce. The LSB of a counter not changing is often a clue to such double triggering. Avoid slow transitions on any clock line; anything over 10 ns rise or fall time is unacceptable.
Ground bounce causing erroneous input transitions
Simultaneously switching outputs cause ground bounce. The chip ground moves up and down with respect to the PC-board and system ground. This changes the output Low voltage and, worse yet, changes the apparent input voltage, effectively adding to or subtracting from the input threshold voltage. In a fully synchronous design with only one clock, surprising amounts of ground bounce are tolerated by all the synchronous data inputs, since they only need to be valid a set-up time before the subsequent clock edge. But all asynchronous inputs, including the clock, are sensitive at any time. They should, therefore, have the best possible noise immunity, which means they should quickly be driven between the supply rails. Vcc bounce is less serious in 5-V systems, but in 3.3-V systems, it is almost as serious as ground bounce.
Programmable Logic Requires Extra Attention
All digital ICs, dedicated circuits like microprocessors and peripherals as well as ASICs and FPGAs, must cope with the external problems mentioned above. FPGAs can have additional problems since the user is responsible for their internal logic implementation, and among the tens of thousands of FPGA users, there are some who occasionally violate basic design rules.
Just Say NO to Asynchronous Design
Synchronous designs are safer than asynchronous designs, more predictable, easier to simulate and to debug. Asynchronous design methods may ruin your project, your career and your health, but some designers still insist on creating that seemingly simple, fast little asynchronous circuit. Thirty years ago, TTL-MSI circuits made synchronous design attractive and affordable; twenty years ago, synchronous microprocessors took over many hardware designs; synchronous state machines have recently become very popular, but some designers still feel the itch to use asynchronous tricks.
The popularity of FPGAs has created a flurry of asynchronous designs in an especially treacherous environment where the logic can be customized at the gate level, but is very difficult to inspect. Internal nodes cannot be calmed down with capacitive loads, the Band-Aid of simpler technologies.
Here is a short description of the ugly pitfalls in asynchronous design, documented for the benefit of the inexperienced designer. Veterans are familiar with the problems and may even know their way around them to design asynchronous circuits that are safe.
Clock Gating
Gating a clock signal with an asynchronous enable or multiplex signal is an invitation to disaster. It will occasionally create clock pulses of marginal width, and will sometimes move the clock edge. Asynchronous signals can be used to gate the clock reliably, as shown below, but this still introduces additional clock delay, which can cause hold-time problems.
Note that the DISABLE and ENABLE control signals must arrive during the clock half-period following the active clock edge and must stay active for the remainder of the clock period, in order to avoid generating clock glitches.
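To see how naive gating produces pulses of marginal width, consider the simplest unsafe case: a clock ANDed directly with an enable that rises at an arbitrary time. The Python sketch below models only that case, not the reliable circuit referred to above, and it assumes a 50% duty-cycle clock that is High during the first half of each period. The function name is hypothetical.

```python
def first_gated_pulse_width_ns(enable_rise_ns, period_ns):
    """Width of the first High pulse on (clock AND enable).

    Assumes the clock is High during [0, period/2) of every cycle and
    that the enable rises at enable_rise_ns and then stays High.
    """
    half_period = period_ns / 2.0
    phase = enable_rise_ns % period_ns
    if phase < half_period:
        # Enable arrived while the clock was High: the pulse is truncated.
        return half_period - phase
    # Enable arrived while the clock was Low: the next pulse is full width.
    return half_period
```

With a 20 ns period, an enable arriving 9.5 ns into the cycle yields a 0.5 ns runt pulse that some flip-flops may respond to and others may not, while an enable arriving at 12 ns yields a clean 10 ns pulse.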
Ripple Counters
Using the output of one flip-flop to clock its neighbor can generate a binary counter of arbitrary length and impressive toggle rate, but with decoding problems due to the ripple-carry delay. The worst problem occurs when the counter increments from 2^n - 1 to 2^n. It takes n delays from the incoming clock to the resulting change in bit n. In a 16-bit counter, this delay can be longer than 20 ns. At a 50 MHz clock rate, certain counter values will then never exist, because the LSB will have changed again before the MSB has reached its new value. Decoding such a counter will always produce dangerous decoding spikes. Note that these spikes are independent of the counter clock rate. Designers of slow systems are actually most vulnerable to this problem, since they are less accustomed to delicate timing issues.
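The transient values that cause decoding spikes can be listed with a small behavioral model. This Python sketch assumes a ripple counter in which each stage toggles on the falling edge of the previous stage's output, one stage delay at a time; the function name is illustrative.

```python
def ripple_increment_states(value, n_bits):
    """Values visible on the outputs while one clock pulse ripples through.

    Each list entry is the counter value after one more stage delay.
    The last entry is the final, settled value.
    """
    states = []
    for bit in range(n_bits):
        was_one = (value >> bit) & 1
        value ^= 1 << bit            # this stage toggles
        states.append(value)
        if not was_one:              # output rose 0 -> 1: no falling edge,
            break                    # so the ripple stops here
    return states
```

Incrementing a 4-bit counter from 7 gives `ripple_increment_states(7, 4)`, that is `[6, 4, 0, 8]`: the outputs briefly show 6, 4 and 0 before settling at 8. A decoder watching for 0 would therefore produce a spike on every transition from 7 to 8, regardless of how slowly the counter is clocked.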
Decoder Driving Clocks and Reset Inputs
Indiscriminate use of decoder outputs to clock flip-flops or set/reset them asynchronously is one way to invite unpredictable and unreliable operation. The decoded outputs from synchronous counters are even more devious. While the decoding spikes from ripple counters are fairly wide and somewhat predictable, decoding spikes from synchronous counters are entirely the result of small but unpredictable differences in routing and decoding delays. Using the decoded Terminal Count as an asynchronous Master Reset input is another popular method to achieve unreliable operation. The spike might reset some flip-flops, but not all.
Synchronizing One Input In Several Flip-Flops
A single asynchronous input should be synchronized in only one flip-flop. There will be an occasional extra metastable delay as described in a separate app note. This extra delay is acceptable in all but the very fastest systems. Synchronizing one input in more than one flip-flop is another matter. The setup times and input routing delays of the various flip-flops will inevitably differ by one or several nanoseconds. Any input change occurring during this time difference will be clocked differently into the individual flip-flops, and the error will last for a full clock period. Synchronize any input with only one single flip-flop!
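A simple timing model makes the disagreement window concrete. In this Python sketch each flip-flop is given an effective sampling instant relative to the nominal clock edge, standing in for its combined setup time and routing delay. The names, and the assumption that input changes are uniformly distributed in time, are illustrative only.

```python
def captured_values(change_time_ns, sampling_instants_ns, old=0, new=1):
    """Value each flip-flop captures for an input that changes once.

    A flip-flop sees the new value only if the input changed before
    that flip-flop's own effective sampling instant.
    """
    return [new if change_time_ns < instant else old
            for instant in sampling_instants_ns]


def disagreement_fraction(skew_ns, period_ns):
    """Fraction of input changes that land inside the skew window,
    assuming changes are uniformly distributed over the clock period."""
    return skew_ns / period_ns
```

With sampling instants of 0 ns and 1 ns, an input change at 0.5 ns gives `captured_values(0.5, [0.0, 1.0])`, that is `[0, 1]`: the two flip-flops disagree for one clock period. At a 20 ns period, this 1 ns skew would affect about 5% of input changes under the uniform assumption.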
Synchronizing Multiple Inputs In One Register
Synchronizing an asynchronous parallel data word can lead to wrong results when the asynchronous inputs change during the register set-up time. For the duration of one clock period the register might then contain any imaginable mixture of old and new bit values. There is no simple solution; the most popular approach is to pipeline the result and compare the previous and present values. Any difference declares the data invalid. This operation is sometimes performed in software.
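When the comparison is done in software, it can be as simple as the following Python sketch, which accepts a word only after two consecutive samples agree. The class and method names are hypothetical.

```python
class WordValidator:
    """Pipeline-and-compare filter for an asynchronously sampled word."""

    def __init__(self):
        self._previous = None  # no sample seen yet

    def sample(self, word):
        """Record a new sample and return (word, valid).

        valid is True only when this sample equals the previous one,
        meaning the inputs did not change between the two clock edges.
        """
        valid = self._previous is not None and word == self._previous
        self._previous = word
        return word, valid
```

For example, if the source changes from 0b0111 to 0b1000 and the register briefly captures the mixture 0b0101, the samples 0b0111, 0b0101, 0b1000, 0b1000 are flagged invalid, invalid, invalid, valid. The cost is one clock period of added latency and the loss of any value that is not held for at least two samples.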
Asynchronous Reset of Multiple Circuits
A simple RC combination, perhaps augmented by a diode, is a popular power-on reset circuit. When it is used to drive several ICs in parallel, the system must accept wide variations in the reset duration. Differences in input threshold voltage will cause some circuits to start operating while others are still being held reset. If that is unacceptable, the RC combination must drive only one IC that, in turn, controls the reset operation of all others.
Excessive Interconnect Delays
Combinatorial and interconnect delays that exceed one clock period might shrink to less than one clock period in a faster device and thus alter the functionality. No competent designer should ever have that problem, which indicates carelessness and a lack of proper timing analysis or simulation. Partition and lay out the logic, perhaps adding pipeline registers, to guarantee that all combinatorial delays are shorter than one clock period.
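The check itself is simple arithmetic once the path delays are known. This Python sketch computes the timing slack of one register-to-register path; the function name, the optional setup-time parameter and the example delays are assumptions for illustration, and a real design would take its numbers from the vendor's timing analyzer.

```python
def path_slack_ns(delays_ns, clock_period_ns, setup_ns=0.0):
    """Clock period minus total path delay and setup time.

    A negative result means the path takes longer than one clock
    period and needs re-partitioning or a pipeline register.
    """
    return clock_period_ns - setup_ns - sum(delays_ns)
```

Under these assumed values, a path with delays of 4.0, 6.5 and 3.0 ns meets a 20 ns clock with 6.5 ns of slack, while a path with delays of 12.0 and 9.5 ns misses it by 1.5 ns.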
Conclusion
This paper has described the problems caused by ever-faster semiconductor technology and has highlighted the need for rigorously synchronous design inside programmable logic devices.