ChipCenter Questlink

  Test and Measurement

    Tech Note


Debugging a PCI Bus With a Mixed-Signal Oscilloscope
A Real-World Example

In today's high-tech, high-speed world it's becoming increasingly difficult to meet schedules and ship products on time. Customers want more, sooner, and for less money. One way to beat the clock is to put a mixed-signal oscilloscope on your bench. It can help locate signal anomalies buried in complex signals—and do it much faster than has been possible in the past.

by Vivian Patlin,
Software Engineer,
Agilent Technologies,
1900 Garden of the Gods Road MSACIBR
Colorado Springs, CO 80907
Phone: +1(719)590-3296
www.agilent.com


Though the Peripheral Component Interconnect (PCI) bus has been a popular parallel bus for years, identifying signal-integrity issues on it can still tie up a project—sometimes for months.

It's axiomatic that the faster an engineering team can complete a design, the sooner the product hits the market and the sooner revenue begins arriving. One of the most time-consuming tasks is debugging the signal anomalies that pop up. In the past, fully testing and debugging a parallel bus such as the PCI bus generally required several test instruments.

Well-Known Challenges

The challenges of debugging a PCI-based system are well known. For starters, there are many bus lines to monitor simultaneously, often more signals than a single instrument can display at once.

That situation is made worse by the difficulty of detecting analog anomalies riding on digital signals. For that reason, it's crucial for digital designers to be able to examine the analog characteristics of their digital signals.

Since no single instrument could do all of this in the past, debugging often required several instruments. Even so, many signal defects were missed, sometimes because of slow display update rates and sometimes because the instruments had too much dead time, during which their inputs weren't capturing signals.

Some Pitfalls

A logic analyzer is fine for looking at all bus lines simultaneously, but because it's strictly a digital tool, it captures data only as ones and zeros. You can't view detailed signal behavior such as ringing, bounce, or rise and fall times, as you can with an oscilloscope.
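To make that limitation concrete, here is a minimal Python sketch of what a logic analyzer's input comparator does to an analog edge. The single fixed threshold, the 1.65 V value, and the sample data are assumptions for illustration only; the point is that a ringing edge and a clean edge reduce to identical ones and zeros.

```python
from typing import List, Sequence


def to_logic(samples: Sequence[float], threshold: float) -> List[int]:
    """Reduce analog samples to ones and zeros against a single threshold."""
    return [1 if v >= threshold else 0 for v in samples]


# Hypothetical samples of a rising edge, in volts: one with overshoot and
# ringing, one clean. The values and the 1.65 V threshold are illustrative.
ringing = [0.0, 0.1, 3.9, 2.8, 3.6, 3.0, 3.4, 3.3, 3.3]
clean = [0.0, 0.1, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3]

if __name__ == "__main__":
    print(to_logic(ringing, 1.65))  # [0, 0, 1, 1, 1, 1, 1, 1, 1]
    # The ringing is invisible once the signal is reduced to logic levels.
    print(to_logic(ringing, 1.65) == to_logic(clean, 1.65))  # True
```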

Sometimes a PCI exerciser and analyzer are used together. The combination can connect to all the lines of a PCI bus and perform a number of timing checks on bus events, which makes timing violations easier to detect and isolate. But again, it doesn't let you view and analyze an error in detail.

To view and analyze signal integrity, digital storage oscilloscopes (DSOs) have, until recently, been considered the best tools for the job. They're designed specifically to look at signal characteristics in detail. However, since they typically have only four channels, it's sometimes difficult to trigger properly on PCI bus events.

This means that you need to cross-trigger the DSO from either an exerciser/analyzer tool or a logic analyzer to look at signals that require complex multi-line triggering. That can be cumbersome. A mixed-signal oscilloscope (MSO) can fill the gaps.

MSOs combine the signal-analysis capabilities of oscilloscopes with the multi-channel timing-measurement capabilities of logic analyzers in a single enclosure. With an MSO you can both trigger on PCI bus events and view the signal-integrity issues around them.

A Look At a Typical Mixed-Signal Oscilloscope

The Agilent Technologies Model 54832D is an example of an MSO. It's called mixed signal because it provides several analog inputs plus enough digital inputs to view many signals at once and to trigger on any of many bus lines.

Agilent Technologies Model 54832D Mixed-Signal Oscilloscope

This example MSO can measure and display two analog signals (in the upper portion of the screen) and 16 digital channels (in the lower portion of the screen) at once, with all 18 channels time-aligned.

Each of the two analog channels of this MSO provides 600 MHz of bandwidth, and the MSO's standard acquisition memory can capture up to 8 Mbytes.

The instrument is intended for designs with lots of digital signals, and it lets you see complex interrelationships among all the displayed signals. Its high-definition display maps waveforms into 32 intensity levels, which helps reveal subtle signal details.

A Real-World Example

To see how this is done, let's look at a real problem the author and her design team faced while debugging a PCI bus. In the early stages of the project, progress was smooth. Prototypes came back from manufacturing and seemed to function properly, the system's firmware development was on schedule, and the project was cruising toward completion.

Then a fly landed in the honey. Some new boards started to fail sporadically. Everything would seem to be working fine, and then suddenly systems would crash, and crash hard. System shut-down and reboot was required to get up and running again.

This posed a huge problem for our project team. What's more, our manufacturing-line folks couldn't consistently get new systems to run through all the required parametric tests.

Other engineers on our team couldn't proceed through environmental testing, and the project's firmware engineers had to reboot frequently, sometimes several times each day. That delayed firmware quality assurance.

We studied the problem and initially deduced that the PCI bus was locking up intermittently. It would run fine for a while and then would just hang up. It appeared that it was going into some sort of deadlock situation where everything seemed to be up, but no work got done.

Making the Bugs Repeatable

Since the problem was intermittent, the first step was to find a way to make it happen repeatably. After some poking around and collaboration between our software and hardware teams, we found a way to make the problem appear more often, though not yet on demand.

Studying the problem closely, we found that it occurred more often when certain paths in the system's software were exercised heavily. Specifically, running a software test cycle that exercised the PCI bus and the devices connected to it brought on the failures. Now it was a question of where and why.

Scrutinizing the Hardware

The hardware consisted of a printed-circuit board (PCB) loaded with lots of custom components and ASICs. Our area of interest was the 32-bit 33 MHz PCI bus, which had five to seven devices connected to it. A large firmware base was driving it.

A device on a 32-bit PCI bus requires 47 pins if it acts only as a Target, or 49 pins if it can act as a Master; the extra two are the REQ# and GNT# arbitration lines. Our components all used 49 lines, since every device was required to behave as a Master occasionally.

Of the 49 lines, 32 were multiplexed Address/Data lines. Two lines were used for error reporting, and one line was a parity bit for the Address/Data lines. The remaining 14 were control lines used to coordinate the use of the PCI bus by multiple devices. Since the problem we were facing was a lockup, we focused on the interaction of those control lines.
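The 47/49 split and the 32 + 2 + 1 breakdown can be checked with a short tally. This Python sketch groups the required pins as they are named in the PCI Local Bus specification; the grouping itself is ours, not the author's. The article's "control lines" correspond here to the command/byte-enable, interface-control, system, and arbitration groups, which total 4 + 6 + 2 + 2 = 14 for a Master.

```python
# Required signals of a 32-bit PCI interface, grouped for counting.
# The grouping is illustrative; the pin names follow the PCI specification.
COMMON_PINS = {
    "address/data": [f"AD{i}" for i in range(32)],
    "command/byte enable": [f"C/BE{i}#" for i in range(4)],
    "parity": ["PAR"],
    "interface control": ["FRAME#", "IRDY#", "TRDY#", "STOP#", "DEVSEL#", "IDSEL"],
    "error reporting": ["PERR#", "SERR#"],
    "system": ["CLK", "RST#"],
}

# Arbitration signals, required only on devices that can act as a Master.
MASTER_ONLY_PINS = ["REQ#", "GNT#"]


def required_pins(is_master: bool) -> int:
    """Count the required pins of a 32-bit PCI Target or Master."""
    count = sum(len(pins) for pins in COMMON_PINS.values())
    if is_master:
        count += len(MASTER_ONLY_PINS)
    return count


if __name__ == "__main__":
    print("Target:", required_pins(is_master=False))  # Target: 47
    print("Master:", required_pins(is_master=True))   # Master: 49
```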

Enter the MSO

To help get a handle on the problem, we decided to use an Agilent Model 54832D 600 MHz Deep-Memory MSO. It provided 16 digital timing channels and four analog channels.

By running basic Write and Readout tests, we noticed that one of the devices on the PCB would occasionally receive the wrong address: the sequence read back was not always what had been sent.

For example, an ABCDEF sequence of addresses sent to the device would sporadically be read back as ABCFEF. It therefore made sense to look closely at the address phase of each PCI bus transaction, a job the MSO's state trigger handled nicely.
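The check behind a write/read-back test is easy to express in code. This Python sketch uses single letters as stand-ins for addresses, as the example above does, and reports the first position where the read-back sequence differs from what was sent. The function name is hypothetical; the team's actual test software isn't described.

```python
from typing import Optional, Sequence, Tuple


def first_mismatch(
    sent: Sequence[str], read_back: Sequence[str]
) -> Optional[Tuple[int, str, str]]:
    """Return (index, expected, actual) for the first differing entry, or None."""
    if len(sent) != len(read_back):
        raise ValueError("sequences must have the same length")
    for index, (expected, actual) in enumerate(zip(sent, read_back)):
        if expected != actual:
            return index, expected, actual
    return None


if __name__ == "__main__":
    print(first_mismatch("ABCDEF", "ABCDEF"))  # None
    print(first_mismatch("ABCDEF", "ABCFEF"))  # (3, 'D', 'F')
```

Because the failure was sporadic, a test like this would have to run in a loop for many iterations before a mismatch showed up.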

To begin, we connected several control lines from the PCI bus: FRAME#, IRDY#, TRDY#, DEVSEL#, GNT0, and CLK. We then set the oscilloscope to trigger in its advanced AND state/pattern mode.

Because CLK provides the basic timing for the PCI bus, we used it as the clock for the state trigger, so all the other lines we connected were sampled on its rising edge.

FRAME# is asserted while a transaction is in progress, so our trigger required it to be asserted (low); we weren't interested in idle bus cycles. IRDY# is asserted when the initiator (the Master) is ready to transfer data, and TRDY# is asserted when the Target is ready.

Since we weren't interested in the data phases of the transaction, we required both IRDY# and TRDY# to be de-asserted (high). DEVSEL# is asserted when the Target has decoded its address, and since we were interested in the address phase itself, the trigger required DEVSEL# to be de-asserted (high) as well. This prevented triggering during a wait state in the middle of a data phase, when neither the Master nor the Target is ready and IRDY# and TRDY# are both high.

GNT0 is an arbitration line that grants a device the right to drive the bus. We switched the GNT0 term of the trigger between asserted (low) and de-asserted (high) so we could control whether or not we triggered when Device 1 was driving the bus.
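The trigger described above is a Boolean AND of line levels, evaluated once per rising CLK edge. Below is a minimal Python model of that logic. It assumes each line has already been reduced to a 0/1 level at the clock edge, and the class name, function names, and sample bus activity are hypothetical, not the scope's own settings format. The last sample shows why the DEVSEL# term matters: it is a wait state with IRDY# and TRDY# both high, and only DEVSEL# keeps it from matching.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BusSample:
    """Logic levels (0 = low, 1 = high) latched on one rising CLK edge."""
    frame_n: int   # FRAME#
    irdy_n: int    # IRDY#
    trdy_n: int    # TRDY#
    devsel_n: int  # DEVSEL#
    gnt0: int      # GNT0 (asserted low)


def address_phase_trigger(s: BusSample, gnt0_level: int) -> bool:
    """AND pattern: FRAME# low; IRDY#, TRDY#, DEVSEL# high; GNT0 at the chosen level."""
    return (
        s.frame_n == 0
        and s.irdy_n == 1
        and s.trdy_n == 1
        and s.devsel_n == 1
        and s.gnt0 == gnt0_level
    )


def trigger_points(samples: Sequence[BusSample], gnt0_level: int) -> List[int]:
    """Indices of the CLK edges on which the state trigger would fire."""
    return [i for i, s in enumerate(samples) if address_phase_trigger(s, gnt0_level)]


# Made-up bus activity, one entry per rising CLK edge.
SAMPLES = [
    BusSample(frame_n=1, irdy_n=1, trdy_n=1, devsel_n=1, gnt0=1),  # idle bus
    BusSample(frame_n=0, irdy_n=1, trdy_n=1, devsel_n=1, gnt0=0),  # address phase
    BusSample(frame_n=0, irdy_n=0, trdy_n=1, devsel_n=0, gnt0=0),  # Target not ready
    BusSample(frame_n=0, irdy_n=1, trdy_n=1, devsel_n=0, gnt0=0),  # wait state
]

if __name__ == "__main__":
    print(trigger_points(SAMPLES, gnt0_level=0))  # [1]
    print(trigger_points(SAMPLES, gnt0_level=1))  # []
```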

Time for Infinite Persistence

The address phase of a PCI bus transaction starts on the CLK edge following the assertion of FRAME# (going low). After some investigation, we began to suspect that the CLK signal itself might very well have a signal-integrity problem. So at this point we turned on the oscilloscope's infinite-persistence feature, which keeps every acquired waveform on screen, so that even a rare anomaly on CLK would remain visible.
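A simplified software model shows why infinite persistence helps: every acquisition is folded into the display and nothing is ever erased, so a glitch that appears once in a thousand triggers stays visible. The sketch below keeps only the per-time-bin minimum and maximum, a reduced stand-in for the scope's accumulated display. The waveform shape, the 3.3 V level, the 1.65 V threshold, and the glitch position are all invented for illustration.

```python
from typing import List, Sequence


class InfinitePersistence:
    """Simplified infinite persistence: per-time-bin min/max over all acquisitions."""

    def __init__(self, n_points: int) -> None:
        self.low = [float("inf")] * n_points
        self.high = [float("-inf")] * n_points

    def add(self, trace: Sequence[float]) -> None:
        """Fold one acquisition into the envelope; nothing is ever discarded."""
        for i, v in enumerate(trace):
            self.low[i] = min(self.low[i], v)
            self.high[i] = max(self.high[i], v)


def clock_period(n_points: int, dip_at: int = -1) -> List[float]:
    """Hypothetical 3.3 V clock period: high in the first half, low in the second.

    If dip_at falls in the high half, that one sample dips to 1.0 V.
    """
    trace = [3.3 if i < n_points // 2 else 0.0 for i in range(n_points)]
    if 0 <= dip_at < n_points // 2:
        trace[dip_at] = 1.0
    return trace


if __name__ == "__main__":
    N = 20
    persistence = InfinitePersistence(N)
    for k in range(1000):
        # Only acquisition 637 contains the glitch.
        persistence.add(clock_period(N, dip_at=4 if k == 637 else -1))
    threshold = 1.65
    # Time bins in the nominally high half whose envelope ever fell below threshold.
    suspects = [i for i in range(N // 2) if persistence.low[i] < threshold]
    print(suspects)  # [4]
```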

With GNT0 de-asserted in the trigger, the scope showed the address phases of all devices other than Device 1. These appear as the lower eight digital traces in Figure 1.

Figure 1 - Address Phases of All Devices, Except Device 1

Because the GNT0 term of the state trigger was de-asserted (high), we were examining CLK integrity while Device 1 was quiescent. The markers were set to the Vin and Vout levels of the CLK. At this point, everything looked fine.

Triggering on the address phase of Device 1, however, revealed a problem with the clock pulse that preceded the address-phase clock. This is the clock that samples FRAME# when it's first asserted.

There It Is!

As shown in Figure 2, an anomaly is clearly visible in the upper analog trace, where the clock drops below the trigger level and the Vout marker.

Figure 2 - Anomaly in Address Phases

Now we had a viable suspect! To test it, we added circuitry that increased the coupling between Device 1's activity and CLK on boards that weren't failing, to see whether they would start to fail. They did.

The address Write and Readout tests were occasionally failing because we were violating setup and hold times. The anomalous CLK signal was double-clocking: the address was being latched when the abnormal dip in CLK went back high, rather than on a normal clock edge, so it was read in sooner than expected.
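Double-clocking falls out of how an edge-triggered input works. The Python sketch below models an ideal input with a single threshold and no hysteresis; the sample voltages and the 1.65 V threshold are illustrative assumptions. A dip that crosses the threshold during the high time produces an extra rising edge.

```python
from typing import List, Sequence


def rising_edges(samples: Sequence[float], threshold: float) -> List[int]:
    """Sample indices where the signal crosses the threshold going up.

    Models an ideal edge-triggered input with a single threshold and no
    hysteresis, so any dip below the threshold adds an edge.
    """
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] < threshold <= samples[i]
    ]


# Two hypothetical clock periods in volts; the glitchy one dips to 1.0 V
# during its first high time (values and threshold are illustrative).
clean = [0.0, 0.0, 3.3, 3.3, 3.3, 3.3, 0.0, 0.0, 3.3, 3.3]
glitchy = [0.0, 0.0, 3.3, 3.3, 1.0, 3.3, 0.0, 0.0, 3.3, 3.3]

if __name__ == "__main__":
    print(rising_edges(clean, 1.65))    # [2, 8]
    print(rising_edges(glitchy, 1.65))  # [2, 5, 8]
```

The extra edge at index 5 plays the role of the dip going high again: a register clocked by this signal latches its input there, three samples before the legitimate edge at index 8, so data that hasn't yet met its setup time is captured early.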

By changing the circuitry to reduce the coupling between Device 1's activity and CLK, we eliminated the intermittent lockups and showed that the MSO was an effective tool for examining signal integrity on the PCI bus.

Looking at the same problem with a conventional oscilloscope would have required either building external circuitry or cross-triggering from a logic analyzer. Either approach would have made it difficult to view both the signals we were triggering on and the signal we were checking for integrity problems.

Both approaches would also have taken significantly more time to set up, and reducing the time it took to look at this problem was essential to meeting our schedule. As it was, the problem described took several engineers several weeks to isolate.


 

Copyright © 2003 ChipCenter-QuestLink