The Programmable Logic Proving Ground
PLDs are an Increasingly Useful Component of System Design Throughout a Product's Lifetime
By Tom Troksa, Networking Processor Architect, Packet Engines (part of Alcatel); Steve Dabell, Networking Processor Architect, Packet Engines (part of Alcatel); Martin S. Won, (mwon@altera.com) Member of Technical Staff, Altera Corporation
Introduction
System designers are faced with larger, more complex projects and less time to complete them. Completing a system design today is not simply seeing a single vision through to completion; it is an event representing the successful convergence of several changing technologies. Programmable logic devices (PLDs) have emerged as useful tools in dealing with these issues. PLDs are familiar to many designers as interface or "glue" logic, but the rapid rise in PLD size and features now makes it desirable to implement large subsystems within a single chip.
The decision to use PLDs is accompanied by other choices, such as which device to use, how to integrate it into existing design flows, and where to include it in your development cycle. This article focuses on these aspects of PLD use through the example of Argus, a recent network processing device developed by Gigabit Ethernet provider Packet Engines. Argus is a core device in Packet Engines' PowerRail 5200 enterprise routing switch. The PowerRail 5200 switch (shown in Figure 1) is a wire-speed router designed for the core of enterprise networks. These devices, which perform routing functions in custom ICs, are replacing traditional routers.
Figure 1: Packet Engines PowerRail 5200 Gigabit Ethernet Routing Switch
A brief description of Argus will assist in understanding the issues covered in this article. Argus is a PowerPC-compliant system controller for networking products. It was architected to provide massive amounts of system bandwidth between the PowerPC processor and the Gigabit Ethernet switch fabric. In a stand-alone configuration, Argus provides a data transfer engine that is compliant with the PowerPC 603, 604, 740, and 750 and supports 6 Gbit/sec of memory bandwidth, a 2 Gbit/sec DMA receive channel, a 2 Gbit/sec DMA transmit channel, an industry-compliant I2C interface, and a 32-bit local bus. When coupled to a local bus controller, it can control an entire computer system including flash memory, PCMCIA cards, a modem interface, and an RS-232 serial monitor port. Argus integrates two independent synchronous DRAM controllers, a receive DMA channel, a transmit DMA channel, a 32-bit local bus, an I2C controller, an interrupt controller, and a system configuration controller. An internal multi-master/multi-slave parallel bus structure provides simultaneous connection between six independent execution units, with support for up to seven concurrent transactions. Figure 2 shows a block diagram of Argus.
Figure 2: Argus Block Diagram
Although Packet Engines' decision to start with its big switch required more complex development initially, we believed it would be more practical to scale the architecture down than to scale it up. The PowerRail 5200 routing switch is the industry's first wire-speed routing switch of such a scale. Therefore, as development of the PowerRail 5200 routing switch proceeded, it was necessary to prove its capabilities to early adopters. Packet Engines decided to use a PLD as the initial implementation technology for the Argus device to gain the confidence of the market and initiate the sales cycle ahead of the full production release of Argus.
When to Use Programmable Logic
In the case of Argus, use of PLDs made the most sense during the prototyping and initial manufacturing stages, with a transition to a gate array for full-scale production. The decision to use a gate array was based on the lower cost of a custom device in the density range (roughly 100K gates) that could accommodate Argus and its performance requirements. For lower density designs that achieve their speed goals in PLDs, it makes sense to perform a cost analysis to determine if a custom device is necessary. At today's PLD volume prices, many designs in the 50K-gate range could warrant full-scale production with PLDs alone.
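The cost analysis mentioned above can be sketched as a simple break-even calculation. The sketch below is only an illustration: the function name and all three prices are hypothetical placeholders, not figures from the Argus project, and a real analysis would also weigh schedule risk, inventory, and respin costs.

```python
def break_even_units(nre_cost, pld_unit_cost, gate_array_unit_cost):
    """Smallest volume at which a gate array is cheaper in total than a PLD.

    All arguments are whole currency units (integers). Returns None when the
    gate array offers no per-unit saving, so it can never recover its
    one-time (NRE) cost.
    """
    saving_per_unit = pld_unit_cost - gate_array_unit_cost
    if saving_per_unit <= 0:
        return None
    # The gate array wins once volume * saving_per_unit exceeds the NRE cost.
    return nre_cost // saving_per_unit + 1


# Hypothetical placeholder prices, chosen only to make the arithmetic visible.
example_volume = break_even_units(nre_cost=100_000,
                                  pld_unit_cost=150,
                                  gate_array_unit_cost=50)
print("Gate array becomes cheaper at", example_volume, "units")
```

With these placeholder numbers, production runs below roughly a thousand units would favor staying with the PLD, which is the kind of result that motivates the 50K-gate observation above.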
Which Device to Use?
In some cases, specific device features like on-board memory will further steer the decision. Many high-density PLDs offer ways to implement on-board memory; the two prevalent schemes are embedded (in which the device includes dedicated memory structures) and distributed (in which logic resources are converted into memory resources). Each of these implementations has advantages and disadvantages. For Argus, we required two 256 x 64 memories (single-port RAMs) to interface between the DMA engines internal to Argus and our external switch fabric at 41 MHz. For memories of this size and speed, distributed memory was too slow and resource-inefficient, so we favored embedded memory.
The next consideration is pin count. There are no hard and fast rules for choosing the right pin count; some designers prefer a buffer of anywhere from 5% to 10% extra I/O pins to address changes and modifications. The Argus design required 450 I/O pins, and we estimated that the design logic would take roughly 100K gate-array gates along with 4 Kbytes of internal single-port memory. Among the embedded memory PLDs, the EPF10K130 in Altera's FLEX 10K family seemed like a good fit. The EPF10K130 offers up to 130,000 usable gates, up to 32K RAM bits, and 470 I/O pins. We were also attracted to the pin compatibility between different FLEX 10K family members. Several PLDs offer this capability, which is also useful in planning future versions. Our plans to upgrade Argus required more logic and memory resources while using the same I/O. We planned to use an EPF10K250, with nearly twice the logic and memory resources of the EPF10K130 in the same pinout and packages.
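The device-selection arithmetic above can be checked with a short script. This is a minimal sketch: the numbers are the ones quoted in this article, while the `Resources` class and the helper names are our own illustrative choices. The vendor's "up to" figures are treated as simple upper limits, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    gates: int
    ram_bits: int
    io_pins: int


# Argus estimates from the article: ~100K gates, two 256 x 64 RAMs, 450 I/O pins.
ARGUS_NEEDS = Resources(gates=100_000, ram_bits=2 * 256 * 64, io_pins=450)

# EPF10K130 figures as quoted in the article ("up to" values).
EPF10K130 = Resources(gates=130_000, ram_bits=32 * 1024, io_pins=470)


def fits(need, device):
    """True when every requirement is within the device's quoted limit."""
    return (need.gates <= device.gates
            and need.ram_bits <= device.ram_bits
            and need.io_pins <= device.io_pins)


def spare_io_percent(need, device):
    """Unused I/O pins as a percentage of the pins the design requires."""
    return 100.0 * (device.io_pins - need.io_pins) / need.io_pins


print("RAM needed:", ARGUS_NEEDS.ram_bits, "bits =",
      ARGUS_NEEDS.ram_bits // 8, "bytes")
print("Fits EPF10K130:", fits(ARGUS_NEEDS, EPF10K130))
print(f"Spare I/O: {spare_io_percent(ARGUS_NEEDS, EPF10K130):.1f}%")
```

By these figures, the two RAMs consume exactly the 32K bits available, and the 20 spare pins amount to a little over 4% of the required I/O, slightly under the 5% low end of the buffer some designers prefer.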
Integration Into Your Design Flow
The block diagram in Figure 3 shows our design flow. It begins with capturing the hierarchical design using an HDL (in our case, Verilog). The design synthesis stage follows (we use Design Compiler, FPGA Compiler, and FPGA Express from Synopsys), which yields a gate-level representation of the design. The PLD place-and-route phase comes next; it is similar to typical gate-array development, with some minor differences. PLDs (especially the high-density PLDs we used) require place-and-route tools that are provided by the PLD vendor. These tools can typically be used either alone to develop PLD designs or together with gate-array design tools.
Figure 3
In our case, the Synopsys tool is directed to produce a netlist for processing by the PLD tools. During synthesis, we created a set of scripts, with a specific script associated with each Verilog file. A bottom-up strategy produces gate representations of each synthesizable leaf-level module. The gate-level output files produced by the leaf-level synthesis scripts are stored in a common directory. In the bottom-up strategy, the gate-level design files are connected (using scripts within the Synopsys environment) at higher levels of the design hierarchy until a top-level design file (representing the entire design structure as a hierarchical gate-level netlist) is created. This design file is output from the Synopsys environment as an EDIF hierarchical netlist, which is then provided to the PLD place-and-route tools.
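The bottom-up ordering described above amounts to a post-order walk of the design hierarchy: every leaf is processed before the module that instantiates it. The sketch below illustrates only that ordering. The module names are hypothetical stand-ins rather than the actual Argus hierarchy, and it does not invoke any Synopsys commands.

```python
def bottom_up_order(hierarchy, top):
    """Return modules so that every child appears before its parent.

    `hierarchy` maps a module name to the list of modules it instantiates;
    modules missing from the map are treated as leaves.
    """
    order = []
    seen = set()

    def visit(module):
        if module in seen:
            return  # a shared submodule is only processed once
        seen.add(module)
        for child in hierarchy.get(module, []):
            visit(child)
        order.append(module)  # appended only after all of its children

    visit(top)
    return order


# Hypothetical hierarchy for illustration; not the real Argus module names.
HIERARCHY = {
    "argus_top": ["ppc_interface", "dma"],
    "dma": ["dma_rx", "dma_tx"],
}

for step, module in enumerate(bottom_up_order(HIERARCHY, "argus_top"), start=1):
    print(f"{step}. synthesize or link {module}")
```

In a flow like ours, each name in the resulting list would correspond to one per-file script, with the top-level entry being the step that writes out the hierarchical netlist.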
In our flow, the place-and-route tool is Altera's MAX+PLUS II. Hierarchical EDIF is useful because it allows the designer to manage the timing and area requirements with constraints assigned to any module within the hierarchy, at any hierarchical level. MAX+PLUS II provides a design hierarchy viewer/editor that serves this purpose well. Logic assignments that carry over to MAX+PLUS II can also be made within the Synopsys tool environment (in the case of Design Compiler, this requires entering commands at the dc_shell prompt).
Generally, the PLD vendor's tools provide more accurate post-route timing information than the pre-route estimates from the logic synthesis tool. Although this post-route information can be imported into the Synopsys static timing tool for static timing analysis, we used the MAX+PLUS II static timing analyzer. To check the functionality of the design, we exported a Verilog file of the compiled design from MAX+PLUS II into Cadence's gate-level simulator, Verilog-XL.
Board Layout and Hardware/Software Codesign
An incremental release strategy for the PLD design allowed for early hardware/software integration using subsets of the final design. We planned four prototyping releases of Argus, the fourth being the first production release. The first release, which took five weeks from specification to completion, contained the PowerPC interface, memory controllers, and the bus fabric. This release allowed our software developers to get to work with their PowerPC emulator and test routines for transactions between the PowerPC and its DRAM.
In the second release (one week later), we added the I/O bus, which gave Argus access to data sources in the system (i.e., UART, flash memory, and a PCMCIA port). For this release, the software team developed code for the PowerPC to talk to the data sources (for example, booting off the flash or the PCMCIA port and loading the corresponding instruction sets into the DRAM). Since the PowerPC-to-DRAM routines had already been established, the software team could focus on dealing with the new data sources. The next version came two more weeks later; it included the DMA engine and the single-port RAMs for communicating between the external switch fabric and the PowerPC. The fourth release followed two weeks thereafter, in which we made minor enhancements and began intense software testing.
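Adding up the intervals above gives the overall prototyping timeline. The short sketch below does only that; the intervals come from the text, while the one-line release summaries are our own paraphrases.

```python
from itertools import accumulate

# Weeks between releases, as reported above: five weeks from specification
# to the first release, then one, two, and two weeks for the later ones.
WEEKS_BETWEEN_RELEASES = [5, 1, 2, 2]

# Paraphrased summaries of what each release added.
RELEASE_CONTENTS = [
    "PowerPC interface, memory controllers, bus fabric",
    "I/O bus (UART, flash, PCMCIA access)",
    "DMA engine and single-port RAMs",
    "minor enhancements, intense software testing",
]

cumulative_weeks = list(accumulate(WEEKS_BETWEEN_RELEASES))

for number, (week, contents) in enumerate(
        zip(cumulative_weeks, RELEASE_CONTENTS), start=1):
    print(f"Release {number}: week {week:2d}  {contents}")
```

By this count, the fourth release arrived about ten weeks after specification began, with software work starting at week five.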
Following the release of the PLD design, we started the gate array retargeting. The difference between the PLD design and the gate-array design was the structure of the DMA engines' single-port memory. To ease this process, the hierarchical design isolated the interface to the memory. The resulting design changeover was smooth, although the timing of this portion of the design was scrutinized during Verilog simulation and testing of the gate-array design.
In-System Testing and Initial Production
Regarding PLD compilation times: as with gate arrays, compilation times vary with size and complexity, and are a factor in design cycle efficiency. With Argus, early releases took about 20 minutes to compile (using MAX+PLUS II on a Sun UltraSPARC 2-based workstation). Final releases required up to six hours. The final PLD version of Argus occupied 82% of the logic and all of the memory resources of the device. Regarding gate counts, Altera states that the EPF10K130 provides 82K to 211K gates, depending on how the logic and memory are used. For comparison, the gate-array version of Argus required 95K "gate-array" gates and two 2 Kbyte single-port RAMs. So, by the measure of Argus, the logic cells in an EPF10K130 provided ~115K gates (95K gates in 82% of the logic), and the EABs provided 4K memory bytes.
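The ~115K figure follows from scaling the gate-array gate count by the fraction of PLD logic used. The sketch below reproduces that arithmetic; it assumes gate capacity scales linearly with logic utilization, which is a simplification, and the variable names are our own.

```python
# Figures reported above for the final Argus releases.
GATE_ARRAY_GATES = 95_000      # "gate-array" gates in the gate-array version
PLD_LOGIC_UTILIZATION = 0.82   # fraction of EPF10K130 logic occupied
RAM_COUNT = 2
RAM_BYTES_EACH = 2 * 1024      # two 2 Kbyte single-port RAMs

# Assumption: gate capacity scales linearly with logic utilization.
implied_full_device_gates = GATE_ARRAY_GATES / PLD_LOGIC_UTILIZATION
total_ram_bytes = RAM_COUNT * RAM_BYTES_EACH

print(f"Implied logic capacity: about {implied_full_device_gates / 1000:.0f}K gates")
print(f"Embedded memory used: {total_ram_bytes} bytes")
```

The result lands just under 116K gates, consistent with the ~115K estimate and within the 82K to 211K range Altera quotes.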
Conclusions
With today's prices and product cycles, it makes sense for many 50K-gate designs with production runs in the thousands or fewer to consider using PLDs throughout the product's lifetime. In the near future, the size of design that can remain with PLDs will quickly grow to the 100K-gate range and beyond.
Copyright © 2003 ChipCenter-QuestLink