Single-Mask Programmable Cores for Platform-Based SoC
by Laurence H. Cooke, Consultant
Filling in the space between Standard Cell and FPGA

As we move to multi-million-gate chips, it has become necessary to adopt design reuse strategies for these new SoCs. This necessity stems both from the cost of logic redesign and re-verification and from the increasing expense of dealing with the physical implementation artifacts of deep submicron processes, such as signal integrity and clock distribution. At the same time, the growing number of metal layers has eliminated traditional gate array technology as a viable option, and shrinking lithography has pushed mask NRE costs above $500K per prototype.

At the other end of the spectrum, FPGAs, with no NRE and zero prototype turnaround time, have grown larger as process geometries shrink. They now include busses, special I/O, memory blocks and processors in addition to the FPGA logic, yet their density and performance remain far behind those of Standard Cell.

The demise of the gate array market came at a most unfortunate time for the industry. Gate arrays disappeared because they could not deliver value once processes required ten or fifteen metal masks. Yet such a low-cost, high-performance product is desperately needed in a marketplace where the cost of going to silicon is heading upward of $1M.

A technology that programs its interconnect with a single metal mask is a viable alternative that can deliver an efficient solution. Programming only the interconnect provides a low-NRE configuration option, with performance closer to Standard Cell and turnaround time closer to FPGA. The Single-Mask Programming (SMP) technology can be used for stand-alone ASIC products as well as for IP cores designed to be implemented within an SoC platform-based design. Such a solution is described here as an option to answer today's design challenges.
The Platform Solution

Even if a new SMP solution is used to implement an embedded design, designing millions of gates is unacceptable from a cost and time-to-market perspective. Platform-based design, as defined in "Surviving the SoC Revolution" [1], is one answer to this cost and time-to-market problem. A Kernel is defined as a customizable hard core that contains the common IP used by all the derivative designs in a specific market. The Kernel consists of the API, RTOS, processor(s), bus, memory and critical common I/O functions, optimized for performance, size or power as required by the market segment the platform targets. Derivatives are then created by combining soft hardware and software IP and integrating it, along with the Kernel, into an SoC chip. A derivative can be created quickly because the engineering activity is limited to IP selection, integration and verification, rather than design from scratch. The approach is efficient because the critical timing of the bus and processor has already been solved. A Platform SoC consisting of a Kernel plus an SMP fabric that is customized for each derivative would further reduce NRE and manufacturing turnaround time compared with the Standard Cell implementation proposed in [1].

The SMP fabric

A number of SMP architectures exist in the marketplace. All of them have a pre-existing structure of wire segments, connected by jumpers patterned on the customized metal layers. The wire segments connect small structures of transistors or simple gates into custom user logic. The technology worked well with 2 or 3 layers of interconnect, but it becomes via-constrained as processes move toward 6 to 8 metal layers: contacts to these small device and gate structures must traverse all the way up to the upper metal layers, creating large vertical blockages and congestion on the custom layers.
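The Python sketch below gives a rough, back-of-the-envelope illustration of this via congestion. It counts the cell terminals that must climb to the customized layers. To estimate how the terminal count falls as cells get coarser, it uses Rent's rule, the relation the next paragraph appeals to: T = t·g^p, where T is the number of external terminals of a block of g gates. The Rent coefficient t, the exponent p, and the number of layers each terminal must climb are hypothetical values chosen only for illustration. The one-via-stack-per-terminal model is a simplification, not a description of any particular SMP architecture.

```python
"""Back-of-the-envelope Rent's-rule estimate of pins versus cell granularity.

Hypothetical assumptions, for illustration only: Rent coefficient t = 3
terminals per gate, Rent exponent p = 0.6, and every cell terminal needs one
via stack that climbs a fixed number of layers to the customized metal.
"""


def rent_terminals(gates_per_cell: float, t: float, p: float) -> float:
    """Rent's rule: external terminals T = t * g**p of a block of g gates."""
    return t * gates_per_cell ** p


def total_terminals(total_gates: int, gates_per_cell: float, t: float, p: float) -> float:
    """Terminals summed over all cells when the logic is split into equal cells."""
    cells = total_gates / gates_per_cell
    return cells * rent_terminals(gates_per_cell, t, p)


def via_cuts(terminals: float, layers_to_climb: int) -> float:
    """One via cut per layer crossed, for every terminal (simplified model)."""
    return terminals * layers_to_climb


if __name__ == "__main__":
    N = 1_000_000      # user-logic gate count (the figure used later in the article)
    t, p = 3.0, 0.6    # hypothetical Rent parameters
    layers = 4         # hypothetical number of layers each pin must climb
    for g in (1, 4, 12):  # gates per cell: single gate, small cluster, eCell-sized
        T = total_terminals(N, g, t, p)
        print(f"{g:>2} gates/cell: {T:>11,.0f} terminals, {via_cuts(T, layers):>12,.0f} via cuts")
```

Because p < 1, the total terminal count N·t·g^(p−1) falls as cells get coarser, which is the trend the argument relies on. The absolute numbers depend entirely on the assumed parameters.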
FPGA designers, on the other hand, have long recognized that Rent's rule implies that coarser-grained cells require fewer interconnects to wire them together. That is why most commercial FPGAs have coarse cell structures compared with gate arrays or standard cells. Fewer interconnects mean fewer jumpers to connect them, and coarse-grained cells also require proportionally less jumper customization on the top metal layers, so silicon area is used more efficiently. Based on these observations, a novel fabric has been designed that combines large, FPGA-like, SRAM-programmable logic cells (eCells) with segmented, pure-metal routing customized by SMP. The resulting structure has FPGA-like programmability, with density and performance closer to standard cells because of its metal interconnect.

The eCell

The eCell consists of two 3-input look-up tables (LUTs) connected to a flip-flop through a mux. One input of each LUT is driven by a two-input NAND gate. The cell also includes two inverters of different drive strengths, each of which can be connected to any signal to re-drive it. Each LUT can perform any 3-input function, and the NAND extends it to a subset of 4-input functions. The complete cell, equivalent to about 12 logic gates, needs only 3 metal jumpers for configuration; the rest of the area can be dedicated to single-mask programmable routing.

With conventional metal processing, this customization requires two custom masks: one via mask and one wire mask. With the more recent chemical-mechanical planarization and copper metal process, vias and metal can be patterned with only one custom mask. A standard via mask is used in conjunction with the customized metal mask, and vias are formed only where both masks have a pattern. If a line is designed to pass over a via site where no via is wanted, the via is subtracted from the metal mask. The etch at that site then does not complete the via but goes only deep enough to form the groove for the metal line.
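The eCell's logic capability can be checked with a small behavioral model. The Python sketch below treats each LUT as an 8-bit truth table and feeds a two-input NAND into one LUT input. It then enumerates every table to count the 4-input functions the pair can realize. The article does not specify which LUT input the NAND drives, the bit ordering of the table, or what drives the mux select, so those are assumptions here; all function names are hypothetical.

```python
"""Behavioral sketch of one eCell LUT path (assumptions noted inline).

Assumed, not specified in the article: the NAND output drives LUT input 0,
the truth table is an 8-bit integer indexed by (in2 << 2) | (in1 << 1) | in0,
and a select signal chooses which LUT output the mux passes to the flip-flop.
"""
from itertools import product


def lut3(table: int, in0: int, in1: int, in2: int) -> int:
    """3-input look-up table: return bit (in2 in1 in0) of an 8-bit table."""
    return (table >> ((in2 << 2) | (in1 << 1) | in0)) & 1


def nand_lut(table: int, a: int, b: int, c: int, d: int) -> int:
    """LUT with a two-input NAND on one input: f(a, b, c, d) = LUT(NAND(a, b), c, d)."""
    return lut3(table, 1 - (a & b), c, d)


def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 mux choosing which LUT output reaches the flip-flop's D input."""
    return in1 if sel else in0


def truth_table4(fn) -> tuple:
    """Tabulate a 4-input function over all 16 input combinations."""
    return tuple(fn(a, b, c, d) for a, b, c, d in product((0, 1), repeat=4))


def reachable_functions() -> set:
    """All 4-input truth tables the NAND + LUT3 pair can realize."""
    return {truth_table4(lambda a, b, c, d, t=t: nand_lut(t, a, b, c, d))
            for t in range(256)}


if __name__ == "__main__":
    funcs = reachable_functions()
    and4 = truth_table4(lambda a, b, c, d: a & b & c & d)
    or4 = truth_table4(lambda a, b, c, d: a | b | c | d)
    print(len(funcs), "of", 2 ** 16, "four-input functions")  # 256 of 65536 four-input functions
    print("AND4 reachable:", and4 in funcs)  # True (table 0x40)
    print("OR4 reachable:", or4 in funcs)    # False
```

The NAND output and the other two inputs together still take all eight combinations, so each of the 256 LUT tables yields a distinct 4-input function. A 4-input AND is reachable, while a 4-input OR is not, which illustrates what a "LUT-4 subset" means here.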
The metal is then deposited over the whole structure and polished off the high (non-patterned) areas.

The Fabric

The basic cell is tiled in a 16 x 16 array (the eUnit), with no routing channels between the cells; all routing happens over the cells. Eight such units make up the basic Configurable Embedded Core, the eCore. Each eCore has its own built-in scan string. This is a single chain through all flip-flops in the eCore, based on a mux-scan structure, so the system clock is used in scan mode to shift data into and out of the eCore.

Each eCore also has a low-skew, low-power clock grid. The clock line is buffered, and both the clock and its inverse are distributed as an evenly loaded grid. Such a grid is known to have far less skew than a balanced tree structure, but it usually consumes much more power. To compensate, a shunting N-channel transistor is connected between each pair of true and complement clock drivers and enabled for a short time during each clock transition to allow charge sharing. This minimizes the noise and power consumption of the clock structure while keeping skew to a minimum.

Finally, some of the eCells can be configured as dual-port SRAM, providing small distributed RAM blocks embedded within the logic.

Each core can be viewed as a black-box hard IP block to be connected at the top level. Specific signals can easily be assigned connection locations for subsequent wiring when configuring the core. The actual number of I/O connections depends on the number of interconnect layers available and is in the thousands. The interconnect is a series of pre-fabricated segments that run significant distances in the lower layers, in predominantly perpendicular directions on each pair of layers. The underlying cells, clocks and scan take up the first three or four wiring layers, and typically four layers are used for the SMP routing, split between short and long segments.
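The mux-scan idea is easy to model behaviorally. In the Python sketch below, each flip-flop's D input passes through a 2:1 mux. With scan_enable high, the mux takes the previous flip-flop's output, so the chain behaves as a shift register clocked by the ordinary system clock. With scan_enable low, it takes the functional next-state value. The class and method names are hypothetical, and the user logic is an arbitrary stand-in function. The 4-bit chain is only for readability; if every eCell contributed its single flip-flop, an eCore's chain would be up to 16 x 16 x 8 = 2048 flip-flops long.

```python
from typing import Callable, List


class MuxScanChain:
    """Behavioral model of a single mux-scan chain (hypothetical, illustrative)."""

    def __init__(self, length: int, logic: Callable[[List[int]], List[int]]):
        self.q: List[int] = [0] * length  # flip-flop outputs in chain order
        self.logic = logic                # stand-in for the user logic's next state

    def clock(self, scan_enable: bool, scan_in: int = 0) -> int:
        """Apply one system-clock edge and return the bit presented at scan-out."""
        scan_out = self.q[-1]  # last flip-flop drives the chain output
        if scan_enable:
            # Scan mode: every mux selects the previous flip-flop (or scan_in).
            self.q = [scan_in] + self.q[:-1]
        else:
            # Functional mode: every mux selects the functional D input.
            self.q = self.logic(self.q)
        return scan_out

    def load(self, pattern: List[int]) -> None:
        """Shift a pattern in so that afterwards q[i] == pattern[i]."""
        for bit in reversed(pattern):
            self.clock(True, bit)

    def unload(self) -> List[int]:
        """Shift the whole chain out; return the captured state in chain order."""
        out = [self.clock(True, 0) for _ in range(len(self.q))]
        return out[::-1]


def invert_all(q: List[int]) -> List[int]:
    """Toy user logic: each flip-flop's next state is its own inverse."""
    return [1 - b for b in q]


if __name__ == "__main__":
    chain = MuxScanChain(4, invert_all)
    chain.load([1, 0, 1, 1])   # scan in a test pattern
    chain.clock(False)         # one functional (capture) clock
    print(chain.unload())      # [0, 1, 0, 0]
```

The same system clock drives both modes, so the only scan-specific hardware in this model is the mux in front of each flip-flop. That matches the simplicity the article attributes to the eCore's scan string.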
Jumpers are used to connect short and long segments, or to change routing direction. The longer segments periodically shift over one track and rotate to the other side of the channel. This ensures that aggressor nets do not run next to victim nets for very long before being rotated away, and it reduces the need to reroute nets to avoid crosstalk problems. Because the devices inside the eCell are fine-tuned and output drive is selectable, the resulting power and performance are much closer to standard cells than to FPGAs. Table 1 describes how such an SMP process compares to other currently available technologies.
Table 1. A comparison of FPGAs, Single-Mask Programming (SMP) and Standard Cell

Next we will explore the use of such an SMP fabric within a platform design to accelerate design turnaround time even further.

A Skinny Platform Example

A platform that addresses a specific market segment should have a specific architecture, but for the sake of example this platform is organized to be as general as possible. We will call it a "skinny" platform: a processor, an RTOS, a system bus, some internal memory and some external interfaces, assumed here to be a memory controller and a USB interface. Such a skinny platform could have the following characteristics:
In general, all the variations that may occur in a derivative design should be kept outside the hardened part of the Kernel. This includes the interrupt controller, any timers or counters, the memory controller protocol, the USB stack, all interface logic for the non-dedicated bus ports, and the address space of all the bus ports. The bus arbiter should probably be hardened, since it is timing-critical. The buffers for each bus port must be included in the bus design to ensure proper timing, and the physical structure of the bus is determined by the structure of the SMP logic and wiring. The memory cores should be hardened for performance and density, but they should be partitionable into a number of smaller configurations. The user can still build smaller memories within the user logic area if necessary, though without tuned timing. The bus could also be broken into multiple smaller busses. In that case, additional bus address lines should be added to allow adequate addressing, and the bus interfaces, such as the processor's cache interface, must be configurable. Configuration needs to be accomplished with dedicated logic and tie-off straps, which can be programmed via the user logic portion of the SMP fabric.

A platform like this, with a Kernel and 1 million additional gates of user-defined SMP logic, should fit on an 8 mm square die in 0.18 micron technology. A die of this size is highly manufacturable and can support over 240 signal pins.

Figure 1: Skinny Platform SoC Floorplan

Business Issues

The example skinny platform has the advantage of using an existing SMP fabric chip, thereby minimizing platform design costs. Furthermore, the user obtains all of the fixed platform IP with a single purchase and therefore avoids the hassle of costly and lengthy IP acquisition.
In addition, the whole Derivative Design Methodology for implementing user logic on the skinny platform can be put online, so that the tools, models and methodology are accessible on a per-use basis from the platform provider. Both Cadence and Synopsys have announced plans to provide similar online access to tools. Royalties for the fixed portion of the skinny platform can be assessed by the semiconductor company, using identification data added to the mask data transferred to it. Per-design royalties for soft IP can be assessed and tracked automatically by the design infrastructure. Per-chip royalties for soft IP can also be tracked by the semiconductor vendor, by adding the IP's identification data to the single-layer mask data. This allows royalty costs to be added to the overall chip cost transparently to the customer, while giving the customer full and accurate forecasts of chip prices.

In summary, a skinny platform using Single-Mask Programmable cores and processors, together with design tools and methodologies in a web-enabled environment, will give the next generation of intermediate-volume product designers an effective solution for quick time to market and low NRE cost for their SoC design needs.

Reference

[1] Henry Chang et al., Surviving the SoC Revolution: A Guide to Platform-Based Design, Kluwer, 1999.
Copyright © 2003 ChipCenter-QuestLink