by Eric Verhulst, Chairman and Marketing Director, Eonic Solutions GmbH
Dear Mr. Mendelsohn,
I think you are hitting it quite right on the SoC/ASIC front with your year-end observations. I would even go further: ASICs/SoCs are hitting the wall. In fact, any large chip (with millions of gates) that basically implements random logic is going to hit the wall.
Quite a number of ASIC/SoC designs are available already. Here are two real-world examples.
- ADI's TigerSharc DSP: Although ADI finally has production silicon, the first chips were actually promised for 1998 (if my memory is right)! ADI had to go through three silicon versions and redesigns to get there. The current production chips are still labeled 0.2, and there is a bug list with 50 items. Note that the 21160 Hammerhead Sharc had a similar story. The bugs were never fixed.
- Tundra chip sets, e.g., the PowerPro and the Psi320: The first chip is a PowerPC memory controller; the second is a PCI-PCI bridge. Both were announced. While the first PowerPro silicon more or less worked, it had several serious bugs (e.g., one could only use one out of the four DMA engines at a time). When the Rev. 2 silicon was released, we came to the conclusion that the first silicon really was a prototype and should not even have enjoyed the status of "engineering sample." Rev. 2 silicon still has serious problems (e.g., some DMA operations only reach about 40% of the speed that should be possible). In practice, the chip's poor performance means that it is not usable for any design that needs to reach the top performance levels.
The fate of the Psi320 was even worse. While the chip was announced, it was canceled on short notice after the first chips proved to have too many anomalies.
There are many more examples like this, but the underlying drama is four-fold.
- Too much "management" by marketing and financial people. This results in early "vaporware" announcements. We are all aware of cases where announcements were made even before the company had started the design. I think business ethics should dictate either that no products are announced before they have actually evolved into working silicon, or that the manufacturer clearly indicates the status of its developments, so as to actively prevent customers from starting designs with chips whose status is still uncertain. ADI did the latter correctly; Tundra didn't.
- Technology limits. While semiconductor people dream about 1 billion transistors on a chip, which will very soon be technically feasible, this will not necessarily result in the same kind of chips we see today. The reasons are escalating nonrecurring engineering costs ($1 million for each mask, increasing about 90%/year for each new silicon technology), very expensive test and verification cycles, and plain mathematics. In random logic, the probability of an error (hence a bug) increases with complexity. I will not speculate on whether the increase is linear or not, and I also agree that more upfront verification can reduce this probability (expressed as ppm or %), but it's almost certain that the absolute number of remaining errors/bugs will increase. After all, the design process is still a human activity. Notwithstanding increasingly better tools, the aspect of "craft" remains an essential ingredient in engineering activities. Hence, only the big players can afford to keep paying for new masks until the absolute level of bugs (or workarounds) becomes manageable.
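To make the "plain mathematics" concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes, purely for illustration, that residual bugs scale linearly with gate count (the letter deliberately leaves the true relationship open), and every figure in it is hypothetical rather than measured. The function name `expected_residual_bugs` is likewise invented for this example.

```python
def expected_residual_bugs(gate_count: int, bugs_per_million_gates: float) -> float:
    """Expected bugs remaining after verification.

    Assumption (not from the letter): a simple linear model in which the
    residual bug rate is expressed per million gates.
    """
    return gate_count / 1_000_000 * bugs_per_million_gates


# Hypothetical figures, for illustration only.
# A 1-million-gate design with 10 residual bugs per million gates:
small = expected_residual_bugs(1_000_000, 10.0)

# A 10-million-gate design where better verification has halved the rate:
large = expected_residual_bugs(10_000_000, 5.0)

print(f"small design: {small:.0f} expected residual bugs")
print(f"large design: {large:.0f} expected residual bugs")
```

Under these assumed numbers, halving the bug rate while growing the design tenfold still leaves five times as many bugs in absolute terms, which is the point being made above.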
The solution is to be found in plain common sense: reduce the complexity.
How? Make chips with regular structures. The same goes for logic. FPGAs (and their variants) are leading the way. The goal is higher density and more functionality. As feature sizes shrink, increase regularity and flexibility by allowing "reprogramming." This means "separating the concerns" (correctly working gates and correctly working function, the latter being at a higher level of abstraction). Of course, this requires a bit more silicon "real estate" and power, but at least it can lead to working chips. Note that this approach has already worked for memory chips. A side effect is higher yield, hence lower costs, with redundancy built in. Exploiting this concept, it has been proven that system designs with standard components can even become more reliable (better MTBF) than systems using rad-hard silicon in harsh environments like space.
- "Divide and conquer," e.g., by getting away from global clocks and shared buses. Complex SoCs can be designed with many IP blocks (assumed to be trusted) if their interaction is through pure message passing, i.e., asynchronously over a "LINK." This would be another good use for the excess silicon resources new silicon technology is generating faster than the engineers can use.
- And as to marketing: stop the vaporware. Chips especially are at the source of a whole value chain down the road, and this generates liabilities. I am convinced that part of the high-tech industry's problems during the last two years is related to this vaporware practice. Financial managers want to boost their share price and hence create expectations. Marketing executes. In the meantime, empty value is created, and some time later the bubble of expectations bursts.
Best regards,
Eric Verhulst