ChipCenter Questlink

Accelerating DTP with Reconfigurable Computing Engines

By Donald MacVicar, The University of Glasgow (donald@dcs.gla.ac.uk) and Satnam Singh, Xilinx Inc. (Satnam.Singh@xilinx.com)

Introduction
Imagine the office of a typical desk top publishing house: it will comprise mid to high end PCs or Macintosh computers with large displays, connected to a series of printers that can generate publication quality colour output at thousands of dots per inch (dpi). Rasterisation is the process of converting the high level page descriptions in PostScript into bitmaps that specify where to apply ink on the paper. This is a very compute intensive operation and is often performed by a separate computer or Raster Image Processor (RIP) system. The applications that graphic designers run include image processing packages like Adobe Photoshop, which perform compute intensive operations such as convolution (e.g. Gaussian Blur) over huge true-colour images. As a result, both image processing and printing are often very slow.

Imagine taking an off-the-shelf FPGA PCI card and slotting it into these machines to accelerate typical desk top publishing functions. That vision is exactly what we are striving for in our projects at the University of Glasgow in conjunction with Xilinx Inc. This paper concentrates on how we are working to accelerate PostScript rendering with reconfigurable technology. Other projects have shown that it is possible to extend and accelerate the filtering operations of Adobe Photoshop using exactly the same hardware.

This paper introduces the desktop publishing market and describes the PostScript rendering technology that we are currently developing. This includes one example of the circuits we have developed to accelerate the rendering process. The circuits are dynamically swapped onto the FPGA as and when required. It also includes a software architecture that manages the FPGA. Rather than providing a stand alone rasterisation application, we show how this system can be used with the Windows operating system to systematically provide seamless support for hardware based PostScript rendering.

Desk Top Publishing
Desk Top Publishing (DTP) is a rapidly growing area of the computer applications market. The increased use of high quality colour graphics has put an even bigger demand on the software and hardware used for DTP. Preparing a colour image for printing using PostScript requires a significant amount of compute power if the operation is to complete in a reasonable amount of time.

Adobe Systems Inc. have recently introduced two solutions for PostScript printing. Adobe PrintGear [2] is a custom processor designed specifically for executing the commands in the PostScript language and is aimed at desktop printers. The second solution is called PostScript Extreme [1], which is a parallel PostScript RIP system. It consists of up to ten RIP engines, each of which processes one page before sending the rasterised page to the print engine. The first version of this system was built together with IBM and costs around $810,000 for the printer and RIP system. It can produce 464 impressions per minute at 600dpi on a RIP-while-printing basis.

FPGA Technology
In the FPGA PostScript project, the Xilinx XC6200 FPGA is used to accelerate the compute intensive areas of the PostScript rendering process. The FPGA is only utilised when there is a clear advantage over using software. It is neither practical nor possible to implement the entire PostScript rendering process on an FPGA, so we concentrate only on the areas that can benefit from acceleration.

Since space is limited on the FPGA, we use the discipline of virtual hardware [4] to dynamically swap in circuits as they are required. Whenever possible, it helps to order rendering operations so as to avoid swapping; otherwise we might experience thrashing.
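The effect of ordering on swapping can be illustrated with a small sketch. It assumes, purely for illustration, that the FPGA holds one circuit at a time and that the rendering operations on a page are independent, so they may be reordered freely; the names `count_swaps` and `batch_by_circuit` are hypothetical and are not part of our run-time system.

```python
from itertools import groupby

def count_swaps(ops):
    """Count the circuit loads needed to run ops in the given order.

    Each op is a (circuit_name, payload) pair. The FPGA is assumed to hold
    exactly one circuit at a time, so every change of circuit costs one load.
    """
    return sum(1 for _ in groupby(ops, key=lambda op: op[0]))

def batch_by_circuit(ops):
    """Reorder ops so that all work for one circuit runs together.

    Circuits are visited in order of first use. sorted() is stable, so ops
    that share a circuit keep their relative order. This is only valid when
    the ops do not depend on one another (an assumption of this sketch).
    """
    first_seen = {}
    for circuit, _ in ops:
        first_seen.setdefault(circuit, len(first_seen))
    return sorted(ops, key=lambda op: first_seen[op[0]])

page = [("line", 1), ("curve", 2), ("line", 3), ("curve", 4), ("line", 5)]
print(count_swaps(page))                    # 5 loads in page order
print(count_swaps(batch_by_circuit(page)))  # 2 loads after batching
```

In a real page description, objects that overlap must keep their paint order, so only operations that do not interact could be batched in this way.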

A brief overview of the rasteriser is given here; a more complete description can be found in [8]. The system takes a PostScript print job as input and converts this into a PDF document, which is then parsed to obtain the list of objects on each page. The bitmap for each page is then created using a combination of hardware and software. The final result is compressed before sending to the printer. Hardware is used to accelerate the following functions: matrix multiplication, rasterisation of lines, curves, circles and fonts, anti-aliasing, colour correction and compression. PDF is used as it provides a static, page independent description of each page in a document, unlike PostScript, which allows variables to be dependent on later pages in a document.

A Case Study: Rendering Bézier Curves
Bézier curves are parametric cubic curves and are used in both PostScript and PDF. A curve is defined in terms of four control points, which are substituted for a, b, c, d in the curve's parametric equations.
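For reference, the standard cubic Bézier form can be written as follows. The control points are labelled a, b, c, d to match the text; whether the original equations used exactly this notation is an assumption.

```latex
\[
Q(t) = (1-t)^3\,a + 3t(1-t)^2\,b + 3t^2(1-t)\,c + t^3\,d,
\qquad 0 \le t \le 1
\]
```

The x and y components of the curve are obtained by applying the same expression to the x and y coordinates of the control points separately.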

The general technique for rasterising curves is to approximate the curve with a number of straight line segments. After investigation it was decided that the best method to use was a recursive subdivision technique. Rather than performing complex straightness checks we use a fixed depth recursion. The length of the path through the control points P1, P2, P3, P4 is used as a pessimistic estimate of the length of the curve. The length of the path P1, L4, P4 in Fig. 1 is used as an optimistic estimate. The logarithm of each of these lengths is found, and the depth of recursion is then set to the average of the two logarithm values.
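A minimal software sketch of fixed depth subdivision is given below. It is an illustration rather than the circuit itself, and it makes several assumptions that the text does not state: L4 is taken to be the point on the curve at t = 1/2 (as produced by de Casteljau subdivision), logarithms are taken to base 2, and the averaged value is rounded up. All function names are hypothetical.

```python
import math

def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def split(p1, p2, p3, p4):
    """de Casteljau split at t = 1/2 into a left and a right half."""
    l2 = midpoint(p1, p2)
    h = midpoint(p2, p3)
    r3 = midpoint(p3, p4)
    l3 = midpoint(l2, h)
    r2 = midpoint(h, r3)
    l4 = midpoint(l3, r2)  # point on the curve at t = 1/2
    return (p1, l2, l3, l4), (l4, r2, r3, p4)

def path_length(*pts):
    """Total length of the polyline through pts."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def recursion_depth(p1, p2, p3, p4):
    """Average of the logs of the pessimistic and optimistic lengths."""
    pessimistic = path_length(p1, p2, p3, p4)
    left, _ = split(p1, p2, p3, p4)
    optimistic = path_length(p1, left[3], p4)
    # Base 2, clamping to 1 pixel, and rounding up are assumptions.
    logs = [math.log2(max(n, 1.0)) for n in (pessimistic, optimistic)]
    return math.ceil(sum(logs) / 2.0)

def flatten(curve, depth):
    """Return the polyline vertices after `depth` levels of subdivision."""
    if depth == 0:
        return [curve[0], curve[3]]
    left, right = split(*curve)
    # Drop the shared vertex so it is not emitted twice.
    return flatten(left, depth - 1)[:-1] + flatten(right, depth - 1)

curve = ((0, 0), (0, 100), (100, 100), (100, 0))
depth = recursion_depth(*curve)
vertices = flatten(curve, depth)
print(depth, len(vertices))
```

With base 2 logarithms, a curve of length n pixels is divided into roughly n segments, so each segment is about one pixel long and no straightness test is needed.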

A circuit for dividing Bézier curves has been designed and built using only integer arithmetic, but it could be improved by using fixed point arithmetic. To perform the division of a curve, two identical circuits are used: one for the x components and one for the y components.
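The arithmetic such a circuit performs can be sketched in software as below. This is an assumed formulation (midpoint subdivision built from adders and one-bit right shifts) rather than a description of the actual Lava circuit, and the function names are hypothetical.

```python
def split_component(a, b, c, d):
    """Split one coordinate (x or y) of a cubic Bezier curve at t = 1/2.

    Only integer additions and one-bit right shifts are used. Each shift
    discards a fractional bit, which is the precision that a fixed point
    version would retain.
    """
    ab = (a + b) >> 1
    bc = (b + c) >> 1
    cd = (c + d) >> 1
    abc = (ab + bc) >> 1
    bcd = (bc + cd) >> 1
    mid = (abc + bcd) >> 1
    return (a, ab, abc, mid), (mid, bcd, cd, d)

def split_curve(xs, ys):
    """Apply the same component splitter to x and y independently."""
    left_x, right_x = split_component(*xs)
    left_y, right_y = split_component(*ys)
    return list(zip(left_x, left_y)), list(zip(right_x, right_y))

left, right = split_curve((0, 0, 100, 100), (0, 100, 100, 0))
print(left)   # control points of the first half
print(right)  # control points of the second half

# Truncation error: the exact midpoint of (0, 1, 2, 3) is 1.5.
print(split_component(0, 1, 2, 3)[0][3])
```

Because x and y never interact, the two instances of the splitter can run in parallel, which is what makes the two-circuit arrangement attractive in hardware.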

All the circuits were implemented in a hardware description language called Lava, which is a variant of the relational algebraic hardware description language Ruby [3][5]. A key feature of this language is that it provides circuit combinators that encode both circuit behaviour and layout. This allows us to specify circuit topology without explicitly calculating the co-ordinates of each cell. This in turn allows us to generate circuits which are far more likely to route in a reasonable amount of time.

Platform
Our target system is the VCC HotWorks board, which contains a Xilinx XC6216 FPGA with 2MB of SRAM and a PCI interface. This card plugs into a Windows NT host. The driver model in Windows NT is a multi-level one, with different levels having different operating environments. An FPGA accelerated system could be implemented or utilised at different levels of the print driver. The optimum level for implementing the FPGA-PostScript system is the print spooler level. The print spooler simply takes the data from the application level and sends it down to the device level driver, using a buffer to handle multiple print jobs.

Many printers can interpret the PostScript internally but this can be a very slow process. The FPGA system performs as much of the processing as possible and sends a bitmap to the printer which requires no further processing.

Performance
The speed of the FPGA line drawing circuit was analysed using a test document which contains 15,768 lines totalling 92,262 pixels at a resolution of 72dpi. A simple implementation of Bresenham's line scan algorithm in C++, which simply renders the lines into a bitmap in memory, was measured to take approximately 1.73 seconds to render the 15,768 lines at 72dpi. Now assume that the same image is to be rendered using the FPGA at a resolution of 1000dpi, so that approximately 1,295,305 pixels must be rendered. The circuit can render 5,000,000 pixels per second (using 16-bit arithmetic) and thus takes 0.26s to render them. The transfer of the data for each line will also affect the total rendering time. We shall use the measured data transfer rate of 0.7MB/s for writing to registers on the XC6200. This results in a further 0.25s for data transfers, giving a total time of 0.51s, which is significantly faster than the software only version even though the software was rendering at a much lower resolution.
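The estimate above can be reproduced with a few lines of arithmetic. The figures are those quoted in the text; the variable names are ours, and the final ratio is only indicative because it compares software at 72dpi with hardware at 1000dpi.

```python
# Figures quoted in the text.
pixels_at_1000dpi = 1_295_305        # pixels to render at 1000dpi
fpga_pixels_per_second = 5_000_000   # line circuit, 16-bit arithmetic
transfer_seconds = 0.25              # line data written over PCI
software_seconds_72dpi = 1.73        # C++ Bresenham, measured at 72dpi

render_seconds = pixels_at_1000dpi / fpga_pixels_per_second
total_seconds = render_seconds + transfer_seconds
ratio = software_seconds_72dpi / total_seconds

print(f"render {render_seconds:.2f} s")  # render 0.26 s
print(f"total  {total_seconds:.2f} s")   # total  0.51 s
print(f"ratio  {ratio:.1f}x")            # ratio  3.4x
```

Note that the data transfer accounts for roughly half of the total, which is why the speed of the PCI interface matters so much.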

One of the severest limitations of our system is the very low performance of the PCI interface. Using one of the earlier VCC Hotworks boards, we have measured a transfer rate of just 0.7MB/s, whereas the theoretical limit for PCI is 132MB/s. In the future, we plan to investigate using Intel's Accelerated Graphics Port (AGP) system, which would allow us to rapidly transfer the image over this dedicated bus (up to 533MB/s), leaving the PCI bus for control signals.

Accelerating Image Processing
PostScript rendering is just one of many desk top publishing functions that are suitable for hardware based acceleration. We have also developed plug-ins for Adobe Photoshop which use the same VCC Hotworks board to accelerate image processing operations like colour space conversion and image convolution. Some of these filters operate at around 20 million pixels per second on the board, but because of the slow PCI interface the performance seen by the user is poorer. However, all the hardware filters that we developed still ran several times faster than their software only versions.

Summary
In summary, we report that we are getting closer to our vision of a desk top publishing studio exploiting dynamically reconfigurable technology for commercial advantage. We have designed and implemented some of the most important circuits required for PostScript rendering. We have developed the methodology of virtual hardware, allowing us to swap in circuits as required into the FPGA. We are developing a run-time system to manage the FPGA board resources and to present a high level interface to the application layer. And finally, we have investigated where in the Windows 95 and NT architecture would be the best place to install the rendering system.

The main barriers at the moment include the unsuitability of the VCC Hotworks board for our applications. In the next stage of the project, we will investigate using a board with a superior PCI interface, or one that has an alternative channel for communicating the image (e.g. AGP). We also need far more image memory on the card, which might require us to move to DRAM instead of continuing with SRAM based cards. The TSI-TelSys cards are a likely system for us to investigate. They would allow us to cache enough circuits on the board to accelerate swapping virtual circuits.

The authors acknowledge the assistance of Dr. John Patterson with Bézier curve rendering. This work is supported by a grant from the UK's EPSRC.

References
1. Adobe Systems. Adobe PostScript Extreme White Paper. Adobe 1997
2. Adobe Systems. Adobe PrintGear Technology Backgrounder. Adobe 1997
3. M. Sheeran, G. Jones. Circuit Design in Ruby. Formal Methods for VLSI Design, J. Staunstrup, North Holland, 1992.
4. Satnam Singh and Pierre Bellec. Virtual Hardware for Graphics Applications using FPGAs. FCCM'94. IEEE Computer Society, 1994.
5. Satnam Singh. Architectural Descriptions for FPGA Circuits. FCCM'95. IEEE Computer Society. 1995.
6. J.D. Foley, A. Van Dam. Computer Graphics: Principles and Practice. Addison Wesley. 1997.
7. Xilinx. XC6200 FPGA Family Data Sheet. Xilinx Inc. 1995.
8. S. Singh, J. Patterson, J. Burns, M. Dales. PostScript rendering using virtual hardware. FPL'97. Springer, 1997.


Copyright © 2003 ChipCenter-QuestLink