ChipCenter Questlink
EE Expert David Gilbert
Code Optimization


3D Graphics Transforms On The x86 Platform Page 1 of 3
by David Gilbert

Introduction

I've mentioned before that the first step in code optimization is choosing the right tool for the job. Many times, though, a programmer doesn't have that luxury. Wouldn't it be nice if we could program all of our graphics-intensive apps for the SPARC V9 ISA with Sun's VIS extensions? In practice, the processor architecture is usually fixed at the outset, often as a compromise of its own, and it is then up to the programmer to optimize the code for the platform at hand.

Let's take a look at a common set of math operations used in 3D graphics applications: the transform function. How could we optimize code for such calculations on the Intel platform? Should we use MMX technology, SSE, or SSE2? Is one ISA extension any better than the others at this kind of work?

The Transform Operation

First, to put everything in context, let's briefly examine the operations we will study here. In the field of 3D graphic animation, transformation refers to the process of creating a new set of coordinate points as an object moves through space, using the previous coordinates and other parameters as input.

Within the transformation process itself it is often necessary to translate, scale, rotate, or change the perspective of a coordinate set, and sometimes all of these must be done at once. The process is built on matrices. Each point is stored as a homogeneous coordinate vector: its x, y, and z values plus a fourth term, w, which serves as a scaling factor for perspective correction, so in three dimensions each point carries 4 values. Each operation (translation, scaling, rotation, and so on) has its own 4 × 4 transform matrix. Multiplying these matrices together concatenates them into a single 4 × 4 matrix, and transforming a point then becomes one 4 × 4 matrix-by-vector multiplication. Note that a transform matrix can be reused for as long as its values still apply to the current operation.
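
To make the matrix arithmetic concrete, here is a minimal C sketch, assuming row-major storage and column vectors (p' = M × p). The type and function names (mat4, vec4, mat4_mul, mat4_transform, mat4_translate, mat4_scale) are hypothetical, not taken from any library. It concatenates a scale and a translation into one matrix and then applies that single matrix to a point.

```c
#include <assert.h>

/* Hypothetical types: a 4x4 matrix stored row-major, and a homogeneous
   point (x, y, z, w). Points are column vectors: p' = M * p. */
typedef struct { float m[4][4]; } mat4;
typedef struct { float v[4]; } vec4;

/* Concatenate two transforms: applying the result equals applying b, then a. */
mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a->m[i][k] * b->m[k][j];
            r.m[i][j] = sum;
        }
    return r;
}

/* Transform one point: 16 multiplies and 12 additions. */
vec4 mat4_transform(const mat4 *a, const vec4 *p)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a->m[i][0] * p->v[0] + a->m[i][1] * p->v[1]
               + a->m[i][2] * p->v[2] + a->m[i][3] * p->v[3];
    return r;
}

mat4 mat4_translate(float tx, float ty, float tz)
{
    mat4 r = {{{1, 0, 0, tx}, {0, 1, 0, ty}, {0, 0, 1, tz}, {0, 0, 0, 1}}};
    return r;
}

mat4 mat4_scale(float sx, float sy, float sz)
{
    mat4 r = {{{sx, 0, 0, 0}, {0, sy, 0, 0}, {0, 0, sz, 0}, {0, 0, 0, 1}}};
    return r;
}
```

For example, multiplying mat4_translate(10, 0, 0) by mat4_scale(2, 2, 2) gives a matrix that maps the point (1, 2, 3, 1) to (12, 4, 6, 1): scaled first, then translated. Once the matrices are concatenated, every vertex costs one matrix-vector product, however many transforms went into the matrix.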

Keep in mind that building these matrices can involve other mathematical operations, such as the sine and cosine of the angle in a rotation. Division, usually performed as multiplication by an inverse (reciprocal) value, is used for the scaling factor of the coordinates and during a change of perspective. The final results are then accumulated through repeated additions: each component of a transformed point is the sum of four products.
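
The sketch below, under the same assumed row-major convention, shows where the sine, cosine, and reciprocal come in. rotation_z and perspective_divide are illustrative names, and the caller is assumed to compute the cosine and sine once per matrix (for example with cosf and sinf) rather than once per vertex.

```c
#include <assert.h>

/* Rotation about the z axis as a row-major 4x4 matrix. The caller passes the
   cosine and sine of the angle, computed once per matrix (e.g. with cosf and
   sinf), not once per vertex. */
void rotation_z(float c, float s, float out[4][4])
{
    const float r[4][4] = {
        {c, -s, 0, 0},
        {s,  c, 0, 0},
        {0,  0, 1, 0},
        {0,  0, 0, 1},
    };
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            out[i][j] = r[i][j];
}

/* Perspective divide: one division forms 1/w, then three multiplies,
   instead of three separate divisions. */
void perspective_divide(const float in[4], float out[3])
{
    assert(in[3] != 0.0f);
    const float inv_w = 1.0f / in[3];
    out[0] = in[0] * inv_w;
    out[1] = in[1] * inv_w;
    out[2] = in[2] * inv_w;
}
```

With c = 0 and s = 1 (a 90° rotation), this matrix maps the point (1, 0, 0, 1) to (0, 1, 0, 1).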

Our Options

The MMX instructions allow us to perform "packed integer" operations by treating the contents of a single 64-bit register as multiple operands: eight bytes, four 16-bit words, or two 32-bit doublewords. The main disadvantage is that we cannot use the processor's floating-point unit at the same time, because the eight MMX registers are aliased onto the x87 floating-point register file. Switching from MMX code back to floating-point code requires an EMMS instruction, so careful programming is in order.
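
MMX intrinsics are not available on every compiler and target, so this portable C sketch models the PMADDWD instruction (packed multiply and add) instead of calling it. The types and functions (mmx_words, mmx_dwords, pmaddwd_model, row_dot_q14) are hypothetical, and the Q14 scaling of the matrix row is an assumption chosen for illustration. One PMADDWD yields two partial sums of a 4-element dot product, and adding them gives one component of a transformed point. Real MMX code would also issue EMMS before any following x87 floating-point instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of one 64-bit MMX register as four signed 16-bit words,
   and of a result register as two signed 32-bit doublewords. */
typedef struct { int16_t w[4]; } mmx_words;
typedef struct { int32_t d[2]; } mmx_dwords;

/* Models PMADDWD: multiply the four word pairs into 32-bit products,
   then add adjacent products, giving two 32-bit sums. */
mmx_dwords pmaddwd_model(mmx_words a, mmx_words b)
{
    mmx_dwords r;
    for (int i = 0; i < 2; i++) {
        int64_t sum = (int64_t)a.w[2 * i] * b.w[2 * i]
                    + (int64_t)a.w[2 * i + 1] * b.w[2 * i + 1];
        /* Only (-32768 * -32768) * 2 exceeds 32 bits; the hardware wraps
           that single case to 0x80000000, which this cast mimics on
           common compilers. */
        r.d[i] = (int32_t)(uint32_t)sum;
    }
    return r;
}

/* One output component of a 4x4 transform: a matrix row in Q14 fixed point
   (value * 2^14) dotted with an integer vertex (x, y, z, w). */
int32_t row_dot_q14(mmx_words row_q14, mmx_words vertex)
{
    mmx_dwords p = pmaddwd_model(row_q14, vertex);
    /* Add the two partial sums, then drop the 14 fraction bits. */
    return (int32_t)(((int64_t)p.d[0] + p.d[1]) >> 14);
}
```

For example, the Q14 row {16384, 8192, 0, 0}, that is (1.0, 0.5, 0, 0), dotted with the vertex (100, 200, 300, 1) gives 200.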

Intel's Streaming SIMD Extensions (SSE), introduced with the Pentium III processor, enable the programmer to perform single-precision (32-bit) floating-point vector operations, applying a single instruction to several operands at once. SSE adds eight 128-bit XMM registers, each holding four 32-bit values, so a full four-component coordinate fits in one register and fewer instructions are issued to do the same job. Because SSE instructions have their own register file, the resource contention between MMX and floating-point code is eliminated.
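
As an illustration, this sketch uses the SSE intrinsics from <xmmintrin.h> (_mm_loadu_ps, _mm_set1_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps) to transform one point with four multiplies and three adds on whole XMM registers. The function name and the column-by-column matrix layout are assumptions. The #else branch is a scalar fallback so the sketch also compiles on targets without SSE.

```c
#include <assert.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* out = M * p in single precision. m holds the 4x4 matrix column by column
   (column j starts at m + 4*j), so each column fills one XMM register. */
void transform_sse(const float m[16], const float p[4], float out[4])
{
#ifdef HAVE_SSE
    /* Broadcast each vertex component and multiply it by a whole column. */
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(m), _mm_set1_ps(p[0]));
    for (int j = 1; j < 4; j++)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 4 * j),
                                         _mm_set1_ps(p[j])));
    _mm_storeu_ps(out, acc);
#else
    /* Scalar fallback with the same arithmetic for non-SSE targets. */
    for (int i = 0; i < 4; i++)
        out[i] = m[i] * p[0] + m[4 + i] * p[1]
               + m[8 + i] * p[2] + m[12 + i] * p[3];
#endif
}
```

Storing the matrix by columns lets each vertex component be broadcast with _mm_set1_ps and multiplied against a full column, so no horizontal additions within a register are needed.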

SSE2 instructions use the same 128-bit XMM register file as SSE and add "packed floating-point" operations in double precision, two 64-bit values per register, as well as 128-bit versions of the MMX packed-integer operations. These extensions were introduced with the Pentium 4 processor.
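
The double-precision version looks much the same with the SSE2 intrinsics from <emmintrin.h>, except that each 128-bit register now holds only two values, so every matrix column needs two registers. Again, transform_sse2 and the column-major layout are hypothetical, and the scalar branch is a fallback for non-SSE2 targets.

```c
#include <assert.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* out = M * p in double precision. m holds the matrix column by column.
   An XMM register holds only two doubles, so each column takes two registers
   (rows 0-1 and rows 2-3): twice the work of the single-precision version. */
void transform_sse2(const double m[16], const double p[4], double out[4])
{
#ifdef HAVE_SSE2
    __m128d lo = _mm_setzero_pd(), hi = _mm_setzero_pd();
    for (int j = 0; j < 4; j++) {
        __m128d s = _mm_set1_pd(p[j]);
        lo = _mm_add_pd(lo, _mm_mul_pd(_mm_loadu_pd(m + 4 * j), s));
        hi = _mm_add_pd(hi, _mm_mul_pd(_mm_loadu_pd(m + 4 * j + 2), s));
    }
    _mm_storeu_pd(out, lo);
    _mm_storeu_pd(out + 2, hi);
#else
    /* Scalar fallback for non-SSE2 targets. */
    for (int i = 0; i < 4; i++) {
        double sum = 0.0;
        for (int j = 0; j < 4; j++)
            sum += m[4 * j + i] * p[j];
        out[i] = sum;
    }
#endif
}
```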

Our Mission, Should We Choose To Accept It

In developing and optimizing software, we have two important issues to take into account. The first is the "installed base" of the instruction set we choose to work with, and the second is the amount of information available to help us optimize for that platform. To benefit from broader product availability and from more detailed optimization information, we might elect to stay a "notch" or two behind the leading edge, so to speak. To create highly effective software that will run on the MMX-equipped P5 chips (the Pentium processor with MMX technology) as well as on the Pentium II and later P6-family processors, we might target our code optimization at the MMX extensions. (The original Pentium Pro, although a P6 chip, lacks MMX.) This topic will be our focus here. For general information on optimizing for the Intel P6 microarchitecture, please refer to my previous article, Understanding the Intel P6 Microarchitecture.
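
Targeting an installed base usually means checking at run time which extensions are actually present, so that the same binary still runs on a plain Pentium or a Pentium Pro. The sketch below assumes GCC or Clang on x86 and uses their __builtin_cpu_supports builtin. The enum and pick_transform_path are hypothetical names, and other compilers would need to query CPUID directly.

```c
#include <assert.h>

/* Hypothetical code paths a transform library might provide. */
typedef enum { PATH_SCALAR, PATH_MMX, PATH_SSE, PATH_SSE2 } transform_path;

/* Pick the best path the running CPU supports. Assumes GCC or Clang on x86;
   on any other compiler or target it falls back to plain scalar code. */
transform_path pick_transform_path(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return PATH_SSE2;
    if (__builtin_cpu_supports("sse"))
        return PATH_SSE;
    if (__builtin_cpu_supports("mmx"))   /* e.g. Pentium MMX, Pentium II */
        return PATH_MMX;
#endif
    return PATH_SCALAR;                  /* e.g. original Pentium, Pentium Pro */
}
```

A program would call pick_transform_path once at startup and select the matching routine through a function pointer, so the check is not repeated per vertex.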

The Challenges

The decision to use MMX technology for 3D graphics applications should not be taken lightly. Several issues intrinsic to this method present themselves during programming, especially when hand-optimizing at the assembly-language level. MMX restricts us to integer math, and its multiply instructions operate on 16-bit operands, which effectively limits precision to 16 bits, although this may not be enough of a problem to deter us. What is the best we can do with these limitations? According to Intel, "angles can be resolved to about 1/1000°, and screen coordinates to about 1/50 pixel." This is quite adequate for most 3D graphics applications.
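
To see how a 16-bit budget is split between range and precision, here is a small sketch of float-to-fixed-point conversion with rounding and saturation. to_fixed16 and from_fixed16 are hypothetical helpers, and the formats used below are illustrative choices, not the ones behind Intel's figures.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a float to signed 16-bit fixed point with frac_bits fraction bits,
   rounding half away from zero and saturating to the int16 range. */
int16_t to_fixed16(float x, int frac_bits)
{
    float scaled = x * (float)(1 << frac_bits);
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    if (scaled > 32767.0f)
        return 32767;
    if (scaled < -32768.0f)
        return -32768;
    return (int16_t)scaled;   /* the cast truncates toward zero */
}

/* Convert back to float to inspect what the 16-bit value represents. */
float from_fixed16(int16_t q, int frac_bits)
{
    return (float)q / (float)(1 << frac_bits);
}
```

With 15 fraction bits (Q15), sine and cosine values are stored in steps of 2^-15 (about 0.00003), but +1.0 itself saturates to 32767, which is one reason a Q14 format might be chosen for rotation coefficients instead. With 6 fraction bits, screen coordinates resolve to 1/64 pixel, at the cost of limiting the range to about ±512 pixels.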

Another issue to consider is that popular off-the-shelf applications, such as CAD packages, produce transform data in floating-point format. This means we will need to convert the data to fixed-point integer format for MMX processing. To maintain the best possible accuracy, the conversion should be performed as few times as possible, because significant digits are "shifted out" during each conversion, leaving us with less precise data. Unfortunately, the conversion must take place every time the transform matrix is recalculated. This issue makes a good case for the SSE or SSE2 instructions, which were introduced with the Pentium III and Pentium 4 processors, respectively, and operate on floating-point data directly.
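
One way to keep conversions to a minimum is to cache the fixed-point copy of the matrix and rebuild it only when the floating-point matrix changes. This is a minimal sketch under assumed names (cached_transform, cached_init, cached_set, cached_sync). It assumes every coefficient fits a Q14 format in [-2, 2), so translation terms, which are usually larger, would need their own scale in practice.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 14   /* assumed Q14 format: coefficients in [-2, 2) */

/* Hypothetical transform kept in two forms: the floating-point matrix from the
   application and the 16-bit fixed-point copy consumed by the MMX code. */
typedef struct {
    float   f[4][4];
    int16_t q[4][4];
    int     dirty;        /* nonzero when f has changed since the last sync */
    int     conversions;  /* number of float-to-fixed conversions performed */
} cached_transform;

/* Round to nearest and saturate to the int16 range. */
static int16_t float_to_q14(float x)
{
    float s = x * (float)(1 << FRAC_BITS);
    s += (s >= 0.0f) ? 0.5f : -0.5f;
    if (s > 32767.0f)
        return 32767;
    if (s < -32768.0f)
        return -32768;
    return (int16_t)s;
}

void cached_init(cached_transform *t)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            t->f[i][j] = (i == j) ? 1.0f : 0.0f;   /* identity */
    t->dirty = 1;
    t->conversions = 0;
}

void cached_set(cached_transform *t, int row, int col, float value)
{
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    t->f[row][col] = value;
    t->dirty = 1;
}

/* Rebuild the fixed-point copy only if the float matrix changed. */
void cached_sync(cached_transform *t)
{
    if (!t->dirty)
        return;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            t->q[i][j] = float_to_q14(t->f[i][j]);
    t->dirty = 0;
    t->conversions++;
}
```

Converting from the floating-point master copy each time, rather than updating the 16-bit copy in place, keeps rounding errors from accumulating across successive matrix updates.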


Copyright © 2003 ChipCenter-QuestLink