ChipCenter Questlink

EE Expert Bill Sprouse
Code Optimization


Writing Efficient C Code for Microcontrollers
Part 5: Prime Real Estate
by Bill Sprouse

Many microcontrollers have several memory types, each with its own access methods and speed. Some have internal memory that is almost as fast to access as the internal registers and that can sometimes be operated on directly by a variety of instructions. Because of the speed, the additional instructions, and the short addresses, these locations become prime real estate in the world of microcontrollers. By choosing wisely when and how to use this memory, you can reduce code size and improve speed at the same time.

Take a careful look at the internal memory available and understand how it differs from other memory. There are several points you want to pay close attention to.

The first is how memory is addressed. Look at access to both RAM and ROM, paying attention to the instructions and the registers required for each. Look for shorter and faster instructions that access specific areas, such as the zero-page in memory. For those not familiar with the term, "zero-page" refers to RAM, usually internal, that resides in the first 256 addresses. It is called zero-page because the upper byte of the address is always zero. If the processor has zero-page addressing modes, this page can often be used for both direct and indirect addressing.
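To make the size difference concrete, here is a minimal sketch in portable C that chooses between a zero-page and an absolute encoding for a load from a fixed address. The byte and cycle counts are those of the 6502's LDA instruction (2 bytes and 3 cycles for zero-page, 3 bytes and 4 cycles for absolute). The names load_cost, in_zero_page, and lda_cost are made up for this illustration, and other processors will have different numbers.

```c
#include <stdint.h>

/* Hypothetical record of what one encoded load instruction costs. */
typedef struct {
    uint8_t bytes;   /* opcode plus operand bytes */
    uint8_t cycles;  /* execution time in CPU cycles */
} load_cost;

/* An address lies in the zero-page when its upper byte is zero. */
int in_zero_page(uint16_t addr)
{
    return (addr >> 8) == 0;
}

/* Cost of a 6502 LDA from a fixed address. */
load_cost lda_cost(uint16_t addr)
{
    load_cost c;
    if (in_zero_page(addr)) {
        c.bytes = 2;   /* opcode + 1 address byte */
        c.cycles = 3;
    } else {
        c.bytes = 3;   /* opcode + 2 address bytes */
        c.cycles = 4;
    }
    return c;
}
```

Calling lda_cost(0x0042) reports the shorter zero-page form, while lda_cost(0x1234) reports the absolute form, one byte and one cycle more for every reference to the variable.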

The main advantage of zero-page addressing is that it can reference variables with a shorter and faster instruction. This is especially true in processors that use pointer registers to access 16-bit addresses. These registers can become a bottleneck in programming, forcing the code to load and reload values into them on a regular basis. In older processor designs such as the 8051, there is only one 16-bit address pointer, called DPTR. Let's look at a fairly simple calculation loop.

    unsigned char i;
    int offset;
    int x[200];
    int y[200];
    ...
    for (i = 200; i--; y[i] = x[i] + offset);
    ----------------------------------------------------
    1. Load DPTR with address of i and save 200 to i
    2. Load DPTR with address of i, Load i, Decrement i, 
    Test i, If zero exit loop
    3. Load DPTR with address of i, Load i, Convert to int, 
    Multiply by 2 and Save multiplied value in R6/R7 pair.
    4. Add x array base address to R6/R7 pair and Save result 
    in DPTR
    5. Retrieve x[i] using DPTR and Save to R6/R7 pair
    6. Load DPTR with address of offset and Add offset to 
    R6/R7 pair
    7. Load DPTR with address of i, Load i, Convert to int, 
    Multiply by 2 and Save multiplied value in R2/R3 pair.
    8. Add y array base address to R2/R3 pair and Save result 
    in DPTR
    9. Save summation in R6/R7 to y[i] using DPTR
    10. Loop back to Step 2

The compiler listing for this example was fairly lengthy, so I summarized what the compiler produced in the list above. Since all the variables were located in external RAM (the default), you can see that DPTR gets some heavy use. An easy way to optimize this loop is to move the smaller variables, namely i and offset, to internal RAM. Many compilers allow you to specify special RAM areas by using an added keyword. This, of course, makes the code slightly non-ANSI, but it's worth it. On this 8051 compiler, the keyword that places variables in internal RAM (the DATA area) is data. Look what happens when we declare i and offset in the DATA area.
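One way to keep the non-ANSI keyword from spreading through the source is to hide it behind a macro. The following is only a sketch under stated assumptions: FAST_RAM, USE_8051_DATA_KEYWORD, COUNT, and add_offset are names invented here, and the build is assumed to define USE_8051_DATA_KEYWORD only when compiling with an 8051 compiler that accepts the data keyword in the position shown in this article. On any other compiler the macro expands to nothing and the file remains ordinary C.

```c
/* FAST_RAM is a hypothetical macro; it is not part of any compiler. */
#ifdef USE_8051_DATA_KEYWORD      /* assumed to be set by the 8051 build */
#define FAST_RAM data             /* place the variable in internal RAM */
#else
#define FAST_RAM                  /* other compilers: default placement */
#endif

#define COUNT 200

int x[COUNT];                     /* large arrays stay in default memory */
int y[COUNT];
FAST_RAM unsigned char i;         /* small, heavily used: internal RAM */
FAST_RAM int offset;

/* Same loop as in the article: counts i down from COUNT-1 to 0. */
void add_offset(void)
{
    for (i = COUNT; i--; y[i] = x[i] + offset)
        ;
}
```

With this arrangement the same file can be compiled on a desktop compiler for testing, and moving a variable in or out of internal RAM is a one-word change at its declaration.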

    data unsigned char i;
    data int offset;
    int x[200];
    int y[200];
    ...
    for (i = 200; i--; y[i] = x[i] + offset);
    ----------------------------------------------------
    1. Save 200 to i
    2. Load i, Decrement i, Test i, If zero exit loop
    3. Load i, Convert to int, Multiply by 2 and Save 
    multiplied value in R6/R7 pair.
    4. Add x array base address to R6/R7 pair and Save 
    result in DPTR
    5. Retrieve x[i] using DPTR and Save to R6/R7 pair
    6. Add offset to R6/R7 pair
    7. Load i, Convert to int, Multiply by 2 and Save 
    multiplied value in R2/R3 pair.
    8. Add y array base address to R2/R3 pair and 
    Save result in DPTR
    9. Save summation in R6/R7 to y[i] using DPTR
    10. Loop back to Step 2

Now that i and offset are located in internal RAM, DPTR is no longer needed to reach them, because these variables can be addressed directly. This eliminates 5 loads of DPTR, and the instructions that load from internal memory are faster. As a result, the loop runs in 3808 fewer machine cycles than the previous example. Remember that internal memory is limited, 256 bytes or less in many processors, so use it for the variables that save you the most code and time. Since many of these processors also implement the stack in internal memory, be careful not to use so much of it that you cause stack overruns. If DATA memory were running tight in this example, I would move offset back to external memory, as it is referenced only once inside the loop. This would still save 2208 machine cycles over the original example. Reusing counter variables like i in other loops is another way to stretch the internal RAM's usefulness.
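It can help to break those totals down per pass of the loop. The sketch below takes the two figures quoted above (3808 and 2208) and divides them across the 200 passes. The even split is an assumption made here, not something taken from the compiler listing, and the function names are invented for this example.

```c
#include <stdio.h>

#define PASSES 200L   /* loop count from the example */

/* Whole cycles saved on each pass, assuming an even split. */
long saved_per_pass(long total_saved)
{
    return total_saved / PASSES;
}

/* Cycles that do not divide evenly across the passes. */
long saved_leftover(long total_saved)
{
    return total_saved % PASSES;
}

/* Print one line of the breakdown. */
void print_savings(const char *label, long total_saved)
{
    printf("%s: %ld per pass, %ld left over\n",
           label, saved_per_pass(total_saved), saved_leftover(total_saved));
}
```

Under that assumption, 3808 works out to 19 cycles per pass with 8 left over, and 2208 to 11 per pass with 8 left over. The difference, 8 cycles per pass, is what keeping offset in internal RAM buys, which is a useful number to weigh against the 2 bytes of DATA space it occupies.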

Although the 8051 has some zero-page memory, it doesn't have nearly the addressing capability of some processors. The 6502 was one of the first processors to exploit the power of a complete set of zero-page addressing modes. Several modern microcontrollers learned from that early example, including Mitsubishi's complete 740 CPU line, which offers extensive zero-page addressing modes. The main power of zero-page addressing comes through the use of indirection, which allows a very efficient implementation of pointers. A 16-bit pointer can be referenced by just 8 bits in the instruction. This saves both code space and time.
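The idea of zero-page indirection can be sketched in portable C by simulating the address space. This is only an illustration of the 6502-style indirect indexed mode, written LDA (zp),Y in 6502 assembly; the array mem and the function names are invented for this example and are not taken from any compiler or library.

```c
#include <stdint.h>

/* Simulated 64 KB address space; the first 256 bytes are the zero-page. */
uint8_t mem[65536];

/* Store a 16-bit pointer in two consecutive zero-page bytes, low byte first. */
void zp_set_pointer(uint8_t zp, uint16_t target)
{
    mem[zp] = (uint8_t)(target & 0xFF);
    mem[(uint8_t)(zp + 1)] = (uint8_t)(target >> 8);
}

/* Read through the pointer held at zp, adding an 8-bit index y.
   The instruction itself only needs the single byte zp as its operand. */
uint8_t load_indirect_indexed(uint8_t zp, uint8_t y)
{
    uint16_t base = (uint16_t)(mem[zp] | (mem[(uint8_t)(zp + 1)] << 8));
    return mem[(uint16_t)(base + y)];
}
```

After zp_set_pointer(0x10, 0x1234), a call to load_indirect_indexed(0x10, 5) returns the byte at address 0x1239. The full 16-bit address never appears in the instruction, only the 8-bit zero-page location of the pointer.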


Copyright © 2003 ChipCenter-QuestLink