ChipCenter Questlink

EE Expert David Gilbert
Code Optimization


Code Optimization for Parallel Processing Page 1 of 2
Part 3: Writing Programs for Distributed-Memory Parallel Machines
by David Gilbert

Introduction

The effort to create a standardized means of writing programs for distributed-memory parallel machines began in earnest in 1992. It grew out of a desire to take parallel computing beyond the bounds of government and academia, and it succeeded in 1994 with version 1.0 of the Message-Passing Interface (MPI) standard.

Since then, the MPI standard has seen minor revisions, reaching version 1.2 in 1997. A substantial amount of functionality has also been added to the original standard, and this extended version is known as MPI 2.0. In addition, several proprietary "optimized" MPI implementations exist for platform-specific applications, and there is now a substantial effort to create a Java MPI. Let's take a look at the MPI standard and the function it serves in parallel programming, as well as some general ideas on how to optimize its use.

What Exactly Is MPI?

The MPI standard is a specification rather than a piece of software, which is why so many implementations of it can coexist. The documentation that describes the Message-Passing Interface is intentionally left somewhat vague in places, so that implementers can innovate while still adhering to the procedures that make MPI work. For example, MPI establishes a means of inter-process communication, but it does not define precisely what a "process" is.

This opens up some interesting possibilities: data sets can be passed between processes of the same program, the same data set can be passed among multiple programs, and nothing prevents multiple data sets from being passed among more than one program. As long as we have some means of connecting these processes, we can use message passing within our program.

It does get tricky in some places, however. Message passing cannot be used to directly access memory that belongs to another process; data must be explicitly sent by one process and received by another. This makes MPI well suited to programs written for distributed-memory supercomputers, where each processor has its own memory. The MPI standard defines a library of subroutines for inter-process communication, and this standardized library makes the programming model clear.

The Message-Passing Interface gets its name from the method used to achieve inter-process communication: data are sent and received between processes. The protocols underlying this message-passing scheme are not defined, and nothing in the standard requires different implementations of MPI to work together. This loose drafting of the standard enables MPI to be put to use in a variety of environments with heterogeneous platforms, which is a key to its success in standardizing the creation of programs for parallel-processing machines.

Ups and Downs

In an effort to maintain portability across platforms, some aspects of MPI are not well defined. Areas such as buffering, I/O, and process management lack firm guidelines, and can therefore be a source of differences among MPI implementations. With the possible exception of a small handful of calls, it should not be necessary to rewrite the source code of an MPI routine when moving it from one platform to another.

As an example, MPI does not require an implementation to provide message buffers of any particular size, so message-passing routines should not depend on such buffering to function correctly. This is crucial when these routines are expected to be portable from one platform to another, and especially from one implementation of MPI to another.

With regard to I/O routines, more importance should be placed on standardizing the output than the input. This is because the output must be parsed by humans, whereas the input is read by the computer. A programmer looking at output for the purpose of debugging should have a standardized, easy-to-read format that is useful.

The availability of programming tools for parallel applications that use the Message-Passing Interface is a concern, especially when attempting to optimize for performance. The programmer will need development tools that can profile, debug, and trace processes that use MPI within the program.

It will be necessary for the programmer to be familiar with the hardware platform to some degree, since nearly all vendors that "field" a parallel system have their own proprietary implementation of MPI. This is where the ease of portability becomes a disadvantage, since sometimes there are no platform-specific tools available. Programmers wishing to optimize their MPI routines may be faced with integrating several third-party tools to do the job.

Optimization Concerns

Typically, the goals of portability and high performance have been at odds, and each is difficult to achieve even on its own. One case where this conflict can be illustrated with MPI is its support for "noncontiguous" data types. These give the message-passing routine a means of gathering data that are not organized contiguously in memory and passing them to a process within a single message. However, transfers of this type are generally slower than their "contiguous" counterparts.

Data transfers that are not properly aligned on word boundaries are also more expensive in terms of bandwidth than those that are aligned. Together, these factors illustrate how performance can be gained or lost through the efficiency of the message-passing routines. Reliable performance baselines are also difficult to establish with MPI, because execution time can vary from run to run, even for the same program on the same computer. This is one of the pitfalls of parallel programming, and a challenge to the programmer who seeks to optimize the usage of MPI routines.


Copyright © 2003 ChipCenter-QuestLink