Book Lists Book ListsC++Parallel Programming in C with MPI and OpenMP
Chemical Eng.
Electrical Eng.
Flight Simulation
Numerical Methods
Parallel computing is becoming accessible, either as a commercial parallel computer, or through the construction of a cluster using a few PCs, a network and Linux. This book is an introduction to parallel programming, it's a university textbook written in an informal style and so pleasant to read.

While short example programmes are presented for the MPI or OpenMP commands that are introduced, only pseudo code is used for algorithms utilising them, for example, the algorithms for Gaussian Elimination and the six Monte Carlo case studies... and the algorithms presented do not indicate where the parallelling code should be added, so the book is not a guidebook/reference work for a practising programmer

Additionally there are two major problems for readers outside of university. Following on from the above, while the book has a few examples per topic it has a large number of exercises without answers. The second problem is that the book assumes you have a working parallel system, both for the cluster and multi-CPU/core based systems. There is no information on setting up the required hardware or software. It assumes the system has been correctly set up to run the programmes developed.
Brief Description of Contents

1. & 2.Explains the history of parallel programming and gives a high-level overview of parallel architectures.
3.Presents Foster's parallel algorithm design methodology and shows how it is used.
4.Using C code-lets, chapters 4. to 9. progressively introduce the 27 MPI functions which should enable you to implement a wide variety of applications. The chapter begins the introduction of the Message Passing Interface (MPI), which is the most popular message passing specification supporting parallel programming, as means to implement parallel programmes, rather than using a custom high-level language.
5.Introduces The Sieve of Eratosthenes as a tool to teach parallel programming. The chapter explains the sequential algorithm; how to use domain decomposition methodology to come up with a data parallel algorithm; covers the pros and cons of several schemes to allocate contiguous blocks of array elements to tasks. Benchmarks and considers ways to improve performance. An MPI programme for Sieve of Eratosthenes in C is provided.
6.This chapter covers the design, analysis, coding and benchmarking of a parallel version of Floyd's algorithm. It begins the development of a suite of functions that can read matrices from files and distribute them among MPI processes, and to gather matrix elements from MPI processes and print them, as well as discussing how to avoid deadlock.
7.Covers the prediction of the performance of a parallel algorithm so you decide whether it is worth implementing it. Begins by deriving a general formula for the speedup achievable by a parallel algorithm, then looks at well-known performance prediction formulae; Amdahl's Law, Gustafson-Barsis's Law, the Karp-Flatt metric and the isoefficiency.
8.Starting with the pseudo code for a sequential matrix vector algorithm, this chapter covers the design, analysis, implementation and benchmarking of three MPI programs to multiply a dense square matrix by a vector. Each design is based on a different distribution of the matrix and vector elements among the MPI processes. The required MPI functions are introduced and used in two C code programmes which implement parallel matrix-vector multiplication using different algorithms.
9.In the manner employed by Internet search engines, this chapter develops an application that reads a dictionary of key words, locates and reads a set of text documents, then generates and writes a vector of the "fit" for each document. This problem is amenable to functional decomposition and this chapter develops and contains the C code for a manager-worker style parallel programme to solve this problem.
10.Contains six case studies (Neutron Transport, Temperature at a point in a 2-D Plate, 2-D Ising Model, Room Assignment Problem, Parking Garage, Traffic Circle) utilising Monte Carlo methods to solve them, pseudo code only.

The chapter starts with an introduction to the Monte Carlo method and why it works, using a demonstration programme to compute Pi. Programmes implementing the Monte Carlo method rely on high "quality" random numbers and the chapter has a short description on how this is done and on parallel (pseudo)random number generators.
11.Reviews the standard sequential matrix multiplication algorithm (pseudo code) and presents a chart of the algorithm's performance decrease as matrix size increases, especially once the second factor matrix no longer fits inside cache memory.

Next the chapter covers how a recursive implementation of matrix multiplication that multiplies blocks of the original matrices can maintain a high cache rate. The C function implementing recursive, block-oriented matrix multiplication is given.

The design (no code or pseudo code is given) of a parallel algorithm based on a row-wise block-striped decomposition of the matrices, derives an expression for the expected computational time of the algorithm and analysis its isoefficiency. The design and methodology is repeated for a parallel algorithm based on a checker board decomposition of the matrices.
12.Covers the design of methods to solve linear systems. After a brief definition of the terminology, the chapter covers the direct methods for solving linear equations, (Gaussian elimination followed by back substitution).

The section on back substitution contains the pseudo code for the sequential algorithm and compares the row-oriented and column-oriented versions of the algorithm. Pseudo code for a pipelined version of the row-oriented algorithm, which allows computations to occur during the broadcast steps, with back substitution is given.

During the presentation of the Gaussian elimination algorithms the tournament method of reduction is introduced along with its implementation in MPI.

For the solution of spare systems, pseudo code for the Jacobi and conjugate methods is presented along with a description of how to convert the Jacobi method to the Gauss-Seidel method. However only a description of the parallel implementation is covered but there is neither pseudo code or C code.
13.The two most common numerical techniques for solving partial differential questions are the finite element method and finite difference method. The chapter focuses on matrix free implementations of the finite difference method. Using the C code for a single CPU as a basis, the chapter contains two case studies, one for a vibrating string, the other for steady state heat distribution in a thin square plate, which are used to present the techniques for parallelising programmes.
14.This chapter considers the parallelisation of sorting algorithms. The first algorithm discussed is a parallel version of quick-sort, but this does not do a good job of balancing values amongst the processors. Next the Hyperquicksort algorithm is presented. This was inspired by the hypercube processor organisation, but with this algorithm the workloads on the processors can become imbalanced. Parallel Sorting by Regular Sampling addresses the load imbalance problem. As it uses all-to-all communication it is a good fit for current switch based clusters. No pseudo code or C code is given for any of the algorithms.
15.The chapter introduces the discrete Fourier transform, the inverse Fourier transform and the fast Fourier transform. Pseudo code for the recursive and iterative sequential fast Fourier transfer are given and from this the chapter discusses how to implement it on a multicomputer.
16.The chapter discusses four kinds of combinational search algorithms, (divide and conquer, backtrack, branch and bound and alpha-beta search), used to solve decisions and optimisation problems. Pseudo code for parallel backtrack search, generic sequential best-first branch-and-bound, parallel branch-and-bound and sequential alpha-beta pruning algorithms is given.
17.The parallel algorithms in the preceding chapters were implemented on multicomputers using MPI. MPI could be used to implement these parallel algorithms on a multiprocessor computer, but it is often possible to obtain better performance by using a programming language tailored for a shared memory system.

OpenMP is an Application Programming Interface (API) for parallel programming on multiprocessors. It is a set of compiler directives and a library of support functions and works in conjunction with standard FORTRAN, C and C++. It has recently emerged as a shared-memory standard.

This chapter details how the shared memory programming model is different from the message passing model. Using C code-lets it presents enough OpenMP compiler directives and functions to enable you to be able to parallelise a wide variety of C code segments.
18.Many current parallel computers consist of a collection of multiprocessors. While it is possible to programme a collection of multiprocessors using MPI, it is often possible to improve performance by using MPI and OpenMP. MPI is used to handle larger grained communication between the multiple computers, while the lighter-weight threads of OpenMP handle the processor interactions within each multiprocessor.

The chapter contains two examples of transforming a C programme with MPI calls into a hybrid programme suitable for execution on a cluster of multiprocessors.

Last modified 28 Apr 06