|
High Performance Algorithms for
Structured Matrix Problems edited by Peter Arbenz, Marcin Paprzycki, Ahmed Sameh and Vivek Sarin Advances in the Theory of Computation and Computational Mathematics, Volume 2 |
Introduction |
The main aim of this volume is to summarize the state of the art in the area of high performance solutions of structured linear systems as well as the area of structured eigenvalue and singular-value problems. The volume highlights research directions perceived to be the most important for computing the structured problems. We hope that the volume provides a collection of algorithms and ideas that will guide future developments in this area.
The papers address a variety of issues in the field of high performance algorithms for structured matrices. The types of structured matrices considered include banded, structured sparse and dense matrices with special structures.
The topics covered range from parallel solvers for sparse or banded linear systems to parallel computation of eigenvalues and singular values of tridiagonal and bidiagonal matrices. In addition, the volume contains articles on specialized solution techniques for dense structured matrices with low displacement rank.
The papers also discuss implementation issues on numerous parallel architectures such as vector computers, shared and distributed memory multiprocessors, and clusters of workstations.
The articles can be broadly classified into four categories:
Linear Systems:
Linear system solvers: Direct solution techniques for large systems of linear equations are most effective in the presence of a certain structure inherent in the problem. In many cases, the reordering of rows and columns of sparse matrices can significantly reduce computation and may prove to be critical for large problems with unassembled matrices.
In their article, Duff and Scott compare the performance of frontal solvers with other sparse direct solvers on the Cray J90 vector computers as well as on the IBM RS/6000 and the DEC Alpha workstations and study the effect of ordering on a number of problems. They conclude that their technique has some advantages over other methods for unassembled finite-element problems in which the entire matrix is not available.
Esposito, Catalano, Malucelli and Tarricone propose two heuristics for reducing bandwidth of sparse matrices by using symmetric permutations and apply their technique to the solution of linear systems arising from electromagnetics. They report on experiments conducted on the IBM SP2 and on the Cray T3D multiprocessor computers.
The reordering of linear systems to obtain a banded structure has a cost, however. Such algorithms are susceptible to numerical instability when applied to general linear systems.
Arbenz and Hegland propose a stable parallel algorithm for banded and periodically banded linear systems. Their technique implements bidiagonal cyclic reduction with the ability to perform pivoting and demonstrates the effectiveness of parallel banded solvers on shared and distributed memory parallel computers. Their target machines are the Intel Paragon, the Fujitsu AP1000 and the SGI Power Challenge.
Eigenvalue problems:
Classical methods for computing eigenvalues and singular values of matrices require reduction of the matrix to tridiagonal or bidiagonal forms. Lang presents techniques for reducing banded matrices to bidiagonal and tridiagonal form using orthogonal transformations which are numerically stable and efficient on parallel architectures. The theoretical considerations are complemented by experimental results obtained on the Intel Paragon.
The remaining articles on this topic present two competing approaches for computing the eigenvalues of symmetric tridiagonal matrices on parallel computers. The first one by Badia and Vidal provides a survey of efforts to parallelize the bisection algorithms and proposes two new parallel algorithms. Numerical results illustrating their approach are given for the Cray T3D.
In the second article, Bar-On proposes a divide-and-conquer approach for locating eigenvalues, which is quite efficient in locating a few eigenvalues of large problems. He reports on simulations done on the Cray T3D and the Cray J90.
Matrices with special structure:
Matrices with special structure: Many important applications give rise to matrices with special structure. Examples of such matrices include the Hankel, Cauchy, Vandermonde and Toeplitz systems. There are several fast algorithms for computing the solution of such linear systems that exploit the special properties of the structure. However, the critical issue in all of these algorithms is how to ensure the numerical stability of the solution process.
The two main approaches for the solution of Toeplitz systems are the Levinson and Schur algorithms. In practice, one must use a variety of look-ahead schemes to ensure the stability of the algorithm. Hochbruck presents a detailed comparison of these algorithms on the basis of the numerical behavior of look-ahead strategies used.
The article by Huckle considers a larger class of matrices which are characterized by a low displacement rank. A fast algorithm is proposed for the solution of these linear systems along with the analysis of its complexity.
The fast algorithms by Hochbruck and Huckle are intended to run on single processor machines. They thus present numerical results obtained on workstations.
Parallel Computation:
Parallel computation: Algorithms for structured matrices that can be efficiently implemented on parallel architectures have received considerable attention over the last decade. Research in this area has focused mainly on parallelizing existing sequential algorithms. This is due in part to the availability of several highly efficient solution techniques for matrices with special structure.
One such class of algorithms for the Poisson problem on regular grids is the Fourier Analysis Cyclic Reduction (FACR) algorithm. A variation of the basic algorithm is presented by Johnsson and Pitsianis. It uses the numerical features of the matrix to reduce computational complexity by a significant amount. They also study the effect of these modifications on the load-balance achieved on parallel machines and suggest techniques to alleviate this problem.
Recent years have been witness to significant advances in the design of numerical software for parallel architectures. The reduction of the semantic gap between high level specification of algorithms and their optimized implementation on specific parallel computers is crucial to the success of high performance computing. The article by Eisenbiegler, Loewe, Gottlieb, Schlaeger and Thuel proposes parallel implementations of the conjugate gradient method which are automatically optimized for clusters of workstations. The authors report on numerical experiments executed on the Parsytec PowerXplorer.
This volume is a result of a joint effort of many people. The editors wish to thank all referees and authors for their cooperation and support. The referees for this volume were: Greg Ammar (De Kalb), Pierluigi Amodio (Bari), Achim Basermann (Julich), Luigi Brugnano (Florence), Edmond Chow (Minnesota), Kevin Gates (Brisbane), Krassimir Georgiev (Sofia), Ananth Grama (Purdue), Anshul Gupta (IBM), Martin Gutknecht (Zurich), Markus Hegland (Canberra), Polina Jordanova (Sofia), George Karypis (Minnesota), Svetozar Margenov (Sofia), Ulrike Meier Yang (Urbana), Michael Oettli (Minnesota), Tzvetan Ostromski (Sofia), Vellisar Pavlov (Russe), Svetozara Petrova (Sofia), Sefano Serra (Florence), Andreas Stathopolous (Minnesota), Przemyslaw Stpiczynski (Lublin), Zhanye Tong (Purdue), Marian Vajtersic (Bratislava), Plamen Yalamov (Rousse).