|
High Performance Algorithms for
Structured Matrix Problems edited by Peter Arbenz, Marcin Paprzycki, Ahmed Sameh and Vivek Sarin Advances in the Theory of Computation and Computational Mathematics, Volume 2 |
Abstracts |
A Comparison of Frontal Software with other Harwell Subroutine
Library Sparse Direct Solvers
I.S. Duff and J.A. Scott (Rutherford Appleton Laboratory)
We compare the performance of sparse frontal codes from the Harwell Subroutine Library (HSL) against other HSL sparse direct solvers and consider the effect of ordering on the frontal solver. We study the case of assembled and of unassembled systems for both symmetric positive-definite and unsymmetric matrices. We use problems arising in real engineering or industrial applications in our tests.
We compare the performance of sparse frontal codes from the Harwell Subroutine Library (HSL) against other HSL sparse direct solvers and consider the effect of ordering on the frontal solver. We study the case of assembled and of unassembled systems for both symmetric positive-definite and unsymmetric matrices. We use problems arising in real engineering or industrial applications in our tests.
Keywords: sparse matrices, frontal solver, direct methods, finite- elements, BLAS
Sparse Matrix Bandwith Reduction: Algorithms, Applications and Real
Industrial Cases in Electromagnetics
A Espositio, M.S.F. Catalano, F. Malucelli and L. Tarricone
(University of Perugia)
The problem of sparse matrices bandwidth reduction is addressed and solved with two approaches suitable for both symmetric and non-symmetric matrices. The former is a constructive method derived from the Cuthill-McKee approach, the latter is an application of a metaheuristic scheme. This heuristic algorithm has been implemented on parallel machines. These approaches are compared with previous methods in the literature (suited only for symmetric cases). Benchmarks on general data tests, as well as on real industrial cases, show the superior effectiveness and efficiency of the proposed methods. Some results illustrating the parallel performance of the parallel heuristic are also reported.
Keywords: Cuthill-McKee,Tabu Search, Matrix Bandwidth Reduction, Parallel Algorithms.
On the Stable Parallel Solution of General Narrow Banded Linear
Systems
P. Arbenz (ETH Zurich) and M. Hegland (Australian National
University)
We propose a stable algorithm for the parallel solution of banded and periodically banded linear systems. While most of the known parallel algorithms are stable only for symmetric positive definite or diagonally dominant systems, the new algorithm incorporates pivoting without sacrificing efficiency. The principle ingredient of the algorithm is a bidiagonal cyclic reduction that admits pivoting.
Efficient Algorithms for Reducing Banded Matrices to Bidiagonal and
Tridiagonal Form
B. Lang (Bergische University Wuppertal)
This paper presents efficient techniques for the orthogonal reduction of banded matrices to bidiagonal and symmetric tridiagonal form. The algorithms are numerically stable and well suited to parallel execution. Experiments on the Intel Paragon show that even on a single processor these methods usually will be several times faster than the corresponding LAPACK routines. In addition, they yield nearly full speedup when run on multiple processors.
Parallel Bisection Algorithm for Solving Symmetric Tridiagonal
Eigenproblem
J.M. Badia (University Jaume I. Castellon) and A.M. Vidal
(Technical University Valencia)
In this paper we study the different approaches used so far to apply the bisection method for solving the symmetric tridiagonal eigenproblem. We review the sequential methods used to perform the final extraction of the eigenvalues and compare two new techniques that offer very good results. The sequential version of the bisection method we have implemented even surpasses the results of the QR iteration in some cases. We also perform an exhaustive survey of the approaches that can be used to parallelize the bisection method and we compare two parallel algorithms that apply different schemes for distributing the computation among the processors. The experimental analysis developed shows that the bisection method, besides its flexibility in the partial computation of the spectrum, is a method that offers very good results when it is adequately parallelized. We also show that the behaviour of the algorithms is clearly matrix-dependent.
Keywords: symmetric tridiagonal eigenproblem, bisection method, Laguerre iteration, parallel algorithms
A Parallel QR Algorithm for the Symmetric Tridiagonal Eigenvalue
Problem
I. Bar-On (Technion University)
Divide and Conquer algorithms are most efficient in capturing the potential performance of massively parallel supercomputer machines. In this paper, we describe a new D & C approach for locating the eigenvalues of a symmetric tridiagonal matrix using the QR technique. The new algorithm demonstrates high speedups with low communication overhead, and provides a fast and efficient method for locating few eigenvalues of very large size problems.
Keywords: Symmetric Tridiagonal Matrices, Eigenvalues, The QR Algorithm, Parallel Algorithms
A Numerical Comparison of Look-Ahead Levinson and Schur Algorithms
for Non-Hermitian Toeplitz Systems
M. Hochbruck (University of Tubingen)
In two recent papers we proposed different types of look-ahead algorithms for recursively computing PF s located on two adjacent rows of the PT of a formal Laurent series. These methods can be interpreted as look-ahead Levinson and Schur algorithms for solving general non-Hermitian Toeplitz linear systems of equations.
In this paper, we discuss look-ahead strategies and options for solving Toeplitz systems. The main aim of this paper is a numerical comparison of different types of look-ahead methods and of various options for solving a Toeplitz system. We compare our implementation of look-ahead Levinson and Schur algorithms with the classical algorithms and with the look-ahead Levinson solver provided by Hansen and Chan.
Superfast Solution of Linear Equations with Low Displacement
Rank
T. Huckle (Technical University of Munich)
In this paper we consider linear equations with symmetric positive definite matrices of displacement rank 2. As a special case, Toeplitz matrices belong to this class. We will derive a modified version of superfast methods related to the algorithms proposed by Morf and Bitmead/Anderson. The method is based on repeatedly dividing the original problem into two subproblems of half the size, namely the leading principal submatrix and the related Schur complement. All occuring matrices are represented explicitly by proper generating vectors of their displacement characterization. We show that the resulting superfast method takes 81.25 · N · lg(N) 2 + O(N · lg(N)) flops for an N × N-matrix. Furthermore, we present some numerical results of a MATLAB-implementation of the proposed algorithm.
Keywords: Toeplitz matrix, displacement rank, superfast solver
Load Balance in Parallel FACR
S.L. Johnsson and N.P. Pitsianis (University of Houston)
Fourier Analysis Cyclic Reduction is a class of very efficient methods for the solution of Poisson's equation on regular grids. We show that exploiting the numerical properties of the tridiagonal systems involved may reduce the factorization work required to a few percent of a normal factorization. We also show that exploiting this property on distributed memory parallel processor architectures may lead to severe load imbalance and show how the CYCLIC data allocation option in High Performance Fortran (HPF) may improve the load--balance significantly. We also show that folding of the index space enumerating different tridiagonal systems further improves the load--balance. We finally show that a one--dimensional processing array configuration yields lower communication complexity than a two--dimensional configuration in many situations of practical interest.
Parallel CG-Methods - Automatically Optimized for PC and
Workstation Clusters
J. Eisenbiegler, J. Gottlieb, W. Lowe, S. Schlaeger, M. Thul, and
W. Zimmerman (University of Karlsruhe)
We discuss parallel implementations of cg--methods for solving large scale systems in engineering applications. Furthermore we describe automatic optimizations for arbitrary clusters of workstations. We consider each cluster as a finite set of processor--memory pairs linked together with an interconnection network. They are modelled as a LogP machine, which is extended to use functions instead of constants for all important parameters. The optimizations guarantee runtimes within a small constant factor of the optimum. This bound is independent of the target machine. In practice, they give nearly optimal programs. Numerical experiments on a Parsytec PowerXplorer confirm the results obtained.
Keywords: parallel algorithms, compilers, iterative methods, sparse matrices, FD methods