Efficient parallel reduction to bidiagonal form

1999, Parallel Computing

https://doi.org/10.1016/S0167-8191(99)00041-1

Abstract

Most methods for calculating the SVD (singular value decomposition) first require bidiagonalizing the matrix. The blocked reduction of a general, dense matrix to bidiagonal form, as implemented in ScaLAPACK, does about one half of the operations with BLAS3. By subdividing the reduction into two stages, dense → banded and banded → bidiagonal, with cubic and quadratic arithmetic costs, respectively, we are able to carry out a much higher portion of the calculations in matrix-matrix multiplications. Thus, higher performance can be expected. This paper presents and compares three parallel techniques for reducing a full matrix to banded form. (The second reduction stage is described in another paper [B. Lang, Parallel Comput. 22 (1996) 1-18].) Numerical experiments on the Intel Paragon and IBM SP/1 distributed memory parallel computers demonstrate that the two-stage reduction approach can be significantly superior if only the singular values are required.

This work was partially funded by Deutsche Forschungsgemeinschaft, Geschäftszeichen Fr 755/6-1 and Fr 755/6-2.

0167-8191/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0167-8191(99)00041-1

… parallel computers [1,2,4] and to novel accuracy issues, do most of the work on a full or triangular matrix.
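The first stage named in the abstract (dense → banded) can be illustrated with a small serial sketch. This is not one of the paper's three parallel techniques, which the abstract does not detail. It is a minimal NumPy illustration that assumes alternating QR factorizations of block columns and LQ factorizations of block rows. The function name `dense_to_banded` and the block-size parameter `b` are hypothetical, chosen only for this example.

```python
import numpy as np


def dense_to_banded(A, b):
    """Reduce a dense m x n matrix (m >= n) to upper banded form.

    Illustrative assumption: orthogonal transformations are applied
    from the left (QR of a block column) and from the right (LQ of a
    block row), so the singular values are unchanged. The result has
    nonzeros only where 0 <= j - i <= b.
    """
    B = np.array(A, dtype=float, copy=True)
    m, n = B.shape
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    for k in range(0, n, b):
        kb = min(k + b, n)
        # Left transformation: QR of the current block column makes it
        # upper triangular; Q^T is applied to the trailing matrix.
        Q, _ = np.linalg.qr(B[k:, k:kb], mode="complete")
        B[k:, k:] = Q.T @ B[k:, k:]
        if kb < n:
            # Right transformation: LQ of the block row to the right of
            # the panel, computed as a QR of its transpose. The block
            # row becomes lower triangular, which bounds the bandwidth.
            Qr, _ = np.linalg.qr(B[k:kb, kb:].T, mode="complete")
            B[k:, kb:] = B[k:, kb:] @ Qr
    return B


def out_of_band(B, b):
    """Return the part of B lying outside the band 0 <= j - i <= b."""
    band = np.triu(B) - np.triu(B, b + 1)
    return B - band
```

In this sketch the orthogonal factors are formed explicitly and applied as dense matrix-matrix products. A production code would instead keep them in a compact form such as the WY representation [5,17]. The banded result would then be passed to a second stage (banded → bidiagonal), which is not shown here.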

References (17)

  1. M. Bečka, S. Robert, M. Vajteršic, Experiments with parallel one-sided and two-sided algorithms for SVD, in: P. Zinterhof, M. Vajteršic, A. Uhl (Eds.), Parallel Computation, Springer, Berlin, 1999, pp. 48-57.
  2. M. Bečka, M. Vajteršic, Block-Jacobi SVD algorithms for distributed memory systems I: Hypercubes and rings, Parallel Algorithms Appl. 13 (1999) 265-287.
  3. M.W. Berry, J.J. Dongarra, Y. Kim, A parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form, Parallel Comput. 21 (8) (1995) 1184-1200.
  4. C. Bischof, Computing the singular value decomposition on a distributed system of vector processors, Parallel Comput. 11 (1989) 171-186.
  5. C. Bischof, C. Van Loan, The WY representation for products of Householder matrices, SIAM J. Sci. Stat. Comput. 8 (1) (1987) s2-s13.
  6. J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, R.C. Whaley, ScaLAPACK: A portable linear algebra library for distributed memory computers - design issues and performance, Comput. Phys. Comm. 97 (1996) 1-15.
  7. J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker, R.C. Whaley, A proposal for a set of parallel basic linear algebra subprograms, in: J. Dongarra, K. Madsen, J. Waśniewski (Eds.), Applied Parallel Computing, Springer, Berlin, 1995, pp. 107-114.
  8. J. Choi, J.J. Dongarra, L.S. Ostrouchov, A.P. Petitet, D.W. Walker, R.C. Whaley, The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines, Sci. Programming 5 (1996) 173-184.
  9. J. Choi, J.J. Dongarra, D.W. Walker, The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form, Numer. Alg. 10 (1995) 379-399.
  10. J.J. Dongarra, J. Du Croz, S. Hammarling, I. Duff, A set of level 3 basic linear algebra subprograms, ACM Trans. Math. Soft. 16 (1) (1990) 1-17.
  11. J.J. Dongarra, J. Du Croz, S. Hammarling, R.J. Hanson, An extended set of FORTRAN basic linear algebra subprograms, ACM Trans. Math. Soft. 14 (1) (1988) 1-17.
  12. J.J. Dongarra, R.C. Whaley, LAPACK Working Note 94: A user's guide to the BLACS v1.0, Technical Report CS-95-281, University of Tennessee at Knoxville, March 1995.
  13. B. Großer, Parallele zweistufige Verfahren zur Reduktion auf Bidiagonalgestalt, Diplomarbeit, Fachbereich Mathematik, Bergische Universität GH Wuppertal, 1997.
  14. M.R. Hestenes, Inversion of matrices by biorthogonalization and related results, SIAM J. Appl. Math. 6 (1958) 51-90.
  15. E.G. Kogbetliantz, Solution of linear equations by diagonalization of coefficients matrix, Quart. Appl. Math. 13 (1955) 123-132.
  16. B. Lang, Parallel reduction of banded matrices to bidiagonal form, Parallel Comput. 22 (1996) 1-18.
  17. R. Schreiber, C. Van Loan, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Stat. Comput. 10 (1) (1989) 53-57.