This paper addresses the problem of algorithm discovery, via evolutionary search, in the context of matrix multiplication. The traditional multiplication algorithm requires O(n^3) multiplications for square matrices of order n.... more
Strassen's algorithm for fast matrix-matrix multiplication has been implemented for matrices of arbitrary shapes on the CRAY-2 and CRAY Y-MP supercomputers. Several techniques have been used to reduce the scratch space requirement for... more
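As a point of reference for the two entries above, the following is a minimal sketch of the 2 x 2 step behind Strassen's algorithm. It is not taken from either paper. The function names `naive_2x2` and `strassen_2x2` are illustrative assumptions. The textbook product uses 8 scalar multiplications; Strassen's scheme uses 7, and applying it recursively to matrix blocks gives a cost of O(n^log2(7)), roughly O(n^2.81).

```python
def naive_2x2(A, B):
    """Textbook product of two 2x2 matrices: 8 scalar multiplications."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return [[a * e + b * g, a * f + b * h],
            [c * e + d * g, c * f + d * h]]


def strassen_2x2(A, B):
    """Strassen's scheme for 2x2 matrices: 7 scalar multiplications."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    # Seven products; entries of A always appear on the left, so the same
    # formulas remain valid when a..h are (non-commuting) matrix blocks.
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine the products using additions and subtractions only.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(strassen_2x2(A, B) == naive_2x2(A, B))
```

The extra additions mean the scheme only pays off for sufficiently large blocks, and the temporaries m1 to m7 illustrate why implementations need scratch space.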
LU decomposition is fundamental in linear algebra. Numerous tools exist that provide this important factorization. The authors present the conditions for a matrix to have none, one, or infinitely many LU factorizations. In the case... more
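To make the "none" and "one" cases concrete, here is a small sketch of Doolittle-style LU factorization without row exchanges. It is an illustration, not the authors' procedure. The function name `lu_no_pivot` and the tolerance `tol` are assumptions. The function returns `None` when it meets a zero pivot, which only signals that plain elimination breaks down; it does not by itself decide between "no factorization" and "infinitely many".

```python
def lu_no_pivot(A, tol=1e-12):
    """Return (L, U) with unit lower-triangular L and A = L U, or None if a
    zero pivot is met (elimination without row exchanges breaks down)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(x) for x in row] for row in A]
    for k in range(n - 1):
        if abs(U[k][k]) < tol:
            return None  # breakdown: cannot divide by the pivot
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]          # elimination multiplier
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]     # row_i -= multiplier * row_k
    return L, U


if __name__ == "__main__":
    # Nonzero leading principal minors: exactly one factorization with unit L.
    print(lu_no_pivot([[2, 1], [4, 5]]))
    # Zero in the (1,1) position with a nonzero entry below it.
    print(lu_no_pivot([[0, 1], [1, 0]]))
```

For the second matrix, a factorization would need u11 = 0 and l21 * u11 = 1 at the same time, so no LU factorization exists without a row exchange.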
An efficient implicit lower-upper symmetric Gauss-Seidel (LU-SGS) solution algorithm has been developed for a high order multi-domain spectral difference method on unstructured hexahedral grids. The LU-SGS solver is preconditioned by the... more
This research introduces a row compression and nested product decomposition of an n × n hierarchical representation of a rank structured matrix A, which extends the compression and nested product decomposition of a quasiseparable matrix.... more
Dedicated to my loving parents, John and Anne Hudachek, who are watching from above; and to my very patient husband, Dan Buswell, who is ecstatic to have his wife back. ACKNOWLEDGEMENTS The monumental task of a doctoral degree can only... more
In this paper, we postulate a new decomposition theorem of a matrix A into two matrices, namely, a lower triangular matrix M, in which all entries are determinants, and an upper triangular matrix U whose entries are also in determinant... more
For saddle point problems in fluid dynamics, several popular preconditioners exploit the block structure of the problem to construct block triangular preconditioners. The performance of such preconditioners depends on whether... more
In this work, the solution of a large sparse linear system of equations with an arbitrary sparsity pattern is obtained by using the LU-decomposition method as well as a numerical structure approach. The LU-decomposition method is based on... more
Displacement decomposition circulant preconditioners for almost incompressible 2D elasticity systems
The robustness of the recently introduced circulant block-factorization (CBF) preconditioners is studied in the case of finite element matrices arising from the discretization of the 2D Navier equations of elasticity. Conforming triangle... more
Decomposition Approach for Inverse Matrix Calculation: ... of a corresponding inverse of a modified matrix A*^(-1) are derived in (Strassen, 1969). The components of the inverse matrix can be evaluated analytically. Finding the inverse... more
The comprehensive LU decomposition of a parametric matrix consists of a case analysis of the LU factors for each specialization of the parameters. Special cases can be discontinuous with respect to the parameters, the discontinuities... more
The ScaLAPACK library for parallel dense matrix computations is built on top of the BLACS communications layer. In this work, we investigate the use of BSPlib as the basis for a communications layer. We examine the LU decomposition from... more
The Cholesky decomposition plays an important role in finding the inverse of correlation matrices. As it is fast and numerically stable for linear system solving, inversion, and factorization compared to singular valued... more
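The role of the Cholesky factor in inverting a correlation matrix can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the paper's implementation. The names `cholesky` and `inverse_spd` are hypothetical. The input is assumed symmetric positive definite, and the inverse is built column by column from two triangular solves.

```python
import math


def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def inverse_spd(A):
    """Invert A by solving L L^T x = e_c for each unit vector e_c."""
    n = len(A)
    L = cholesky(A)
    cols = []
    for c in range(n):
        # Forward substitution: L y = e_c
        y = [0.0] * n
        for i in range(n):
            rhs = 1.0 if i == c else 0.0
            y[i] = (rhs - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        # Back substitution: L^T x = y
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
        cols.append(x)
    # cols[c] is column c of the inverse; transpose into row-major form.
    return [[cols[c][r] for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    R = [[1.0, 0.5], [0.5, 1.0]]  # 2x2 correlation matrix with correlation 0.5
    print(inverse_spd(R))
```

For a 2 x 2 correlation matrix with off-diagonal entry r, the exact inverse is 1/(1 - r^2) times [[1, -r], [-r, 1]], which gives a quick check of the output.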
We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2... more
Hereby, I, Ta Minh THANH, consciously assure that for the manuscript "Consideration of A Robust Watermarking Algorithm for Color Image Using Improved QR Decomposition" the following is fulfilled: 1) This material is the authors' own... more
An efficient technique for partitioning and programming linear algebra algorithms on concurrent architectures is described and applied to 2-D wavefront arrays. The mapping of the computational elements (processes) to processors is based... more
Recursive block decomposition algorithms (also known as quadtree algorithms when the blocks are all square) have been proposed to solve well-known problems such as matrix addition, multiplication, inversion, determinant computation, block... more
This paper presents implementations of new methods for LU factorization used in engineering applications. The implementations are done on the Alliant FX/80 minisupercomputer and use Level 3 Basic Linear Algebra Subprograms. Three ways... more
The Gauss-Huard algorithm (the GHA) is a specialized version of Gauss-Jordan elimination for the solution of linear systems that, enhanced with column pivoting, exhibits numerical stability and computational cost close to those of the... more
The concept of block-cyclic order elimination can be applied to out-of-core LU and QR matrix factorizations on distributed memory architectures equipped with a parallel I/O system. This elimination scheme provides load balanced... more
This paper presents a new approach to solving linear programming problems with the help of the LU factorization method for matrices. This method is based on the fact that a square matrix can be factorized into the product of unit... more
Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorithms, both across the memory hierarchy and between processors. Current approaches either study specific algorithms individually, disallow... more
Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for... more
A singular matrix A may have more than one LU factorization. In this work the set of all LU factorizations of A is explicitly described when the lower triangular matrix L is nonsingular. For this purpose, a canonical form of A under left... more
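A small worked example, chosen here for illustration and not taken from the paper, shows how a singular matrix can admit a whole family of LU factorizations with nonsingular L. For A = [[0, 1], [0, 1]], every value of a free parameter t gives a valid pair L = [[1, 0], [t, 1]], U = [[0, 1], [0, 1 - t]]. The helper names `matmul` and `lu_family` are assumptions.

```python
def matmul(X, Y):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def lu_family(t):
    """One LU factorization of A = [[0, 1], [0, 1]] per value of t.
    L is unit lower triangular (hence nonsingular); U is upper triangular."""
    L = [[1, 0], [t, 1]]
    U = [[0, 1], [0, 1 - t]]
    return L, U


if __name__ == "__main__":
    A = [[0, 1], [0, 1]]
    for t in (-2, 0, 1, 5):
        L, U = lu_family(t)
        print(t, matmul(L, U) == A)
```

The multiplier t is free because it only ever multiplies the zero pivot u11; its effect on the (2,2) entry is absorbed by u22 = 1 - t.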
SPEX Left LU is a software package for exactly solving unsymmetric sparse linear systems. As a component of the sparse exact (SPEX) software package, SPEX Left LU can be applied to any input matrix, A, whose entries are integral,... more
We show that the Singular Value Decomposition of a matrix implies a natural full-rank factorization of the matrix.
Gaussian elimination is commonly used to solve dense linear systems in scientific models. In a large number of applications, a need arises to solve many small problems instead of a few large linear systems. The size of each of these... more
In this paper, we introduce novel fast matrix inversion algorithms that leverage triangular decomposition and recurrent formalism, incorporating Strassen's fast matrix multiplication. Our research places particular emphasis on triangular... more
Sparse LU decomposition has been widely used to solve sparse linear systems of equations found in many scientific and engineering applications, such as circuit simulation, power system modeling and computer vision. However, it is... more
Results are given concerning the LU factorization of H-matrices, and Gaussian elimination with column-diagonal-dominant pivoting is shown to be applicable to H-matrices. This algorithm, which uses a symmetric permutation to exchange the... more
This paper presents a deep-pipelined FPGA implementation of real-time ellipse estimation for eye tracking. The system is constructed by the Starburst algorithm on a stream-oriented architecture and the RANSAC algorithm without any external... more
In this paper we present new hybrid CPU-GPU routines to accelerate the solution of linear systems with a band coefficient matrix, by off-loading the major part of the computations to the GPU and leveraging highly tuned implementations of... more
An algorithm mainly consisting of a part of Divide and Conquer and the twisted factorization is proposed for bidiagonal SVD. The algorithm costs O(n^2) flops and is highly parallelizable when singular values are isolated. If strong... more
At the heart of a frontal or multifrontal solver for the solution of sparse symmetric sets of linear equations, there is the need to partially factorize dense matrices (the frontal matrices) and to be able to use their factorizations in... more
We propose a hybrid sparse system solver for handling linear systems using algebraic domain decomposition-based techniques. The solver consists of several stages. The first stage uses a reordering scheme that brings as many of the largest... more
In this paper, we study the Singular Value Decomposition of an arbitrary matrix A, especially its subspaces of activation, which leads in a natural manner to the Moore-Bjerhammar-Penrose pseudoinverse. Besides, we analyze the... more
We present algorithms for the symbolic and numerical factorization phases in the direct solution of sparse unsymmetric systems of linear equations. We have modified a classical symbolic factorization algorithm for unsymmetric matrices to... more
Realistic and accurate numerical simulations of electrostimulation of tissues and full-body biomodels have been developed and implemented. Typically, whole-body systems are very complex and consist of a multitude of tissues, organs, and... more
A method for solving systems of linear equations is presented based on direct decomposition of the coefficient matrix using the form LAX = LB = B′. Elements of the reducing lower triangular matrix L can be determined using either row wise... more
In this paper, the authors present their work on a field-programmable gate array (FPGA) hardware implementation of proposed direction-of-arrival estimation algorithms employing LU factorization. Both L and U matrices were considered in... more
Sparse parallel factorization is among the most complicated and irregular algorithms to analyze and optimize. Performance depends both on system characteristics such as the floating point rate, the memory hierarchy, and the interconnect... more