Academia.edu

LU factorization

313 papers
2 followers
About this topic
LU factorization is a mathematical method used to decompose a matrix into the product of a lower triangular matrix (L) and an upper triangular matrix (U). This technique is commonly employed in numerical analysis to simplify the solution of linear systems, matrix inversion, and determinant calculation.
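For concreteness, here is a minimal sketch of this decomposition using NumPy and SciPy (the library choice is an assumption for illustration, not tied to any particular paper below). Note that scipy.linalg.lu returns a permutation matrix alongside L and U, because partial pivoting is used in practice.

```python
import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.array([[4.0, 3.0, 2.0],
              [6.0, 3.0, 1.0],
              [2.0, 1.0, 5.0]])

P, L, U = lu(A)                      # A = P @ L @ U, with partial pivoting
assert np.allclose(A, P @ L @ U)

# Reuse the factorization to solve A x = b and to read off the determinant.
lu_piv = lu_factor(A)                # compact (LU, piv) form
b = np.array([1.0, 2.0, 3.0])
x = lu_solve(lu_piv, b)              # one forward + one backward substitution
assert np.allclose(A @ x, b)

# |det(A)| equals the product of |diag(U)|, since diag(L) is all ones;
# the permutation only fixes the sign.
assert np.isclose(np.prod(np.abs(np.diag(U))), abs(np.linalg.det(A)))
```

Once the factors are available, each additional right-hand side costs only the two triangular solves, which is why the factorization is reused so heavily in the applications listed below.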

Key research themes

1. How can communication costs be minimized in parallel LU factorization on large-scale high-performance computing systems?

This research area focuses on deriving theoretical lower bounds for data movement (communication volume) in parallel LU factorization algorithms and designing practical algorithms that approach these bounds. Minimizing communication costs is critical because data movement dominates runtime and energy consumption on distributed-memory and exascale computing systems, where LU factorization is widely used for solving linear systems in scientific computing.

Key finding: The paper derives a novel parallel I/O lower bound for LU factorization: communicated elements per processor scale as N^3 / (P √M), where N is matrix size, P number of processors, and M local memory size. Building on this... Read more
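As a quick illustration of how this bound behaves, the sketch below evaluates N^3 / (P √M) for a few local memory sizes. The function name lu_comm_lower_bound and all parameter values are hypothetical, chosen only to show the scaling, not taken from the paper.

```python
# Illustrative evaluation of the stated parallel I/O lower bound:
# elements communicated per processor ~ N**3 / (P * sqrt(M)).
from math import sqrt

def lu_comm_lower_bound(N, P, M):
    """Asymptotic lower bound (up to constant factors) on elements
    communicated per processor during parallel LU factorization."""
    return N**3 / (P * sqrt(M))

N = 100_000       # matrix dimension (hypothetical)
P = 4_096         # number of processors (hypothetical)
for M in (2**20, 2**24, 2**28):   # local memory sizes in elements (hypothetical)
    print(f"M = {M:>10d}: ~{lu_comm_lower_bound(N, P, M):.3e} elements/processor")
# Increasing local memory from M to M' reduces the bound by a factor of sqrt(M'/M).
```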

2. What algorithmic and data-structural techniques enable efficient, scalable parallel LU factorization of hierarchical (H-) matrices with dynamic block structures?

Hierarchical matrices (H-matrices) arising from boundary element and partial differential equations provide data-sparse representations enabling efficient approximate factorizations. Parallelizing LU factorization on modern multicore and manycore architectures requires exploiting nested task parallelism with dynamic, non-uniform data structures due to low-rank blocks whose sizes evolve during computation. Addressing this challenge involves advanced task programming models that can manage dependencies despite changing memory layouts while maximizing concurrency and maintaining fine-grained parallel efficiency.

Key finding: The authors develop a task-parallel implementation of LU factorization for H-matrices using the OmpSs programming model supporting weak dependencies and early release, which enables nested fine-grained concurrency. They... Read more
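The sketch below is not the authors' OmpSs/H-matrix code; it is a plain dense, unpivoted, sequential rendering of the tiled LU task structure (factor a diagonal block, triangular-solve the panels, update the trailing submatrix) that such task-parallel runtimes schedule as a dependency graph. The block size nb and the diagonally dominant test matrix are assumptions for the example; H-matrix low-rank blocks and dynamic layouts are not modeled.

```python
import numpy as np
from scipy.linalg import solve_triangular

def lu_nopiv(A):
    """Unblocked LU without pivoting; overwrites A with L (strict lower, unit diag) and U."""
    for j in range(A.shape[0] - 1):
        A[j+1:, j] /= A[j, j]
        A[j+1:, j+1:] -= np.outer(A[j+1:, j], A[j, j+1:])

def tiled_lu(A, nb):
    """Blocked right-looking LU without pivoting, executed as a sequence of 'tasks'."""
    n = A.shape[0]
    for k in range(0, n, nb):
        kb = slice(k, min(k + nb, n))
        rb = slice(min(k + nb, n), n)
        lu_nopiv(A[kb, kb])                                   # "getrf" task on the diagonal block
        if rb.start < n:
            Lkk = np.tril(A[kb, kb], -1) + np.eye(kb.stop - kb.start)
            Ukk = np.triu(A[kb, kb])
            A[kb, rb] = solve_triangular(Lkk, A[kb, rb],      # "trsm" tasks on the row panel
                                         lower=True, unit_diagonal=True)
            A[rb, kb] = solve_triangular(Ukk.T, A[rb, kb].T,  # "trsm" tasks on the column panel
                                         lower=True).T
            A[rb, rb] -= A[rb, kb] @ A[kb, rb]                # "gemm" tasks: trailing update
    return A

# Check on a diagonally dominant matrix (so no pivoting is required).
n, nb = 8, 3
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n)) + n * np.eye(n)
A = M.copy()
tiled_lu(A, nb)
L, U = np.tril(A, -1) + np.eye(n), np.triu(A)
assert np.allclose(L @ U, M)
```

A task runtime would turn each block operation above into a task and derive the dependencies (a gemm on block (i, j) waits for the trsm tasks producing its panels), which is exactly what becomes difficult when block sizes change dynamically, as in the H-matrix case.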

3. How can the relationship between time and energy consumption in multithreaded LU factorization implementations be quantitatively characterized and optimized on multicore processors?

This research theme examines the scalability of LU factorization algorithms in terms of both execution time and energy consumption using multithreading and dynamic voltage and frequency scaling (DVFS) techniques. Understanding these correlations and tradeoffs is essential for optimizing algorithm implementations for energy-efficient high-performance computing, especially as energy constraints become paramount. The goal is to balance performance and power to minimize overall energy use without sacrificing scalability.

Key finding: The study experimentally demonstrates strong correlations between execution time and energy consumption in multithreaded LU factorization (with and without pivoting) and Cholesky factorizations on an Intel Xeon Gold multicore... Read more
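A hedged sketch of the kind of analysis described: energy is modeled as average power times runtime, and a Pearson coefficient quantifies how strongly time and energy track each other across thread counts. All numbers below are invented placeholders for the sake of the example, not measurements from the study.

```python
import numpy as np

threads     = np.array([1, 2, 4, 8, 16, 32])
time_s      = np.array([120.0, 63.0, 34.0, 19.0, 12.0, 9.0])    # hypothetical runtimes
avg_power_w = np.array([55.0, 70.0, 95.0, 140.0, 210.0, 300.0])  # hypothetical average power draw
energy_j    = avg_power_w * time_s                               # E = P_avg * t

r = np.corrcoef(time_s, energy_j)[0, 1]   # Pearson correlation between time and energy
print(f"Pearson r(time, energy) = {r:.3f}")
# A high |r| would indicate that, for a given machine and code, minimizing time also
# tends to minimize energy; DVFS shifts the power column and can change that tradeoff.
```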

All papers in LU factorization

The sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used in iterative linear solvers. The same sparse matrix is multiplied by a dense vector repeatedly in these solvers. Matrices with irregular sparsity patterns... more
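A minimal sketch of the SpMxV kernel in compressed sparse row (CSR) form, cross-checked against SciPy; the gather x[indices[k]] in the inner loop is where irregular sparsity patterns hurt memory locality.

```python
import numpy as np
from scipy.sparse import csr_matrix

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A stored in CSR form (indptr, indices, data)."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):                       # one output row at a time
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]       # irregular gather from x
    return y

A = csr_matrix(np.array([[4.0, 0.0, 1.0],
                         [0.0, 3.0, 0.0],
                         [2.0, 0.0, 5.0]]))
x = np.array([1.0, 2.0, 3.0])
assert np.allclose(spmv_csr(A.indptr, A.indices, A.data, x), A @ x)
```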
The capability to rapidly execute power flow (PF) calculations allows engineers to remain more confident in the reliability, security, and economical operation of their system in the case of planned or unplanned... more
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or... more
Matrix multiplication is a cornerstone of computational mathematics. Standard algorithms for 2x2 matrices require 8 scalar multiplications, while Strassen's algorithm reduces this to 7. This paper introduces and details Surya Matrix... more
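For reference, here is the standard Strassen recombination for a 2x2 product, shown only to illustrate the 7-versus-8 multiplication count mentioned above; this is not the Surya method the paper introduces.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply 2x2 matrices with 7 scalar multiplications instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```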
For many finite element problems, when represented as sparse matrices, iterative solvers are found to be unreliable because they can impose computational bottlenecks. Early pioneering work by Duff et al. explored an alternative strategy... more
In this paper, we derive the general expression of the r-th power (r ∈ N) for one type of tridiagonal matrix.
In this study, an algorithm for computing the inverse of periodic k-banded matrices, which are needed for solving differential equations by finite differences, the solution of partial differential equations and the solution... more
A new parallel algorithm for the LU factorization of a given dense matrix A is described. The case of banded matrices is also considered. This algorithm can be combined with Sameh and Brent's [SIAM J. Numer. Anal. 14, 1101–...]... more
In this paper, we compare inverse iteration algorithms on the PowerXCell 8i processor, which is known as a heterogeneous environment. When some of the eigenvalues are close together or there are clusters of eigenvalues,... more
Figure caption excerpt: an example with a memory of size 2. The graph of input data dependencies is shown on the left; the figure on the right corresponds to the partition and schedule produced by the scheduler. Deque Model Data Aware Ready (DMDAR): Deque... more
In the current article, the authors present a new recursive symbolic computational algorithm, which never breaks down, for inverting general periodic pentadiagonal and anti-pentadiagonal matrices. It is a natural generalization of the... more
The Forrest-Tomlin update has stood the test of time within many generations of commercial mathematical programming systems. Its ease of implementation leads to high efficiency and evidently acceptable reliability. We review its relation... more
Two unresolved issues regarding dynamic programming over an infinite time horizon are addressed within this dissertation. Previous research uses policy improvement to find a strong-present-value optimal policy in such systems, but the... more
We describe a set of procedures for computing and updating an LU factorization of a sparse matrix A, where A may be square (possibly singular) or rectangular. The procedures include a Markowitz factorization and a Bartels-Golub update,... more
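As a stand-in illustration (SciPy's SuperLU wrapper, not the package described in the paper), the sketch below factors a sparse matrix once and reuses the factors for several right-hand sides; Markowitz-style ordering and Bartels-Golub updates are not exposed by this interface and are not shown.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 0.0],
                         [0.0, 2.0, 5.0]]))
lu = splu(A)                           # sparse LU with a fill-reducing column ordering
for b in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 2.0])):
    x = lu.solve(b)                    # each solve reuses the stored factors
    assert np.allclose(A @ x, b)
```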
The calculation of overlaps between many-electron wave functions at different nuclear geometries during nonadiabatic dynamics simulations requires the evaluation of a large number of determinants of matrices that differ only in a few... more
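A generic way to exploit that structure (not necessarily the authors' algorithm) is the matrix determinant lemma, det(A + u v^T) = det(A) (1 + v^T A^{-1} u), which lets a single factorization serve many single-row modifications. The sketch below is a hypothetical NumPy/SciPy illustration with made-up sizes.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
n, i = 6, 2                               # hypothetical matrix size and row index
A = rng.standard_normal((n, n))
new_row = rng.standard_normal(n)          # replacement for row i of A

lu_piv = lu_factor(A)                     # factor the base matrix once
det_A = np.linalg.det(A)                  # could equally be read off the LU factors

u = np.zeros(n); u[i] = 1.0               # A' = A + e_i (new_row - A[i])^T replaces row i
v = new_row - A[i]
det_A_new = det_A * (1.0 + v @ lu_solve(lu_piv, u))   # matrix determinant lemma

A_new = A.copy(); A_new[i] = new_row
assert np.allclose(det_A_new, np.linalg.det(A_new))
```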
In this document, we are concerned with the effects of data layouts for nonsquare processor meshes on the implementation of common dense linear algebra kernels such as matrix-matrix multiplication, LU factorizations, or eigenvalue... more
This work presents an innovative approach adopted for the development of a new numerical software framework for accelerating dense linear algebra calculations and its application within an engineering context. In particular, response... more
Most recent HPC platforms have heterogeneous nodes composed of a combination of multi-core CPUs and accelerators, like GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a... more
In this paper, we present a comparison of scheduling strategies for heterogeneous multi-CPU and multi-GPU architectures. We designed and evaluated four scheduling strategies on top of XKaapi runtime: work stealing, data-aware work... more
The high computational cost involved in modeling of the progressive fracture simulations using large discrete lattice networks stems from the requirement to solve a new large set of linear equations every time a new lattice bond is... more
This special issue of Linear Algebra and its Applications honours Pauline van den Driessche, who celebrates her sixty-fifth birthday in 2006. Pauline has made significant contributions to mathematics, especially in linear algebra and... more
A class of matrices that simultaneously generalizes the M-matrices and the inverse M-matrices is brought forward and its properties are reviewed. It is interesting to see how this class bridges the properties of the matrices it... more
Calculating the log-determinant of a matrix is useful for statistical computations used in machine learning, such as generative learning which uses the log-determinant of the covariance matrix to calculate the log-likelihood of model... more
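A small sketch of that log-determinant computation: factor once, then sum the logs of |u_ii|, which avoids the overflow and underflow of forming the determinant directly. The synthetic covariance-like matrix is an assumption for the example.

```python
import numpy as np
from scipy.linalg import lu_factor

rng = np.random.default_rng(0)
S = rng.standard_normal((200, 50))
C = S.T @ S + 1e-3 * np.eye(50)                # synthetic SPD, covariance-like matrix

LU, piv = lu_factor(C)
logdet = np.sum(np.log(np.abs(np.diag(LU))))   # log|det(C)| = sum of log|u_ii|
sign, logdet_ref = np.linalg.slogdet(C)
assert np.isclose(logdet, logdet_ref)
```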
This paper extends the ideas behind Bareiss's fraction-free Gauss elimination algorithm in a number of directions. First, in the realm of linear algebra, algorithms are presented for fraction-free LU "factorization" of a... more
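To illustrate the fraction-free idea that Bareiss's elimination is built on (the sketch below is the classical integer-matrix determinant version, not the paper's extensions): every division is exact, so no fractions appear and intermediate entry growth stays controlled.

```python
def bareiss_det(M):
    """Determinant of an integer matrix via Bareiss fraction-free elimination."""
    A = [list(map(int, row)) for row in M]
    n, sign, prev = len(A), 1, 1
    for k in range(n - 1):
        if A[k][k] == 0:                         # simple row swap if the pivot vanishes
            swap = next((r for r in range(k + 1, n) if A[r][k] != 0), None)
            if swap is None:
                return 0
            A[k], A[swap] = A[swap], A[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # this division is always exact (the fraction-free property)
                A[i][j] = (A[i][j] * A[k][k] - A[i][k] * A[k][j]) // prev
            A[i][k] = 0
        prev = A[k][k]
    return sign * A[-1][-1]

assert bareiss_det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]) == -3
```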
Steganography is one of the most important tools in the data security field as there is a huge amount of data transferred each moment over the internet. Hiding secret messages in an image has been widely used because the images are mostly... more
In this work an architecture of an automatically tuned linear algebra library proposed in previous works is extended in order to adapt it to platforms where both the CPU load and the network traffic vary. During the installation process... more
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and memory on a NUMA... more
In this paper we study properties of definite quadratic eigenproblems. Free vibrations of fluid-solid structures are governed by a nonsymmetric eigenvalue problem. This problem can be transformed into a definite quadratic eigenvalue... more
In this paper, we describe a reliable symbolic computational algorithm for inverting general cyclic heptadiagonal matrices by using parallel computing along with recursion. The algorithm is implementable to the Computer Algebra... more
The tondo Die Anbetung der hl. drei Könige (The Adoration of the Magi), today in the Gemäldegalerie in Berlin, is attributed to Domenico Veneziano. In the foreground, a group of noble figures in magnificent garments, among them the Three Kings, attests to... more
Maurice Potron (1872-1942) was a French Jesuit and mathematician whose main source of inspiration in economics was the encyclical Rerum Novarum. With virtually no knowledge of economic theory, he wrote down a linear model of production in... more
One wishes to remove k − 1 edges of a vertex-weighted tree T such that the weights of the k induced connected components are approximately the same. How well can one do it? In this paper, we investigate such k-separators for quasi-binary... more
We consider the finite element environment Getfem++, which is a C++ library of generic finite element functionalities and allows for parallel distributed data manipulation and assembly. For the solution of the large sparse linear... more
Solving linear systems is a fundamental task in several areas of mathematics and engineering, playing a crucial role in solving real-world problems. MATLAB, a powerful numerical computing platform, offers a wide range of tools and... more
This work deals with the numerical solution of systems of oscillatory second-order differential equations which often arise from the semi-discretization in space of partial differential equations. Since these differential equations... more
Subspace channel estimation methods have been studied widely, where the subspace of the covariance matrix is decomposed to separate the signal subspace from the noise subspace. The decomposition is normally done... more
Linear feature space transformations are often used for speaker or environment adaptation. Usually, numerical methods are sought to obtain solutions. In this paper, we derive a closed-form solution to ML estimation of full feature... more
The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for personal research or study, educational, or not-for-profit purposes provided that: • a full... more
Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are... more
A nonsingular real matrix A is said to be inverse-positive if all the elements of its inverse are nonnegative. This class of matrices contains the M-matrices, from which inherit some of their properties and applications, especially in... more
In this work, we consider the problem of calculating the generalized Moore–Penrose inverse, which is essential in many applications of graph theory. We propose an algorithm for massively parallel systems based on the recursive... more
We address the parallelization of the LU factorization of hierarchical matrices (H-matrices) arising from boundary element methods. Our approach exploits task-parallelism via the OmpSs programming model and runtime, which discovers the... more
We present a prototype task-parallel algorithm for the solution of hierarchical symmetric positive definite linear systems via the ℋ-Cholesky factorization that builds upon the parallel programming standards and associated runtimes for... more
We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario... more