Sivasankaran Rajamanickam

Papers by Sivasankaran Rajamanickam

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate

by William Hager and Sivasankaran Rajamanickam

ACM Transactions on Mathematical Software, 2008

CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AA^T, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLAB interfaces. It appears in MATLAB 7.2 as x=A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.
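
To make the update idea in this abstract concrete, here is a minimal dense sketch in pure Python. It is not CHOLMOD's sparse supernodal code and does not use its API; the function names `cholesky`, `rank_one_update`, and `solve_lower` are hypothetical and chosen only for this illustration. It shows how a factor L of A can be turned into the factor of A + ww^T without refactorizing, using the standard textbook rank-one update.

```python
import math

def cholesky(a):
    """Dense lower-triangular factor L with a = L L^T (a must be SPD)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)  # raises ValueError if d is negative
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l

def rank_one_update(l, w):
    """Return the factor of L L^T + w w^T, touching one column at a time."""
    n = len(w)
    l = [row[:] for row in l]  # work on copies; inputs stay unchanged
    w = list(w)
    for k in range(n):
        r = math.hypot(l[k][k], w[k])
        c = r / l[k][k]
        s = w[k] / l[k][k]
        l[k][k] = r
        for i in range(k + 1, n):
            l[i][k] = (l[i][k] + s * w[i]) / c
            w[i] = c * w[i] - s * l[i][k]
    return l

def solve_lower(l, b):
    """Forward substitution for the triangular system L x = b."""
    x = []
    for i, row in enumerate(l):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x
```

The dense update costs O(n^2) operations against O(n^3) for a fresh dense factorization, which is the motivation for offering update/downdate routines alongside factorization.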

Zoltan2: Next-Generation Combinatorial Toolkit

by Karen Devine, Sivasankaran Rajamanickam, and V. Leung

BFS and Coloring-Based Parallel Algorithms for Strongly Connected Components and Related Problems

2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A Hybrid Approach for Parallel Transistor-Level Full-Chip Circuit Simulation

Lecture Notes in Computer Science, 2015

High-Performance Graph Analytics on Manycore Processors

2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

by Mark Hoemmen, Ichitaro Yamazaki, and Sivasankaran Rajamanickam

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014

Scalable matrix computations on large scale-free graphs using 2D graph partitioning

by Sivasankaran Rajamanickam and Karen Devine

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13), 2013

Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two-dimensional layouts (edge partitioning) can have significant advantages over traditional one-dimensional layouts. However, simple 2D block distribution does not use the structure of the graph, and more advanced 2D partitioning methods are too expensive for large graphs. We propose a new two-dimensional partitioning algorithm that combines graph partitioning with 2D block distribution. The computational cost of the algorithm is essentially the same as 1D graph partitioning. We study the performance of sparse matrix-vector multiplication (SpMV) for scale-free graphs from the web and social networks using several different partitioners and both 1D and 2D data layouts. We show that SpMV run time is reduced by exploiting the graph's structure. Contrary to popular belief, we observe that current graph and hypergraph partitioners often yield relatively good partitions on scale-free graphs. We demonstrate that our new 2D partitioning method consistently outperforms the other methods considered, for both SpMV and an eigensolver, on matrices with up to 1.6 billion nonzeros using up to 16,384 cores.
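
The contrast between 1D and 2D layouts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not necessarily the algorithm of the paper: the mapping in `make_owner_2d` (process row from the row's part, process column from the column's part) is one plausible way to combine a 1D vertex partition with a pr x pc process grid, and all function names are hypothetical. The sketch counts, for each process, how many other processes it must receive input-vector entries from before an SpMV.

```python
from collections import defaultdict

def make_owner_1d(part):
    """1D row layout: nonzero a_ij is stored by the owner of row i."""
    return lambda i, j: part[i]

def make_owner_2d(part, pr):
    """2D layout on a pr x pc grid, built from a 1D partition into pr*pc parts.
    Assumed mapping for this sketch; diagonal entries stay with their part."""
    def owner(i, j):
        grid_row = part[i] % pr    # process row comes from the row's part
        grid_col = part[j] // pr   # process column comes from the column's part
        return grid_row + pr * grid_col
    return owner

def expand_partners(nonzeros, part, owner):
    """For each process, the set of processes it receives x entries from.
    Vector entry x_j is assumed to live on process part[j]."""
    partners = defaultdict(set)
    for i, j in nonzeros:
        q = owner(i, j)
        if part[j] != q:
            partners[q].add(part[j])
    return partners

def max_partners(partners):
    return max((len(s) for s in partners.values()), default=0)

# Toy scale-free-like input: a star graph whose hub (vertex 0) touches every vertex.
n = 8
part = [0, 0, 1, 1, 2, 2, 3, 3]          # 1D partition into 4 parts
nonzeros = ([(i, i) for i in range(n)]
            + [(0, j) for j in range(1, n)]
            + [(j, 0) for j in range(1, n)])

one_d = max_partners(expand_partners(nonzeros, part, make_owner_1d(part)))
two_d = max_partners(expand_partners(nonzeros, part, make_owner_2d(part, pr=2)))
```

With this mapping every process that needs x_j sits in the same process column as the owner of x_j, so the expand phase involves at most pr - 1 partners per process. The price is a second, fold phase that sums partial results along process rows (at most pc - 1 partners), which a 1D row layout does not need.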

Using architecture information and real-time resource state to reduce power consumption and communication costs in parallel applications

Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm

IEEE Transactions on Parallel and Distributed Systems, 2015

ShyLU: A Hybrid-Hybrid Solver for Multicore Platforms

With the ubiquity of multicore processors, it is crucial that solvers adapt to the hierarchical structure of modern architectures. We present ShyLU, a "hybrid-hybrid" solver for general sparse linear systems that is hybrid in two ways: First, it combines direct and iterative methods. The iterative part is based on approximate Schur complements where we compute the approximate Schur complement using a value-based dropping strategy or structure-based probing strategy.

PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks

2014 IEEE International Conference on Big Data (Big Data), 2014

Multithreaded Algorithms for Maximum Matching in Bipartite Graphs

2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012

We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multicore and massively multithreaded computers. As computers with larger numbers of slower cores dominate the commodity processor market, the design of multithreaded algorithms to solve large matching problems becomes a necessity. Recent work on serial algorithms for the matching problem has shown that their performance is sensitive to the order in which the vertices are processed for matching. In a multithreaded environment, imposing a serial order in which vertices are considered for matching would lead to loss of concurrency and performance. But this raises the question: Would parallel matching algorithms on multithreaded machines improve performance over a serial algorithm?
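
For readers unfamiliar with the problem, the following is a minimal serial baseline in Python: the classic augmenting-path method for maximum-cardinality bipartite matching. It is not one of the multithreaded algorithms evaluated in the paper, and the function name and the `order` parameter are hypothetical. The `order` argument makes explicit the vertex processing order that the abstract says serial performance depends on; the matching size is the same for any order, only the amount of search work changes.

```python
def maximum_matching(adj, n_right, order=None):
    """Maximum-cardinality matching in a bipartite graph.

    adj[u] lists the right-side vertices adjacent to left vertex u.
    Returns (size, match_right), where match_right[v] is the left vertex
    matched to right vertex v, or -1 if v is unmatched.
    """
    match_right = [-1] * n_right

    def try_augment(u, seen):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in (order if order is not None else range(len(adj))):
        if try_augment(u, set()):
            size += 1
    return size, match_right

# Three left vertices, three right vertices.
adj = [[0, 1], [0], [1, 2]]
size, match_right = maximum_matching(adj, n_right=3)
size_reversed, _ = maximum_matching(adj, n_right=3, order=[2, 1, 0])
```

The recursion follows one augmenting path at a time, so very long paths can hit Python's recursion limit; an explicit stack would be needed for large graphs.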

Exploiting Geometric Partitioning in Task Mapping for Parallel Computers

2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by "nearby" cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app MiniGhost, our mapping method reduced execution time 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, MiniMD, our mapping method reduced communication time by 26% on average on 6,144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD.
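
The core idea, partitioning tasks and processors with the same geometric method and pairing corresponding parts, can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses plain recursive coordinate bisection as a stand-in for the geometric partitioner, assumes exactly one task per core, and the names `rcb_order` and `map_tasks_to_cores` are hypothetical.

```python
def rcb_order(points, ids=None):
    """Order point ids by recursive coordinate bisection.

    At each level the ids are sorted along the dimension with the largest
    extent and split in half, so ids that end up adjacent in the returned
    list are geometrically close.
    """
    if ids is None:
        ids = list(range(len(points)))
    if len(ids) <= 1:
        return ids
    dims = range(len(points[ids[0]]))
    dim = max(dims, key=lambda d: max(points[i][d] for i in ids)
                                  - min(points[i][d] for i in ids))
    ids = sorted(ids, key=lambda i: points[i][dim])
    half = len(ids) // 2
    return rcb_order(points, ids[:half]) + rcb_order(points, ids[half:])

def map_tasks_to_cores(task_coords, core_coords):
    """Pair the k-th task and k-th core in their respective bisection orders.

    Both point sets have the same size, so they are split into parts of the
    same sizes at every level and corresponding leaves are matched.
    """
    if len(task_coords) != len(core_coords):
        raise ValueError("this sketch assumes one task per core")
    return dict(zip(rcb_order(task_coords), rcb_order(core_coords)))

# Four tasks on a line; four allocated cores at scattered network coordinates.
tasks = [(0,), (1,), (2,), (3,)]
cores = [(10,), (0,), (30,), (20,)]
mapping = map_tasks_to_cores(tasks, cores)
```

In the toy example neighboring tasks land on neighboring cores even though the core ids are listed in a scrambled order, which mirrors the sparse-allocation setting the abstract describes.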

Electrical modeling and simulation for stockpile stewardship

by Sivasankaran Rajamanickam and Heidi Thornquist

XRDS: Crossroads, The ACM Magazine for Students, 2013

Towards Extreme-Scale Simulations for Low Mach Fluids with Second-Generation Trilinos

by Mark Hoemmen, Stefan P Domino, and Sivasankaran Rajamanickam

Parallel Processing Letters, 2014

Towards Extreme-Scale Simulations with Next-Generation Trilinos: A Low Mach Fluid Application Case Study

by Mark Hoemmen, Stefan P Domino, and Sivasankaran Rajamanickam

2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Trilinos is an object-oriented software framework for the solution of large-scale, complex multiphysics engineering and scientific problems. While the original version of Trilinos was designed for highly scalable solutions for large problems, the need for increasingly higher fidelity simulations has pushed the problem sizes beyond what could have been envisioned two decades ago. When problem sizes exceed a billion elements even highly scalable applications and solver stacks require a complete revision. The next-generation Trilinos employs C++ templates in order to solve arbitrarily large problems and enable extreme-scale simulations. We present a case study that involves integration of Trilinos with an engineering application (Sierra low Mach module/Nalu), involving the simulation of low Mach fluid flow for problems of size up to nine billion elements. Through the use of improved algorithms and better software engineering practices, we demonstrate good weak scaling for the matrix assembly and solve for the engineering application for up to a nine billion element fluid flow large eddy simulation (LES) problem on unstructured meshes with a 27 billion row matrix on 131,072 cores of a Cray XE6 platform.

Poster: a hybrid-hybrid solver for manycore platforms

Tutorial: the Zoltan toolkit

by C. Chevalier, Sivasankaran Rajamanickam, and Karen Devine

Load balancing with Zoltan and Isorropia

by C. Chevalier, Sivasankaran Rajamanickam, and Karen Devine
