Reliable Simulations on Many-Core Architectures
Abstract
For the investigation of extrinsic pro-apoptotic signaling pathways, systems of stochastic differential equations are solved by Euler-Maruyama approximation; forces and moments between particles are computed in parallel on GPGPUs. Density Functional Theory on GPGPU: development and parallelization of Density Functional Theory (DFT) methods for tightly coupled GPGPU many-core architectures, with a focus on the parallelized computation of orbitals and density matrices. The matrix formulation allows the use of the developed fault-tolerant matrix operations (e.g., DGEMM).
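The fault-tolerant DGEMM mentioned above follows the algorithm-based fault tolerance (ABFT) scheme of Huang and Abraham cited in the references: the inputs are extended with checksum rows/columns, and the checksums of the product are re-verified after the GPU computation. The CUDA sketch below illustrates the idea under simple assumptions (a naive GEMM kernel, a small matrix size, column checksums only); it is not the implementation developed in this work.

// ABFT sketch: column-checksum matrix multiplication (after Huang & Abraham).
// The matrices and the kernel are illustrative, not the code developed here.
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Naive row-major GEMM kernel: C (M x N) = A (M x K) * B (K x N).
__global__ void gemm(const double* A, const double* B, double* C,
                     int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        double acc = 0.0;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int n = 4;        // working size (illustrative)
    const int M = n + 1;    // A carries an extra column-checksum row
    const int N = n + 1;    // B carries an extra row-checksum column
    std::vector<double> A(M * n), B(n * N), C(M * N);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            A[i * n + j] = i + j + 1.0;
            B[i * N + j] = (i == j) ? 2.0 : 0.5;
        }
    for (int j = 0; j < n; ++j) {               // checksum row of A
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += A[i * n + j];
        A[n * n + j] = s;
    }
    for (int i = 0; i < n; ++i) {               // checksum column of B
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += B[i * N + j];
        B[i * N + n] = s;
    }

    double *dA, *dB, *dC;
    cudaMalloc(&dA, A.size() * sizeof(double));
    cudaMalloc(&dB, B.size() * sizeof(double));
    cudaMalloc(&dC, C.size() * sizeof(double));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(double), cudaMemcpyHostToDevice);

    dim3 block(16, 16), grid((N + 15) / 16, (M + 15) / 16);
    gemm<<<grid, block>>>(dA, dB, dC, M, N, n);
    cudaMemcpy(C.data(), dC, C.size() * sizeof(double), cudaMemcpyDeviceToHost);

    // Re-verify the column checksums of the product: a mismatch indicates a
    // fault in the corresponding column of the result.
    bool ok = true;
    for (int j = 0; j < N; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += C[i * N + j];
        if (std::fabs(s - C[n * N + j]) > 1e-9 * std::fabs(s)) ok = false;
    }
    printf("ABFT column checksums %s\n", ok ? "verified" : "violated (fault detected)");
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}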
Related papers
Lecture Notes in Computer Science, 2013
Molecular dynamics simulations allow us to study the behavior of complex biomolecular systems. These simulations suffer from a large computational complexity that leads to simulation times of several weeks to recreate just a few microseconds of a molecule's motion, even on high-performance computing platforms. In recent years, state-of-the-art molecular dynamics algorithms have benefited from the parallel computing capabilities of multicore systems, as well as GPUs used as co-processors. In this paper we present a parallel molecular dynamics algorithm for on-board multi-GPU architectures. We parallelize a state-of-the-art molecular dynamics algorithm at two levels. We employ a spatial partitioning approach to simulate the dynamics of one portion of a molecular system on each GPU, and we take advantage of direct communication between GPUs to transfer data among portions. We also parallelize the simulation algorithm to exploit the multi-processor computing model of GPUs. Most importantly, we present novel parallel algorithms to update the spatial partitioning and set up data-transfer packages on each GPU. We demonstrate the feasibility and scalability of our proposal through a comparative study with NAMD, a well-known parallel molecular dynamics implementation.
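As a rough illustration of the spatial-partitioning step described above, the CUDA sketch below bins particles into uniform grid cells from their coordinates; which GPU owns a cell, and which cells form the boundary layer to be exchanged between devices, can then be derived from the cell index. Grid dimensions, data layout, and kernel names are assumptions for illustration, not the paper's algorithm.

// Spatial-binning sketch (illustrative): assign each particle to a uniform grid
// cell from its coordinates; per-GPU cell ownership is derived from this index.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void binParticles(const float3* pos, int* cellIdx, int nParticles,
                             float cellEdge, int nx, int ny, int nz) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nParticles) return;
    // Clamp to the grid so particles slightly outside the box stay valid.
    int cx = min(nx - 1, max(0, (int)(pos[i].x / cellEdge)));
    int cy = min(ny - 1, max(0, (int)(pos[i].y / cellEdge)));
    int cz = min(nz - 1, max(0, (int)(pos[i].z / cellEdge)));
    cellIdx[i] = (cz * ny + cy) * nx + cx;   // linear cell index
}

int main() {
    const int nParticles = 1 << 16;
    const float cellEdge = 1.0f;             // e.g. the nonbonded cutoff radius
    const int nx = 32, ny = 32, nz = 32;

    std::vector<float3> h_pos(nParticles);
    for (int i = 0; i < nParticles; ++i)     // synthetic coordinates inside the box
        h_pos[i] = make_float3((i % 317) * 0.1f, (i % 211) * 0.15f, (i % 97) * 0.3f);

    float3* d_pos; int* d_cell;
    cudaMalloc(&d_pos, nParticles * sizeof(float3));
    cudaMalloc(&d_cell, nParticles * sizeof(int));
    cudaMemcpy(d_pos, h_pos.data(), nParticles * sizeof(float3), cudaMemcpyHostToDevice);

    binParticles<<<(nParticles + 255) / 256, 256>>>(d_pos, d_cell, nParticles,
                                                    cellEdge, nx, ny, nz);

    std::vector<int> h_cell(nParticles);
    cudaMemcpy(h_cell.data(), d_cell, nParticles * sizeof(int), cudaMemcpyDeviceToHost);
    printf("particle 0 -> cell %d\n", h_cell[0]);
    cudaFree(d_pos); cudaFree(d_cell);
    return 0;
}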
Journal of Chemical Theory and Computation
We present the extension of the Tinker-HP package (Lagardère et al., Chem. Sci. 2018, 9, 956−972) to the use of Graphics Processing Unit (GPU) cards to accelerate molecular dynamics simulations using polarizable many-body force fields. The new high-performance module allows for an efficient use of single- and multiple-GPU architectures ranging from research laboratories to modern supercomputer centers. After detailing an analysis of our general scalable strategy that relies on OpenACC and CUDA, we discuss the various capabilities of the package. Among them, the multi-precision possibilities of the code are discussed. While an efficient double-precision implementation is provided to preserve the possibility of fast reference computations, we show that lower-precision arithmetic is preferable, providing similar accuracy for molecular dynamics while exhibiting superior performance. As Tinker-HP is mainly dedicated to accelerating simulations using new-generation point-dipole polarizable force fields, we focus our study on the implementation of the AMOEBA model. Testing various NVIDIA platforms including 2080Ti, 3090, V100, and A100 cards, we provide illustrative benchmarks of the code for single- and multi-card simulations on large biosystems encompassing up to millions of atoms. The new code strongly reduces time to solution and offers the best performance to date obtained using the AMOEBA polarizable force field. Perspectives toward the strong-scaling performance of our multinode massive parallelization strategy, unsupervised adaptive sampling, and large-scale applicability of the Tinker-HP code in biophysics are discussed. The present software has been released in advance on GitHub in connection with the High Performance Computing community's COVID-19 research efforts and is free for academics (see https://github.com/TinkerTools/tinker-hp).
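A minimal sketch of the mixed-precision trade-off discussed above, under generic assumptions rather than Tinker-HP's OpenACC/CUDA implementation: pair terms are evaluated in single precision, while the rounding-sensitive accumulator is kept in double precision. The kernel, the toy pair term, and the data are illustrative; compile with -arch=sm_60 or newer for the double-precision atomicAdd.

// Mixed-precision sketch (generic): single-precision pair terms,
// double-precision accumulator.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void energySinglePairsDoubleSum(const float* r2, int n, double* energy) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Pair term evaluated in single precision (fast on consumer cards).
    float inv2 = 1.0f / r2[i];
    float pair = -(inv2 * inv2 * inv2);      // toy 1/r^6 attraction
    // Accumulate in double precision so millions of small terms keep significance.
    atomicAdd(energy, (double)pair);
}

int main() {
    const int n = 1 << 20;
    float* h_r2 = new float[n];
    for (int i = 0; i < n; ++i) h_r2[i] = 4.0f + (i % 100) * 0.01f;  // squared distances

    float* d_r2; double* d_e;
    cudaMalloc(&d_r2, n * sizeof(float));
    cudaMalloc(&d_e, sizeof(double));
    cudaMemset(d_e, 0, sizeof(double));
    cudaMemcpy(d_r2, h_r2, n * sizeof(float), cudaMemcpyHostToDevice);

    energySinglePairsDoubleSum<<<(n + 255) / 256, 256>>>(d_r2, n, d_e);

    double e = 0.0;
    cudaMemcpy(&e, d_e, sizeof(double), cudaMemcpyDeviceToHost);
    printf("accumulated energy: %.10f\n", e);
    cudaFree(d_r2); cudaFree(d_e); delete[] h_r2;
    return 0;
}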
Computer Physics Communications, 2013
A new precision model is proposed for the acceleration of all-atom classical molecular dynamics (MD) simulations on graphics processing units (GPUs). This precision model replaces double precision arithmetic with fixed point integer arithmetic for the accumulation of force components as compared to a previously introduced model that uses mixed single/double precision arithmetic. This significantly boosts performance on modern GPU hardware without sacrificing numerical accuracy. We present an implementation for NVIDIA GPUs of both generalized Born implicit solvent simulations as well as explicit solvent simulations using the particle mesh Ewald (PME) algorithm for long-range electrostatics using this precision model. Tests demonstrate both the performance of this implementation as well as its numerical stability for constant energy and constant temperature biomolecular MD as compared to a double precision CPU implementation and double and mixed single/double precision GPU implementations.
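A minimal sketch of the fixed-point accumulation idea described above, assuming a generic 2^40 scale factor and a toy contribution (not the paper's precision model): force contributions are converted to 64-bit integers and summed with integer atomics, which are deterministic and avoid double-precision atomics; the result is converted back to floating point only once, after the reduction.

// Fixed-point force accumulation sketch (generic scale factor, toy contributions).
#include <cstdio>
#include <cuda_runtime.h>

#define FORCE_SCALE 1099511627776.0   // 2^40: fractional precision plus headroom

// Convert a single-precision contribution to 64-bit fixed point and accumulate
// with an integer atomic; two's-complement wraparound handles negative values.
__device__ void atomicAddFixed(long long* acc, float value) {
    long long fixed = (long long)((double)value * FORCE_SCALE);
    atomicAdd((unsigned long long*)acc, (unsigned long long)fixed);
}

// Every thread contributes one toy pairwise term to a single accumulator.
__global__ void accumulate(long long* forceAcc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddFixed(forceAcc, 1.0f / (i + 1));
}

int main() {
    const int n = 1 << 20;
    long long* d_acc;
    cudaMalloc(&d_acc, sizeof(long long));
    cudaMemset(d_acc, 0, sizeof(long long));

    accumulate<<<(n + 255) / 256, 256>>>(d_acc, n);

    long long h_acc = 0;
    cudaMemcpy(&h_acc, d_acc, sizeof(long long), cudaMemcpyDeviceToHost);
    // Convert back to floating point only once, after the reduction is complete.
    printf("accumulated force component: %.9f\n", (double)h_acc / FORCE_SCALE);
    cudaFree(d_acc);
    return 0;
}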
Proteins: Structure, Function, and Bioinformatics, 2010
Journal of Computational Physics, 2008
We present a GPU implementation of LAMMPS, a widely used parallel molecular dynamics (MD) software package, and show 5x to 13x single-node speedups versus the CPU-only version of LAMMPS. This new CUDA package for LAMMPS also enables multi-GPU simulation on hybrid heterogeneous clusters, using MPI for inter-node communication, CUDA kernels on the GPU for all methods working with particle data, and standard LAMMPS C++ code for CPU execution. Cell and neighbor list approaches are compared for best performance on GPUs, with thread-per-atom and block-per-atom neighbor list variants showing best performance at low and high neighbor counts, respectively. Computational performance results of GPU-enabled LAMMPS are presented for a variety of materials classes (e.g. biomolecules, polymers, metals, semiconductors), along with a speed comparison versus other available GPU-enabled MD software. Finally, we show strong and weak scaling performance on a CPU/GPU cluster using up to 128 dual-GPU nodes.
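To make the thread-per-atom versus block-per-atom distinction concrete, the CUDA sketch below contrasts the two traversal schemes over a flattened neighbor list: one thread walking an atom's entire list versus one block striding over the list and reducing in shared memory, which pays off at high neighbor counts. The toy pair term, data layout, and sizes are assumptions for illustration, not the LAMMPS GPU package code.

// Neighbor-list traversal sketch: thread-per-atom vs. block-per-atom (toy pair term).
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void forceThreadPerAtom(const float3* pos, const int* nbr, const int* nNbr,
                                   int maxNbr, float* fmag, int nAtoms) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nAtoms) return;
    float f = 0.f;
    for (int k = 0; k < nNbr[i]; ++k) {          // one thread walks atom i's list
        int j = nbr[i * maxNbr + k];
        float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y, dz = pos[i].z - pos[j].z;
        f += 1.f / (dx * dx + dy * dy + dz * dz + 1e-6f);
    }
    fmag[i] = f;
}

__global__ void forceBlockPerAtom(const float3* pos, const int* nbr, const int* nNbr,
                                  int maxNbr, float* fmag) {
    int i = blockIdx.x;                          // one block owns atom i
    __shared__ float partial[128];
    float f = 0.f;
    for (int k = threadIdx.x; k < nNbr[i]; k += blockDim.x) {   // list strided over threads
        int j = nbr[i * maxNbr + k];
        float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y, dz = pos[i].z - pos[j].z;
        f += 1.f / (dx * dx + dy * dy + dz * dz + 1e-6f);
    }
    partial[threadIdx.x] = f;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {              // shared-memory reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) fmag[i] = partial[0];
}

int main() {
    const int nAtoms = 1024, maxNbr = 64;
    std::vector<float3> pos(nAtoms);
    std::vector<int> nbr(nAtoms * maxNbr), nNbr(nAtoms, maxNbr);
    for (int i = 0; i < nAtoms; ++i) {
        pos[i] = make_float3((float)(i % 32), (float)((i / 32) % 32), (float)(i / 1024));
        for (int k = 0; k < maxNbr; ++k) nbr[i * maxNbr + k] = (i + k + 1) % nAtoms;
    }
    float3* dPos; int *dNbr, *dNNbr; float* dF;
    cudaMalloc(&dPos, nAtoms * sizeof(float3));
    cudaMalloc(&dNbr, nbr.size() * sizeof(int));
    cudaMalloc(&dNNbr, nAtoms * sizeof(int));
    cudaMalloc(&dF, nAtoms * sizeof(float));
    cudaMemcpy(dPos, pos.data(), nAtoms * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(dNbr, nbr.data(), nbr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dNNbr, nNbr.data(), nAtoms * sizeof(int), cudaMemcpyHostToDevice);

    forceThreadPerAtom<<<(nAtoms + 127) / 128, 128>>>(dPos, dNbr, dNNbr, maxNbr, dF, nAtoms);
    forceBlockPerAtom<<<nAtoms, 128>>>(dPos, dNbr, dNNbr, maxNbr, dF);
    cudaDeviceSynchronize();
    printf("both traversal schemes completed for %d atoms\n", nAtoms);
    cudaFree(dPos); cudaFree(dNbr); cudaFree(dNNbr); cudaFree(dF);
    return 0;
}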
Journal of Computational Chemistry, 2019
The growing interest in the complexity of biological interactions is continuously driving the need to increase system size in biophysical simulations, requiring not only powerful and advanced hardware but adaptable software that can accommodate a large number of atoms interacting through complex force fields. To address this, we developed and implemented strategies in the GENESIS molecular dynamics package designed for large numbers of processors. Long‐range electrostatic interactions were parallelized by minimizing the number of processes involved in communication. A novel algorithm was implemented for nonbonded interactions to increase single instruction multiple data (SIMD) performance, reducing memory usage for ultra-large systems. Memory usage for neighbor searches in real‐space nonbonded interactions was reduced by approximately 80%, leading to significant speedup. Using experimental data describing physical 3D chromatin interactions, we constructed the first atomistic model of...
The Journal of Chemical Physics, 2020
NAMD is a molecular dynamics program designed for high-performance simulations of very large biological objects on CPU- and GPU-based architectures. NAMD offers scalable performance on petascale parallel supercomputers consisting of hundreds of thousands of cores, as well as on inexpensive commodity clusters commonly found in academic environments. It is written in C++ and relies on Charm++ parallel objects for optimal performance on low-latency architectures. NAMD is a versatile, multipurpose code that brings together state-of-the-art algorithms to carry out simulations in appropriate thermodynamic ensembles, using the widely popular CHARMM, AMBER, OPLS, and GROMOS biomolecular force fields. Here, we review the main features of NAMD that allow both equilibrium and enhanced-sampling molecular dynamics simulations with numerical efficiency. We describe the underlying concepts utilized by NAMD and their implementation, most notably for handling long-range electrostatics; controlling the temperature, p...
2010
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks.
Bioinformatics and Computational …, 2009
ArXiv, 2020
We present the extension of the Tinker-HP package (Lagardère et al., Chem. Sci., 2018, 9, 956-972) to the use of Graphics Processing Unit (GPU) cards to accelerate molecular dynamics simulations using polarizable many-body force fields. The new high-performance module allows for an efficient use of single- and multi-GPU architectures ranging from research laboratories to modern pre-exascale supercomputer centers. After detailing an analysis of our general scalable strategy that relies on OpenACC and CUDA, we discuss the various capabilities of the package. Among them, the multi-precision possibilities of the code are discussed. While an efficient double precision implementation is provided to preserve the possibility of fast reference computations, we show that a lower precision arithmetic is preferred, providing a similar accuracy for molecular dynamics while exhibiting superior performances. As Tinker-HP is mainly dedicated to accelerate simulations using new generation point dipole ...

References (4)
- Claus Braun and Hans-Joachim Wunderlich, "Algorithm-based fault tolerance for many-core architectures," in 15th IEEE European Test Symposium (ETS), Prague, 2010, p. 253.
- Claus Braun and Hans-Joachim Wunderlich, "Algorithmen-basierte Fehlertoleranz für Many- Core-Architekturen," it -Information Technology, vol. 52, no. 4, pp. 209-215, August 2010.
- K. H. Huang and J. A. Abraham, "Algorithm-based fault tolerance for matrix operations," IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984.
- Institute of Computer Architecture and Computer Engineering, "Ultra-parallel many-cores: hundreds of processing cores, performance improvements of > 70% per year," presentation, 17 June 2011.