Reliable Simulations on Many-Core Architectures
Abstract
For the investigation of extrinsic pro-apoptotic signaling pathways, systems of stochastic differential equations are solved by Euler-Maruyama approximation; forces and moments between particles are computed in parallel on GPGPUs. Density Functional Theory on GPGPU: development and parallelization of Density Functional Theory (DFT) methods for tightly coupled GPGPU many-core architectures, with a focus on the parallelized computation of orbitals and density matrices. The matrix formulation allows the use of the developed fault-tolerant matrix operations (e.g., DGEMM).
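The fault-tolerant DGEMM mentioned above follows the algorithm-based fault tolerance (ABFT) scheme of Huang and Abraham cited in the references: the inputs are extended with checksum rows/columns, and the checksums of the product are re-verified after the GPU computation. The CUDA sketch below illustrates the idea under simple assumptions (a naive GEMM kernel, a small matrix size, column checksums only); it is not the implementation developed in this work.

// ABFT sketch: column-checksum matrix multiplication (after Huang & Abraham).
// The matrices and the kernel are illustrative, not the code developed here.
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Naive row-major GEMM kernel: C (M x N) = A (M x K) * B (K x N).
__global__ void gemm(const double* A, const double* B, double* C,
                     int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        double acc = 0.0;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int n = 4;        // working size (illustrative)
    const int M = n + 1;    // A carries an extra column-checksum row
    const int N = n + 1;    // B carries an extra row-checksum column
    std::vector<double> A(M * n), B(n * N), C(M * N);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            A[i * n + j] = i + j + 1.0;
            B[i * N + j] = (i == j) ? 2.0 : 0.5;
        }
    for (int j = 0; j < n; ++j) {               // checksum row of A
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += A[i * n + j];
        A[n * n + j] = s;
    }
    for (int i = 0; i < n; ++i) {               // checksum column of B
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += B[i * N + j];
        B[i * N + n] = s;
    }

    double *dA, *dB, *dC;
    cudaMalloc(&dA, A.size() * sizeof(double));
    cudaMalloc(&dB, B.size() * sizeof(double));
    cudaMalloc(&dC, C.size() * sizeof(double));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(double), cudaMemcpyHostToDevice);

    dim3 block(16, 16), grid((N + 15) / 16, (M + 15) / 16);
    gemm<<<grid, block>>>(dA, dB, dC, M, N, n);
    cudaMemcpy(C.data(), dC, C.size() * sizeof(double), cudaMemcpyDeviceToHost);

    // Re-verify the column checksums of the product: a mismatch indicates a
    // fault in the corresponding column of the result.
    bool ok = true;
    for (int j = 0; j < N; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += C[i * N + j];
        if (std::fabs(s - C[n * N + j]) > 1e-9 * std::fabs(s)) ok = false;
    }
    printf("ABFT column checksums %s\n", ok ? "verified" : "violated (fault detected)");
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}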
Related papers
Lecture Notes in Computer Science, 2013
Molecular dynamics simulations allow us to study the behavior of complex biomolecular systems. These simulations suffer from a large computational complexity that leads to simulation times of several weeks to recreate just a few microseconds of a molecule's motion, even on high-performance computing platforms. In recent years, state-of-the-art molecular dynamics algorithms have benefited from the parallel computing capabilities of multicore systems, as well as GPUs used as co-processors. In this paper we present a parallel molecular dynamics algorithm for on-board multi-GPU architectures. We parallelize a state-of-the-art molecular dynamics algorithm at two levels. We employ a spatial partitioning approach to simulate the dynamics of one portion of a molecular system on each GPU, and we take advantage of direct communication between GPUs to transfer data among portions. We also parallelize the simulation algorithm to exploit the multi-processor computing model of GPUs. Most importantly, we present novel parallel algorithms to update the spatial partitioning and set up data-transfer packages on each GPU. We demonstrate the feasibility and scalability of our proposal through a comparative study with NAMD, a well-known parallel molecular dynamics implementation.
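As a rough illustration of the spatial-partitioning step described above, the CUDA sketch below bins particles into uniform grid cells from their coordinates; which GPU owns a cell, and which cells form the boundary layer to be exchanged between devices, can then be derived from the cell index. Grid dimensions, data layout, and kernel names are assumptions for illustration, not the paper's algorithm.

// Spatial-binning sketch (illustrative): assign each particle to a uniform grid
// cell from its coordinates; per-GPU cell ownership is derived from this index.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void binParticles(const float3* pos, int* cellIdx, int nParticles,
                             float cellEdge, int nx, int ny, int nz) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nParticles) return;
    // Clamp to the grid so particles slightly outside the box stay valid.
    int cx = min(nx - 1, max(0, (int)(pos[i].x / cellEdge)));
    int cy = min(ny - 1, max(0, (int)(pos[i].y / cellEdge)));
    int cz = min(nz - 1, max(0, (int)(pos[i].z / cellEdge)));
    cellIdx[i] = (cz * ny + cy) * nx + cx;   // linear cell index
}

int main() {
    const int nParticles = 1 << 16;
    const float cellEdge = 1.0f;             // e.g. the nonbonded cutoff radius
    const int nx = 32, ny = 32, nz = 32;

    std::vector<float3> h_pos(nParticles);
    for (int i = 0; i < nParticles; ++i)     // synthetic coordinates inside the box
        h_pos[i] = make_float3((i % 317) * 0.1f, (i % 211) * 0.15f, (i % 97) * 0.3f);

    float3* d_pos; int* d_cell;
    cudaMalloc(&d_pos, nParticles * sizeof(float3));
    cudaMalloc(&d_cell, nParticles * sizeof(int));
    cudaMemcpy(d_pos, h_pos.data(), nParticles * sizeof(float3), cudaMemcpyHostToDevice);

    binParticles<<<(nParticles + 255) / 256, 256>>>(d_pos, d_cell, nParticles,
                                                    cellEdge, nx, ny, nz);

    std::vector<int> h_cell(nParticles);
    cudaMemcpy(h_cell.data(), d_cell, nParticles * sizeof(int), cudaMemcpyDeviceToHost);
    printf("particle 0 -> cell %d\n", h_cell[0]);
    cudaFree(d_pos); cudaFree(d_cell);
    return 0;
}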
Journal of Chemical Theory and Computation
We present the extension of the Tinker-HP package (Lagardère et al., Chem. Sci. 2018, 9, 956−972) to the use of Graphics Processing Unit (GPU) cards to accelerate molecular dynamics simulations using polarizable many-body force fields. The new high-performance module allows for an efficient use of single- and multiple-GPU architectures ranging from research laboratories to modern supercomputer centers. After detailing an analysis of our general scalable strategy that relies on OpenACC and CUDA, we discuss the various capabilities of the package. Among them, the multi-precision possibilities of the code are discussed. While an efficient double-precision implementation is provided to preserve the possibility of fast reference computations, we show that lower-precision arithmetic is preferable, providing similar accuracy for molecular dynamics while exhibiting superior performance. As Tinker-HP is mainly dedicated to accelerating simulations using new-generation point-dipole polarizable force fields, we focus our study on the implementation of the AMOEBA model. Testing various NVIDIA platforms including 2080Ti, 3090, V100, and A100 cards, we provide illustrative benchmarks of the code for single- and multi-card simulations on large biosystems encompassing up to millions of atoms. The new code strongly reduces time to solution and offers the best performance to date obtained using the AMOEBA polarizable force field. Perspectives toward the strong-scaling performance of our multinode massive parallelization strategy, unsupervised adaptive sampling, and large-scale applicability of the Tinker-HP code in biophysics are discussed. The present software has been released in advance on GitHub in connection with the High Performance Computing community's COVID-19 research efforts and is free for academics (see https://github.com/TinkerTools/tinker-hp).
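A minimal sketch of the mixed-precision trade-off discussed above, under generic assumptions rather than Tinker-HP's OpenACC/CUDA implementation: pair terms are evaluated in single precision, while the rounding-sensitive accumulator is kept in double precision. The kernel, the toy pair term, and the data are illustrative; compile with -arch=sm_60 or newer for the double-precision atomicAdd.

// Mixed-precision sketch (generic): single-precision pair terms,
// double-precision accumulator.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void energySinglePairsDoubleSum(const float* r2, int n, double* energy) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Pair term evaluated in single precision (fast on consumer cards).
    float inv2 = 1.0f / r2[i];
    float pair = -(inv2 * inv2 * inv2);      // toy 1/r^6 attraction
    // Accumulate in double precision so millions of small terms keep significance.
    atomicAdd(energy, (double)pair);
}

int main() {
    const int n = 1 << 20;
    float* h_r2 = new float[n];
    for (int i = 0; i < n; ++i) h_r2[i] = 4.0f + (i % 100) * 0.01f;  // squared distances

    float* d_r2; double* d_e;
    cudaMalloc(&d_r2, n * sizeof(float));
    cudaMalloc(&d_e, sizeof(double));
    cudaMemset(d_e, 0, sizeof(double));
    cudaMemcpy(d_r2, h_r2, n * sizeof(float), cudaMemcpyHostToDevice);

    energySinglePairsDoubleSum<<<(n + 255) / 256, 256>>>(d_r2, n, d_e);

    double e = 0.0;
    cudaMemcpy(&e, d_e, sizeof(double), cudaMemcpyDeviceToHost);
    printf("accumulated energy: %.10f\n", e);
    cudaFree(d_r2); cudaFree(d_e); delete[] h_r2;
    return 0;
}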
Computer Physics Communications, 2013
A new precision model is proposed for the acceleration of all-atom classical molecular dynamics (MD) simulations on graphics processing units (GPUs). This precision model replaces double precision arithmetic with fixed point integer arithmetic for the accumulation of force components as compared to a previously introduced model that uses mixed single/double precision arithmetic. This significantly boosts performance on modern GPU hardware without sacrificing numerical accuracy. We present an implementation for NVIDIA GPUs of both generalized Born implicit solvent simulations as well as explicit solvent simulations using the particle mesh Ewald (PME) algorithm for long-range electrostatics using this precision model. Tests demonstrate both the performance of this implementation as well as its numerical stability for constant energy and constant temperature biomolecular MD as compared to a double precision CPU implementation and double and mixed single/double precision GPU implementations.
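A minimal sketch of the fixed-point accumulation idea described above, assuming a generic 2^40 scale factor and a toy contribution (not the paper's precision model): force contributions are converted to 64-bit integers and summed with integer atomics, which are deterministic and avoid double-precision atomics; the result is converted back to floating point only once, after the reduction.

// Fixed-point force accumulation sketch (generic scale factor, toy contributions).
#include <cstdio>
#include <cuda_runtime.h>

#define FORCE_SCALE 1099511627776.0   // 2^40: fractional precision plus headroom

// Convert a single-precision contribution to 64-bit fixed point and accumulate
// with an integer atomic; two's-complement wraparound handles negative values.
__device__ void atomicAddFixed(long long* acc, float value) {
    long long fixed = (long long)((double)value * FORCE_SCALE);
    atomicAdd((unsigned long long*)acc, (unsigned long long)fixed);
}

// Every thread contributes one toy pairwise term to a single accumulator.
__global__ void accumulate(long long* forceAcc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddFixed(forceAcc, 1.0f / (i + 1));
}

int main() {
    const int n = 1 << 20;
    long long* d_acc;
    cudaMalloc(&d_acc, sizeof(long long));
    cudaMemset(d_acc, 0, sizeof(long long));

    accumulate<<<(n + 255) / 256, 256>>>(d_acc, n);

    long long h_acc = 0;
    cudaMemcpy(&h_acc, d_acc, sizeof(long long), cudaMemcpyDeviceToHost);
    // Convert back to floating point only once, after the reduction is complete.
    printf("accumulated force component: %.9f\n", (double)h_acc / FORCE_SCALE);
    cudaFree(d_acc);
    return 0;
}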
Proteins: Structure, Function, and Bioinformatics, 2010
Journal of Computational Physics, 2008
We present a GPU implementation of LAMMPS, a widely used parallel molecular dynamics (MD) software package, and show 5x to 13x single-node speedups versus the CPU-only version of LAMMPS. This new CUDA package for LAMMPS also enables multi-GPU simulation on hybrid heterogeneous clusters, using MPI for inter-node communication, CUDA kernels on the GPU for all methods working with particle data, and standard LAMMPS C++ code for CPU execution. Cell and neighbor list approaches are compared for best performance on GPUs, with thread-per-atom and block-per-atom neighbor list variants showing best performance at low and high neighbor counts, respectively. Computational performance results of GPU-enabled LAMMPS are presented for a variety of materials classes (e.g. biomolecules, polymers, metals, semiconductors), along with a speed comparison versus other available GPU-enabled MD software. Finally, we show strong and weak scaling performance on a CPU/GPU cluster using up to 128 dual-GPU nodes.
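To make the thread-per-atom versus block-per-atom distinction concrete, the CUDA sketch below contrasts the two traversal schemes over a flattened neighbor list: one thread walking an atom's entire list versus one block striding over the list and reducing in shared memory, which pays off at high neighbor counts. The toy pair term, data layout, and sizes are assumptions for illustration, not the LAMMPS GPU package code.

// Neighbor-list traversal sketch: thread-per-atom vs. block-per-atom (toy pair term).
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void forceThreadPerAtom(const float3* pos, const int* nbr, const int* nNbr,
                                   int maxNbr, float* fmag, int nAtoms) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nAtoms) return;
    float f = 0.f;
    for (int k = 0; k < nNbr[i]; ++k) {          // one thread walks atom i's list
        int j = nbr[i * maxNbr + k];
        float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y, dz = pos[i].z - pos[j].z;
        f += 1.f / (dx * dx + dy * dy + dz * dz + 1e-6f);
    }
    fmag[i] = f;
}

__global__ void forceBlockPerAtom(const float3* pos, const int* nbr, const int* nNbr,
                                  int maxNbr, float* fmag) {
    int i = blockIdx.x;                          // one block owns atom i
    __shared__ float partial[128];
    float f = 0.f;
    for (int k = threadIdx.x; k < nNbr[i]; k += blockDim.x) {   // list strided over threads
        int j = nbr[i * maxNbr + k];
        float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y, dz = pos[i].z - pos[j].z;
        f += 1.f / (dx * dx + dy * dy + dz * dz + 1e-6f);
    }
    partial[threadIdx.x] = f;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {              // shared-memory reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) fmag[i] = partial[0];
}

int main() {
    const int nAtoms = 1024, maxNbr = 64;
    std::vector<float3> pos(nAtoms);
    std::vector<int> nbr(nAtoms * maxNbr), nNbr(nAtoms, maxNbr);
    for (int i = 0; i < nAtoms; ++i) {
        pos[i] = make_float3((float)(i % 32), (float)((i / 32) % 32), (float)(i / 1024));
        for (int k = 0; k < maxNbr; ++k) nbr[i * maxNbr + k] = (i + k + 1) % nAtoms;
    }
    float3* dPos; int *dNbr, *dNNbr; float* dF;
    cudaMalloc(&dPos, nAtoms * sizeof(float3));
    cudaMalloc(&dNbr, nbr.size() * sizeof(int));
    cudaMalloc(&dNNbr, nAtoms * sizeof(int));
    cudaMalloc(&dF, nAtoms * sizeof(float));
    cudaMemcpy(dPos, pos.data(), nAtoms * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(dNbr, nbr.data(), nbr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dNNbr, nNbr.data(), nAtoms * sizeof(int), cudaMemcpyHostToDevice);

    forceThreadPerAtom<<<(nAtoms + 127) / 128, 128>>>(dPos, dNbr, dNNbr, maxNbr, dF, nAtoms);
    forceBlockPerAtom<<<nAtoms, 128>>>(dPos, dNbr, dNNbr, maxNbr, dF);
    cudaDeviceSynchronize();
    printf("both traversal schemes completed for %d atoms\n", nAtoms);
    cudaFree(dPos); cudaFree(dNbr); cudaFree(dNNbr); cudaFree(dF);
    return 0;
}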
Journal of Computational Chemistry, 2019
The growing interest in the complexity of biological interactions is continuously driving the need to increase system size in biophysical simulations, requiring not only powerful and advanced hardware but adaptable software that can accommodate a large number of atoms interacting through complex force fields. To address this, we developed and implemented strategies in the GENESIS molecular dynamics package designed for large numbers of processors. Long‐range electrostatic interactions were parallelized by minimizing the number of processes involved in communication. A novel algorithm was implemented for nonbonded interactions to increase single instruction multiple data (SIMD) performance, reducing memory usage for ultra-large systems. Memory usage for neighbor searches in real‐space nonbonded interactions was reduced by approximately 80%, leading to significant speedup. Using experimental data describing physical 3D chromatin interactions, we constructed the first atomistic model of...
The Journal of Chemical Physics, 2020
NAMD is a molecular dynamics program designed for high-performance simulations of very large biological objects on CPU- and GPU-based architectures. NAMD offers scalable performance on petascale parallel supercomputers consisting of hundreds of thousands of cores, as well as on inexpensive commodity clusters commonly found in academic environments. It is written in C++ and relies on Charm++ parallel objects for optimal performance on low-latency architectures. NAMD is a versatile, multipurpose code that brings together state-of-the-art algorithms to carry out simulations in appropriate thermodynamic ensembles, using the widely popular CHARMM, AMBER, OPLS, and GROMOS biomolecular force fields. Here, we review the main features of NAMD that allow both equilibrium and enhanced-sampling molecular dynamics simulations with numerical efficiency. We describe the underlying concepts utilized by NAMD and their implementation, most notably for handling long-range electrostatics; controlling the temperature, p...
2010
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks.
Bioinformatics and Computational …, 2009
ArXiv, 2020
We present the extension of the Tinker-HP package (Lagardère et al., Chem. Sci., 2018, 9, 956-972) to the use of Graphics Processing Unit (GPU) cards to accelerate molecular dynamics simulations using polarizable many-body force fields. The new high-performance module allows for an efficient use of single- and multi-GPU architectures ranging from research laboratories to modern pre-exascale supercomputer centers. After detailing an analysis of our general scalable strategy that relies on OpenACC and CUDA, we discuss the various capabilities of the package. Among them, the multi-precision possibilities of the code are discussed. While an efficient double precision implementation is provided to preserve the possibility of fast reference computations, we show that a lower precision arithmetic is preferred, providing a similar accuracy for molecular dynamics while exhibiting superior performances. As Tinker-HP is mainly dedicated to accelerate simulations using new generation point dipole ...

References (4)
- Claus Braun and Hans-Joachim Wunderlich, "Algorithm-based fault tolerance for many-core architectures," in 15th IEEE European Test Symposium (ETS), Prague, 2010, p. 253.
- Claus Braun and Hans-Joachim Wunderlich, "Algorithmen-basierte Fehlertoleranz für Many- Core-Architekturen," it -Information Technology, vol. 52, no. 4, pp. 209-215, August 2010.
- K. H. Huang and J. A. Abraham, "Algorithm-based fault tolerance for matrix operations," IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984.
- Institute of Computer Architecture and Computer Engineering, "Ultra-parallel many-cores: hundreds of processing cores, performance improvements of > 70% per year," presentation, 17 June 2011.