Academia.edu

Graphics processing unit

1,164 papers
7 followers

About this topic
A graphics processing unit (GPU) is a specialized electronic circuit designed to accelerate the rendering of images and video by performing rapid mathematical calculations. It is essential in computer graphics, enabling efficient processing of complex visual data and enhancing performance in applications such as gaming, simulations, and machine learning.

Key research themes

1. How can CPU and GPU be efficiently combined to optimize heterogeneous computing performance?

This theme investigates strategies for workload partitioning, memory management, communication, and synchronization to maximize overall system performance when CPUs and GPUs operate cooperatively. Such heterogeneous computing leverages fundamentally different architectures—the latency-optimized CPU and throughput-oriented GPU—to accelerate data-intensive and parallel applications. Proper optimization must address load balancing, data access latency, and minimizing communication overhead to realize the full potential of hybrid systems.
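The partitioning and load-balancing problem described above can be sketched in a few lines: split a data-parallel workload between the two devices in proportion to their throughputs, charging host-device transfer time to the GPU share. This is a minimal sketch; the function name and all rates below are illustrative assumptions, not values from the cited works.

```python
# Minimal sketch of static CPU/GPU workload partitioning: choose shares so both
# devices finish at the same time, charging PCIe transfer time to the GPU side.
# All names and rates here are hypothetical, for illustration only.

def partition(n_items, cpu_rate, gpu_rate, transfer_cost=0.0):
    """Return (cpu_share, gpu_share) equalizing predicted finish times.

    cpu_rate, gpu_rate: items processed per second on each device.
    transfer_cost: fixed seconds to move the GPU's share across PCIe.
    """
    # Solve cpu_share/cpu_rate == gpu_share/gpu_rate + transfer_cost
    # with cpu_share + gpu_share == n_items.
    gpu_share = (n_items / cpu_rate - transfer_cost) / (1 / cpu_rate + 1 / gpu_rate)
    gpu_share = max(0, min(n_items, round(gpu_share)))
    return n_items - gpu_share, gpu_share

# A GPU ten times faster than the CPU gets roughly ten elevenths of the work.
cpu_share, gpu_share = partition(1_000_000, cpu_rate=2e6, gpu_rate=2e7)
```

With a nonzero transfer_cost the GPU share shrinks, which is the communication-overhead effect the optimization strategies in this theme aim to manage.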

Key finding: This chapter identifies and analyzes several critical optimization strategies for hybrid CPU-GPU systems: partitioning and load balancing computations between CPU and GPU; hierarchical memory usage to hide data access latency...
Key finding: This study develops a hybrid analytical performance modeling approach to decide at runtime whether a computing kernel should execute on CPU or GPU within an OpenMP environment. Key contributions include modeling GPU memory...
Key finding: This work characterizes the impact of GPU grid geometry (block and thread configuration) on kernel performance in OpenMP offloading contexts, demonstrating substantial performance improvements—up to 7x speedup—over naive...
Key finding: AEminiumGPU framework facilitates writing data-parallel programs that transparently execute on CPUs or GPUs by compiling Java Map-Reduce style programs into hybrid executables. A notable contribution is the use of machine...
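The runtime CPU-or-GPU decision in the second finding can be illustrated with a deliberately simple analytical cost model; every constant below (PCIe bandwidth, device GFLOP/s, launch latency) is an invented placeholder, not the published model.

```python
# Toy analytical model for runtime device selection: offload to the GPU only
# when predicted transfer + compute + launch time beats the CPU estimate.
# All constants are illustrative placeholders, not measured or published values.

def predict_gpu_time(n, flops_per_item, gpu_gflops=10_000.0, pcie_gbps=16.0,
                     bytes_per_item=8, launch_latency=10e-6):
    transfer = 2 * n * bytes_per_item / (pcie_gbps * 1e9)  # to device and back
    compute = n * flops_per_item / (gpu_gflops * 1e9)
    return transfer + compute + launch_latency

def predict_cpu_time(n, flops_per_item, cpu_gflops=100.0):
    return n * flops_per_item / (cpu_gflops * 1e9)

def choose_device(n, flops_per_item):
    gpu = predict_gpu_time(n, flops_per_item)
    cpu = predict_cpu_time(n, flops_per_item)
    return "gpu" if gpu < cpu else "cpu"
```

Small or light kernels stay on the CPU because launch latency and PCIe transfers dominate; large, compute-heavy kernels cross over to the GPU.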

2. What programming models and architectural features enable effective GPU task scheduling for dynamic and irregular workloads?

Traditional GPU programming predominantly targets static, data-parallel workloads, but many advanced applications have dynamic, irregular, or recursive parallelism patterns that challenge existing shading or compute language models. Research in this theme is focused on developing programming abstractions and scheduling models that support multiple instruction streams, dynamic work creation, fine-grained load balancing, data locality preservation, and varying parallelism granularities to better map irregular tasks onto massively parallel GPU architectures.
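The requirements listed above (dynamic work creation, fine-grained load balancing) reduce to a work-queue pattern. The sketch below simulates it sequentially with a deque; on a real GPU the queue would be drained concurrently by persistent thread blocks using atomic operations, and the interval-splitting task is an invented stand-in for recursive workloads.

```python
# Sequential simulation of the GPU task-queue pattern for irregular workloads:
# tasks are popped from a shared queue and may enqueue new tasks (dynamic work
# generation). On hardware, popleft/extend would be atomic queue operations.

from collections import deque

def run_irregular(seed_tasks, expand):
    """Drain a work queue; expand(task) returns (result, new_tasks)."""
    queue = deque(seed_tasks)
    results = []
    while queue:
        task = queue.popleft()
        result, children = expand(task)
        results.append(result)
        queue.extend(children)  # dynamically generated work re-enters the queue
    return results

# Invented example task: recursively split an interval until it has length 1,
# a stand-in for tree traversal or adaptive subdivision workloads.
def split(interval):
    lo, hi = interval
    if hi - lo <= 1:
        return (lo, hi), []
    mid = (lo + hi) // 2
    return (lo, hi), [(lo, mid), (mid, hi)]

results = run_irregular([(0, 8)], split)  # visits all 15 nodes of the split tree
```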

Key finding: Whippletree introduces a task-based programming model for GPUs addressing four critical requirements for dynamic irregular workloads: multiprogramming (MIMD), dynamic work generation, preserving data locality via shared...

3. How can GPUs be leveraged and optimized for domain-specific high-performance computing applications?

This research area explores the implementation and acceleration of computationally demanding scientific and engineering problems using GPUs. It involves algorithm redesign, architecture-tailored numerical methods, and efficient programming to exploit GPUs’ massive parallelism and memory hierarchies. Across domains such as thermal simulations, fluid dynamics, radar signal processing, and bioinformatics, GPU computing enables orders-of-magnitude speedups, real-time analysis capability, and improved simulation fidelity, which are vital for engineering design, environmental monitoring, and biomedical research.
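As a concrete miniature of the algorithm redesign this theme describes, the explicit heat-diffusion step below updates each grid cell independently of the others, exactly the structure a GPU thermal solver maps to one thread per cell. Grid size, diffusivity, and boundary temperatures are illustrative choices, not taken from the cited studies.

```python
# One explicit (Jacobi-style) heat-diffusion step on a 2D grid. Each (i, j)
# update depends only on the previous time step, so every cell is independent:
# on a GPU the inner loop body becomes a kernel run by one thread per cell.

def heat_step(T, alpha=0.1):
    n, m = len(T), len(T[0])
    Tn = [row[:] for row in T]  # boundary cells keep their fixed values
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            Tn[i][j] = T[i][j] + alpha * (
                T[i - 1][j] + T[i + 1][j] + T[i][j - 1] + T[i][j + 1] - 4 * T[i][j]
            )
    return Tn

# Hot left wall, cold interior; heat creeps rightward over 50 steps.
T = [[0.0] * 64 for _ in range(64)]
for i in range(64):
    T[i][0] = 100.0
for _ in range(50):
    T = heat_step(T)
```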

Key finding: This study presents a CUDA-C based framework specifically optimized for simulating conjugate heat transfer in squared heated cavities. By leveraging tailored GPU parallelization techniques, the framework achieves up to 99.7%...
Key finding: This paper introduces a large-eddy simulation (LES) solver designed from scratch for modern NVIDIA GPUs with Ampere architecture to accelerate aerodynamic analysis of electric vertical takeoff and landing (eVTOL) aircraft....
Key finding: This work implements and optimizes Incoherent Scatter Radar plasma line spectral analysis algorithms on GPUs using NVIDIA CUDA. By mapping computationally expensive spectral binning, FFT, and parameter estimation operations...
Key finding: The paper demonstrates a heterogeneous CPU+GPU implementation of Gaussian Process (GP) models for emulating computationally expensive simulators. By leveraging GPU parallelism for key linear algebraic operations (matrix...
Key finding: This research introduces an implementation called LightKernel that exploits the cluster-based architecture of modern NVIDIA GPUs to improve predictability and timing guarantees in single-GPU real-time embedded systems. Using...
Key finding: This study applies GPU acceleration to deep learning models for drug-target interaction prediction tasks pertinent to COVID-19. The DNN implementation leverages GPUs’ embarrassingly parallel computations during feed-forward...
Key finding: This paper proposes a novel unsupervised weed detection method leveraging multispectral UAV imagery and the PC/BC-DIM neural network to generate fused saliency maps without requiring large labeled datasets or computationally...

4. What are the architectural design considerations and comparative analyses for GPU programming models on modern supercomputers?

With the increasing reliance on GPUs in high-performance computing (HPC) facilities such as pre-exascale and exascale systems, this research area examines the performance tradeoffs, ease of use, portability, and optimization techniques of various GPU programming models. It involves evaluating vendor-supported languages (e.g., CUDA, HIP), directive-based models (OpenMP, OpenACC), and other abstractions (SYCL, Kokkos), especially focusing on AMD and NVIDIA GPUs in production supercomputers. Insights gained inform best practices for efficient GPU utilization and software portability across diverse HPC architectures.

Key finding: This work performs a comprehensive evaluation of widely used GPU programming models—CUDA, HIP, OpenMP offloading, OpenACC, hipSYCL, Kokkos, and Alpaka—on the LUMI supercomputer’s AMD MI250X GPUs, compared against NVIDIA...

5. How can GPU memory access and data transfer mechanisms be enhanced using neural networks and advanced DMA controllers to improve multimedia and computing system performance?

GPU performance is often bottlenecked by inefficient memory access and data transfer between host and device. This research direction develops intelligent direct memory access (DMA) controllers leveraging back-propagation neural networks and advanced adaptive data placement to optimize memory channel usage, support high bandwidth transfers, and reduce power consumption. Applications include multimedia processing and heterogeneous computing involving GPU-FPGA systems, showing performance gains and reduced latency crucial for high-throughput GPU workloads.

Key finding: This work proposes a back-propagation algorithm (BPA) based neural network to dynamically control the DMA engine for multimedia applications on GPUs. By training the network to optimize gradient loss and adapting to various...
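To make the idea concrete, here is a deliberately tiny software analogue of a learned DMA policy: a single logistic unit trained by gradient descent to route large transfers to a wide channel. The feature, labels, and model size are invented for illustration; the BPA controller in the work above is a hardware design, not this toy.

```python
# Toy learned DMA-channel policy: a one-weight logistic unit trained with
# gradient descent to send large transfers (normalized size > 0.5) to the wide
# channel (1). Everything here is an invented illustration of the concept.

import math
import random

random.seed(0)

# Synthetic training set: normalized transfer size -> preferred channel.
data = [(x / 100, 1.0 if x > 50 else 0.0) for x in range(100)]

w, b, lr = 0.0, 0.0, 1.0
for _ in range(2000):
    x, y = random.choice(data)
    p = 1 / (1 + math.exp(-(w * x + b)))  # probability of choosing channel 1
    g = p - y                             # gradient of the cross-entropy loss
    w -= lr * g * x
    b -= lr * g

def pick_channel(size_fraction):
    p = 1 / (1 + math.exp(-(w * size_fraction + b)))
    return 1 if p > 0.5 else 0
```

After training, transfers well above the size threshold are routed to channel 1 and small ones to channel 0, mimicking the adaptive data placement described above.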

All papers in Graphics processing unit

Sequence alignment remains a fundamental problem with practical applications ranging from pattern recognition to computational biology. Traditional algorithms based on dynamic programming are hard to parallelize, require significant...
Realistic applications of numerical modeling of acoustic wave dynamics usually demand high-performance computing because of the large size of study domains and demanding accuracy requirements on simulation results. Forward modeling...
A new strategy is proposed for implementing computationally intensive high-throughput decoders based on the long length irregular LDPC codes adopted in the DVB-S2 standard. It is supported on manycore graphics processing unit (GPU)...
A full ring ultrasonic array-based photoacoustic tomography system was recently developed for small animal brain imaging. The 512-element array is cylindrically focused in the elevational direction, and can acquire a two-dimensional (2D)...
In this contribution we describe a specialised data processing system for Spectral Optical Coherence Tomography (SOCT) biomedical imaging which utilises massively parallel data processing on a low-cost, Graphics Processing Unit (GPU). One...
The Absolute Nodal Coordinate Formulation (ANCF) has been widely used to carry out the dynamics analysis of flexible bodies that undergo large rotation and large deformation. This formulation is consistent with the nonlinear theory of...
In this paper we address the massive parallelization of the characterization of heartbeats by means of Graphics Processors. Heartbeats are represented with Hermite polynomials due to the compactness and robustness of this representation....
Mesh simplification and mesh compression are important processes in computer graphics and scientific computing, as such contexts allow for a mesh which takes up less memory than the original mesh. Current simplification and compression...
The graphics processing unit (GPU) has emerged as a powerful and cost effective processor for general performance computing. GPUs are capable of an order of magnitude more floating-point operations per second as compared to modern central...
Optical mapping of action potentials or calcium transients in contracting cardiac tissues is challenging because of the severe sensitivity of the measurements to motion. The measurements rely on the accurate numerical tracking and...
The compressive sensing (CS) theory shows that real signals can be exactly recovered from very few samplings. Inspired by the CS theory, the interior problem in computed tomography is proved uniquely solvable by minimizing the...
GPUs are increasingly being used in security applications, especially for accelerating encryption/decryption. While GPUs are an attractive platform in terms of performance, the security of these devices raises a number of concerns. One...
The main purpose of the study is to determine the effect of the Spaced Learning Method on learners' scientific literacy skills and performance in Science 10. This research also aims to determine the extent of spaced learning method and...
Quantitative sodium magnetic resonance imaging permits noninvasive measurement of the tissue sodium concentration (TSC) bioscale in the brain. Computing the TSC bioscale requires reconstructing and combining multiple datasets acquired...
High Efficiency Video Coding (HEVC), the latest video compression standard, will play an important role in many multimedia applications in the foreseeable future. Its superior compression performance enables HEVC to be particularly...
The high efficiency video coding standard provides excellent coding performance but is also very complex. Especially, the intra mode decision is very time-consuming due to the large number of available prediction modes and the flexible...
Wavelet transform (WT) is widely used in signal processing. The frequency modulation reflectometer in the KSTAR applies this technique to get the phase information from the mixer output measurements. Since WT is a time-consuming process,...
Wide baseline matching is the state of the art for object recognition and image registration problems in computer vision. Though effective, the computational expense of these algorithms limits their application to many real-world...
Full resolution electron microscopic tomographic (EMT) reconstruction of large-scale tilt series requires significant computing power. The desire to perform multiple cycles of iterative reconstruction and realignment dramatically...
The estimation of many unknown parameters is carried out using a simplified Sequential Importance Sampling (SIS) algorithm which is implemented on a graphics processing unit (GPU). The aim of the present work is to show technical points to...
Positron emission tomography (PET) is an important imaging modality in both clinical usage and research studies. We have developed a compact high-sensitivity PET system that consisted of two large-area panel PET detector heads, which...
related to fluid dynamics. Three classic problems of differing complexity: convection-diffusion in a channel, the lid-driven cavity, and the cavity driven by a temperature difference, were solved by the...
An exponential increase in the speed of DNA sequencing over the past decade has driven demand for fast, space-efficient algorithms to process the resultant data. The first step in processing is alignment of many short DNA sequences, or...
ROMS is software that models and simulates an ocean region using a finite difference grid and time stepping. ROMS simulations can take from hours to days to complete due to the compute-intensive nature of the software. As a result, the...
The growing adoption of supercomputers across various scientific disciplines, particularly by researchers without a background in computer science, has intensified the demand for parallel applications. These applications are typically...
This paper is concerned with the development of a new GPU (Graphics Processing Units) accelerated computational tool for validating the effectiveness of a hypervelocity kinetic impact and a subsequent nuclear subsurface explosion....
Graphics Processing Unit (GPU) computing is becoming an alternate computing platform for numerical simulations. However, it is not clear which numerical scheme will provide the highest computational efficiency for different types of...
Digital Breast Tomosynthesis (DBT) is a modern 3D Computed Tomography X-ray technique for the early detection of breast tumors, which is receiving growing interest in the medical and scientific community. Since DBT performs incomplete...
This article examines the fundamental differences between the algebraic structure of a field and the structure of a vector space. Despite their outward similarity, namely the presence of addition and multiplication operations (insofar as the latter is used),...
We extend the number sorting algorithms on the GPU to sort large multi-field records. We notice that traditional way of sorting the records by first sorting a (key, index) pair to obtain the sorted permutation of the records followed by...
Non-orthogonal multiple access (NOMA) can achieve high throughput by using the same time and frequency resources for multiple users. NOMA distinguishes multiple users in power domain by computationally-heavy successive interference...
For many finite element problems, when represented as sparse matrices, iterative solvers are found to be unreliable because they can impose computational bottlenecks. Early pioneering work by Duff et al. explored an alternative strategy...