Academia.edu

Graphics processing unit

1,164 papers
7 followers

About this topic
A graphics processing unit (GPU) is a specialized electronic circuit designed to accelerate the rendering of images and video by performing rapid mathematical calculations. It is essential in computer graphics, enabling efficient processing of complex visual data and enhancing performance in applications such as gaming, simulations, and machine learning.

Key research themes

1. How can CPU and GPU be efficiently combined to optimize heterogeneous computing performance?

This theme investigates strategies for workload partitioning, memory management, communication, and synchronization to maximize overall system performance when CPUs and GPUs operate cooperatively. Such heterogeneous computing leverages fundamentally different architectures—the latency-optimized CPU and throughput-oriented GPU—to accelerate data-intensive and parallel applications. Proper optimization must address load balancing, data access latency, and minimizing communication overhead to realize the full potential of hybrid systems.
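The partitioning and load-balancing problem described above can be sketched in a few lines: split a data-parallel workload between the two devices in proportion to their throughputs, charging host-device transfer time to the GPU share. This is a minimal sketch; the function name and all rates below are illustrative assumptions, not values from the cited works.

```python
# Minimal sketch of static CPU/GPU workload partitioning: choose shares so both
# devices finish at the same time, charging PCIe transfer time to the GPU side.
# All names and rates here are hypothetical, for illustration only.

def partition(n_items, cpu_rate, gpu_rate, transfer_cost=0.0):
    """Return (cpu_share, gpu_share) equalizing predicted finish times.

    cpu_rate, gpu_rate: items processed per second on each device.
    transfer_cost: fixed seconds to move the GPU's share across PCIe.
    """
    # Solve cpu_share/cpu_rate == gpu_share/gpu_rate + transfer_cost
    # with cpu_share + gpu_share == n_items.
    gpu_share = (n_items / cpu_rate - transfer_cost) / (1 / cpu_rate + 1 / gpu_rate)
    gpu_share = max(0, min(n_items, round(gpu_share)))
    return n_items - gpu_share, gpu_share

# A GPU ten times faster than the CPU gets roughly ten elevenths of the work.
cpu_share, gpu_share = partition(1_000_000, cpu_rate=2e6, gpu_rate=2e7)
```

With a nonzero transfer_cost the GPU share shrinks, which is the communication-overhead effect the optimization strategies in this theme aim to manage.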

Key finding: This chapter identifies and analyzes several critical optimization strategies for hybrid CPU-GPU systems: partitioning and load balancing computations between CPU and GPU; hierarchical memory usage to hide data access latency...
Key finding: This study develops a hybrid analytical performance modeling approach to decide at runtime whether a computing kernel should execute on CPU or GPU within an OpenMP environment. Key contributions include modeling GPU memory...
Key finding: This work characterizes the impact of GPU grid geometry (block and thread configuration) on kernel performance in OpenMP offloading contexts, demonstrating substantial performance improvements—up to 7x speedup—over naive...
Key finding: AEminiumGPU framework facilitates writing data-parallel programs that transparently execute on CPUs or GPUs by compiling Java Map-Reduce style programs into hybrid executables. A notable contribution is the use of machine...
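The runtime CPU-or-GPU decision in the second finding can be illustrated with a deliberately simple analytical cost model; every constant below (PCIe bandwidth, device GFLOP/s, launch latency) is an invented placeholder, not the published model.

```python
# Toy analytical model for runtime device selection: offload to the GPU only
# when predicted transfer + compute + launch time beats the CPU estimate.
# All constants are illustrative placeholders, not measured or published values.

def predict_gpu_time(n, flops_per_item, gpu_gflops=10_000.0, pcie_gbps=16.0,
                     bytes_per_item=8, launch_latency=10e-6):
    transfer = 2 * n * bytes_per_item / (pcie_gbps * 1e9)  # to device and back
    compute = n * flops_per_item / (gpu_gflops * 1e9)
    return transfer + compute + launch_latency

def predict_cpu_time(n, flops_per_item, cpu_gflops=100.0):
    return n * flops_per_item / (cpu_gflops * 1e9)

def choose_device(n, flops_per_item):
    gpu = predict_gpu_time(n, flops_per_item)
    cpu = predict_cpu_time(n, flops_per_item)
    return "gpu" if gpu < cpu else "cpu"
```

Small or light kernels stay on the CPU because launch latency and PCIe transfers dominate; large, compute-heavy kernels cross over to the GPU.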

2. What programming models and architectural features enable effective GPU task scheduling for dynamic and irregular workloads?

Traditional GPU programming predominantly targets static, data-parallel workloads, but many advanced applications have dynamic, irregular, or recursive parallelism patterns that challenge existing shading or compute language models. Research in this theme is focused on developing programming abstractions and scheduling models that support multiple instruction streams, dynamic work creation, fine-grained load balancing, data locality preservation, and varying parallelism granularities to better map irregular tasks onto massively parallel GPU architectures.
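The requirements listed above (dynamic work creation, fine-grained load balancing) reduce to a work-queue pattern. The sketch below simulates it sequentially with a deque; on a real GPU the queue would be drained concurrently by persistent thread blocks using atomic operations, and the interval-splitting task is an invented stand-in for recursive workloads.

```python
# Sequential simulation of the GPU task-queue pattern for irregular workloads:
# tasks are popped from a shared queue and may enqueue new tasks (dynamic work
# generation). On hardware, popleft/extend would be atomic queue operations.

from collections import deque

def run_irregular(seed_tasks, expand):
    """Drain a work queue; expand(task) returns (result, new_tasks)."""
    queue = deque(seed_tasks)
    results = []
    while queue:
        task = queue.popleft()
        result, children = expand(task)
        results.append(result)
        queue.extend(children)  # dynamically generated work re-enters the queue
    return results

# Invented example task: recursively split an interval until it has length 1,
# a stand-in for tree traversal or adaptive subdivision workloads.
def split(interval):
    lo, hi = interval
    if hi - lo <= 1:
        return (lo, hi), []
    mid = (lo + hi) // 2
    return (lo, hi), [(lo, mid), (mid, hi)]

results = run_irregular([(0, 8)], split)  # visits all 15 nodes of the split tree
```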

Key finding: Whippletree introduces a task-based programming model for GPUs addressing four critical requirements for dynamic irregular workloads: multiprogramming (MIMD), dynamic work generation, preserving data locality via shared...

3. How can GPUs be leveraged and optimized for domain-specific high-performance computing applications?

This research area explores the implementation and acceleration of computationally demanding scientific and engineering problems using GPUs. It involves algorithm redesign, architecture-tailored numerical methods, and efficient programming to exploit GPUs’ massive parallelism and memory hierarchies. Across domains such as thermal simulations, fluid dynamics, radar signal processing, and bioinformatics, GPU computing enables orders-of-magnitude speedups, real-time analysis capability, and improved simulation fidelity, which are vital for engineering design, environmental monitoring, and biomedical research.
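As a concrete miniature of the algorithm redesign this theme describes, the explicit heat-diffusion step below updates each grid cell independently of the others, exactly the structure a GPU thermal solver maps to one thread per cell. Grid size, diffusivity, and boundary temperatures are illustrative choices, not taken from the cited studies.

```python
# One explicit (Jacobi-style) heat-diffusion step on a 2D grid. Each (i, j)
# update depends only on the previous time step, so every cell is independent:
# on a GPU the inner loop body becomes a kernel run by one thread per cell.

def heat_step(T, alpha=0.1):
    n, m = len(T), len(T[0])
    Tn = [row[:] for row in T]  # boundary cells keep their fixed values
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            Tn[i][j] = T[i][j] + alpha * (
                T[i - 1][j] + T[i + 1][j] + T[i][j - 1] + T[i][j + 1] - 4 * T[i][j]
            )
    return Tn

# Hot left wall, cold interior; heat creeps rightward over 50 steps.
T = [[0.0] * 64 for _ in range(64)]
for i in range(64):
    T[i][0] = 100.0
for _ in range(50):
    T = heat_step(T)
```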

Key finding: This study presents a CUDA-C based framework specifically optimized for simulating conjugate heat transfer in squared heated cavities. By leveraging tailored GPU parallelization techniques, the framework achieves up to 99.7%...
Key finding: This paper introduces a large-eddy simulation (LES) solver designed from scratch for modern NVIDIA GPUs with Ampere architecture to accelerate aerodynamic analysis of electric vertical takeoff and landing (eVTOL) aircraft....
Key finding: This work implements and optimizes Incoherent Scatter Radar plasma line spectral analysis algorithms on GPUs using NVIDIA CUDA. By mapping computationally expensive spectral binning, FFT, and parameter estimation operations...
Key finding: The paper demonstrates a heterogeneous CPU+GPU implementation of Gaussian Process (GP) models for emulating computationally expensive simulators. By leveraging GPU parallelism for key linear algebraic operations (matrix...
Key finding: This research introduces an implementation called LightKernel that exploits the cluster-based architecture of modern NVIDIA GPUs to improve predictability and timing guarantees in single-GPU real-time embedded systems. Using...
Key finding: This study applies GPU acceleration to deep learning models for drug-target interaction prediction tasks pertinent to COVID-19. The DNN implementation leverages GPUs’ embarrassingly parallel computations during feed-forward...
Key finding: This paper proposes a novel unsupervised weed detection method leveraging multispectral UAV imagery and the PC/BC-DIM neural network to generate fused saliency maps without requiring large labeled datasets or computationally...

4. What are the architectural design considerations and comparative analyses for GPU programming models on modern supercomputers?

With the increasing reliance on GPUs in high-performance computing (HPC) facilities such as pre-exascale and exascale systems, this research area examines the performance tradeoffs, ease of use, portability, and optimization techniques of various GPU programming models. It involves evaluating vendor-supported languages (e.g., CUDA, HIP), directive-based models (OpenMP, OpenACC), and other abstractions (SYCL, Kokkos), especially focusing on AMD and NVIDIA GPUs in production supercomputers. Insights gained inform best practices for efficient GPU utilization and software portability across diverse HPC architectures.

Key finding: This work performs a comprehensive evaluation of widely used GPU programming models—CUDA, HIP, OpenMP offloading, OpenACC, hipSYCL, Kokkos, and Alpaka—on the LUMI supercomputer’s AMD MI250X GPUs, compared against NVIDIA...

5. How can GPU memory access and data transfer mechanisms be enhanced using neural networks and advanced DMA controllers to improve multimedia and computing system performance?

GPU performance is often bottlenecked by inefficient memory access and data transfer between host and device. This research direction develops intelligent direct memory access (DMA) controllers leveraging back-propagation neural networks and advanced adaptive data placement to optimize memory channel usage, support high bandwidth transfers, and reduce power consumption. Applications include multimedia processing and heterogeneous computing involving GPU-FPGA systems, showing performance gains and reduced latency crucial for high-throughput GPU workloads.

Key finding: This work proposes a back-propagation algorithm (BPA) based neural network to dynamically control the DMA engine for multimedia applications on GPUs. By training the network to optimize gradient loss and adapting to various...
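To make the idea concrete, here is a deliberately tiny software analogue of a learned DMA policy: a single logistic unit trained by gradient descent to route large transfers to a wide channel. The feature, labels, and model size are invented for illustration; the BPA controller in the work above is a hardware design, not this toy.

```python
# Toy learned DMA-channel policy: a one-weight logistic unit trained with
# gradient descent to send large transfers (normalized size > 0.5) to the wide
# channel (1). Everything here is an invented illustration of the concept.

import math
import random

random.seed(0)

# Synthetic training set: normalized transfer size -> preferred channel.
data = [(x / 100, 1.0 if x > 50 else 0.0) for x in range(100)]

w, b, lr = 0.0, 0.0, 1.0
for _ in range(2000):
    x, y = random.choice(data)
    p = 1 / (1 + math.exp(-(w * x + b)))  # probability of choosing channel 1
    g = p - y                             # gradient of the cross-entropy loss
    w -= lr * g * x
    b -= lr * g

def pick_channel(size_fraction):
    p = 1 / (1 + math.exp(-(w * size_fraction + b)))
    return 1 if p > 0.5 else 0
```

After training, transfers well above the size threshold are routed to channel 1 and small ones to channel 0, mimicking the adaptive data placement described above.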

All papers in Graphics processing unit

Sequence alignment remains a fundamental problem with practical applications ranging from pattern recognition to computational biology. Traditional algorithms based on dynamic programming are hard to parallelize, require significant...
Realistic applications of numerical modeling of acoustic wave dynamics usually demand high-performance computing because of the large size of study domains and demanding accuracy requirements on simulation results. Forward modeling...
A new strategy is proposed for implementing computationally intensive high-throughput decoders based on the long length irregular LDPC codes adopted in the DVB-S2 standard. It is supported on manycore graphics processing unit (GPU)...
A full ring ultrasonic array-based photoacoustic tomography system was recently developed for small animal brain imaging. The 512-element array is cylindrically focused in the elevational direction, and can acquire a two-dimensional (2D)...
In this contribution we describe a specialised data processing system for Spectral Optical Coherence Tomography (SOCT) biomedical imaging which utilises massively parallel data processing on a low-cost, Graphics Processing Unit (GPU). One...
The Absolute Nodal Coordinate Formulation (ANCF) has been widely used to carry out the dynamics analysis of flexible bodies that undergo large rotation and large deformation. This formulation is consistent with the nonlinear theory of...
In this paper we address the massive parallelization of the characterization of heartbeats by means of Graphics Processors. Heartbeats are represented with Hermite polynomials due to the compactness and robustness of this representation....
Mesh simplification and mesh compression are important processes in computer graphics and scientific computing, as such contexts allow for a mesh which takes up less memory than the original mesh. Current simplification and compression...
The graphics processing unit (GPU) has emerged as a powerful and cost effective processor for general performance computing. GPUs are capable of an order of magnitude more floating-point operations per second as compared to modern central...
Optical mapping of action potentials or calcium transients in contracting cardiac tissues is challenging because of the severe sensitivity of the measurements to motion. The measurements rely on the accurate numerical tracking and...
The compressive sensing (CS) theory shows that real signals can be exactly recovered from very few samplings. Inspired by the CS theory, the interior problem in computed tomography is proved uniquely solvable by minimizing the...
GPUs are increasingly being used in security applications, especially for accelerating encryption/decryption. While GPUs are an attractive platform in terms of performance, the security of these devices raises a number of concerns. One...
The main purpose of the study is to determine the effect of the Spaced Learning Method on learners' scientific literacy skills and performance in Science 10. This research also aims to determine the extent of spaced learning method and...
Quantitative sodium magnetic resonance imaging permits noninvasive measurement of the tissue sodium concentration (TSC) bioscale in the brain. Computing the TSC bioscale requires reconstructing and combining multiple datasets acquired...
High Efficiency Video Coding (HEVC), the latest video compression standard, will play an important role in many multimedia applications in the foreseeable future. Its superior compression performance enables HEVC to be particularly...
The high efficiency video coding standard provides excellent coding performance but is also very complex. Especially, the intra mode decision is very time-consuming due to the large number of available prediction modes and the flexible...
Wavelet transform (WT) is widely used in signal processing. The frequency modulation reflectometer in the KSTAR applies this technique to get the phase information from the mixer output measurements. Since WT is a time-consuming process,...
Wide baseline matching is the state of the art for object recognition and image registration problems in computer vision. Though effective, the computational expense of these algorithms limits their application to many real-world...
Full resolution electron microscopic tomographic (EMT) reconstruction of large-scale tilt series requires significant computing power. The desire to perform multiple cycles of iterative reconstruction and realignment dramatically...
The estimation of many unknown parameters is carried out using a simplified Sequential Importance Sampling (SIS) algorithm which is implemented on a graphics processing unit (GPU). The aim of the present work is to show technical points to...
Positron emission tomography (PET) is an important imaging modality in both clinical usage and research studies. We have developed a compact high-sensitivity PET system that consisted of two large-area panel PET detector heads, which...
related to fluid dynamics. Three classic problems of differing complexity: convection-diffusion in a channel, the lid-driven cavity, and the cavity driven by a temperature difference, were solved by the...
An exponential increase in the speed of DNA sequencing over the past decade has driven demand for fast, space-efficient algorithms to process the resultant data. The first step in processing is alignment of many short DNA sequences, or...
ROMS is software that models and simulates an ocean region using a finite difference grid and time stepping. ROMS simulations can take from hours to days to complete due to the compute-intensive nature of the software. As a result, the...
The growing adoption of supercomputers across various scientific disciplines, particularly by researchers without a background in computer science, has intensified the demand for parallel applications. These applications are typically...
This paper is concerned with the development of a new GPU (Graphics Processing Units) accelerated computational tool for validating the effectiveness of a hypervelocity kinetic impact and a subsequent nuclear subsurface explosion....
Graphics Processing Unit (GPU) computing is becoming an alternate computing platform for numerical simulations. However, it is not clear which numerical scheme will provide the highest computational efficiency for different types of...
Digital Breast Tomosynthesis (DBT) is a modern 3D Computed Tomography X-ray technique for the early detection of breast tumors, which is receiving growing interest in the medical and scientific community. Since DBT performs incomplete...
This article examines the fundamental differences between the algebraic structure of a field and the structure of a vector space. Despite their outward similarity, namely the presence of addition and multiplication operations (insofar as the latter is used),...
We extend the number sorting algorithms on the GPU to sort large multi-field records. We notice that traditional way of sorting the records by first sorting a (key, index) pair to obtain the sorted permutation of the records followed by...
Non-orthogonal multiple access (NOMA) can achieve high throughput by using the same time and frequency resources for multiple users. NOMA distinguishes multiple users in power domain by computationally-heavy successive interference...
For many finite element problems, when represented as sparse matrices, iterative solvers are found to be unreliable because they can impose computational bottlenecks. Early pioneering work by Duff et al. explored an alternative strategy...