Human Action Recognition

390 papers

2,499 followers

About this topic

Human Action Recognition is a field of computer vision and machine learning focused on identifying and classifying human actions in video sequences or images. It involves analyzing motion patterns and contextual information to enable systems to understand and interpret human behavior in various environments.

Key research themes

1. How can spatial-temporal feature extraction and data fusion improve accuracy and robustness in human action recognition?

This research theme investigates the development and integration of spatial and temporal features for human action recognition (HAR), focusing on methods that combine multiple feature types or modalities to capture intricate motion and appearance cues. The goal is to enhance recognition accuracy and robustness across varied environmental conditions and datasets by effectively modeling both static pose and dynamic movement patterns.

STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition

by Byoung Chul Ko

2024, 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Key finding: Proposes the STAR-transformer model which aggregates cross-modal data (video frames and skeleton sequences) into multi-class tokens using novel spatio-temporal attention mechanisms (zigzag and binary attention) to efficiently...

Histogram of Oriented Gradient-Based Fusion of Features for Human Action Recognition in Action Video Sequences

by Sharnil Pandya

2021, Sensors

Key finding: Develops a feature descriptor fusing Histogram of Oriented Gradient (HOG) features with displacement and velocity to capture spatial gradient and motion information in video sequences. The fusion technique reduces descriptor...

Human action recognition using trajectory-based representation

by Elsayed Hemayed

2022, Egyptian Informatics Journal

Key finding: Improves temporal relationship modeling by extracting trajectories via tracking spatio-temporal interest points (cuboids) using SIFT descriptor matching. The approach represents human actions by volumes around trajectory...

Improving Skeleton-Based Action Recognition Using Part-Aware Graphs in a Multi-Stream Fusion Context

by Zois Tsakiris

2024, IEEE Access

Key finding: Introduces part-aware graphs to improve skeleton-based HAR by segregating skeleton data into semantically meaningful parts emphasizing motion-relevant areas. The multi-stream fusion aggregates different part-based graph...

Multi-Sensor-Based Action Monitoring and Recognition via Hybrid Descriptors and Logistic Regression

by ABDULWAHAB ALAZEB

2025, IEEE Access

Key finding: Integrates multi-modal sensor data—combining inertial (accelerometers, gyroscopes) and computer vision inputs (RGB, skeleton data)—to extract time-frequency and geometric features. Their fusion using logistic regression...

2. What are the effective dimensionality reduction strategies for handling high-dimensional features in large-scale human action recognition datasets?

This area focuses on addressing the computational and storage challenges posed by increasingly high-dimensional feature vectors, especially those derived from Fisher vectors and Bag-of-Words models on large-scale datasets. The studies explore how dimensionality reduction techniques such as principal component analysis (PCA) or learned projections can unearth latent structures in feature spaces, reduce redundancy, and facilitate efficient and accurate classification in expansive HAR datasets comprising numerous action classes and real-world variability.
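As a rough illustration of the PCA-style projection mentioned above, the following minimal sketch reduces a batch of high-dimensional feature vectors with an economy SVD. The function names `fit_pca` and `project`, the random stand-in data, and the chosen sizes are assumptions made for illustration; they are not taken from any of the listed papers.

```python
import numpy as np

def fit_pca(features, n_components):
    """Learn a PCA projection from a (n_samples, n_features) matrix."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Economy SVD: rows of vt are orthonormal principal directions,
    # ordered by decreasing singular value. At most
    # min(n_samples, n_features) directions are available.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Map features into the reduced space spanned by the components."""
    return (features - mean) @ components.T

# Hypothetical stand-in for high-dimensional descriptors (e.g. Fisher vectors).
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 2000))

mean, components = fit_pca(train, n_components=32)
reduced = project(train, mean, components)
print(reduced.shape)
```

The reduced vectors can then be passed to any classifier in place of the original features; how many components to keep is a tuning choice that depends on the dataset.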

Dimensionality reduction of Fisher vectors for human action recognition

by Roland Goecke

2023, IET Computer Vision

Key finding: Demonstrates that reducing the dimension of high-dimensional Fisher vector features (up to ~500K dimensions) using projection techniques can maintain or improve classification performance on large-scale unconstrained datasets...

3. How can skeletal data and body part representations be leveraged for efficient and interpretable human action recognition?

This theme explores techniques that utilize human skeleton-based features and body part models to improve interpretability, reduce feature size, and increase recognition accuracy. Approaches include representing variations in body dimensions, part-based graph models, and compact skeleton descriptors to capture meaningful and discriminative motion patterns. Such methods offer the advantage of robustness to occlusion and viewpoint changes and facilitate lightweight, explainable HAR systems.

Human Action Recognition Utilizing Variations in Skeleton Dimensions

by Heba Elnemr and

2018

Key finding: Proposes an action recognition method exploiting global variations in skeleton-derived human body dimensions during motion, using both 2D and 3D data. Achieves high accuracy (above 94%) across Weizmann, Berkeley MHAD, and...

Improving Skeleton-Based Action Recognition Using Part-Aware Graphs in a Multi-Stream Fusion Context

by Zois Tsakiris

2024, IEEE Access

Key finding: Uses part-aware graph convolutional networks to isolate and emphasize dominant skeleton sub-parts across actions, improving feature discrimination. Fusion of streams trained on different parts yields substantial performance...

All papers in Human Action Recognition

TransMODAL: A Dual-Stream Transformer with Adaptive Co-Attention for Efficient Human Action Recognition

by Mehdi Imani

2025, Journal of Electronics

Human Action Recognition has seen significant advances through transformer-based architectures, yet achieving a nuanced understanding often requires fusing multiple data modalities. Standard models relying solely on RGB video can struggle...

Human action recognition in videos: a comparative evaluation of the classical and velocity adaptation space-time interest points techniques

by Bruno L. Macchiavello

2025

Human action recognition is a topic widely studied over time, using numerous techniques and methods to solve a fundamental problem in automatic video analysis. Basically, a traditional human action recognition system collects video frames...

Resonance in Interacting Induced-Dipole Polarizing Force Fields: Application to Force-Field Derivatives

by Jesús García

2025, Algorithms

The Silberstein model of the molecular polarizability of diatomic molecules, generalized by Applequist et al. for polyatomic molecules, is analyzed. The atoms are regarded as isotropically polarizable points located at their nuclei,...

Graph Convolutional Networks for Temporal Action Localization

by Junzhou Huang

2025, arXiv (Cornell University)

Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action...

Graph Convolutional Networks for Temporal Action Localization

by Junzhou Huang

2025, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

An analysis of collaborative representation schemes for the classification of hyperspectral images

by Mauro Dalla Mura

2025

Collaborative-based representation classifiers have widely spread in the latest years achieving remarkable results in signal and image processing tasks. In this paper, we consider these approaches for the hyperspectral image...

Activity Recognition from Video Data Using Spatial and Temporal Features

by Djamel Azzi

2025

A method to monitor elderly people in an indoor environment using conventional cameras is presented. The method can be used to identify people's activities and initiate suitable actions as needed. The originality of our approach is in...

Development of a User-Adaptable Human Fall Detection Based on Fall Risk Levels Using Depth Sensor

by Mohd Mohd

2025, Sensors (Basel, Switzerland)

Unintentional falls are a major public health concern for many communities, especially with aging populations. There are various approaches used to classify human activities for fall detection. Related studies have employed wearable,...

Online Spatio-Temporal Action Detection in Long-Distance Imaging Affected by the Atmosphere

by Yitzhak Yitzhaky

2025, IEEE Access

Current state-of-the-art approaches for spatio-temporal action detection deal with stable videos and quite sterilized environments, as seen in the UCF-101 benchmark. In addition, the objects of interest are typically relatively close to...

A Survey of Vision-Based Trajectory Learning and Analysis for Surveillance

by Mohan Trivedi

2025, IEEE Transactions on Circuits and Systems for Video Technology

This paper presents a survey of trajectory-based activity analysis for visual surveillance. It describes techniques that use trajectory data to define a general set of activities that are applicable to a wide range of scenes and...

Fine-grained activity recognition for assembly videos

by Barbara Landau

2025, arXiv (Cornell University)

In this paper we address the task of recognizing assembly actions as a structure (e.g. a piece of furniture or a toy block tower) is built up from a set of primitive objects. Recognizing the full range of assembly actions requires...

Fine-Grained Activity Recognition for Assembly Videos

by Barbara Landau

2025, IEEE Robotics and Automation Letters

TransAction: ICL-SJTU Submission to EPIC-Kitchens Action Anticipation Challenge 2021

by Benny Lo

2025, arXiv (Cornell University)

In this report, the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2021 are given. We developed a hierarchical attention model for action anticipation, which leverages Transformer-based attention...

A New Efficient Hybrid Technique for Human Action Recognition Using 2D Conv-RBM and LSTM with Optimized Frame Selection

by Mehdi Imani and

2025, Technologies

Recognizing human actions through video analysis has gained significant attention in applications like surveillance, sports analytics, and human-computer interaction. While deep learning models such as 3D convolutional neural networks...

Pose-based 3D human motion analysis using Extreme Learning Machine

by Arif Budiman

2025, 2013 IEEE 2nd Global Conference on Consumer Electronics (GCCE)

In 3D human motion pose-based analysis, the main problem is how to classify multi-class label activities based on primitive action (pose) inputs efficiently for both accuracy and processing time. Because pose is not unique and the same...

Human Action Recognition Using Dynamic Time Warping and Voting Algorithm

by Thành Ngân Nguyễn Lê

2025, VNU Journal of Science: Computer Science and Communication Engineering

This paper presents a human action recognition method using dynamic time warping and voting algorithms on 3D human skeletal models. In this method human actions, which are the combinations of multiple body part movements, are described by...
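To make the combination of dynamic time warping (DTW) and voting concrete, here is a minimal sketch under stated assumptions. It uses the textbook DTW recurrence with Euclidean frame distances, assigns each body part the label of its nearest template, and takes a majority vote. The per-part nearest-template rule, the function names, and the toy sequences are illustrative assumptions; the truncated abstract does not specify the paper's actual descriptors or voting scheme.

```python
from collections import Counter
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of scalars or feature vectors."""
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat scalars as 1-D feature vectors
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def classify_by_voting(query_parts, templates):
    """query_parts: {part: sequence}; templates: list of (label, {part: sequence}).

    Each body part votes for the label of its DTW-nearest template
    (an assumed voting rule, chosen for simplicity).
    """
    votes = []
    for part, seq in query_parts.items():
        label, _ = min(templates, key=lambda t: dtw_distance(seq, t[1][part]))
        votes.append(label)
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: one scalar joint angle per frame for each body part.
templates = [
    ("wave", {"arm": [0, 1, 0, 1], "leg": [0, 0, 0]}),
    ("kick", {"arm": [0, 0, 0], "leg": [0, 2, 0]}),
]
query = {"arm": [0, 1, 1, 0, 1], "leg": [0, 0, 0, 0]}
print(classify_by_voting(query, templates))
```

Because DTW aligns sequences of different lengths, the query does not need the same number of frames as the templates.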

DEEP LEARNING FOR AUTONOMOUS VEHICLE SCENE UNDERSTANDING

by Durga K

2025, International Journal of Computer Science

The rapid development of autonomous vehicles (AVs) has led to significant advancements in scene understanding, a critical aspect of ensuring safe and efficient navigation. Deep learning (DL) has emerged as a powerful tool for...

HUMAN-AI COLLABORATION FRAMEWORK FOR COMPLEX DECISION MAKING

by Nirmala Annadurai

2025, International Journal of Computer Science

The increasing complexity of decision-making in critical domains such as healthcare, finance, and emergency response has highlighted the need for a collaborative approach that integrates human intuition and expertise with artificial...

AI-POWERED PERSONALIZED LEARNING PATHWAYS IN ONLINE EDUCATION

by Priya Papu

2025, International Journal of Computer Science

The rapid advancement of Artificial Intelligence (AI) is revolutionizing online education by offering new opportunities for creating personalized learning pathways tailored to individual learners. This paper explores how AI-powered...

VIDEO-BASED ACTION RECOGNITION USING SPATIOTEMPORAL DEEP LEARNING MODELS

by Lavanya Kumar

2025, International Journal of Computer Science

Video-based action recognition is a crucial task in computer vision with applications spanning surveillance, sports analytics, human-computer interaction, and autonomous systems. This paper explores the application of spatiotemporal...

DEEP LEARNING MODELS FOR ENHANCING LOW-LIGHT PHOTOGRAPHY

by HEMALATHA M

2025, International Journal of Computer Science

Low-light photography often suffers from issues such as noise, low contrast, and color distortion, which degrade image quality and limit its applicability in various fields such as night-time photography, security surveillance, and...

Edge Adaptive Gradient Action Descriptor and Kernel Discriminant Analysis for Human Action Recognition

by Birru Devender

2025, International Journal of Engineering Research and Technology

Human Action Recognition is a challenging problem under real-time constraints, where action videos or images are contaminated by side effects such as noise, moving backgrounds, multiple views, and hindered movements. Under these...

Transfer Learning of Temporal Information for Driver Action Classification

by Shabab Bazrafkan

2025

Correct classification of image data can depend on features learned in multiple sequential frames. We focus on the problem of learning action from video data with an emphasis on driver behavior monitoring. An insufficient quantity of high...

Human Action Recognition Using Dynamic Time Warping and Voting Algorithm

by Thành công Trần lê

2025, VNU Journal of Science: Computer Science and Communication Engineering

Integrating Human Motion Dynamics in CNN Architecture to Recognize Human Activity from Different Camera Angles

by Kishan Kesari Gupta

2025, Elsevier

Human Activity Recognition (HAR) is a crucial component of computer vision, with applications in human-computer interaction and surveillance. As the need for HAR technology keeps increasing, so does the desire for solutions that can help...

Self-Supervised Approach for Facial Movement Based Optical Flow

by Muhannad Alkaddour

2025, IEEE Transactions on Affective Computing

Computing optical flow is a fundamental problem in computer vision. However, deep learning-based optical flow techniques do not perform well for non-rigid movements such as those found in faces, primarily due to lack of the training data...

Human Action Recognition using an Ensemble Deep Learning Model for Video Datasets

by Ramanpreet Kaur

2025, Journal of Harbin Engineering University

Human Action Recognition is used to analyse the videos to identify the actions performed by humans. In recent years, it has gained much popularity due to its large domain of applications presented in various...

A Review of Vision-Based Techniques Applied to Detecting Human-Object Interactions in Still Images

by Ramanpreet Kaur

2025, Journal of Computing Science and Engineering

Due to the rising demand for automatic interpretation of visual relationships in several domains, human-object interaction (HOI) detection and recognition have also gained more attention from researchers over the last decade. This survey...

Relational Long Short-Term Memory for Video Action Recognition

by Zexi Chen

2025, ArXiv

Spatial and temporal relationships, both short-range and long-range, between objects in videos, are key cues for recognizing actions. It is a challenging problem to model them jointly. In this paper, we first present a new variant of Long...

Marginalized Denoising Autoencoders for Domain Adaptation

by Minmin Chen

2025, arXiv (Cornell University)

Stacked denoising autoencoders (SDAs) have been successfully used to learn new representations for domain adaptation. Recently, they have attained record accuracy on standard benchmark tasks of sentiment analysis across different text...

Mises, Morgenstern, Hoselitz, and Nash: The Austrian Connection to Early Game Theory

by Nathan M Moore

2025, The Quarterly Journal of Austrian Economics

This paper examines the connection between Ludwig von Mises and early contributors to game theory. What becomes clear is that early game theorists were trained by Austrians who thus influenced the field from its beginning. The connection...

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

by Yogesh Rawat

2025, arXiv (Cornell University)

Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity...

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Surveillance Videos

by Yogesh Rawat

2025, arXiv (Cornell University)

Activity detection in surveillance videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in...

Action Recognition in the Frequency Domain

by Jinyan Guan

2025, arXiv (Cornell University)

In this paper, we describe a simple strategy for mitigating variability in temporal data series by shifting focus onto long-term, frequency domain features that are less susceptible to variability. We apply this method to the human action...

Action Recognition in the Frequency Domain

by Jinyan Guan

2025

Research Paper (HAR)

by Kavya Bhardwaj

2025, KavyaBhardwaj

Human Action Recognition (HAR) has gained increasing importance in various fields, including surveillance, healthcare, and sports analysis. This paper presents the development of a Human Action Recognition model using the Long-term...

Detecting violent crowds using temporal analysis of GLCM texture

by David Marshall

2024, arXiv (Cornell University)

The severity of sustained injury resulting from assault-related violence can be minimised by reducing detection time. However, it has been shown that human operators perform poorly at detecting events found in video footage when presented...

Multimodal vision-based human action recognition using deep learning: a review

by Ahmad R. Naghsh-Nilchi

2024, Artificial Intelligence Review (2024) 57:178

Vision-based Human Action Recognition (HAR) is a hot topic in computer vision. Recently, deep-based HAR has shown promising results. HAR using a single data modality is a common approach; however, the fusion of different data sources...

Kobe University and Muroran Institute of Technology at TRECVID 2012 Semantic Indexing Task

by Kuniaki Uehara

2024, TRECVID

This paper describes our method developed for the TRECVID 2012 Semantic Indexing (SIN) Task. Our main research purpose is the development of a fast method that can work on a single processor with no performance degradation. To this end, computationally expensive processes are re-formulated based on matrix operations. We re-formulate the Euclidean distance computation for the kernel value computation in an SVM, and the probability density computation of multivariate normal distributions for the GMM supervector representation. This enables accurate concept detection using a large number of training examples and spatially-temporally dense features. The following four runs were submitted to the SIN (light) task:
• L_A_kobe_muro_l5_4: This is our baseline run using five features: (1) SIFT at Harris-Affine regions, (2) SIFT at Hessian-Affine regions, (3) trajectory displacement, (4) HOG around trajectories, and (5) MFCC. Five SVMs built on these features are fused using the weighted linear combination approach. This run achieved a MAP of 0.320.
• L_A_kobe_muro_l6_1: In addition to the above five features, this run uses a sixth feature, Spatio-Temporal-Dense RGB SIFT (STD-RGB-SIFT), consisting of SIFT descriptors sampled at every sixth pixel in every other frame. The extraction of this feature becomes feasible because of the significant speedup of the probability density computation. This run achieved a MAP of 0.348.
• L_A_kobe_muro_l18_3: To cover the diversity of a concept's appearances, this run utilizes bagging, where three SVMs are built on each of the six features using different subsets of training examples. So many SVMs can be built thanks to the fast kernel value computation. The SVMs are fused using the weighted linear combination. This run achieved a MAP of 0.358, which is the highest score among all the runs submitted to the SIN (light) task.
• L_A_kobe_muro_r18_2: This run uses rough set theory to fuse the SVMs in L_A_kobe_muro_l18_3, and achieved a MAP of 0.323.
The above results indicate the effectiveness of the spatially-temporally dense feature STD-RGB-SIFT. In particular, a MAP of 0.302 was achieved using STD-RGB-SIFT alone. This is significantly higher than the MAPs obtained using the other single features. The effectiveness of bagging can also be seen from the above results.
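The idea of recasting Euclidean distance computation as matrix operations can be sketched as follows, using the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y so that all pairwise distances come from one matrix product. The RBF kernel form, the function names, and the weight normalization in the fusion step are assumptions chosen for illustration; the abstract does not state which kernel or fusion weights the authors used.

```python
import numpy as np

def pairwise_sq_euclidean(x, y):
    """All squared distances between rows of x (n, d) and rows of y (m, d)."""
    x_sq = np.sum(x * x, axis=1)[:, None]   # shape (n, 1)
    y_sq = np.sum(y * y, axis=1)[None, :]   # shape (1, m)
    sq = x_sq + y_sq - 2.0 * (x @ y.T)      # one matrix product, no Python loops
    # Rounding can leave tiny negative values; clip them to zero.
    return np.maximum(sq, 0.0)

def rbf_kernel(x, y, gamma):
    """RBF kernel matrix built from the distances (assumed kernel choice)."""
    return np.exp(-gamma * pairwise_sq_euclidean(x, y))

def fuse_scores(score_matrix, weights):
    """Weighted linear combination of per-classifier scores.

    score_matrix has shape (n_examples, n_classifiers); weights are
    normalized to sum to one (a convention assumed here).
    """
    weights = np.asarray(weights, dtype=float)
    return score_matrix @ (weights / weights.sum())

# Hypothetical data: 5 training and 4 test descriptors of dimension 7.
rng = np.random.default_rng(1)
train = rng.normal(size=(5, 7))
test = rng.normal(size=(4, 7))
kernel = rbf_kernel(test, train, gamma=0.5)
print(kernel.shape)
```

The kernel matrix has one row per test example and one column per training example, so it could be passed to an SVM implementation that accepts precomputed kernels.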

A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset

by Jose Padilla

2024

This paper aims to determine which is the best human action recognition method based on features extracted from RGB-D devices, such as the Microsoft Kinect. A review of all the papers that make reference to MSR Action3D, the most used...

Fusion of Skeletal and Silhouette-Based Features for Human Action Recognition with RGB-D Devices

by Jose Padilla

2024, 2013 IEEE International Conference on Computer Vision Workshops

Since the Microsoft Kinect has been released, the usage of marker-less body pose estimation has been enormously eased. Based on 3D skeletal pose information, complex human gestures and actions can be recognised in real time. However, due...

A Vision-Based System for Intelligent Monitoring: Human Behaviour Analysis and Privacy by Context

by Jose Padilla

2024, Sensors

Due to progress and demographic change, society is facing a crucial challenge related to increased life expectancy and a higher number of people in situations of dependency. As a consequence, there exists a significant demand for support...

Evolutionary joint selection to improve human action recognition with RGB-D devices

by Jose Padilla

2024, Expert Systems with Applications

Interest in RGB-D devices is increasing due to their low price and the wide range of possible applications that come along. These devices provide a marker-less body pose estimation by means of skeletal data consisting of 3D positions of...

Self-Supervised Spatio-Temporal Representation Learning Using Variable Playback Speed Prediction

by Hyung Jin Chang

2024, arXiv (Cornell University)

We propose a self-supervised visual learning method by predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal visual representation of the video by leveraging the variations in the visual...

Human interaction recognition based on the co-occurence of visual words

by nour el houda khadidja SLIMANI

2024, HAL (Le Centre pour la Communication Scientifique Directe)

This paper describes a novel methodology for automated recognition of high-level activities. A key aspect of our framework relies on the concept of co-occurring visual words for describing interactions between several persons. Motivated by...

Robust Human Action Recognition Using History Trace Templates

by Stefanos Kollias

2024

Due to the growing use of human action recognition in everyday applications, it has become one of the hottest topics in image analysis and pattern recognition. This paper presents a new feature extraction method for human action...

A Survey on Image Processing and Human Action Recognition

by Samreen Dhillon

2024

Image processing is a method that takes an image as input, performs some operation on it, and gives an image as output. It takes a photograph or video frame as input and gives the parameters related to that image as output. Image processing...

Pose-based 3D human motion analysis using Extreme Learning Machine

by Arif Budiman

2024, 2013 IEEE 2nd Global Conference on Consumer Electronics (GCCE)

A blockchain-based fog computing framework for activity recognition as an application to e-Healthcare services

by Yasir Faheem

2024, Future Generation Computer Systems

In modern e-Healthcare systems, human activity recognition (HAR) is one of the most challenging tasks in remote monitoring of patients suffering from mental illness or disabilities for necessary assistance. One of the major issues is to...

A Mixed-Perception Approach for Safe Human–Robot Collaboration in Industrial Automation

by Hossein Karimpour

2024, Sensors

Digital-enabled manufacturing systems require a high level of automation for fast and low-cost production but should also present flexibility and adaptiveness to varying and dynamic conditions in their environment, including the presence...
