Academia.edu

Human Action Recognition

390 papers
2,499 followers
About this topic
Human Action Recognition is a field of computer vision and machine learning focused on identifying and classifying human actions in video sequences or images. It involves analyzing motion patterns and contextual information to enable systems to understand and interpret human behavior in various environments.

Key research themes

1. How can spatial-temporal feature extraction and data fusion improve accuracy and robustness in human action recognition?

This research theme investigates the development and integration of spatial and temporal features for human action recognition (HAR), focusing on methods that combine multiple feature types or modalities to capture intricate motion and appearance cues. The goal is to enhance recognition accuracy and robustness across varied environmental conditions and datasets by effectively modeling both static pose and dynamic movement patterns.

Key finding: Proposes the STAR-transformer model which aggregates cross-modal data (video frames and skeleton sequences) into multi-class tokens using novel spatio-temporal attention mechanisms (zigzag and binary attention) to efficiently... Read more
Key finding: Develops a feature descriptor fusing Histogram of Oriented Gradient (HOG) features with displacement and velocity to capture spatial gradient and motion information in video sequences. The fusion technique reduces descriptor... Read more
Key finding: Improves temporal relationship modeling by extracting trajectories via tracking spatio-temporal interest points (cuboids) using SIFT descriptor matching. The approach represents human actions by volumes around trajectory... Read more
Key finding: Introduces part-aware graphs to improve skeleton-based HAR by segregating skeleton data into semantically meaningful parts emphasizing motion-relevant areas. The multi-stream fusion aggregates different part-based graph... Read more
Key finding: Integrates multi-modal sensor data—combining inertial (accelerometers, gyroscopes) and computer vision inputs (RGB, skeleton data)—to extract time-frequency and geometric features. Their fusion using logistic regression... Read more

2. What are the effective dimensionality reduction strategies for handling high-dimensional features in large-scale human action recognition datasets?

This area focuses on addressing the computational and storage challenges posed by increasingly high-dimensional feature vectors, especially those derived from Fisher vectors and Bag-of-Words models on large-scale datasets. The studies explore how dimensionality reduction techniques such as principal component analysis (PCA) or learned projections can unearth latent structures in feature spaces, reduce redundancy, and facilitate efficient and accurate classification in expansive HAR datasets comprising numerous action classes and real-world variability.

Key finding: Demonstrates that reducing the dimension of high-dimensional Fisher vector features (up to ~500K dimensions) using projection techniques can maintain or improve classification performance on large-scale unconstrained datasets... Read more
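The projection idea behind this theme can be illustrated with a minimal PCA sketch. It is not the method of the paper summarized above: it assumes plain Python lists as feature vectors, finds only the first principal component, and uses power iteration as a stand-in for a full eigendecomposition. The function names `top_principal_component` and `project` are hypothetical.

```python
import math

def top_principal_component(rows, iterations=100):
    """Return (mean, unit vector) for the first principal component of rows.

    Uses power iteration on the (unnormalized) covariance matrix, applied
    as X^T (X v) so the d x d matrix is never formed explicitly.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Assumed starting vector; a start exactly orthogonal to the
    # component would not converge, which this sketch does not handle.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        scores = [sum(c[k] * v[k] for k in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][k] for i in range(n)) for k in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def project(row, mean, v):
    """Reduce one feature vector to a single coordinate along v."""
    return sum((row[k] - mean[k]) * v[k] for k in range(len(row)))

# Toy 2-D features lying on the line y = 2x (illustrative data only).
features = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, component = top_principal_component(features)
reduced = [project(f, mean, component) for f in features]
```

In practice a library routine that returns many components at once would be used; the point here is only that each high-dimensional vector is replaced by its coordinates along a few directions of highest variance.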

3. How can skeletal data and body part representations be leveraged for efficient and interpretable human action recognition?

This theme explores techniques that use human skeleton-based features and body part models to improve interpretability, reduce feature size, and increase recognition accuracy. Approaches include representing variations in body dimensions, part-based graph models, and compact skeleton descriptors that capture meaningful and discriminative motion patterns. Such methods are robust to occlusion and viewpoint changes and support lightweight, explainable HAR systems.

by Heba Elnemr and 1 more
Key finding: Proposes an action recognition method exploiting global variations in skeleton-derived human body dimensions during motion, using both 2D and 3D data. Achieves high accuracy (above 94%) across Weizmann, Berkeley MHAD, and... Read more
Key finding: Uses part-aware graph convolutional networks to isolate and emphasize dominant skeleton sub-parts across actions, improving feature discrimination. Fusion of streams trained on different parts yields substantial performance... Read more

All papers in Human Action Recognition

Human Action Recognition has seen significant advances through transformer-based architectures, yet achieving a nuanced understanding often requires fusing multiple data modalities. Standard models relying solely on RGB video can struggle... more
Human action recognition is a topic widely studied over time, using numerous techniques and methods to solve a fundamental problem in automatic video analysis. Basically, a traditional human action recognition system collects video frames... more
The Silberstein model of the molecular polarizability of diatomic molecules, generalized by Applequist et al. for polyatomic molecules, is analyzed. The atoms are regarded as isotropically polarizable points located at their nuclei,... more
Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action... more
Collaborative-based representation classifiers have widely spread in the latest years achieving remarkable results in signal and image processing tasks. In this paper, we consider these approaches for the hyperspectral image... more
A method to monitor elderly people in an indoor environment using conventional cameras is presented. The method can be used to identify people's activities and initiate suitable actions as needed. The originality of our approach is in... more
Unintentional falls are a major public health concern for many communities, especially with aging populations. There are various approaches used to classify human activities for fall detection. Related studies have employed wearable,... more
Current state-of-the-art approaches for spatio-temporal action detection deal with stable videos and quite sterilized environments, as seen in the UCF-101 benchmark. In addition, the objects of interest are typically relatively close to... more
This paper presents a survey of trajectory-based activity analysis for visual surveillance. It describes techniques that use trajectory data to define a general set of activities that are applicable to a wide range of scenes and... more
In this paper we address the task of recognizing assembly actions as a structure (e.g. a piece of furniture or a toy block tower) is built up from a set of primitive objects. Recognizing the full range of assembly actions requires... more
In this report, the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2021 are given. We developed a hierarchical attention model for action anticipation, which leverages Transformer-based attention... more
by Mehdi Imani and 1 more
Recognizing human actions through video analysis has gained significant attention in applications like surveillance, sports analytics, and human-computer interaction. While deep learning models such as 3D convolutional neural networks... more
In 3D human motion pose-based analysis, the main problem is how to classify multi-class label activities based on primitive action (pose) inputs efficiently for both accuracy and processing time. Because, pose is not unique and the same... more
This paper presents a human action recognition method using dynamic time warping and voting algorithms on 3D human skeletal models. In this method human actions, which are the combinations of multiple body part movements, are described by... more
The rapid development of autonomous vehicles (AVs) has led to significant advancements in scene understanding, a critical aspect of ensuring safe and efficient navigation. Deep learning (DL) has emerged as a powerful tool for... more
The increasing complexity of decision-making in critical domains such as healthcare, finance, and emergency response has highlighted the need for a collaborative approach that integrates human intuition and expertise with artificial... more
The rapid advancement of Artificial Intelligence (AI) is revolutionizing online education by offering new opportunities for creating personalized learning pathways tailored to individual learners. This paper explores how AI-powered... more
Video-based action recognition is a crucial task in computer vision with applications spanning surveillance, sports analytics, human-computer interaction, and autonomous systems. This paper explores the application of spatiotemporal... more
Low-light photography often suffers from issues such as noise, low contrast, and color distortion, which degrade image quality and limit its applicability in various fields such as night-time photography, security surveillance, and... more
Human Action Recognition is a challenging issue in the real time constraints where the action videos or images are contaminated with several side effects like noises, moving backgrounds, multiple views, hindered movements etc. Under these... more
Correct classification of image data can depend on features learned in multiple sequential frames. We focus on the problem of learning action from video data with an emphasis on driver behavior monitoring. An insufficient quantity of high... more
Human Activity Recognition (HAR) is a crucial component of computer vision, with applications in human-computer interaction and surveillance. As the need for HAR technology keeps increasing, so does the desire for solutions that can help... more
Computing optical flow is a fundamental problem in computer vision. However, deep learning-based optical flow techniques do not perform well for non-rigid movements such as those found in faces, primarily due to lack of the training data... more
Human Action Recognition is used to analyse the videos to identify the actions performed by humans. In recent years, it has gained much popularity due to its large domain of applications presented in various... more
Due to the rising demand for automatic interpretation of visual relationships in several domains, human-object interaction (HOI) detection and recognition have also gained more attention from researchers over the last decade. This survey... more
Spatial and temporal relationships, both short-range and long-range, between objects in videos, are key cues for recognizing actions. It is a challenging problem to model them jointly. In this paper, we first present a new variant of Long... more
Stacked denoising autoencoders (SDAs) have been successfully used to learn new representations for domain adaptation. Recently, they have attained record accuracy on standard benchmark tasks of sentiment analysis across different text... more
This paper examines the connection between Ludwig von Mises and early contributors to game theory. What becomes clear is that early game theorists were trained by Austrians who thus influenced the field from its beginning. The connection... more
Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity... more
Activity detection in surveillance videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in... more
In this paper, we describe a simple strategy for mitigating variability in temporal data series by shifting focus onto long-term, frequency domain features that are less susceptible to variability. We apply this method to the human action... more
Human Action Recognition (HAR) has gained increasing importance in various fields, including surveillance, healthcare, and sports analysis. This paper presents the development of a Human Action Recognition model using the Long-term... more
The severity of sustained injury resulting from assault-related violence can be minimised by reducing detection time. However, it has been shown that human operators perform poorly at detecting events found in video footage when presented... more
Vision-based Human Action Recognition (HAR) is a hot topic in computer vision. Recently, deep-based HAR has shown promising results. HAR using a single data modality is a common approach; however, the fusion of different data sources... more
This paper describes our method developed for TRECVID 2012 Semantic INdexing (SIN) Task. Our main research purpose is the development of a fast method, which can work on a single processor with no performance degradation. To this end,... more
This paper aims to determine which is the best human action recognition method based on features extracted from RGB-D devices, such as the Microsoft Kinect. A review of all the papers that make reference to MSR Action3D, the most used... more
Since the Microsoft Kinect has been released, the usage of marker-less body pose estimation has been enormously eased. Based on 3D skeletal pose information, complex human gestures and actions can be recognised in real time. However, due... more
Due to progress and demographic change, society is facing a crucial challenge related to increased life expectancy and a higher number of people in situations of dependency. As a consequence, there exists a significant demand for support... more
Interest in RGB-D devices is increasing due to their low price and the wide range of possible applications that come along. These devices provide a marker-less body pose estimation by means of skeletal data consisting of 3D positions of... more
We propose a self-supervised visual learning method by predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal visual representation of the video by leveraging the variations in the visual... more
This paper describes a novel methodology for automated recognition of high-level activities. A key aspect of our framework relies on the concept of cooccurring visual words for describing interactions between several persons. Motivated by... more
Due to the growing use of human action recognition in everyday applications, it has become one of the most active topics in image analysis and pattern recognition. This paper presents a new feature extraction method for human action... more
Image processing is a method that takes an image as input, performs some operation on it, and gives an image as output. It may also take a photograph or video frame as input and give parameters related to that image as output. Image processing... more
In modern e-Healthcare systems, human activity recognition (HAR) is one of the most challenging tasks in remote monitoring of patients suffering from mental illness or disabilities for necessary assistance. One of the major issues is to... more
Digital-enabled manufacturing systems require a high level of automation for fast and low-cost production but should also present flexibility and adaptiveness to varying and dynamic conditions in their environment, including the presence... more
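
One of the entries above describes recognizing actions with dynamic time warping and voting on 3D skeletal models. The snippet is truncated, so the following is only a generic sketch of that combination, not the paper's algorithm: it assumes each body part is a sequence of coordinate tuples, that every part casts one vote for the label of its nearest template, and that the majority label wins. The names `dtw_distance` and `classify_by_vote` and the toy data are hypothetical.

```python
import math
from collections import Counter

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two frame sequences.

    Each frame is a tuple of coordinates; the frame-to-frame cost is the
    Euclidean distance. Runs in O(len(seq_a) * len(seq_b)).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify_by_vote(query_parts, templates):
    """Each body part votes for the label whose template is nearest under DTW."""
    votes = Counter()
    for part, seq in query_parts.items():
        best = min(templates, key=lambda label: dtw_distance(seq, templates[label][part]))
        votes[best] += 1
    return votes.most_common(1)[0][0]

# Toy 1-D "joint angle" sequences per body part (illustrative data only).
templates = {
    "wave": {"arm": [(0.0,), (1.0,), (0.0,)], "leg": [(0.0,), (0.0,)]},
    "kick": {"arm": [(0.0,), (0.0,)], "leg": [(0.0,), (2.0,), (0.0,)]},
}
query = {"arm": [(0.0,), (1.0,), (1.0,), (0.0,)], "leg": [(0.0,), (0.0,), (0.0,)]}
predicted = classify_by_vote(query, templates)
```

Because DTW aligns sequences of different lengths, the slower query arm movement still matches the "wave" template at zero cost, which is the property that makes it attractive for actions performed at varying speeds.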