Academia.edu

Human Action Recognition

390 papers
2,499 followers
About this topic
Human Action Recognition is a field of computer vision and machine learning focused on identifying and classifying human actions in video sequences or images. It involves analyzing motion patterns and contextual information to enable systems to understand and interpret human behavior in various environments.

Key research themes

1. How can spatial-temporal feature extraction and data fusion improve accuracy and robustness in human action recognition?

This research theme investigates the development and integration of spatial and temporal features for human action recognition (HAR), focusing on methods that combine multiple feature types or modalities to capture intricate motion and appearance cues. The goal is to enhance recognition accuracy and robustness across varied environmental conditions and datasets by effectively modeling both static pose and dynamic movement patterns.

Key finding: Proposes the STAR-transformer model which aggregates cross-modal data (video frames and skeleton sequences) into multi-class tokens using novel spatio-temporal attention mechanisms (zigzag and binary attention) to efficiently...
Key finding: Develops a feature descriptor fusing Histogram of Oriented Gradient (HOG) features with displacement and velocity to capture spatial gradient and motion information in video sequences. The fusion technique reduces descriptor...
Key finding: Improves temporal relationship modeling by extracting trajectories via tracking spatio-temporal interest points (cuboids) using SIFT descriptor matching. The approach represents human actions by volumes around trajectory...
Key finding: Introduces part-aware graphs to improve skeleton-based HAR by segregating skeleton data into semantically meaningful parts emphasizing motion-relevant areas. The multi-stream fusion aggregates different part-based graph...
Key finding: Integrates multi-modal sensor data, combining inertial (accelerometers, gyroscopes) and computer vision inputs (RGB, skeleton data), to extract time-frequency and geometric features. Their fusion using logistic regression...

2. What are the effective dimensionality reduction strategies for handling high-dimensional features in large-scale human action recognition datasets?

This area focuses on addressing the computational and storage challenges posed by increasingly high-dimensional feature vectors, especially those derived from Fisher vectors and Bag-of-Words models on large-scale datasets. The studies explore how dimensionality reduction techniques such as principal component analysis (PCA) or learned projections can unearth latent structures in feature spaces, reduce redundancy, and facilitate efficient and accurate classification in expansive HAR datasets comprising numerous action classes and real-world variability.

Key finding: Demonstrates that reducing the dimension of high-dimensional Fisher vector features (up to ~500K dimensions) using projection techniques can maintain or improve classification performance on large-scale unconstrained datasets...
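
As a rough illustration of the PCA-style projection mentioned in this theme, the following is a minimal sketch and not the method of any listed paper. It assumes NumPy, uses synthetic random vectors as a stand-in for real video descriptors, and the function name `pca_project` and the sizes (200 samples, 1,000 dimensions, 32 components) are hypothetical choices made for the example.

```python
import numpy as np


def pca_project(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Economy-size SVD: rows of vt are principal directions,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components


rng = np.random.default_rng(0)
# Synthetic stand-in for high-dimensional descriptors (assumption, not real data).
features = rng.normal(size=(200, 1000))
reduced, mean, components = pca_project(features, n_components=32)
print(reduced.shape)
```

A new sample would be reduced with the same `mean` and `components`, for example `(x - mean) @ components.T`, so that training and test features share one projection.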

3. How can skeletal data and body part representations be leveraged for efficient and interpretable human action recognition?

This theme explores techniques that utilize human skeleton-based features and body part models to improve interpretability, reduce feature size, and increase recognition accuracy. Approaches include representing variations in body dimensions, part-based graph models, and compact skeleton descriptors to capture meaningful and discriminative motion patterns. Such methods offer the advantage of robustness to occlusion and viewpoint changes and facilitate lightweight, explainable HAR systems.

by Heba Elnemr and 1 more
Key finding: Proposes an action recognition method exploiting global variations in skeleton-derived human body dimensions during motion, using both 2D and 3D data. Achieves high accuracy (above 94%) across Weizmann, Berkeley MHAD, and...
Key finding: Uses part-aware graph convolutional networks to isolate and emphasize dominant skeleton sub-parts across actions, improving feature discrimination. Fusion of streams trained on different parts yields substantial performance...

All papers in Human Action Recognition

A number of review or survey articles have previously appeared on human action recognition where either vision sensors or inertial sensors are used individually. Considering that each sensor modality has its own limitations, in a number...
Modeling human behaviors and activity patterns for recognition or detection of special events has attracted significant research interest in recent years. Diverse methods abound for building intelligent vision systems aimed at...
In this work we propose a new approach to the spatiotemporal localisation (detection) and classification of multiple concurrent actions within temporally untrimmed videos. Our framework is composed of three stages. In stage 1, a cascade...
Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to...
Human action recognition based on skeletons has wide applications in human-computer interaction and intelligent surveillance. However, view variations and noisy data bring challenges to this task. What's more, it remains a problem to...
Human Behaviour Analysis (HBA) is increasingly of interest to computer vision and artificial intelligence researchers. Its main application areas, like Video Surveillance and Ambient-Assisted Living (AAL), have been in great...
This paper pursues the best multiclass classification strategy for pose-based 3D human motion recognition using Extreme Learning Machines (ELM). Such a classification task is one of the most difficult classification problems because the pose...
The capabilities of domestic service robots could be further improved if the robot is equipped with the ability to recognize activities performed by humans in its sensory range. For example, in a simple scenario, a floor cleaning robot can...
We propose a novel conditional GAN (cGAN) model for continuous fine-grained human action segmentation, that utilises multi-modal data and learned scene context information. The proposed approach utilises two GANs: termed Action GAN and...
In this paper, we propose a new Human Action Recognition algorithm based on hybrid feature extraction from silhouettes and Neural Networks for classification. The hybrid features include contour history images, which is a new method,...
Action recognition, aiming to automatically classify actions from a series of observations, has attracted more attention in the computer vision community. The state-of-the-art action recognition methods utilize dense...
Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. In this work we address the problem of action localisation and instance...
Assisting patients to perform activities of daily living (ADLs) is a challenging task for both human and machine. Hence, developing a computer-based rehabilitation system to re-train patients to carry out daily activities is an essential...
In this paper we address the problem of continuous fine-grained action segmentation, in which multiple actions are present in an unsegmented video stream. The challenge for this task lies in the need to represent the hierarchical nature...
Convolutional Neural Networks, especially 3-Dimensional Convolutional Neural Networks (3DCNNs), have recently been widely used for human action recognition (HAR) in videos. In this paper, we use a multi-stream framework which is a...
Driven by rapid advances in computer vision and AI, video analysis tasks have been moving from inferring the present state to predicting the future state. Vision-based action recognition and prediction from...
Human action recognition has been an interesting and active field of research in recent years. Human Action Recognition (HAR) has many potential and promising applications, in such fields as security, surveillance, professional...
This paper presents an effective multi-scale energy-based Global Ternary Image (GTI) representation for action recognition from depth sequences. The unique property of our representation is that it takes the spatial-temporal...
In this paper, we studied the Stacked Denoising Autoencoder (SDA) model for human pose-based action recognition. We used the public Chalearn 2013 dataset, which contains Italian body language actions from 27 persons. We studied two models of SDA...
Human action recognition is an important yet challenging task. This paper presents a low-cost descriptor called 3D Histograms of Texture (3DHoTs) to extract discriminant features from a sequence of depth maps. 3DHoTs are derived from...
We present a deep-learning framework for real-time multiple spatio-temporal (S/T) action localisation and classification. Current state-of-the-art approaches work offline, and are too slow to be useful in real-world settings. To overcome...
by Heba Elnemr and 1 more
This paper presents a human action recognition system that distinguishes between different actions using a new set of features based on global variation in the visual appearance of the subject body. The proposed technique utilizes the...
In this paper, we apply MCMCLDA (Multi-class Markov Chain Latent Dirichlet Allocation) model to classify abnormal activity of students in an examination. Abnormal activity in exams is defined as a cheating activity. We compare the usage...
In contrast to the widely studied problem of recognizing an action given a complete sequence, action anticipation aims to identify the action from only partially available videos. It is therefore key to the success of computer...
This paper presents a new framework for human action recognition from depth sequences. An effective depth feature representation is developed based on the fusion of 2D and 3D auto-correlation of gradients features. Specifically, depth...
Human action recognition has a wide range of applications including biometrics, surveillance, and human computer interaction. The use of multimodal sensors for human action recognition is steadily increasing. However, there are limited...
Dominant approaches to action detection can only provide sub-optimal solutions to the problem, as they rely on seeking frame-level detections, to later compose them into 'action tubes' in a post-processing step. With this paper we...
3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging since previous literature fails to capture motion and shape cues...
by Yi Zhu
In this work, we implement a real-time human action recognition framework, termed hidden two-stream networks [1]. This method only takes raw video frames as input and directly predicts action classes without explicitly computing optical...
This paper presents a fusion approach for improving human action recognition based on two differing modality sensors consisting of a depth camera and an inertial body sensor. Computationally efficient action features are extracted from...
In this paper, we propose a novel motion descriptor Seg-SIFT-ACC for human action recognition. The proposed descriptor is based both on the accordion representation of the video and its temporal segmentation into elementary motion...
We propose a new self-supervised CNN pre-training technique based on a novel auxiliary task called odd-one-out learning. In this task, the machine is asked to identify the unrelated or odd element from a set of otherwise related elements....
In 3D human motion pose-based analysis, the main problem is how to classify multi-class label activities based on primitive action (pose) inputs efficiently for both accuracy and processing time. Because pose is not unique and the same...
This paper presents an effective local spatio-temporal descriptor for action recognition from depth video sequences. The unique property of our descriptor is that it takes the shape discrimination and action speed variations into account,...
This paper presents a human action recognition system that runs in real-time and uses a depth camera and an inertial sensor simultaneously based on a previously developed sensor fusion method. Computationally efficient depth image...
Action recognition "in the wild" is extremely challenging, particularly when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent growth of 3D data in broadcast content and commercial...
Human Behaviour Analysis (HBA) is increasingly of interest to computer vision and artificial intelligence researchers. Its main application areas, like Video Surveillance and Ambient-Assisted Living (AAL), have been in great...
This paper presents a human action recognition method by using depth motion maps. Each depth frame in a depth video sequence is projected onto three orthogonal Cartesian planes. Under each projection view, the absolute difference between...
Recognizing human actions in videos has become a rapidly growing area of research. Most existing research has focused only on a single aspect, i.e. recognition of actions. However, humans tend to perform different actions in their own...
This paper presents a human action recognition approach by the simultaneous deployment of a second-generation Kinect depth sensor and a wearable inertial sensor. Three data modalities consisting of depth images, skeleton joint positions,...
Human action recognition is gaining interest from many computer vision researchers because of its wide variety of potential applications. For instance: surveillance, advanced human computer interaction, content-based video retrieval, or...
We propose a novel semi-supervised, Multi-Level Sequential Generative Adversarial Network (MLS-GAN) architecture for group activity recognition. In contrast to previous works which utilise manually annotated individual human action...
The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented toward human-robot communication, action classification, and action-driven perception. These challenging tasks...
Bali traditional dance has gained an international reputation thanks to its highly articulated body-part motions, fascinating eye movements, facial expressions, and colorful costumes. Although the motions are viewed as the main aesthetic...
In this paper, a human action recognition method is presented in which pose representation is based on the contour points of the human silhouette and actions are learned by making use of sequences of multi-view key poses. Our contribution...
This paper, for the first time, introduces a multiple-class boosting scheme (MBS) to combine depth motion maps (DMMs) and completed local binary patterns (CLBP) for action recognition. DMMs derive from projecting depth frames onto three...
Current state-of-the-art methods solve spatio-temporal action localisation by extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate sets of temporally connected bounding boxes called action micro-tubes. However,...
In this work, we present a method to predict an entire 'action tube' (a set of temporally linked bounding boxes) in a trimmed video just by observing a smaller subset of it. Predicting where an action is going to take place in the near...
This paper presents a new method for human activity recognition using depth sequences. Each depth sequence is represented by three depth motion maps (DMMs) from three projection views (front, side and top) to capture motion cues. A...