Generally, action recognition methods are evaluated according to a single main criterion: recognition accuracy. However, if the computational latency of a given method is too high, it becomes difficult to apply it in a wide range of applications. In this paper, we introduce a novel action recognition approach based on RGB-D cameras which offers good accuracy and a low execution time. To this end, a new descriptor based on the cubic spline interpolation of kinematic values is proposed. To evaluate the applicability of our approach in online scenarios (unsegmented videos) with a sliding-window approach, we propose to train a classification model using incomplete actions randomly generated from the MSRAction3D benchmark.
Figure 1: Example of a depth image (left) and a skeleton (right).
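As an illustration of the core idea behind such a descriptor, the sketch below resamples joint kinematics (position, velocity, acceleration) with cubic splines into a fixed-length vector. This is a minimal, hedged example: the function name, joint layout, kinematic channels and sample count are my own assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): resampling joint kinematics with cubic splines.
import numpy as np
from scipy.interpolate import CubicSpline

def kinematic_spline_descriptor(joints, n_samples=50):
    """joints: (T, J, 3) array of 3D joint positions over T frames.
    Returns a fixed-length descriptor built from interpolated position,
    velocity and acceleration curves (illustrative choice of kinematics)."""
    T = joints.shape[0]
    t = np.linspace(0.0, 1.0, T)               # original (normalized) time stamps
    t_new = np.linspace(0.0, 1.0, n_samples)   # common resampling grid

    pos = joints.reshape(T, -1)                # flatten joints per frame
    spline = CubicSpline(t, pos, axis=0)

    # Evaluate the spline and its derivatives on the common grid.
    p = spline(t_new)                          # positions
    v = spline(t_new, 1)                       # velocities (1st derivative)
    a = spline(t_new, 2)                       # accelerations (2nd derivative)
    return np.concatenate([p, v, a], axis=1).ravel()

# Example: 30 frames of a 20-joint skeleton -> fixed-length vector.
desc = kinematic_spline_descriptor(np.random.rand(30, 20, 3))
print(desc.shape)
```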
2020 25th International Conference on Pattern Recognition (ICPR)
This paper extends the Spatial-Temporal Graph Convolutional Network (ST-GCN) for skeleton-based action recognition by introducing two novel modules, namely, the Graph Vertex Feature Encoder (GVFE) and the Dilated Hierarchical Temporal Convolutional Network (DH-TCN). On the one hand, the GVFE module learns appropriate vertex features for action recognition by encoding raw skeleton data into a new feature space. On the other hand, the DH-TCN module is capable of capturing both short-term and long-term temporal dependencies using a hierarchical dilated convolutional network. Experiments have been conducted on the challenging NTU RGB-D 60 and NTU RGB-D 120 datasets. The obtained results show that our method competes with state-of-the-art approaches while using a smaller number of layers and parameters, thus reducing the required training time and memory.
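The following sketch illustrates the generic mechanism behind hierarchical dilated temporal convolutions: stacking 1D convolutions whose dilation doubles per level grows the receptive field exponentially, so both short-term and long-term dependencies are covered. It is an illustrative assumption of how such a module could look, not the actual DH-TCN; channel counts, depth and the residual connection are my choices.

```python
# Illustrative sketch (not the authors' DH-TCN): a stack of dilated temporal
# convolutions whose receptive field grows exponentially with depth.
import torch
import torch.nn as nn

class DilatedTemporalStack(nn.Module):
    def __init__(self, channels=64, kernel_size=3, n_levels=4):
        super().__init__()
        layers = []
        for i in range(n_levels):
            d = 2 ** i                                   # dilation doubles per level
            layers += [
                nn.Conv1d(channels, channels, kernel_size,
                          dilation=d, padding=d * (kernel_size - 1) // 2),
                nn.BatchNorm1d(channels),
                nn.ReLU(inplace=True),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, channels, time)
        return self.net(x) + x     # residual connection keeps short-term detail

seq = torch.randn(8, 64, 300)      # e.g. a 300-frame feature sequence
print(DilatedTemporalStack()(seq).shape)   # (8, 64, 300)
```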
The recent availability of RGB-D cameras has renewed the interest of researchers in the topic of human action recognition. More precisely, several action recognition methods have been proposed based on the novel modalities provided by these cameras, namely, depth maps and skeleton sequences. These approaches have mainly been evaluated in terms of recognition accuracy. This thesis aims to study the issue of fast action recognition from RGB-D cameras. It focuses on proposing an action recognition method realizing a trade-off between accuracy and latency for the purpose of applying it in real-time scenarios. As a first step, we propose a comparative study of recent RGB-D based action recognition methods using the two cited criteria: recognition accuracy and rapidity of execution. Then, guided by the conclusions drawn from this comparative study, we introduce a novel, fast and accurate human action descriptor called Kinematic Spline Curves (KSC). The latter is based on the cubic spline interpolation of joint kinematics.
Recently, RGB-D cameras have been introduced on the market, enabling the exploration of new action recognition approaches based on two modalities other than RGB images, namely, depth maps and skeleton sequences. These approaches have generally been evaluated in terms of recognition rate. This thesis mainly addresses fast action recognition from RGB-D cameras. The work focuses on jointly improving computational speed and recognition rate with a view to real-time applications. As a first step, we conduct a comparative study of existing RGB-D based action recognition methods using the two stated criteria: recognition rate and computational speed. Following the conclusions drawn from this study, we introduce a new motion descriptor, both accurate and fast, based on the cubic spline interpolation of joint kinematics.
This paper presents an intuitive feedback tool able to implicitly guide motion with respect to a reference movement. Such a tool is important in multiple applications requiring assistance during physical activities, such as sports or rehabilitation. Our approach is based on detecting key skeleton frames from a reference sequence of skeletons. The feedback relies on a 3D geometric analysis of the skeletons that takes the key skeletons into account. Finally, the feedback is illustrated by a color-coded tool, which reflects the motion accuracy.
DeepVI: A Novel Framework for Learning Deep View-Invariant Human Action Representations using a Single RGB Camera
2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
In this paper, we address the problem of cross-view action recognition from a monocular RGB camera. This topic has been considered extremely challenging due to the lack of 3D information in 2D images. Exploiting the advances in 3D pose estimation from a single RGB camera, we propose a new framework, termed DeepVI, for cross-view action recognition without the need for pose alignment. Virtual viewpoints are used to augment the variability of training data along with the use of an end-to-end Deep Neural Network (DNN). The proposed network is composed of two modules. The first one, called SmoothNet, implicitly smooths skeleton joint trajectories using revisited temporal convolution in order to reduce the noise in the estimated 3D skeletons. The second module consists of a state-of-the-art approach designed for action recognition based on Spatial Temporal Graph Convolutional Networks (ST-GCN [40]). Experiments have been conducted in cross-view settings on two datasets, namely, NTU RGB-D and Northwestern-UCLA. The obtained results show the effectiveness of the proposed framework.
In this paper, a novel approach for action detection from RGB sequences is proposed. This concept takes advantage of the recent development of CNNs to estimate 3D human poses from a monocular camera. To show the validity of our method, we propose a 3D skeleton-based two-stage action detection approach. For localizing actions in unsegmented sequences, Relative Joint Position (RJP) and Histogram Of Displacements (HOD) features are used as inputs to a k-nearest neighbor binary classifier in order to define action segments. Afterwards, to recognize the localized action proposals, a compact Long Short-Term Memory (LSTM) network with a de-noising expansion unit is employed. Compared to previous RGB-based methods, our approach offers robustness to radial motion, view-invariance and low computational complexity. Results on the Online Action Detection dataset show that our method outperforms earlier RGB-based approaches.
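For readers unfamiliar with these features, the following is one plausible reading of relative joint positions and a histogram of displacements, sketched in a hedged way: the reference joint, the projection plane and the bin count are illustrative assumptions, not the exact definitions used in the paper.

```python
# Hedged sketch of RJP and HOD features; reference joint, plane and bin count are illustrative.
import numpy as np

def relative_joint_positions(joints, ref_joint=0):
    """joints: (T, J, 3). Positions expressed relative to a reference joint (e.g. the hip)."""
    return (joints - joints[:, ref_joint:ref_joint + 1, :]).reshape(joints.shape[0], -1)

def histogram_of_displacements(joints, n_bins=8):
    """Histogram of frame-to-frame displacement directions in the x-y plane."""
    disp = np.diff(joints, axis=0)                 # (T-1, J, 3)
    angles = np.arctan2(disp[..., 1], disp[..., 0]).ravel()
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

skel = np.random.rand(40, 20, 3)
feat = np.concatenate([relative_joint_positions(skel).mean(axis=0),
                       histogram_of_displacements(skel)])
print(feat.shape)
```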
Home-Based Rehabilitation System for Stroke Survivors: A Clinical Evaluation
Journal of Medical Systems
Recently, a home-based rehabilitation system for stroke survivors (Baptista et al., Comput. Meth. Prog. Biomed. 176:111–120, 2019), composed of two linked applications (one for the therapist and another one for the patient), has been introduced. The proposed system has previously been tested on healthy subjects. However, for a fair evaluation, it is necessary to carry out a clinical study considering stroke survivors. This work aims at evaluating the home-based rehabilitation system on 10 chronic post-stroke spastic patients. For this purpose, each patient carries out two exercises involving the motion of the spastic upper limb using the home-based rehabilitation system. The impact of the color-based 3D skeletal feedback, guiding the patients during the training, is studied. The Time Variable Replacement (TVR)-based average distance, as well as the average postural angle used in Baptista et al. (Comput. Meth. Prog. Biomed. 176:111–120, 2019), are reported to compare the movement and the posture of the patient with and without the feedback proposals, respectively. Furthermore, three different questionnaires, specifically designed for this study, are used to evaluate the user experience of the therapist and the patients. Overall, the reported results suggest the relevance of the proposed system for home-based rehabilitation of stroke survivors.
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 1, 2019
In this paper, we propose a novel view-invariant action recognition method using a single monocular RGB camera. View-invariance remains a very challenging topic in 2D action recognition due to the lack of 3D information in RGB images. Most successful approaches make use of the concept of knowledge transfer by projecting 3D synthetic data to multiple viewpoints. Instead of relying on knowledge transfer, we propose to augment the RGB data with a third dimension by means of 3D skeleton estimation from 2D images using a CNN-based pose estimator. In order to ensure view-invariance, a pre-processing step for alignment is applied, followed by data expansion as a way of denoising. Finally, a Long Short-Term Memory (LSTM) architecture is used to model the temporal dependency between skeletons. The proposed network is trained to directly recognize actions from aligned 3D skeletons. The experiments performed on the challenging Northwestern-UCLA dataset show the superiority of our approach compared to state-of-the-art ones.
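To make the alignment step concrete, here is a minimal sketch of one common variant of skeleton alignment (my illustrative choice, not necessarily the exact procedure of the paper): center the skeleton on the hip and rotate it so that the shoulder axis matches a canonical direction. Joint indices are hypothetical and depend on the skeleton layout.

```python
# Minimal sketch of a view-alignment pre-processing step (illustrative, not the paper's exact method).
import numpy as np

def align_skeleton(joints, hip=0, left_shoulder=4, right_shoulder=8):
    """joints: (T, J, 3). Joint indices are hypothetical and depend on the skeleton layout."""
    centered = joints - joints[:, hip:hip + 1, :]
    # Horizontal direction from left to right shoulder in the first frame.
    v = centered[0, right_shoulder] - centered[0, left_shoulder]
    v[2] = 0.0
    v /= np.linalg.norm(v) + 1e-8
    angle = np.arctan2(v[1], v[0])
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation about the vertical axis
    return centered @ R.T

aligned = align_skeleton(np.random.rand(40, 20, 3))
print(aligned.shape)
```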
Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2019
View-invariant action recognition using a single RGB camera represents a very challenging topic due to the lack of 3D information in RGB images. Lately, recent advances in deep learning have made it possible to extract a 3D skeleton from a single RGB image. Taking advantage of this impressive progress, we propose a simple framework for fast and view-invariant action recognition using a single RGB camera. The proposed pipeline can be seen as the association of two key steps. The first step is the estimation of a 3D skeleton from a single RGB image using a CNN-based pose estimator such as VNect. The second one aims at computing view-invariant skeleton-based features based on the estimated 3D skeletons. Experiments are conducted on two well-known benchmarks, namely, the IXMAS and Northwestern-UCLA datasets. The obtained results prove the validity of our concept, which suggests a new way to address the challenge of RGB-based view-invariant action recognition.
Fast Adaptive Reparametrization (FAR) With Application to Human Action Recognition
IEEE Signal Processing Letters
In this letter, a fast approach for curve reparametrization, called Fast Adaptive Reparametrization (FAR), is introduced. Instead of computing an optimal matching between two curves, as done by Dynamic Time Warping (DTW) and elastic distance-based approaches, our method is applied to each curve independently, leading to linear computational complexity. It is based on a simple replacement of the curve parameter by a variable that is invariant under specific variations of reparametrization. The choice of this variable is made heuristically according to the application of interest. In addition to being fast, the proposed reparametrization can be applied not only to curves observed in Euclidean spaces but also to feature curves living in Riemannian spaces. To validate our approach, we apply it to the scenario of human action recognition using curves living in the Riemannian product Special Euclidean space $\mathbb{SE}(3)^n$. The obtained results on three benchmarks for human action recognition (MSRAction3D, Florence3D, and UTKinect) show that our approach competes with state-of-the-art methods in terms of accuracy and computational cost.
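The general idea can be illustrated with the classical choice of normalized arc length as the invariant variable (the letter selects the variable heuristically per application, so arc length here is only an illustrative assumption): each curve is resampled independently at uniform steps of that variable, with linear cost and no pairwise matching.

```python
# Sketch of reparametrization by an invariant variable (arc length chosen for illustration).
import numpy as np

def reparametrize_by_arc_length(curve, n_samples=50):
    """curve: (T, d) sampled curve. Returns (n_samples, d) resampled so that
    consecutive points are equally spaced in normalized arc length."""
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    s /= max(s[-1], 1e-8)                        # normalized arc length in [0, 1]
    s_new = np.linspace(0.0, 1.0, n_samples)
    # Linear interpolation of each coordinate against the new parameter.
    return np.stack([np.interp(s_new, s, curve[:, k]) for k in range(curve.shape[1])], axis=1)

a = reparametrize_by_arc_length(np.cumsum(np.random.rand(80, 3), axis=0))
b = reparametrize_by_arc_length(a[::2])          # a resampled copy maps to nearly the same curve
print(np.abs(a - b).max())
```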
The Dense Trajectories concept is one of the most successful approaches in action recognition, suitable for scenarios involving a significant amount of motion. However, due to noise and background motion, many generated trajectories are irrelevant to the actual human activity and can potentially lead to performance degradation. In this paper, we propose Localized Trajectories as an improved version of Dense Trajectories, where motion trajectories are clustered around human body joints provided by RGB-D cameras and then encoded by local Bag-of-Words. As a result, the Localized Trajectories concept provides an advanced discriminative representation of actions. Moreover, we generalize Localized Trajectories to 3D by using the depth modality. One of the main advantages of 3D Localized Trajectories is that they describe radial displacements that are perpendicular to the image plane. Extensive experiments and analysis were carried out on five different datasets.
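A hedged sketch of the localization idea follows: each motion trajectory is assigned to its nearest skeleton joint so that a separate bag-of-words model can be built per joint. The assignment rule and the 2D projection are my simplifying assumptions, not the paper's exact clustering scheme.

```python
# Hedged sketch: grouping trajectories around the nearest body joint.
import numpy as np

def localize_trajectories(traj_points, joints_2d):
    """traj_points: (N, 2) mean image position of N trajectories.
    joints_2d: (J, 2) projected joint positions for the same frame window.
    Returns, for each trajectory, the index of the closest joint."""
    d = np.linalg.norm(traj_points[:, None, :] - joints_2d[None, :, :], axis=-1)
    return d.argmin(axis=1)

labels = localize_trajectories(np.random.rand(200, 2) * 100, np.random.rand(15, 2) * 100)
print(np.bincount(labels, minlength=15))    # trajectory counts per joint
```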
Background and Objective: With the increase in the number of stroke survivors, there is an urgent need for designing appropriate home-based rehabilitation tools to reduce health-care costs. The objective is to empower the rehabilitation of post-stroke patients in the comfort of their homes by supporting them while exercising without the physical presence of the therapist. Methods: A novel low-cost home-based training system is introduced. This system is designed as a composition of two linked applications: one for the therapist and another one for the patient. The therapist prescribes personalized exercises remotely, monitors the home-based training and re-adapts the exercises if required. On the other side, the patient loads the prescribed exercises, performs them while being guided by color-based visual feedback and receives updates about the exercise performance. To achieve that, our system provides three main functionalities, namely: 1) feedback proposals guiding a personalized exercise session, 2) posture monitoring optimizing the effectiveness of the session, and 3) assessment of the quality of the motion. Results: The proposed system is evaluated on 10 healthy participants without any previous contact with the system. To analyze the impact of the feedback proposals, we carried out two different experimental sessions: without and with feedback proposals. The obtained results give a preliminary assessment of the interest of using such feedback. Conclusions: The results obtained on 10 healthy participants are promising. This encourages testing the system in a realistic clinical context for the rehabilitation of stroke survivors.
Over the last few decades, action recognition applications have attracted the growing interest of researchers, especially with the advent of RGB-D cameras. These applications increasingly require fast processing. Therefore, it becomes important to include the computational latency in the evaluation criteria.
IEEE Transactions on Cognitive and Developmental Systems
Human Action Recognition (HAR) is largely used in the field of Ambient Assisted Living (AAL) to create an interaction between humans and computers. In these applications, users cannot be asked to act unnaturally. The algorithm has to adapt, and the interaction has to be as quick as possible to keep it fluent. To improve existing algorithms with regard to these points, we propose a novel method based on skeleton information provided by RGB-D cameras. This approach is able to carry out early action recognition and is more robust to viewpoint variability. To reach this goal, a new descriptor called Body Directional Velocity is proposed and a real-time classification is performed. Experimental results on four benchmarks show that our method competes with various skeleton-based HAR algorithms. We also show the suitability of our method for early recognition of human actions.
In this article, we introduce a fast, accurate and invariant method for RGB-D based human action recognition using a Hierarchical Kinematic Covariance (HKC) descriptor. Recently, non-singular covariance matrices of pattern features, which are elements of the space of Symmetric Positive Definite (SPD) matrices, have been proven to be very efficient descriptors in the field of pattern recognition. However, in the case of action recognition, singular covariance matrices cannot be avoided because the dimension of the features can be higher than the number of samples. Such covariance matrices (non-singular and singular) belong to the space of Symmetric Positive semi-Definite (SPsD) matrices. Thus, in order to classify actions, we propose to adapt kernel methods such as Support Vector Machines (SVM) and Multiple Kernel Learning (MKL) to the space of SPsD matrices by using a perturbed Log-Euclidean distance (Arsigny et al., 2006). The mathematical validity of this perturbed distance (called the Modified Log-Euclidean distance) for SPsD matrices is therefore studied. Offline experiments are conducted on three challenging benchmarks, namely the MSRAction3D, UTKinect and Multiview3D datasets. A fair comparison demonstrates that our approach competes with state-of-the-art methods in terms of accuracy and computational latency. Finally, our method is extended to an online scenario and experiments on MSRC12 prove the efficiency of this extension.
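A perturbed Log-Euclidean distance can be sketched as follows: a small multiple of the identity is added before taking the matrix logarithm, so that possibly singular SPsD matrices become well defined inputs, and a Gaussian kernel built on this distance can feed an SVM or MKL. The value of the perturbation and the kernel form are illustrative assumptions, not the exact choices of the article.

```python
# Sketch of a perturbed Log-Euclidean distance for possibly singular covariance matrices.
import numpy as np
from scipy.linalg import logm

def modified_log_euclidean(A, B, eps=1e-6):
    """A, B: symmetric positive semi-definite matrices of the same size."""
    n = A.shape[0]
    LA = logm(A + eps * np.eye(n))       # eps*I makes the matrix strictly positive definite
    LB = logm(B + eps * np.eye(n))
    return np.linalg.norm(LA - LB, ord='fro')

def rbf_kernel_from_distance(d, gamma=1.0):
    """A Gaussian kernel built on top of the distance, as could feed an SVM/MKL."""
    return np.exp(-gamma * d ** 2)

X = np.random.rand(10, 5); A = X.T @ X / 10     # full-rank covariance-like matrix
Y = np.random.rand(3, 5);  B = Y.T @ Y / 3      # singular (rank <= 3) SPsD matrix
print(rbf_kernel_from_distance(modified_log_euclidean(A, B)))
```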
2016 23rd International Conference on Pattern Recognition (ICPR), 2016
With the availability of the recent human skeleton extraction algorithm introduced by Shotton et al. [1], interest in skeleton-based action recognition methods has been renewed. Despite the importance of the low-latency aspect in applications, it can be noted that the majority of recent approaches have not been evaluated in terms of computational cost. In this paper, a novel fast and accurate human action descriptor named Kinematic Spline Curves (KSC) is introduced. This descriptor is built by interpolating the kinematics of joints (position, velocity and acceleration). To overcome anthropometric and execution-rate variabilities, we propose the use of a skeleton normalization and a temporal normalization, respectively. For this purpose, a new temporal normalization method based on the Normalized Accumulated kinetic Energy (NAE) of the human skeleton is suggested. Finally, the classification step is performed using a linear Support Vector Machine (SVM). Experimental results on challenging benchmarks show the efficiency of our approach in terms of recognition accuracy and computational latency.
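One plausible reading of an energy-driven temporal normalization is sketched below: frames are resampled so that the accumulated kinetic energy of the skeleton grows linearly across the normalized sequence, which compensates for execution-rate variability. The energy proxy and the interpolation scheme are my assumptions, not necessarily the paper's exact formulation.

```python
# Hedged sketch of a temporal normalization driven by accumulated kinetic energy.
import numpy as np

def nae_resample(joints, n_samples=50):
    """joints: (T, J, 3). Returns (n_samples, J, 3) resampled along the
    normalized accumulated kinetic energy of the skeleton (illustrative reading)."""
    vel = np.diff(joints, axis=0)                          # (T-1, J, 3) frame-to-frame velocities
    energy = (vel ** 2).sum(axis=(1, 2))                   # per-frame kinetic energy proxy
    nae = np.concatenate([[0.0], np.cumsum(energy)])
    nae /= max(nae[-1], 1e-8)                              # normalized to [0, 1]
    targets = np.linspace(0.0, 1.0, n_samples)
    flat = joints.reshape(joints.shape[0], -1)
    out = np.stack([np.interp(targets, nae, flat[:, k]) for k in range(flat.shape[1])], axis=1)
    return out.reshape(n_samples, *joints.shape[1:])

print(nae_resample(np.random.rand(37, 20, 3)).shape)       # (50, 20, 3)
```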
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Being capable of estimating the pose of uncooperative objects in space has been proposed as a key asset for enabling safe close-proximity operations such as space rendezvous, in-orbit servicing and active debris removal. Usual approaches for pose estimation involve classical computer vision-based solutions or the application of Deep Learning (DL) techniques. This work explores a novel DL-based methodology, using Convolutional Neural Networks (CNNs), for estimating the pose of uncooperative spacecraft. Contrary to other approaches, the proposed CNN directly regresses poses without needing any prior 3D information. Moreover, bounding boxes of the spacecraft in the image are predicted in a simple, yet efficient manner. The performed experiments show that this work competes with the state of the art in uncooperative spacecraft pose estimation, including works which require 3D information as well as works which predict bounding boxes through sophisticated CNNs.
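To illustrate what direct pose regression without 3D priors can look like, here is a minimal sketch of a CNN backbone with two heads, one regressing the pose as a unit quaternion plus translation and one regressing a bounding box. The architecture, parametrization and layer sizes are illustrative assumptions and not the network described in the paper.

```python
# Minimal sketch (not the actual network): direct pose and bounding-box regression from an image.
import torch
import torch.nn as nn

class PoseRegressionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose_head = nn.Linear(64, 7)   # quaternion (4) + translation (3)
        self.bbox_head = nn.Linear(64, 4)   # (x, y, w, h), predicted directly

    def forward(self, img):
        f = self.backbone(img)
        q_t = self.pose_head(f)
        q = nn.functional.normalize(q_t[:, :4], dim=1)   # constrain to a unit quaternion
        return torch.cat([q, q_t[:, 4:]], dim=1), self.bbox_head(f)

pose, bbox = PoseRegressionNet()(torch.randn(2, 3, 224, 224))
print(pose.shape, bbox.shape)   # (2, 7) (2, 4)
```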