Video-based action recognition is a
crucial task in computer vision with
applications spanning ... more Video-based action recognition is a crucial task in computer vision with applications spanning surveillance, sports analytics, human-computer interaction, and autonomous systems. This paper explores the application of spatiotemporal deep learning models for action recognition, leveraging advancements in neural network architectures to analyze video data effectively. Traditional approaches often process spatial and temporal dimensions independently, limiting their ability to capture complex motion patterns and contextual relationships. In contrast, spatiotemporal deep learning models integrate spatial and temporal features simultaneously, enabling robust recognition of dynamic actions. The study highlights key methods, including convolutional neural networks (CNNs) for spatial feature extraction and recurrent neural networks (RNNs) or 3D convolutional networks (3D-CNNs) for temporal modelling. It also delves into transformer-based architectures and attention mechanisms, which enhance model performance by selectively focusing on salient regions and time steps.
Uploads
Papers by Lavanya Kumar
crucial task in computer vision with
applications spanning surveillance, sports
analytics, human-computer interaction, and
autonomous systems. This paper explores the
application of spatiotemporal deep learning
models for action recognition, leveraging
advancements in neural network architectures
to analyze video data effectively. Traditional
approaches often process spatial and temporal
dimensions independently, limiting their
ability to capture complex motion patterns
and contextual relationships. In contrast,
spatiotemporal deep learning models
integrate spatial and temporal features
simultaneously, enabling robust recognition of
dynamic actions. The study highlights key
methods, including convolutional neural
networks (CNNs) for spatial feature extraction
and recurrent neural networks (RNNs) or 3D
convolutional networks (3D-CNNs) for
temporal modelling. It also delves into
transformer-based architectures and attention
mechanisms, which enhance model
performance by selectively focusing on salient
regions and time steps.