Papers by Nikolaos Gkalelis

In this paper two new learning-based eXplainable AI (XAI) methods for deep convolutional neural network (DCNN) image classifiers, called L-CAM-Fm and L-CAM-Img, are proposed. Both methods use an attention mechanism that is inserted into the original (frozen) DCNN and is trained to derive class activation maps (CAMs) from the last convolutional layer's feature maps. During training, the CAMs are applied to the feature maps (L-CAM-Fm) or to the input image (L-CAM-Img), forcing the attention mechanism to learn the image regions that explain the DCNN's outcome. Experimental evaluation on ImageNet shows that the proposed methods achieve competitive results while requiring a single forward pass at the inference stage. Moreover, a comprehensive qualitative analysis based on the derived explanations provides valuable insight into the reasons behind classification errors, including possible dataset biases affecting the trained classifier.
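A minimal PyTorch sketch of the L-CAM-Img idea described above, under assumptions of my own: a ResNet-50 backbone, a single 1x1-convolution attention head, sigmoid-normalised CAMs and the loss weighting are all illustrative choices, not details taken from the paper. Only the L-CAM-Img variant (masking the input image) is shown; L-CAM-Fm would instead apply the CAM to the feature maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class LCAMImgSketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # frozen feature extractor: everything up to the last convolutional maps
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.classifier = backbone                       # the original (frozen) DCNN
        for p in self.parameters():
            p.requires_grad = False
        # trainable attention head: one CAM per class from the 2048-channel maps
        self.attention = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x, target_class):
        with torch.no_grad():
            fmaps = self.features(x)                     # B x 2048 x h x w
        cams = torch.sigmoid(self.attention(fmaps))      # B x num_classes x h x w
        cam = cams[torch.arange(x.size(0)), target_class]
        mask = F.interpolate(cam.unsqueeze(1), size=x.shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.classifier(x * mask), cam            # masked image through the frozen DCNN

# training step sketch: cross-entropy on the masked-image prediction plus a small
# area term that discourages trivially large masks (the 0.1 weight is an assumption)
model = LCAMImgSketch()
opt = torch.optim.SGD(model.attention.parameters(), lr=1e-3, momentum=0.9)
x, y = torch.randn(2, 3, 224, 224), torch.tensor([3, 7])
logits, cam = model(x, y)
loss = F.cross_entropy(logits, y) + 0.1 * cam.mean()
loss.backward()
opt.step()
```

At inference, a single forward pass through the attention head yields the CAM for any class, which is the property the abstract highlights.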
Fractional Step Discriminant Pruning: A Filter Pruning Framework for Deep Convolutional Neural Networks
2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)
Incremental Accelerated Kernel Discriminant Analysis
Proceedings of the 25th ACM international conference on Multimedia
GPU Accelerated Generalised Subclass Discriminant Analysis for Event and Concept Detection in Video
Proceedings of the 23rd ACM international conference on Multimedia
ObjectGraphs: Using Objects and a Graph Convolutional Network for the Bottom-up Recognition and Explanation of Events in Video
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

ITI-CERTH participation to TRECVID 2013
This paper provides an overview of the tasks submitted to TRECVID 2013 by ITI-CERTH. ITI-CERTH participated in the Semantic Indexing (SIN), Event Detection in Internet Multimedia (MED), Multimedia Event Recounting (MER) and Instance Search (INS) tasks. In the SIN task, techniques are developed that combine new video representations (video tomographs) with existing well-performing descriptors such as SIFT and Bag-of-Words for shot representation, ensemble construction techniques and a multi-label learning method for score refinement. In the MED task, an efficient method that uses only static visual features as well as limited audio information is evaluated. In the MER sub-task of MED, a discriminant analysis-based feature selection method is combined with a model vector approach for selecting the key semantic entities depicted in the video that best describe the detected event. Finally, the INS task is performed by employing VERGE, an interactive retrieval application that combines retrieval functionalities in various modalities and was previously used for supporting the Known Item Search (KIS) task.
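A rough sketch of the recounting step mentioned above: from a video's model vector, keep the concepts that a simple discriminant criterion finds most informative for the detected event. ANOVA F-scores stand in here for the paper's discriminant analysis-based selection; the data, concept names and sizes are made up.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
model_vectors = rng.random((150, 60))      # concept detector scores per video (synthetic)
event_labels = rng.integers(0, 2, 150)     # 1 = detected target event
concepts = [f"concept_{i}" for i in range(60)]

# keep the k concepts most discriminant for the event (k is an assumption)
selector = SelectKBest(f_classif, k=5).fit(model_vectors, event_labels)
key_concepts = [concepts[i] for i in selector.get_support(indices=True)]
print("key semantic entities for the event:", key_concepts)
```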
In this paper a novel method for human movement representation and recognition is proposed. A movement is regarded as a sequence of basic movement patterns, the so-called dynemes. Initially, the fuzzy c-means (FCM) algorithm is used to identify the dynemes in the input space, and then principal component analysis plus linear discriminant analysis (PCA plus LDA) is employed to project the postures of a movement onto the identified dynemes. In this space, the posture representations of the movement are combined to represent the movement in terms of its comprising dynemes. This representation allows for efficient Mahalanobis- or cosine-based nearest centroid classification of variable-length movements.
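A small numpy sketch of the dyneme pipeline, under simplifying assumptions: a minimal fuzzy c-means finds dyneme centres in posture space, each variable-length movement is represented by the mean membership of its postures to the dynemes, and a cosine-based nearest-centroid rule classifies it. The PCA plus LDA projection step is omitted for brevity, and the data, the number of dynemes c and the fuzzifier m are illustrative.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Minimal FCM: returns c dyneme centres for posture vectors X (n x d)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))               # n x c memberships
    for _ in range(iters):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centres[None], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return centres

def memberships(postures, centres, m=2.0):
    d = np.linalg.norm(postures[:, None] - centres[None], axis=2) + 1e-12
    U = 1.0 / d ** (2.0 / (m - 1.0))
    return U / U.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
movements = [rng.random((rng.integers(20, 40), 64)) for _ in range(10)]  # variable-length posture sequences
labels = np.arange(10) % 5                                               # movement class per sequence

dynemes = fuzzy_cmeans(np.vstack(movements), c=8)
train = np.array([memberships(seq, dynemes).mean(axis=0) for seq in movements])
centroids = np.array([train[labels == k].mean(axis=0) for k in np.unique(labels)])

def classify(seq):
    """Cosine-based nearest-centroid classification of a new movement."""
    r = memberships(seq, dynemes).mean(axis=0)
    sims = (centroids @ r) / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(r))
    return np.unique(labels)[np.argmax(sims)]

print(classify(rng.random((25, 64))))
```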
Immersive Multimedia (Video)
Accelerated nonlinear discriminant analysis

This paper provides an overview of the tasks submitted to TRECVID 2011 by ITI-CERTH. ITI-CERTH participated in the Known-Item Search (KIS) task as well as in the Semantic Indexing (SIN) and Event Detection in Internet Multimedia (MED) tasks. In the SIN task, techniques are developed that combine motion information with existing well-performing descriptors such as SURF, Random Forests and Bag-of-Words for shot representation. In the MED task, the trained concept detectors of the SIN task are used to represent video sources with model vector sequences, a dimensionality reduction method is then used to derive a discriminant subspace for recognizing events, and, finally, SVM-based event classifiers are used to detect the underlying video events. The KIS search task is performed by employing VERGE, an interactive retrieval application that combines retrieval functionalities in various modalities and exploits implicit user feedback.
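A hedged scikit-learn sketch of the MED pipeline outlined above: concept-detector scores form a model vector per video, a discriminant projection reduces dimensionality, and an SVM detects the event. Plain LDA stands in for the paper's dimensionality reduction method, and the feature sizes and data are synthetic.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(0)
model_vectors = rng.random((200, 346))   # e.g. 346 concept detector scores per video (assumed size)
event_labels = rng.integers(0, 2, 200)   # 1 = target event present

# discriminant subspace followed by an SVM-based event classifier
detector = make_pipeline(LinearDiscriminantAnalysis(n_components=1),
                         SVC(probability=True))
detector.fit(model_vectors, event_labels)
print(detector.predict_proba(model_vectors[:5])[:, 1])   # event confidence scores
```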

This paper provides an overview of the tasks submitted to TRECVID 2012 by ITI-CERTH. ITI-CERTH participated in the Known-Item Search (KIS), Semantic Indexing (SIN), Event Detection in Internet Multimedia (MED) and Multimedia Event Recounting (MER) tasks. In the SIN task, techniques are developed that combine video representations expressing motion semantics with existing well-performing descriptors such as SIFT and Bag-of-Words for shot representation. In the MED task, two methods are evaluated: one based on Gaussian mixture models (GMM) and audio features, and a "semantic model vector approach" that combines a pool of subclass kernel support vector machines (KSVMs) in an ECOC framework for event detection exploiting visual information only. Furthermore, fusion strategies for the two systems at an intermediate semantic level or at score level (late fusion, see the sketch below) are investigated. In the MER task, a "model vector approach" is used to describe the semantic content of the videos, similarly to the MED task, and a novel feature selection method is utilized to select the most discriminant concepts for the target event. Finally, the KIS search task is performed by employing VERGE, an interactive retrieval application combining retrieval functionalities in various modalities. Representations and classifiers used for shot representation:
Key-frame: 12 local-image-feature-based classifiers (3 descriptors (SIFT, Opponent-SIFT, RGB-SIFT) x 2 sampling strategies (Dense, Harris-Laplace) x 2 BoW strategies (soft-, hard-assignment)) plus 1 global-image-feature-based classifier (color histograms).
Tomographs: 12 tomograph-based classifiers (2 types of video tomographs (horizontal, vertical) x 3 descriptors (SIFT, Opponent-SIFT, RGB-SIFT) x 2 BoW strategies (soft-, hard-assignment)).
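A small sketch of the score-level (late) fusion mentioned above: the audio GMM-based detector and the visual model-vector detector each output an event score, and a weighted average combines them. The fusion weight is an assumption, not a value from the paper.

```python
def late_fusion(audio_score: float, visual_score: float, w: float = 0.4) -> float:
    """Weighted score-level fusion of the two event detection systems."""
    return w * audio_score + (1.0 - w) * visual_score

print(late_fusion(0.62, 0.81))   # fused event confidence for one test video
```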
Fusion of Movement Specific Human Identification Experts
Lecture Notes in Computer Science, 2009
In this paper a multi-modal method for human identification is proposed that exploits the discriminant features derived from several movement types performed by the same person. Utilizing a fuzzy vector quantization (FVQ) and linear discriminant analysis (LDA) based algorithm, an unknown movement is first classified, and the person performing the movement is then recognized by a movement-specific person recognition expert.
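A minimal sketch of the two-stage idea above: a movement-type classifier first decides which movement is being performed, then a movement-specific person classifier (the "expert") identifies the performer. For illustration only, plain LDA classifiers replace the FVQ+LDA algorithm of the paper, and the data and label counts are synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

rng = np.random.default_rng(0)
X = rng.random((240, 30))                 # movement representations (assumed dimensionality)
movement = rng.integers(0, 4, 240)        # movement-type labels (walk, run, ...)
person = rng.integers(0, 6, 240)          # identity labels

movement_clf = LDA().fit(X, movement)                              # stage 1
person_clfs = {m: LDA().fit(X[movement == m], person[movement == m])
               for m in np.unique(movement)}                        # one expert per movement

def identify(x):
    m = movement_clf.predict(x[None])[0]          # which movement is being performed?
    return person_clfs[m].predict(x[None])[0]     # movement-specific person recognition

print(identify(X[0]))
```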
Sparse human movement representation and recognition
2008 IEEE 10th Workshop on Multimedia Signal Processing, 2008

2013 IEEE International Conference on Multimedia and Expo (ICME), 2013
In this paper, complex video events are learned and detected using a novel subclass recoding error-correcting output codes (SRECOC) design. In particular, a set of pre-trained concept detectors along different low-level visual feature types is used to provide a model vector representation of video signals. Subsequently, a subclass partitioning algorithm is used to divide only the target event class into several subclasses and learn one subclass detector for each event subclass. The pool of subclass detectors is then combined under an SRECOC framework to provide a single event detector. This is achieved by first exploiting the properties of the linear loss-weighted decoding measure to derive a probability estimate for each event subclass detector, and then applying the sum probability rule over the event subclasses to retrieve a single degree of confidence for the presence of the target event in a particular test video. Experimental results on the large-scale video collections of the TRECVID Multimedia Event Detection (MED) task verify the effectiveness of the proposed method. Moreover, the effect of weak or strong concept detectors on the accuracy of the resulting event detectors is examined.
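A hedged sketch of the subclass-based event detection idea above: the positive (target-event) model vectors are split into subclasses, one detector is trained per subclass against the negatives, and the subclass probabilities are combined with the sum rule. The loss-weighted ECOC decoding of the paper is simplified here to plain probability averaging, and the partitioning uses k-means as a stand-in.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((300, 50))            # model vectors (concept detector scores, synthetic)
y = rng.integers(0, 2, 300)          # 1 = target event present

pos, neg = X[y == 1], X[y == 0]
subclasses = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pos)

detectors = []
for s in range(3):                   # one detector per event subclass vs. all negatives
    Xs = np.vstack([pos[subclasses == s], neg])
    ys = np.r_[np.ones((subclasses == s).sum()), np.zeros(len(neg))]
    detectors.append(SVC(probability=True).fit(Xs, ys))

def event_confidence(x):
    """Sum-rule combination of subclass detector probabilities into one event score."""
    return np.mean([d.predict_proba(x[None])[0, 1] for d in detectors])

print(event_confidence(X[0]))
```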
Video event detection using generalized subclass discriminant analysis and linear support vector machines
Proceedings of International Conference on Multimedia Retrieval - ICMR '14, 2014
Encyclopedia of Information Science and Technology, Third Edition, 2015

IEEE Transactions on Neural Networks and Learning Systems, 2013
In this paper, a theoretical link between mixture subclass discriminant analysis (MSDA) and a restricted Gaussian model is first presented. Then, two further discriminant analysis (DA) methods, i.e., fractional step MSDA (FSMSDA) and kernel MSDA (KMSDA), are proposed. Linking MSDA to an appropriate Gaussian model allows the derivation of a new DA method under the expectation-maximization (EM) framework (EM-MSDA), which simultaneously derives the discriminant subspace and the maximum likelihood estimates. The two other proposed methods generalize MSDA in order to solve problems inherited from conventional DA. FSMSDA solves the subclass separation problem, that is, the situation in which the dimensionality of the discriminant subspace is strictly smaller than the rank of the between-subclass scatter matrix. This is done by an appropriate weighting scheme and the utilization of an iterative algorithm for preserving useful discriminant directions. On the other hand, KMSDA uses the ...
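An illustrative numpy sketch of the subclass discriminant analysis criterion that the MSDA family builds on: maximise a between-subclass scatter against the within-class scatter via a generalised eigenproblem. The scatter definitions below follow one common formulation and the subclass labels are toy values; the EM, fractional-step and kernel extensions of the paper are not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.random((120, 10))
y = rng.integers(0, 2, 120)          # class labels
z = rng.integers(0, 2, 120)          # subclass labels within each class (assumed given)

mu = X.mean(axis=0)
Sb = np.zeros((10, 10))              # between-subclass scatter
Sw = np.zeros((10, 10))              # within-subclass scatter
for c in np.unique(y):
    for s in np.unique(z[y == c]):
        Xcs = X[(y == c) & (z == s)]
        m = Xcs.mean(axis=0)
        Sb += len(Xcs) * np.outer(m - mu, m - mu)
        Sw += (Xcs - m).T @ (Xcs - m)

# discriminant directions: leading generalised eigenvectors of (Sb, Sw)
evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(10))   # small ridge keeps Sw positive definite
W = evecs[:, np.argsort(evals)[::-1][:3]]          # keep the top 3 directions
X_proj = X @ W
print(X_proj.shape)
```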
Human identification from human movements
2009 16th IEEE International Conference on Image Processing (ICIP), 2009
... From this database we used low resolution videos (180 × 144 pixels at 25 fps), depicting nine persons, namely, Daria (dar), Denis (den), Ido (ido), Ira (ira), Lena (len), Lyova (lyo), Moshe (mos), Shahar (sha), performing seven movements, i.e., walk (wk), run (rn), skip ...

2010 IEEE Fourth International Conference on Semantic Computing, 2010
In this paper, a joint content-event model for indexing multimedia data is proposed. The event part of the model follows a number of formal principles to represent several aspects of real-life events, whereas the content part is used to describe the decomposition of any type of multimedia data into content segments. In contrast to other event models for multimedia indexing, the proposed model treats events as first-class entities and provides a referencing mechanism to link real-life event elements with content segments at multiple granularity levels. This referencing mechanism has been defined with the objective of facilitating the automatic enrichment of event elements with information extracted by automatic analysis of content segments, enabling event-centric multimedia indexing in large-scale multimedia collections.
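A hedged sketch of the kind of data model described above: events are first-class objects, and a referencing mechanism links event elements to content segments at different granularity levels. All class and field names below are illustrative, not the paper's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ContentSegment:
    media_uri: str
    granularity: str            # e.g. "video", "shot", "keyframe", "region"
    start: float = 0.0          # temporal extent, where applicable
    end: float = 0.0

@dataclass
class EventElement:
    role: str                   # e.g. "participant", "location", "time"
    value: str
    evidence: List[ContentSegment] = field(default_factory=list)   # references into the content part

@dataclass
class Event:
    name: str
    elements: List[EventElement] = field(default_factory=list)

# automatic analysis of a shot can enrich an event element with content evidence
concert = Event("concert", [EventElement("location", "stadium")])
concert.elements[0].evidence.append(
    ContentSegment("video42.mp4", granularity="shot", start=12.0, end=17.5))
print(concert)
```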

Proceedings of the 1st ACM International Conference on Multimedia Retrieval - ICMR '11, 2011
This paper demonstrates a new approach to detecting high-level events that may be depicted in images or video frames. Given a non-annotated content item, a large number of previously trained visual concept detectors are applied to it and their responses are used to represent the content item with a model vector in a high-dimensional concept space. Subsequently, an improved subclass discriminant analysis method is used to identify a concept subspace within the aforementioned concept space that is most appropriate for detecting and recognizing the target high-level events. In this subspace, the nearest neighbor rule is used to compare the non-annotated content item with a few known example instances of the target events. The high-level events used as target events in the present version of the system are those defined for the TRECVID 2010 Multimedia Event Detection (MED) task.
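A hedged sketch of the detection approach above: concept-detector responses form a model vector, a discriminant projection (plain LDA here, standing in for the improved subclass discriminant analysis of the paper) defines the concept subspace, and a nearest-neighbour rule compares a new item with a handful of known event examples. The sizes and data are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
exemplars = rng.random((40, 346))        # model vectors of a few known event examples (assumed size)
events = rng.integers(0, 3, 40)          # e.g. three target events

# discriminant concept subspace + nearest neighbour rule
nn_detector = make_pipeline(LinearDiscriminantAnalysis(n_components=2),
                            KNeighborsClassifier(n_neighbors=1))
nn_detector.fit(exemplars, events)
print(nn_detector.predict(rng.random((1, 346))))   # event label for a non-annotated item
```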