Papers by Christophe Garcia
Robust camera calibration using 2D-to-3D feature correspondences
Proceedings of SPIE, Jul 7, 1997
arXiv (Cornell University), Dec 27, 2016
This paper presents a deep nonlinear metric learning framework for data visualization on an image dataset. We propose the Triangular Similarity and prove its equivalence to the Cosine Similarity in measuring a data pair. Based on this novel similarity, a geometrically motivated loss function, the triangular loss, is then developed for optimizing a metric learning system comprising two identical CNNs. It is shown that this deep nonlinear system can be efficiently trained by a hybrid algorithm based on the conventional backpropagation algorithm. More interestingly, benefiting from classical manifold learning theories, the proposed system offers two different views to visualize the outputs, the second of which provides better classification results than the state-of-the-art methods in the visualizable spaces.
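The abstract does not spell out the triangular loss itself. As a rough illustration only, here is a minimal NumPy sketch assuming the simplified pairwise form used in the authors' related linear TSML work, J = ||a||²/2 + ||b||²/2 - ||a + s·b|| + 1 with s = +1 for similar and s = -1 for dissimilar pairs; the actual loss optimized with the two CNNs may differ.

```python
# Hypothetical triangular-style pairwise loss (assumed form, see lead-in above).
import numpy as np

def triangular_loss(a, b, s):
    """Loss and gradients for embeddings a, b and pair label s in {+1, -1}."""
    c = a + s * b                       # third side of the triangle formed by a and s*b
    norm_c = np.linalg.norm(c) + 1e-12  # avoid division by zero
    loss = 0.5 * (a @ a) + 0.5 * (b @ b) - norm_c + 1.0
    grad_a = a - c / norm_c             # pulls a toward (similar) or away from (dissimilar) b
    grad_b = b - s * c / norm_c
    return loss, grad_a, grad_b

# Toy usage: the loss is small when the pair label agrees with the vectors' alignment.
a = np.array([1.0, 0.5])
b = np.array([0.9, 0.6])
print(triangular_loss(a, b, +1)[0], triangular_loss(a, b, -1)[0])
```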

Multi-faceted Deep Learning, 2012
Similarity metric learning models the general semantic similarities and distances between objects and classes of objects (e.g. persons) in order to recognise them. Different strategies and models based on Deep Learning exist and generally consist in learning a non-linear projection into a lower-dimensional vector space where the semantic similarity between instances can be easily measured with a standard distance. As opposed to supervised learning, one does not train the model to predict the class labels, and the actual labels may not even be used or known in advance. Machine learning-based similarity metric learning approaches rather operate in a weakly supervised way. That is, the training target (loss) is defined on the relationship between several instances, i.e. similar or different pairs, triplets or tuples. This learnt distance can then be applied, for example, to two new, unseen examples of unknown classes in order to determine if they belong to the same class or if the...
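As a generic illustration of the weakly supervised pairwise training described here (and not the chapter's specific model), the following sketch shows a classical contrastive loss defined on a pair of embeddings and a same/different label:

```python
# Generic contrastive pair loss: similar pairs are pulled together, dissimilar
# pairs are pushed beyond a margin in the embedding space.
import numpy as np

def contrastive_loss(x1, x2, same, margin=1.0):
    """x1, x2: embeddings of a pair; same: True if both come from the same class."""
    d = np.linalg.norm(x1 - x2)
    if same:
        return 0.5 * d ** 2                     # pull similar pairs together
    return 0.5 * max(0.0, margin - d) ** 2      # push dissimilar pairs apart

print(contrastive_loss(np.array([0.1, 0.2]), np.array([0.15, 0.18]), same=True))
print(contrastive_loss(np.array([0.1, 0.2]), np.array([0.9, 0.8]), same=False))
```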

European Project Space on Computer Vision, Graphics, Optics and Photonics, 2015
We present the first fully automatic color analysis system suited for noisy heterogeneous documents. We developed a robust color segmentation system adapted for business documents and old handwritten documents with significant color complexity and dithered backgrounds. We have developed the first fully data-driven pixel-based approach that does not need a priori information, training or manual assistance. The system performs several operations to segment color images automatically, separate text from noise and graphics, and provide information about text color. The contribution of our work is four-fold. Firstly, it does not require any connected component analysis, which simplifies the extraction of the layout and the recognition step undertaken by the OCR. Secondly, it uses color morphology to simultaneously segment both text and inverted text using conditional color dilation and erosion, even in cases where the two overlap. Thirdly, our system efficiently removes noise and speckles from dithered backgrounds and automatically suppresses graphical elements using geodesic measurements. Fourthly, we develop a method that splits overlapped characters and separates characters from graphics if they have different colors. The proposed Automatic Color Document Processing System achieved 99% of correctly segmented documents and has the potential to be adapted to different document images. The system outperformed the classical approach that uses binarization of the grayscale image.
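For readers unfamiliar with the morphological vocabulary, the sketch below illustrates plain conditional (geodesic) dilation on a binary image; it shows only the textbook principle behind the conditional colour dilation and erosion mentioned above, not the paper's colour formulation.

```python
# Conditional (geodesic) dilation: grow a marker while staying inside a mask.
import numpy as np
from scipy.ndimage import binary_dilation

def conditional_dilation(marker, mask, max_iter=1000):
    """Iteratively dilate `marker`, constrained to `mask`, until stability."""
    prev = np.zeros_like(marker, dtype=bool)
    cur = marker.astype(bool) & mask.astype(bool)
    it = 0
    while not np.array_equal(cur, prev) and it < max_iter:
        prev = cur
        cur = binary_dilation(cur) & mask    # dilate, then constrain to the mask
        it += 1
    return cur

# Toy usage: grow a single seed pixel inside the masked region only.
mask = np.zeros((5, 5), dtype=bool); mask[1:4, 1:4] = True
marker = np.zeros((5, 5), dtype=bool); marker[2, 2] = True
print(conditional_dilation(marker, mask).astype(int))
```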

ArXiv, 2020
While deep neural networks (DNNs) have proven to be efficient for numerous tasks, they come at a high memory and computation cost, thus making them impractical on resource-limited devices. However, these networks are known to contain a large number of parameters. Recent research has shown that their structure can be more compact without compromising their performance. In this paper, we present a sparsity-inducing regularization term based on the ratio l1/l2 pseudo-norm defined on the filter coefficients. By defining this pseudo-norm appropriately for the different filter kernels, and removing irrelevant filters, the number of kernels in each layer can be drastically reduced leading to very compact Deep Convolutional Neural Networks (DCNN) structures. Unlike numerous existing methods, our approach does not require an iterative retraining process and, using this regularization term, directly produces a sparse model during the training process. Furthermore, our approach is also much ea...
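A minimal sketch of what a filter-wise l1/l2 ratio penalty can look like (the exact per-kernel definition used in the paper may differ):

```python
# Filter-wise l1/l2 ratio penalty: scale-invariant, smallest when a filter has
# few non-zero coefficients, so it encourages whole filters to become sparse.
import numpy as np

def l1_over_l2_penalty(conv_weights, eps=1e-12):
    """conv_weights: array of shape (n_filters, in_channels, kH, kW).
    Returns the sum over filters of ||w||_1 / ||w||_2."""
    penalty = 0.0
    for w in conv_weights:
        flat = w.ravel()
        penalty += np.abs(flat).sum() / (np.sqrt((flat ** 2).sum()) + eps)
    return penalty

# Toy usage: a dense 3x3 filter is penalised more than a one-hot filter.
dense = np.ones((1, 1, 3, 3)) / 3.0
sparse = np.zeros((1, 1, 3, 3)); sparse[0, 0, 0, 0] = 1.0
print(l1_over_l2_penalty(dense), l1_over_l2_penalty(sparse))
```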

Une Nouvelle Méthode d'Extraction de Caractéristiques Faciales pour la Reconnaissance: l'Analyse Dicriminante Bilinéaire
One of the goals of motion analysis is spatio-temporal segmentation. A major difficulty in this domain is tracking objects in video sequences. The problem addressed in this article is that of occlusions. A visual system for identifying object trajectories in video sequences must be able to track objects (and their boundaries) that are partially or even entirely occluded. There are several interesting approaches to solving this problem but, unfortunately, almost all of them work at the pixel level, making them unusable in real-time applications. Moreover, they offer little expressiveness at the semantic level of the video. In this article, we present a method for computing object trajectories in the presence of occlusions that exploits the rich information provided by the spatial coherence between pixels, using a graph-based technique and a multiresol...

Proceedings of the 2nd ACM International Conference on Multimedia in Asia, 2021
Despite a huge leap in the performance of face recognition systems in recent years, some cases remain challenging for them while being trivial for humans. This is because a human brain exploits much more information than the face appearance to identify a person. In this work, we aim at capturing the social context of unlabeled observed faces in order to improve face retrieval. In particular, we propose a framework that substantially improves face retrieval by exploiting the faces occurring simultaneously in a query's context to infer a multi-dimensional social context descriptor. Combining this compact structural descriptor with the individual visual face features in a common feature vector considerably increases the correct face retrieval rate and makes it possible to disambiguate a large proportion of query results of different persons that are barely distinguishable visually. To evaluate our framework, we also introduce a new large dataset of faces of French TV personalities organised in TV shows in order to capture the co-occurrence relations between people. On this dataset, our framework improves the mean Average Precision over a set of internal queries from 67.93% (using only facial features extracted with a state-of-the-art pre-trained model) to 78.16% (using both facial features and face co-occurrences), and from 67.88% to 77.36% over a set of external queries.

2020 International Joint Conference on Neural Networks (IJCNN), 2020
In this paper, we present an extensive study of different neural network-based approaches and loss functions applied to the Multiple Instance Learning (MIL) problem and binary classification. In the MIL setting, training is performed on small sets of instances called bags, where each positive bag contains at least one positive instance and each negative bag contains only negative instances. We propose a new loss function based on the generalised mean and an effective training strategy particularly suited to this setting and to problems where the instances of one class contain a considerable amount of label noise. Furthermore, we present a probabilistic approach to dynamically estimate the label noise in this unbalanced binary classification setting and utilise it to automatically modulate the hyper-parameter of our proposed loss function. We experimentally evaluated our approach on a number of standard benchmarks for binary classification and showed that it outperforms standard neural network optimisation algorithms as well as most state-of-the-art MIL methods, both on numerical/categorical vector data with MLP architectures and on images with Convolutional Neural Networks.
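As an illustration of the generalised mean on which the proposed loss is built (the loss itself and the noise-driven modulation of its hyper-parameter are not reproduced here):

```python
# Generalised (power) mean pooling of instance scores into a bag score for MIL;
# the exponent interpolates between the plain mean (p=1) and the max (p -> inf).
import numpy as np

def generalized_mean(scores, p):
    """Bag-level score from per-instance scores in [0, 1]."""
    scores = np.clip(np.asarray(scores, dtype=float), 1e-6, 1.0)
    return (np.mean(scores ** p)) ** (1.0 / p)

bag = [0.05, 0.1, 0.9]           # one strongly positive instance among negatives
print(generalized_mean(bag, 1))  # plain mean, diluted by the negatives
print(generalized_mean(bag, 8))  # closer to max, dominated by the positive instance
```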

Document Analysis and Recognition – ICDAR 2021 Workshops, 2021
As for many text understanding and generation tasks, pre-trained language models have emerged as a powerful approach for extracting information from business documents. However, their performance has not been properly studied in data-constrained settings, which are often encountered in industrial applications. In this paper, we show that LayoutLM, a pre-trained model recently proposed for encoding 2D documents, exhibits high sample-efficiency when fine-tuned on public and real-world Information Extraction (IE) datasets. Indeed, LayoutLM reaches more than 80% of its full performance with as few as 32 documents for fine-tuning. When compared with a strong baseline learning IE from scratch, the pre-trained model needs 4 to 30 times fewer annotated documents in the toughest data conditions. Finally, LayoutLM performs better on the real-world dataset when it has first been fine-tuned on the full public dataset, indicating valuable knowledge transfer abilities. We therefore advocate the use of pre-trained language models for tackling practical extraction problems.

Journal of Signal Processing Systems, 2020
Over the past years, deep neural networks have proved to be an essential element for developing intelligent solutions. They have achieved remarkable performance at the cost of deeper layers and millions of parameters, so utilising these networks on limited-resource platforms for smart cameras is a challenging task. In this context, models need to be (i) accelerated and (ii) memory efficient without significantly compromising performance. Numerous works have aimed at obtaining smaller, faster and accurate models. This paper presents a survey of methods suitable for porting deep neural networks to resource-limited devices, especially for smart cameras. These methods can be roughly divided into two main parts. In the first part, we present compression techniques, categorized into knowledge distillation, pruning, quantization, hashing, reduction of numerical precision and binarization. In the second part, we focus on architecture optimization: we introduce methods to enhance network structures as well as neural architecture search techniques. In each part, we describe and analyse the different methods. Finally, we conclude this paper with a discussion on these methods.

Artificial Intelligence in Medicine, 2020
Recognition of Activities of Daily Living (ADL) is an essential component of assisted living systems based on actigraphy. This task can nowadays be performed by machine learning models which are able to automatically extract and learn relevant features but, most of the time, need to be trained with large amounts of data collected from several users. In this paper, we propose an approach to learn personalized ADL recognition models from few raw data samples, based on a specific type of neural network called a matching network. The interest of this few-shot learning approach is threefold. Firstly, people perform activities in their own way and general models may average out important individual characteristics, unlike personalized models, which could thus achieve better performance. Secondly, gathering large quantities of annotated data from one user is time-consuming and threatens privacy in a medical context. Thirdly, matching networks are by nature weakly dependent on the classes they are trained on and can generalize easily to new activities without needing extra training, making them very versatile for real applications. Our results show the effectiveness of the proposed approach compared to general neural network models, even in situations with few training data.
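As a rough illustration of the matching-network principle used here, the sketch below shows the similarity-weighted voting rule over an embedded support set; the embedding functions, which the paper learns from raw actigraphy data, are replaced by plain vectors.

```python
# Matching-network decision rule: softmax attention over cosine similarities to the
# support set, accumulated per class to obtain class probabilities for the query.
import numpy as np

def matching_network_predict(query_emb, support_embs, support_labels, n_classes):
    q = query_emb / (np.linalg.norm(query_emb) + 1e-12)
    s = support_embs / (np.linalg.norm(support_embs, axis=1, keepdims=True) + 1e-12)
    sims = s @ q
    att = np.exp(sims) / np.exp(sims).sum()   # attention weights over support items
    probs = np.zeros(n_classes)
    for a, y in zip(att, support_labels):
        probs[y] += a                         # weighted vote per activity class
    return probs

support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = [0, 0, 1]
print(matching_network_predict(np.array([0.8, 0.2]), support, labels, n_classes=2))
```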
International Journal of Modeling and Optimization, 2019

IEEE transactions on cybernetics, Jan 16, 2016
This paper presents a study of metric learning systems on pairwise identity verification, including pairwise face verification and pairwise speaker verification. These problems are challenging because the individuals in training and testing are mutually exclusive, and also due to the probable setting of limited training data. For such pairwise verification problems, we present a general framework of metric learning systems and employ the stochastic gradient descent algorithm as the optimization solution. We have studied both similarity metric learning and distance metric learning systems, of either a linear or shallow nonlinear model, under both restricted and unrestricted training settings. Extensive experiments demonstrate that with limited training pairs, learning a linear system on similar pairs only is preferable due to its simplicity and superiority, i.e., it generally achieves competitive performance on both the Labeled Faces in the Wild face dataset and the NIST...

2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016
The separation between text and graphics when they overlap is a challenging problem for digitization companies. In a previous work [1], we presented the first unsupervised, fully automatic segmentation system adapted for colour business documents with significant colour complexity and dithered backgrounds. The system performs several operations to segment colour images automatically, separate text from noise and graphics, and provide information about text colour. After splitting overlapped characters and separating characters from graphics, some characters are broken. The OCR system then becomes unable to recognize broken characters successfully, and its efficiency is thus seriously affected. This paper presents the first Character Reconstruction System based on a new PDE (Partial Differential Equation) approach. Our approach benefits from the combination of the anisotropic morphology proposed by Breuß and the Weickert coherence-enhancing shock filter diffusion. We introduce a continuous anisotropic morphology method driven by the main direction of the first-order tensors applied in the neighbourhood of the missing part left by the separation between text and graphics. It reconstructs the missing part even when the affected area is larger than the stroke width. The coherency of the orientation of the tensors around missing parts overcomes the problem of image noise. The application of the ABBYY FineReader OCR engine shows an important reduction in OCR errors. Our experiments show that, compared to the existing state of the art, our proposition requires no training step and outperforms both anisotropic morphology and the Weickert coherence-enhancing shock filter diffusion applied separately.
Artificial Neural Networks and Machine Learning – ICANN 2016, 2016
Our work focuses on metric learning between gesture sample signatures using Siamese Neural Networks (SNN), which aims at modeling semantic relations between classes to extract discriminative features. Our contribution is the notion of polar sine, which enables a redefinition of the angular problem. Our final proposal improves inertial gesture classification in two challenging test scenarios, with respective average classification rates of 0.934 ± 0.011 and 0.776 ± 0.025.
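The polar sine referred to above is the standard higher-dimensional generalisation of the sine of an angle; the sketch below computes it for a set of vectors (the SNN objective built on it is not reproduced here):

```python
# Polar sine of a set of n vectors in d dimensions (n <= d); for two vectors it
# reduces to |sin(theta)| of the angle between them.
import numpy as np

def polar_sine(vectors):
    """vectors: array of shape (n, d); returns the polar sine in [0, 1]."""
    A = np.asarray(vectors, dtype=float).T                # columns are the vectors
    gram = A.T @ A                                        # Gram matrix of the set
    norms = np.prod(np.linalg.norm(A, axis=0))
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / (norms + 1e-12)

# Toy usage: orthogonal vectors give ~1, nearly parallel vectors give a small value.
print(polar_sine([[1, 0, 0], [0, 1, 0]]))    # ~1.0
print(polar_sine([[1, 0, 0], [1, 0.1, 0]]))  # ~0.1 (nearly parallel)
```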
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
This paper presents a new method for similarity metric learning, called Logistic Similarity Metric Learning (LSML), where the cost is formulated as the logistic loss function, which gives a probability estimate of a pair of faces being similar. In particular, we propose to shift the similarity decision boundary, yielding a significant performance improvement. We test the proposed method on the face verification problem using four single face descriptors: LBP, OCLBP, SIFT and Gabor wavelets. Extensive experimental results on the LFWa data set demonstrate that the proposed method achieves competitive state-of-the-art performance on the problem of face verification.
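A minimal sketch of a logistic loss over a pairwise similarity score with a shifted decision boundary, as an illustration of the idea only (variable names are hypothetical and the learned similarity metric itself is not shown):

```python
# Logistic pairwise loss: P(similar) = sigmoid(beta * (sim - boundary)); the loss
# below is the corresponding negative log-likelihood for a labeled pair.
import numpy as np

def logistic_similarity_loss(sim, label, boundary=0.0, beta=1.0):
    """label: +1 for a similar pair, -1 for a dissimilar one."""
    z = label * beta * (sim - boundary)
    return np.log1p(np.exp(-z))

# Shifting the boundary changes which similarity values count as 'same person'.
print(logistic_similarity_loss(0.3, +1, boundary=0.0))
print(logistic_similarity_loss(0.3, +1, boundary=0.5))
```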

2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015
We propose an efficient linear similarity metric learning method for face verification called Triangular Similarity Metric Learning (TSML). Compared with relevant state-of-the-art work, this method improves the efficiency of learning the cosine similarity while maintaining effectiveness. Concretely, we present a geometrical interpretation based on the triangle inequality for developing a cost function and its efficient gradient function. We formulate the cost function as an optimization problem and solve it with the advanced L-BFGS optimization algorithm. We perform extensive experiments on the LFW data set using four descriptors: LBP, OCLBP, SIFT and Gabor wavelets. Moreover, for the optimization problem, we test two kinds of initialization: the identity matrix and the WCCN matrix. Experimental results demonstrate that both initializations are efficient and that our method achieves state-of-the-art performance on the problem of face verification.

Proceedings of the 10th International Conference on Computer Vision Theory and Applications, 2015
This paper presents the first fully automatic color analysis system suited for business documents. Our pixel-based approach mainly uses color morphology and does not require any training, manual assistance, prior knowledge or model. We developed a robust color segmentation system adapted for invoices and forms with significant color complexity and dithered backgrounds. The system performs several operations to segment color images automatically, separate text from noise and graphics, and provide information about text color. The contribution of our work is three-fold. Firstly, it is the usage of color morphology to simultaneously segment both text and inverted text. Our system processes inverted and non-inverted text automatically using conditional color dilation and erosion, even in cases where the two overlap. Secondly, it is the extraction of geodesic measures using morphological convolution in order to separate text, noise and graphical elements. Thirdly, we develop a method to disconnect characters touching or overlapping graphical elements. Our system can separate characters that touch straight lines, split overlapped characters with different colors and separate characters from graphics if they have different colors. A color analysis stage automatically computes the number of character colors. The proposed system is generic enough to process a wide range of digitized business documents from different origins. It outperforms the classical approach that uses binarization of greyscale images.

Modèles actifs d’apparences adaptés
Active Appearance Models (AAM) are able to align known faces in an efficient manner when face pose and illumination are controlled. The AAM exploit a set of face examples in order to extract a statistical model. There is no difficulty in aligning a face of the same type (same morphology, illumination and pose) as those constituting the example data set. Unfortunately, the AAM perform less well as soon as the illumination, pose or face type changes. AAM robustness is linked to the variability introduced in the learning base. The more variability the AAM contain, the better they can adapt to variable faces, with the following drawback: the data represented in the reduced parameter space then form different classes, letting holes appear, i.e. regions without any data (see Fig. 1). It is therefore very difficult to make the AAM converge in this scattered space. We propose in this paper a robust Active Appearance Model allowing a real-time implementation. To increase the AAM robustness to illumination changes, we propose Oriented Map AAM (OM-AAM). Adapted AAM are then presented to increase the AAM robustness to any other type of variability (in identity, pose, expression etc.)...
Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, 2006
The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.