Papers by Christophe Garcia
Robust camera calibration using 2D-to-3D feature correspondences
Proceedings of SPIE, Jul 7, 1997
arXiv (Cornell University), Dec 27, 2016
This paper presents a deep nonlinear metric learning framework for data visualization on an image dataset. We propose the Triangular Similarity and prove its equivalence to the Cosine Similarity in measuring a data pair. Based on this novel similarity, a geometrically motivated loss function, the triangular loss, is then developed for optimizing a metric learning system comprising two identical CNNs. It is shown that this deep nonlinear system can be efficiently trained by a hybrid algorithm based on the conventional backpropagation algorithm. More interestingly, benefiting from classical manifold learning theories, the proposed system offers two different views to visualize the outputs, the second of which provides better classification results than the state-of-the-art methods in the visualizable spaces.
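The abstract does not spell out the triangular loss itself. As a rough illustration only, here is a minimal NumPy sketch assuming the simplified pairwise form used in the authors' related linear TSML work, J = ||a||²/2 + ||b||²/2 - ||a + s·b|| + 1 with s = +1 for similar and s = -1 for dissimilar pairs; the actual loss optimized with the two CNNs may differ.

```python
# Hypothetical triangular-style pairwise loss (assumed form, see lead-in above).
import numpy as np

def triangular_loss(a, b, s):
    """Loss and gradients for embeddings a, b and pair label s in {+1, -1}."""
    c = a + s * b                       # third side of the triangle formed by a and s*b
    norm_c = np.linalg.norm(c) + 1e-12  # avoid division by zero
    loss = 0.5 * (a @ a) + 0.5 * (b @ b) - norm_c + 1.0
    grad_a = a - c / norm_c             # pulls a toward (similar) or away from (dissimilar) b
    grad_b = b - s * c / norm_c
    return loss, grad_a, grad_b

# Toy usage: the loss is small when the pair label agrees with the vectors' alignment.
a = np.array([1.0, 0.5])
b = np.array([0.9, 0.6])
print(triangular_loss(a, b, +1)[0], triangular_loss(a, b, -1)[0])
```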

Multi-faceted Deep Learning, 2012
Similarity metric learning models the general semantic similarities and distances between objects and classes of objects (e.g. persons) in order to recognise them. Different strategies and models based on Deep Learning exist and generally consist in learning a non-linear projection into a lower-dimensional vector space where the semantic similarity between instances can be easily measured with a standard distance. As opposed to supervised learning, one does not train the model to predict the class labels, and the actual labels may not even be used or known in advance. Machine learning-based similarity metric learning approaches rather operate in a weakly supervised way. That is, the training target (loss) is defined on the relationship between several instances, i.e. similar or different pairs, triplets or tuples. This learnt distance can then be applied, for example, to two new, unseen examples of unknown classes in order to determine if they belong to the same class or if the...
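As a generic illustration of the weakly supervised pairwise training described here (and not the chapter's specific model), the following sketch shows a classical contrastive loss defined on a pair of embeddings and a same/different label:

```python
# Generic contrastive pair loss: similar pairs are pulled together, dissimilar
# pairs are pushed beyond a margin in the embedding space.
import numpy as np

def contrastive_loss(x1, x2, same, margin=1.0):
    """x1, x2: embeddings of a pair; same: True if both come from the same class."""
    d = np.linalg.norm(x1 - x2)
    if same:
        return 0.5 * d ** 2                     # pull similar pairs together
    return 0.5 * max(0.0, margin - d) ** 2      # push dissimilar pairs apart

print(contrastive_loss(np.array([0.1, 0.2]), np.array([0.15, 0.18]), same=True))
print(contrastive_loss(np.array([0.1, 0.2]), np.array([0.9, 0.8]), same=False))
```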

European Project Space on Computer Vision, Graphics, Optics and Photonics, 2015
We present the first fully automatic color analysis system suited for noisy heterogeneous documents. We developed a robust color segmentation system adapted for business documents and old handwritten documents with significant color complexity and dithered backgrounds. We have developed the first fully data-driven pixel-based approach that does not need a priori information, training or manual assistance. The system performs several operations to segment color images automatically, separate text from noise and graphics, and provide information about text color. The contribution of our work is four-fold. Firstly, it does not require any connected component analysis, which simplifies the extraction of the layout and the recognition step undertaken by the OCR. Secondly, it uses color morphology to simultaneously segment both text and inverted text using conditional color dilation and erosion, even in cases where the two overlap. Thirdly, our system efficiently removes noise and speckles from dithered backgrounds and automatically suppresses graphical elements using geodesic measurements. Fourthly, we develop a method that splits overlapped characters and separates characters from graphics if they have different colors. The proposed Automatic Color Document Processing System achieved 99% of correctly segmented documents and has the potential to be adapted to different document images. The system outperformed the classical approach that uses binarization of the grayscale image.
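For readers unfamiliar with the morphological vocabulary, the sketch below illustrates plain conditional (geodesic) dilation on a binary image; it shows only the textbook principle behind the conditional colour dilation and erosion mentioned above, not the paper's colour formulation.

```python
# Conditional (geodesic) dilation: grow a marker while staying inside a mask.
import numpy as np
from scipy.ndimage import binary_dilation

def conditional_dilation(marker, mask, max_iter=1000):
    """Iteratively dilate `marker`, constrained to `mask`, until stability."""
    prev = np.zeros_like(marker, dtype=bool)
    cur = marker.astype(bool) & mask.astype(bool)
    it = 0
    while not np.array_equal(cur, prev) and it < max_iter:
        prev = cur
        cur = binary_dilation(cur) & mask    # dilate, then constrain to the mask
        it += 1
    return cur

# Toy usage: grow a single seed pixel inside the masked region only.
mask = np.zeros((5, 5), dtype=bool); mask[1:4, 1:4] = True
marker = np.zeros((5, 5), dtype=bool); marker[2, 2] = True
print(conditional_dilation(marker, mask).astype(int))
```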

ArXiv, 2020
While deep neural networks (DNNs) have proven to be efficient for numerous tasks, they come at a high memory and computation cost, thus making them impractical on resource-limited devices. However, these networks are known to contain a large number of parameters. Recent research has shown that their structure can be more compact without compromising their performance. In this paper, we present a sparsity-inducing regularization term based on the ratio l1/l2 pseudo-norm defined on the filter coefficients. By defining this pseudo-norm appropriately for the different filter kernels, and removing irrelevant filters, the number of kernels in each layer can be drastically reduced leading to very compact Deep Convolutional Neural Networks (DCNN) structures. Unlike numerous existing methods, our approach does not require an iterative retraining process and, using this regularization term, directly produces a sparse model during the training process. Furthermore, our approach is also much ea...
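A minimal sketch of what a filter-wise l1/l2 ratio penalty can look like (the exact per-kernel definition used in the paper may differ):

```python
# Filter-wise l1/l2 ratio penalty: scale-invariant, smallest when a filter has
# few non-zero coefficients, so it encourages whole filters to become sparse.
import numpy as np

def l1_over_l2_penalty(conv_weights, eps=1e-12):
    """conv_weights: array of shape (n_filters, in_channels, kH, kW).
    Returns the sum over filters of ||w||_1 / ||w||_2."""
    penalty = 0.0
    for w in conv_weights:
        flat = w.ravel()
        penalty += np.abs(flat).sum() / (np.sqrt((flat ** 2).sum()) + eps)
    return penalty

# Toy usage: a dense 3x3 filter is penalised more than a one-hot filter.
dense = np.ones((1, 1, 3, 3)) / 3.0
sparse = np.zeros((1, 1, 3, 3)); sparse[0, 0, 0, 0] = 1.0
print(l1_over_l2_penalty(dense), l1_over_l2_penalty(sparse))
```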

Une Nouvelle Méthode d'Extraction de Caractéristiques Faciales pour la Reconnaissance: l'Analyse Dicriminante Bilinéaire
One of the goals of motion analysis is spatio-temporal segmentation. A major difficulty in this domain is tracking objects in video sequences. The problem addressed in this article is that of occlusions. A visual system for identifying object trajectories in video sequences must be able to track objects (and their boundaries) that are partially or even entirely occluded. There are several interesting approaches to solving this problem but, unfortunately, almost all of them work at the pixel level, making them unusable in real-time applications. Moreover, they offer little expressiveness at the semantic level of the video. In this article, we present a method for computing object trajectories in the presence of occlusions that exploits the rich information provided by the spatial coherence between pixels, using a graph-based technique and a multiresol...

Proceedings of the 2nd ACM International Conference on Multimedia in Asia, 2021
Despite a huge leap in the performance of face recognition systems in recent years, some cases remain challenging for them while being trivial for humans. This is because a human brain exploits much more information than the face appearance to identify a person. In this work, we aim at capturing the social context of unlabeled observed faces in order to improve face retrieval. In particular, we propose a framework that substantially improves face retrieval by exploiting the faces occurring simultaneously in a query's context to infer a multi-dimensional social context descriptor. Combining this compact structural descriptor with the individual visual face features in a common feature vector considerably increases the correct face retrieval rate and makes it possible to disambiguate a large proportion of query results of different persons that are barely distinguishable visually. To evaluate our framework, we also introduce a new large dataset of faces of French TV personalities organised in TV shows in order to capture the co-occurrence relations between people. On this dataset, our framework improves the mean Average Precision over a set of internal queries from 67.93% (using only facial features extracted with a state-of-the-art pre-trained model) to 78.16% (using both facial features and face co-occurrences), and from 67.88% to 77.36% over a set of external queries.

2020 International Joint Conference on Neural Networks (IJCNN), 2020
In this paper, we present an extensive study of different neural network-based approaches and loss functions applied to the Multiple Instance Learning (MIL) problem and binary classification. In the MIL setting, training is performed on small sets of instances called bags, where each positive bag contains at least one positive instance and each negative bag contains only negative instances. We propose a new loss function based on the generalised mean and an effective training strategy particularly suited to this setting and to problems where the instances of one class contain a considerable amount of label noise. Furthermore, we present a probabilistic approach to dynamically estimate the label noise in this unbalanced binary classification setting and utilise it to automatically modulate the hyper-parameter of our proposed loss function. We experimentally evaluated our approach on a number of standard benchmarks for binary classification and showed that it outperforms standard neural network optimisation algorithms as well as most state-of-the-art MIL methods, both on numerical/categorical vector data with MLP architectures and on images with Convolutional Neural Networks.
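As an illustration of the generalised mean on which the proposed loss is built (the loss itself and the noise-driven modulation of its hyper-parameter are not reproduced here):

```python
# Generalised (power) mean pooling of instance scores into a bag score for MIL;
# the exponent interpolates between the plain mean (p=1) and the max (p -> inf).
import numpy as np

def generalized_mean(scores, p):
    """Bag-level score from per-instance scores in [0, 1]."""
    scores = np.clip(np.asarray(scores, dtype=float), 1e-6, 1.0)
    return (np.mean(scores ** p)) ** (1.0 / p)

bag = [0.05, 0.1, 0.9]           # one strongly positive instance among negatives
print(generalized_mean(bag, 1))  # plain mean, diluted by the negatives
print(generalized_mean(bag, 8))  # closer to max, dominated by the positive instance
```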

Document Analysis and Recognition – ICDAR 2021 Workshops, 2021
As for many text understanding and generation tasks, pre-trained language models have emerged as a powerful approach for extracting information from business documents. However, their performance has not been properly studied in data-constrained settings, which are often encountered in industrial applications. In this paper, we show that LayoutLM, a pre-trained model recently proposed for encoding 2D documents, exhibits high sample-efficiency when fine-tuned on public and real-world Information Extraction (IE) datasets. Indeed, LayoutLM reaches more than 80% of its full performance with as few as 32 documents for fine-tuning. When compared with a strong baseline learning IE from scratch, the pre-trained model needs 4 to 30 times fewer annotated documents in the toughest data conditions. Finally, LayoutLM performs better on the real-world dataset when it has first been fine-tuned on the full public dataset, indicating valuable knowledge transfer abilities. We therefore advocate the use of pre-trained language models for tackling practical extraction problems.

Journal of Signal Processing Systems, 2020
Over the past years, deep neural networks have proved to be an essential element for developing intelligent solutions. They have achieved remarkable performance at the cost of deeper layers and millions of parameters, so utilising these networks on limited-resource platforms for smart cameras is a challenging task. In this context, models need to be (i) accelerated and (ii) memory efficient without significantly compromising performance. Numerous works have aimed at obtaining smaller, faster and accurate models. This paper presents a survey of methods suitable for porting deep neural networks to resource-limited devices, especially for smart cameras. These methods can be roughly divided into two main parts. In the first part, we present compression techniques, categorized into knowledge distillation, pruning, quantization, hashing, reduction of numerical precision and binarization. In the second part, we focus on architecture optimization: we introduce methods to enhance network structures as well as neural architecture search techniques. In each part, we describe and analyse the different methods. Finally, we conclude this paper with a discussion on these methods.

Artificial Intelligence in Medicine, 2020
Recognition of Activities of Daily Living (ADL) is an essential component of assisted living systems based on actigraphy. This task can nowadays be performed by machine learning models which are able to automatically extract and learn relevant features but, most of the time, need to be trained with large amounts of data collected from several users. In this paper, we propose an approach to learn personalized ADL recognition models from few raw data samples, based on a specific type of neural network called a matching network. The interest of this few-shot learning approach is threefold. Firstly, people perform activities in their own way and general models may average out important individual characteristics, unlike personalized models, which could thus achieve better performance. Secondly, gathering large quantities of annotated data from one user is time-consuming and threatens privacy in a medical context. Thirdly, matching networks are by nature weakly dependent on the classes they are trained on and can generalize easily to new activities without needing extra training, making them very versatile for real applications. Our results show the effectiveness of the proposed approach compared to general neural network models, even in situations with few training data.
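As a rough illustration of the matching-network principle used here, the sketch below shows the similarity-weighted voting rule over an embedded support set; the embedding functions, which the paper learns from raw actigraphy data, are replaced by plain vectors.

```python
# Matching-network decision rule: softmax attention over cosine similarities to the
# support set, accumulated per class to obtain class probabilities for the query.
import numpy as np

def matching_network_predict(query_emb, support_embs, support_labels, n_classes):
    q = query_emb / (np.linalg.norm(query_emb) + 1e-12)
    s = support_embs / (np.linalg.norm(support_embs, axis=1, keepdims=True) + 1e-12)
    sims = s @ q
    att = np.exp(sims) / np.exp(sims).sum()   # attention weights over support items
    probs = np.zeros(n_classes)
    for a, y in zip(att, support_labels):
        probs[y] += a                         # weighted vote per activity class
    return probs

support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = [0, 0, 1]
print(matching_network_predict(np.array([0.8, 0.2]), support, labels, n_classes=2))
```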
International Journal of Modeling and Optimization, 2019

IEEE transactions on cybernetics, Jan 16, 2016
This paper presents a study of metric learning systems on pairwise identity verification, including pairwise face verification and pairwise speaker verification. These problems are challenging because the individuals in training and testing are mutually exclusive, and also due to the probable setting of limited training data. For such pairwise verification problems, we present a general framework of metric learning systems and employ the stochastic gradient descent algorithm as the optimization solution. We have studied both similarity metric learning and distance metric learning systems, of either a linear or shallow nonlinear model, under both restricted and unrestricted training settings. Extensive experiments demonstrate that with limited training pairs, learning a linear system on similar pairs only is preferable due to its simplicity and superiority, i.e., it generally achieves competitive performance on both the Labeled Faces in the Wild face dataset and the NIST...

2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016
The separation between text and graphics when they overlap is a challenging problem for digitization companies. In a previous work [1], we presented the first unsupervised, fully automatic segmentation system adapted for colour business documents with significant colour complexity and dithered backgrounds. The system performs several operations to segment colour images automatically, separate text from noise and graphics, and provide information about text colour. After splitting overlapped characters and separating characters from graphics, some characters are broken. The OCR system then becomes unable to recognize broken characters successfully, and its efficiency is thus seriously affected. This paper presents the first Character Reconstruction System based on a new PDE (Partial Differential Equation) approach. Our approach benefits from the combination of the anisotropic morphology proposed by Breuß and the Weickert coherence-enhancing shock filter diffusion. We introduce a continuous anisotropic morphology method driven by the main direction of the first-order tensors applied in the neighbourhood of the missing part left by the separation between text and graphics. It reconstructs the missing part even when the affected area is larger than the stroke width. The coherency of the orientation of the tensors around missing parts overcomes the problem of image noise. The application of the ABBYY FineReader OCR engine shows an important reduction in OCR errors. Our experiments show that, compared to the existing state of the art, our proposition requires no training step and outperforms both anisotropic morphology and the Weickert coherence-enhancing shock filter diffusion applied separately.
Artificial Neural Networks and Machine Learning – ICANN 2016, 2016
Our work focuses on metric learning between gesture sample signatures using Siamese Neural Networks (SNN), which aims at modeling semantic relations between classes to extract discriminative features. Our contribution is the notion of polar sine, which enables a redefinition of the angular problem. Our final proposal improves inertial gesture classification in two challenging test scenarios, with respective average classification rates of 0.934 ± 0.011 and 0.776 ± 0.025.
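The polar sine referred to above is the standard higher-dimensional generalisation of the sine of an angle; the sketch below computes it for a set of vectors (the SNN objective built on it is not reproduced here):

```python
# Polar sine of a set of n vectors in d dimensions (n <= d); for two vectors it
# reduces to |sin(theta)| of the angle between them.
import numpy as np

def polar_sine(vectors):
    """vectors: array of shape (n, d); returns the polar sine in [0, 1]."""
    A = np.asarray(vectors, dtype=float).T                # columns are the vectors
    gram = A.T @ A                                        # Gram matrix of the set
    norms = np.prod(np.linalg.norm(A, axis=0))
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / (norms + 1e-12)

# Toy usage: orthogonal vectors give ~1, nearly parallel vectors give a small value.
print(polar_sine([[1, 0, 0], [0, 1, 0]]))    # ~1.0
print(polar_sine([[1, 0, 0], [1, 0.1, 0]]))  # ~0.1 (nearly parallel)
```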
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
This paper presents a new method for similarity metric learning, called Logistic Similarity Metric Learning (LSML), where the cost is formulated as the logistic loss function, which gives a probability estimate of a pair of faces being similar. In particular, we propose to shift the similarity decision boundary, yielding a significant performance improvement. We test the proposed method on the face verification problem using four single face descriptors: LBP, OCLBP, SIFT and Gabor wavelets. Extensive experimental results on the LFWa data set demonstrate that the proposed method achieves competitive state-of-the-art performance on the problem of face verification.
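A minimal sketch of a logistic loss over a pairwise similarity score with a shifted decision boundary, as an illustration of the idea only (variable names are hypothetical and the learned similarity metric itself is not shown):

```python
# Logistic pairwise loss: P(similar) = sigmoid(beta * (sim - boundary)); the loss
# below is the corresponding negative log-likelihood for a labeled pair.
import numpy as np

def logistic_similarity_loss(sim, label, boundary=0.0, beta=1.0):
    """label: +1 for a similar pair, -1 for a dissimilar one."""
    z = label * beta * (sim - boundary)
    return np.log1p(np.exp(-z))

# Shifting the boundary changes which similarity values count as 'same person'.
print(logistic_similarity_loss(0.3, +1, boundary=0.0))
print(logistic_similarity_loss(0.3, +1, boundary=0.5))
```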

2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015
We propose an efficient linear similarity metric learning method for face verification called Triangular Similarity Metric Learning (TSML). Compared with relevant state-of-the-art work, this method improves the efficiency of learning the cosine similarity while maintaining effectiveness. Concretely, we present a geometrical interpretation based on the triangle inequality for developing a cost function and its efficient gradient function. We formulate the cost function as an optimization problem and solve it with the advanced L-BFGS optimization algorithm. We perform extensive experiments on the LFW data set using four descriptors: LBP, OCLBP, SIFT and Gabor wavelets. Moreover, for the optimization problem, we test two kinds of initialization: the identity matrix and the WCCN matrix. Experimental results demonstrate that both initializations are efficient and that our method achieves state-of-the-art performance on the problem of face verification.

Proceedings of the 10th International Conference on Computer Vision Theory and Applications, 2015
This paper presents the first fully automatic color analysis system suited for business documents. Our pixel-based approach mainly uses color morphology and does not require any training, manual assistance, prior knowledge or model. We developed a robust color segmentation system adapted for invoices and forms with significant color complexity and dithered backgrounds. The system performs several operations to segment color images automatically, separate text from noise and graphics, and provide information about text color. The contribution of our work is three-fold. Firstly, it is the usage of color morphology to simultaneously segment both text and inverted text. Our system processes inverted and non-inverted text automatically using conditional color dilation and erosion, even in cases where the two overlap. Secondly, it is the extraction of geodesic measures using morphological convolution in order to separate text, noise and graphical elements. Thirdly, we develop a method to disconnect characters touching or overlapping graphical elements. Our system can separate characters that touch straight lines, split overlapped characters with different colors and separate characters from graphics if they have different colors. A color analysis stage automatically computes the number of character colors. The proposed system is generic enough to process a wide range of digitized business documents from different origins. It outperforms the classical approach that uses binarization of greyscale images.

Modèles actifs d’apparences adaptés
Active Appearance Models (AAM) are able to align known faces in an efficient manner when face pose and illumination are controlled. The AAM exploit a set of face examples in order to extract a statistical model. There is no difficulty in aligning a face of the same type (same morphology, illumination and pose) as those constituting the example data set. Unfortunately, the AAM perform less well as soon as the illumination, pose or face type changes. AAM robustness is linked to the variability introduced in the learning base. The more variability the AAM contain, the better they can adapt to variable faces, with the following drawback: the data represented in the reduced parameter space then form different classes, letting holes appear, i.e. regions without any data (see Fig. 1). It is therefore very difficult to make the AAM converge in this scattered space. We propose in this paper a robust Active Appearance Model allowing a real-time implementation. To increase the AAM robustness to illumination changes, we propose Oriented Map AAM (OM-AAM). Adapted AAM are then presented to increase the AAM robustness to any other type of variability (in identity, pose, expression etc.)...
Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, 2006
The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.