Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005. Jan 5, 2006
In this paper we address the issue of class-related reject thresholds for classification systems. It has been demonstrated in the literature that class-related reject thresholds provide a better error-reject trade-off than a single global threshold. In this work we argue that the error-reject trade-off yielded by class-related reject thresholds can be further improved if a proper algorithm is used to find the thresholds. In light of this, we propose using a recently developed optimization algorithm called Particle Swarm Optimization, which has proved very effective in solving real-valued global optimization problems. To show the benefits of this algorithm, we have applied it to optimize the thresholds of a cascading classifier system devoted to recognizing handwritten digits.
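The class-related reject rule described above can be sketched as follows. This is a minimal illustration only: the function and threshold values are hypothetical, and the paper's PSO step, which searches for the per-class threshold values themselves, is omitted.

```python
def classify_with_reject(scores, thresholds):
    """Assign the top-scoring class, but reject when its score falls
    below that class's own threshold (class-related rejection)."""
    best = max(scores, key=scores.get)
    if scores[best] < thresholds[best]:
        return None  # reject: defer to a later stage or a human
    return best

# One threshold per class instead of a single global one.
thresholds = {"0": 0.90, "1": 0.80}
accepted = classify_with_reject({"0": 0.95, "1": 0.05}, thresholds)  # "0"
rejected = classify_with_reject({"0": 0.85, "1": 0.15}, thresholds)  # None
```

With class-related thresholds, a class that the classifier confuses often can be given a stricter threshold than a reliable one, which is what yields the improved error-reject trade-off.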
In this paper we describe a synthetic database composed of 273,452 handwritten touching digit pairs to assess segmentation algorithms. It contains several different kinds of touching and was generated by connecting 2,000 images of isolated digits extracted from the NIST SD19. To gain better insight into the proposed database and establish some parameters for further comparisons, we carried out experiments using four state-of-the-art segmentation algorithms.
Most works based on diversity suggest that there exists only a weak correlation between diversity and ensemble accuracy. We show that by combining the diversities with the classification accuracy of each individual classifier, we can achieve a strong correlation between the combined diversities and the ensemble accuracy in Random Subspaces.
Contribution of data complexity features on dynamic classifier selection
Different dynamic classifier selection techniques have been proposed in the literature to determine which of the diverse classifiers available in a pool should be used to classify a test instance. The individual competence of each classifier in the pool is usually evaluated by taking into account its accuracy on the neighborhood of the test instance in a validation dataset. In this work we investigate the possible contribution of using features related to problem complexity during the classifier evaluation. Since the pool generation technique usually does not ensure diversity, the idea is to consider diversity during the selection. Basically, we select a classifier trained on a subset of data showing complexity similar to that observed in the neighborhood of the test instance. We expect that this similarity in terms of complexity allows us to select a more competent classifier. Experiments on 30 classification problems representing different levels of difficulty have shown that the proposed selection method is comparable to well-known dynamic selection strategies. When compared with other DS approaches, it won in 123 of 150 experiments. These promising results indicate that further investigation should be done to increase diversity in terms of data complexity during the pool generation process.
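The neighborhood-based competence estimation described above can be sketched as a local-accuracy selection rule. This is a minimal illustration with hypothetical toy classifiers, not the authors' implementation, and the complexity-feature extension is omitted.

```python
def select_classifier(pool, val_X, val_y, x, k=3):
    """Dynamic classifier selection: estimate each classifier's competence
    on the k validation instances nearest to the query x, and return the
    most locally accurate classifier."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = sorted(range(len(val_X)), key=lambda i: dist(val_X[i], x))[:k]
    return max(pool, key=lambda clf: sum(clf(val_X[i]) == val_y[i] for i in nearest))

# Toy pool: one classifier fits the validation data, the other does not.
clf_good = lambda p: 0 if p[0] < 0.5 else 1
clf_bad = lambda p: 1
val_X, val_y = [(0.1,), (0.2,), (0.9,)], [0, 0, 1]
chosen = select_classifier([clf_good, clf_bad], val_X, val_y, (0.15,), k=2)
```

The paper's proposal replaces (or complements) the pure accuracy criterion with a comparison between the data complexity of the classifier's training subset and that of the query neighborhood.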
An end-to-end solution for handwritten numeral string recognition is proposed, in which the numeral string is considered as composed of objects automatically detected and recognized by a YoLo-based model. The main contribution of this paper is to avoid heuristic-based methods for string preprocessing and segmentation, the need for task-oriented classifiers, and the use of specific constraints related to string length. A robust experimental protocol based on several numeral string datasets, including one composed of historical documents, has shown that the proposed method is a feasible end-to-end solution for numeral string recognition. Besides, it considerably reduces the complexity of the string recognition task, since it drops classical steps, in particular preprocessing, segmentation, and the set of classifiers devoted to strings of a specific length.
This work compares different segmentation algorithms for handwritten digits based on explicit segmentation. For this purpose, algorithms based on different concepts were implemented and evaluated under the same conditions. The algorithms were used to segment 2,369 pairs of touching digits from the NIST SD19 database and were evaluated in terms of correct segmentation and computational time. We also discuss the complementarity of the segmentation algorithms. We have observed that, independently of the individual performance of the algorithms, each method is able to segment samples that cannot be segmented by any other method. Based on this observation, we conclude that the combination of different segmentation algorithms may be an interesting strategy to improve the correct segmentation rate.
Over the last decades, most approaches proposed for handwritten digit string recognition (HDSR) have resorted to digit segmentation, which is dominated by heuristics, thereby imposing substantial constraints on the final performance. Few of them have been based on segmentation-free strategies, where each pixel column is a potential cut location. Recently, segmentation-free strategies have added another perspective to the problem, leading to promising results. However, these strategies still show some limitations when dealing with a large number of touching digits. To bridge the resulting gap, in this paper we hypothesize that a string of digits can be approached as a sequence of objects. We thus evaluate different end-to-end approaches to solve the HDSR problem, particularly in two verticals: those based on object detection (e.g., Yolo and RetinaNet) and those based on sequence-to-sequence representation (CRNN). The main contribution of this work lies in its comprehensive comparison and critical analysis of the above-mentioned strategies on five benchmarks commonly used to assess HDSR, including the challenging Touching Pair dataset, NIST SD19, and two real-world datasets (CAR and CVL) proposed for the ICFHR 2014 competition on HDSR. Our results show that the Yolo model compares favorably against segmentation-free models, with the advantage of a shorter pipeline that minimizes the presence of heuristics-based models. It achieved 97%, 96%, and 84% recognition rates on the NIST SD19, CAR, and CVL datasets, respectively.
IEEE Transactions on Information Forensics and Security, 2021
Usually, in a real-world scenario, few signature samples are available to train an automatic signature verification system (ASVS). However, such systems do need many signatures to achieve acceptable performance. Neuromotor signature duplication methods and feature space augmentation methods may be used to meet the need for an increased number of samples. Such techniques manually or empirically define a set of parameters to introduce a degree of writer variability. Therefore, in the present study, a method to automatically model the most common writer variability traits is proposed. The method is used to generate offline signatures in the image and the feature space and to train an ASVS. We also introduce an alternative approach to evaluate the quality of samples considering their feature vectors. We evaluated the performance of an ASVS with the generated samples using three well-known offline signature datasets: GPDS, MCYT-75, and CEDAR. In GPDS-300, when the SVM classifier was trained using one genuine signature per writer and the duplicates generated in the image space, the Equal Error Rate (EER) decreased from 5.71% to 1.08%. Under the same conditions, the EER decreased to 1.04% using the feature space augmentation technique. We also verified that the model that generates duplicates in the image space reproduces the most common writer variability traits across the three datasets.
Information from an image occurs over multiple and distinct spatial scales. Image pyramid multiresolution representations are a useful data structure for image analysis and manipulation over a spectrum of spatial scales. This paper employs the Gaussian–Laplacian pyramid to treat different spatial frequency bands of a texture separately. First, we generate three images corresponding to three levels of the Gaussian–Laplacian pyramid for an input image to capture intrinsic details. Then, we aggregate features extracted from gray and color texture images using bioinspired texture descriptors, information-theoretic measures, gray-level co-occurrence matrix feature descriptors, and Haralick statistical feature descriptors into a single feature vector. Such an aggregation aims at producing features that characterize textures to their maximum extent, unlike employing each descriptor separately, which may lose some relevant textural information and reduce classification performance.
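The Gaussian–Laplacian decomposition underlying the method can be illustrated in one dimension. This is a simplified sketch: the paper works on 2D images, and the binomial kernel and nearest-neighbor upsampling used here are illustrative assumptions, not the authors' exact choices.

```python
def blur(sig):
    """Binomial [1, 2, 1] / 4 smoothing with edge replication."""
    p = [sig[0]] + list(sig) + [sig[-1]]
    return [(p[i - 1] + 2 * p[i] + p[i + 1]) / 4 for i in range(1, len(sig) + 1)]

def upsample(sig, n):
    """Nearest-neighbor expansion back to length n."""
    out = [v for v in sig for _ in (0, 1)]
    return out[:n]

def pyramid(sig, levels=3):
    """Gaussian levels plus Laplacian (detail) bands: each Laplacian band
    keeps the frequencies the next coarser Gaussian level cannot represent."""
    gauss, lap = [list(sig)], []
    for _ in range(levels - 1):
        g = blur(gauss[-1])[::2]  # smooth, then subsample by 2
        lap.append([a - b for a, b in zip(gauss[-1], upsample(g, len(gauss[-1])))])
        gauss.append(g)
    return gauss, lap

gauss, lap = pyramid([1.0] * 8, levels=3)  # a constant signal has no detail
```

A constant signal produces all-zero Laplacian bands, which shows why these bands isolate texture detail: only spatial variation survives the difference.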
2015 13th International Conference on Document Analysis and Recognition (ICDAR)
SignWriting is a writing system for sign languages. It is based on visual symbols to represent hand shapes, movements, and facial expressions, among other elements. It has been adopted by more than 40 countries, but to ensure the social integration of the deaf community, writing systems based on sign languages should be properly incorporated into information technology. This article reports our first efforts toward the implementation of an automatic reading system for SignWriting. This would allow converting the SignWriting script into text so that one can store, retrieve, and index information in an efficient way. To make this work possible, we have been collecting a database of hand configurations, which at the present moment amounts to 7,994 images divided into 103 classes of symbols. To classify such symbols, we have performed a comprehensive set of experiments using different features, classifiers, and combination strategies. The best result, a 94.4% recognition rate, was achieved by a Convolutional Neural Network.
2022 International Joint Conference on Neural Networks (IJCNN)
When using vision-based approaches to classify individual parking spaces as occupied or empty, human experts often need to annotate the locations and label a training set containing images collected in the target parking lot to fine-tune the system. We propose investigating three annotation types (polygons, bounding boxes, and fixed-size squares), providing different data representations of the parking spaces. The rationale is to elucidate the best trade-off between handcrafted annotation precision and model performance. We also investigate the number of annotated parking spaces necessary to fine-tune a pre-trained model on the target parking lot. Experiments using the PKLot dataset show that it is possible to fine-tune a model to the target parking lot with fewer than 1,000 labeled samples, using low-precision annotations such as fixed-size squares.
2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
Recognition of forest species is a very challenging task due to the great intra-class variability. To cope with such variability, we propose a multiple classifier system based on a two-level classification strategy and microscopic images. Using a divide-and-conquer approach, an image is first divided into several sub-images, which are classified independently by each classifier. In a first fusion level, partial decisions for the sub-images are combined to generate a new partial decision for the original image. Then, the second fusion level combines all these new partial decisions to produce the final classification of the original image. To generate the pool of diverse classifiers, we used classical texture-based features as well as keypoint-based features. A series of experiments shows that the proposed strategy achieves compelling results. Compared to the best single classifier, a Support Vector Machine (SVM) trained with a keypoint-based feature set, the divide-and-conquer strategy improves the recognition rate by about 4 and 6 percentage points in the first and second fusion levels, respectively. The best recognition rate achieved by the proposed method is 98.47%.
2020 International Joint Conference on Neural Networks (IJCNN), 2020
This work presents a sky and ground segmentation approach for digital images using the supervised Support Vector Machine (SVM) algorithm based on whiteness and blueness indexes, Local Binary Patterns, and extended morphological profile features. The goal is to separate the image contents into two classes, sky and ground. The research is divided into two stages. First, the best features are selected by monolithic classifiers using cross-validation. The second stage is based on the combination of classifiers to segment sky and ground: in the first approach, segmentation is performed without dividing the image databases into categories; in the second approach, segmentation is preceded by a pre-classification of the databases into four categories: City, Highway/Road, Sea/Harbor, and Nature/Mountain. We used two databases of 1,200 images each, containing images with different sky and ground contexts. The first approach is the one generally adopted in the literature. The second approach, little cited in the literature, presents promising and distinct results for both image databases and shows the importance of dividing the images into categories, since changes in the ground context lead to different results and a greater hit rate.
Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. The number of movie consumers interested in taking advantage of automatic movie genre classification is growing rapidly thanks to the popularization of media streaming service providers. In this paper, we address the multi-label classification of movie genres in a multimodal way. For this purpose, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters taken from 152,622 movie titles from The Movie Database (TMDb). The dataset was carefully curated and organized, and it was also made available as a contribution of this work. Each movie in the dataset was labeled according to a set of eighteen genre labels. We extracted features from these data using different kinds of descriptors, namely Mel Frequency Cepstral Coefficients (MFCCs), Statistical Spectrum Descriptor (SSD), Local Binary Patterns (LBP) with spectrograms, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN). The descriptors were evaluated using different classifiers, such as BinaryRelevance and ML-kNN. We have also investigated the performance of combining different classifiers/features using a late fusion strategy, which obtained encouraging results. Based on the F-Score metric, our best result, 0.628, was obtained by the fusion of a classifier created using LSTM on the synopses and a classifier created using CNN on movie trailer frames. When considering the AUC-PR metric, the best result, 0.673, was also achieved by combining those representations, but in addition a classifier based on LSTM created from the subtitles was used. These results corroborate the existence of complementarity among classifiers based on different sources of information in this field of application.
As far as we know, this is the most comprehensive study developed in terms of the diversity of multimedia sources of information to perform movie genre classification.
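The late fusion strategy mentioned above amounts to combining per-genre scores produced by the modality-specific classifiers. A minimal sketch, assuming calibrated per-genre probabilities; the score values and equal weighting are hypothetical, not taken from the paper.

```python
def late_fusion(per_modality_scores, weights=None):
    """Weighted average of per-genre probabilities coming from several
    modality classifiers (e.g. a synopsis model and a trailer-frame model)."""
    w = weights or [1.0] * len(per_modality_scores)
    total = sum(w)
    n_labels = len(per_modality_scores[0])
    return [sum(wi * s[i] for wi, s in zip(w, per_modality_scores)) / total
            for i in range(n_labels)]

def predict_genres(fused, threshold=0.5):
    """Multi-label decision: keep every genre whose fused score passes."""
    return [i for i, p in enumerate(fused) if p >= threshold]

text_scores = [0.9, 0.2, 0.6]   # hypothetical synopsis classifier output
video_scores = [0.7, 0.4, 0.3]  # hypothetical trailer classifier output
fused = late_fusion([text_scores, video_scores])
genres = predict_genres(fused)
```

Because fusion happens at the score level, each modality classifier can be trained and evaluated independently, which is what makes it easy to mix heterogeneous descriptors such as MFCCs, LSTM text features, and CNN frame features.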
2020 25th International Conference on Pattern Recognition (ICPR), 2021
This paper describes a classifier pool generation method guided by the diversity estimated on the data complexity and classifier decisions. First, the behavior of complexity measures is assessed by considering several subsamples of the dataset. The complexity measures with high variability across the subsamples are selected for posterior pool adaptation, where an evolutionary algorithm optimizes diversity in both complexity and decision spaces. A robust experimental protocol with 28 datasets and 20 replications is used to evaluate the proposed method. Results show significant accuracy improvements in 69.4% of the experiments when Dynamic Classifier Selection and Dynamic Ensemble Selection methods are applied.
2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 2020
Face recognition has been one of the most relevant and explored fields of biometrics. In real-world applications, face recognition methods usually must deal with scenarios where not all probe individuals were seen during the training phase (open-set scenarios). Therefore, open-set face recognition is a subject of increasing interest, as it deals with identifying individuals in a space where not all faces are known in advance. This is useful in several applications, such as access authentication, in which only a few individuals previously enrolled in a gallery are allowed. The present work introduces a novel approach towards open-set face recognition focusing on small galleries and on enrollment detection, not identity retrieval. A Siamese network architecture is proposed to learn a model that detects whether a face probe is enrolled in the gallery, based on a verification-like approach. Promising results were achieved for small galleries in experiments carried out on the Pubfig83, FRGCv1, and LFW datasets. State-of-the-art methods such as HFCN and HPLS were outperformed on FRGCv1. Besides, a new evaluation protocol is introduced for experiments on small galleries on LFW.
Convolutional neural networks (CNNs) have previously been broadly utilized to binarize document images. These methods have problems when faced with degraded historical documents. This paper proposes the utilization of CNNs to identify foreground pixels using novel multichannel images generated as input. To create the images, the original source image is decomposed into wavelet subbands. Then, the original image is approximated by each subband separately, and finally, the multichannel image is constituted by arranging the original source image (grayscale image) as the first channel and the image approximated by each subband as the remaining channels. To achieve the best results, two scenarios are considered, namely two-channel and four-channel images, which are then fed into two types of CNN architectures, single and multiple streams. To investigate the effect of the proposed multichannel images as network inputs, the CNNs used in the architectures are three popular networks: U-net, SegNet, and DeepLabv3+. The experimental results of the scenarios demonstrate that our method is more successful than the three CNNs trained on the original source images, and it shows competitive performance in comparison with state-of-the-art results on the DIBCO database.
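The wavelet decomposition step can be illustrated with a one-level Haar transform. This is a minimal sketch: the paper's per-subband approximation of the original image and the channel arrangement are not reproduced here, and the 1/4 normalization is an assumption (conventions vary between wavelet implementations).

```python
def haar_subbands(img):
    """One-level 2D Haar decomposition of a grayscale image (even height
    and width) into LL, LH, HL, HH subbands, each half-size per axis."""
    h, w = len(img), len(img[0])
    bands = {k: [[0.0] * (w // 2) for _ in range(h // 2)]
             for k in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            bands["LL"][i // 2][j // 2] = (a + b + c + d) / 4  # approximation
            bands["LH"][i // 2][j // 2] = (a - b + c - d) / 4  # horizontal detail
            bands["HL"][i // 2][j // 2] = (a + b - c - d) / 4  # vertical detail
            bands["HH"][i // 2][j // 2] = (a - b - c + d) / 4  # diagonal detail
    return bands

bands = haar_subbands([[1, 2], [3, 4]])
```

Each subband isolates a different kind of local structure, which is why stacking subband-derived approximations alongside the grayscale image gives the CNN complementary views of strokes versus background degradation.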
Papers by Alceu Britto