In this work, we present an ensemble of descriptors for the classification of virus images acquired using transmission electron microscopy. We trained multiple support vector machines on different sets of features extracted from the data. We used both handcrafted algorithms and a pretrained deep neural network as feature extractors. The proposed fusion strongly boosts the performance obtained by each stand-alone approach, achieving state-of-the-art performance.
In this paper, we present a study about an automated system for monitoring underwater ecosystems. The proposed system is based on the fusion of different deep learning methods. We study how to create an ensemble of different Convolutional Neural Network (CNN) models, fine-tuned on several datasets with the aim of exploiting their diversity. The aim of our study is to investigate the feasibility of fine-tuning CNNs for underwater imagery analysis, the opportunity of using different datasets for pre-training models, and the possibility of designing an ensemble using the same architecture with small variations in the training procedure. Our experiments, performed on five well-known datasets (three plankton and two coral datasets), show that combining such different CNN models in a heterogeneous ensemble grants a substantial performance improvement with respect to other state-of-the-art approaches on all the tested problems. One of the main contributions of this work is a wide experimental evaluation of well-known CNN architectures, reporting the performance of both the single CNNs and the ensembles of CNNs on different problems. Moreover, we show how to create an ensemble that improves the performance of the best single model. A link to the MATLAB source code is freely provided on the title page.
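As a rough illustration of how independently fine-tuned CNNs can be fused at prediction time, the Python/PyTorch sketch below averages the softmax outputs of the individual models (a sum-rule style fusion). This is a minimal sketch and not the authors' MATLAB implementation; the fusion rule and model handling shown here are assumptions for illustration.

```python
import torch

@torch.no_grad()
def ensemble_predict(models, batch):
    """Fuse an ensemble of CNNs by averaging their softmax outputs.

    `models` is a list of fine-tuned networks (e.g. the same architecture
    trained with small variations in the training procedure); `batch` is a
    tensor of input images shaped (N, C, H, W).
    """
    for m in models:
        m.eval()
    probs = [torch.softmax(m(batch), dim=1) for m in models]  # per-model class probabilities
    fused = torch.stack(probs).mean(dim=0)                    # average (sum-rule) fusion
    return fused.argmax(dim=1)                                # predicted class per image
```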
One of the first methods for analyzing the texture of an image was proposed in 1979 by Haralick, who introduced the co-occurrence matrix for calculating a set of image statistics. In this paper we focus on novel texture descriptors extracted from the co-occurrence matrix. It is well known that scale is important information in texture analysis, since the same texture can be perceived as different patterns at distinct scales. In this work we present, compare, and combine different strategies for extending the texture descriptors extracted from the co-occurrence matrix to multiple scales. The texture descriptors are used to train a support vector machine, and several fusion techniques are compared. Our results are validated on seven image classification problems (mainly medical image classification problems). Our results show that we improve the performance of the standard approaches. The code for the approaches tested in this paper is available at: .
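A minimal sketch of the underlying idea, multi-scale statistics from the gray-level co-occurrence matrix, is given below in Python using scikit-image. The distances, angles, and properties chosen here are illustrative assumptions, not the exact descriptor set studied in the paper.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # spelled greycomatrix/greycoprops in older scikit-image

def multiscale_glcm_features(gray, distances=(1, 2, 4, 8),
                             angles=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Haralick-style statistics from co-occurrence matrices at several scales.

    `gray` is a 2-D uint8 image; each pixel distance plays the role of a scale.
    """
    glcm = graycomatrix(gray, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "dissimilarity", "homogeneity", "energy", "correlation")
    # graycoprops returns one value per (distance, angle) pair for each property
    feats = [graycoprops(glcm, p).ravel() for p in props]
    return np.concatenate(feats)  # feature vector to feed, e.g., an SVM
```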
Research in sound classification and recognition is rapidly advancing in the field of pattern rec... more Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. In this paper, ensembles of classifiers that exploit several data augmentation techniques and four signal representations for training Convolutional Neural Networks (CNNs) for audio classification are presented and tested on three freely available audio benchmark datasets: i) bird calls, ii) cat sounds, and iii) the Environmental Sound Classification (ESC-50). The best performing ensembles combining data augmentation techniques with different signal representations are compared and shown to either outperform or perform comparatively to the best methods reported in the literature on these datasets, including the challenging ESC-50 dataset. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. Results demonstrate not only that CNNs can be trained for audio classification but also that their fusion using different techniques works better than the stand-alone classifiers.
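One common way to turn audio into an image-like input that a CNN can consume is a log-scaled mel spectrogram; the Python sketch below (using librosa) shows this idea. The representation and parameter values are illustrative defaults and are not necessarily among the four signal representations or settings used in the paper.

```python
import numpy as np
import librosa

def audio_to_logmel(path, sr=22050, n_mels=128, n_fft=2048, hop_length=512):
    """Load an audio file and return a normalized log-mel spectrogram.

    The result behaves like a grayscale image and can be fed to an
    image-oriented CNN (possibly after resizing or channel replication).
    """
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)
    # min-max normalize to [0, 1]
    return (logmel - logmel.min()) / (logmel.max() - logmel.min() + 1e-9)
```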
Convolutional Neural Networks (CNNs) are used in many domains, but the requirement of large datasets for robust training and for avoiding overfitting makes them hard to apply in medical and similar fields. However, when large quantities of samples cannot be easily collected, various methods can still be applied to stem the problem, depending on the sample type. Data augmentation, more than other methods, has recently been in the spotlight, mostly because of the simplicity and effectiveness of some of the more widely adopted methods. The research question addressed in this work is whether data augmentation techniques can help in developing robust and efficient machine learning systems to be used in different domains for classification purposes. To answer it, we introduce new image augmentation techniques that make use of different methods such as the Fourier Transform (FT), Discrete Cosine Transform (DCT), Radon Transform (RT), Hilbert Transform (HT), Singular Value Decomposition (SVD), Local Laplacian Filters (LLF), and the Hampel filter (HF). We define different ensemble methods by combining various classical data augmentation methods with the newer ones presented here. We performed an extensive empirical evaluation on 15 different datasets to validate our proposal. The obtained results show that the newly proposed data augmentation methods can be very effective even when used alone. The ensembles trained with different augmentation methods can outperform some of the best approaches reported in the literature as well as compete with state-of-the-art custom methods. All resources are available at .
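To give a concrete feel for transform-domain augmentation, the sketch below perturbs an image in the DCT domain and transforms it back (Python/SciPy). It only illustrates the general idea under assumed parameters; it is not the exact FT/DCT/RT/HT/SVD/LLF/HF procedures proposed in the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_perturb(image, sigma=0.05, rng=None):
    """Create an augmented copy of a grayscale image by multiplying its 2-D DCT
    coefficients with small random factors and inverting the transform.

    `sigma` controls the strength of the perturbation (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    coeffs = dctn(image.astype(np.float64), norm="ortho")
    noise = 1.0 + sigma * rng.standard_normal(coeffs.shape)
    augmented = idctn(coeffs * noise, norm="ortho")
    return np.clip(augmented, 0, 255).astype(image.dtype)
```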
Semantic segmentation consists of classifying each pixel of an image by assigning it a specific label chosen from a set of available ones. During the last few years, a lot of attention has shifted to this kind of task. Many computer vision researchers have tried to apply autoencoder structures to develop models that can learn the semantics of the image as well as a low-level representation of it. In an autoencoder architecture, given an input, an encoder computes a low-dimensional representation of the input that is then used by a decoder to reconstruct the original data. In this work, we propose an ensemble of convolutional neural networks (CNNs). In ensemble methods, many different models are trained and then used for classification; the ensemble aggregates the outputs of the single classifiers. The approach leverages the differences among the various classifiers to improve the performance of the whole system. Diversity among the single classifiers is enforced by using different loss functions. In particular, we present a new loss function that results from the combination of the Dice loss and the Structural Similarity Index. The proposed ensemble is implemented by combining different backbone networks within the DeepLabV3+ and HarDNet frameworks. The proposal is evaluated through an extensive empirical evaluation on two real-world scenarios: polyp and skin segmentation. All the code is available online at .
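A combination of Dice and SSIM can be expressed as a weighted sum of the two terms. The PyTorch sketch below uses a soft Dice loss and a uniform-window SSIM; the window size, stability constants, and weighting are assumptions, so the exact formulation may differ from the loss proposed in the paper.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary masks; pred and target are (N, 1, H, W) in [0, 1]."""
    inter = (pred * target).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def ssim_loss(pred, target, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM computed with a uniform (average-pooling) window."""
    pad = window // 2
    mu_p = F.avg_pool2d(pred, window, stride=1, padding=pad)
    mu_t = F.avg_pool2d(target, window, stride=1, padding=pad)
    var_p = F.avg_pool2d(pred * pred, window, stride=1, padding=pad) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, window, stride=1, padding=pad) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, window, stride=1, padding=pad) - mu_p * mu_t
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim.mean()

def dice_ssim_loss(pred, target, alpha=0.5):
    """Weighted combination of the two criteria (alpha is an assumed weight)."""
    return alpha * dice_loss(pred, target) + (1 - alpha) * ssim_loss(pred, target)
```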
Skin detection, the process of distinguishing between skin and non-skin regions in a digital image, is widely used in a variety of applications ranging from hand gesture analysis to body part tracking to facial recognition. Skin detection is a challenging problem that has received a lot of attention from experts and proposals from the research community in the context of intelligent systems, but the lack of common benchmarks and unified testing protocols has made fair comparisons among approaches very difficult. Recently, the success of deep neural networks has had a major impact on the field of image segmentation, resulting in various successful models to date. In this work, we survey the most recent research in this field and propose fair comparisons between approaches using several different datasets. The main contributions of this work are: (i) a comprehensive literature review of approaches to skin color detection and a comparison of approaches that may h...
Purpose: Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create an optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks. Design/methodology/approach: Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were used to train separate support vector machines (SVMs), which were then evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most...
Skin detectors play a crucial role in many applications: face localization, person tracking, objectionable content screening, etc. Skin detection is a complicated process that involves not only the development of apposite classifiers but also many ancillary methods, including techniques for data preprocessing and postprocessing. In this paper, a new postprocessing method is described that learns to select whether an image needs the application of various morphological sequences or a homogeneity function. The type of postprocessing method to apply is learned by categorizing the image into one of eleven predetermined classes. The novel postprocessing method presented here is evaluated on ten datasets recommended for fair comparisons that represent many skin detection applications. The results show that the new approach enhances the performance of the base classifiers and improves on previous works based only on learning the most appropriate morphological sequences.
The main goal of this chapter is to develop a system for automatic protein classification. Proteins are classified using CNNs pretrained on ImageNet, which are fine-tuned using a set of multiview 2D images of 3D protein structures generated by Jmol, a 3D molecular graphics program. Jmol generates different types of protein visualizations that emphasize specific properties of a protein's structure, such as a visualization that displays the backbone structure of the protein as a trace of the Cα atom. Different multiview protein visualizations are generated by uniformly rotating the protein structure around its central X, Y, and Z viewing axes to produce 125 images for each protein. This set of images is then used to fine-tune the pretrained CNNs. The proposed system is tested on two datasets with excellent results. The MATLAB code used in this chapter is available at .
Bioimage classification plays a crucial role in many biological problems. Here we present a new General Purpose (GenP) ensemble that boosts performance by combining local features, dense sampling features, and deep learning approaches. We propose an ensemble of deep learning methods built using different criteria (different batch sizes, learning rates, topologies, and data augmentation methods). One of the contributions of this paper is the proposal of new data augmentation methods based on feature transforms (principal component analysis/discrete cosine transform) that boost the performance of Convolutional Neural Networks (CNNs). Each handcrafted descriptor is used to train a different Support Vector Machine (SVM), and the different SVMs are combined with the ensemble of CNNs. Our method is evaluated on a diverse set of bioimage classification problems. Results demonstrate that the proposed GenP bioimage ensemble obtains state-of-the-art performance without any ad-hoc dataset tuning.
Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space obtained by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids is calculated from the patterns in the training data sets with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system performs competitively against the best-per...
In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train an SVM for automated animal audio classification. The animal audio datasets used are (i) birds and (ii) cat sounds, which are freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on the dissimilarity space performs well on both classification problems without ad-hoc optimization of the clustering methods. Moreover, results show that the fusion of CNN-based approaches applied to the animal audio classification problem works better than ...
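The dissimilarity-space idea used in the two works above can be sketched as follows: cluster the training patterns into centroids, describe every pattern by its distances to those centroids, and train an SVM on the resulting vectors. In this Python sketch the learned Siamese distance is represented by a generic `distance_fn` placeholder (Euclidean by default); the clustering method, value of k, and SVM settings are assumptions, not the configurations used in the papers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def dissimilarity_space_classifier(train_feats, train_labels, test_feats,
                                   k=30, distance_fn=None):
    """Build a dissimilarity-space representation and classify it with an SVM.

    `distance_fn(x, c)` stands in for a learned (e.g. Siamese) distance;
    plain Euclidean distance is used here as a placeholder.
    """
    if distance_fn is None:
        distance_fn = lambda x, c: float(np.linalg.norm(x - c))

    centroids = KMeans(n_clusters=k, n_init=10,
                       random_state=0).fit(train_feats).cluster_centers_

    def to_dissimilarity(X):
        # each pattern becomes a vector of its distances to the k centroids
        return np.array([[distance_fn(x, c) for c in centroids] for x in X])

    clf = SVC(kernel="rbf").fit(to_dissimilarity(train_feats), train_labels)
    return clf.predict(to_dissimilarity(test_feats))
```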
Features play a crucial role in computer vision. Initially designed to detect salient elements by means of handcrafted algorithms, features are now often learned using different layers in convolutional neural networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to support vector machines (SVMs) that are then combined by sum rule. Several dimensionality reduction techniques were tested to reduce the high dimensionality of the inner layers so that they can work with SVMs. The empirically derived generic vision system, based on applying a discrete cosine transform (DCT) separately to each channel, is shown to significantly boost the performance of standard CNNs across a large and ...
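A minimal Python/PyTorch sketch of the pipeline described above is given below: extract an inner-layer activation from a pretrained CNN, compress each channel with a 2-D DCT, and use the result as an SVM input. The backbone, layer name, and number of retained coefficients are illustrative assumptions rather than the configurations evaluated in the paper.

```python
import torch
from torchvision import models
from scipy.fft import dctn

def deep_dct_features(images, layer_name="layer3", n_coeffs=8):
    """Extract an inner-layer activation and reduce it with a per-channel 2-D DCT.

    `images` is a float tensor shaped (N, 3, 224, 224), preprocessed for ImageNet.
    Only the top-left n_coeffs x n_coeffs (low-frequency) block of each channel is kept.
    """
    net = models.resnet18(weights="IMAGENET1K_V1").eval()
    store = {}
    hook = getattr(net, layer_name).register_forward_hook(
        lambda module, inputs, output: store.update(act=output.detach()))
    with torch.no_grad():
        net(images)
    hook.remove()
    act = store["act"].numpy()                                   # (N, C, H, W)
    dct = dctn(act, axes=(2, 3), norm="ortho")[:, :, :n_coeffs, :n_coeffs]
    return dct.reshape(act.shape[0], -1)                         # flattened descriptor

# The resulting descriptors would then be used to train an SVM,
# e.g. sklearn.svm.SVC(kernel="rbf").fit(feats, labels).
```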
EURASIP Journal on Audio, Speech, and Music Processing, 2020
In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of producing better classification accuracy than other state-of-the-art approaches without ad hoc parameter optimization. We present an ensemble of classifiers that performs competitively on different types of animal audio datasets using the same set of classifiers and parameter settings. To produce this general-purpose ensemble, we ran a large number of experiments that fine-tuned pretrained convolutional neural networks (CNNs) for different audio classification tasks (bird, bat, and whale audio datasets). Six different CNNs were tested, compared, and combined. Moreover, a further CNN, trained from scratch, was tested and combined with the fine-tuned CNNs. To the best of our knowledge, this is the largest study on CNNs in animal audio classification. Our results show that several CN...
Skin detection is the process of discriminating skin and non-skin regions in a digital image, and it is widely used in several applications ranging from hand gesture analysis to tracking body parts and face detection. Skin detection is a challenging problem that has drawn extensive attention from the research community; nevertheless, a fair comparison among approaches is very difficult due to the lack of a common benchmark and a unified testing protocol. In this work, we investigate the most recent research in this field and we propose a fair comparison among approaches using several different datasets. The major contributions of this work are a framework to evaluate and combine different skin detector approaches, whose source code will be made freely available for future research, and an extensive experimental comparison among several recent methods, which have also been used to define an ensemble that works well in many different problems. Experiments are carried out on 10 different datasets including more than 10000 labelled images; experimental results confirm that the proposed ensemble obtains very good performance with respect to other stand-alone approaches, without requiring ad hoc parameter tuning. A MATLAB version of the framework for testing and of the ensemble proposed in this paper will be freely available from ( + Pattern Recognition and Ensemble Classifiers).
Journal of Artificial Intelligence and Systems, 2020
The last decade has witnessed an unprecedented accumulation of proteins in large online databases, which has led to the need for automatic prediction of protein function, essential for massive and timely annotation of the proteins in these datasets. Protein databases, combined with functional annotations and machine learning (ML) techniques, offer many potential benefits, including significantly facilitating rapid pharmacological target identification. The main objective of this study is to identify, for the problem of enzyme classification, the most powerful combinations of descriptors taken from different protein representations. To achieve this objective, four approaches for representing the Position-Specific Scoring Matrix (PSSM) combined with three methods for representing the Amino Acid Sequence (AAS) are evaluated with the aim of experimentally producing a powerful ensemble of descriptors for enzyme function prediction. Each protein descriptor is classified by a Support Vector Machine (SVM), with the set of SVMs finally combined by sum rule. Cross-validation experiments using these descriptors on single-functional enzymes (n=44,661) extracted from the PDB database demonstrate that the ensemble proposed here achieves superior classification rates compared to state-of-the-art ML techniques reported in the literature on the same dataset. Although the proposed ensemble strongly outperforms these other techniques, it is computationally much heavier, mainly because the PSSM extraction process is time consuming. However, there is a growing repository of proteins for which the PSSM has already been extracted, making the proposed method more practical and attractive. The MATLAB code and the dataset used in the experiments reported here are available at .
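The sum-rule fusion of per-descriptor SVMs can be sketched as follows in Python with scikit-learn: train one probabilistic SVM per descriptor set and sum the class-probability estimates. This is a minimal sketch, not the authors' MATLAB release; the RBF kernel and the use of predict_proba are assumptions, and descriptor extraction (the PSSM/AAS representations) is assumed to have been done already.

```python
import numpy as np
from sklearn.svm import SVC

def sum_rule_fusion(descriptors_train, y_train, descriptors_test):
    """Train one SVM per descriptor set and fuse them by summing class probabilities.

    `descriptors_train` and `descriptors_test` are lists of feature matrices,
    one entry per protein representation, with matching sample order.
    """
    prob_sum = None
    for X_tr, X_te in zip(descriptors_train, descriptors_test):
        clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_train)
        proba = clf.predict_proba(X_te)          # columns follow clf.classes_
        prob_sum = proba if prob_sum is None else prob_sum + proba
    return prob_sum.argmax(axis=1)               # index into the shared class set
```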