Papers by Jorge Vaca Diez
El Análisis De Preferencias: Un Nuevo Enfoque Para El Estudio De La Rentabilidad Empresarial

Journal of Machine Learning Research, 2009
Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and the number of classes predicted should be as small as possible, this kind of classifier can be considered an Information Retrieval (IR) procedure. In this paper, we propose a family of IR loss functions to measure the performance of nondeterministic learners. After discussing such measures, we derive an algorithm for learning optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss. From a general point of view, nondeterministic classifiers improve the proportion of predictions that include the true class compared to their deterministic counterparts; the price to be paid for this increase is usually a tiny proportion of predictions with more than one class. The paper includes an extensive experimental study using three deterministic learners to estimate posterior probabilities: a multiclass Support Vector Machine (SVM), logistic regression, and naïve Bayes. The data sets considered comprise both UCI multi-class learning tasks and microarray expressions of different kinds of cancer. We successfully compare nondeterministic classifiers with other alternative approaches. Additionally, we show how the quality of the posterior probabilities (measured by the Brier score) determines the goodness of nondeterministic predictions.
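The core decision rule can be sketched from the abstract alone. Assuming an F1-style IR loss, the expected F1 of predicting the top-k classes is 2·P(true class in top-k)/(1 + k), so the minimizer of the expected loss is a prefix of the classes sorted by posterior probability. The function name below is hypothetical and the loss is only one member of the family the paper proposes:

```python
import numpy as np

def nondeterministic_predict(posteriors):
    """Return the set of classes minimizing an expected F1-style loss.

    For a predicted set of size k that contains the true class with
    probability P, the expected F1 is 2*P / (1 + k); we sort classes by
    posterior probability and keep the prefix maximizing that value.
    """
    order = np.argsort(posteriors)[::-1]             # most probable first
    cum = np.cumsum(np.asarray(posteriors)[order])   # P(true class in top-k)
    expected_f1 = 2.0 * cum / (1.0 + np.arange(1, len(cum) + 1))
    best_k = int(np.argmax(expected_f1)) + 1
    return sorted(order[:best_k].tolist())

# A confident posterior yields a single class...
print(nondeterministic_predict([0.9, 0.05, 0.05]))   # [0]
# ...while an ambiguous one yields two candidate classes.
print(nondeterministic_predict([0.45, 0.45, 0.10]))  # [0, 1]
```

Note how the prediction grows only when the posterior mass is genuinely split, matching the "tiny proportion of multi-class predictions" observed in the experiments.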

Knowledge-Based Systems, 2015
Evaluating open-response assignments in Massive Open Online Courses is a difficult task because of the huge number of students involved. Peer grading is an effective method to address this problem. There are two basic approaches in the literature: cardinal and ordinal. The cardinal approach uses the grades assigned by student graders to a set of assignments of other students. In the ordinal approach, the raw materials used by grading systems are the relative orders that graders perceive in the assignments they evaluate. In this paper we present a factorization method that seeks a trade-off between the cardinal and ordinal approaches. The algorithm learns from preference judgments to avoid the subjectivity of numeric grades. But in addition to the preferences expressed by student graders, we include other preferences: those induced from assignments with significantly different average grades. The paper reports the results obtained using this approach on a real-world dataset collected at three Spanish universities: A Coruña, Pablo de Olavide (Seville), and Oviedo (Gijón). Additionally, we study the sensitivity of the method with respect to the number of assignments graded by each student. Our method achieves similar or better scores than staff instructors when we measure the discrepancies with other instructors' grades.
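The idea of learning from preference judgments rather than raw grades can be illustrated without the full factorization machinery. The sketch below (all names hypothetical, and much simpler than the paper's method) learns one latent score per assignment by hinge-loss updates on preference pairs, so only the relative orders matter:

```python
import numpy as np

def fit_scores(n_items, preferences, lr=0.05, margin=1.0, epochs=200, seed=0):
    """Learn a latent score per assignment from preference pairs.

    `preferences` is a list of (better, worse) index pairs.  Each epoch
    applies a hinge-loss update pushing score[better] above score[worse]
    by at least `margin`, sidestepping the subjectivity of numeric grades.
    """
    rng = np.random.default_rng(seed)
    s = rng.normal(scale=0.01, size=n_items)
    for _ in range(epochs):
        for better, worse in preferences:
            if s[better] - s[worse] < margin:   # pair still violated
                s[better] += lr
                s[worse] -= lr
    return s

# Three assignments: 0 preferred to 1, and 1 preferred to 2.
scores = fit_scores(3, [(0, 1), (1, 2)])
print(scores.argsort()[::-1].tolist())  # ranking, best first: [0, 1, 2]
```

The paper's factorization additionally represents graders and assignments with latent factors; this sketch keeps only the preference-learning half of that trade-off.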
Lecture Notes in Computer Science, 2002
In this paper we present an algorithm for learning a function able to assess objects. We assume that our teachers can provide a collection of pairwise comparisons but encounter certain difficulties in assigning a number to the qualities of the objects considered. This is a typical situation when dealing with food products, where it is very interesting to have repeatable, reliable mechanisms that are as objective as possible to evaluate quality in order to provide markets with products of a uniform quality. The same problem arises when we are trying to learn user preferences in an information retrieval system or in configuring a complex device. The algorithm is implemented using a growing variant of Kohonen's Self-Organizing Maps (growing neural gas), and is tested with a variety of data sets to demonstrate the capabilities of our approach.
Peer assessment in MOOCs using preference learning via matrix factorization
Evaluating in Massive Open Online Courses (MOOCs) is a difficult task because of the huge number of students involved in the courses. Peer grading is an effective method to cope with this problem, but something must be done to lessen the effect of subjective evaluation. In this paper we present a matrix factorization approach able to learn from the order of the subset of exams evaluated by each grader. We tested this method on a data set provided by a real peer review process. By using a tailored graphical representation, the induced model can also allow the detection of peculiarities in the peer review process.

Pattern Recognition, 2015
Real-world applications demand effective methods to estimate the class distribution of a sample. In many domains, this is more productive than seeking individual predictions. At first glance, the straightforward conclusion could be that this task, recently identified as quantification, is as simple as counting the predictions of a classifier. However, due to the natural distribution changes occurring in real-world problems, this solution is unsatisfactory. Moreover, current quantification models based on classifiers present the drawback of being trained with loss functions aimed at classification rather than quantification. Other recent attempts to address this issue suffer certain limitations regarding reliability, measured in terms of classification abilities. This paper presents a learning method that optimizes an alternative metric combining quantification and classification performance simultaneously. Our proposal offers a new framework that allows the construction of binary quantifiers able to accurately estimate the proportion of positives, based on models with reliable classification abilities.
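To see why counting predictions is unsatisfactory under distribution change, it helps to contrast classify-and-count with its classical correction (the adjusted count, a standard baseline in the quantification literature, not the method this paper proposes; the function name is hypothetical):

```python
def adjusted_count(predictions, tpr, fpr):
    """Estimate the proportion of positives in a sample.

    Plain classify-and-count (CC) averages the classifier's binary
    predictions; under distribution change it is biased, so the adjusted
    count corrects it with the classifier's true/false positive rates:
    p = (CC - fpr) / (tpr - fpr), clipped to [0, 1].
    """
    cc = sum(predictions) / len(predictions)
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))

# A classifier with tpr=0.8, fpr=0.2 predicts 50% positives; the
# adjusted prevalence estimate is (0.5 - 0.2) / (0.8 - 0.2) = 0.5.
print(adjusted_count([1, 0] * 50, tpr=0.8, fpr=0.2))
```

The correction only helps when tpr and fpr are estimated from a reliable classifier, which is exactly the reliability concern the paper's combined metric addresses.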
IEEE Transactions on Neural Networks and Learning Systems, 2013
In many applications, the mistakes made by an automatic classifier are not equal: they have different costs. These problems may be solved using a cost-sensitive learning approach. The main idea is to minimize not the number of errors, but the total cost produced by such mistakes. This paper presents a new multiclass cost-sensitive algorithm, in which each example has its corresponding misclassification cost attached. Our proposal is theoretically well founded and is designed to optimize cost-sensitive loss functions. This research was motivated by a real-world problem, the biomass estimation of several plankton taxonomic groups. In this particular application, our method improves the performance of traditional multiclass classification approaches that optimize accuracy.
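The Bayes decision rule behind any cost-sensitive classifier is worth making concrete: instead of predicting the most probable class, predict the class with the lowest expected cost. A minimal sketch (function name hypothetical; the paper's algorithm learns the model, this only shows the decision step):

```python
import numpy as np

def cost_sensitive_predict(posteriors, cost):
    """Pick the class with the lowest expected misclassification cost.

    `posteriors[k]` is P(true class = k | x) and `cost[k][j]` is the cost
    of predicting j when the truth is k; the minimizer of the expected
    cost need not be the most probable class.
    """
    expected = np.asarray(posteriors) @ np.asarray(cost)
    return int(np.argmin(expected))

# Class 1 is more probable, but mistaking true class 0 for class 1 is
# very costly, so the cost-sensitive decision is class 0.
posteriors = [0.4, 0.6]
cost = [[0.0, 10.0],   # true class 0
        [1.0,  0.0]]   # true class 1
print(cost_sensitive_predict(posteriors, cost))  # 0
```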
Lecture Notes in Computer Science, 2009
In the search for functional relationships between genotypes and phenotypes, there are two possible findings: a phenotype may be heritable, when it depends on a reduced set of genetic markers, or it may be predictable from a wide genomic description. The distinction between these two kinds of functional relationships is very important, since the computational tools used to find them are quite different. In this paper we present a general framework to deal with phenotypes and genotypes, and we study the case of the height of barley plants: a predictable phenotype whose heritability is quite reduced.

Progress in Artificial Intelligence, 2012
The goal of multilabel (ML) classification is to induce models able to tag objects with the labels that best describe them. The main baseline for ML classification is binary relevance (BR), which is commonly criticized in the literature because of its label independence assumption. Despite this, this paper discusses some interesting properties of BR, mainly that it produces optimal models for several ML loss functions. Additionally, we present an analytical study of ML benchmark datasets and point out some of their shortcomings. As a result, this paper proposes the use of synthetic datasets to better analyze the behavior of ML methods in domains with different characteristics. To support this claim, we perform experiments using synthetic data proving the competitive performance of BR with respect to a more complex method in difficult problems with many labels, a conclusion not stated by previous studies.
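Binary relevance itself is a few lines of code, which is part of its appeal as a baseline. A minimal sketch (class and helper names hypothetical; any binary learner can be plugged in as `base_fit`):

```python
import numpy as np

class BinaryRelevance:
    """Train one independent binary classifier per label.

    `base_fit(X, y)` must return a predict function.  The label
    independence assumption is exactly what the paper examines.
    """
    def __init__(self, base_fit):
        self.base_fit = base_fit

    def fit(self, X, Y):            # Y: (n_samples, n_labels) in {0, 1}
        self.models = [self.base_fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        return np.column_stack([m(X) for m in self.models])

# A toy base learner: nearest class centroid.
def centroid_fit(X, y):
    pos, neg = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    return lambda Z: (np.linalg.norm(Z - pos, axis=1)
                      < np.linalg.norm(Z - neg, axis=1)).astype(int)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
br = BinaryRelevance(centroid_fit).fit(X, Y)
print(br.predict(np.array([[0.5], [2.5]])).tolist())  # [[1, 0], [0, 1]]
```

Because the per-label models are trained independently, BR parallelizes trivially and scales linearly in the number of labels, which underpins the paper's argument for it as a strong baseline.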
Lecture Notes in Computer Science, 2004
The quality of food can be assessed from different points of view. In this paper, we deal with those aspects that can be appreciated through sensory impressions. When we aim to induce a function that maps object descriptions into ratings, we must consider that consumers' ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we propose to learn from consumers' preference judgments instead of using an approach based on regression. This requires the use of special-purpose kernels and feature subset selection methods. We illustrate the benefits of our approach on two families of real-world databases.
Lecture Notes in Computer Science, 2008
We present nondeterministic hypotheses learned from an ordinal regression task. They try to predict the true rank for an entry, but when the classification is uncertain the hypotheses predict a set of consecutive ranks (an interval). The aim is to keep the set of ranks as small as possible while still containing the true rank. The justification for learning such hypotheses is based on a real-world problem arising in breeding beef cattle. After defining a family of loss functions inspired by Information Retrieval, we derive an algorithm for minimizing them. The algorithm is based on the posterior probabilities of ranks given an entry. Two implementations are compared: one based on a multiclass SVM and the other on Gaussian processes designed to minimize the linear loss in ordinal regression tasks.
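The interval-valued variant of the idea can be sketched by brute force over the (quadratically many) intervals. As before, an F1-style loss is assumed for illustration and the function name is hypothetical:

```python
import numpy as np

def interval_predict(posteriors):
    """Predict a set of consecutive ranks minimizing an expected F1 loss.

    Enumerates every interval [a, b] of ranks and returns the one with
    the highest expected F1 = 2 * P(true rank in [a, b]) / (1 + width),
    i.e. the set-valued prediction idea restricted to intervals.
    """
    p = np.asarray(posteriors)
    n = len(p)
    best, best_val = (0, 0), -1.0
    for a in range(n):
        for b in range(a, n):
            val = 2.0 * p[a:b + 1].sum() / (1.0 + (b - a + 1))
            if val > best_val:
                best, best_val = (a, b), val
    return best

# Rank 1 is likely but rank 2 is close: predict the interval [1, 2].
print(interval_predict([0.1, 0.45, 0.40, 0.05]))  # (1, 2)
```

Restricting predictions to consecutive ranks is what makes the output interpretable in an ordinal task such as grading beef cattle.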

Twenty-first international conference on Machine learning - ICML '04, 2004
In this paper we tackle a real-world problem: the search for a function to evaluate the merits of beef cattle as meat producers. The independent variables represent a set of measurements of live animals, while the outputs cannot be captured with a single number, since the available experts tend to assess each animal in a relative way, comparing it with the other animals in the same batch. Therefore, this problem cannot be solved by means of regression methods; our approach is to learn the preferences of the experts when they order small groups of animals. Thus, the problem can be reduced to binary classification and can be dealt with using a Support Vector Machine (SVM) improved with a feature subset selection (FSS) method. We develop a method based on Recursive Feature Elimination (RFE) that employs an adaptation of a metric-based method devised for model selection (ADJ). Finally, we discuss the extension of the resulting method to more general settings, and provide a comparison with other possible alternatives.
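The reduction from batch orderings to binary classification is the key step and is easy to sketch: every preferred/non-preferred pair contributes a difference vector with label +1 and its negation with label -1, and a linear classifier on these differences yields a ranking function w·x. The names below are hypothetical, and a perceptron stands in for the paper's SVM:

```python
import numpy as np

def preference_pairs(X, batches):
    """Turn expert orderings within batches into a binary dataset.

    `batches` is a list of index lists, each sorted from best to worst;
    every ordered pair (i, j) with i preferred to j contributes the
    difference vectors x_i - x_j (label +1) and x_j - x_i (label -1).
    """
    diffs, labels = [], []
    for batch in batches:
        for a, i in enumerate(batch):
            for j in batch[a + 1:]:
                diffs.append(X[i] - X[j]); labels.append(1)
                diffs.append(X[j] - X[i]); labels.append(-1)
    return np.array(diffs), np.array(labels)

def fit_ranker(diffs, labels, epochs=50):
    """Perceptron on difference vectors; returns a ranking direction w."""
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        for d, y in zip(diffs, labels):
            if y * (w @ d) <= 0:
                w += y * d
    return w

X = np.array([[3.0, 1.0], [2.0, 1.0], [1.0, 1.0]])
diffs, labels = preference_pairs(X, [[0, 1, 2]])   # animal 0 best, 2 worst
w = fit_ranker(diffs, labels)
print((X @ w).argsort()[::-1].tolist())  # recovered order: [0, 1, 2]
```

Because only within-batch pairs are generated, the learner never compares animals across sessions, which is precisely why this formulation fits relative expert assessments better than regression.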

Progress in Artificial Intelligence, 2014
Multilabel classification (ML) aims to assign a set of labels to an instance. This generalization of multiclass classification requires the redefinition of loss functions, and the learning tasks become harder. The objective of this paper is to gain insights into the relations between optimization aims and some of the most popular performance measures: subset (or 0/1) loss, Hamming loss, and the example-based F-measure. To make a fair comparison, we implemented three ML learners that explicitly optimize each of these measures in a common framework. This can be done by considering a subset of labels as a structured output and using Structured output Support Vector Machines (SSVM) tailored to optimize a given loss function. The paper includes an exhaustive experimental comparison. The conclusion is that, in most cases, optimizing the Hamming loss produces the best or competitive scores. This is a practical result, since the Hamming loss can be minimized using a set of binary classifiers, one for each label separately, and is therefore a scalable and fast way to learn ML tasks. Additionally, we observe that in noise-free learning tasks optimizing the subset loss is the best option, although the differences are very small. We also notice that the biggest room for improvement appears when the goal is to optimize an F-measure in noisy learning tasks.
Machine Learning and Knowledge Discovery in Databases, 2009
From a multi-class learning task, in addition to a classifier, it is possible to infer some useful knowledge about the relationships between the classes involved. In this paper we propose a method to learn a hierarchical clustering of the set of classes. The usefulness of such clusterings has been exploited in biomedical applications to find relations between diseases or populations of animals. The method proposed here defines a distance between classes based on the margin maximization principle, and then builds the hierarchy using a linkage procedure. Moreover, we define a measure to quantify the goodness of the hierarchies. Finally, we present a set of experiments comparing the scores achieved by our approach with those of other methods.
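Once a distance between classes is available, the linkage step is standard agglomerative clustering. A self-contained single-linkage sketch (function name hypothetical; the paper's margin-based distance is replaced here by an arbitrary distance matrix):

```python
import numpy as np

def single_linkage(dist):
    """Agglomeratively merge classes, closest clusters first.

    `dist` is a symmetric matrix of class distances (in the paper these
    come from a margin maximization principle; any distance works here).
    Returns the merge history as pairs of sorted class-index lists.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = min(((a, b) for i, a in enumerate(clusters)
                    for b in clusters[i + 1:]),
                   key=lambda ab: min(dist[i][j] for i in ab[0] for j in ab[1]))
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
        merges.append((sorted(best[0]), sorted(best[1])))
    return merges

# Classes 0 and 1 are close; class 2 is far from both.
d = np.array([[0.0, 1.0, 9.0],
              [1.0, 0.0, 8.0],
              [9.0, 8.0, 0.0]])
print(single_linkage(d))  # [([0], [1]), ([2], [0, 1])]
```

The merge history encodes the class hierarchy: the earlier two classes merge, the more related the classifier finds them.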
Aplicación de un proceso de selección de reglas a un sistema de aprendizaje
INTELIGENCIA ARTIFICIAL, 2002
One of the most important problems in telemedicine is the automatic referral of patients to the appropriate specialist according to their symptoms. This assignment is normally carried out by a non-specialist medical professional who, starting from a prediagnosis usually expressed in natural language, determines the most suitable specialty. The goal of this work is to develop a symptom-based classifier, induced by machine learning, that categorizes this prediagnosis into a set of specialties.
A support vector method for ranking minimizing the number of swapped pairs
Learning tasks where the set Y of classes has an ordering relation arise in a number of important application fields. In this context, the loss function may be defined in different ways, ranging from multiclass classification to ordinal or metric regression. However, to consider only the ordered structure of Y, a measure of the goodness of a hypothesis h has to be related to the number of pairs whose relative ordering is swapped by h. In this paper, we present a method based on a multivariate version of Support Vector Machines (SVM) that learns to order by minimizing the number of swapped pairs. Finally, using benchmark datasets, we compare the scores so achieved with those found by other alternative approaches.
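The swapped-pairs loss itself is simple to state and compute (it is the discordant-pair count underlying Kendall's tau). A minimal sketch with a hypothetical function name:

```python
def swapped_pairs(truth, predicted):
    """Count pairs whose relative order the hypothesis swaps.

    `truth` and `predicted` assign a score or rank to each object; a
    pair (i, j) counts when the two orderings disagree strictly.  This
    is the quantity the multivariate SVM is trained to minimize.
    """
    n = len(truth)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (truth[i] - truth[j]) * (predicted[i] - predicted[j]) < 0)

# Three objects truly ordered 1 < 2 < 3; the prediction swaps the last two.
print(swapped_pairs([1, 2, 3], [1, 3, 2]))  # 1
```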
Trends in Food Science & Technology, 2007
In this paper we discuss how to model preferences from a collection of ratings provided by a panel of consumers of some kind of food product. We emphasize the role of tasting sessions, since the ratings tend to be relative to each session and hence regression methods are unable to capture consumer preferences. The method proposed is based on the use of Support Vector Machines (SVM) and provides both linear and nonlinear models. To illustrate the performance of the approach, we report the experimental results obtained with a couple of real-world datasets.

Pattern Recognition, 2010
In hierarchical classification, classes are arranged in a hierarchy represented by a tree or a forest, and each example is labeled with a set of classes located on paths from roots to leaves or internal nodes; in other words, both multiple and partial paths are allowed. A straightforward approach to learning a hierarchical classifier, usually used as a baseline method, consists in learning one binary classifier for each node of the hierarchy; the hierarchical classifier is then obtained using a top-down evaluation procedure. The main drawback of this naïve approach is that the binary classifiers are constructed independently, when it is clear that there are dependencies between them, motivated by the hierarchy and the evaluation procedure employed. In this paper, we present a new decomposition method in which each node classifier is built taking into account the other classifiers, its descendants, and the loss function used to measure the goodness of hierarchical classifiers. Following a bottom-up learning strategy, the idea is to optimize the loss function at every subtree assuming that all classifiers are known except the one at the root. Experimental results show that the proposed approach achieves accuracies comparable to state-of-the-art hierarchical algorithms and is better than the naïve baseline method described above. Moreover, the benefits of our proposal include the possibility of parallel implementations, as well as the use of all available well-known techniques for tuning binary SVM classifiers.
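The top-down evaluation procedure mentioned above can be sketched directly: a node's classifier is only queried when its parent fired, so an error high in the tree suppresses the whole subtree below it. All names are hypothetical; the per-node classifiers are stand-in predicates rather than trained SVMs:

```python
def top_down_predict(tree, node_classifiers, x, root=0):
    """Evaluate a hierarchical classifier top-down.

    `tree` maps a node to its children and `node_classifiers` maps a
    node to a binary predicate "does x belong to this node's class?".
    Starting at the root, children are visited only when their parent
    fired, which is why independently trained node classifiers ignore
    dependencies that this evaluation procedure creates.
    """
    predicted, frontier = [], [root]
    while frontier:
        node = frontier.pop()
        if node_classifiers[node](x):
            predicted.append(node)
            frontier.extend(tree.get(node, []))
    return sorted(predicted)

# Toy hierarchy 0 -> {1, 2}, 2 -> {3}; x = 10 belongs to classes 0, 2, 3.
tree = {0: [1, 2], 2: [3]}
clf = {0: lambda x: True, 1: lambda x: False,
       2: lambda x: x > 5, 3: lambda x: x > 5}
print(top_down_predict(tree, clf, 10))  # [0, 2, 3]
```

Note that a negative answer at node 2 would hide node 3 entirely, illustrating the dependency between a node's classifier and its descendants that the paper's bottom-up decomposition exploits.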