Papers by Hamid R. Rabiee

This paper presents a structured dictionary-based model for hyperspectral data that incorporates both spectral and contextual characteristics of spectral samples. The idea is to partition the pixels of a hyperspectral image into a number of spatial neighborhoods called contextual groups and to model the pixels inside a group as members of a common subspace. That is, each pixel is represented using a linear combination of a few dictionary elements learned from the data, but since pixels inside a contextual group are often made up of the same materials, their linear combinations are constrained to use common elements from the dictionary. To this end, dictionary learning is carried out with a joint sparse regularizer to induce a common sparsity pattern in the sparse coefficients of a contextual group. The sparse coefficients are then used for classification using a linear SVM. Experimental results on a number of real hyperspectral images confirm the effectiveness of the proposed representation for hyperspectral image classification. Moreover, experiments with simulated multispectral data show that the proposed model is capable of finding representations that may effectively be used for classification of multispectral-resolution samples.
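The shared sparsity pattern within a contextual group can be illustrated with a simultaneous-OMP-style coder. This is only a sketch of the joint-sparse coding step: the paper also learns the dictionary itself, which is omitted here, and all names below are illustrative.

```python
import numpy as np

def somp(D, Y, n_atoms):
    """Simultaneous OMP: code all columns of Y (the pixels of one
    contextual group) with a shared support of at most n_atoms
    dictionary columns, so the group shares one sparsity pattern."""
    residual = Y.copy()
    support = []
    for _ in range(n_atoms):
        # score each atom by its total correlation with all residuals
        scores = np.sum(np.abs(D.T @ residual), axis=1)
        scores[support] = -np.inf          # never pick an atom twice
        support.append(int(np.argmax(scores)))
        # re-fit every pixel on the shared support (least squares)
        coef, *_ = np.linalg.lstsq(D[:, support], Y, rcond=None)
        residual = Y - D[:, support] @ coef
    X = np.zeros((D.shape[1], Y.shape[1]))
    X[support, :] = coef
    return X  # joint-sparse codes: rows outside `support` are zero
```

The rows of `X` could then be fed to a linear SVM, as the abstract describes.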
In this paper, a novel discriminative dictionary learning approach is proposed that attempts to preserve the local structure of the data while encouraging discriminability. The reconstruction error and sparsity-inducing l1-penalty of dictionary learning are minimized alongside a locality-preserving and discriminative term. In this setting, each data point is represented by a sparse linear combination of dictionary atoms with the goal that its k-nearest same-label neighbors are preserved. Since the class of a new data point is unknown, its sparse representation is found once for each class. The class that produces the lowest error is associated with that point. Experimental results on five common classification datasets show that this method outperforms state-of-the-art classifiers, especially when the training data is limited.
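The lowest-reconstruction-error classification rule can be sketched as below. Note this sketch substitutes plain least squares for the paper's l1-regularized coding purely for brevity, and the per-class dictionaries are assumed given.

```python
import numpy as np

def classify_by_reconstruction(dictionaries, x):
    """Assign x to the class whose dictionary reconstructs it with
    the lowest error. `dictionaries` maps a class label to its
    dictionary matrix (columns are atoms); least squares stands in
    for the sparse coding step of the actual method."""
    errors = {}
    for label, D in dictionaries.items():
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)
        errors[label] = np.linalg.norm(x - D @ coef)
    return min(errors, key=errors.get)
```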

A unified statistical framework for crowd labeling
Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple-choice (or labeling) questions are a common type of problem solved by this approach. As an application, crowd labeling is used to find true labels for large machine learning datasets. Since crowd workers are not necessarily experts, the labels they provide are often noisy and erroneous. This challenge is usually resolved by collecting multiple labels for each sample and then aggregating them to estimate the true label. Although this mechanism leads to high-quality labels, it is not cost-effective. As a result, current efforts aim to maximize the accuracy of the estimated true labels for a fixed number of acquired labels.
This paper surveys methods to aggregate redundant crowd labels in order to estimate unknown true labels. It presents a unified statistical latent model in which the differences among popular methods in the field correspond to different choices for the parameters of the model. Afterward, algorithms for making inference on these models are surveyed. Moreover, adaptive methods which iteratively collect labels based on the previously collected labels and estimated models are discussed. In addition, the paper compares the prominent methods and provides guidelines for future work needed to address the current open issues.
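A minimal instance of the label-aggregation models surveyed here is a simplified one-coin Dawid–Skene EM, sketched below for binary labels. The names, initialization, and iteration count are illustrative assumptions; the surveyed methods differ precisely in how such parameters are modeled.

```python
import numpy as np

def aggregate_labels(votes, n_iter=20):
    """One-coin Dawid-Skene-style EM (simplified, binary labels):
    votes[i, j] is worker j's 0/1 label for item i, np.nan if
    unlabeled. Returns the posterior probability that each item's
    true label is 1."""
    votes = np.asarray(votes, dtype=float)
    n_items, n_workers = votes.shape
    p = np.nanmean(votes, axis=1)       # init: soft majority vote
    acc = np.full(n_workers, 0.7)       # assumed initial accuracy
    for _ in range(n_iter):
        # M-step: worker accuracy = expected agreement with p
        for j in range(n_workers):
            seen = ~np.isnan(votes[:, j])
            agree = (p[seen] * votes[seen, j]
                     + (1 - p[seen]) * (1 - votes[seen, j]))
            acc[j] = agree.mean()
        # E-step: item posteriors under an independent-worker model
        for i in range(n_items):
            seen = ~np.isnan(votes[i])
            like1 = np.prod(np.where(votes[i, seen] == 1,
                                     acc[seen], 1 - acc[seen]))
            like0 = np.prod(np.where(votes[i, seen] == 0,
                                     acc[seen], 1 - acc[seen]))
            p[i] = like1 / (like1 + like0)
    return p
```

Unlike plain majority voting, the EM iterations learn worker accuracies, so a consistently wrong worker's votes are down-weighted, or even inverted.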

Classifying streams of data, for instance financial transactions or emails, is an essential element in applications such as online advertising and spam or fraud detection. The data stream is often large or even unbounded; furthermore, the stream is in many instances nonstationary. Therefore, an adaptive approach is required that can manage concept drift in an online fashion. This paper presents a probabilistic nonparametric generative model for stream classification that can handle concept drift efficiently and adjust its complexity over time. Unlike recent methods, the proposed model handles concept drift by adapting the data-concept association without an unnecessary i.i.d. assumption among the data of a batch. This allows the model to efficiently classify data using fewer and simpler base classifiers. Moreover, an online algorithm for making inference on the proposed non-conjugate time-dependent nonparametric model is presented. Extensive experimental results on several stream datasets demonstrate the effectiveness of the proposed model.
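The idea of per-point data-concept association can be caricatured with a toy sketch: each arriving point joins the best-matching concept or spawns a new one, so points within the same batch may belong to different concepts. This is an illustration only, not the paper's nonparametric model or its inference algorithm; the distance threshold and running-mean concepts are assumptions made for brevity.

```python
import numpy as np

class DriftingConcepts:
    """Toy per-point concept assignment: concepts are running means,
    a point joins the nearest concept if close enough, otherwise it
    starts a new concept (letting model complexity grow over time)."""
    def __init__(self, new_concept_dist=3.0):
        self.means, self.counts = [], []
        self.new_concept_dist = new_concept_dist  # assumed threshold

    def assign(self, x):
        x = np.asarray(x, dtype=float)
        if self.means:
            d = [np.linalg.norm(x - m) for m in self.means]
            k = int(np.argmin(d))
            if d[k] < self.new_concept_dist:
                self.counts[k] += 1
                # incremental running-mean update for concept k
                self.means[k] += (x - self.means[k]) / self.counts[k]
                return k
        self.means.append(x.copy())
        self.counts.append(1)
        return len(self.means) - 1
```

Because assignment is per point rather than per batch, a batch mixing old and new concepts is split across them instead of being forced into one, which is the intuition behind dropping the within-batch i.i.d. assumption.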