Dynamic Determinantal Point Processes
Proceedings of the AAAI Conference on Artificial Intelligence
Abstract
The determinantal point process (DPP) has been receiving increasing attention in machine learning as a generative model of subsets consisting of relevant and diverse items. Recently, there has been significant progress in developing efficient algorithms for learning the kernel matrix that characterizes a DPP. Here, we propose a dynamic DPP, a DPP whose kernel can change over time, and develop efficient learning algorithms for it. In the dynamic DPP, the kernel depends on the subsets selected in the past, but we assume a particular structure in this dependency to allow efficient learning. We also assume that the kernel has low rank and exploit a recently proposed learning algorithm for the DPP with low-rank factorization, showing that its bottleneck computation can be reduced from O(M^2 K) time to O(M K^2) time, where M is the number of items under consideration and K is the rank of the kernel, which can be set smaller than M by orders of magnitude.
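The low-rank speedup rests on a standard determinant identity. As a minimal sketch (the variable names and dimensions here are illustrative, not the paper's): if the kernel factorizes as L = B B^T for an M x K matrix B, the DPP normalizer det(L + I_M) equals det(B^T B + I_K), so only a K x K determinant is needed.

```python
import numpy as np

# Sketch of the low-rank trick: for a hypothetical M x K factor B
# (K << M), det(B B^T + I_M) = det(B^T B + I_K) by Sylvester's
# determinant identity, so the normalizer costs O(M K^2) rather
# than requiring the full M x M kernel.
M, K = 300, 10
rng = np.random.default_rng(0)
B = rng.standard_normal((M, K))

# Naive route: materialize the M x M kernel, take an M x M log-det.
L = B @ B.T
naive = np.linalg.slogdet(L + np.eye(M))[1]

# Low-rank route: only a K x K log-det.
fast = np.linalg.slogdet(B.T @ B + np.eye(K))[1]

assert np.isclose(naive, fast)
```

The identity holds exactly, so the two log-determinants agree up to floating-point error; the saving comes from never forming the M x M matrix.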
Related papers
2020
We present a determinantal point process (DPP) inspired alternative to non-maximum suppression (NMS) which has become an integral step in all state-of-the-art object detection frameworks. DPPs have been shown to encourage diversity in subset selection problems. We pose NMS as a subset selection problem and posit that directly incorporating DPP like framework can improve the overall performance of the object detection system. We propose an optimization problem which takes the same inputs as NMS, but introduces a novel sub-modularity based diverse subset selection functional. Our results strongly indicate that the modifications proposed in this paper can provide consistent improvements to state-of-the-art object detection pipelines.
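The idea above can be illustrated with a toy sketch. This is not the paper's formulation (which uses a sub-modularity based functional); it is a hypothetical DPP-style kernel L_ij = q_i * S_ij * q_j built from detector scores q and box overlaps S, so that the determinant rewards high-score boxes that overlap little, replacing hard NMS thresholding with diverse subset selection.

```python
import numpy as np

# Illustrative only: cast NMS as diverse subset selection with a
# DPP-style kernel. q_i are detection scores, S_ij is IoU overlap.
def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def dpp_keep(boxes, scores, k):
    n = len(boxes)
    S = np.array([[1.0 if i == j else iou(boxes[i], boxes[j])
                   for j in range(n)] for i in range(n)])
    L = np.outer(scores, scores) * S
    kept = []
    for _ in range(min(k, n)):
        best, best_ld = None, -np.inf
        for i in range(n):
            if i in kept:
                continue
            sign, ld = np.linalg.slogdet(L[np.ix_(kept + [i], kept + [i])])
            if sign > 0 and ld > best_ld:
                best, best_ld = i, ld
        if best is None:
            break
        kept.append(best)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = np.array([0.9, 0.8, 0.7])
print(dpp_keep(boxes, scores, k=2))  # [0, 2]
```

Here the second and third boxes compete: the heavily overlapping box 1 shrinks the determinant, so the disjoint box 2 is kept despite its lower score.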
ArXiv, 2017
We propose the use of k-determinantal point processes in hyperparameter optimization via random search. Compared to conventional approaches where hyperparameter settings are sampled independently, a k-DPP promotes diversity. We describe an approach that transforms hyperparameter search spaces for efficient use with a k-DPP. Our experiments show significant benefits over uniform random search in realistic scenarios with a limited budget for training supervised learners, whether in serial or parallel.
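A toy sketch of the two ingredients above, under stated assumptions: the transformation and the greedy log-determinant selection below are hypothetical stand-ins, not the paper's exact transformation or its k-DPP sampler. Hyperparameters are first mapped to a unit cube (log-scaling dimensions such as learning rates) so distances are comparable, then k jointly diverse settings are picked greedily.

```python
import numpy as np

# Toy sketch: normalize hyperparameters, then greedily maximize the
# log-det of an RBF kernel submatrix -- a MAP-style approximation to
# k-DPP sampling, used here purely for illustration.
def to_unit_cube(samples, log_dims=()):
    x = samples.astype(float).copy()
    for d in log_dims:  # e.g. learning rates live on a log scale
        x[:, d] = np.log10(x[:, d])
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)

def greedy_diverse(points, k, length_scale=0.3):
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    Kmat = np.exp(-d2 / (2 * length_scale ** 2))
    chosen = []
    for _ in range(k):
        best, best_ld = None, -np.inf
        for i in range(len(points)):
            if i in chosen:
                continue
            idx = chosen + [i]
            sign, ld = np.linalg.slogdet(Kmat[np.ix_(idx, idx)])
            if sign > 0 and ld > best_ld:
                best, best_ld = i, ld
        chosen.append(best)
    return chosen

rng = np.random.default_rng(0)
# columns: learning rate (log-scale), dropout probability
raw = np.column_stack([10 ** rng.uniform(-5, -1, 40),
                       rng.uniform(0.0, 0.9, 40)])
picked = greedy_diverse(to_unit_cube(raw, log_dims=(0,)), k=5)
print(len(picked))  # 5
```

Compared with independent uniform sampling, the selected settings avoid near-duplicates, which is the benefit the abstract reports under limited training budgets.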
Sensors
In recent years, deep neural networks have shown significant progress in computer vision due to their large generalization capacity; however, the overfitting problem ubiquitously threatens the learning process of these highly nonlinear architectures. Dropout is a recent solution to mitigate overfitting that has seen significant success in various classification applications. Recently, many efforts have been made to improve standard dropout using unsupervised, merit-based semantic selection of neurons in the latent space. However, these studies consider neither the quality and quantity of task-relevant information nor the diversity of the latent kernels. To address the challenge of dropping less informative neurons in deep learning, we propose an efficient end-to-end dropout algorithm that selects the most informative neurons, those with the highest correlation with the target output, while accounting for sparsity in its selection procedure. First, to promote activation diversity, we devise ...
Collaborative filtering algorithms generally rely on the assumption that user preference patterns remain stationary. However, real-world relational data are seldom stationary. User preference patterns may change over time, giving rise to the requirement of designing collaborative filtering systems capable of detecting and adapting to preference pattern shifts. Motivated by this observation, in this paper we propose a dynamic Bayesian probabilistic matrix factorization model, designed for modeling time-varying distributions. Formulation of our model is based on imposition of a dynamic hierarchical Dirichlet process (dHDP) prior over the space of probabilistic matrix factorization models to capture the time-evolving statistical properties of modeled sequential relational datasets. We develop a simple Markov Chain Monte Carlo sampler to perform inference. We present experimental results to demonstrate the superiority of our temporal model.
2013
The theory of determinantal point processes has its roots in work in mathematical physics in the 1960s, but it is only in recent years that it has been developed beyond several specific examples. While there is a rich probabilistic theory, there are still many open questions in this area, and its applications to statistics and machine learning are still largely unexplored.
Bocconi & Springer Series, 2016
In this survey we review two topics concerning determinantal (or fermion) point processes. First, we provide the construction of diffusion processes on the space of configurations whose invariant measure is the law of a determinantal point process. Second, we present some algorithms to sample from the law of a determinantal point process on a finite window. Related open problems are listed.
ESAIM: Proceedings and Surveys, 2017
Determinantal point processes (DPPs) are a repulsive distribution over configurations of points. The 2016 conference Journées Modélisation Aléatoire et Statistique (MAS) of the French society for applied and industrial mathematics (SMAI) featured a session on statistical applications of DPPs. This paper gathers contributions by the speakers and the organizer of the session.
2016
Gaussian Process bandit optimization has emerged as a powerful tool for optimizing noisy black box functions. One example in machine learning is hyper-parameter optimization where each evaluation of the target function may require training a model which may involve days or even weeks of computation. Most methods for this so-called "Bayesian optimization" only allow sequential exploration of the parameter space. However, it is often desirable to propose batches or sets of parameter values to explore simultaneously, especially when there are large parallel processing facilities at our disposal. Batch methods require modeling the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. In this paper, we propose a new approach for parallelizing Bayesian optimization by modeling the diversity of a batch via Determinantal point processes (DPPs) whose kernels are learned automatically. This allows us to generalize a previous result ...
Bayesian Analysis, 2019
We consider mixture models where location parameters are a priori encouraged to be well separated. We explore a class of determinantal point process (DPP) mixture models, which provide the desired notion of separation or repulsion. Instead of using the rather restrictive case where analytical results are partially available, we adopt a spectral representation from which approximations to the DPP density functions can be readily computed. For the sake of concreteness the presentation focuses on a power exponential spectral density, but the proposed approach is in fact quite general. We later extend our model to incorporate covariate information in the likelihood and also in the assignment to mixture components, yielding a trade-off between repulsiveness of locations in the mixtures and attraction among subjects with similar covariates. We develop full Bayesian inference, and explore model properties and posterior behavior using several simulation scenarios and data illustrations. Supplementary materials for this article are available online (Bianchini et al., 2019).
2022
Temporal point processes, an important area of stochastic processes, have been extensively studied in both theory and applications. The classical theory of point processes focuses on a time-based framework, where a conditional intensity function at each given time fully describes the process. However, such a framework cannot directly capture important overall features/patterns in the process, for example, characterizing a center-outward rank or identifying outliers in a given sample. In this article, we propose a new, data-driven model for regular point processes. Our study provides a probabilistic model using two factors: (1) the number of events in the process, and (2) the conditional distribution of these events given the number. The second factor is the key challenge. Based on the equivalent inter-event representation, we propose two frameworks on the inter-event times (IETs) to capture large variability in a given process: one is to model the IETs directly by a Dirichlet mixture, and th...
