
Contrastive Divergence

7 papers · 0 followers
About this topic
Contrastive Divergence is a learning algorithm used in machine learning, particularly in training restricted Boltzmann machines. It approximates the gradient of the log-likelihood of the data by using a short Markov chain to sample from the model's distribution, facilitating efficient parameter updates in unsupervised learning scenarios.
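
For orientation, here is a minimal sketch of one CD-k update for a binary RBM, written in NumPy. It is illustrative only, not drawn from any paper listed below; helper names such as cd_k_update are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, W, b, c, k=1, lr=0.01):
    """One CD-k parameter update for a binary RBM (hypothetical helper).

    v0   : (batch, n_visible) training batch
    W    : (n_visible, n_hidden) weight matrix
    b, c : visible and hidden bias vectors
    """
    # Positive phase: hidden probabilities driven by the data.
    ph0 = sigmoid(v0 @ W + c)

    # Negative phase: a short Gibbs chain of k steps, started at the data.
    v, ph = v0, ph0
    for _ in range(k):
        h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden layer
        pv = sigmoid(h @ W.T + b)                        # visible probabilities
        v = (rng.random(pv.shape) < pv).astype(float)    # sample visible layer
        ph = sigmoid(v @ W + c)

    # Gradient estimate: data statistics minus k-step model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v.T @ ph) / n
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph).mean(axis=0)
```

With k = 1 this is the classic CD-1 rule: the negative statistics come from a single Gibbs step started at the data, which is cheap but biased, a point several of the papers below analyse.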

Key research themes

1. How can contrastive divergence algorithms be improved to reduce bias and enhance convergence properties in training Restricted Boltzmann Machines?

This research area focuses on addressing the inherent bias and convergence issues of contrastive divergence (CD) methods used to train Restricted Boltzmann Machines (RBMs). Despite their practical success, standard CD and its variants can yield biased gradient estimates, sometimes causing divergence or poor likelihood maximization. Enhancements or modifications to the CD framework aim to reduce such biases, improve likelihood estimates, and provide reliable stopping criteria to ensure more stable and accurate learning outcomes. (A minimal persistent-CD sketch follows the findings below.)

Key finding: The paper empirically demonstrates that the log-likelihood often decreases during training when contrastive divergence (CD), persistent CD (PCD), or fast PCD is used, especially when the target distribution is difficult to...
Key finding: Proposes Weighted Contrastive Divergence (WCD), a modified version of standard CD where elements in the negative phase are weighted by their relative batch probabilities. This subtle change significantly improves the learned...
Key finding: Provides a new theoretical upper bound on the bias of CD-k gradient estimates, linking the bias magnitude to the Gibbs chain length (k), the number of RBM variables, and the maximum energy change caused by flipping a single...
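
To make the CD/PCD distinction above concrete, the following sketch shows a persistent-CD update. It reuses sigmoid and rng from the sketch in the About section; the only change from CD-k is that the negative Gibbs chain continues from persistent "fantasy" particles instead of restarting at the data. Names such as pcd_update and fantasy are illustrative, not taken from the papers.

```python
def pcd_update(v0, W, b, c, fantasy, k=1, lr=0.01):
    """One persistent-CD update (reuses sigmoid and rng from the CD sketch).

    fantasy : (batch, n_visible) persistent negative particles, carried
              across updates instead of being re-initialised at the data.
    """
    ph0 = sigmoid(v0 @ W + c)            # positive phase, as in CD

    v = fantasy                          # negative chain continues from the
    for _ in range(k):                   # previous fantasy particles
        ph = sigmoid(v @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
    ph_neg = sigmoid(v @ W + c)

    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v.T @ ph_neg) / n
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph_neg).mean(axis=0)
    return v                             # new fantasy particles for next call
```

Because the persistent chain keeps mixing across updates, PCD often tracks the model distribution better than CD-1; as the first finding above notes, however, neither variant is guaranteed to increase the likelihood monotonically.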

2. What alternative criteria and methods can be employed to monitor and improve the stopping point during contrastive divergence training of Restricted Boltzmann Machines?

Since CD algorithms can exhibit non-monotonic behavior in likelihood during training, reliable stopping criteria beyond reconstruction error are critical. This area investigates new metrics, neighborhood-based measures, and auxiliary functionals that can accurately detect when the model is optimally trained, thereby preventing overfitting or divergence and improving practical training robustness. (A sketch of the basic monitoring quantities follows the findings below.)

Key finding: Investigates and proposes alternative stopping criteria for CD training beyond the traditional reconstruction error, which often fails to correlate with likelihood increases. Introduces methods based on probabilities that...
Key finding: Develops a neighborhood-based stopping criterion incorporating the energy and probability of states neighboring the training samples rather than just the reconstruction error. This approach exploits the continuity and...
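
As a baseline for the criteria studied here, the sketch below computes the two quantities most monitoring heuristics start from: the reconstruction error, which, as noted above, often fails to track the likelihood, and the RBM free energy, whose gap between training and held-out data is a common overfitting proxy. This is a generic illustration reusing sigmoid from the earlier sketches, not the neighborhood-based method of the papers.

```python
def free_energy(v, W, b, c):
    """RBM free energy F(v), up to the intractable log partition function;
    differences in F between data sets remain comparable."""
    return -(v @ b) - np.logaddexp(0.0, v @ W + c).sum(axis=1)

def reconstruction_error(v, W, b, c):
    """Mean squared error of a one-step mean-field reconstruction."""
    pv = sigmoid(sigmoid(v @ W + c) @ W.T + b)
    return ((v - pv) ** 2).mean()

def free_energy_gap(v_train, v_valid, W, b, c):
    """A gap that grows during training is one possible overfitting signal."""
    return (free_energy(v_valid, W, b, c).mean()
            - free_energy(v_train, W, b, c).mean())
```

The neighborhood-based criterion above goes further, scoring the energies and probabilities of states adjacent to the training samples rather than a single reconstruction.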

3. How are various noise-contrastive estimation (NCE) methods related to contrastive divergence and maximum likelihood estimation for unnormalised probabilistic models?

This research theme clarifies the theoretical connections of NCE variants, notably ranking NCE (RNCE) and conditional NCE (CNCE), with classical maximum likelihood estimation employing importance sampling and with the contrastive divergence updates used in energy-based model learning. By establishing equivalences and special-case relationships, it enables cross-fertilization of analytical insights and algorithmic improvements across these estimation methodologies. (A minimal ranking-NCE sketch follows the finding below.)

Key finding: Demonstrates that RNCE can be interpreted as maximum likelihood estimation augmented with conditional importance sampling (CIS), revealing it as a form of IS. Furthermore, both RNCE and CNCE are shown to be special cases of...
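
The RNCE objective behind this equivalence can be stated in a few lines: classify the data point against m noise samples using log importance ratios log(phi/q), where phi is the unnormalised model density and q the noise density. A minimal sketch, with rnce_loss a hypothetical name:

```python
def rnce_loss(log_phi_x, log_q_x, log_phi_noise, log_q_noise):
    """Ranking-NCE loss for one data point against m noise samples.

    log_phi_* : log unnormalised model density (scalar / length-m array)
    log_q_*   : log noise density (scalar / length-m array)
    """
    # Log importance ratios log(phi/q); the data point sits in slot 0.
    s = np.concatenate(([log_phi_x - log_q_x], log_phi_noise - log_q_noise))

    # Stable softmax cross-entropy that "ranks" the data point against
    # the noise samples.
    m = s.max()
    logsumexp = m + np.log(np.exp(s - m).sum())
    return logsumexp - s[0]
```

The softmax denominator can be read as an importance-sampling estimate of the partition function (up to a constant factor), which is the link to maximum likelihood estimation with IS described above.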

All papers in Contrastive Divergence

Time series prediction appears in many real-world problems, e.g., financial markets, signal processing, and weather forecasting, among others. The underlying models and time series data of those problems are generally complex in a way that...
Learning algorithms for energy based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically due to the exponential number of terms involved in computing the partition function. In this...
Learning algorithms relying on Gibbs sampling based stochastic approximations of the log-likelihood gradient have become a common way to train Restricted Boltzmann Machines (RBMs). We study three of these methods, Contrastive Divergence...
Many computer vision problems can be formulated in a Bayesian framework with Markov Random Field (MRF) or Conditional Random Field (CRF) priors. Usually, the model assumes that a full Maximum A Posteriori (MAP) estimation will be...
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs were initially invented under the name Harmonium by Paul Smolensky in 1986 and...
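
For reference, the standard definitions behind this abstract: a binary RBM assigns every visible/hidden configuration (v, h) an energy and a Gibbs probability, with no connections within a layer:

```latex
E(v, h) = -b^\top v - c^\top h - v^\top W h,
\qquad
p(v, h) = \frac{e^{-E(v, h)}}{Z},
\qquad
Z = \sum_{v', h'} e^{-E(v', h')}.
```

The sum defining the partition function Z has exponentially many terms, which is precisely what makes exact gradient-based maximum likelihood prohibitive and motivates contrastive divergence.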