
Contrastive Divergence

7 papers · 0 followers
About this topic
Contrastive Divergence is a learning algorithm used in machine learning, particularly in training restricted Boltzmann machines. It approximates the gradient of the log-likelihood of the data by using a short Markov chain to sample from the model's distribution, facilitating efficient parameter updates in unsupervised learning scenarios.
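
For orientation, here is a minimal sketch of one CD-k update for a binary RBM, written in NumPy. It is illustrative only, not drawn from any paper listed below; helper names such as cd_k_update are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, W, b, c, k=1, lr=0.01):
    """One CD-k parameter update for a binary RBM (hypothetical helper).

    v0   : (batch, n_visible) training batch
    W    : (n_visible, n_hidden) weight matrix
    b, c : visible and hidden bias vectors
    """
    # Positive phase: hidden probabilities driven by the data.
    ph0 = sigmoid(v0 @ W + c)

    # Negative phase: a short Gibbs chain of k steps, started at the data.
    v, ph = v0, ph0
    for _ in range(k):
        h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden layer
        pv = sigmoid(h @ W.T + b)                        # visible probabilities
        v = (rng.random(pv.shape) < pv).astype(float)    # sample visible layer
        ph = sigmoid(v @ W + c)

    # Gradient estimate: data statistics minus k-step model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v.T @ ph) / n
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph).mean(axis=0)
```

With k = 1 this is the classic CD-1 rule: the negative statistics come from a single Gibbs step started at the data, which is cheap but biased, a point several of the papers below analyse.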

Key research themes

1. How can contrastive divergence algorithms be improved to reduce bias and enhance convergence properties in training Restricted Boltzmann Machines?

This research area focuses on addressing the inherent bias and convergence issues of contrastive divergence (CD) methods used to train Restricted Boltzmann Machines (RBMs). Despite their practical success, standard CD and its variants can yield biased gradient estimates, sometimes causing divergence or poor likelihood maximization. Enhancements or modifications to the CD framework aim to reduce such biases, improve likelihood estimates, and provide reliable stopping criteria to ensure more stable and accurate learning outcomes. (A minimal persistent-CD sketch follows the findings below.)

Key finding: The paper empirically demonstrates that the log-likelihood often decreases during training when contrastive divergence (CD), persistent CD (PCD), or fast PCD is used, especially when the target distribution is difficult to...
Key finding: Proposes Weighted Contrastive Divergence (WCD), a modified version of standard CD where elements in the negative phase are weighted by their relative batch probabilities. This subtle change significantly improves the learned...
Key finding: Provides a new theoretical upper bound on the bias of CD-k gradient estimates, linking the bias magnitude to the Gibbs chain length (k), the number of RBM variables, and the maximum energy change caused by flipping a single...
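
To make the CD/PCD distinction above concrete, the following sketch shows a persistent-CD update. It reuses sigmoid and rng from the sketch in the About section; the only change from CD-k is that the negative Gibbs chain continues from persistent "fantasy" particles instead of restarting at the data. Names such as pcd_update and fantasy are illustrative, not taken from the papers.

```python
def pcd_update(v0, W, b, c, fantasy, k=1, lr=0.01):
    """One persistent-CD update (reuses sigmoid and rng from the CD sketch).

    fantasy : (batch, n_visible) persistent negative particles, carried
              across updates instead of being re-initialised at the data.
    """
    ph0 = sigmoid(v0 @ W + c)            # positive phase, as in CD

    v = fantasy                          # negative chain continues from the
    for _ in range(k):                   # previous fantasy particles
        ph = sigmoid(v @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
    ph_neg = sigmoid(v @ W + c)

    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v.T @ ph_neg) / n
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph_neg).mean(axis=0)
    return v                             # new fantasy particles for next call
```

Because the persistent chain keeps mixing across updates, PCD often tracks the model distribution better than CD-1; as the first finding above notes, however, neither variant is guaranteed to increase the likelihood monotonically.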

2. What alternative criteria and methods can be employed to monitor and improve the stopping point during contrastive divergence training of Restricted Boltzmann Machines?

Since CD algorithms can exhibit non-monotonic behavior in likelihood during training, reliable stopping criteria beyond reconstruction error are critical. This area investigates new metrics, neighborhood-based measures, and auxiliary functionals that can accurately detect when the model is optimally trained, thereby preventing overfitting or divergence and improving practical training robustness. (A sketch of the basic monitoring quantities follows the findings below.)

Key finding: Investigates and proposes alternative stopping criteria for CD training beyond the traditional reconstruction error, which often fails to correlate with likelihood increases. Introduces methods based on probabilities that...
Key finding: Develops a neighborhood-based stopping criterion incorporating the energy and probability of states neighboring the training samples rather than just the reconstruction error. This approach exploits the continuity and...
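
As a baseline for the criteria studied here, the sketch below computes the two quantities most monitoring heuristics start from: the reconstruction error, which, as noted above, often fails to track the likelihood, and the RBM free energy, whose gap between training and held-out data is a common overfitting proxy. This is a generic illustration reusing sigmoid from the earlier sketches, not the neighborhood-based method of the papers.

```python
def free_energy(v, W, b, c):
    """RBM free energy F(v), up to the intractable log partition function;
    differences in F between data sets remain comparable."""
    return -(v @ b) - np.logaddexp(0.0, v @ W + c).sum(axis=1)

def reconstruction_error(v, W, b, c):
    """Mean squared error of a one-step mean-field reconstruction."""
    pv = sigmoid(sigmoid(v @ W + c) @ W.T + b)
    return ((v - pv) ** 2).mean()

def free_energy_gap(v_train, v_valid, W, b, c):
    """A gap that grows during training is one possible overfitting signal."""
    return (free_energy(v_valid, W, b, c).mean()
            - free_energy(v_train, W, b, c).mean())
```

The neighborhood-based criterion above goes further, scoring the energies and probabilities of states adjacent to the training samples rather than a single reconstruction.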

3. How are various noise-contrastive estimation (NCE) methods related to contrastive divergence and maximum likelihood estimation for unnormalised probabilistic models?

This research theme clarifies the theoretical connections of NCE variants, notably ranking NCE (RNCE) and conditional NCE (CNCE), with classical maximum likelihood estimation employing importance sampling and with the contrastive divergence updates used in energy-based model learning. By establishing equivalences and special-case relationships, it enables cross-fertilization of analytical insights and algorithmic improvements across these estimation methodologies. (A minimal ranking-NCE sketch follows the finding below.)

Key finding: Demonstrates that RNCE can be interpreted as maximum likelihood estimation augmented with conditional importance sampling (CIS), revealing it as a form of IS. Furthermore, both RNCE and CNCE are shown to be special cases of...
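
The RNCE objective behind this equivalence can be stated in a few lines: classify the data point against m noise samples using log importance ratios log(phi/q), where phi is the unnormalised model density and q the noise density. A minimal sketch, with rnce_loss a hypothetical name:

```python
def rnce_loss(log_phi_x, log_q_x, log_phi_noise, log_q_noise):
    """Ranking-NCE loss for one data point against m noise samples.

    log_phi_* : log unnormalised model density (scalar / length-m array)
    log_q_*   : log noise density (scalar / length-m array)
    """
    # Log importance ratios log(phi/q); the data point sits in slot 0.
    s = np.concatenate(([log_phi_x - log_q_x], log_phi_noise - log_q_noise))

    # Stable softmax cross-entropy that "ranks" the data point against
    # the noise samples.
    m = s.max()
    logsumexp = m + np.log(np.exp(s - m).sum())
    return logsumexp - s[0]
```

The softmax denominator can be read as an importance-sampling estimate of the partition function (up to a constant factor), which is the link to maximum likelihood estimation with IS described above.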

All papers in Contrastive Divergence

Time series prediction appears in many real-world problems, e.g., financial markets, signal processing, and weather forecasting, among others. The underlying models and time series data of those problems are generally complex in a way that...
Learning algorithms for energy based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically due to the exponential number of terms involved in computing the partition function. In this...
Learning algorithms relying on Gibbs sampling based stochastic approximations of the log-likelihood gradient have become a common way to train Restricted Boltzmann Machines (RBMs). We study three of these methods, Contrastive Divergence...
Many computer vision problems can be formulated in a Bayesian framework with Markov Random Field (MRF) or Conditional Random Field (CRF) priors. Usually, the model assumes that a full Maximum A Posteriori (MAP) estimation will be...
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs were initially invented under the name Harmonium by Paul Smolensky in 1986 and...
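
For reference, the standard definitions behind this abstract: a binary RBM assigns every visible/hidden configuration (v, h) an energy and a Gibbs probability, with no connections within a layer:

```latex
E(v, h) = -b^\top v - c^\top h - v^\top W h,
\qquad
p(v, h) = \frac{e^{-E(v, h)}}{Z},
\qquad
Z = \sum_{v', h'} e^{-E(v', h')}.
```

The sum defining the partition function Z has exponentially many terms, which is precisely what makes exact gradient-based maximum likelihood prohibitive and motivates contrastive divergence.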