Training Neural Networks with Implicit Variance
2013, Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-642-42042-9_17

Abstract
We present a novel method to train predictive Gaussian distributions p(z|x) for regression problems with neural networks. While most approaches either ignore the variance or model it explicitly as an additional response variable, in our case it is trained implicitly. Stochasticity is established by injecting noise into the input and hidden units, and the outputs are approximated with a Gaussian distribution via the forward propagation method introduced for fast dropout [1]. Our loss function is designed to respect this probabilistic interpretation of the output units. The method is evaluated on a synthetic task and an inverse robot dynamics task, yielding performance superior to plain neural networks, Gaussian processes and LWPR in terms of both mean squared error and likelihood.
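The abstract only sketches the idea, so the following is a minimal numpy sketch of the general style of method it describes: propagate a mean and variance through the network in the spirit of fast dropout, and train against a Gaussian log-likelihood so the predictive variance is learned implicitly. All function names, layer sizes, the probit-style sigmoid approximation and the delta-method variance term here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def linear_moments(mu, var, W, b):
    # Propagate a diagonal Gaussian N(mu, diag(var)) through a linear layer.
    # Assuming independent units: mean = W mu + b, variance = (W**2) var.
    return W @ mu + b, (W ** 2) @ var

def sigmoid_moments(mu, var):
    # Approximate E[sigmoid(a)] for a ~ N(mu, var) with the probit-style
    # approximation used in fast dropout; the output variance uses a crude
    # delta-method term (an illustrative choice, not the paper's).
    em = 1.0 / (1.0 + np.exp(-mu / np.sqrt(1.0 + np.pi * var / 8.0)))
    ev = em * (1.0 - em) * var
    return em, ev

def gaussian_nll(z, mu, var, eps=1e-6):
    # Negative log-likelihood of targets z under the predictive N(mu, var);
    # a loss of this form respects the probabilistic output interpretation.
    var = var + eps
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mu) ** 2 / var)

# Toy forward pass: noise injected at the input surfaces as an output
# variance without modelling the variance as a separate response variable.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
mu, var = x, np.full_like(x, 0.1)               # injected input noise
W1, b1 = 0.3 * rng.normal(size=(8, 5)), np.zeros(8)
W2, b2 = 0.3 * rng.normal(size=(1, 8)), np.zeros(1)

mu, var = sigmoid_moments(*linear_moments(mu, var, W1, b1))
mu, var = linear_moments(mu, var, W2, b2)        # predictive N(mu, var)
print(gaussian_nll(np.array([0.5]), mu, var))
```

Minimising the Gaussian negative log-likelihood with respect to the weights is what makes the variance "implicit": no dedicated variance output is trained, yet the propagated second moment is shaped by the loss.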
References (28)
- Wang, S., Manning, C.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13). (2013) 118-126
- Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25. (2012) 1106-1114
- Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1) (2012) 30-42
- Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y.: An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th international conference on Machine learning, ACM (2007) 473-480
- Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786) (2006) 504-507
- Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE (2012) 3642-3649
- Zeiler, M., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al.: On rectified linear units for speech processing, ICASSP (2013)
- Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
- Neal, R.M.: Connectionist learning of belief networks. Artificial intelligence 56(1) (1992) 71-113
- Bengio, Y., Thibodeau-Laufer, É.: Deep generative stochastic networks trainable by backprop. (2013)
- Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088) (1986) 533-536
- Tang, Y., Salakhutdinov, R.: A new learning algorithm for stochastic feedforward neural nets. (2013)
- Bengio, Y.: Estimating or propagating gradients through stochastic neurons. arXiv preprint arXiv:1305.2982 (2013)
- Salakhutdinov, R., Hinton, G.: Using deep belief nets to learn covariance kernels for Gaussian processes. Advances in Neural Information Processing Systems 20 (2008) 1249-1256
- Uria, B., Murray, I., Renals, S., Richmond, K.: Deep architectures for articulatory inversion. In: Proceedings of Interspeech. (2012)
- Bishop, C.M.: Mixture density networks. (1994)
- Werbos, P.: Beyond regression: New tools for prediction and analysis in the behavioral sciences. (1974)
- Le Cun, Y.: Learning process in an asymmetric threshold network. In: Disordered systems and biological organization. Springer (1986) 233-240
- Bishop, C.M., et al.: Pattern recognition and machine learning. Volume 1. Springer, New York (2006)
- Julier, S.J., Uhlmann, J.K.: New extension of the Kalman filter to nonlinear systems. In: AeroSense'97, International Society for Optics and Photonics (1997) 182-193
- Vijayakumar, S., D'Souza, A., Schaal, S.: Incremental online learning in high dimensions. Neural Computation 17(12) (2005) 2602-2634
- Rasmussen, C.E.: Gaussian processes for machine learning. Citeseer (2006)
- LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient backprop. In: Neural networks: Tricks of the trade. Springer (1998) 9-50
- Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13 (2012) 281-305
- Tieleman, T., Hinton, G.: Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)
- Sutskever, I.: Training Recurrent Neural Networks. PhD thesis, University of Toronto (2013)
- Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. (2013)
- Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd International Conference on Machine Learning, ACM (2005) 489-496