Training Neural Networks with Implicit Variance
2013, Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-642-42042-9_17

Abstract
We present a novel method to train predictive Gaussian distributions p(z|x) for regression problems with neural networks. While most approaches either ignore the variance or model it explicitly as an additional response variable, in our case it is trained implicitly. Stochasticity is established by injecting noise into the input and hidden units, and the outputs are approximated with a Gaussian distribution via the forward propagation method introduced for fast dropout [1]. Our loss function is designed to respect this probabilistic interpretation of the output units. The method is evaluated on a synthetic task and an inverse robot dynamics task, yielding performance superior to plain neural networks, Gaussian processes and LWPR in terms of both mean squared error and likelihood.
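The abstract only sketches the idea, so the following is a minimal numpy sketch of the general style of method it describes: propagate a mean and variance through the network in the spirit of fast dropout, and train against a Gaussian log-likelihood so the predictive variance is learned implicitly. All function names, layer sizes, the probit-style sigmoid approximation and the delta-method variance term here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def linear_moments(mu, var, W, b):
    # Propagate a diagonal Gaussian N(mu, diag(var)) through a linear layer.
    # Assuming independent units: mean = W mu + b, variance = (W**2) var.
    return W @ mu + b, (W ** 2) @ var

def sigmoid_moments(mu, var):
    # Approximate E[sigmoid(a)] for a ~ N(mu, var) with the probit-style
    # approximation used in fast dropout; the output variance uses a crude
    # delta-method term (an illustrative choice, not the paper's).
    em = 1.0 / (1.0 + np.exp(-mu / np.sqrt(1.0 + np.pi * var / 8.0)))
    ev = em * (1.0 - em) * var
    return em, ev

def gaussian_nll(z, mu, var, eps=1e-6):
    # Negative log-likelihood of targets z under the predictive N(mu, var);
    # a loss of this form respects the probabilistic output interpretation.
    var = var + eps
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mu) ** 2 / var)

# Toy forward pass: noise injected at the input surfaces as an output
# variance without modelling the variance as a separate response variable.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
mu, var = x, np.full_like(x, 0.1)               # injected input noise
W1, b1 = 0.3 * rng.normal(size=(8, 5)), np.zeros(8)
W2, b2 = 0.3 * rng.normal(size=(1, 8)), np.zeros(1)

mu, var = sigmoid_moments(*linear_moments(mu, var, W1, b1))
mu, var = linear_moments(mu, var, W2, b2)        # predictive N(mu, var)
print(gaussian_nll(np.array([0.5]), mu, var))
```

Minimising the Gaussian negative log-likelihood with respect to the weights is what makes the variance "implicit": no dedicated variance output is trained, yet the propagated second moment is shaped by the loss.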
References (28)
- Wang, S., Manning, C.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13). (2013) 118-126
- Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25. (2012) 1106-1114
- Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1) (2012) 30-42
- Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y.: An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th international conference on Machine learning, ACM (2007) 473-480
- Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786) (2006) 504-507
- Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE (2012) 3642-3649
- Zeiler, M., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al.: On rectified linear units for speech processing, ICASSP (2013)
- Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
- Neal, R.M.: Connectionist learning of belief networks. Artificial intelligence 56(1) (1992) 71-113
- Bengio, Y., Thibodeau-Laufer, É.: Deep generative stochastic networks trainable by backprop. (2013)
- Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088) (1986) 533-536
- Tang, Y., Salakhutdinov, R.: A new learning algorithm for stochastic feedforward neural nets. (2013)
- Bengio, Y.: Estimating or propagating gradients through stochastic neurons. arXiv preprint arXiv:1305.2982 (2013)
- Salakhutdinov, R., Hinton, G.: Using deep belief nets to learn covariance kernels for Gaussian processes. Advances in Neural Information Processing Systems 20 (2008) 1249-1256
- Uria, B., Murray, I., Renals, S., Richmond, K.: Deep architectures for articulatory inversion. In: Proceedings of Interspeech. (2012)
- Bishop, C.M.: Mixture density networks. (1994)
- Werbos, P.: Beyond regression: New tools for prediction and analysis in the behavioral sciences. (1974)
- Le Cun, Y.: Learning process in an asymmetric threshold network. In: Disordered systems and biological organization. Springer (1986) 233-240
- Bishop, C.M., et al.: Pattern recognition and machine learning. Volume 1. Springer, New York (2006)
- Julier, S.J., Uhlmann, J.K.: New extension of the Kalman filter to nonlinear systems. In: AeroSense'97, International Society for Optics and Photonics (1997) 182-193
- Vijayakumar, S., D'Souza, A., Schaal, S.: Incremental online learning in high dimensions. Neural Computation 17(12) (2005) 2602-2634
- Rasmussen, C.E.: Gaussian processes for machine learning. Citeseer (2006)
- LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient backprop. In: Neural networks: Tricks of the trade. Springer (1998) 9-50
- Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13 (2012) 281-305
- Tieleman, T., Hinton, G.: Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)
- Sutskever, I.: Training Recurrent Neural Networks. PhD thesis, University of Toronto (2013)
- Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. (2013)
- Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd International Conference on Machine Learning, ACM (2005) 489-496