Compressed Predictive Information Coding
2022, arXiv preprint
https://doi.org/10.48550/ARXIV.2203.02051

Abstract
Unsupervised learning plays an important role in many fields, such as artificial intelligence, machine learning, and neuroscience. Compared to static data, methods for extracting low-dimensional structure from dynamic data are lagging. We developed a novel information-theoretic framework, Compressed Predictive Information Coding (CPIC), to extract useful representations from dynamic data. CPIC selectively projects the past (input) into a linear subspace that is predictive of the compressed data projected from the future (output). The key insight of our framework is to learn representations by minimizing compression complexity and maximizing predictive information in the latent space. We derive variational bounds on the CPIC loss that induce the latent space to capture maximally predictive information. These bounds are made tractable by leveraging bounds on mutual information. We find that introducing stochasticity in the encoder robustly contributes to better representations, and that variational approaches estimate mutual information more accurately than estimates made under a Gaussian assumption. We demonstrate that CPIC recovers the latent space of noisy dynamical systems with low signal-to-noise ratios, and extracts features predictive of exogenous variables in neuroscience data.
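To make the objective concrete, below is a minimal PyTorch-style sketch of a CPIC-like loss, assuming windowed past and future observations have already been flattened into vectors. The KL-to-standard-normal surrogate for the compression term and the bilinear InfoNCE critic for the predictive term are illustrative stand-ins for the paper's variational bounds, not the authors' exact implementation; names such as `StochasticLinearEncoder`, `BilinearCritic`, and `cpic_loss` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticLinearEncoder(nn.Module):
    """Stochastic linear encoder: the mean is a linear projection of the
    input window; a learned per-dimension log-variance adds noise, which
    the abstract reports improves representations."""

    def __init__(self, input_dim: int, latent_dim: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, latent_dim, bias=False)
        self.log_var = nn.Parameter(torch.zeros(latent_dim))

    def forward(self, x):
        mu = self.proj(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * self.log_var)
        return z, mu


class BilinearCritic(nn.Module):
    """Scores all (past, future) latent pairs in a batch with a bilinear
    form; matched pairs lie on the diagonal of the score matrix."""

    def __init__(self, latent_dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.eye(latent_dim))

    def forward(self, z_past, z_future):
        return z_past @ self.W @ z_future.t()  # (B, B)


def infonce_lower_bound(z_past, z_future, critic):
    """InfoNCE lower bound (up to a log-batch-size constant) on the
    predictive information I(Z_past; Z_future)."""
    scores = critic(z_past, z_future)
    labels = torch.arange(scores.size(0), device=scores.device)
    return -F.cross_entropy(scores, labels)


def cpic_loss(x_past, x_future, encoder, critic, beta=1.0):
    """beta * (compression upper bound) - (predictive lower bound).

    The compression term here is the KL divergence from the Gaussian
    encoder to a standard-normal prior, a common variational upper
    bound on I(X; Z); assumed for illustration, not the paper's bound.
    """
    z_past, mu_past = encoder(x_past)
    z_future, _ = encoder(x_future)
    var = torch.exp(encoder.log_var)
    compression = 0.5 * (
        mu_past.pow(2) + var - 1.0 - encoder.log_var
    ).sum(-1).mean()
    predictive = infonce_lower_bound(z_past, z_future, critic)
    return beta * compression - predictive
```

In this sketch, `beta` trades compression against predictive information, mirroring the information-bottleneck trade-off described in the abstract: large `beta` favors simpler codes, small `beta` favors latents that better predict the future.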