Feedback Networks

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

https://doi.org/10.1109/CVPR.2017.196

Abstract

Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer. This is usually actualized through feedforward multilayer neural networks, e.g. ConvNets, where each layer forms one such successive representation. However, an alternative that can achieve the same goal is a feedback-based approach, in which the representation is formed iteratively, based on feedback received from the previous iteration's output. We establish that a feedback-based approach has several core advantages over feedforward: it enables making early predictions at query time, its output naturally conforms to a hierarchical structure in the label space (e.g. a taxonomy), and it provides a new basis for Curriculum Learning. We observe that feedback develops a considerably different representation than its feedforward counterparts, in line with the aforementioned advantages. We provide a general feedback-based learning architecture, instantiated using existing RNNs, whose endpoint results are on par with or better than those of existing feedforward networks, with the addition of the above advantages.
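
To make the mechanism described above concrete, here is a minimal PyTorch sketch of the feedback idea: a recurrent core re-processes the same input over several iterations, its hidden state carries the previous iteration's result back in, and a shared readout emits a prediction after every iteration, so early (coarse) predictions are available before the final one. The class name FeedbackNet, the layer sizes, and the choice of a plain LSTMCell core are illustrative assumptions for this sketch, not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class FeedbackNet(nn.Module):
        def __init__(self, in_ch=3, hidden=64, num_classes=10, iterations=4):
            super().__init__()
            self.iterations = iterations
            # shallow feedforward stem (illustrative sizes)
            self.encode = nn.Sequential(
                nn.Conv2d(in_ch, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(8),
            )
            # recurrent core shared across iterations; an LSTMCell stands in
            # for whatever recurrent module a full model would use
            self.core = nn.LSTMCell(hidden * 8 * 8, hidden)
            # shared prediction head applied after every iteration
            self.readout = nn.Linear(hidden, num_classes)

        def forward(self, x):
            feat = self.encode(x).flatten(1)  # same features re-read each iteration
            h = x.new_zeros(x.size(0), self.core.hidden_size)
            c = torch.zeros_like(h)
            outputs = []
            for _ in range(self.iterations):
                # the hidden state carries the previous iteration's result,
                # so the representation is refined iteratively via feedback
                h, c = self.core(feat, (h, c))
                outputs.append(self.readout(h))  # a prediction per iteration
            return outputs  # outputs[0] is the earliest, outputs[-1] the final

    model = FeedbackNet()
    early, *_, final = model(torch.randn(2, 3, 32, 32))

In training, a loss can be attached to every element of outputs so that intermediate iterations learn to predict as well; assigning coarser taxonomy labels as targets for earlier iterations is one plausible way to realize the hierarchical-label and Curriculum Learning advantages named in the abstract.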
