Very Fast Online Learning of Highly Non Linear Problems

Journal of Machine Learning Research, 2007

Abstract

The experimental investigation of the efficient learning of highly non-linear problems by online training, using ordinary feedforward neural networks and stochastic gradient descent on the errors computed by back-propagation, gives evidence that the most crucial factors for efficient training are the differentiation of the hidden units, the attenuation of interference between the hidden units, and selective attention to the parts of the problem where the approximation error remains high. In this report, we present global and local selective attention techniques and a new hybrid activation function that enables the hidden units to acquire individual receptive fields, which may be global or local depending on the problem's local complexities. The presented techniques enable very efficient training on complex classification problems with embedded subproblems.
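
The abstract names two mechanisms without detailing them here: a hybrid activation function whose response can be global or local per unit, and selective attention that concentrates updates where the error remains high. The sketch below is only an illustration of how such mechanisms could look, not the paper's actual formulation: hybrid_activation blends a sigmoidal (global) response with a Gaussian (local) one through an assumed per-unit mixing parameter alpha, and select_sample implements one plausible form of global selective attention by drawing training examples in proportion to their current error. All names and the specific blend are hypothetical.

    import numpy as np

    def hybrid_activation(x, w, c, sigma, alpha):
        # Blend a global (projection-based) response with a local
        # (distance-based) one. alpha is a per-unit mixing parameter:
        # alpha near 0 gives a global receptive field, alpha near 1 a
        # local one. The specific blend is an assumption for
        # illustration, not the paper's actual hybrid function.
        global_part = np.tanh(np.dot(x, w))
        local_part = np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))
        return (1.0 - alpha) * global_part + alpha * local_part

    def select_sample(errors, rng):
        # A simple form of global selective attention: draw a training
        # example with probability proportional to its current
        # approximation error, so high-error parts of the problem
        # receive more updates.
        p = errors / errors.sum()
        return rng.choice(len(errors), p=p)

    # Usage: example 1, which has the largest error, is drawn most often.
    rng = np.random.default_rng(0)
    errors = np.array([0.1, 0.9, 0.4])
    i = select_sample(errors, rng)

Error-proportional sampling is only one reading of "selective attention"; the paper's own global and local techniques may weight or gate updates differently.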
