Online Learning from Finite Training Sets and Robustness to Input Bias
https://doi.org/10.1162/089976698300017034

Abstract
We analyse online gradient descent learning from finite training sets at non-infinitesimal learning rates for both linear and non-linear networks. In the linear case, exact results are obtained for the time-dependent generalization error of networks with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size α on, for example, the optimal choice of learning rate η. We also compare online and offline learning, for respective optimal settings of η at given final learning time. Online learning turns out to be much more robust to input bias and actually outperforms offline learning when such bias is present; for unbiased inputs, online and offline learning perform almost equally well. Our analysis of online learning for non-linear networks (namely, soft committee machines) advances the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
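As a concrete illustration of the setting described above, here is a minimal Python sketch of online versus offline (batch) gradient descent for a linear student learning from a linear teacher on a finite training set of p = αN examples. The 1/√N output scaling, the generalization-error formula, and all parameter values (N, alpha, eta, the input bias m, the run length T) are illustrative assumptions, not the paper's exact formulation or optimal settings; the paper's results are derived analytically in the large-N limit rather than by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100                 # number of weights (large in the paper's analysis)
alpha = 2.0             # training set size per weight, p = alpha * N
p = int(alpha * N)
eta = 0.1               # non-infinitesimal learning rate (illustrative value)
T = 20                  # training time in epochs; one epoch ~ p online steps
m = 0.0                 # input bias; set m != 0 to bias every input component

# Teacher and students are linear networks: y(x) = w . x / sqrt(N)
w_teacher = rng.standard_normal(N)
w_online = np.zeros(N)
w_offline = np.zeros(N)

# Fixed finite training set (possibly biased inputs)
X = rng.standard_normal((p, N)) + m
y = X @ w_teacher / np.sqrt(N)

def gen_error(w):
    # For unbiased unit-variance test inputs, E_g = |w - w_teacher|^2 / (2N)
    d = w - w_teacher
    return d @ d / (2 * N)

# Online learning: one randomly drawn training example per update
for step in range(T * p):
    i = rng.integers(p)
    err = y[i] - X[i] @ w_online / np.sqrt(N)
    w_online += (eta / np.sqrt(N)) * err * X[i]

# Offline (batch) learning: gradient of the average training error per epoch
for epoch in range(T):
    err = y - X @ w_offline / np.sqrt(N)
    w_offline += (eta / (p * np.sqrt(N))) * X.T @ err

print(f"online  E_g = {gen_error(w_online):.4f}")
print(f"offline E_g = {gen_error(w_offline):.4f}")
```

Sweeping eta at a fixed final time, and turning on the bias m, qualitatively reproduces the comparison made in the abstract: with unbiased inputs the two algorithms end up close, while biased inputs degrade the offline run more severely.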