On the Generalization Ability of On-Line Learning Algorithms
2004, IEEE Transactions on Information Theory
Abstract
In this paper, we study the generalization properties of online-learning-based stochastic methods for supervised learning problems in which the loss function depends on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to provide Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al. (2012) for the same problem. Using our decoupling technique, we further obtain fast convergence rates for strongly convex pairwise loss functions. We also analyze a class of memory-efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, to complement our generalization bounds, we propose a novel memory-efficient online learning algorithm for higher-order learning problems with bounded regret guarantees.
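To make the idea of updating from a bounded subset of past samples concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a linear scorer, a pairwise hinge loss for labels in {-1, +1}, a step size decaying as 1/sqrt(t), and a buffer maintained by classic reservoir sampling (Vitter, 1985). The names `reservoir_update`, `pairwise_hinge_subgrad`, and `online_pairwise_learner` are hypothetical and chosen only for illustration.

```python
import random


def reservoir_update(buffer, item, t, capacity, rng):
    """Reservoir sampling: after t items, each one sits in the buffer
    with probability capacity / t (illustrative buffer policy)."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = rng.randrange(t)  # uniform over {0, ..., t - 1}
        if j < capacity:
            buffer[j] = item


def pairwise_hinge_subgrad(w, z, z_old):
    """Subgradient in w of max(0, 1 - s * <w, x - x'>), with s = (y - y') / 2.
    Pairs with equal labels contribute nothing (assumed loss, for illustration)."""
    (x, y), (x_old, y_old) = z, z_old
    s = (y - y_old) / 2
    diff = [a - b for a, b in zip(x, x_old)]
    margin = s * sum(wi * di for wi, di in zip(w, diff))
    if s == 0 or margin >= 1:
        return [0.0] * len(w)
    return [-s * di for di in diff]


def online_pairwise_learner(stream, dim, capacity=8, step=0.1, seed=0):
    """Online gradient descent in which each new sample is paired only with
    the samples currently held in a bounded buffer."""
    rng = random.Random(seed)
    w = [0.0] * dim
    buffer = []
    for t, z in enumerate(stream, start=1):
        if buffer:
            grad = [0.0] * dim
            for z_old in buffer:
                g = pairwise_hinge_subgrad(w, z, z_old)
                grad = [a + b for a, b in zip(grad, g)]
            eta = step / (t ** 0.5)  # assumed step-size schedule
            w = [wi - eta * gi / len(buffer) for wi, gi in zip(w, grad)]
        reservoir_update(buffer, z, t, capacity, rng)
    return w, buffer


# Toy usage: positives at x = (1, 0), negatives at x = (-1, 0).
stream = [([float(y), 0.0], y) for y in [1, -1] * 25]
w, buf = online_pairwise_learner(stream, dim=2, capacity=8)
```

Memory use is fixed by `capacity` regardless of stream length, which is the property the abstract refers to. The choice of buffer policy and loss here is an assumption of the sketch.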
References (27)
- Agarwal, Shivani and Niyogi, Partha. Generalization Bounds for Ranking Algorithms via Algorithmic Stability. JMLR, 10:441-474, 2009.
- Balcan, Maria-Florina and Blum, Avrim. On a Theory of Learning with Similarity Functions. In ICML, pp. 73-80, 2006.
- Bellet, Aurélien, Habrard, Amaury, and Sebban, Marc. Similarity Learning for Provably Accurate Sparse Linear Classification. In ICML, 2012.
- Brefeld, Ulf and Scheffer, Tobias. AUC Maximizing Support Vector Learning. In ICML workshop on ROC Analysis in Machine Learning, 2005.
- Cao, Qiong, Guo, Zheng-Chu, and Ying, Yiming. Generalization Bounds for Metric and Similarity Learning, 2012. arXiv:1207.5437.
- Cesa-Bianchi, Nicolò and Gentile, Claudio. Improved Risk Tail Bounds for On-Line Algorithms. IEEE Trans. on Inf. Theory, 54(1):386-390, 2008.
- Cesa-Bianchi, Nicolò, Conconi, Alex, and Gentile, Claudio. On the Generalization Ability of On-Line Learning Algorithms. In NIPS, pp. 359-366, 2001.
- Clémençon, Stéphan, Lugosi, Gábor, and Vayatis, Nicolas. Ranking and Empirical Minimization of U-statistics. Annals of Statistics, 36:844-874, 2008.
- Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Generalization Bounds for Learning Kernels. In ICML, pp. 247-254, 2010a.
- Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Two-Stage Learning Kernel Algorithms. In ICML, pp. 239-246, 2010b.
- Cristianini, Nello, Shawe-Taylor, John, Elisseeff, André, and Kandola, Jaz S. On Kernel-Target Alignment. In NIPS, pp. 367-373, 2001.
- Freedman, David A. On Tail Probabilities for Martingales. Annals of Probability, 3(1):100-118, 1975.
- Hazan, Elad, Kalai, Adam, Kale, Satyen, and Agarwal, Amit. Logarithmic Regret Algorithms for Online Convex Optimization. In COLT, pp. 499-513, 2006.
- Jin, Rong, Wang, Shijun, and Zhou, Yang. Regularized Distance Metric Learning: Theory and Algorithm. In NIPS, pp. 862-870, 2009.
- Kakade, Sham M. and Tewari, Ambuj. On the Generalization Ability of Online Strongly Convex Programming Algorithms. In NIPS, pp. 801-808, 2008.
- Kakade, Sham M., Sridharan, Karthik, and Tewari, Ambuj. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization. In NIPS, 2008.
- Kakade, Sham M., Shalev-Shwartz, Shai, and Tewari, Ambuj. Regularization Techniques for Learning with Matrices. JMLR, 13:1865-1890, 2012.
- Kumar, Abhishek, Niculescu-Mizil, Alexandru, Kavukcuoglu, Koray, and Daumé III, Hal. A Binary Classification Framework for Two-Stage Multiple Kernel Learning. In ICML, 2012.
- Ledoux, Michel and Talagrand, Michel. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 2002.
- Sridharan, Karthik, Shalev-Shwartz, Shai, and Srebro, Nathan. Fast Rates for Regularized Objectives. In NIPS, pp. 1545-1552, 2008.
- Steinwart, Ingo and Christmann, Andreas. Support Vector Machines. Information Science and Statistics. Springer, 2008.
- Vitter, Jeffrey Scott. Random Sampling with a Reservoir. ACM Trans. on Math. Soft., 11(1):37-57, 1985.
- Wang, Yuyang, Khardon, Roni, Pechyony, Dmitry, and Jones, Rosie. Generalization Bounds for Online Learning Algorithms with Pairwise Loss Functions. JMLR - Proceedings Track, 23:13.1-13.22, 2012.
- Wang, Yuyang, Khardon, Roni, Pechyony, Dmitry, and Jones, Rosie. Online Learning with Pairwise Loss Functions, 2013. arXiv:1301.5332.
- Xing, Eric P., Ng, Andrew Y., Jordan, Michael I., and Russell, Stuart J. Distance Metric Learning with Application to Clustering with Side-Information. In NIPS, pp. 505-512, 2002.
- Zhao, Peilin, Hoi, Steven C. H., Jin, Rong, and Yang, Tianbao. Online AUC Maximization. In ICML, pp. 233-240, 2011.
- Zinkevich, Martin. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, pp. 928-936, 2003.