Large Margin Classification for Moving Targets

2002, Lecture Notes in Computer Science

https://doi.org/10.1007/3-540-36169-3_11

Abstract

We consider using online large margin classification algorithms in a setting where the target classifier may change over time. The algorithms we consider are Gentile's ALMA, and an algorithm we call NORMA which performs a modified online gradient descent with respect to a regularised risk. The update rule of ALMA includes a projection-based regularisation step, whereas NORMA has a weight-decay type of regularisation. For ALMA we can prove mistake bounds in terms of the total distance the target moves during the trial sequence. For NORMA, we need the additional assumption that the movement rate stays sufficiently low uniformly over time. In addition to the movement of the target, the mistake bounds for both algorithms depend on the hinge loss of the target. Both algorithms use a margin parameter which can be tuned to make them mistake-driven (update only when a classification error occurs) or more aggressive (update when the confidence of the classification is below the margin). We get similar mistake bounds both for the mistake-driven and a suitable aggressive tuning. Experiments on artificial data confirm that an aggressive tuning is often useful even if the goal is just to minimise the number of mistakes.
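To make the flavour of the update rules concrete, the sketch below shows one NORMA-style step for a linear classifier: online gradient descent on a regularised hinge loss, with weight decay acting as the regulariser and a margin parameter rho switching between the mistake-driven tuning (rho = 0) and an aggressive tuning (rho > 0). This is a minimal illustration assuming a linear, non-kernelised setting; the names norma_step, eta, lam and rho are illustrative and not the paper's notation.

    import numpy as np

    def norma_step(w, x, y, eta=0.1, lam=0.01, rho=0.0):
        """One NORMA-style update: online gradient descent on a
        regularised hinge loss for a linear classifier.

        rho = 0  -> mistake-driven: update only on a misclassification
        rho > 0  -> aggressive: update whenever the confidence y*<w,x> <= rho
        """
        confidence = y * np.dot(w, x)
        w = (1.0 - eta * lam) * w      # weight-decay (regularisation) step
        if confidence <= rho:          # margin violated: hinge-loss gradient step
            w = w + eta * y * x
        return w

ALMA's projection-based regularisation would instead project the weight vector back onto a fixed-norm ball rather than shrinking it multiplicatively; that variant is omitted here.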

References (15)

  1. P. Auer, N. Cesa-Bianchi and C. Gentile. Adaptive and self-confident on-line learning algorithms. Technical Report NC-TR-00-083, NeuroCOLT, 2000.
  2. P. Auer and M. K. Warmuth. Tracking the best disjunction. Machine Learning, 32(2):127-150, August 1998.
  3. P. Bartlett and J. Shawe-Taylor. Generalization performance of support vector machines and other pattern classifiers. In B. Schölkopf, C. J. C. Burges and A. J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 43-54. MIT Press, 1999.
  4. L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200-217, 1967.
  5. Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277-296, 1999.
  6. C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213-242, December 2001.
  7. C. Gentile and N. Littlestone. The robustness of the p-norm algorithms. In Proc. 12th Annu. Conf. on Comput. Learning Theory, pages 1-11. ACM Press, New York, NY, 1999.
  8. A. J. Grove, N. Littlestone and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3):173-210, 2001.
  9. M. Herbster. Learning additive models online with fast evaluating kernels. In D. Helmbold and B. Williamson, editors, Proc. 14th Annu. Conf. on Comput. Learning Theory, pages 444-460. Springer LNAI 2111, Berlin, July 2001.
  10. M. Herbster and M. K. Warmuth. Tracking the best linear predictor. Journal of Machine Learning Research, 1:281-309, September 2001.
  11. J. Kivinen, A. J. Smola and R. C. Williamson. Online learning with kernels. In T. G. Dietterich, S. Becker and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 785-792. MIT Press, Cambridge, MA, 2002.
  12. Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Machine Learning, 46(1):361-387, January 2002.
  13. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285-318, 1988.
  14. C. Mesterharm. Tracking linear-threshold concepts with Winnow. In J. Kivinen and R. Sloan, editors, Proc. 15th Annu. Conf. on Comput. Learning Theory, pages 138-152. Springer LNAI 2375, Berlin, July 2002.
  15. A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615-622. Polytechnic Institute of Brooklyn, 1962.