Learning with idealized kernels
2003
Abstract
The kernel function plays a central role in kernel methods. Existing methods typically fix the functional form of the kernel in advance and then adapt only the associated kernel parameters based on empirical data. In this paper, we consider the problem of adapting the kernel so that it becomes more similar to the so-called ideal kernel. We formulate this as a distance metric learning problem that searches for a suitable linear transform (feature weighting) in the kernel-induced feature space. This formulation is applicable even when the training set provides only examples of similar and dissimilar pairs, rather than explicit class labels. Computationally, it leads to a local-optima-free quadratic programming problem, with the number of variables independent of the number of features. We evaluate the method on classification and clustering tasks on both toy and real-world data sets.
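For concreteness, here is a minimal sketch of the "ideal kernel" the abstract refers to and of the alignment score commonly used to measure how close a given kernel is to it, assuming the usual definition from the kernel-target alignment literature (K*_ij = 1 when x_i and x_j share a class, 0 otherwise). The helper names `ideal_kernel` and `alignment` are illustrative, not the paper's implementation, and the quadratic program mentioned above is not reproduced here.

```python
import numpy as np

def ideal_kernel(y):
    """Ideal kernel: K*_ij = 1 if labels y_i and y_j match, else 0
    (one common definition; with only similar/dissimilar pair
    information, just the known (i, j) entries would be filled in)."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def alignment(K1, K2):
    """Empirical alignment <K1, K2>_F / (||K1||_F ||K2||_F)."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

# Toy example: alignment of a linear kernel with the ideal kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = np.array([0, 0, 0, 1, 1, 1])
K = X @ X.T  # linear kernel on the toy data
print(alignment(K, ideal_kernel(y)))
```

Intuitively, a kernel transform that increases this alignment makes same-class points look more similar and different-class points less so, which is the effect the metric learning formulation above aims for.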