Lecture 11: Support Vector Machines
2004
Abstract
11.1 Overview

In this lecture we describe methods for performing a binary classification task on linearly non-separable data by means of linear classification. We first explore the linear problem and the mathematical methods used to solve it. We then generalize so that we can deal with more complex, noisy, or non-linear situations by embedding the input data into a higher-dimensional feature space in which the data is separable (the concept is demonstrated in Figure 11.1). This will be accomplished using a “kernel trick”.
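To illustrate the idea, here is a minimal sketch (assuming scikit-learn, which the lecture itself does not use): a linear SVM cannot separate concentric-circle data in the input space, while an RBF-kernel SVM, which implicitly embeds the data in a higher-dimensional feature space, separates it easily.

```python
# Minimal illustration of the kernel trick (assumes scikit-learn is installed).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("linear kernel training accuracy:", linear_svm.score(X, y))  # near chance level
print("RBF kernel training accuracy:", rbf_svm.score(X, y))        # close to 1.0
```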
Related papers
We explain the support vector machine algorithm, and its extension, the kernel method, for machine learning with small datasets. We also briefly discuss the Vapnik-Chervonenkis theory, which forms the theoretical foundation of machine learning. This review is based on lectures given by the second author.
Wiley Interdisciplinary Reviews: Computational Statistics, 2009
Support vector machines (SVMs) are a family of machine learning methods, originally introduced for the problem of classification and later generalized to various other situations. They are based on principles of statistical learning theory and convex optimization, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision.
IEEE Transactions on Neural Networks and Learning Systems, 2013
The paper considers the classification problem using Support Vector Machines, and investigates how to maximally reduce the size of the training set without losing information. Under separable-dataset assumptions, we derive the exact conditions stating which observations can be discarded without diminishing the overall information content. For this purpose, we introduce the concept of Potential Support Vectors, i.e., those data points that can become Support Vectors when future data become available. Complementarily, we also characterize the set of Discardable Vectors, i.e., those data points that, given the current dataset, can never become Support Vectors. These vectors are thus useless for future training purposes and can be removed without loss of information. We then provide an efficient algorithm based on linear programming which returns the potential and discardable vectors by constructing a simplex tableau. Finally, we compare it with alternative algorithms available in the literature on some synthetic data as well as on datasets from standard repositories.
Neurocomputing, 2003
Support vector machines (SVMs) are currently a very active research area within machine learning. Motivated by statistical learning theory, SVMs have been successfully applied to numerous tasks, among others in data mining, computer vision, and bioinformatics. SVMs are examples of a broader category of learning approaches which utilize the concept of kernel substitution, which makes the task of learning more tractable by exploiting an implicit mapping into a high-dimensional space. SVMs have many appealing properties for machine learning. For example, the classic SVM learning task involves convex quadratic programming, a problem that does not suffer from the 'local minima' problem and whose solution may easily be found by using one of the many especially efficient algorithms developed for it in optimization theory. Furthermore, recently developed model selection strategies can be applied, so that few, if any, learning parameters need to be set by the operator. Above all, they have been found to work very well in practice.
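To make the quadratic-programming and kernel-substitution remarks concrete, the standard soft-margin SVM dual can be sketched as follows (textbook form, included here for illustration rather than taken from the cited paper):

\[
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)
\quad \text{subject to}\quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0 .
\]

The objective is concave and the constraints are linear, so the problem is a convex quadratic program with no local-minima issue; the kernel K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ replaces every inner product, so the high-dimensional mapping φ never has to be computed explicitly.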
Support vector machines (SVMs) appeared in the early nineties as optimal margin classifiers in the context of Vapnik's statistical learning theory. Since then SVMs have been successfully applied to real-world data analysis problems, often providing improved results compared with other techniques. The SVMs operate within the framework of regularization theory by minimizing an empirical risk in a well-posed and consistent way. A clear advantage of the support vector approach is that sparse solutions to classification and regression problems are usually obtained: only a few samples are involved in the determination of the classification or regression functions. This fact facilitates the application of SVMs to problems that involve a large amount of data, such as text processing and bioinformatics tasks. This paper is intended as an introduction to SVMs and their applications, emphasizing their key features. In addition, some algorithmic extensions and illustrative real-world applications of SVMs are shown.
Classifying biological data is a common task in the biomedical context. Predicting the class of new, unknown information allows researchers to gain insight and make decisions based on the available data. Also, using classification methods often implies choosing the best parameters to obtain optimal class separation, and the number of parameters might be large in biological datasets. Support Vector Machines provide a well-established and powerful classification method to analyse data and find the minimal-risk separation between different classes. Finding that separation strongly depends on the available feature set and the tuning of hyper-parameters. Techniques for feature selection and SVM parameter optimization are known to improve classification accuracy, and the literature on them is extensive. In this paper we review the strategies that are used to improve the classification performance of SVMs and perform our own experimentation to study the influence of features and hyper-parameters in the optimization process, using several known kernels.
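As a small illustration of the hyper-parameter tuning discussed above (a hedged sketch assuming scikit-learn; the dataset and parameter grid are illustrative and not taken from the paper):

```python
# Tune the C and gamma hyper-parameters of an RBF-kernel SVM by
# cross-validated grid search (assumes scikit-learn is installed).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Standardize features first, since SVMs are sensitive to feature scale.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10, 100],
              "svc__gamma": [1e-3, 1e-2, 1e-1, 1]}

search = GridSearchCV(model, param_grid, cv=5).fit(X, y)
print("best hyper-parameters:", search.best_params_)
print("cross-validated accuracy: %.3f" % search.best_score_)
```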
1999
In this report we show some simple properties of SVM for regression (SVMR). In particular, we show that for ε close to zero, minimizing the norm of w is equivalent to maximizing the distance between the optimal approximating hyperplane solution of SVMR and the closest points in the data set. So, in this case, there exists a complete analogy between SVM for regression and classification, and the ε-tube plays the same role as the margin between classes. Moreover, we show that for every ε the set of support vectors found by SVMR is linearly separable in the feature space and the optimal approximating hyperplane is a separator for this set. As a consequence, we show that for every regression problem there exists a classification problem which is linearly separable in the feature space. This is due to the fact that the solution of SVMR separates the set of support vectors into two classes: the support vectors living above and the ones living below the optimal approximating hyperplane solution of SVMR. The position of the support vectors with respect to the hyperplane is given by the sign of (α_i − α_i*). Finally, we present a simple algorithm for obtaining a sparser representation of the optimal approximating hyperplane by using SVM for classification.
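For context, the role of the sign of (α_i − α_i*) can be read off the standard SVR expansion (textbook form, stated here for clarity and not quoted from the report):

\[
f(x) \;=\; \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\,K(x_i, x) \;+\; b,
\qquad \alpha_i,\ \alpha_i^{*} \ge 0,\ \ \alpha_i\,\alpha_i^{*} = 0 .
\]

A support vector with α_i − α_i* > 0 lies on or above the upper edge of the ε-tube, while one with α_i − α_i* < 0 lies on or below the lower edge, which is exactly the two-class split described above.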
IEEE Transactions on Cybernetics, 2014
We propose a novel nonparallel classifier, named the nonparallel support vector machine (NPSVM), for binary classification. Totally different from existing nonparallel classifiers, such as the generalized eigenvalue proximal support vector machine (GEPSVM) and the twin support vector machine (TWSVM), our NPSVM has several distinct advantages: (1) two primal problems are constructed implementing the structural risk minimization principle; (2) the dual problems of these two primal problems have the same advantages as those of standard SVMs, so the kernel trick can be applied directly, while existing TWSVMs have to construct another two primal problems for nonlinear cases based on approximate kernel-generated surfaces; furthermore, their nonlinear problems cannot degenerate to the linear case even when the linear kernel is used; (3) the dual problems have the same elegant formulation as that of standard SVMs and can be solved efficiently by the sequential minimal optimization (SMO) algorithm, while existing GEPSVMs or TWSVMs are not suitable for large-scale problems; (4) it has the inherent sparseness of standard SVMs; (5) existing TWSVMs are only special cases of the NPSVM when its parameters are appropriately chosen. Experimental results on many data sets show the effectiveness of our method in both sparseness and classification accuracy, further confirming the above conclusions. In some sense, our NPSVM is a new starting point for nonparallel classifiers.
Support Vector Machines have acquired a central position in the fields of Machine Learning and Pattern Recognition over the past decade and are known to deliver state-of-the-art performance in applications such as text categorization, hand-written character recognition, and bio-sequence analysis. In this article we provide a gentle introduction to the workings of Support Vector Machines (also known as SVMs) and attempt to provide some insight into the learning mechanisms involved. We begin with a general introduction to mathematical learning and move on to discuss the learning framework used by the SVM architecture.
