Faster Directions for Second Order SMO
2010, Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-642-15822-3_4…
Abstract
Second order SMO represents the state of the art in SVM training for moderate-size problems. In it, the solution is attained by solving a series of subproblems, each optimized with respect to just a pair of multipliers. In this paper we illustrate how SMO works in a two-stage fashion, first setting the values of the bounded multipliers to the penalty factor C and then adjusting the non-bounded multipliers. Moreover, during this second stage the pairs selected for update often appear repeatedly over the course of the algorithm. Taking advantage of this, we propose a procedure to combine previously used descent directions that results in far fewer iterations in this second stage and that may also lead to noticeable savings in kernel operations.
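For orientation, the sketch below illustrates the kind of second-order pair selection underlying this family of solvers, following the standard maximal-gain rule used in LIBSVM-style implementations; the variable names (alpha, grad, K, C, tau) are illustrative and the paper's specific two-stage procedure is not reproduced here.

import numpy as np

def select_pair_second_order(alpha, grad, y, K, C, tau=1e-12):
    # Illustrative second-order working-set selection for the dual problem
    #   min 1/2 a'Qa - e'a  s.t.  y'a = 0, 0 <= a <= C,   with grad = Q a - e.
    # Indices still free to move up/down along the equality constraint:
    I_up = np.where(((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1)))[0]
    I_low = np.where(((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1)))[0]

    # First index: maximal first-order violation among I_up.
    i = I_up[np.argmax(-y[I_up] * grad[I_up])]
    g_i = -y[i] * grad[i]

    # Second index: maximal second-order gain among violating partners in I_low.
    best_j, best_gain = -1, 0.0
    for t in I_low:
        b = g_i + y[t] * grad[t]              # first-order decrease of the pair (i, t)
        if b <= 0:
            continue                          # (i, t) is not a violating pair
        a = K[i, i] + K[t, t] - 2.0 * K[i, t]
        a = max(a, tau)                       # guard against non-positive curvature
        gain = 0.5 * b * b / a                # decrease achieved by the exact 2-D step
        if gain > best_gain:
            best_gain, best_j = gain, t
    return i, best_j                          # best_j == -1 means no violating partner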
Related papers
2007 International Joint Conference on Neural Networks, 2007
The Support Vector Machine is a widely employed machine learning model due to its repeatedly demonstrated superior generalization performance. The Sequential Minimal Optimization (SMO) algorithm is one of the most popular SVM training approaches. SMO is fast, as well as easy to implement; however, it has a limited working set size (2 points only). Faster training times can result if the working set size can be increased without significantly increasing the computational complexity. In this paper, we extend the 2-point SMO formulation to a 4-point formulation and address the theoretical issues associated with such an extension. We show that modifying the SMO algorithm to increase the working set size is beneficial in terms of the number of iterations required for convergence, and shows promise for reducing the overall training time.
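As a point of reference for the 4-point extension discussed above, the following sketch shows the classical analytic two-multiplier update of Platt's SMO, which is what a larger working set generalizes; variable names are illustrative and the degenerate-curvature case is handled only in simplified form.

def smo_two_point_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    # Classical analytic SMO step over two multipliers (Platt).
    # E1, E2 are the prediction errors f(x1) - y1 and f(x2) - y2.
    eta = K11 + K22 - 2.0 * K12                # curvature along the feasible line
    if eta <= 0:
        return a1, a2                          # degenerate case: skip (simplified)
    a2_new = a2 + y2 * (E1 - E2) / eta         # unconstrained optimum for a2
    # Bounds L, H keep both multipliers in [0, C] while preserving y1*a1 + y2*a2.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    a2_new = min(max(a2_new, L), H)
    a1_new = a1 + y1 * y2 * (a2 - a2_new)      # restore the equality constraint
    return a1_new, a2_new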
IEEE Transactions on Neural Networks, 2008
Global convergence of the sequential minimal optimization (SMO) algorithm for support vector regression (SVR) is studied in this paper. Given l training samples, SVR is formulated as a convex quadratic programming problem with l pairs of variables. We prove that if two pairs of variables violating the optimality condition are chosen for update in each step and subproblems are solved in a certain way then the SMO algorithm always stops within a finite number of iterations after finding an optimal solution. Also, efficient implementation techniques for the SMO algorithm are presented and compared experimentally with other SMO algorithms.
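For reference, the convex quadratic program with l pairs of variables mentioned above is the standard epsilon-insensitive SVR dual, which in the usual notation reads

\min_{\alpha,\alpha^*} \; \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(x_i,x_j) \;+\; \varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*) \;-\; \sum_{i=1}^{l} y_i(\alpha_i-\alpha_i^*)

\text{s.t.}\quad \sum_{i=1}^{l}(\alpha_i-\alpha_i^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C, \quad i=1,\dots,l.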
We propose a refined gradient ascent method including heuristic parameters for solving the dual problem of nonlinear SVM. Aiming at a better tuning to the particular training sequence, the proposed refinement consists of using heuristically established weights to correct the search direction at each step of the learning algorithm, which evolves in the feature space. We propose three variants for computing the correcting weights, and their effectiveness is analyzed experimentally in the final part of the paper. The tests showed good convergence properties and, moreover, the proposed variants achieved higher convergence rates than Platt's SMO algorithm. The experimental analysis aimed to derive conclusions on the recognition rate as well as on the generalization capacities. The learning phase of the SVM involved linearly separable samples randomly generated from Gaussian distributions and the WINE and WDBC datasets. The generalization capacities in c...
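Purely as an illustration of a weighted correction of the search direction (the three weighting schemes proposed in the paper are not reproduced here), a projected gradient ascent on the SVM dual with a heuristic blending weight could look as follows; all names and the weight rule are hypothetical, and the handling of the equality constraint is omitted.

import numpy as np

def weighted_gradient_ascent(Q, C, steps=100, lr=1e-2, w=0.5):
    # Hypothetical sketch: ascend the dual objective e'a - 1/2 a'Qa with the
    # search direction corrected by a heuristic weight w on the previous
    # direction, then project back onto the box [0, C]^n.
    n = Q.shape[0]
    alpha = np.zeros(n)
    direction = np.zeros(n)
    for _ in range(steps):
        grad = 1.0 - Q @ alpha                 # gradient of the dual objective
        direction = grad + w * direction       # heuristically weighted correction
        alpha = np.clip(alpha + lr * direction, 0.0, C)
    return alpha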
2006
Decomposition methods are currently one of the major methods for training support vector machines. They vary mainly according to different working set selections. Existing implementations and analyses usually consider some specific selection rules. This paper studies sequential minimal optimization type decomposition methods under a general and flexible way of choosing the two-element working set.
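In the standard notation for these analyses, with I_up(\alpha) and I_low(\alpha) the usual index sets of multipliers that can still move up or down along the equality constraint, optimality of the dual problem is characterized by

m(\alpha) \;=\; \max_{i \in I_{up}(\alpha)} -y_i \nabla f(\alpha)_i \;\le\; M(\alpha) \;=\; \min_{j \in I_{low}(\alpha)} -y_j \nabla f(\alpha)_j ,

and any two-element working set \{i, j\} with i \in I_{up}(\alpha), j \in I_{low}(\alpha) and -y_i \nabla f(\alpha)_i > -y_j \nabla f(\alpha)_j is a violating pair eligible for selection.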
ArXiv, 2021
Typically, nonlinear Support Vector Machines (SVMs) produce significantly higher classification quality than linear ones but, at the same time, their computational complexity is prohibitive for large-scale datasets: this drawback is essentially related to the need to store and manipulate large, dense and unstructured kernel matrices. Despite the fact that at the core of training an SVM there is a simple convex optimization problem, the presence of kernel matrices is responsible for a dramatic performance reduction, making SVMs unworkably slow for large problems. Aiming at an efficient solution of large-scale nonlinear SVM problems, we propose the use of the Alternating Direction Method of Multipliers coupled with Hierarchically Semi-Separable (HSS) kernel approximations. As shown in this work, the detailed analysis of the interaction among their algorithmic components unveils a particularly efficient framework and, indeed, the presented experimental results demonstrate ...
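For context, the generic scaled-form ADMM iteration on which such approaches build is, for a splitting \min_x f(x) + g(z) subject to Ax + Bz = c,

x^{k+1} = \arg\min_x \; f(x) + \tfrac{\rho}{2}\|Ax + Bz^k - c + u^k\|^2,
z^{k+1} = \arg\min_z \; g(z) + \tfrac{\rho}{2}\|Ax^{k+1} + Bz - c + u^k\|^2,
u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c,

where, in this setting, the HSS kernel approximation serves to make the linear algebra inside these subproblems tractable; the specific splitting used in the paper is not reproduced here.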
International Journal of Computer Applications, 2010
Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understood, there are many issues to be considered in designing an SVM learner. In particular, for large learning tasks with many training examples, off-the-shelf optimization techniques for general quadratic programs quickly become intractable in their memory and time requirements. Here we propose an algorithm which aims at reducing the learning time. This algorithm is based on the decomposition method proposed by Osuna for optimizing SVMs: it divides the original optimization problem into subproblems that the machine can handle in terms of CPU time and memory storage. The obtained solution is in practice more parsimonious than that found by Osuna's approach in terms of learning time, while offering similar performance.
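A generic decomposition loop of this kind can be sketched as follows; this is an illustrative outline, not the specific algorithm of the paper, and select_working_set and solve_subproblem are placeholders.

import numpy as np

def decomposition_training(K, y, C, select_working_set, solve_subproblem, tol=1e-3):
    # Generic Osuna-style decomposition: repeatedly optimize a small working set B
    # of the dual 1/2 a'Qa - e'a while the remaining multipliers stay fixed.
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K
    alpha = np.zeros(n)
    grad = -np.ones(n)                               # gradient of the dual at alpha = 0
    while True:
        B, violation = select_working_set(alpha, grad, y, C)
        if violation < tol:                          # approximate KKT conditions hold
            break
        old = alpha[B].copy()
        alpha[B] = solve_subproblem(alpha, grad, B, Q, y, C)   # small QP over B only
        grad += Q[:, B] @ (alpha[B] - old)           # incremental gradient update
    return alpha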
2007 European Conference on Power Electronics and Applications, 2007
The authors would like to acknowledge the partial financial support of the Junta de Castilla y León under grants VA004B06 and VA021B06.
2008 IEEE International Conference on Systems, Man and Cybernetics, 2008
Maximizing the classification performance on the training data is a typical procedure in training a classifier. It is well known that training a Support Vector Machine (SVM) requires the solution of an enormous quadratic programming (QP) optimization problem, and the serious challenges posed by such large-scale training can be addressed using Sequential Minimal Optimization (SMO). This paper investigates the performance of the SMO solver in terms of CPU time, number of support vectors and decision boundaries when applied to 2-dimensional datasets. Next, the chunking algorithm is employed for comparison purposes. Initial results demonstrated that the SMO algorithm could enhance performance on the training dataset. Both algorithms yielded similar patterns in the decision boundaries attained, and the classification rates achieved by both solvers are superb.
Lecture Notes in Computer Science, 2008
We propose a new algorithm for training a linear Support Vector Machine in the primal. The algorithm mixes ideas from nonsmooth optimization, subgradient methods, and cutting-plane methods. This yields a fast algorithm that compares well to state-of-the-art algorithms. It is proved to require O(1/λε) iterations to converge to a solution with accuracy ε. Additionally, we provide an exact shrinking method in the primal that allows reducing the complexity of an iteration to much less than O(N), where N is the number of training samples.
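For orientation only, a plain stochastic subgradient step on the regularized primal hinge loss is sketched below; the cutting-plane component and the exact primal shrinking step of the paper are not shown, and all names are illustrative.

import numpy as np

def primal_subgradient_svm(X, y, lam, epochs=10, seed=0):
    # (Sub)gradient descent on P(w) = lam/2 ||w||^2 + (1/N) sum_i max(0, 1 - y_i <w, x_i>).
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for t in range(1, epochs * N + 1):
        i = rng.integers(N)
        eta = 1.0 / (lam * t)                  # standard diminishing step size
        margin = y[i] * X[i].dot(w)            # margin under the current iterate
        w *= 1.0 - eta * lam                   # gradient of the regularization term
        if margin < 1.0:
            w += eta * y[i] * X[i]             # subgradient of the active hinge term
    return w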
Neurocomputing, 2009
Fast SVM training is an important goal for which many proposals have been given in the literature. In this work we will study from a geometrical point of view the presence, in both the Mitchell-Demyanov-Malozemov (MDM) algorithm and Platt's Sequential Minimal Optimization, of training cycles, that is, the repeated selection of some concrete updating patterns. We shall see how to take advantage of these cycles by partially collapsing them in a single updating vector that gives better minimizing directions. We shall numerically illustrate the resulting procedure, showing that it can lead to substantial savings in the number of iterations and kernel operations for both algorithms.
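A minimal sketch of how such a cycle could be detected in the sequence of selected pairs is given below; the helper name is hypothetical, and the actual collapsing of the cycle into a single updating vector (summing the corresponding directions and taking one exact clipped step along the result) is not shown.

def detect_update_cycle(pair_history, max_len=8):
    # Look for a block of recently selected index pairs (i, j) that repeats
    # immediately before itself, i.e. a training "cycle" as described above.
    h = list(pair_history)
    for length in range(2, min(max_len, len(h) // 2) + 1):
        if h[-length:] == h[-2 * length:-length]:
            return h[-length:]        # the repeating block of pairs
    return None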
