Figure 2 where a kernel function, K(x_i, x_j), is applied to allow all necessary computations to be performed directly in the input space (a kernel function K(x_i, x_j) is a function of the inner product between x_i and x_j; it transforms the computation of the inner product <Φ(x_i), Φ(x_j)> in the feature space into a computation of <x_i, x_j> in the input space). Conceptually, the kernel functions map the original data into a higher-dimensional space in which the input data set becomes linearly separable. The choice of kernel function is highly application-dependent and is the most important factor in support vector machine applications. Vapnik [45] showed that training a support vector machine for pattern recognition leads to a quadratic optimization problem with bound constraints and one linear equality constraint (Eq. (2)). Although quadratic optimization is a well-understood class of problems, the size of this particular problem is determined by the number of training examples, so standard quadratic programming solvers quickly become computationally infeasible for large training sets. Different solutions have been proposed for solving the quadratic programming problem in SVMs by exploiting its special properties. These strategies include gradient ascent methods, chunking and decomposition, and Platt's Sequential Minimal Optimization (SMO) algorithm, which takes the chunking approach to its extreme by updating only two parameters at a time [35].
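As a minimal sketch of the kernel trick described above (assuming a degree-2 polynomial kernel K(x_i, x_j) = (<x_i, x_j>)^2 and toy 2-D vectors, both of which are illustrative choices rather than details from this paper), the following Python example shows that evaluating the kernel directly in the input space reproduces the inner product <Φ(x_i), Φ(x_j)> in the higher-dimensional feature space without ever computing Φ explicitly:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input [x1, x2] (illustrative)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

def poly_kernel(xi, xj):
    """Degree-2 polynomial kernel, evaluated directly in the input space."""
    return np.dot(xi, xj) ** 2

# Toy input vectors (hypothetical values, chosen only for illustration).
xi = np.array([1.0, 2.0])
xj = np.array([3.0, 0.5])

# Inner product computed explicitly in the mapped feature space ...
feature_space = np.dot(phi(xi), phi(xj))
# ... equals the kernel value computed in the input space.
input_space = poly_kernel(xi, xj)

print(feature_space, input_space)  # both print 16.0
```

The same identity is what lets an SVM work with very high-dimensional (or infinite-dimensional) feature spaces at the cost of only an input-space kernel evaluation.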