Figure 2 where a kernel function, K(x_i, x_j), is applied to allow all necessary computations to be performed directly in the input space (a kernel function K(x_i, x_j) is a function of the inner product between x_i and x_j; it transforms the computation of the inner product <Φ(x_i), Φ(x_j)> in the feature space into a computation of <x_i, x_j> in the input space). Conceptually, the kernel functions map the original data into a higher-dimensional space in which the input data set becomes linearly separable. The choice of kernel function is highly application-dependent and is the most important factor in support vector machine applications. Vapnik [45] showed that training a support vector machine for pattern recognition leads to a quadratic optimization problem with bound constraints and one linear equality constraint (Eq. (2)). Although quadratic optimization is a well-understood class of problems, the size of this particular problem is determined by the number of training examples, so standard quadratic programming solvers quickly become computationally infeasible for large training sets. Different solutions have been proposed for solving the quadratic programming problem in SVMs by exploiting its special properties. These strategies include gradient ascent methods, chunking and decomposition, and Platt's Sequential Minimal Optimization (SMO) algorithm, which takes the chunking approach to its extreme by updating only two parameters at a time [35].
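As a minimal sketch of the kernel trick described above (assuming a degree-2 polynomial kernel K(x_i, x_j) = (<x_i, x_j>)^2 and toy 2-D vectors, both of which are illustrative choices rather than details from this paper), the following Python example shows that evaluating the kernel directly in the input space reproduces the inner product <Φ(x_i), Φ(x_j)> in the higher-dimensional feature space without ever computing Φ explicitly:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input [x1, x2] (illustrative)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

def poly_kernel(xi, xj):
    """Degree-2 polynomial kernel, evaluated directly in the input space."""
    return np.dot(xi, xj) ** 2

# Toy input vectors (hypothetical values, chosen only for illustration).
xi = np.array([1.0, 2.0])
xj = np.array([3.0, 0.5])

# Inner product computed explicitly in the mapped feature space ...
feature_space = np.dot(phi(xi), phi(xj))
# ... equals the kernel value computed in the input space.
input_space = poly_kernel(xi, xj)

print(feature_space, input_space)  # both print 16.0
```

The same identity is what lets an SVM work with very high-dimensional (or infinite-dimensional) feature spaces at the cost of only an input-space kernel evaluation.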