A Survey of Feature Selection Techniques
Encyclopedia of Data Warehousing and Mining, Second Edition
https://doi.org/10.4018/978-1-60566-010-3.CH289…
3 pages
Abstract
Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the "curse of dimensionality" (Bellman, 1961). The objective of feature selection is to identify the important features in the data set and discard all other features as irrelevant and redundant information. Since feature selection reduces the dimensionality of the data, it allows data mining algorithms to operate faster and more effectively. In some cases, feature selection also improves the performance of the data mining method, mainly because it yields a more compact, easily interpreted representation of the target concept. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed…
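As a rough illustration of the filter idea described in the abstract, the sketch below (our own, in Python with scikit-learn; the chapter does not prescribe a library or score) ranks features by mutual information with the class label, independently of any downstream mining algorithm, and keeps the top k.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Filter step: score each feature against the class label, independently
# of whatever classifier is applied afterwards, and keep the top k.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print("relevance scores:", selector.scores_)
print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_reduced.shape)  # (150, 2)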
Related papers
Bulletin de la Société Royale des Sciences de Liège, 2016
First, feature selection, one of the dimension reduction techniques, is explained. The concepts, principles, and existing feature selection methods for classification and clustering are also described. Then a categorizing framework is developed, consisting of the procedures for finding selected subsets (search-based and non-search-based), the evaluation criteria, and the data mining tasks. When grouping feature selection algorithms, the categorizing framework provides guidelines for selecting the appropriate algorithm(s) for each application. In this categorization, similar algorithms, which follow the same process of finding the selected subset and use the same evaluation criteria, are placed in the same block. Empty blocks indicate that no algorithm has been designed for them, which is a motivation to design new algorithms.
Feature selection is the process of eliminating features from the data set that are irrelevant to the task to be performed. It is important for many reasons, such as simplification, performance, computational efficiency, and feature interpretability, and it can be applied to both supervised and unsupervised learning methodologies. Such techniques can improve the efficiency of various machine learning algorithms, and of training as well. Feature selection speeds up the run time of learning and improves data quality and data understanding.
MATEC Web of Conferences, 2016
Feature subset selection is an essential pre-processing task in data mining. The feature selection process refers to choosing a subset of attributes from the set of original attributes. This technique attempts to identify and remove as much irrelevant and redundant information as possible. In this paper, a new feature subset selection algorithm based on a conditional mutual information approach is proposed to select an effective feature subset. The effectiveness of the proposed algorithm is evaluated by comparing it with other well-known existing feature selection algorithms on standard datasets from UC Irvine and WEKA (Waikato Environment for Knowledge Analysis). The performance of the proposed algorithm is evaluated by multiple criteria that take into account not only the classification accuracy but also the number of selected features. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits distribution and reproduction in any medium, provided the original work is properly cited.
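The paper's exact algorithm is not reproduced on this page; the following sketch shows one plausible reading, a greedy forward selection over discrete features driven by conditional mutual information (a CMIM-style criterion; the function names and toy data are ours).

import numpy as np

def entropy(*cols):
    # Joint Shannon entropy of one or more discrete columns.
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mi(x, y):
    # I(x; y) = H(x) + H(y) - H(x, y)
    return entropy(x) + entropy(y) - entropy(x, y)

def cmi(x, y, z):
    # I(x; y | z) = H(x, z) + H(y, z) - H(x, y, z) - H(z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def select(X, y, k):
    # Start from the single most informative feature, then greedily add the
    # candidate whose information about y, conditioned on the least helpful
    # already-selected feature, is largest (a CMIM-style criterion).
    selected = [int(np.argmax([mi(X[:, j], y) for j in range(X.shape[1])]))]
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        selected.append(max(
            candidates,
            key=lambda j: min(cmi(X[:, j], y, X[:, s]) for s in selected)))
    return selected

# Toy usage: features 0 and 1 jointly determine y; feature 2 is noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 3))
y = X[:, 0] & X[:, 1]
print(select(X, y, 2))  # the two informative features, e.g. [0, 1]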
Given the substantial number of existing feature selection algorithms, the need arises for criteria that enable one to adequately decide which algorithm to use in certain situations. This work reviews several fundamental algorithms found in the literature and assesses their performance in a controlled scenario. A scoring measure ranks the algorithms, taking into account the amount of relevance and redundancy in sample data sets. This measure computes the degree of matching between the output given by the algorithm and the known optimal solution.
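The review's exact scoring measure is not given here. As a hedged stand-in with the same intent, one can score the degree of matching between a selected subset and the known optimal one with a set-overlap F1, where precision penalizes irrelevant or redundant picks and recall rewards covered relevance:

def subset_match_score(selected, relevant):
    # Precision penalizes irrelevant or redundant picks; recall rewards
    # coverage of the truly relevant features.
    selected, relevant = set(selected), set(relevant)
    if not selected or not relevant:
        return 0.0
    precision = len(selected & relevant) / len(selected)
    recall = len(selected & relevant) / len(relevant)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(subset_match_score(selected={0, 1, 7}, relevant={0, 1, 2}))  # ~0.667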
Improving Knowledge Discovery through the Integration of Data Mining Techniques
The key objective of the chapter is to study classification accuracy when using feature selection with machine learning algorithms. Feature selection reduces the dimensionality of the data and improves the accuracy of the learning algorithm. We test how integrated feature selection affects the accuracy of three classifiers by applying several feature selection methods. The filter results show that Information Gain (IG), Gain Ratio (GR), and Relief-F, and the wrapper results show that Bagging and Naive Bayes (NB), enabled the classifiers to achieve the highest average increase in classification accuracy while reducing the number of unnecessary attributes. These conclusions can advise machine learning users which classifier and feature selection methods to use to optimize classification accuracy; this can be especially important in risk-sensitive applications of machine learning, where one aim is to reduce the costs of collecting, processing, and storing unnecessary data.
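The following sketch contrasts the two routes this chapter compares, with scikit-learn as an assumed stand-in: mutual information plays the role of an information-gain-style filter, and SequentialFeatureSelector wraps a Naive Bayes classifier. Relief-F, Gain Ratio, and Bagging from the study have no drop-in equivalents here.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
nb = GaussianNB()

# Filter: score features independently of the classifier, keep the top 10.
X_filter = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper: greedy forward search, scoring each candidate subset by the
# cross-validated accuracy of the Naive Bayes classifier itself.
wrapper = SequentialFeatureSelector(nb, n_features_to_select=10,
                                    direction="forward", cv=5)
X_wrapper = wrapper.fit_transform(X, y)

for name, Xs in [("all", X), ("filter", X_filter), ("wrapper", X_wrapper)]:
    acc = cross_val_score(nb, Xs, y, cv=5).mean()
    print(f"{name:7s} {Xs.shape[1]:2d} features, accuracy ~ {acc:.3f}")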
2008
The use of feature selection can improve the accuracy, efficiency, applicability, and understandability of a learning process and the resulting learner. For this reason, many methods of automatic feature selection have been developed. By modularizing the feature selection process, this paper evaluates a wide spectrum of these methods, along with additional ones created by combining different search and measure modules.
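The modular view can be made concrete: a selector is a search module combined with a measure module, so either can be swapped independently. In this hedged sketch (the names and the correlation measure are ours, not the paper's), any subset-scoring function plugs into a greedy forward search:

from typing import Callable, Sequence
import numpy as np

# A measure module maps (X, y, candidate subset) to a score.
Measure = Callable[[np.ndarray, np.ndarray, Sequence[int]], float]

def greedy_search(X, y, k, measure: Measure):
    # Search module: forward selection driven by an arbitrary measure.
    selected = []
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        selected.append(max(candidates,
                            key=lambda j: measure(X, y, selected + [j])))
    return selected

def correlation_measure(X, y, subset):
    # Measure module: mean absolute correlation of the subset with the target.
    return float(np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset]))

# Swapping in a different measure changes the selector without touching
# the search module, and vice versa.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 2] + X[:, 4] + rng.normal(size=200)
print(greedy_search(X, y, 2, correlation_measure))  # expected: [2, 4]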
International Journal of Signal Processing, Image Processing and Pattern Recognition, 2014
Feature selection has been a focus of interest for quite some time, and a great deal of work has been done in this field. As databases grow in volume, machine learning techniques that scale accordingly are required, which drives the demand for feature selection. Feature selection is a commonly used data preprocessing method in data mining that scales to large data sets. In this paper, several kinds of feature selection methods are applied, which may result in different subsets of features depending on the evaluation criterion.
International Journal of Computer Applications Technology and Research, 2016
In recent years, the application of feature selection methods to medical datasets has greatly increased. The challenging task in feature selection is how to obtain an optimal subset of relevant and non-redundant features that gives an optimal solution without increasing the complexity of the modeling task. Thus, there is a need to make practitioners aware of feature selection methods that have been successfully applied to medical data sets and to highlight future trends in this area. The findings indicate that most existing feature selection methods depend on univariate ranking, which does not take into account interactions between variables; overlook the stability of the selection algorithms; and, where they produce good accuracy, employ a larger number of features. Developing a universal method that achieves the best classification accuracy with fewer features thus remains an open research area.
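The univariate-ranking limitation noted above is easy to demonstrate: with an XOR-style target, each feature is useless on its own, so a univariate score ranks both near zero even though together they determine the class exactly (a toy demo with scikit-learn, not from the paper):

import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 2))
y = X[:, 0] ^ X[:, 1]  # XOR: the class depends on the interaction only

# Each feature scores ~0 on its own, yet the pair predicts y perfectly.
print(mutual_info_classif(X, y, discrete_features=True, random_state=0))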
2010
The rapid advance of computer technologies in data processing, collection, and storage has provided unparalleled opportunities to expand capabilities in production, services, communications, and research. However, immense quantities of high-dimensional data renew the challenges to the state-of-the-art data mining techniques. Feature selection is an effective technique for dimension reduction and an essential step in successful data mining applications.
2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015
Feature selection (FS) methods can be used in data pre-processing to achieve efficient data reduction. This is useful for finding accurate data models. Since an exhaustive search for the optimal feature subset is infeasible in most cases, many search strategies have been proposed in the literature. The usual applications of FS are in classification, clustering, and regression tasks. This review considers most of the commonly used FS techniques, with particular emphasis on application aspects. In addition to standard filter, wrapper, and embedded methods, we also provide insight into FS for recent hybrid approaches and other advanced topics.
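To round out the taxonomy mentioned above, an embedded method performs selection as part of model training itself. A minimal sketch, assuming scikit-learn and L1-penalized logistic regression (not code from the review):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # L1 penalties are scale-sensitive

# Training and selection happen together: coefficients driven to exactly
# zero drop the corresponding features from the model.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
kept = np.flatnonzero(model.coef_[0])
print(f"{len(kept)} of {X.shape[1]} features kept:", kept)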
