A Survey of Feature Selection Techniques
Encyclopedia of Data Warehousing and Mining, Second Edition
https://doi.org/10.4018/978-1-60566-010-3.CH289…
3 pages
Abstract
Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the "curse of dimensionality" (Bellman, 1961). The objective of feature selection is to identify the important features in the data set and discard all other features as irrelevant and redundant information. Since feature selection reduces the dimensionality of the data, it allows data mining algorithms to operate faster and more effectively. In some cases, feature selection also improves the performance of the data mining method, mainly because it yields a more compact, easily interpreted representation of the target concept. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed…
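As a rough illustration of the filter idea described in the abstract, the sketch below (our own, in Python with scikit-learn; the chapter does not prescribe a library or score) ranks features by mutual information with the class label, independently of any downstream mining algorithm, and keeps the top k.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Filter step: score each feature against the class label, independently
# of whatever classifier is applied afterwards, and keep the top k.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print("relevance scores:", selector.scores_)
print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_reduced.shape)  # (150, 2)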
Related papers
Bulletin de la Société Royale des Sciences de Liège, 2016
First, feature selection, one of the dimension reduction techniques, is explained. The concepts, principles, and existing feature selection methods for classification and clustering are also described. Then a categorizing framework is developed, consisting of the procedures for finding selected subsets (search-based and non-search-based), the evaluation criteria, and the data mining tasks. When grouping feature selection algorithms, the categorizing framework provides guidelines for selecting the appropriate algorithm(s) for each application. In this categorization, similar algorithms, which follow the same process of finding the selected subset and use the same evaluation criteria, are placed in the same block. Empty blocks indicate that no algorithm has been designed for them, which is a motivation to design new algorithms.
Feature selection is the process of eliminating features from the data set that are irrelevant to the task to be performed. It is important for many reasons, such as simplification, performance, computational efficiency, and feature interpretability, and it can be applied to both supervised and unsupervised learning methodologies. Such techniques can improve the efficiency of various machine learning algorithms, and of training as well. Feature selection speeds up the run time of learning and improves data quality and data understanding.
MATEC Web of Conferences, 2016
Feature subset selection is an essential pre-processing task in data mining. The feature selection process refers to choosing a subset of attributes from the set of original attributes. This technique attempts to identify and remove as much irrelevant and redundant information as possible. In this paper, a new feature subset selection algorithm based on a conditional mutual information approach is proposed to select an effective feature subset. The effectiveness of the proposed algorithm is evaluated by comparing it with other well-known existing feature selection algorithms on standard datasets from UC Irvine and WEKA (Waikato Environment for Knowledge Analysis). The performance of the proposed algorithm is evaluated by multiple criteria that take into account not only the classification accuracy but also the number of selected features. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits distribution and reproduction in any medium, provided the original work is properly cited.
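The paper's exact algorithm is not reproduced on this page; the following sketch shows one plausible reading, a greedy forward selection over discrete features driven by conditional mutual information (a CMIM-style criterion; the function names and toy data are ours).

import numpy as np

def entropy(*cols):
    # Joint Shannon entropy of one or more discrete columns.
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mi(x, y):
    # I(x; y) = H(x) + H(y) - H(x, y)
    return entropy(x) + entropy(y) - entropy(x, y)

def cmi(x, y, z):
    # I(x; y | z) = H(x, z) + H(y, z) - H(x, y, z) - H(z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def select(X, y, k):
    # Start from the single most informative feature, then greedily add the
    # candidate whose information about y, conditioned on the least helpful
    # already-selected feature, is largest (a CMIM-style criterion).
    selected = [int(np.argmax([mi(X[:, j], y) for j in range(X.shape[1])]))]
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        selected.append(max(
            candidates,
            key=lambda j: min(cmi(X[:, j], y, X[:, s]) for s in selected)))
    return selected

# Toy usage: features 0 and 1 jointly determine y; feature 2 is noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 3))
y = X[:, 0] & X[:, 1]
print(select(X, y, 2))  # the two informative features, e.g. [0, 1]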
Given the substantial number of existing feature selection algorithms, the need arises for criteria that enable one to adequately decide which algorithm to use in certain situations. This work reviews several fundamental algorithms found in the literature and assesses their performance in a controlled scenario. A scoring measure ranks the algorithms, taking into account the amount of relevance and redundancy in sample data sets. This measure computes the degree of matching between the output given by the algorithm and the known optimal solution.
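The review's exact scoring measure is not given here. As a hedged stand-in with the same intent, one can score the degree of matching between a selected subset and the known optimal one with a set-overlap F1, where precision penalizes irrelevant or redundant picks and recall rewards covered relevance:

def subset_match_score(selected, relevant):
    # Precision penalizes irrelevant or redundant picks; recall rewards
    # coverage of the truly relevant features.
    selected, relevant = set(selected), set(relevant)
    if not selected or not relevant:
        return 0.0
    precision = len(selected & relevant) / len(selected)
    recall = len(selected & relevant) / len(relevant)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(subset_match_score(selected={0, 1, 7}, relevant={0, 1, 2}))  # ~0.667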
Improving Knowledge Discovery through the Integration of Data Mining Techniques
The key objective of the chapter is to study classification accuracy when using feature selection with machine learning algorithms. Feature selection reduces the dimensionality of the data and improves the accuracy of the learning algorithm. We test how integrated feature selection affects the accuracy of three classifiers by applying several feature selection methods. The filter results show that Information Gain (IG), Gain Ratio (GR), and Relief-F, and the wrapper results show that Bagging and Naive Bayes (NB), enabled the classifiers to achieve the highest average increase in classification accuracy while reducing the number of unnecessary attributes. These conclusions can advise machine learning users which classifier and feature selection methods to use to optimize classification accuracy; this can be especially important in risk-sensitive applications of machine learning, where one aim is to reduce the costs of collecting, processing, and storing unnecessary data.
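The following sketch contrasts the two routes this chapter compares, with scikit-learn as an assumed stand-in: mutual information plays the role of an information-gain-style filter, and SequentialFeatureSelector wraps a Naive Bayes classifier. Relief-F, Gain Ratio, and Bagging from the study have no drop-in equivalents here.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
nb = GaussianNB()

# Filter: score features independently of the classifier, keep the top 10.
X_filter = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper: greedy forward search, scoring each candidate subset by the
# cross-validated accuracy of the Naive Bayes classifier itself.
wrapper = SequentialFeatureSelector(nb, n_features_to_select=10,
                                    direction="forward", cv=5)
X_wrapper = wrapper.fit_transform(X, y)

for name, Xs in [("all", X), ("filter", X_filter), ("wrapper", X_wrapper)]:
    acc = cross_val_score(nb, Xs, y, cv=5).mean()
    print(f"{name:7s} {Xs.shape[1]:2d} features, accuracy ~ {acc:.3f}")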
2008
The use of feature selection can improve the accuracy, efficiency, applicability, and understandability of a learning process and the resulting learner. For this reason, many methods of automatic feature selection have been developed. By modularizing the feature selection process, this paper evaluates a wide spectrum of these methods, along with additional ones created by combining different search and measure modules.
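The modular view can be made concrete: a selector is a search module combined with a measure module, so either can be swapped independently. In this hedged sketch (the names and the correlation measure are ours, not the paper's), any subset-scoring function plugs into a greedy forward search:

from typing import Callable, Sequence
import numpy as np

# A measure module maps (X, y, candidate subset) to a score.
Measure = Callable[[np.ndarray, np.ndarray, Sequence[int]], float]

def greedy_search(X, y, k, measure: Measure):
    # Search module: forward selection driven by an arbitrary measure.
    selected = []
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        selected.append(max(candidates,
                            key=lambda j: measure(X, y, selected + [j])))
    return selected

def correlation_measure(X, y, subset):
    # Measure module: mean absolute correlation of the subset with the target.
    return float(np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset]))

# Swapping in a different measure changes the selector without touching
# the search module, and vice versa.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 2] + X[:, 4] + rng.normal(size=200)
print(greedy_search(X, y, 2, correlation_measure))  # expected: [2, 4]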
International Journal of Signal Processing, Image Processing and Pattern Recognition, 2014
Feature selection has been a focus of interest for quite some time, and a great deal of work has been done in this field. As databases grow in volume, machine learning techniques that scale accordingly are required, which drives the demand for feature selection. Feature selection is a commonly used data preprocessing method in data mining that scales to large data sets. In this paper, several kinds of feature selection methods are applied, which may result in different subsets of features depending on the evaluation criterion.
International Journal of Computer Applications Technology and Research, 2016
In recent years, the application of feature selection methods to medical datasets has greatly increased. The challenging task in feature selection is how to obtain an optimal subset of relevant and non-redundant features that gives an optimal solution without increasing the complexity of the modeling task. Thus, there is a need to make practitioners aware of feature selection methods that have been successfully applied to medical data sets and to highlight future trends in this area. The findings indicate that most existing feature selection methods depend on univariate ranking, which does not take into account interactions between variables; overlook the stability of the selection algorithms; and, where they produce good accuracy, employ a larger number of features. Developing a universal method that achieves the best classification accuracy with fewer features thus remains an open research area.
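The univariate-ranking limitation noted above is easy to demonstrate: with an XOR-style target, each feature is useless on its own, so a univariate score ranks both near zero even though together they determine the class exactly (a toy demo with scikit-learn, not from the paper):

import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 2))
y = X[:, 0] ^ X[:, 1]  # XOR: the class depends on the interaction only

# Each feature scores ~0 on its own, yet the pair predicts y perfectly.
print(mutual_info_classif(X, y, discrete_features=True, random_state=0))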
2010
The rapid advance of computer technologies in data processing, collection, and storage has provided unparalleled opportunities to expand capabilities in production, services, communications, and research. However, immense quantities of high-dimensional data renew the challenges to the state-of-the-art data mining techniques. Feature selection is an effective technique for dimension reduction and an essential step in successful data mining applications.
2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015
Feature selection (FS) methods can be used in data pre-processing to achieve efficient data reduction. This is useful for finding accurate data models. Since an exhaustive search for the optimal feature subset is infeasible in most cases, many search strategies have been proposed in the literature. The usual applications of FS are in classification, clustering, and regression tasks. This review considers most of the commonly used FS techniques, with particular emphasis on application aspects. In addition to standard filter, wrapper, and embedded methods, we also provide insight into FS for recent hybrid approaches and other advanced topics.
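To round out the taxonomy mentioned above, an embedded method performs selection as part of model training itself. A minimal sketch, assuming scikit-learn and L1-penalized logistic regression (not code from the review):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # L1 penalties are scale-sensitive

# Training and selection happen together: coefficients driven to exactly
# zero drop the corresponding features from the model.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
kept = np.flatnonzero(model.coef_[0])
print(f"{len(kept)} of {X.shape[1]} features kept:", kept)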
