
Rule Induction

954 papers
41 followers
About this topic
Rule induction is a machine learning technique that involves the extraction of useful if-then rules from data. It aims to create a model that can predict outcomes based on input features by identifying patterns and relationships within the dataset.

Key research themes

1. How do search strategies and heuristics influence performance and over-searching in inductive rule learning?

This research theme investigates the impact of different search strategies—hill-climbing, beam search, and exhaustive search—and rule evaluation heuristics on the performance and characteristics of rule induction algorithms. It addresses the over-searching phenomenon, where increasing search effort may deteriorate learning performance, by examining the interplay between search mechanisms and heuristics. Understanding this interaction is critical for optimizing rule learning algorithms to balance theory size, predictive accuracy, and rule generality.
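To make the interplay between search strategy and the rest of a rule learner concrete, here is a minimal sketch in Python of top-down rule refinement with a configurable beam width: a width of 1 corresponds to hill-climbing, a larger width to beam search, and an effectively unbounded width approximates exhaustive search. The helper names (covers, refinements, find_best_rule) and the use of precision as the evaluation heuristic are illustrative assumptions, not the setup of any specific paper below.

```python
# Sketch: greedy top-down rule search where the search strategy is a parameter.
# A rule is a tuple of (attribute, value) equality tests over dict-encoded examples;
# `examples` is a list of (feature_dict, class_label) pairs.

def covers(rule, example):
    """A rule covers an example if every one of its conditions matches."""
    return all(example[attr] == value for attr, value in rule)

def precision(rule, examples, target):
    """Fraction of covered examples that belong to the target class."""
    covered = [cls for ex, cls in examples if covers(rule, ex)]
    return sum(cls == target for cls in covered) / len(covered) if covered else 0.0

def refinements(rule, examples):
    """All specializations of `rule` obtained by adding one more condition."""
    used = {attr for attr, _ in rule}
    return {rule + ((attr, value),)
            for ex, _ in examples
            for attr, value in ex.items()
            if attr not in used}

def find_best_rule(examples, target, beam_width=3, max_conditions=4):
    """beam_width=1 is hill-climbing; a very large beam_width approximates exhaustive search."""
    beam, best = [()], ()
    for _ in range(max_conditions):
        candidates = set().union(*(refinements(r, examples) for r in beam))
        if not candidates:
            break
        beam = sorted(candidates, key=lambda r: precision(r, examples, target),
                      reverse=True)[:beam_width]
        if precision(beam[0], examples, target) > precision(best, examples, target):
            best = beam[0]
    return best

# Example: learn one rule predicting "play" from toy weather records.
data = [({"outlook": "sunny", "windy": False}, "play"),
        ({"outlook": "sunny", "windy": True}, "stay"),
        ({"outlook": "rain",  "windy": False}, "stay")]
print(find_best_rule(data, target="play", beam_width=2))
```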

Key finding: This study demonstrated that the traditionally observed over-searching phenomenon in inductive rule learning depends significantly on the choice of heuristic evaluation function. Exhaustive search tends to find longer but...
Key finding: This paper analyzed key rule learning heuristics—m-estimate, F-measure, and Klösgen measures—characterizing how each parametrically manages the trade-off between rule consistency (accuracy on covered examples) and coverage... (a sketch of these measures follows this list of findings)
Key finding: RuleKit exemplifies a flexible, scalable sequential covering rule induction system that supports extensive customization of rule quality measures (over 40), including user-guided induction and multi-threaded execution...
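To make the consistency/coverage trade-off described in these findings more tangible, the following is an illustrative sketch of three commonly cited rule evaluation heuristics, written as plain functions of a rule's contingency counts (p, n are the positive and negative examples covered by the rule; P, N are the totals). The parametrisations and default parameter values are generic placeholders, not the exact settings used in the papers above.

```python
def m_estimate(p, n, P, N, m=2.0):
    """m-estimate: precision smoothed towards the class prior P/(P+N);
    larger m rewards coverage, smaller m rewards consistency."""
    prior = P / (P + N)
    return (p + m * prior) / (p + n + m)

def f_measure(p, n, P, N, beta=1.0):
    """F-measure over rule precision (p/(p+n)) and recall (p/P);
    beta shifts the balance between the two."""
    if p == 0:
        return 0.0
    prec, rec = p / (p + n), p / P
    return (1 + beta ** 2) * prec * rec / (beta ** 2 * prec + rec)

def kloesgen(p, n, P, N, omega=0.5):
    """Kloesgen measure: coverage raised to omega, times the precision gain over the prior."""
    coverage = (p + n) / (P + N)
    prec = p / (p + n) if (p + n) else 0.0
    return coverage ** omega * (prec - P / (P + N))

# A rule covering 40 of 50 positives and 5 of 100 negatives:
print(m_estimate(40, 5, 50, 100), f_measure(40, 5, 50, 100), kloesgen(40, 5, 50, 100))
```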

2. What methodologies enable effective rule extraction from complex black-box models, particularly support vector machines, enhancing interpretability without compromising performance?

A key challenge in machine learning is extracting comprehensible symbolic rules from high-performance but opaque models like support vector machines (SVMs). This theme explores learning-based and decompositional approaches for rule extraction that convert SVM decision boundaries into human-readable rules, facilitating trust, explanation, and validation especially in high-stakes domains such as medicine. The theme includes evaluation of techniques that treat SVMs as black boxes and generate rule sets approximating SVM predictions while maintaining accuracy.
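As a generic illustration of this black-box (pedagogical) style of extraction, the sketch below trains an SVM, relabels the inputs with the SVM's own predictions, and fits a shallow decision tree as a stand-in for the rule learners mentioned in the findings that follow. It uses scikit-learn purely for convenience and is not the method of any specific paper listed here.

```python
# A minimal sketch of pedagogical (black-box) rule extraction: the SVM's own
# predictions, not the original labels, become the targets for an interpretable
# surrogate whose branches read as if-then rules.
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# 1. Train the opaque model on the real labels.
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# 2. Query it as a black box: relabel the inputs with the SVM's predictions.
#    (A real extraction method would also query synthetic points near the
#    decision boundary to sample it more densely.)
y_svm = svm.predict(X)

# 3. Fit a shallow, readable surrogate on the SVM's labels and print its rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_svm)
print(export_text(surrogate, feature_names=list(data.feature_names)))

# Fidelity: how closely the extracted rules mimic the SVM on this sample.
print("fidelity:", (surrogate.predict(X) == y_svm).mean())
```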

Key finding: This work presented a novel learning-based method for extracting symbolic classification rules from SVMs by treating the SVM as a black box to generate labeled examples, which are then used to train rule-based learners like...
Key finding: This study applied rule induction algorithms (e.g., AQ, CN2, RIPPER) to detect faults from test results on uniform random samples of software configurations. Evaluations on large-scale datasets demonstrate that rule learning...
Key finding: By integrating prior knowledge as existing rule sets and user constraints into the rule induction process, this work proposed a two-step approach of generating rule seeds and specializing them to obtain more accurate rules...

3. How can constructive induction and complex condition formulation extend the expressivity and predictive capacity of rule induction algorithms?

Traditional rule induction algorithms typically generate rules with simple logical conditions, which may limit their ability to capture complex relationships in data. This theme investigates methodologies for constructive induction—creating new features or complex rule conditions such as M-of-N combinations—and how these enhance the descriptive and predictive capabilities of rule learning. The research also addresses practical aspects such as heuristic control and knowledge-driven user guidance to manage combinatorial explosion and improve model interpretability.
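To show what an M-of-N condition adds over a plain conjunctive premise, here is a small illustrative sketch (the attribute names and helper functions are hypothetical): a rule premise that fires when at least M of its N elementary conditions hold, with M = N recovering a conjunction and M = 1 a disjunction.

```python
# An elementary condition is a predicate over one attribute of a dict-encoded example.
def cond(attr, op, value):
    ops = {"==": lambda a, b: a == b,
           ">=": lambda a, b: a >= b,
           "<=": lambda a, b: a <= b}
    return lambda example: ops[op](example[attr], value)

def m_of_n(m, conditions):
    """M-of-N premise: true if at least m of the n elementary conditions hold."""
    return lambda example: sum(c(example) for c in conditions) >= m

# Example: flag a record if at least 2 of these 3 symptoms are present.
rule_premise = m_of_n(2, [cond("fever", ">=", 38.0),
                          cond("cough", "==", True),
                          cond("fatigue", "==", True)])

patient = {"fever": 38.5, "cough": False, "fatigue": True}
print(rule_premise(patient))   # True: 2 of the 3 conditions are satisfied
```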

Key finding: This paper proposed a multistrategy constructive induction framework combining data-driven and hypothesis-driven inference alongside expanding and contracting operations in representation space. The approach simultaneously...
Key finding: The proposed methodology incorporates expert knowledge to guide constructive induction by suggesting new composite features that augment original datasets. By iteratively augmenting data with user-defined features and...
Key finding: This study introduced an extension to sequential covering rule induction algorithms allowing complex and M-of-N conditions in rule premises by analyzing frequent sets of elementary conditions. The approach effectively induced...

All papers in Rule Induction

Intrusion detection systems rely on a wide variety of observable data to distinguish between legitimate and illegitimate activities. In this paper we study one such observable: sequences of system calls into the kernel of an operating... more
The CN2 algorithm induces an ordered list of classification rules from examples using entropy as its search heuristic. In this short paper, we describe two improvements to this algorithm. Firstly, we present the use of the Laplacian error... more
This paper describes an approach being explored to improve the usefulness of machine learning techniques for generating classification rules for complex, real world data. The approach involves the use of genetic algorithms as a "front... more
Vulnerabilities in common security components such as firewalls are inevitable. Intrusion Detection Systems (IDS) are used as another wall to protect computer systems and to identify corresponding vulnerabilities. In this paper, a novel... more
Classification is one of the fundamental tasks of data mining. Most rule induction and decision tree algorithms perform local, greedy search to generate classification rules that are often more complex than necessary. Evolutionary... more
One of the main obstacles facing current intelligent pattern recognition applications is that of dataset dimensionality. To enable these systems to be effective, a redundancy-removing step is usually carried out beforehand. Rough set... more
We present a general rule induction algorithm based on sequential covering, suitable for variable consistency rough set approaches. This algorithm, called VC-DomLEM, can be used for both ordered and non-ordered data. In the case of... more
The integration of Landsat TM and environmental GIS data sets using artificial intelligence rule-induction and decision-tree analysis is shown to facilitate the production of vegetation maps with both floristic and structural information... more
This is a review paper, whose goal is to significantly improve our understanding of the crucial role of attribute interaction in data mining. The main contributions of this paper are as follows. Firstly, we show that the concept of... more
Learning models to classify rarely occurring target classes is an important problem with applications in network intrusion detection, fraud detection, or deviation detection in general. In this paper, we analyze our previously proposed... more
In this chapter we provide a broad overview of selected knowledge management, data mining, and text mining techniques and their use in various emerging biomedical applications. It aims to set the context for subsequent chapters. We first... more
An approach is introduced to combine survey data with multi-agent simulation models of consumer behaviour to study the diffusion process of organic food consumption. This methodology is based on rough set theory, which is able to... more
We propose a new fuzzy rough set approach which, differently from most known fuzzy set extensions of rough set theory, does not use any fuzzy logical connectives (t-norm, t-conorm, fuzzy implication). As there is no rationale for a... more
Inductive characterizations of the sets of terms, the subset of strongly normalizing terms and normal forms are studied in order to reprove weak and strong normalization for the simply-typed λ-calculus and for an extension by sum types... more
Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open ended by forcing us to consider a large number of independent... more
Model transformation (MT) has become an important concern in software engineering. In addition to its role in model-driven development, it is useful in many other situations such as measurement, refactoring, and test-case generation.... more
A dynamic model of group performance is suggested that combines the group learning approach and the combination of contributions approach. Three hypotheses are tested in two experiments, comparing individual training conditions with mixed... more
The inconsistency of information about objects may be the greatest obstacle to performing inductive learning from examples. Rough sets theory provides a new mathematical tool to deal with uncertainty and vagueness. Based on rough sets... more
Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug... more
A rule quality measure is important to a rule induction system for determining when to stop generalization or specialization. Such measures are also important to a rule-based classification procedure for resolving conflicts among rules.... more
In this paper we report on the results of a European survey on business/ICT alignment practices. The goal of this study is to come up with some practical guidelines for managers on how to strive for better alignment of ICT investments... more
Due to the increase in the amount of relational data that is being collected and the limitations of propositional problem definition in relational domains, multi-relational data mining has arisen to be able to extract patterns from... more
Background: Human African trypanosomiasis (HAT), also known as sleeping sickness, is a parasitic tropical disease. It progresses from the first, haemolymphatic stage to a neurological second stage due to invasion of parasites into the... more
Traditionally researchers have used statistical methods to predict medical outcomes. However, statistical techniques do not provide sufficient information for solving problems of high complexity. Recently more attention has turned to a... more
Inductive definitions and rule inductions are two fundamental reasoning tools in logic and computer science. When inductive definitions involve binders, then Barendregt's variable convention is nearly always employed (explicitly or... more
Researchers have embraced a variety of machine learning (ML) techniques in their efforts to improve the quality of learning programs. The recent evolution of hybrid architectures for machine learning systems has resulted in several... more
Multi-relational data mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. Several relational knowledge discovery systems have... more
Soil microbial ecology plays a significant role in global ecosystems. Nevertheless, methods of model prediction and mapping have yet to be established for soil microbial ecology. The present study was undertaken to develop an... more
This paper describes experiments with a challenging data set describing preterm births. The data set, collected at the Duke University Medical Center, was large and, at the same time, many attribute values were missing. However, the... more
Tissue microarrays (TMAs) are a new high-throughput tool for the study of protein expression patterns in tissues and are increasingly used to evaluate the diagnostic and prognostic importance of biomarkers. TMA data are rather challenging... more
Background: Pathway discovery from gene expression data can provide important insight into the relationship between signaling networks and cancer biology. Oncogenic signaling pathways are commonly inferred by comparison with signatures... more
To manage information such as ontologies, categorization based on concept hierarchies is commonly used. Such concept hierarchies are maintained individually for each system because they differ considerably between systems. Consequently, it is difficult... more
We use Backward Chaining Rule Induction (BCRI), a novel data mining method for hypothesizing causative mechanisms, to mine lung cancer gene expression array data for mechanisms that could impact survival. Initially, a supervised learning... more
A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. The subproblem of content selection for object descriptions in task-oriented dialogue... more
We study rule induction from two decision tables as a basis of rough set analysis of more than one decision table. We regard the rule induction process as enumerating minimal conditions satisfied by positive examples but unsatisfied... more
Traditional classification techniques such as decision trees and RIPPER use heuristic search methods to find a small subset of patterns. In recent years, a promising new approach that mainly uses association rule mining in classification... more
In this paper we introduce a method for computing fitness in evolutionary learning systems based on NVIDIA's massive parallel technology using the CUDA library. Both the match process of a population of classifiers against a training set... more
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information... more
This paper focuses on automated procedures to reduce the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure to... more
In this paper we describe an open source tool for automatic induction of transfer rules. Transfer rule induction is carried out on pairs of dependency structures and their node alignment to produce all rules consistent with the node... more
by F. Biscarri and 1 more
This paper proposes a comprehensive framework to detect non-technical losses (NTLs) and recover electrical energy (lost by abnormalities or fraud) by means of a data mining analysis, in the Spanish Power Electric Industry. It is divided... more
Disturbances in supply chains may be either exogenous or endogenous. The ability automatically to detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid... more