Academia.edu

Rule Induction

954 papers
41 followers
About this topic
Rule induction is a machine learning technique that involves the extraction of useful if-then rules from data. It aims to create a model that can predict outcomes based on input features by identifying patterns and relationships within the dataset.

Key research themes

1. How do search strategies and heuristics influence rule learning performance and over-searching in inductive rule induction?

This research theme investigates the impact of different search strategies—hill-climbing, beam search, and exhaustive search—and rule evaluation heuristics on the performance and characteristics of rule induction algorithms. It addresses the over-searching phenomenon, where increasing search effort may deteriorate learning performance, by examining the interplay between search mechanisms and heuristics. Understanding this interaction is critical for optimizing rule learning algorithms to balance theory size, predictive accuracy, and rule generality.

Key finding: This study demonstrated that the traditionally observed over-searching phenomenon in inductive rule learning depends significantly on the choice of heuristic evaluation function. Exhaustive search tends to find longer but...
Key finding: This paper analyzed key rule learning heuristics (m-estimate, F-measure, and Klösgen measures), characterizing how each parametrically manages the trade-off between rule consistency (accuracy on covered examples) and coverage...
Key finding: RuleKit exemplifies a flexible, scalable sequential covering rule induction system that supports extensive customization of rule quality measures (over 40), including user-guided induction and multi-threaded execution....
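
The trade-off between consistency and coverage that these heuristics manage can be illustrated with a minimal sketch. It uses the commonly cited textbook definitions of the m-estimate, the F-measure, and the Klösgen measure, not code from any of the listed papers. The function names and the example counts are assumptions chosen for illustration: `p` and `n` are the positive and negative examples a rule covers, and `P` and `N` are the class totals in the training set.

```python
def precision(p, n):
    """Fraction of the examples covered by a rule that are positive."""
    return p / (p + n) if p + n else 0.0

def m_estimate(p, n, P, N, m=2.0):
    """Precision smoothed toward the class prior P / (P + N); larger m favors coverage."""
    return (p + m * P / (P + N)) / (p + n + m)

def f_measure(p, n, P, N, beta=1.0):
    """Weighted harmonic mean of precision and recall; larger beta favors recall."""
    if p == 0:
        return 0.0
    prec, rec = precision(p, n), p / P
    b2 = beta * beta
    return (1 + b2) * prec * rec / (b2 * prec + rec)

def klosgen(p, n, P, N, omega=0.5):
    """Coverage raised to omega, times the precision gain over the class prior."""
    coverage = (p + n) / (P + N)
    return coverage ** omega * (precision(p, n) - P / (P + N))

# Two hypothetical rules on a data set with 50 positives and 50 negatives.
P, N = 50, 50
specific = (5, 0)    # pure rule that covers only 5 positives
general = (40, 10)   # broader rule that also covers 10 negatives

for m in (0.0, 2.0, 50.0):
    print("m =", m, m_estimate(*specific, P, N, m=m), m_estimate(*general, P, N, m=m))
print("F1:", f_measure(*specific, P, N), f_measure(*general, P, N))
print("Klösgen:", klosgen(*specific, P, N), klosgen(*general, P, N))
```

With `m = 0` the m-estimate reduces to plain precision and prefers the pure but narrow rule; as `m` grows, the estimate is pulled toward the class prior and the broader rule wins. The `beta` and `omega` parameters play the analogous role in the other two measures.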

2. What methodologies enable effective rule extraction from complex black-box models, particularly support vector machines, enhancing interpretability without compromising performance?

A key challenge in machine learning is extracting comprehensible symbolic rules from high-performance but opaque models like support vector machines (SVMs). This theme explores learning-based and decompositional approaches for rule extraction that convert SVM decision boundaries into human-readable rules, facilitating trust, explanation, and validation especially in high-stakes domains such as medicine. The theme includes evaluation of techniques that treat SVMs as black boxes and generate rule sets approximating SVM predictions while maintaining accuracy.

Key finding: This work presented a novel learning-based method for extracting symbolic classification rules from SVMs by treating the SVM as a black box to generate labeled examples, which are then used to train rule-based learners like...
Key finding: This study applied rule induction algorithms (e.g., AQ, CN2, RIPPER) to detect faults from test results on uniform random samples of software configurations. Evaluations on large-scale datasets demonstrate that rule learning...
Key finding: By integrating prior knowledge as existing rule sets and user constraints into the rule induction process, this work proposed a two-step approach of generating rule seeds and specializing them to obtain more accurate rules....
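
The learning-based idea described in this theme (query an opaque model for labels, then fit an interpretable rule learner to those labels) can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of any listed paper: `black_box` is a hypothetical stand-in for a trained SVM, the inputs are assumed to lie in [-1, 1]^2, and the rule learner is a deliberately simple one-condition threshold search rather than AQ, CN2, or RIPPER.

```python
import random

def black_box(x):
    # Hypothetical stand-in for a trained SVM; only its predictions are used.
    return 1 if 2.0 * x[0] + 0.5 * x[1] - 1.0 > 0 else 0

def sample_inputs(count, dims, rng):
    # Draw query points uniformly from [-1, 1]^dims (an assumed input range).
    return [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(count)]

def apply_rule(rule, x):
    # A rule is (feature index, threshold, whether class 1 lies above the threshold).
    feature, threshold, positive_above = rule
    return int((x[feature] > threshold) == positive_above)

def learn_threshold_rule(xs, ys):
    # Exhaustively pick the one-condition rule that best matches the labels.
    best_rule, best_agree = None, -1
    for feature in range(len(xs[0])):
        for threshold in sorted({x[feature] for x in xs}):
            for positive_above in (True, False):
                rule = (feature, threshold, positive_above)
                agree = sum(apply_rule(rule, x) == y for x, y in zip(xs, ys))
                if agree > best_agree:
                    best_rule, best_agree = rule, agree
    return best_rule

def fidelity(rule, xs):
    # Share of inputs on which the rule reproduces the black box's prediction.
    return sum(apply_rule(rule, x) == black_box(x) for x in xs) / len(xs)

rng = random.Random(0)
train = sample_inputs(300, 2, rng)
labels = [black_box(x) for x in train]  # the black box supplies the labels
rule = learn_threshold_rule(train, labels)
test_fidelity = fidelity(rule, sample_inputs(1000, 2, rng))

feature, threshold, positive_above = rule
comparison = ">" if positive_above else "<="
print(f"IF x[{feature}] {comparison} {threshold:.2f} THEN class 1 ELSE class 0")
print(f"fidelity to the black box: {test_fidelity:.2f}")
```

Fidelity, the rate of agreement with the black box on fresh inputs, is measured separately from accuracy on true labels; an extracted rule set can only be trusted as an explanation to the extent that its fidelity is high.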

3. How can constructive induction and complex condition formulation extend the expressivity and predictive capacity of rule induction algorithms?

Traditional rule induction algorithms typically generate rules with simple logical conditions, which may limit their ability to capture complex relationships in data. This theme investigates methodologies for constructive induction—creating new features or complex rule conditions such as M-of-N combinations—and how these enhance the descriptive and predictive capabilities of rule learning. The research also addresses practical aspects such as heuristic control and knowledge-driven user guidance to manage combinatorial explosion and improve model interpretability.

Key finding: This paper proposed a multistrategy constructive induction framework combining data-driven and hypothesis-driven inference alongside expanding and contracting operations in representation space. The approach simultaneously...
Key finding: The proposed methodology incorporates expert knowledge to guide constructive induction by suggesting new composite features that augment original datasets. By iteratively augmenting data with user-defined features and...
Key finding: This study introduced an extension to sequential covering rule induction algorithms allowing complex and M-of-N conditions in rule premises by analyzing frequent sets of elementary conditions. The approach effectively induced...
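
An M-of-N condition holds when at least M of its N elementary conditions are satisfied. The sketch below shows only how such a condition is evaluated, not how the listed papers induce it; the helper name `m_of_n` and the example attributes (`fever`, `cough`, `age`) are hypothetical.

```python
def m_of_n(conditions, m):
    """Build a predicate that holds when at least m of the given conditions hold."""
    def holds(example):
        return sum(1 for condition in conditions if condition(example)) >= m
    return holds

# Three elementary conditions over hypothetical attributes.
conditions = [
    lambda e: e["fever"],
    lambda e: e["cough"],
    lambda e: e["age"] > 60,
]

at_least_two = m_of_n(conditions, 2)    # a 2-of-3 condition
any_of_them = m_of_n(conditions, 1)     # 1-of-N is a plain disjunction
all_of_them = m_of_n(conditions, 3)     # N-of-N is a plain conjunction

patient = {"fever": True, "cough": False, "age": 70}
print(at_least_two(patient), any_of_them(patient), all_of_them(patient))
```

Written with simple conditions only, the 2-of-3 premise above would need three separate conjunctive rules, one per pair of conditions, which is why M-of-N conditions can make a rule set more compact.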

All papers in Rule Induction

Present-day healthcare witnesses a growing demand for coordination of patient care. Coordination is needed especially in those cases in which hospitals have structured healthcare into specialty-oriented units, while a substantial portion...
Currently, the data mining and machine learning fields are facing new challenges because of the amount of information that is collected and needs processing. Many sophisticated learning approaches cannot simply cope with large and complex...
Machine learning is the study of computational methods for improving performance by mechanizing the acquisition of knowledge from experience. Expert performance requires much domain-specific knowledge, and knowledge engineering has...
Small-sized hotels that prevail in the tourist destination of Serbia rarely use any kind of property management or intelligence systems. The issue that pervades throughout this paper is related to the ways in which they can benefit from...
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal...
We address the issue of on-line detection of communication problems in spoken dialogue systems. The usefulness is investigated of the sequence of system question types and the word graphs corresponding to the respective user utterances....
In this paper we study two types of machine learning techniques, rule-induction and memory-based learning, for error detection in spoken dialogue systems. The learners are trained and tested on two tasks: predicting whether the current...
To facilitate the development of speech enabled applications and services, we have been working on an example-based semantic grammar authoring tool. Previous studies have shown that the tool has not only significantly reduced the grammar...
The illegal use of electricity, defective meters, and a malfunctioning infrastructure are major causes of Non-Technical Losses (NTLs) in electric distribution systems. Although the use of supervised machine learning techniques to detect...
Sensitivity to distributional characteristics of sequential linguistic and nonlinguistic stimuli has been shown to play a role in learning the underlying structure of these stimuli. A growing body of experimental and computational...
Fairness, Accountability, Transparency and Explainability have become strong requirements in most practical applications of Artificial Intelligence (AI). Fuzzy sets and systems are recognized world-wide because of their outstanding...
The nested generalized exemplar theory accomplishes learning by storing objects in Euclidean n-space, as hyperrectangles. Classification of new data is performed by computing their distance to the nearest "generalized exemplar" or...
This paper presents the development of the N-Spherical Minimalist Machine Learning (MML) classifier, an innovative model within the Minimalist Machine Learning paradigm. Using N-spherical coordinates and concepts from metaheuristics and...
Discretization is the process of dividing a continuous-valued base attribute into discrete intervals, which highlight distinct patterns in the behavior of a related goal attribute. In this paper, we present an integrated visual...
Acyclic conjunctive queries form a polynomially evaluable fragment of definite non-recursive first-order Horn clauses. Labeled graphs, a special class of relational structures, provide a natural way for representing chemical compounds....
The goal of this research was to examine mechanisms underlying early induction, specifically the relation between induction and categorization. Some researchers argue that even early in development, induction is based on...
In supervised learning the inductive algorithm seeks to develop a conceptual description, or prescriptive model, from examples or objects that have been pre-classified. On the other hand, in unsupervised learning, or clustering, the task...
In this paper we explore the effects of query and database size on news story classification performance. Memory Based Reasoning (MBR) (a k-nearest neighbor method) is used as the classification method. There are 360 different possible...
We live in a world submerged in more information than ever before. We express information or data mathematically and it is growing faster than ever. If the data is imperfect, out of context or otherwise contaminated, it can lead to...
There are inherent open problems arising when developing and running Intelligent Environmental Decision Support Systems (IEDSS). During daily operation of IEDSS several open challenge problems appear. The uncertainty of data being...
The exceptionally high virulence of COVID-19 and the patients' precondition seem to constitute primary factors in how pro-inflammatory cytokine production evolves during the course of an infection. We present a System Dynamics Model...
Algorithms for induction of concept descriptions from examples are important tools in the fields of machine learning and knowledge discovery in databases. This paper presents an induction algorithm, named PA3, that learns a set of ordered...
Inclusion of domain knowledge in a process of knowledge discovery in databases is a complex but very important part of successful knowledge discovery solutions. In real-life data mining development, non-structured domain knowledge...
A productive way to think about imagistic mental models of physical systems is as though they were sources of quasi‐empirical evidence. People depict or imagine events at those points in time when they would experiment with the world if...
Rule systems have failed to attract much interest in large data analysis problems because they tend to be too simplistic to be useful or consist of too many rules for human interpretation. We present a method that constructs a...
Most rule induction algorithms generate rules with simple logical conditions based on equality or inequality relations. This feature limits their ability to discover complex dependencies that may exist in data. This article presents an...
Employee turnover is a serious concern in knowledge based organizations. When employees leave an organization, they carry with them invaluable tacit knowledge which is often the source of competitive advantage for the business. In order...
Genetic programming represents a flexible and powerful evolutionary technique in machine learning. The use of genetic programming for rule induction has generated interesting results in classification problems. This paper proposes an...
This special issue of Knowledge Based Systems comprises expanded versions of the best papers submitted to the conference AI-2010, the thirtieth SGAI International Conference on Artificial Intelligence, which was held in Cambridge, England...
Classification algorithms usually assume that any example in the training set should contribute equally to the classification model being generated. However, this is not always the case. This paper shows that the contribution of an...
James Liley Samuel R. Emerson Bilal A. Mateen Catalina A. Vallejos Louis J. M. Aslett Sebastian J. Vollmer 1 Alan Turing Institute, London, UK; 2 MRC Human Genetics Unit, Univ. of Edinburgh, UK; 3 Department of Mathematical Sciences,...
Lists of if-then rules (i.e. ordered rule sets) are among the most expressive and intelligible representations for inductive learning algorithms. Two extreme strategies searching for such a list of rules can be distinguished: (i) local...
Effective information systems require the existence of explicit process models. A completely specified process design needs to be developed in order to enact a given business process. This development is time consuming and often...
The paper presents the results of research related to the efficiency of the so-called rule quality measures which are used to evaluate the quality of rules at each stage of the rule induction. The stages of rule growing and pruning were...
This article presents GuideR, a user-guided rule induction algorithm, which overcomes the largest limitation of the existing methods: the lack of the possibility to introduce the user's preferences or domain knowledge to the rule learning...
Rule-based models are often used for data analysis as they combine interpretability with predictive power. We present RuleKit, a versatile tool for rule learning. Based on a sequential covering induction algorithm, it is suitable for...
This paper presents an implementation of bagging techniques over the heuristic algorithm for induction of classification rules called SA Tabu Miner (Simulated Annealing and Tabu Search data miner). The goal was to achieve better...
The anomaly-based intrusion detection systems examine current system activity to find deviations from normal system activity. The present paper proposes a method for normal activity description using the Hidden Markov Models (HMM), which...
In this paper a querying environment for analysis of patient clinical data is presented. The data consists of two parts: patients' pathological data and data about corresponding gene expression levels. The querying environment includes a...
Based on multi-dominance discernibility matrices, a non-incremental algorithm RIDDM and an incremental algorithm INRIDDM are proposed by means of Dominance-based Rough Set Approach. For the incremental algorithm, when a new object...
In this article a new approach to the formalization of inductive inference in terms of non-monotonic inference is proposed. Induction is characterized as closed-world reasoning from the available data, followed by an inductive jump, which...
Healthcare systems generate a huge amount of data collected from medical tests. Data mining is the computing process of discovering patterns in large data sets such as medical examinations. Blood diseases are not an exception; there are many test...
Rule-based classifiers, which extract a subset of induced rules to efficiently learn/mine while preserving the discernibility information, play a crucial role in human-explainable artificial intelligence. However, in this era of big data,...
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY
Shopping intention prediction using decision trees. Millenium, 2(4), 13-22. ABSTRACT Introduction: Price is a neglected element in the marketing literature because of the complexity of managing it and customers' sensitivity to price changes...
ABSTRACT Introduction: Price is a neglected element in the marketing literature because of the complexity of managing it and customers' sensitivity to price changes. Consequently, the purchase decision-making process...