Academia.edu

Rule Extraction

1,133 papers
51 followers
About this topic
Rule extraction is a process in machine learning and data mining that involves deriving interpretable rules or patterns from trained models, particularly complex ones like neural networks or ensemble methods. This technique aims to enhance model transparency and facilitate understanding of decision-making processes by translating model behavior into human-readable rules.

Key research themes

1. How can rule extraction methods enhance interpretability and explainability of black-box machine learning models such as SVMs and neural networks?

This research area focuses on extracting interpretable symbolic rules from complex, opaque machine learning classifiers like Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs). Such rule extraction improves user trust and acceptance, especially in critical domains like medical diagnosis, by providing an explanation of how a decision is reached. Multiple approaches treat the rule extraction as a learning task or analyze internal components to generate human-understandable rules. These methods aim to maintain predictive accuracy while increasing transparency and enabling better data exploration and knowledge refinement.

Key finding: This paper proposes a two-step learning-based method for extracting rules from SVMs by first training an SVM classifier and then using its predictions to train a decision tree learning system that produces explicit rules. The...
Key finding: The study enhances the Recursive-Rule Extraction (Re-RX) algorithm by replacing C4.5 with J48graft to generate more concise and accurate classification rules when extracting interpretative knowledge from neural networks...
Key finding: The paper introduces a novel metaheuristic-based approach using Variable Neighbourhood Search (VNS) to extract accurate and intelligible classification rules from trained artificial neural networks without modifying the ANN...
Key finding: This work demonstrates that it is feasible to automatically extract IF-THEN rules directly from raw textual data using natural language processing techniques and a custom developed program. By transforming free-form text into...
Key finding: Using decision tree classifiers on texture features extracted from brain MRI lesions, the study successfully derives explicit classification rules that differentiate Multiple Sclerosis subjects with benign versus advanced...
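
As a rough illustration of the learning-based extraction described in the first finding above (train an opaque classifier, then fit a decision tree to its predictions and read the rules off the tree), here is a minimal Python sketch. Everything in it is an assumption made for illustration: `black_box` is a hand-written threshold unit standing in for a trained SVM or neural network, and `build_tree` is a small greedy Gini learner over binary features, not C4.5, J48graft, or any system used in the papers listed here.

```python
from itertools import product


def black_box(x):
    """Hypothetical stand-in for a trained opaque model: one threshold unit."""
    weights, bias = (2, 2, -1), -3
    return int(sum(w * v for w, v in zip(weights, x)) + bias > 0)


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Majority class of a list of 0/1 labels (ties go to class 1)."""
    return int(2 * sum(labels) >= len(labels))


def build_tree(rows, labels, features):
    """Greedy surrogate tree over binary features, fitted to black-box labels."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)  # leaf

    def split_cost(f):
        left = [y for r, y in zip(rows, labels) if r[f] == 0]
        right = [y for r, y in zip(rows, labels) if r[f] == 1]
        return len(left) * gini(left) + len(right) * gini(right)

    best = min(features, key=split_cost)
    rest = [f for f in features if f != best]
    branches = {}
    for value in (0, 1):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        if sub:
            branches[value] = build_tree(
                [r for r, _ in sub], [y for _, y in sub], rest
            )
        else:
            branches[value] = majority(labels)
    return (best, branches)


def predict(tree, x):
    """Follow the branches until a leaf (an int class label) is reached."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[x[feature]]
    return tree


def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule string."""
    if not isinstance(tree, tuple):
        body = " AND ".join(conditions) or "TRUE"
        return [f"IF {body} THEN class={tree}"]
    feature, branches = tree
    rules = []
    for value in (0, 1):
        rules += extract_rules(
            branches[value], conditions + (f"x{feature}={value}",)
        )
    return rules


# Query the black box on every binary input, then fit the surrogate to its answers.
inputs = list(product((0, 1), repeat=3))
labels = [black_box(x) for x in inputs]
tree = build_tree(inputs, labels, [0, 1, 2])
rules = extract_rules(tree)

# Fidelity: share of inputs on which the rules agree with the black box.
fidelity = sum(predict(tree, x) == y for x, y in zip(inputs, labels)) / len(inputs)

for rule in rules:
    print(rule)
print("fidelity:", fidelity)
```

Note that the surrogate is trained on the model's predictions rather than on ground-truth labels, so the extracted rules describe the model, not the data. On this tiny, fully enumerated input space the rules agree with the stand-in model on every input; with a real classifier and sampled data, fidelity would need to be measured on held-out inputs and can be lower.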

2. What approaches enable automated extraction of business, manufacturing, and legal rules from unstructured or legacy data sources?

This theme investigates methodologies for automatically extracting structured and formalized rules from diverse, traditionally unstructured data sources including legacy databases, informal documents, regulatory texts, and manufacturing process descriptions. The goal is to reduce manual knowledge acquisition effort, improve business process understanding, and enable compliance automation by transforming textual or database artifacts into machine-interpretable business vocabularies and rules. The papers cover ontology-based acquisitions, constraint-based modeling with NLP, natural language processing for informal and regulatory documents, and legacy system mining techniques, often utilizing semantic technologies or heuristic algorithms.

Key finding: This work presents a heuristic mapping from legacy database components (tables, procedures, and forms) to extract structural business rules in SBVR format from a complex industrial financial system (SIFI). Evaluated by...
Key finding: The proposed ID2SBVR approach automatically extracts operational business rules and vocabulary from informal natural language documents (e.g., interview transcripts) by mining fact type candidates using triplet extraction...
Key finding: The paper introduces a feedback generation framework based on Constraint-based Modeling (CBM) integrated with NLP and domain ontology to validate and refine input text for formal manufacturing rule extraction. This method...
Key finding: This work demonstrates a linguistically oriented, rule-based pipeline for identifying and extracting conditional and deontic rules, including antecedents, consequences, agents, and exceptions, from English regulatory text....
Key finding: The paper proposes an automatic rule acquisition procedure that leverages rule ontologies (RuleToOnto) to extract rules from similar web sites in the same domain by identifying rule components and composing rules via an A*...
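
To make the idea of pulling conditional and deontic rules out of regulatory sentences more concrete, here is a toy Python sketch. It is an assumption-heavy illustration, not the pipeline of any paper listed above: the single regular expression `RULE`, the helper `extract_rule`, and the example sentences are all hypothetical, and a real system would rely on proper linguistic analysis rather than one surface pattern.

```python
import re

# Hypothetical surface pattern:
#   "If <antecedent>, <agent> <modality> <action>[ unless <exception>]."
RULE = re.compile(
    r"^If (?P<antecedent>[^,]+), (?P<agent>.+?) "
    r"(?P<modality>must not|shall not|must|shall|may) "
    r"(?P<action>.+?)(?:,? unless (?P<exception>.+?))?\.$",
    re.IGNORECASE,
)


def extract_rule(sentence):
    """Return the rule components as a dict, or None if the pattern does not apply."""
    match = RULE.match(sentence.strip())
    return match.groupdict() if match else None


obligation = extract_rule(
    "If the applicant is a minor, the guardian must sign the form "
    "unless a court order exists."
)
prohibition = extract_rule(
    "If the permit has expired, the operator must not resume work."
)
not_a_rule = extract_rule("The form is blue.")

print(obligation)
print(prohibition)
print(not_a_rule)
```

The named groups correspond to the components mentioned above: antecedent, agent, deontic modality, consequence (here `action`), and an optional exception. Sentences that do not fit the pattern return `None`, which is the main limitation of a purely pattern-based approach.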

3. How can grammar-based and rule-based methods be developed for extracting semantic and structural rules from complex data, including graphs and multi-word terms, to support advanced natural language processing and sound design?

This theme addresses the extraction of formal grammatical or linguistic rules from highly structured or complex data forms, such as semantic meaning representations (graphs), multi-word lexical terms, and parameter spaces in generative systems. The goal is to develop methods that facilitate parsing, formal representation, and interpretability of semantic structures or parameter influences. Included are graph grammar extraction algorithms enabling polynomial-time parsing, rule-based lexical multi-word term extraction combined with lemmatization for inflected languages, and rule extraction for perceptual generalization in sound design. These methods enable improved semantic understanding and automation in applications ranging from NLP to computational music.

Key finding: The authors present polynomial-time algorithms for extracting Hyperedge Replacement Grammar (HRG) rules from graphs under fixed vertex orders through tree decompositions of minimal width. This enables parsing of semantic...
Key finding: This paper presents a rule-based approach using lexical resources and finite-state transducers to extract and lemmatize multi-word terms (MWTs) in highly inflected languages (Serbian). The system processes large corpora,...
Key finding: Using the Rulex algorithm, this study extracted interpretable linguistic rules from perceptually labeled parameter presets in sound design systems to generalize successful parameter combinations producing specific perceptual...
Key finding: The authors propose a methodology for exploring high-dimensional parameter spaces of algorithmic composition systems through interactive human evaluation, followed by compaction of input-output relationships into...
Key finding: This paper uses Beta regression to extract simplified near-optimal control rules from intensive offline dynamic programming optimization results for heating load shifting in electric residential buildings. The...

All papers in Rule Extraction

Rough Non-deterministic Information Analysis (RNIA) is a rough set-based data analysis framework for Nondeterministic Information Systems (NISs). RNIA-related algorithms and software tools developed so far for rule generation provide good... more
E-learning offers a new context for education where large amounts of information describing the continuum of the teaching-learning interactions are endlessly generated and ubiquitously available. But raw information by itself may be of no... more
The genetic algorithm is one of the commonly used approaches in data mining. In this paper, we put forward a genetic algorithm approach for classification problems. Binary coding is adopted in which an individual in a population consists of a... more
Artificial neural networks are among the most commonly used machine learning algorithms in medical applications. However, they are still not used in clinical practice, partly due to their lack of explanatory capacity. We compare two... more
Neural networks have been applied successfully in many domains. However, because the results they produce are difficult to explain, their decision process is regarded as a black box.... more
This paper presents a new approach to fuzzy rule-based modeling of nonlinear systems from numerical data. The novelty of the approach lies in the way of input partitioning and in the syntax of the rules. This paper introduces interpretable... more
Back-propagation learning (BP) is known for its serious limitations in generalising knowledge from certain types of learning material. BP-SOM is an extension of BP which overcomes some of these limitations. BP-SOM is a combination of a... more
Recent years have shown the need for an automated process to discover interesting and hidden patterns in real-world databases, handling large volumes of data. This sort of process implies a lot of computational power, memory and disk I... more
In a text-to-speech system, a transcription of each word can be either retrieved from the dictionary, or generated by rules or some statistical means. Though the dictionary-based approach can produce the most accurate result, a... more
Through conceptual examples and demonstrations, we argue that the symbiotic combination of the Internet and humans will result in a significant enhancement of the previously existing, self-organizing social structure of humans. The... more
Classification is one of the data mining problems receiving great attention recently in the database community. This paper presents an approach to discover symbolic classification rules using neural networks. Neural networks have not been... more
Although backpropagation neural networks generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions are not as interpretable as those of decision... more
Classification, which involves finding rules that partition a given data set into disjoint groups, is one class of data mining problems. Approaches proposed so far for mining classification rules for large databases are mainly decision tree... more
Previously, we hypothesized that overfitting occurs in the brain networks of the patients with autism. In this study, we implement this idea in an artificial neural network (ANN). A set of tasks, named configural grouping tests, used by... more
Land cover classification using multispectral satellite image is a very challenging task with numerous practical applications. We propose a multi-stage classifier that involves fuzzy rule extraction from the training data and then... more
Most methods of fuzzy rule-based system identification (SI) either ignore feature analysis or do it in a separate phase. This paper proposes a novel neuro-fuzzy system that can simultaneously do feature analysis and SI in an integrated... more
Neural networks have been applied in various domains including science, commerce, medicine, and industry. However, the knowledge learned by a trained neural network is difficult to understand. This paper proposes a Boolean algebra based... more
A recognized impediment to the more widespread utilization of Artificial Neural Networks (ANNs) is the absence of a capability to explain, in a human-comprehensible form, either the process by which a trained ANN arrives at a specific... more
Artificial Neural Networks (ANNs) are often viewed as black boxes. This limits understanding of how they handle input neurons and data, as well as how they reach a particular decision. Input significance analysis (ISA) refers... more
This paper presents a technique to improve the accuracy of the predictions obtained using the Rough Set Theory (RST) in non-deterministic cases (rough cases). The RST is here applied to the data collected by the Intelligent Field Devices... more
The problem of relevance and the usefulness of extracted association rules is becoming of primary importance, since an overwhelming number of association rules may be derived. This paper proposes an algorithm, called GenAll, to build a... more
Artificial Neural Networks (ANNs) are able, in general and in principle, to learn complex tasks. Interpretation of models induced by ANNs, however, is often extremely difficult due to the non-linear and non-symbolic nature of the models.... more
In France, 40% of buildings are heated with electrical devices, causing high peak load in winter. In this context, advanced control systems could improve building energy management. More specifically, optimal strategies have been... more
In France, 40% of buildings are heated with electrical devices, causing high peak load in winter. In this context, optimal strategies (under constraints related to comfort and maximum heating power) have been developed using the dynamic... more
Promoter sequences are well known to play a central role in gene expression. Their recognition and assignment in silico has not consolidated into a general bioinformatics method yet. Most previously available algorithms employ and are... more
Studies of classification rule extraction methods propose classifying an instance, most often on the basis of a single rule, without taking into account rules of the same type that have a class label... more
In the paper, the problem of extraction of complex decision rules in simple decision systems over ontological graphs is considered. The extracted rules are consistent with the dominance principle similar to that applied in the... more
A novel ant colony algorithm, mass recruitment and group recruitment based continuous ant colony optimization (MG-CACO), is proposed to solve continuous optimization problems. MG-CACO, which can capture the interdependencies between... more
The wave of innovation unleashed by the first user-friendly PCs in the 1980's and the Web in the 1990's seems to have gotten drowned in complexity and confusion. Software developers are scrambling to keep their systems up-to-date with all... more
This article presents an application of multi-label classification to predicting secondary diagnoses from a known primary diagnosis (and complementary information) in data from the PMSI (Programme de... more
Neurocognitive approach to higher cognitive functions that bridges the gap between psychological and neural level of description is introduced. Relevant facts about the brain, working memory and representation of symbols in the brain are... more
Recently there has been a lot of interest in the extraction of symbolic rules from neural networks. The work described in this paper is concerned with an evaluation and comparison of the accuracy and complexity of symbolic rules extracted... more
Bootstrapping techniques have significant potential for the efficient generation of linguistic resources such as electronic pronunciation dictionaries. We describe a system and an approach to bootstrapping for the development of such... more
The gene expression control is a fundamental process in cellular activities, performed through the interaction of multiple regulatory mechanisms. The proper regulation of transcription is crucial for a single-cell prokaryote since its... more
Due to their capability of dealing with nonlinear problems, Artificial Neural Networks (ANN) are widely used with... In the last years, Formal Concept Analysis (FCA) has been proposed as a powerful tool in knowledge representation and ex-... more
The present disclosure is directed to a novel system for performing online reconfiguration of a neural network. Once a neural network has been implemented into a production environment, the system may use underlying construction logic to... more
The present patent is directed to a novel system for a self-constructing deep neural network. The system may comprise a hybrid logic library which contains the building structures needed to construct the neural network, which may include... more
Breakthroughs in information and communication technology have established payment-collection technologies such as debit and credit card systems at a level where their rapid penetration of the commercial market has led to an ever-larger share of the... more
The paper focuses on the problem of rule extraction from neural networks, with the aim of transforming the knowledge captured in a trained neural network into a form familiar to human users. The ultimate purpose for us is to develop human... more
The overall purpose of this study is to develop a prototype radiologic consultation system. The system should provide a second diagnostic opinion based on similar cases, incorporating the experience of many radiologists, their diagnostic... more
The overall purpose of this study is to develop a prototype radiological consultation system. We concentrate our work on prototype software environment for the system. The system provides a second diagnostic opinion based on similar... more
Uses a parametric statistical framework to understand the effect of input representation on performance for nonlinear prediction of time series. In particular, considerations of input representation lead directly to choices between... more
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we present a set of techniques to mine rules from URLs and utilize... more