Papers by Jose Maria Luna Ariza

Association rule mining is one of the most common data mining techniques used to identify and describe interesting relationships between patterns in large datasets, the frequency of an association being defined as the number of transactions that satisfy it. In situations where each transaction includes an undetermined number of instances (for example, customers' shopping habits, where each transaction represents a different customer with a varying number of instances), the problem cannot be described as a traditional association rule mining problem. The aim of this work is to discover robust and useful patterns from multiple-instance datasets, that is, datasets where each transaction may include an undetermined number of instances. We propose a new problem formulation in the data mining framework: multiple-instance association rule mining. The problem definition, an algorithm to tackle the problem, the application fields, and the quality measures for the relations are formally described. Experimental results reveal the scalability of the proposal across different data dimensionalities. Finally, we apply it to two real-world application fields: (1) the analysis of financial data gathered from one of the most important banks in Lithuania, and (2) the study of existing relations between records of unemployed people gathered from the Spanish public employment service.
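
As a rough illustration of counting frequency per transaction when each transaction holds several instances, the sketch below treats a transaction as a "bag" of instances. It assumes that a bag satisfies an itemset when at least one of its instances contains every item; this rule, and the names `bag_support` and `rule_confidence`, are illustrative assumptions and not necessarily the formal definition used in the paper.

```python
from typing import FrozenSet, Iterable, List

Instance = FrozenSet[str]   # one instance: a set of items
Bag = List[Instance]        # one transaction: several instances (e.g. one customer)


def bag_support(bags: List[Bag], itemset: Iterable[str]) -> float:
    """Fraction of bags with at least one instance containing all items.

    Assumption: a single matching instance is enough for the bag to count.
    """
    items = frozenset(itemset)
    if not bags:
        return 0.0
    hits = sum(1 for bag in bags if any(items <= inst for inst in bag))
    return hits / len(bags)


def rule_confidence(bags: List[Bag], antecedent: Iterable[str],
                    consequent: Iterable[str]) -> float:
    """Confidence of antecedent -> consequent, computed over bags."""
    ante = frozenset(antecedent)
    base = bag_support(bags, ante)
    if base == 0.0:
        return 0.0
    return bag_support(bags, ante | frozenset(consequent)) / base


# Hypothetical toy data: three customers with a varying number of baskets.
bags = [
    [frozenset({"bread", "milk"}), frozenset({"beer"})],
    [frozenset({"bread"})],
    [frozenset({"bread", "milk", "beer"})],
]
print(bag_support(bags, {"bread", "milk"}))
print(rule_confidence(bags, {"bread"}, {"milk"}))
```

Counting per bag rather than per instance keeps a customer with many baskets from dominating the frequency of an association.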

Given a database of records, it might be possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of data records. Finding such interesting relationships, which we call exceptional relationships, in an automated way would allow unusual or exceptional hidden behavior to be discovered. In this paper we formulate the problem of mining exceptional relationships as a special case of exceptional model mining, and propose a grammar-guided genetic programming algorithm (MERG3P) that enables the discovery of any exceptional relationships. In particular, MERG3P can work directly not only with categorical but also with numerical data. In the experimental evaluation, we conduct a case study on mining exceptional relations between well-known and widely used quality measures of association rules, whose exceptional behavior would be of interest to pattern mining experts. For this purpose, we constructed a dataset comprising a wide range of values for each association rule quality measure considered, so that possible exceptional relations between measures could be discovered. Thus, besides validating MERG3P, we found that the support and leverage measures are in fact negatively correlated under certain conditions, whereas experts in the field generally expect these measures to be positively correlated.
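
For readers unfamiliar with the two measures named at the end of this abstract, the sketch below computes them using their standard textbook definitions: support of a rule A -> C is the fraction of transactions containing both A and C, and leverage is supp(A ∪ C) − supp(A)·supp(C). The function names and the toy transactions are illustrative assumptions; this is not the MERG3P algorithm.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def leverage(transactions: List[FrozenSet[str]],
             antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """supp(A ∪ C) − supp(A)·supp(C): observed minus expected co-occurrence."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return (support(transactions, a | c)
            - support(transactions, a) * support(transactions, c))


# Hypothetical toy transactions.
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"b"}),
    frozenset({"a", "b"}),
]
print(support(transactions, {"a", "b"}))   # rule support
print(leverage(transactions, {"a"}, {"b"}))
```

In this toy data the rule has a support of 0.5 but a slightly negative leverage, because both items are so frequent that they would be expected to co-occur even more often if they were independent.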

The extraction of patterns of interest and the associations between them has been a major research topic since its definition at the beginning of the nineties. Abundant research studies have been dedicated to this field, providing substantial progress in both efficiency and scalability, and extracting patterns from different data structures and domains. Since pattern mining is the keystone of data analysis, many application fields and, especially, numerous researchers have focused their attention on the discovery of patterns and associations that describe and represent any type of homogeneity and regularity in data. The growing scope of applications of pattern mining has a deep impact on pattern mining models based on data domains, data dimensionality, data comprehensibility and data flexibility. All of this raises new and challenging research issues that need to be solved, opening new research lines and leaving behind early pattern mining problems that can be considered already solved.
Books by Jose Maria Luna Ariza

Pattern Mining with Evolutionary Algorithms
This book provides a comprehensive overview of the field of pattern mining with evolutionary algorithms. To do so, it covers formal definitions of patterns, pattern mining, types of patterns and the usefulness of patterns in the knowledge discovery process. As described in the book, the discovery process suffers from both high runtime and memory requirements, especially when high-dimensional datasets are analyzed. To solve this issue, many pruning strategies have been developed. Nevertheless, with the growing interest in the storage of information, more and more datasets reach such a dimensionality that the discovery of interesting patterns becomes a challenging process. In this regard, the use of evolutionary algorithms for mining patterns reduces the computational requirements while providing sufficiently good solutions.
This book offers a survey on evolutionary computation with particular emphasis on genetic algorithms and genetic programming. Also included is an analysis of the set of quality measures most widely used in the field of pattern mining with evolutionary algorithms. This book serves as a review of the most important evolutionary algorithms for pattern mining. It analyzes different algorithms for mining different types of patterns and relationships between patterns, such as frequent patterns, infrequent patterns, patterns defined in a continuous domain, or even positive and negative patterns.
A completely new problem in the pattern mining field, the mining of exceptional relationships between patterns, is discussed. In this problem the goal is to identify patterns whose distribution is exceptionally different from the distribution in the complete set of data records. Finally, the book deals with the subgroup discovery task, a method to identify a subgroup of interesting patterns that is related to a dependent variable or target attribute. This subgroup of patterns satisfies two essential conditions: interpretability and interestingness.
Drafts by Jose Maria Luna Ariza

The growing interest in data storage has caused data size to grow exponentially, hampering the process of knowledge discovery from these large volumes of high-dimensional and heterogeneous data. In recent years, many efficient algorithms for mining data associations have been proposed to address runtime and main memory requirements. Nevertheless, this mining process can still become hard when the number of items and records is extremely high. In this paper, the goal is not to propose new efficient algorithms but a new data structure that can be used by a variety of existing algorithms without modifying their original schema. Thus, our aim is to speed up the association rule mining process regardless of the algorithm used, enabling the performance of efficient implementations to be enhanced. The structure simplifies, reorganizes and speeds up data access by sorting data with a shuffling strategy based on the Hamming distance, which places similar values closer together, and by combining an inverted index mapping with run-length encoding compression. In the experimental study, we explore the bounds of the algorithms' performance using a large number of datasets comprising either thousands or millions of both items and records. The results demonstrate the utility of the proposed data structure in improving the algorithms' runtime by orders of magnitude and in substantially reducing both the auxiliary and the main memory requirements.
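
The three ingredients named in this abstract (Hamming-distance ordering, an inverted index, and run-length encoding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the structure proposed in the draft: the greedy nearest-neighbour ordering in `greedy_order` is a stand-in for the unspecified shuffling strategy, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Record = FrozenSet[str]
Run = Tuple[int, int]  # (first record id, number of consecutive record ids)


def hamming(a: Record, b: Record) -> int:
    """Hamming distance between the binary item vectors of two records."""
    return len(a ^ b)  # items present in exactly one of the two records


def greedy_order(records: List[Record]) -> List[Record]:
    """Assumed stand-in ordering: repeatedly append the closest remaining record."""
    if not records:
        return []
    ordered = [records[0]]
    remaining = list(records[1:])
    while remaining:
        last = ordered[-1]
        nearest = min(range(len(remaining)),
                      key=lambda i: hamming(last, remaining[i]))
        ordered.append(remaining.pop(nearest))
    return ordered


def rle_inverted_index(records: List[Record]) -> Dict[str, List[Run]]:
    """Map each item to run-length encoded ids of the records containing it."""
    index: Dict[str, List[Run]] = {}
    for rid, record in enumerate(records):
        for item in sorted(record):
            runs = index.setdefault(item, [])
            if runs and runs[-1][0] + runs[-1][1] == rid:
                runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend current run
            else:
                runs.append((rid, 1))                      # start a new run
    return index


def support_count(index: Dict[str, List[Run]], item: str) -> int:
    """Number of records containing item, read directly from the runs."""
    return sum(length for _, length in index.get(item, []))


records = [frozenset("ab"), frozenset("c"), frozenset("ab"), frozenset("c")]
print(rle_inverted_index(records)["a"])                 # two runs of length 1
print(rle_inverted_index(greedy_order(records))["a"])   # one run of length 2
```

Placing similar records next to each other lengthens the runs, so each item needs fewer (start, length) pairs, and a single-item support count can be read from the run lengths without touching the original records.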