Mining Patterns in the Presence of Domain Knowledge
2009, Proceedings of the 11th International Conference on Enterprise Information
https://doi.org/10.5220/0001995001880193
Abstract
One of the main difficulties of pattern mining is dealing with items of different natures in the same itemset, which can occur in any domain except basket analysis. Indeed, if we consider the analysis of any transactional database composed of several entities and relationships, it is easy to see that the equality function may be different for each element, which makes frequent patterns harder to identify. This situation is just one example of the need to use domain knowledge to manage the discovery process. Several others, no less important, can be enumerated, such as the need to consider patterns at higher levels of abstraction or the ability to deal with structured data. In this paper, we show how the Onto4AR framework can be used to overcome these situations in a natural way, illustrating its use in the analysis of two distinct case studies. In the first one, exploring a cinematographic dataset, we capture patterns that characterize kinds of movies according to the actors present in their casts and their roles. In the second one, identifying molecular fragments, we find structured patterns, including chains, rings and stars.

Pattern mining is a subtask of mining association rules, a problem that was formulated in 1993 in the context of basket analysis. Formally, let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of $m$ distinct literals, called items, and $X \subseteq I$ a subset of items, known as an itemset. Let $D$ be a set of transactions, i.e., itemsets transacted in the same conditions, under a unique

Antunes, C. (2009).
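The definitions above (items, itemsets, transactions) and the idea of matching items at higher levels of abstraction can be illustrated with a small sketch. It is not the Onto4AR framework or its API: the taxonomy, the item names and the helper functions `ancestors`, `generalize` and `support` are all hypothetical, and support is taken in its usual sense (the fraction of transactions containing the itemset), which is an assumption because the text above does not define it.

```python
# Hypothetical toy taxonomy (child concept -> parent concept).
# It stands in for the domain knowledge; it is not taken from the paper.
TAXONOMY = {
    "comedy_actor": "actor",
    "drama_actor": "actor",
    "actor": "person",
    "director": "person",
}


def ancestors(item, taxonomy):
    """Return the item together with all of its ancestors in the taxonomy."""
    result = {item}
    while item in taxonomy:
        item = taxonomy[item]
        result.add(item)
    return result


def generalize(transaction, taxonomy):
    """Extend a transaction with every ancestor of each of its items."""
    extended = set()
    for item in transaction:
        extended |= ancestors(item, taxonomy)
    return extended


def support(itemset, transactions, taxonomy):
    """Fraction of transactions containing the itemset, up to generalization.

    Assumes the usual definition of support; an item in the itemset matches
    a transaction item when it equals that item or one of its ancestors.
    """
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= generalize(t, taxonomy))
    return hits / len(transactions)


# D: a toy set of transactions, each one an itemset.
D = [
    {"comedy_actor", "director"},
    {"drama_actor", "director"},
    {"comedy_actor"},
    {"director"},
]

low_level = support({"comedy_actor", "director"}, D, TAXONOMY)
high_level = support({"actor", "director"}, D, TAXONOMY)
print(low_level, high_level)
```

In this toy data the itemset over the abstract concept `actor` is matched by more transactions than the one over `comedy_actor`, which is the kind of pattern that plain item equality would miss.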
References
- Agrawal, R., Imielinski, T., and Swami, A., Mining Association Rules between Sets of Items in Large Databases. In Proc. ACM SIGMOD Conf. Management of Data, 1993, 207-216.
- Antunes, C., and Oliveira, A.L., Constraint Relaxations for Discovering Unknown Sequential Patterns. In Knowledge Discovery in Inductive Databases: Third International Workshop, Springer, 2005, 11-32.
- Antunes, C., Onto4AR: a framework for mining association rules. In Proc. Int'l Workshop on Constraint-Based Mining and Learning, 2007, 37-48.
- Antunes, C., An ontology-based method for mining frequent patterns. Technical report, Instituto Superior Técnico, 2008.
- Bayardo, R.J., The Many Roles of Constraints in Data Mining. In SIGKDD Explorations, vol. 4, nr. 1, pp. i-ii, 2002.
- Garofalakis, M.N., Rastogi, R., and Shim, K., SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. In Proc. Very Large Databases Conf., 1999, 223-234.
- Maedche, A., Ontology Learning for the Semantic Web, Kluwer Academic Publishers, 2002.
- Wiederhold, G., Movies Database Documentation, 1989.