Diane Cook

Papers by Diane Cook

Discovering Structural Patterns in Telecommunications Data

The Florida AI Research Society, May 22, 2000

With the increasing amount and complexity of data being collected, there is an urgent need to create automated techniques for mining the data. In particular, data being generated and stored by telecom companies overwhelms scientists' ability to manually discover patterns in the data. Because much of this data is structural in nature, or composed of parts and relations between the parts, linear attribute-value based algorithms will not capture all of the intricacies of the data. Hence, there exists a need to develop scalable tools to analyze and discover concepts in structural databases.

Coupling Two Complementary Knowledge Discovery Systems

The Florida AI Research Society, May 18, 1998

Most approaches to knowledge discovery concentrate on either an attribute-value representation or a structural data representation. The discovery systems for these two representations are typically different, and their integration is non-trivial. We investigate a simpler integration of the two systems by coupling the two approaches. Our method first executes the structural discovery system on the data, and then uses these results to augment or compress the data before being input to the attribute-value-based system. We demonstrate this strategy using the AutoClass attribute-value-based clustering system and the Subdue structural discovery system. The results of the demonstration show that coupling the two systems allows the discovery of knowledge imperceptible to either system alone.

Identifying Inhabitants of an Intelligent Environment Using a Graph-Based Data Mining System

The Florida AI Research Society, 2003

The goal of the MavHome smart home project is to build an intelligent home environment that is aware of its inhabitants and their activities. Such a home is designed to provide maximum comfort to inhabitants at minimum cost. This can be done by learning the activities of the inhabitants and automating those activities. For this, it is necessary to identify which of multiple inhabitants is currently present in the home. Subdue is a graph-based data mining algorithm that discovers patterns in structural data. By representing the activity patterns for each inhabitant as graphs, Subdue can be used for inhabitant identification. We introduce a multiple-class learning version of Subdue and show some preliminary results on synthetic smart home activity data for multiple inhabitants.

Enhancing Structure Discovery for Data Mining in Graphical Databases Using Evolutionary Programming

The Florida AI Research Society, May 14, 2002

The purpose of this paper is to develop an evolutionary programming based system that performs data mining on databases represented as graphs. The importance of such an endeavor can hardly be overemphasized, given that much of the data collected nowadays is structural in nature, or is composed of parts and relations between the parts, which can be naturally represented as graphs. The searching capability of evolutionary programming is utilized for discovering concepts or substructures that are frequently repeated in such structural data. The superiority of the proposed technique over the previously developed SUBDUE system, which uses a computationally constrained beam search in the space of substructures, is demonstrated for a number of data sets in the Web domain.

Improving scalability in a scientific discovery system by exploiting parallelism

The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. This research outlines a general approach for scaling KDD systems using parallel and distributed resources and applies the suggested strategies to the SUBDUE knowledge discovery system. SUBDUE has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate scalability of parallel versions of the SUBDUE system are performed using CAD circuit databases and artificially-generated databases, and potential achievements and obstacles are discussed.

Learning Node Replacement Graph Grammars in Metabolic Pathways

BIOCOMP, 2007

This paper describes a graph-based relational, unsupervised learning algorithm to infer node replacement graph grammars and its application to metabolic pathways. We search for frequent subgraphs and then check for overlap among the instances of the subgraphs in the input graph. If subgraphs overlap by one node, we propose a node replacement graph grammar production. We can also infer a hierarchy of productions by compressing portions of a graph described by a production and then inferring new productions on the compressed graph. We show learning curves and how the learning process changes when we increase the size of a sample set. We examine how computation time changes with an increased number of nodes in the input graphs. The graph grammars we inferred from metabolic pathways do not change further as the number of graphs in the input set increases, which indicates that the grammars found represent the input sets well.

Discovering Substructures in the Chemical Toxicity Domain

Mining in the Proximity of Subgraphs

Graphs are a natural way to represent multi-relational data and are extensively used to model a variety of application domains in diverse fields ranging from bioinformatics to homeland security. Often, in such graphs, certain subgraphs are known to possess some distinct properties and graph patterns in the proximity of these subgraphs can be an indicator of these properties. In this work we focus on the task of mining in the proximity of subgraphs known to possess certain distinct properties, and identify patterns which distinguish these subgraphs from other subgraphs without these properties. This task is novel and of considerable interest as it can facilitate the prediction of previously unknown subgraphs possessing the properties under consideration in the graph and can lead to a better understanding of the application domain. We characterize the task of mining in the proximity of subgraphs as a supervised learning problem and present a heuristic algorithm for the same. Experimental comparison with the ILP system CProgol on real world and artificial datasets provides a strong indication of the ability and viability of the approach in uncovering interesting patterns.

Inferring Graph Grammars by Detecting Overlap in Frequent Subgraphs

International Journal of Applied Mathematics and Computer Science, Jun 1, 2008

In this paper we study the inference of node and edge replacement graph grammars. We search for frequent subgraphs and then check for an overlap among the instances of the subgraphs in the input graph. If the subgraphs overlap by one node, we propose a node replacement graph grammar production. If the subgraphs overlap by two nodes or two nodes and an edge, we propose an edge replacement graph grammar production. We can also infer a hierarchy of productions by compressing portions of a graph described by a production and then inferring new productions on the compressed graph. We validate the approach in experiments where we generate graphs from known grammars and measure how well the approach infers the original grammar from the generated graph. We show graph grammars found in biological molecules and biological networks, and analyze learning curves of the algorithm.
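
The overlap test described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the encoding of an instance as a (nodes, edges) pair, the function names classify_overlap and propose_productions, and the triangle example are all assumptions made for illustration, and the frequent-subgraph search that would produce the instances is taken as already done.

```python
from itertools import combinations


def classify_overlap(inst_a, inst_b):
    """Suggest a production type from how two subgraph instances overlap.

    Assumed encoding: an instance is a (nodes, edges) pair, where nodes is a
    set of node ids and edges is a set of frozenset({u, v}) undirected edges.
    """
    nodes_a, edges_a = inst_a
    nodes_b, edges_b = inst_b
    shared_nodes = nodes_a & nodes_b
    shared_edges = edges_a & edges_b
    # One shared node, no shared edge: candidate node replacement production.
    if len(shared_nodes) == 1 and not shared_edges:
        return "node replacement"
    # Two shared nodes, optionally with the edge between them:
    # candidate edge replacement production.
    if len(shared_nodes) == 2 and len(shared_edges) <= 1:
        return "edge replacement"
    return None


def propose_productions(instances):
    """Check every pair of instances and collect the suggested productions."""
    proposals = []
    for (i, a), (j, b) in combinations(enumerate(instances), 2):
        kind = classify_overlap(a, b)
        if kind is not None:
            proposals.append((i, j, kind))
    return proposals


def triangle(x, y, z):
    """Hypothetical helper: build a triangle instance on three node ids."""
    return ({x, y, z}, {frozenset(p) for p in ((x, y), (y, z), (x, z))})


# Three instances of a triangle pattern: the first two share node 3,
# the last two share nodes 4 and 5 and the edge between them.
instances = [triangle(1, 2, 3), triangle(3, 4, 5), triangle(4, 5, 6)]
print(propose_productions(instances))
```

In this toy input the first pair of triangles shares a single node and the second pair shares an edge, so the sketch reports one candidate of each production type.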

Applying the Subdue Substructure Discovery System to the Chemical Toxicity Domain

The Florida AI Research Society, May 1, 1999

The ever-increasing number of chemical compounds added every year has not been accompanied by a similar growth in our ability to analyze and classify these compounds. Preventing the cancers caused by many of these chemicals is a problem of great scientific and humanitarian importance. The use of AI discovery tools for predicting chemical toxicity is being investigated. The basic idea behind the work is to obtain structure-activity representations (SARs) [Srinivasan et al.], which relate molecular structures to cancerous activity. The data is obtained from the U.S. National Toxicology Program conducted by the National Institute of Environmental Health Sciences (NIEHS). This research outlines a general approach to automatically discover repetitive substructures from the datasets. Relevant SARs are identified using the Subdue substructure discovery system, which discovers commonly occurring substructures in a given set of compounds. The best substructure given by Subdue is used as a pattern indicative of cancerous activity.

Learning patterns in the dynamics of biological networks

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Our dynamic graph-based relational mining approach has been developed to learn structural patterns in biological networks as they change over time. The analysis of dynamic networks is important not only to understand life at the system-level, but also to discover novel patterns in other structural data. Most current graph-based data mining approaches overlook dynamic features of biological networks, because they are focused on only static graphs. Our approach analyzes a sequence of graphs and discovers rules that capture the changes that occur between pairs of graphs in the sequence. These rules represent the graph rewrite rules that the first graph must go through to be isomorphic to the second graph. Then, our approach feeds the graph rewrite rules into a machine learning system that learns general transformation rules describing the types of changes that occur for a class of dynamic biological networks. The discovered graph-rewriting rules show how biological networks change over time, and the transformation rules show the repeated patterns in the structural changes. In this paper, we apply our approach to biological networks to evaluate our approach and to understand how the biosystems change over time. We evaluate our results using coverage and prediction metrics, and compare to biological literature.
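
As a rough illustration of what a rewrite rule between two consecutive graphs might contain, the following Python sketch computes the removals and additions that turn one graph into the next. It is a simplification and not the published algorithm: it assumes node identities persist across time steps, so plain set differences stand in for the subgraph matching the abstract implies, and the names rewrite_rule, apply_rule and rules_for_sequence are hypothetical.

```python
def rewrite_rule(before, after):
    """Describe the changes that turn graph `before` into graph `after`.

    Assumed encoding: a graph is a (nodes, edges) pair of sets, with edges
    given as (source, target) tuples. Node ids are assumed to be stable
    over time, which avoids any isomorphism search.
    """
    nodes_before, edges_before = before
    nodes_after, edges_after = after
    return {
        "remove_nodes": nodes_before - nodes_after,
        "remove_edges": edges_before - edges_after,
        "add_nodes": nodes_after - nodes_before,
        "add_edges": edges_after - edges_before,
    }


def apply_rule(graph, rule):
    """Apply a rewrite rule: first the removals, then the additions."""
    nodes, edges = graph
    new_nodes = (nodes - rule["remove_nodes"]) | rule["add_nodes"]
    new_edges = (edges - rule["remove_edges"]) | rule["add_edges"]
    return new_nodes, new_edges


def rules_for_sequence(graphs):
    """One rewrite rule per pair of consecutive graphs in the sequence."""
    return [rewrite_rule(g, h) for g, h in zip(graphs, graphs[1:])]


# Toy dynamic graph with two time steps: node C is replaced by node D.
g1 = ({"A", "B", "C"}, {("A", "B"), ("B", "C")})
g2 = ({"A", "B", "D"}, {("A", "B"), ("B", "D")})
rules = rules_for_sequence([g1, g2])
print(rules[0])
```

Applying the rule produced for a pair of graphs to the first graph reproduces the second, which is the property the abstract attributes to graph rewrite rules. Learning the more general transformation rules from many such rules is outside the scope of this sketch.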

Comparison of graph-based and logic-based multi-relational data mining

ACM SIGKDD Explorations Newsletter, 2005

We perform an experimental comparison of the graph-based multi-relational data mining system, Subdue, and the inductive logic programming system, CProgol, on the Mutagenesis dataset and various artificially generated Bongard problems. Experimental results indicate that Subdue can significantly outperform CProgol while discovering structurally large multi-relational concepts. It is also observed that CProgol is better at learning semantically complicated concepts and it tends to use background knowledge more effectively than Subdue. An analysis of the results indicates that the differences in the performance of the systems are a result of the difference in the expressiveness of the logic-based and the graph-based representations. The ability of graph-based systems to learn structurally large concepts comes from the use of a weaker representation whose expressiveness is intermediate between propositional and first-order logic. The use of this weaker representation is advantageous whil...

Graph-Based Data Mining in Dynamic Networks: Empirical Comparison of Compression-Based and Frequency-Based Subgraph Mining

2008 IEEE International Conference on Data Mining Workshops, 2008

We propose a dynamic graph-based relational mining approach using graph-rewriting rules to learn patterns in networks that structurally change over time. A dynamic graph containing a sequence of graphs over time represents dynamic properties as well as structural properties of the network. Our approach discovers graph-rewriting rules, which describe the structural transformations between two sequential graphs over time, and also learns description rules that generalize over the discovered graph-rewriting rules. The discovered graph-rewriting rules show how networks change over time, and the description rules in the graph-rewriting rules show temporal patterns in the structural changes. We apply our approach to biological networks to understand how the biosystems change over time. Our compression-based discovery of the description rules is compared with the frequent subgraph mining approach using several evaluation metrics.

Temporal and structural analysis of biological networks in combination with microarray data

2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2008

Our project introduces a graph-based relational learning approach using graph-rewriting rules for temporal and structural analysis of biological networks changing over time. The analysis of dynamic biological networks is necessary to understand life at the system-level, because biological networks continuously change their structures and properties while an organism performs various biological activities to promote reproduction and sustain our lives. Most current graph-based data mining approaches overlook dynamic features of biological networks, because they are focused on only static graphs. Most approaches for analysis of microarray data disregard structural properties of biological systems. In contrast, our dynamic graph-based relational learning approach describes how the graphs temporally and structurally change over time in the dynamic graph representing biological networks in combination with microarray data.

Exploiting Parallelism in Knowledge Discovery Systems to Improve Scalability

Cover story: structural Web search using a graph-based discovery system

intelligence, 2001

Accessing information of interest on the Internet presents a challenge to scientists and analysts, particularly if the desired information is structural in nature. Our goal is to design a structural search engine which uses the hyperlink structure of the Web, in addition to textual information, to search for sites of interest. To design a structural search engine, we use the SUBDUE graph-based discovery tool. The tool, called WebSUBDUE, is enhanced by WordNet features that allow the engine to search for synonym terms. Our search engine retrieves sites corresponding to structures formed by graph-based user queries. We demonstrate the approach on a number of structural web queries.

Structure Discovery in Sequentially-Connected Data Streams

International Journal on Artificial Intelligence Tools, 2006

Historically, data mining research has been focused on discovering sets of attributes that discriminate data entities into classes or association rules between attributes. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event driven, such as counter-terrorism intelligence analysis. In this paper we describe an algorithm designed to operate over relational data received from a continuous stream. Our approach includes a mechanism for summarizing discoveries from previous data increments so that the globally best patterns can be computed by examining only the new data increment. We then describe a method by which relational dependencies that span across temporal increment boundaries can be efficiently resolved so that additional pattern instances, which do not reside entirely in a single data increment, can be discovered. We al...
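
The idea of summarizing earlier increments so that only the newest one has to be examined can be sketched in a few lines of Python. This is a loose illustration under stated assumptions, not the paper's algorithm: per-increment occurrence counts are used as a stand-in for whatever pattern-quality measure the authors summarize, the class name IncrementalPatternSummary is hypothetical, and instances that span increment boundaries are ignored.

```python
from collections import Counter


class IncrementalPatternSummary:
    """Running summary of pattern counts across data increments.

    Assumption: each increment has already been mined locally and is
    reported as a mapping from a pattern label to its occurrence count.
    """

    def __init__(self):
        self.totals = Counter()

    def update(self, increment_counts):
        """Fold one increment's local counts into the global summary."""
        self.totals.update(increment_counts)

    def best(self, k=1):
        """Return the k globally most frequent patterns seen so far."""
        return self.totals.most_common(k)


summary = IncrementalPatternSummary()
summary.update({"A->B": 3, "B->C": 1})  # first increment
summary.update({"B->C": 4})             # second increment
print(summary.best(1))
```

After the second increment the globally best pattern changes even though the first increment is never revisited, which is the effect the summarization mechanism is meant to achieve.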
