Learning Node Replacement Graph Grammars in Metabolic Pathways

2007, BIOCOMP

Abstract

This paper describes a graph-based, relational, unsupervised learning algorithm that infers node replacement graph grammars, and its application to metabolic pathways. We search for frequent subgraphs and then check for overlap among the instances of each subgraph in the input graph. If instances overlap by one node, we propose a node replacement graph grammar production. We can also infer a hierarchy of productions by compressing the portions of a graph described by a production and then inferring new productions on the compressed graph. We show learning curves and how the learning process changes as the size of the sample set increases. We examine how computation time changes as the number of nodes in the input graphs increases. The graph grammars we inferred from metabolic pathways change little as the number of graphs in the input set increases, which indicates that the inferred grammars represent the input sets well.
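The overlap check mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the frequent-subgraph search has already been run and that each instance is given simply as a set of node identifiers, and the function names `single_node_overlaps` and `suggests_recursive_production` are hypothetical. It only shows how pairs of instances sharing exactly one node might be detected as candidates for a node replacement production.

```python
from itertools import combinations


def single_node_overlaps(instances):
    """Find pairs of subgraph instances that share exactly one node.

    `instances` is a list of node-id sets, one per instance of the same
    frequent subgraph (an assumed representation, not taken from the paper).
    Returns a list of (index_a, index_b, shared_node) tuples.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(instances), 2):
        shared = set(a) & set(b)
        if len(shared) == 1:
            pairs.append((i, j, next(iter(shared))))
    return pairs


def suggests_recursive_production(instances):
    """True if at least one pair of instances overlaps by a single node."""
    return len(single_node_overlaps(instances)) > 0


if __name__ == "__main__":
    # Three instances chained through nodes 3 and 5, plus one isolated instance.
    example = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {8, 9, 10}]
    print(single_node_overlaps(example))
    print(suggests_recursive_production(example))
```

In this toy input the first three instances form a chain connected through single shared nodes, which is the kind of pattern a recursive node replacement production would describe. The fourth instance overlaps with nothing and contributes no candidate.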