The ModelCC Model-Based Parser Generator

Abstract

Formal languages let us define the textual representation of data with precision. Formal grammars, typically in the form of BNF-like productions, describe the language syntax, which is then annotated for syntax-directed translation and completed with semantic actions. When, apart from the textual representation of data, an explicit representation of the corresponding data structure is required, the language designer has to devise the mapping between a suitable data model and its language specification, and then develop the conversion procedure from the parse tree to the data model instance. Unfortunately, whenever the format of the textual representation has to be modified, changes have to be propagated throughout the entire language processor tool chain. These updates are time-consuming, tedious, and error-prone. Moreover, when different applications use the same language, several copies of the same language specification have to be maintained. In this paper, we introduce ModelCC, a model-based parser generator that decouples language specification from language processing, hence avoiding many of the problems caused by grammar-driven parsers and parser generators. ModelCC incorporates reference resolution within the parsing process. Therefore, instead of returning mere abstract syntax trees, ModelCC is able to obtain abstract syntax graphs from input strings.
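
As an illustration of the last point only, the following minimal sketch shows how resolving references turns a tree of parsed elements into a graph. It is not ModelCC's API or algorithm: the toy input format (`name -> target target ...`), the `Node` class, and the `parse` function are all hypothetical names introduced here, and resolution is done in a simple second pass rather than within the parsing process itself.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, safe for cyclic graphs
class Node:
    name: str
    ref_names: list                            # unresolved names, as written in the input
    refs: list = field(default_factory=list)   # resolved Node objects


def parse(text):
    """Parse lines such as 'a -> b c' and resolve names to Node objects."""
    nodes = {}
    # Pass 1: build one node per line; references are still plain strings.
    for line in text.strip().splitlines():
        name, _, targets = line.partition("->")
        name = name.strip()
        nodes[name] = Node(name, targets.split())
    # Pass 2: replace each referenced name with the node it denotes.
    # A dangling reference raises KeyError.
    for node in nodes.values():
        node.refs = [nodes[target] for target in node.ref_names]
    return nodes


graph = parse("a -> b\nb -> a\nc -> a b")
# 'a' and 'b' now point at each other: a cycle that a plain tree cannot hold.
print(graph["a"].refs[0] is graph["b"])
```

After the second pass, every reference is a direct link to the shared target object, so consumers can follow it without looking names up again.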
