EMPIRICAL TEXT MINING FOR GENRE DETECTION

https://doi.org/10.5220/0003956207330737

Abstract

In this paper, we report on a preliminary study aimed at identifying patterns that characterize the genre of Greek texts. We address four distinct genre types, record their observable stylistic elements, and indicate how these elements can be exploited for automatic genre-based document classification. Our findings show that texts contain lexical features with discriminative power with respect to genre; however, modeling those features so that they can be used by computer-based applications is still at an early stage.
