Empirical Text Mining for Genre Detection

2012

https://doi.org/10.5220/0003956207330737

Abstract

In this paper, we report on a preliminary study aimed at identifying patterns that characterize the genre of Greek texts. We examine four distinct genre types, record their observable stylistic elements, and indicate how these elements can be exploited for automatic genre-based document classification. Our findings show that texts contain lexical features with discriminative power with respect to genre; however, modeling those features so that they can be used by computer-based applications is still at an early stage.
