Quantitative Text Typology: The Impact of Word Length
2004
https://doi.org/10.1007/3-540-28084-7_5…
13 pages
Abstract
The present study aims at the quantitative classification of texts and text types. By way of a case study, 398 Slovenian texts from different genres and authors are analyzed as to their word length. It is shown that word length is an important factor in the synergetic self-regulation of texts and text types, and that word length may significantly contribute to a new typology of discourse types.
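As a rough illustration of the kind of measurement the abstract refers to, the following Python sketch computes a word length profile (mean length and relative frequency of each length class) for a single text. It is not the authors' procedure: the function names are hypothetical, and word length is approximated by counting the vowel letters a, e, i, o, u as syllable nuclei. Under this assumption, vowelless prepositions such as "v" and words with syllabic r such as "trg" receive length 0, so a real study would need a proper syllabification rule.

```python
import re
from collections import Counter

# Assumption: each vowel letter marks one syllable (a crude proxy only).
VOWELS = set("aeiou")


def word_lengths(text):
    """Return one length value per word, measured in vowel letters."""
    # [^\W\d_]+ matches runs of Unicode letters, so č, š, ž stay inside words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [sum(ch in VOWELS for ch in word) for word in words]


def length_profile(text):
    """Return (mean word length, relative frequency of each length class)."""
    lengths = word_lengths(text)
    if not lengths:
        raise ValueError("text contains no words")
    counts = Counter(lengths)
    total = len(lengths)
    mean_length = sum(lengths) / total
    distribution = {k: counts[k] / total for k in sorted(counts)}
    return mean_length, distribution


if __name__ == "__main__":
    mean_length, distribution = length_profile("Danes je lep dan")
    print(mean_length, distribution)
```

Profiles of this kind, computed per text, are the sort of feature that could then be compared across genres or authors.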
Related papers
Text typology is concerned with the identification of the criteria leading to the classification (typology) of texts (or text types, text classes, styles, genres). Depending on the criteria adopted, there are several possibilities of classifying texts. Using some of the most obvious criteria, texts can be classified as spoken or written, dialogical or monological, spontaneous (unprepared) or ritual (prepared), informal or formal, individual (personal) or interindividual (interpersonal), private or public (official, institutional), subjective or objective, interactional (contact-oriented) or transactional (message-oriented), etc. However, all text types identified on the basis of a single criterion, in contrast with those based on several criteria (simplex vs. complex styles, K. Hausenblas 1972; secondary vs. primary styles, Mistrík 1997), often include instances which may reveal a more complicated patterning of features than those suggested by these dichotomies (see also Medium and Participation, 3.2.2); for example, news bulletin scripts read by newscasters, dictation of a letter to a secretary, ritualized exchanges (greetings, politeness formulae) characterizing conversations, interactional features contained in otherwise transactional encounters (lectures), etc. (cf. Ferenčík 2000). Dolník and Bajzíková (1998) maintain that it is possible to approach texts either as theoretical linguistic constructs (text typology) or as concrete 'psychological realities' (text classification). The latter approach is based on the intuition possessed by every language user, which is acquired through his/her practical experience with the production of texts and which represents a component of his/her communicative (stylistic) competence.
The authors hold that one of the most important criteria is based on the study of the ways in which the dominating communicative functions of texts determine the choice of expressive means of language; e.g., in appeals, warnings and public notices the conative function dominates, in congratulations or expressions of sympathy it is the phatic function, in research reports the representational function, in advertising the persuasive function, etc. (3.4). The functional perspective initiated the Prague school functional stylistics and the elaboration of the theory of functional styles (K. Havránek, M. Jelínek); the functional approach is also present in the approaches of Gaľperin (1977), who differentiates five functional styles of English (the publicistic, newspaper, scientific prose and belles-lettres styles and the style of official documents), and of Crystal and Davy (1969), who offer an in-depth analysis of five 'languages' (conversation, unscripted commentary, religion, newspaper reporting and legal documents) but suggest possibilities for the study of other varieties as well (the language of TV and press advertising, public speaking, written instructions, broadcast talks and news, science, the civil service and the spoken legal language). It should be noted that the variation based on the functional (contextual) criterion represents one of the three principal types of variation of national language (the other two being regional and social variation, see 9.3). Using the degree of abstraction (generalization) as the main criterion of text typology, the functional styles could stand at the top, followed by the styles of particular social groups and/or traditions of literary writing (interindividual styles), the styles of individual authors (individual or personal styles) and the styles of individual texts (singular styles) (cf. Hoffmannová 1997).
The criterion of the 'global area of activity' as proposed by Dolník and Bajzíková (1998) is close to the identification of functional styles in that they identify journalistic, economic, political, legal and scientific texts. We consider their empirically based text classification, firmly rooted in the structural-functional theory of text (toward which language users intuitively orient), a viable approach since it integrates the criteria of communicative function, situation (context) and strategy.
2016
Text length very often biases the results of stylometric indices that are based on the rank-frequency distribution (e.g. type-token ratio, repeat rate, entropy). The aim of the article is to observe the relation between text size and thematic concentration indicators (TC, STC). The corpus consists of 1471 English texts of various genres. The obtained results show that thematic concentration is independent of text length in the interval <200; 6500>. Given that the analysis corroborates the findings of previous research on Czech, TC and STC seem to be reliable stylometric indicators applicable to text analyses of different languages.
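To make the length dependence of the frequency-based indices concrete, here is a minimal Python sketch using their standard textbook definitions: type-token ratio as types divided by tokens, repeat rate as the sum of squared relative frequencies, and entropy as the Shannon entropy in bits. The function names are hypothetical, tokenization is assumed to have been done already, and the thematic concentration indicators TC and STC are deliberately not implemented because their formulas are not given here.

```python
import math
from collections import Counter


def frequency_indices(tokens):
    """Compute type-token ratio, repeat rate and entropy for a token list."""
    n = len(tokens)
    if n == 0:
        raise ValueError("token list is empty")
    counts = Counter(tokens)
    probs = [c / n for c in counts.values()]  # relative frequencies
    return {
        "ttr": len(counts) / n,                     # types / tokens
        "repeat_rate": sum(p * p for p in probs),   # sum of p_i^2
        "entropy": -sum(p * math.log2(p) for p in probs),  # bits
    }


def indices_by_prefix(tokens, step):
    """Recompute the indices on growing prefixes to expose length effects."""
    return [
        (size, frequency_indices(tokens[:size]))
        for size in range(step, len(tokens) + 1, step)
    ]


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat".split()
    for size, values in indices_by_prefix(sample, 4):
        print(size, values)
```

Running `indices_by_prefix` on a longer text typically shows the type-token ratio falling as the prefix grows, which is the kind of bias the abstract describes.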
It is obvious that not all texts are of the same type. We may distinguish between political texts, legal texts and medical texts; fairy tales, novels and short stories differ from newspaper reports, essays, and scientific papers; food recipes, instruction booklets and advertisements may show similarities but they are not the same; expository texts differ from argumentative texts, etc. All these types of text differ in ways that are intuitively somewhat obvious but which nevertheless invite detailed analysis.
SHS Web of Conferences
The work presented in this paper is part of an ongoing project that investigates academic text features indicative of text complexity at different grade levels. In this study we examine the comparative complexity of social science texts used in Russian secondary and high schools. Based on the metrics of ten descriptive and four lexical features assessed for seven classroom textbooks, we claim lexical diversity, frequency, abstractness and the number of terminological units to be statistically significant predictors of text complexity. The total size of the corpus, over 160,000 tokens comprising two sets of textbooks ranging from the 5th to the 11th grades, provides a satisfactory level of representativeness and as such a solid foundation for the statistical validity of the results. We employ RusAC, an online text analyzer, to compute lexical features of texts, and the effect of the four lexical features on text complexity is confirmed with a mixed analysis of variance. The study fills a...
2020
The research presented in this paper is aimed at the analysis of the dynamic organization of a literary text. Using the statistical time series method, the dynamics of the main extensive text variables (the mean paragraph length and the mean sentence length) is considered. The material for this study was the annotated subcorpus from the Corpus of the Russian Short Stories of 1900–1930, which consists of 310 stories written by 300 Russian writers. Only narrative fragments of texts (the narrator's speech) were subjected to analysis; dialogical fragments were not taken into consideration. As a result, the most frequent dynamic profiles of paragraph length and sentence length were obtained, which reflect the most typical structures of the dynamic organization of short literary texts.
Russian Journal of Linguistics
Text complexity assessment is a challenging task requiring various linguistic aspects to be taken into consideration. The complexity level of the text should correspond to the reader’s competence. A too complicated text could be incomprehensible, whereas a too simple one could be boring. For many years, simple features were used to assess readability, e.g. average length of words and sentences or vocabulary variety. Thanks to the development of natural language processing methods, the set of text parameters used for evaluating readability has expanded significantly. In recent years, many articles have investigated the contribution of various lexical, morphological, and syntactic features to the readability level. Nevertheless, as the methods and corpora are quite diverse, it may be hard to draw general conclusions as to the effectiveness of linguistic information for evaluating text complexity. Moreover,...
Paper presented at the Workshop on Typological Contrasts, Aarhus, Denmark, May 19-20, 2022
In this paper, I describe some of the most distinct differences between typical endocentric and exocentric text structure, with particular reference to Danish and Italian respectively. I point to some of the phenomena which – in my experience as a teacher of Italian in Denmark for over 40 years – have been most problematic to Danish students of Italian, namely the differences in text complexity and text density between otherwise parallel Danish and Italian texts. Regarding text and text type comparison, I follow the theoretical framework suggested by Hartmann (1980), cf. also Skytte (2000), with distinctions between what Hartmann calls “Class B” and “Class C” parallel texts (“Class A” being translations), where “Class B” texts are adaptations “conveying an identical message to receivers of sometimes very different cultural backgrounds” (Hartmann 1980: 38), e.g. news bulletins, and “Class C” texts are authentic texts produced independently in the languages in question, but in equivalent situations and with equivalent targets and contents, texts that I shall refer to as “comparable texts” (Korzen & Gylling 2017; Korzen 2021). As can be gathered from two brief examples, (1)-(2), my focus is on the textualisation and syntactic combination of propositions in the two languages. Other things being equal, Romance text structure typically reveals a more compact and complex form than Scandinavian text structure, with more propositions per sentence and more propositions textualised without a finite verb, i.e. “deverbalised”. Whereas examples (1)-(2) are taken from comparable texts, the picture changes – not surprisingly – when we consider adapted texts, “Class B” texts in Hartmann’s terminology. However, with regard to Danish and Italian text structure, the picture seems to change on one account only, namely sentence compactness, i.e. the number of propositions textualised in the same sentence, and not with regard to deverbalisation.
On the basis of statistical analyses of four different text corpora, three of comparable texts and one of adaptations, I discuss the usefulness of these two kinds of text comparison, as well as whether the mentioned text structure differences should be considered as a question of language typology or language use.
Poznan Studies in Contemporary Linguistics, 2017
As human language is a multi-level complex adaptive system, a text can be seen as emergent from the complex interactions between internal and external factors. Text types such as microblogs have a size restriction, an external variable which may affect relevant quantitative properties of the texts themselves. Such texts provide a good opportunity to investigate the interactions between the external and internal factors of human language from the perspective of complex adaptive systems. This study focuses on how the size restriction of Chinese microblog texts affects the length of their sentences and clauses. Quantitative properties concerning sentence and clause length of Chinese microblog texts are analyzed and compared with those of texts with no size restriction (i.e., prose, news report and romantic fiction). Analysis of sentence length distribution shows that size restriction has an impact on sentence length measured by numbers of words and clauses. The correlation between sentence...
Beatriz López Medina, Universidad Antonio de Nebrija, Madrid. Encuentro: Revista de investigación e innovación en la clase de idiomas, 12, 2002/2003
Throughout the short history of text linguistics research, some studies have pointed out the relevance of the text as a basic unit to approach a foreign language. This article deals with some of the activities on text type characteristics put into practice with Spanish students with an upper-intermediate level of English. Following the classical classifications of text typology (Werlich 1975, Beaugrande and Dressler 1981, or Hatch 1992, among others), this paper will provide an outline of the structure of a foreign language class based on text linguistics. The application of the outline will be shown in relation to two kinds of texts: diaries / journals and descriptive texts.
