Abstract
In this article, we present the results of a corpus-based study in which we explore whether different facets of text complexity can be singled out automatically in a general-purpose corpus. To this end, we use factor analysis as applied in Biber’s multi-dimensional analysis framework. We evaluate the factor solution by correlating factor scores with readability scores, to ascertain whether the selected solution matches an independent measurement of readability, a notion tightly linked to text complexity. The corpus used in the study is the Swedish national corpus, the Stockholm-Umeå Corpus (SUC). The SUC contains subject-based text varieties (e.g., hobby), press genres (e.g., editorials), and mixed categories (e.g., miscellaneous). We refer to them collectively as ‘registers’. Results show that it is indeed possible to elicit and interpret facets of text complexity using factor analysis, despite some caveats. We propose a tentative text complexi...
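The evaluation step described above, correlating factor scores with an independent readability measure, can be illustrated with a minimal sketch. It assumes LIX as the readability measure (the abstract does not name the formula used) and takes the per-text factor scores as given. The helper names `lix` and `pearson`, the toy texts, and the toy factor scores are all hypothetical and are not taken from the study.

```python
import math
import re


def lix(text):
    """Simplified LIX: words per sentence plus the percentage of words over 6 letters."""
    # Assumption: sentences end at '.', '!' or '?'; real tools segment more carefully.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100.0 * long_words / len(words)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy texts standing in for corpus documents (not taken from SUC).
    texts = [
        "The cat sat. The dog ran.",
        "The committee discussed the proposal at length.",
        "Multidimensional analyses characterise linguistic variation "
        "across heterogeneous registers.",
    ]
    # Hypothetical scores on a single factor, one per text; a real study
    # would obtain these from the fitted factor solution.
    factor_scores = [-1.2, 0.1, 1.4]
    readability = [lix(t) for t in texts]
    print(pearson(factor_scores, readability))
```

Under these assumptions, a strong correlation between a factor and the readability score would suggest that the factor captures a property similar to what the readability formula measures, while a weak one would suggest a different facet of complexity.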
References (36)
- Adesam, Y., Bouma, G., & Johansson, R. ( ). The Koala part-of-speech and morphological tagset for Swedish. SLTC.
- Asención-Delaney, Y., & Collentine, J. ( ). A multidimensional analysis of a written L Spanish corpus. Applied Linguistics, ( ), -.
- Biber, D. ( ). Variation across speech and writing. Cambridge University Press. https://doi.org/./CBO
- Biber, D. ( ). A typology of English texts. Linguistics, ( ), -.
- Biber, D. ( ). Dimensions of register variation: A cross-linguistic comparison. Cambridge University Press. https://doi.org/./CBO
- Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. ( ). Longman grammar of spoken and written English. Longman.
- Biber, D., & Kurjian, J. ( ). Towards a taxonomy of web registers and text types: A multi-dimensional analysis. In Corpus Linguistics and the Web (pp. -). https://doi.org/./_
- Biber, D., & Conrad, S. ( ). Register, genre, and style. Cambridge University Press. https://doi.org/./CBO
- Biber, D., & Egbert, J. ( ). Register variation on the searchable web: A multi-dimensional analysis. Journal of English Linguistics, ( ), -. https://doi.org/./
- Björnsson, C. H. ( ). Läsbarhet. Liber.
- Cattell, R. B. ( ). The scree test for the number of factors. Multivariate behavioral research, ( ), -. https://doi.org/./smbr_
- Collins-Thompson, K. ( ). Computational assessment of text readability: A survey of current and future research. ITL-International Journal of Applied Linguistics, ( ), -. https://
- [...] Science, and Technical Subjects. Appendix A: Research Supporting Key Elements of the Standards, Glossary of Key Terms.
Pinning down text complexity: An exploratory study
- Cvrček, V., Komrsková, Z., Lukeš, D., Poukarová, P., Řehořková, A., Zasina, A. J., & Benko, V. ( ). Comparing web-crawled and traditional corpora. Language Resources and Evaluation, -.
- Dahl, Ö. ( ). The growth and maintenance of linguistic complexity (Vol. ). John Benjamins Publishing. https://doi.org/./slcs.
- Dale, E., & Chall, J. S. ( ). The concept of readability. Elementary English, ( ), -.
- Dell'Orletta, F., Montemagni, S., & Venturi, G. ( , September). Linguistic profiling of texts across textual genres and readability levels. An exploratory study on Italian fictional prose. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP (pp. -).
- Dell'Orletta, F., Montemagni, S., & Venturi, G. ( ). Assessing document and sentence readability in less resourced languages and across textual genres. ITL-International Journal of Applied Linguistics, ( ), -. https://doi.org/./itl...del
- DiStefano, C., Zhu, M., & Mindrila, D. ( ). Understanding and using factor scores: Considerations for the applied researcher. Practical Assessment, Research & Evaluation, ( ), -.
- Fahlborg, D., & Rennes, E. ( ). Introducing SAPIS - an API service for text analysis and simplification. In the second national Swe-Clarin workshop: Research collaborations for the digital age, Umeå, Sweden.
- Falkenjack, J. ( ). Towards a model of general text complexity for Swedish (Doctoral dissertation, Linköping University Electronic Press).
- Falkenjack, J., Mühlenbock, K. H., & Jönsson, A. ( , May). Features indicating readability in Swedish text. In Proceedings of the th Nordic Conference of Computational Linguistics (NODALIDA ) (pp. -).
- Falkenjack, J., Santini, M., & Jönsson, A. ( ). An exploratory study on genre classification using readability features. In Proceedings of the Sixth Swedish Language Technology Conference (SLTC ), Umeå, Sweden.
- Feng, L. ( ). Automatic readability assessment (Doctoral dissertation, CUNY Academic Works).
- Field, A. ( ). Discovering statistics using SPSS for Windows. London: Sage Publications.
- Flesch, R. ( ). A new readability yardstick. Journal of Applied Psychology, ( ), -. https://doi.org/./h
- Field, A., Miles, J., & Field, Z. ( ). Discovering statistics using R. Sage publications.
- Hayton, J. C., Allen, D. G., & Scarpello, V. ( ). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational research methods, ( ), -. https://doi.org/./
- Hiebert, E. H. ( ). Readability and the common core's staircase of text complexity. Text Matters, .
- Horn, J. L. ( ). A rationale and test for the number of factors in factor analysis. Psychometrika, -. https://doi.org/./BF
- Housen, A., De Clercq, B., Kuiken, F., & Vedder, I. ( ). Multiple approaches to complexity in second language research. Second Language Research, ( ), -. https://doi.org/./
- Jelen, B. ( ). Excel charts and graphs. Que Publishing Company.
- Jönsson, S., Rennes, E., Falkenjack, J., & Jönsson, A. ( ). A component based approach to measuring text complexity. In Proceedings of The Seventh Swedish Language Technology Conference (SLTC-).
Marina Santini & Arne Jönsson
- Kate, R. J., Luo, X., Patwardhan, S., Franz, M., Florian, R., Mooney, R. J., & Welty, C. ( , August). Learning to predict readability using diverse linguistic features. In Proceedings of the rd international conference on computational linguistics (pp. -). Association for Computational Linguistics.
- Källgren, G., Gustafson-Capková, S., & Hartmann, B. ( ). Manual of the Stockholm Umeå Corpus version . . Department of Linguistics, Stockholm University, December. Sofia Gustafson-Capková and Britt Hartmann (eds.).
- Ledesma, R. D., Valero-Mora, P., & Macbeth, G. ( ). The scree test and the number of factors: a dynamic graphics approach. The Spanish Journal of Psychology, . https://doi.org/./sjp..
- Lu, X. ( ). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, ( ), -. https://doi.org/./ijcl...lu
- Mühlenbock, K. H. ( ). I see what you mean: Assessing readability for specific target groups. (Doctoral dissertation, University of Gothenburg, Gothenburg, Sweden).
- Napolitano, D., Sheehan, K. M., & Mundkowsky, R. ( , June). Online readability and text complexity analysis with Text Evaluator. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations (pp. -).
- Nenkova, A., Chae, J., Louis, A., & Pitler, E. ( ). Structural features for predicting the linguistic quality of text. In Empirical methods in natural language generation (pp. -). Springer, Berlin, Heidelberg. https://doi.org/./----_
- Nivre, J. ( ). Inductive dependency parsing (pp. -). Springer Netherlands. https://doi.org/./---_
- Pallotti, G. ( ). A simple view of linguistic complexity. Second Language Research, ( ), -. https://doi.org/./
- Petersen, S. ( ). Natural language processing tools for reading level assessment and text simplification for bilingual education. (Doctoral dissertation, University of Washington, Seattle, WA, USA).
- Petersen, S. E., & Ostendorf, M. ( ). A machine learning approach to reading level assessment. Computer Speech & Language, ( ), -. https://doi.org/./j.csl...
- Pilán, I., Vajjala, S., & Volodina, E. ( ). A readable read: Automatic assessment of language learning materials based on linguistic complexity. arXiv preprint arXiv: . .
- Pitler, E., & Nenkova, A. ( , October). Revisiting readability: A unified framework for predicting text quality. In Proceedings of the conference on empirical methods in natural language processing (pp. -).
- Rello, L., Baeza-Yates, R., Bott, S., & Saggion, H. ( a). Simplify or help? Text simplification strategies for people with dyslexia. In Proceedings of the th International Cross-Disciplinary Conference on Web Accessibility (pp. -).
- Rello, L., Baeza-Yates, R., Dempere-Marco, L., & Saggion, H. ( b). Frequent words improve readability and short words improve understandability for people with dyslexia. In IFIP Conference on Human-Computer Interaction (pp. -). Springer.
- Saggion, H. ( ). Automatic text simplification. Synthesis Lectures on Human Language Technologies, ( ), -. https://doi.org/./SEDVYHLT
- Santini, M., Danielsson, B., & Jönsson, A. ( , August). Introducing the Notion of 'Contrast' Features for Language Technology. In International Conference on Database and Expert Systems Applications (pp. -). Springer, Cham. https://doi.org/./----_
- Sardinha, T. B., Kauffmann, C., & Acunzo, C. M. ( ). A multi-dimensional analysis of register variation in Brazilian Portuguese. Corpora, ( ), -. https://doi.org/./cor..
- Sardinha, T. B., & Pinto, M. V. (Eds.). ( ). Multi-dimensional analysis, years on: A tribute to Douglas Biber (Vol. ). John Benjamins Publishing Company. https://doi.org/./scl.
- Štajner, S., & Saggion, H. ( , August). Data-Driven Text Simplification. In Proceedings of the th International Conference on Computational Linguistics: Tutorial Abstracts (pp. -).
- Vega, B., Feng, S., Lehman, B., Graesser, A., & D'Mello, S. ( , July). Reading into the text: Investigating the influence of text complexity on cognitive engagement. In Educational Data Mining .
- Wray, D., & Janan, D. ( ). Readability revisited? The implications of text complexity. The Curriculum Journal, . https://doi.org/./..