Approximate Entropy in Canonical and Non-Canonical Fiction

2022, Entropy

https://doi.org/10.3390/E24020278

Abstract

Computational textual aesthetics aims to study observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction (‘classics’) and non-canonical fiction (with lower prestige). Approximate Entropy is determined for series derived from sentence-length values and from the distribution of part-of-speech tags in windows of text. For comparison, we also include a sample of expository (non-fictional) texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that Approximate Entropy values differentiate canonical from non-canonical texts better than Shannon Entropy does, which is not true for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of loc...
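
The abstract names the two measures only in prose. The following is a minimal, illustrative sketch of both quantities, not the paper's own implementation (the paper cites the Stanza tagger and the NeuroKit2 toolbox for its pipeline): Approximate Entropy per Pincus's definition, applied to a sentence-length series, and Shannon Entropy of a token-frequency distribution. The parameters m = 2 and r = 0.2·SD are common conventions assumed here, not values confirmed by the abstract.

```python
import numpy as np
from collections import Counter

def approximate_entropy(series, m=2, r=None):
    """Approximate Entropy (Pincus, 1991) of a 1-D series.

    m : embedding dimension (length of the compared templates)
    r : tolerance; defaults to 0.2 * standard deviation of the series
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * x.std()

    def phi(m):
        # All overlapping templates of length m: shape (n - m + 1, m)
        templates = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # C_i: fraction of templates within tolerance r of template i
        # (self-matches are included, so every C_i > 0 and the log is defined)
        c = (dist <= r).mean(axis=1)
        return np.mean(np.log(c))

    # ApEn(m, r, N) = Phi^m(r) - Phi^{m+1}(r)
    return phi(m) - phi(m + 1)

def shannon_entropy(tokens):
    """Shannon Entropy (in bits) of the token-frequency distribution."""
    counts = np.array(list(Counter(tokens).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy example: sentence lengths (in tokens) of a hypothetical text.
lengths = [12, 7, 23, 9, 15, 11, 30, 8, 14, 10, 19, 6, 21, 13, 9, 17]
print(approximate_entropy(lengths))  # lower = more predictable sequential structure

tokens = "the cat sat on the mat and the dog sat too".split()
print(shannon_entropy(tokens))  # higher = flatter overall frequency distribution
```

The contrast the abstract draws falls out of these definitions: Approximate Entropy is order-sensitive (shuffling the sentence-length series changes it), whereas Shannon Entropy depends only on the frequency distribution and is invariant under reordering.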
