Is there a formula for formulaic language?

2015, Poznan Studies in Contemporary Linguistics

https://doi.org/10.1515/PSICL-2015-0019

Abstract

This paper focuses on detecting and measuring traces of "formulaic language". For this purpose, we test a number of computational formulae that quantify the degree to which a text type incorporates inflexible sequences of words. We assess these candidate indices using a number of reference corpora representing a wide variety of text types, both routine and creative. We adopt the concept of "phrase-frame" proposed by Fletcher (2002–2007) as a means of exploring phraseological pattern variability. To date, there have been few studies explicitly addressing this issue, with the exception of Roemer (2010). We examine ten productivity indices, including Roemer's VPR, the Herfindahl-Hirschman index, Simpson's diversity index and relative Shannon entropy. We report that a novel measure, which we term Hapaxity, best meets our criteria, and show how this index of micro-productivity (in phrase-frames) may be used to assess macro-productivity (in text registers), thu...
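As a rough illustration of what such indices measure, the sketch below computes a few of the named quantities over the slot-fillers of a single phrase-frame. It is a minimal sketch under stated assumptions, not the paper's implementation. The function name `filler_indices` is hypothetical; Simpson's index is taken in its Gini-Simpson form (1 minus the sum of squared proportions); entropy is normalised by log2 of the number of filler types; and `hapax_share` (the proportion of filler types that occur exactly once) is only a stand-in, since the abstract does not give the formula for Hapaxity. The input counts are read off the concordance excerpts for the frame of_the_*_the that accompany this paper: the LOBCORP sample (31 filler types, 33 hits) and the visible part of the clinical-trial sample (2 filler types, 18 hits).

```python
import math
from collections import Counter


def filler_indices(counts):
    """Summarise how the hits of one phrase-frame spread over its slot-fillers.

    `counts` maps each slot-filler to its frequency in the frame.
    All definitions here are illustrative assumptions (see the lead-in).
    """
    if not counts:
        raise ValueError("need at least one slot-filler")
    tokens = sum(counts.values())
    types = len(counts)
    probs = [c / tokens for c in counts.values()]

    # Herfindahl-Hirschman index: sum of squared filler proportions.
    hhi = sum(p * p for p in probs)
    # Shannon entropy in bits, then scaled by its maximum for this many types.
    entropy = -sum(p * math.log2(p) for p in probs)
    rel_entropy = entropy / math.log2(types) if types > 1 else 0.0
    # Assumed stand-in for a hapax-based measure: share of types seen once.
    hapaxes = sum(1 for c in counts.values() if c == 1)

    return {
        "tokens": tokens,
        "types": types,
        "hhi": hhi,
        "simpson": 1.0 - hhi,  # Gini-Simpson form (assumption)
        "relative_entropy": rel_entropy,
        "hapax_share": hapaxes / types,
    }


# Slot-fillers of of_the_*_the as listed in the LOBCORP concordance excerpt.
lob_fillers = """discontent peace peace range girls island decisions part age
timetable bible bible redeemer river wind hair century confusion mind home
costumes emotions kgb image region neck city evangelists face white cases
path men""".split()
lobcorp = Counter(lob_fillers)

# Visible part of the clinical-trial excerpt only (the excerpt starts mid-list).
clinical = Counter({"trial": 16, "investigator": 2})

for name, counts in (("LOBCORP", lobcorp), ("clinical trials", clinical)):
    print(name, {k: round(v, 3) for k, v in filler_indices(counts).items()})
```

On these counts the LOBCORP frame sits near the open end of each scale (low concentration, nearly all fillers occurring once), while the clinical-trial frame is dominated by a single filler and has no hapax fillers at all.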

0008706: ine drug screen in the opinion of the investigator the investigator may fol
0011068: of 4 weeks or in the judgment of the investigator the subject meets criter
0004073: the trial e 2.1 main objective of the trial the primary objective of this s
0004565: nse e 2.2 secondary objectives of the trial the secondary objectives of the
ctps147.txt
0000436: 2009-013226-17 a 3 full title of the trial the enigma ii trial nitrous oxi
ctps155.txt
0000427: 2009-015903-22 a 3 full title of the trial the effect of eicosapentaenoic
0004070: the trial e 2.1 main objective of the trial the study aims to show the firs
ctps167.txt
0003842: the trial e 2.1 main objective of the trial the primary objective of this s
ctps171.txt
0000425: 2007-005534-36 a 3 full title of the trial the effect of exenatide on sati
ctps172.txt
0004806: the trial e 2.1 main objective of the trial the objective of this study is
ctps186.txt
0000421: 2008-002336-15 a 3 full title of the trial the effect of prednisolone vers
0011749: one e 2.2 secondary objectives of the trial the secondary objectives of thi
ctps192.txt
0008949: the trial e 2.1 main objective of the trial the primary objective of this s
0009182: ebo e 2.2 secondary objectives of the trial the secondary objectives are to
ctps197.txt
0006141: the trial e 2.1 main objective of the trial the trial is conducted in two p
ctps218.txt
0004052: the trial e 2.1 main objective of the trial the objectives of this study ar
ctps229.txt
0004187: the trial e 2.1 main objective of the trial the primary aim of the scratch
ctps240.txt
0008134: the trial e 2.1 main objective of the trial the objective of this trial is

By contrast, results from the LOBCORP subsample are shown below. Only 2 of the 31 slot-fillers are repeated more than once.

doctype : lobcorp (LOBCORP) of_the_*_the
LA02.txt
0002959: akoradi for the past week root of the discontent the austerity budget inclu
0006373: eace and for inciting a breach of the peace the summonses say they are like
0011008: prisoned for inciting a breach of the peace the committee's president 89-y
LA23.txt
0002011: st 11 lb 3 lb below the middle of the range the handicapper has certainly t
LA24.txt
0003799: ee how it can be all the fault of the girls the secretary of great yarmouth
LA27.txt
0007796: ains was extended to the whole of the island the measures had previously ap
LB06.txt
0005825: nal issue by misrepresentation of the decisions the introduction of a pseud
LC04.txt
0009864: was a splendid interpretation of the part the rest of the cast were well c
LC12.txt
0007809: emes reflect the controversies of the age the quebec act with its threat of
LD06.txt
0009316: o friday were days one to five of the timetable the following monday was da
LD15.txt
0002448: among the english translations of the bible the design draws attention to s
0003851: among the english translations of the bible the commemoration editions will
0000884: r's guilt by the atoning blood of the redeemer the lord's anointed mercy is
LE09.txt
0009692: first cave rushes a large part of the river the second penetrates under the
LE10.txt
0002929: hange in the wind into the eye of the wind the fantail consists of what is
LE34.txt
0010404: bly according to the condition of the hair the most common change of textur
LF13.txt
0008350: most sensational murder trials of the century the defence had picked lawren
LG05.txt
0008642: ision of aim at the very heart of the confusion the resolute but unbroken g
LG13.txt
0006692: war were in his opinion things of the mind the real task was to get better
LG27.txt
0000267: in her stead undisputed queen of the home the children and all official so
LG43.txt
0004431: als and the cutting and making of the costumes the opera producer is called
LG55.txt
0003743: worked upon now comes the turn of the emotions the object of study is now t
0011588: te in their operation the goal of the kgb the present designation for the r
LG61.txt
0004710: be achieved by a modification of the image the simplified picture which go
LJ03.txt
0005837: soil formation characteristic of the region the two groups of soils exempl
LJ17.txt
0003312: pper chest wall and right side of the neck the following day he complained
LJ64.txt
0007107: ge and was blinded in a street of the city the two sisters who had little l
0011068: seems coarser in the portrayal of the evangelists the bodies tend to disint
0008767: s the broadening of the planes of the face the empress and her consort are
LK14.txt
0005819: m and she sat down on the side of the white the rumanian introduced her to
LL23.txt
0003317: ee how this would help in most of the cases the case up that afternoon had
0009776: ige carpet like a continuation of the path the hall stand held one umbrella
LN17.txt
0009991: the registered stuff said one of the men the man with the keys jerked his

References

  2. Altenberg, B. 1998. "On the phraseology of spoken English: The evidence of recurrent word combinations". In A. Cowie (ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University Press. 101-122.
  3. Anthony, L. 2014. AntConc (ver. 3.4.1). Available at: http://www.antlab.sci.waseda.ac.jp/software/antconc3.2.4w.exe (accessed February 2014).
  4. Baayen, H. 1992. "Quantitative aspects of morphological productivity". In G. Booij & J. van Marle (eds.), Yearbook of Morphology 1991. Berlin: Springer. 109-149.
  5. Baker, M. 1995. "Corpora in translation studies: An overview and some suggestions for future research". Target, 7 (2), 223-243.
  6. Baker, M. 1996. "Corpus-based translation studies: The challenges that lie ahead". In H. Somers (ed.), Terminology, LSP and Translation: Studies in Language Engineering. In Honour of Juan C. Sager. Amsterdam: John Benjamins. 175-186.
  7. Bartmiński, J. 2007. "Stereotyp jako przedmiot lingwistyki". In: J. Bartmiński (ed.), Stereotypy mieszkają w języku. Lublin: Wydawnictwo UMCS. 53-71.
  8. Biber, D. 2009. "A corpus-driven approach to formulaic language in English: multi-word patterns in speech and writing". International Journal of Corpus Linguistics, 14 (3), 275-311.
     Bolinger, D. 1965. "The atomization of meaning". Language, 41 (4), 555-573.
  9. Bouayad-Agha, N. and Kilgarriff, A. 1999. "Duplication in corpora". In: Proceedings of the 2nd CLUK Colloquium. Colchester, Essex, 11-12 Jan 1999. Available at: http://www.kilgarriff.co.uk/Publications/1999-BouayadAghaKilg-CLUK.pdf (accessed June 2012).
  10. Bouayad-Agha, N. 2006. The Patient Information Leaflet (PIL) 2.0 corpus. Available at: http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ (accessed May 2012).
  11. Chen, Y.-H. and Baker, P. 2010. "Lexical bundles in L1 and L2 academic writing". Language Learning and Technology, 14 (2), 30-49.
  12. Chesterman, A. 2004. "Hypothesis about translation universals". In G. Hansen, K. Malmkjaer & D. Gile (eds.), Claims, Changes and Challenges in Translation Studies. Amsterdam: John Benjamins. 1-13.
  13. Chlebda, W. 2003. Elementy frazematyki: wprowadzenie do frazeologii nadawcy. Łask: Leksem.
  14. Chomsky, N. 1972. Language and Mind [enlarged edition]. New York: Harcourt Brace Jovanovich.
  15. Corrigan, R., Moravcsik, E., Ouali, H. and Wheatley, K. (eds.) 2009. Formulaic Language. Vol. 1. Distribution and historical change. Amsterdam: John Benjamins.
  16. Eggins, S. 1994. An Introduction to Systemic Functional Linguistics. London: Pinter.
  17. Ellis, N.C., Roemer, U., Brook O'Donnell, M., Gries, S. & Wulff, S. 2009. "Measuring the formulaicity of language". Paper presented at colloquium SLA and the inseparability of vocabulary and syntax. Denver, Colorado, 21-24 Mar 2009. Available at: http://researchers.tistory.com/attachment/cfile7.uf@154A94334F197CE02E3025.pdf (accessed January 2012).
  18. Erman, B. and Warren, B. 2000. "The idiom principle and the open choice principle". Text, 20 (1), 29-62 (cited in Schmitt & Carter 2004: 1).
  19. Firth, J.R. 1968. "Linguistics and Translation". In: F. Palmer (ed.), Selected Papers of J. R. Firth 1952-1959, London: Longman, 84-95 (cited in Toolan 1996: 161-162 and Roemer 2010: 96).
  20. Fletcher, W. 2002-2007. KfNgram. Annapolis, MD: USNA. Available at: http://www.kwicfinder.com/kfNgram/kfNgramHelp.html (accessed November 2011).
  21. Foster, P. 2001. "Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers". In: M. Bygate, P. Skehan, & M. Swain (eds.), Researching pedagogic tasks: Second language learning, teaching, and testing. Harlow: Longman. 75-93. (cited in Schmitt 2005: 14).
  22. Fuster-Marquez, M. 2014. "Lexical bundles and phrase frames in the language of hotel websites". English Text Construction, 7 (1), 84-121.
  23. Gerbig, A. 2011. "Key words and key phrases in a corpus of travel writing". In: M. Bondi & M. Scott (eds.), Keyness in Texts. Amsterdam: John Benjamins. 147-168.
  24. Grabowski, Ł. 2015a. "Keywords and lexical bundles within English pharmaceutical discourse: a corpus-driven description". English for Specific Purposes, 38, 23-33.
  25. Grabowski, Ł. 2015b. Phraseology in English pharmaceutical discourse: A corpus driven study of register variation. Opole: Wydawnictwo Uniwersytetu Opolskiego.
  26. Gray, B. and Biber, D. 2013. "Lexical frames in academic prose and conversation". International Journal of Corpus Linguistics, 18 (1), 109-135.
  27. Halliday, M.A.K. 2014. "That 'certain cut': towards a characterology of Mandarin Chinese". Functional Linguistics, 1 (2). doi:10.1186/2196-419X-1-2.
  28. Herdan, G. 1964. Quantitative Linguistics. London: Butterworth.
  29. Hirschman, A. 1964. "The Paternity of an Index". The American Economic Review (American Economic Association), 54 (5), 761.
  30. Hofland, K. and Johansson, S. 1982. Word frequencies in British and American English. Bergen: Norwegian Computing Centre for the Humanities/London: Longman.
  31. Hyland, K. 2008. "As can be seen: Lexical bundles and disciplinary variation". English for Specific Purposes, 27, 4-21.
  32. Kilgarriff, A. 2005. "Language is never ever ever random". Corpus Linguistics and Linguistic Theory, 1 (2), 263-276.
  33. Kuiper, K. 1996. Smooth Talkers: The Linguistic Performance of Auctioneers and Sportscasters. New York: Erlbaum (also cited in Wray 2002: 17).
  34. Lancioni, G. 2009. "Formulaic models and formulaicity in Classical and Modern Standard Arabic". In: R. Corrigan, E. Moravcsik, H. Ouali & K. Wheatley (eds.), Formulaic Language. Vol. 1. Distribution and historical change. Amsterdam: John Benjamins. 219- 238.
  35. Pawley, A. 2009. "Grammarians' languages versus humanists' languages and the place of speech act formula in models of linguistic competence." In: R. Corrigan, E. Moravcsik, H. Ouali & K. Wheatley (eds.), Formulaic Language. Vol. 1. Distribution and historical change. Amsterdam: John Benjamins. 3-26.
  36. Permyakov, G. 1970. От поговорки до сказки (Заметки по общей теории клише) [From Sayings to Fairytales. Notes on General Cliché Theory]. Moscow: Nauka (cited in Chlebda 2003: 25).
  37. Ren, Z., Lu, Y., Cao, J., Liu, Q. & Huang, Y. 2009. "Improving statistical machine translation using domain bilingual multiword expressions". Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, MWE '09, 47-54. Stroudsburg: Association for Computational Linguistics. Available at: http://www.aclweb.org/anthology/W09-2907 (accessed November 2014).
  39. Roemer, U. 2009. "English in Academia: Does Nativeness Matter?". Anglistik: International Journal of English Studies, 20 (2), 89-100.
  40. Roemer, U. 2010. "Establishing the phraseological profile of a text type. The construction of meaning in academic book reviews". English Text Construction, 3 (1), 95-119.
  41. Schmitt, N. 2005. "Formulaic language: fixed and varied". Estudios de linguistica aplicada (ELIA), 6, 13-39. Available at: http://institucional.us.es/revistas/elia/6/art.2.pdf (accessed November 2014).
  42. Schmitt, N. and Carter, R. 2004. "Formulaic sequences in action: An introduction". In N. Schmitt (ed.), Formulaic Sequences: Acquisition, Processing and Use. Amsterdam: John Benjamins, 1-22.
  43. Shannon, C.E. 1948. "A mathematical theory of communication". The Bell System Technical Journal, 27 (3), 379-405.
  44. Simpson, E.H. 1949. "Measurement of diversity". Nature, 163, 688.
  45. Simpson-Vlach, R. and Ellis, N.C. 2010. "An Academic Formulas List: New methods in phraseology research". Applied Linguistics, 31 (4), 487-512.
  46. Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
  47. Tiedemann, J. 2009. "News from OPUS -A Collection of Multilingual Parallel Corpora with Tools and Interfaces". In: N. Nicolov, K. Bontcheva, G. Angelova and R. Mitkov (eds.), Recent Advances in Natural Language Processing, 5. Amsterdam/Philadelphia: John Benjamins. 237-248.
  48. Underwood, G., Schmitt, N. and Galpin, A. 2004. "The eyes have it: An eye-movement study into the processing of formulaic sequences". In: N. Schmitt (ed.), Formulaic Sequences: Acquisition, Processing and Use. Amsterdam: John Benjamins. 153-172.
  49. Upton, G. & Cook, I. 2006. Oxford Dictionary of Statistics. Oxford: Oxford University Press.
  50. Wood, D. (ed.) 2010a. Perspectives on Formulaic Language: Acquisition and Communication. London: Continuum.
  51. Wood, D. (ed.) 2010b. Formulaic Language and Second Language Speech Fluency. Background, Evidence and Classroom Applications. London: Continuum.
  52. Wray, A. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press.
  53. Wray, A. 2008. Formulaic language. Pushing the boundaries. Oxford: Oxford University Press.
  54. Wray, A. 2009. "Identifying formulaic language. Persistent challenges and new opportunities". In: R. Corrigan, E. Moravcsik, H. Ouali & K. Wheatley (eds.). Formulaic Language. Vol. 1. Distribution and historical change. Amsterdam: John Benjamins. 27-51.
  55. Wray, A. & Perkins, M. 2000. "The functions of formulaic language: an integrated model". Language & Communication, 20, 1-28.