Papers by Adriano Ferraresi
Miss Man? Languaging Gendered Bodies, 2018
The present contribution investigates the discursive representation of transgender people in two citizen journalism platforms, Global Voices and the Digital Journal, comparing it with the representation found in British quality and popular newspapers. The aim is to assess whether, and to what extent, negative stereotypes about transgender people previously observed in traditional news media are reproduced, contested or overturned in grassroots journalism. The methodological framework adopted is corpus-based discourse analysis, a quantitative and qualitative approach to text analysis that aims to reduce researcher bias in the search for meaningful regularities in the discourse under study.
This contribution focuses on didactic applications of intermodal corpora, i.e. corpora featuring interpreted and translated language. It relies on EPTIC, a multiple-translation and intermodal parallel corpus containing EU Parliament plenary speeches in Italian and English. The distinctive nature of EPTIC allows the investigation of a set of translational alternatives distinguished by modality- and task-based constraints (written vs. oral, translated vs. interpreted). To exemplify the potential of such corpus evidence, we propose teaching activities focusing on collocations, which encourage students to reflect on the decision-making processes involved in the slower-paced, reflective task of translation vs. the faster, more automatic task of interpreting. A method that can facilitate the selection of relevant didactic examples is also described.

This article introduces EPTIC (the European Parliament Translation and Interpreting Corpus), a new bidirectional (English<>Italian) corpus of interpreted and translated EU Parliament proceedings. Built as an extension of the English<>Italian subsection of EPIC (the European Parliament Interpreting Corpus), EPTIC is an intermodal corpus featuring the pseudo-parallel outputs of interpreting and translation processes, aligned to each other and to the corresponding source texts (speeches by MEPs and their written-up versions). As a first attempt at unearthing the potential of EPTIC, we investigate lexical simplification by replicating the methodology proposed by Laviosa (1998a, 1998b), while extending it to encompass both a monolingual comparable and an intermodal perspective. Our results indicate that the mediation process reduces complexity in both modes of language production and in both language directions, with interpreters simplifying the input more than translators, and with evidence of simplification being more lexical in English and more lexico-syntactic in Italian.
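As a rough illustration of what "reducing complexity" can mean operationally, the Python sketch below computes two indicators commonly used in corpus studies of lexical variety and density: a standardised type/token ratio and lexical density. It is not the study's own implementation and does not claim to reproduce Laviosa's exact measures. It assumes pre-tokenised, POS-tagged input; the coarse tag set (`LEXICAL_TAGS`) and the default chunk size are illustrative assumptions. Lower values in mediated (translated or interpreted) texts than in comparable non-mediated texts would be consistent with simplification.

```python
# Minimal sketch: two common indicators of lexical complexity.
# Assumes input is already tokenised and POS-tagged with a coarse,
# hypothetical tag set; not the EPTIC study's own code.

LEXICAL_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}  # hypothetical content-word tags


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word forms (case-insensitive) divided by total tokens."""
    if not tokens:
        raise ValueError("empty token list")
    return len({t.lower() for t in tokens}) / len(tokens)


def standardised_ttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Mean TTR over consecutive full chunks, reducing sensitivity to text length.

    Any incomplete final chunk is discarded.
    """
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        raise ValueError("text shorter than one chunk")
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)


def lexical_density(tagged: list[tuple[str, str]]) -> float:
    """Share of tokens whose tag marks a lexical (content) word."""
    if not tagged:
        raise ValueError("empty token list")
    lexical = sum(1 for _, tag in tagged if tag in LEXICAL_TAGS)
    return lexical / len(tagged)


tagged = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"), ("here", "ADV")]
print(lexical_density(tagged))  # 0.75
```

Chunking the token stream is one simple way to make type/token ratios comparable across texts of different lengths; the chunk size is a free parameter.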

University registers of an institutional kind (e.g. course syllabi and university brochures) are increasingly attracting scholarly attention. Research so far has focused on native texts, yet it has been suggested that "in order to understand the use of English in present-day academic communities, it is vital to look at English as a lingua franca" (Mauranen 2010). Indeed, universities in non-English-speaking countries worldwide also use English to communicate with their stakeholders, in an effort to stand out in the global educational market. This chapter pursues two interrelated aims: first, it introduces acWaC-EU (an acronym for “academic Web-as-Corpus in Europe”), a 90-million-word corpus of institutional academic texts in English, collected using semi-automatic procedures from the websites of European universities; second, it provides a preliminary characterization of the phraseology of the native and lingua franca varieties represented in the corpus. Drawing on Durrant and Schmitt (2009), and focusing on the genre of homepages, we extract contiguous pre-modifier + noun sequences from the native and the comparable lingua franca subcorpora; to ensure greater homogeneity in the latter group, we only take into account texts produced in EU countries with a Romance L1 background (e.g. Italy and France). Using frequency data derived from ukWaC, we classify word sequences according to three criteria: frequency (frequent vs. infrequent/unattested combinations), and collocational strength (“strong” vs. “weak”) measured separately by two association measures, i.e. t-score and MI. Finally, we compare the degree to which the native and lingua franca varieties represented in the corpus rely on different types of word combinations. Results point to a significant overuse of infrequent combinations and underuse of strongly associated collocations in lingua franca texts.
The chapter discusses these results and their relevance for research on institutional academic English and, more generally, on native vs. non-native use of phraseology.
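The t-score and MI measures mentioned above are computed from four counts: the observed frequency of the pair (O), the frequencies of its two words, and the size of the reference corpus (N), which together give the frequency expected under independence, E = f(w1) × f(w2) / N. The Python sketch below uses the standard textbook formulas (MI = log2(O/E), t = (O − E)/√O). It is an illustration with invented counts, not the chapter's own code, and it leaves out the thresholds used to label a combination "strong" or "weak", which the abstract does not specify.

```python
import math

# Minimal sketch of two standard association measures for a word pair
# (e.g. pre-modifier + noun), computed from reference-corpus counts.
# The counts used below are invented for illustration.


def expected_frequency(f_w1: int, f_w2: int, corpus_size: int) -> float:
    """Co-occurrence frequency expected if w1 and w2 were independent."""
    return f_w1 * f_w2 / corpus_size


def mutual_information(observed: int, f_w1: int, f_w2: int, corpus_size: int) -> float:
    """MI = log2(O / E); tends to favour rare, exclusive pairings."""
    return math.log2(observed / expected_frequency(f_w1, f_w2, corpus_size))


def t_score(observed: int, f_w1: int, f_w2: int, corpus_size: int) -> float:
    """t = (O - E) / sqrt(O); tends to favour frequent pairings."""
    return (observed - expected_frequency(f_w1, f_w2, corpus_size)) / math.sqrt(observed)


# Hypothetical counts: pair seen 100 times; w1 1,000 times; w2 2,000 times; N = 1,000,000.
print(round(mutual_information(100, 1000, 2000, 1_000_000), 2))  # 5.64
print(t_score(100, 1000, 2000, 1_000_000))  # 9.8
```

Both measures are undefined for unattested pairs (O = 0), so such pairs can only be classified by frequency.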

As a result of the European Union's pressure towards internationalization, universities in many countries are increasingly urged to provide information on their requirements and services, and to promote themselves, in English on the web. Hence the need for corpus resources and studies of institutional academic English used as an international language (or lingua franca) on the web. This paper introduces acWaC-EU (an acronym for "academic Web-as-Corpus in Europe"), a corpus of web pages in English crawled from the websites of European universities and annotated with contextual metadata. The corpus contains approximately 40 million words from universities in native English-speaking countries and a similar number of words from universities based in all other European countries, where English is used as a lingua franca. Thanks to the metadata, texts can be regrouped for comparison based, e.g., on the language family of the native language spoken in the country where the text was produced. The paper describes and evaluates the corpus construction pipeline and the corpus itself, presents a case study on the use of modal and semi-modal verbs in lingua franca vs. native texts, and looks at future developments, in particular simple heuristics for topic- and genre-oriented subcorpus construction.
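The metadata-based regrouping and the modal-verb case study can be pictured with the Python sketch below, which counts core modal verbs per million tokens for each group of texts. It is an illustration under stated assumptions, not the acWaC-EU pipeline: the `Text` record and its `language_family` field are hypothetical stand-ins for the corpus metadata, the toy texts are invented, and semi-modals are not covered.

```python
from collections import defaultdict
from dataclasses import dataclass

# Minimal sketch: normalised frequency of core modal verbs, with texts
# regrouped by a metadata field. The field name and values are hypothetical;
# a real study would also use POS tags (e.g. to skip nominal "will") and
# handle semi-modals such as "have to", which this sketch ignores.
CORE_MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


@dataclass
class Text:
    tokens: list[str]
    language_family: str  # hypothetical metadata, e.g. "Romance", "Germanic"


def modals_per_million(texts: list[Text]) -> dict[str, float]:
    """Core modal verbs per million tokens for each language-family group."""
    modal_counts: dict[str, int] = defaultdict(int)
    token_counts: dict[str, int] = defaultdict(int)
    for text in texts:
        group = text.language_family
        modal_counts[group] += sum(1 for tok in text.tokens if tok.lower() in CORE_MODALS)
        token_counts[group] += len(text.tokens)
    return {
        group: modal_counts[group] * 1_000_000 / n_tokens
        for group, n_tokens in token_counts.items()
        if n_tokens > 0
    }


texts = [
    Text(["You", "must", "apply", "now"], "Romance"),
    Text(["Students", "can", "apply"], "Germanic"),
    Text(["Fees", "apply"], "Germanic"),
]
print(modals_per_million(texts))  # {'Romance': 250000.0, 'Germanic': 200000.0}
```

Normalising per million tokens lets groups of very different sizes be compared directly.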

Use of corpora by language service providers and language professionals remains limited because of competing resources that are likely to be perceived as less demanding in terms of the time and effort required to obtain them and (learn to) use them (e.g. translation memory software, term bases and so forth). These resources, however, have limitations that could be compensated for by integrating comparable corpora and corpus-building tools into the translator's toolkit. This chapter provides an overview of the ways in which different types of comparable corpora can be used in translation teaching and practice. First, two traditional corpus typologies are presented, namely small and specialized "handmade" corpora collected by end users themselves for a specific task, and large and general "manufactured" corpora collected by expert teams and made available to end users. We suggest that striking a middle ground between these two extremes is vital for professional uptake. To this end, we show how the BootCaT toolkit can be used to construct largish and relatively specialized comparable corpora for a specific translation task, and how, by varying the search parameters in very simple ways, the size and usability of the resulting corpora can be further increased. The process is exemplified with reference to a simulated task (the translation of a patient information leaflet from English into Italian), and its efficacy is evaluated through an end-user questionnaire.
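BootCaT builds corpora by combining user-supplied seed terms into random tuples, sending each tuple to a search engine as a query and downloading the pages returned. The Python sketch below illustrates only the tuple-generation step; it does not call BootCaT or any search API, and the function name, parameter names, default values and seed terms are hypothetical. Changing `tuple_length` or `n_tuples` is the kind of simple parameter variation referred to above: shorter tuples tend to return more, but less specific, pages.

```python
import itertools
import random
from typing import Optional

# Minimal sketch of BootCaT-style query construction: random tuples of seed
# terms, each tuple forming one search query. Names and defaults here are
# illustrative, not BootCaT's own.


def make_query_tuples(
    seeds: list[str],
    tuple_length: int = 3,
    n_tuples: int = 10,
    rng_seed: Optional[int] = None,
) -> list[str]:
    """Return up to n_tuples distinct queries, each combining tuple_length seeds."""
    combos = list(itertools.combinations(seeds, tuple_length))
    rng = random.Random(rng_seed)
    chosen = rng.sample(combos, min(n_tuples, len(combos)))
    # Quote each seed so multi-word terms are searched as phrases.
    return [" ".join(f'"{term}"' for term in combo) for combo in chosen]


seeds = ["dosage", "tablets", "side effects", "pregnancy", "pharmacist"]
for query in make_query_tuples(seeds, tuple_length=3, n_tuples=4, rng_seed=1):
    print(query)
```

Fixing `rng_seed` makes a set of queries reproducible, which helps when comparing corpora built with different parameters.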

Meta: Journal des traducteurs/ …, Jan 1, 2011
This article aims to characterize specific features of translated texts. Drawing on a teaching experiment, we studied the use of anglicisms in translated and non-translated texts in the field of computing. The corpus used for this purpose consists of three parts: texts written originally in Italian, source texts written in English, and the translations of the latter. The source and target texts form a parallel corpus, while the two Italian subcorpora form a comparable corpus. In the comparable corpus, the frequency of three categories of English words was compared: direct borrowings, morphologically and semantically adapted borrowings, and syntactic calques (plurals ending in -s). The parallel subcorpus is then consulted to refute the null hypothesis that the observed differences are not due to the translation process. The results of the quantitative analysis, complemented by careful qualitative observations, show that translators are more conventional in their lexical choices and normalize more than authors do; authors, by contrast, seem more willing to accept interference from English, the lingua franca of the computing world. The article concludes with a discussion of the methodological, descriptive/theoretical and applied implications of these results.

acorn.aston.ac.uk
This paper reports on the construction and evaluation of a very large Web corpus of English. The corpus, called ukWaC, was obtained through a crawl of Web pages in the .uk domain and in its final version contains around two billion words. Its aim is to provide a general-purpose linguistic resource for the study of (British) English. Ideally, this new resource would be comparable to the widely used British National Corpus (BNC), while containing up-to-date and substantially larger quantities of language data. As with all corpora built using semi-automated procedures, control over the materials that end up in the corpus is limited, and post hoc evaluation is needed to appraise the actual corpus composition. An evaluation method along the lines of Sharoff (2006) is proposed and applied, which involves a comparison between ukWaC and the BNC. Separate wordlists are created for the main part-of-speech categories (i.e. nouns, verbs, adjectives, -ly adverbs and function words), which are then compared via the log-likelihood measure, thus identifying words that are relatively more typical of one corpus than of the other. Results suggest that the two corpora differ insofar as ukWaC contains a higher proportion of texts related to the Web, education and "public sphere" issues, while the BNC contains more fiction and spoken texts. The paper concludes by discussing some of the issues and challenges raised by research on the construction and evaluation of Web corpora.
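The wordlist comparison can be sketched with the two-corpus log-likelihood (G²) statistic commonly used for keyword extraction. The Python below is a simplified illustration, not the paper's procedure: it works on plain word counts rather than separate part-of-speech lists, and the toy counts are invented. Each word is also labelled with the corpus in which its relative frequency is higher, which is what makes it "relatively more typical" of that corpus.

```python
import math
from collections import Counter

# Minimal sketch: two-corpus log-likelihood (G2) keyword comparison.


def log_likelihood(freq_a: int, freq_b: int, size_a: int, size_b: int) -> float:
    """G2 for one word, given its frequency in corpora A and B and their sizes."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing (0 * ln 0 -> 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(counts_a: Counter, counts_b: Counter, top_n: int = 10) -> list[tuple[str, float, str]]:
    """Rank words by G2; label each with the corpus where it is relatively more frequent."""
    size_a, size_b = sum(counts_a.values()), sum(counts_b.values())
    scored = []
    for word in set(counts_a) | set(counts_b):
        fa, fb = counts_a[word], counts_b[word]
        rel_a, rel_b = fa / size_a, fb / size_b
        side = "A" if rel_a > rel_b else "B" if rel_b > rel_a else "="
        scored.append((word, log_likelihood(fa, fb, size_a, size_b), side))
    # Highest G2 first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


toy_web = Counter({"online": 30, "the": 100})
toy_ref = Counter({"novel": 30, "the": 100})
for word, g2, side in keywords(toy_web, toy_ref):
    print(f"{word}\t{g2:.2f}\t{side}")
```

G² measures how surprising a frequency difference is, not its direction, so the relative-frequency label is needed to tell which corpus a keyword belongs to.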
… Genres on the …, Jan 1, 2009
Institutional English on the websites of Italian (vs. UK/Irish) Universities: A preliminary corpus-based analysis of degree programme descriptions
Keywords: institutional academic communication, English as a lingua franca, corpus-based methods
Discourse areas: 3. Institutional; 1. Academic
Tradumàtica: traducció i tecnologies de la informació i …, Jan 1, 2009
Using Corpora in …, Jan 1, 2010
… of the 4th Web as Corpus …, Jan 1, 2008
Language Resources and …, Jan 1, 2009
This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German and Italian built by web crawling, and describes the methodology and tools used in their construction. The corpora contain more than a billion words each, and are thus among the largest resources for the respective languages. The paper also evaluates their suitability for linguistic research, focusing on ukWaC and itWaC. A comparison of lexical coverage with existing resources for these languages produces encouraging results. A qualitative evaluation of ukWaC against the British National Corpus was also conducted to highlight differences in corpus composition (text types and subject matters). The article concludes with practical information about the format and availability of the corpora and tools.
Conference Presentations by Adriano Ferraresi
University of Bologna, Forlì (Italy) 13 May 2017 – Scientific writing – Key to editing a successful medical article from abstract to conclusion
Seminar on how to edit a biomedical paper in English
Learning objectives
To develop skills in drafting, translating and language-editing medical-scientific articles in English, with particular attention to the forms and structures typical of the genre (abstract, introduction, methods, results, discussion and conclusion).