Rose Tools: A Medieval Manuscript Text-Image Annotation Project
In Digitizing Medieval and Early Modern Material Culture, eds. Brent Nelson and Melissa Terras (ACMRS, 2011)
Abstract
With the now widespread availability of digitized data, researchers in the humanities are learning to go about their work in new ways, and they require new research tools to do so.
Related papers
2022
This essay focuses on the development of new digital paleographical methods to support the study of medieval and modern manuscripts. The potential of image manipulation software such as Adobe Photoshop has not yet been fully explored, especially given the constant improvements and new features offered by continual software updates. Compared to traditional manuscript reading (the human eye on the page), digital image processing offers many tools for analyzing script from several different paleographical perspectives: for instance, close reading of the external shape of words or single letters, as well as overlaid letter comparison to identify commonalities and differences through accurate pixel computation. In addition, features of this software can be used to enhance readings and build knowledge by restoring effaced, discolored, or corrupted parchment without affecting the original document. The poetic corpus written by the Italian poet Francesco Petrarca (Petrarch) during the fourteenth century, entitled Rerum vulgarium fragmenta and also known as the Canzoniere, serves as a case study here. This corpus is fascinating in part because it is extremely rare in medieval literature to be able to trace the process of authorial revision from first draft (MS Vat. Lat. 3196) through definitive final copy (MS Vat. Lat. 3195). Petrarch is in fact the only medieval poet for whom we have the original autograph manuscript. In composing the Canzoniere, Petrarch transcribed poems previously written in Vat. Lat. 3196, also known as the Codice degli abbozzi, into Vat. Lat. 3195. Since an autograph always provides invaluable insight into a text's genesis and creation, these two codices remain at the center of a long hermeneutical and philological debate. My thanks to Dr. Amyrose McCue Gill of TextFormations for her assistance in editing the text and revising translations.
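By way of illustration, the following is a minimal Python sketch of the kind of overlay comparison the essay describes, not the author's own workflow: it assumes two roughly aligned grayscale crops of the same letterform, and the filenames, working size, and binarization threshold are arbitrary choices introduced here.

```python
# Hedged sketch: compare two letterform crops by overlaying their binarized "ink".
# Assumes two roughly aligned grayscale images; filenames are hypothetical.
from PIL import Image
import numpy as np

def letter_overlap(path_a, path_b, size=(100, 100), threshold=128):
    """Binarize two letter crops and report how much ink they share."""
    a = np.array(Image.open(path_a).convert("L").resize(size)) < threshold
    b = np.array(Image.open(path_b).convert("L").resize(size)) < threshold
    shared = np.logical_and(a, b).sum()   # pixels inked in both letters
    union = np.logical_or(a, b).sum()     # pixels inked in either letter
    return shared / union if union else 0.0  # Jaccard-style overlap score

if __name__ == "__main__":
    # Hypothetical crops of the same letter taken from two different folios.
    print(f"Shared ink ratio: {letter_overlap('letter_a.png', 'letter_b.png'):.2%}")
```

A Jaccard-style overlap of binarized ink is only one possible measure; in practice, image registration and threshold choice matter at least as much as the pixel arithmetic itself.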
CLARIN
This chapter presents the Austrian experience of building CLARIN-related infrastructures and services and describes its impact on the wider humanities research community. We focus on the activities of the Austrian Centre for Digital Humanities and Cultural Heritage at the Austrian Academy of Sciences (ACDH-CH), a centre of expertise which now supports projects in a broad range of humanities disciplines. Some of the ACDH-CH's services are concerned with research data preservation in the long-term repository ARCHE, which is elaborated on here, as is a set of text-technological and semantic services. Furthermore, the crucial role of knowledge-sharing measures in the increased adoption of DH methods is described, and Austrian contributions and cooperation in the context of building European research infrastructures for the humanities are highlighted.
The merging of corpus-linguistic methods and digital technology can provide new ways of representing medieval texts in digital form. In this paper, we introduce a multi-layered parallel Old Occitan-English corpus. We show how parallel alignment can help overcome some of the challenges associated with historical manuscripts. Furthermore, we apply a resource-light method of building an emotion annotation layer via parallel alignment. Finally, using visualization tools such as ANNIS and GoogleViz, we demonstrate how our parallel corpus can be queried and visualized dynamically through the modern-language layer.
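As a rough illustration of how label projection across a parallel alignment can work (the tokens, alignment pairs, and emotion lexicon below are invented examples, not the paper's own data or pipeline):

```python
# Hedged sketch: project emotion labels from an English translation onto aligned
# Old Occitan tokens. All data here is a toy example for illustration only.
occitan = ["amors", "mi", "fai", "chantar"]
english = ["love", "makes", "me", "sing"]
alignment = [(0, 0), (1, 2), (2, 1), (3, 3)]  # (occitan_index, english_index)

# Toy English emotion lexicon standing in for a resource-rich modern-language tool.
english_emotion = {"love": "joy", "sing": "joy"}

# Each English label is carried over to its aligned Old Occitan token.
occitan_emotion = {
    occitan[i]: english_emotion[english[j]]
    for i, j in alignment
    if english[j] in english_emotion
}
print(occitan_emotion)  # {'amors': 'joy', 'chantar': 'joy'}
```

The point of such a resource-light approach is that annotation resources exist for the modern language but not for the historical one, so the alignment does the transfer work.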
In the spring semester of 2011 I signed up for a PhD-level seminar at Northern Illinois University entitled "Paleography," taught by Dr. Nicole Clifton. The majority of the coursework consisted of learning various styles of handwriting scripts dating from 100 BCE to 1700 CE, as well as transcribing, dating, and identifying the origin of a manuscript housed in the Rare Books section of the university library. It was in this class that I was first introduced to the extensive work conducted by the British Library's Manuscript Studies division and housed on its website. The BL has digitized a large assortment of texts from across its holdings, especially medieval manuscripts dating as early as the 10th century. While I had previously involved myself in technological discussions in venues such as Kairos and Computers and Composition, I had never spent time working at the intersection of ancient text and modern technology. The availability of ancient manuscripts and the ability to work with programs like those from Adobe allowed me to shift my thinking about writing from a focus exclusively on hypertextual writing to seeing the need for writing to become more accessible, especially works that are normally housed in archives hidden away from public view. Is there a digital medieval humanities? Multimodality within the medieval community is nothing new, as demonstrated by projects such as the CANTUS database, Project Gutenberg, various linguistic tutorials for medieval Latin, Old High German, Old English, and Old French, and annotated hypertext websites covering the works of Geoffrey Chaucer, Thomas Malory, and Wolfram von Eschenbach.
2012
XML
Gayoso-Cabada, Joaquin (gayoxo@gmail.com); Ruiz, Cesar (cruiz85@gmail.com); Pablo-Nuñez, Luis (lpnunez@filol.ucm.es); Sarasa-Cabezuelo, Antonio (asarasa@fdi.ucm.es); Goicoechea-de-Jorge, Maria (mgoico@filol.ucm.es); Sanz-Cabrerizo, Amelia (amsanz@filol.ucm.es); Sierra-Rodriguez, Jose-Luis (jlsierra@fdi.ucm.es), all of the Universidad Complutense de Madrid, Spain.
In Tara Andrews and Caroline Macé (eds.), Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, 2014
2002
Users need more sophisticated tools to handle the growing number of image-based documents available in databases. In this paper, we present a system devoted to the editing and browsing of complex literary hypermedia including original manuscript documents and other handwritten sources. Editing capabilities allow the user to transcribe manuscript images in an interactive way and to encode the resulting textual representation by means of a logical markup language (based on the XML/TEI specification). Both representations (image and structured text) are tightly linked to facilitate the reading and the interpretation of documents. This text/image coupling scheme is an attempt to unify several layers of information in order to provide the user with a global vision of the work. Our system also supplies tools capable of processing and relating information stored both in images and structured texts. Finally, application-specific visualization techniques have been developed in order to provide users with a way to identify relationships between source documents and help them to navigate.
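The text/image coupling described here can be pictured with a small TEI-style fragment. The snippet below is a hedged sketch only: the identifiers, pixel coordinates, image filename, and transcribed line are invented for illustration, and it is printed from Python simply for consistency with the other examples in this section.

```python
# Hedged sketch of a TEI-style text/image link: a facsimile zone giving the pixel
# box of one manuscript line, and a transcription line pointing back to that zone.
tei_fragment = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="f1r">
      <graphic url="folio_1r.jpg"/>
      <!-- bounding box of one manuscript line on the page image -->
      <zone xml:id="f1r_l1" ulx="120" uly="340" lrx="980" lry="395"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <!-- the transcribed line references its image region via @facs -->
      <l facs="#f1r_l1">Maintes genz dient que en songes</l>
    </body>
  </text>
</TEI>
"""
print(tei_fragment)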
De Gruyter eBooks, 2019
Cataloging and Citing Greek and Latin Authors and Works illustrates not only how Classicists have built upon larger standards and data models such as the Functional Requirements for Bibliographic Records (FRBR, allowing us to represent different versions of a text) and the Text Encoding Initiative (TEI) Guidelines for XML encoding of source texts (representing the logical structure of sources), but also highlights some major contributions from Classics. Alison Babeu, Digital Librarian at Perseus, describes a new form of catalog for Greek and Latin works that exploits the FRBR data model to represent the many versions of our sources, including translations. Christopher Blackwell and Neel Smith built on FRBR to develop the Canonical Text Services (CTS) data model as part of the CITE Architecture. CTS provides an explicit framework within which we can address any substring in any version of a text, allowing us to create annotations that can be maintained for years and even for generations. This addresses, at least within the limited space of textual data, a problem that has plagued hypertext systems since the 1970s and that still afflicts the World Wide Web. Those who read these papers years from now will surely find that many of the URLs in the citations no longer function, but all of the CTS citations should be usable, whether we remain with this data model or replace it with something more expressive. Computer scientists Jochen Tiepmar and Gerhard Heyer show how they were able to develop a CTS server that could scale to more than a billion words, thus establishing the practical nature of the CTS protocol.

If there were a Nobel Prize for Classics, my nominations would go to Blackwell and Smith for CITE/CTS and to Bruce Robertson, whose paper on Optical Character Recognition opens the section on Data Entry, Collection, and Analysis for Classical Philology. Robertson has worked for a decade, with funding and without, on the absolutely essential problem of converting images of printed Greek into machine-readable text. In this effort, he has mastered a wide range of techniques drawn from areas such as human-computer interaction, statistical analysis, and machine learning. We can now acquire billions of words of Ancient Greek from printed sources, and not just from multiple editions of individual works: this allows us not only to trace the development of our texts over time but also to identify quotations of Greek texts in articles and books, and thus to see which passages are studied by different scholarly communities at different times. He has enabled fundamental new work on Greek. Meanwhile, the papers by Tauber, Burns, and Coffee address, respectively, the representation of characters, a pipeline for textual analysis of Classical languages, and a system that detects where one text alludes to, without extensively quoting, another text. At its base, philology depends upon the editions which provide information about our source texts, including variant readings, a proposed reconstruction of the original, and the reasoning behind decisions made in analyzing the text.
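To make the CTS idea concrete, here is a small, hedged sketch of how a canonical citation breaks down into its components. The URN follows the common Perseus pattern (urn:cts:<namespace>:<textgroup>.<work>.<version>:<passage>), and the parser is deliberately simplified: it is not a resolver and does not talk to a CTS server.

```python
# Hedged sketch: split a CTS URN into its parts. Simplified (assumes all five
# colon-separated components are present); not an implementation of the protocol.
def parse_cts_urn(urn):
    prefix, scheme, namespace, work_part, passage = urn.split(":")
    assert (prefix, scheme) == ("urn", "cts")
    parts = work_part.split(".")              # textgroup, work, optional version
    return {
        "namespace": namespace,               # e.g. a corpus of Greek literature
        "textgroup": parts[0],                # e.g. an author identifier
        "work": parts[1] if len(parts) > 1 else None,
        "version": parts[2] if len(parts) > 2 else None,  # a specific edition
        "passage": passage,                   # e.g. book.line, stable across editions
    }

print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1"))
```

Because the passage reference is defined against the work rather than against any one page or URL, the same citation remains addressable even as editions and hosting locations change, which is the durability claim made above.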
This paper presents a digital model and software created in the context of the VEdition project to provide a critical digital edition of Goethe's Venetian Epigrams. The paper proposes an innovative textological approach, focusing on a generic and reusable model of autographs to represent the dynamic nature of the creative process. While keeping the "objective" reproduction of documents separate from subjective scholarly interpretations, the model centers on a single structured, computable, and compact graph-based data structure, making it possible to generate multiple text versions, annotated at any level of granularity, for both textual and visual content. A full-fledged web UI (and an alternative, complementary DSL) facilitates the creation of content, allowing scholars to focus on reconstructing the creative process at a higher level of abstraction while providing virtually unlimited export formats for integration with TEI-based production flows.
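The general idea of a single graph-based structure generating several text versions can be sketched as follows. This toy class, the stage names, and the example wording are all invented for illustration; they are not the VEdition model itself, which handles annotation layers and visual content that this sketch omits.

```python
# Hedged sketch: one graph, several text versions. Tokens are nodes and each edge
# is labeled with the revision stages in which it is valid, so walking the graph
# for a given stage yields that stage's wording. Toy example data only.
from collections import defaultdict

class VersionGraph:
    def __init__(self):
        self.edges = defaultdict(list)   # node -> [(next_node, {stages})]

    def add_path(self, tokens, stages):
        """Register a token sequence as valid for the given revision stages."""
        nodes = ["START", *tokens, "END"]
        for a, b in zip(nodes, nodes[1:]):
            self.edges[a].append((b, set(stages)))

    def render(self, stage):
        """Walk the graph following only edges valid for one revision stage."""
        out, node = [], "START"
        while node != "END":
            node = next(n for n, s in self.edges[node] if stage in s)
            if node != "END":
                out.append(node)
        return " ".join(out)

g = VersionGraph()
g.add_path(["a", "hasty", "first", "reading"], {"draft"})       # invented wording
g.add_path(["a", "revised", "reading"], {"fair_copy"})          # invented wording
print(g.render("draft"))      # a hasty first reading
print(g.render("fair_copy"))  # a revised reading
```

The design choice this illustrates is the one the abstract emphasizes: the versions are not stored as separate texts but generated on demand from one compact, computable structure.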
