
Language Archives

27 papers
1 follower
About this topic
Language archives are systematic collections of audio, video, and textual materials documenting languages, particularly those that are endangered or under-researched. They serve as resources for linguistic research, preservation, and revitalization efforts, facilitating the study of language structure, use, and cultural context.

Key research themes

1. How can digital language archives be made more accessible and usable for language communities?

This theme focuses on the gap between the availability of digital language archives and their actual accessibility and usability by the language communities whose languages are documented. Challenges include limitations in technical infrastructure, interface design not tailored for community needs, language and literacy barriers, and the need for participatory archive design and mediation. Addressing these issues is critical for empowering language communities to engage with, maintain, and revitalize their linguistic heritage through archives.

Key finding: This paper identifies multiple barriers preventing marginalized language communities from accessing digital language archives, such as limited internet access, electricity, and digital devices, as well as user skills and...
Key finding: Describes a mediated, participatory archiving workflow at CoRSAL that engages diverse language depositors and communities throughout the archiving process, from data intake to post-archival engagement. The approach accounts...
Key finding: Highlights the curricular gap in library and information science (LIS) education regarding the specific needs and user contexts of language archives, especially related to indigenous and endangered languages. Advocates for...
Key finding: Discusses participatory, bottom-up approaches to establishing local digital archives for endangered languages, particularly in Southeast Asia. Stresses the difference between content management systems and sustainable...

2. What standards and metadata frameworks enable interoperability and reusability of linguistic data across digital language archives?

This theme investigates efforts to develop and apply standards for linguistic data formats, metadata schemas, and resource catalogs to facilitate data sharing, comparison, and reuse across different archives and linguistic resources. Standardization supports the FAIR principles (Findable, Accessible, Interoperable, Reusable), which are fundamental to comparative linguistics, NLP applications, and cross-disciplinary research. Establishing and integrating metadata frameworks allows archives to interoperate and makes their holdings more discoverable and reusable internationally.

Key finding: The paper proposes new standardized formats for cross-linguistic data, including word lists and structural datasets, together with a validation and manipulation software package and a basic ontology linking to general...
Key finding: Reports on the META-SHARE repository infrastructure designed for cataloging, sharing, and exchanging language technology resources with rich metadata standards. The metadata model decomposes resource description into...
Key finding: Describes OLAC’s federated metadata infrastructure based on the Open Archives Initiative and Dublin Core Metadata Initiative, enabling federated discovery of language resources worldwide. OLAC addresses issues such as...
Key finding: Presents an integrated tool for querying and manipulating diverse language resources, including corpora, dictionaries, and language models, within a unified interface. The system supports searching across heterogeneous...
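
To make the idea of a shared metadata framework concrete, the following is a minimal sketch of a flat record that uses Dublin Core element names, written with Python's standard library. The wrapper element `record`, the sample values, the helper names `build_record` and `missing_elements`, and the choice of which elements count as required are all assumptions made for illustration; they are not taken from OLAC, META-SHARE, or any of the papers listed here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a flat record from (element name, value) pairs.

    The wrapper element name "record" is a placeholder for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


def missing_elements(record, required=("title", "language", "identifier")):
    """Return the required element names absent from the record.

    The default "required" tuple is an assumption for illustration only.
    """
    return [name for name in required
            if record.find(f"{{{DC_NS}}}{name}") is None]


# Hypothetical sample values; no real archive holding is described here.
sample = build_record([
    ("title", "Example narrative recording"),
    ("language", "tox"),
    ("type", "Sound"),
])

xml_text = ET.tostring(sample, encoding="unicode")
print(xml_text)
print(missing_elements(sample))
```

Because every archive in such a scheme would describe its holdings with the same element names, a harvester could run one check like `missing_elements` across records from many sources, which is the practical sense in which a common schema supports discovery and reuse.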

3. How do language archives document and represent complex sociolinguistic and historical language data while ensuring long-term preservation?

This theme covers challenges and approaches related to the preservation, documentation, and representation of language materials that reflect sociolinguistic diversity, historical variation, and complex naming conventions. It includes problems of data depreciation, authorship, copyright, variant name representation, and the unique nature of fieldwork notes and archival records. Addressing these ensures archival materials remain comprehensible, ethically managed, and scientifically useful over time.

Key finding: Analyzes the challenges of managing Brazilian sociolinguistic repositories as valuable intangible assets, focusing on depreciation of physical media, authorship recognition using CRediT taxonomy versus copyright law, and...
Key finding: Explains the lack of training on specific archival needs of language collections in LIS education, advocating for interdisciplinary collaboration. While primarily related to access, this also highlights the complexities of...
Key finding: Details the complex sociocultural and historical influences on personal and language name variants in Northeast India, illustrating how variable name structures, ordering, and abbreviations introduce significant challenges...
Key finding: Discusses the nature and challenges of field research archives, including the diversity of archival holdings, issues of preservation of physical and digital materials, and the importance of digital modeling and visualization...
Key finding: Argues that quality evaluation of language archives should not rely solely on openness or unrestricted access due to ethical and community-specific considerations. It discusses the dialectic tension between openness and...

All papers in Language Archives

The Open Language Archives Community (OLAC) provides a comprehensive infrastructure that has allowed our community to index and discover language resources over the past 20 years. However, OLAC infrastructure has fallen behind as the...
We collaborated to investigate humor in the existing corpus of Kere (ISO 639-3: sst). This collaboration was a useful test of the Kere corpus and led to the rediscovery of unarchived video recordings, which contained important contextual...
In hard science, the observer's paradox states that the means of perception itself affects the perceived action. For example in physics, the light used for visible perception can affect the physical state of the phenomenon being observed.
This paper aims to provide a practical guide to those who have the opportunity to conduct sociolinguistic field research in Micronesia, focusing on the hands-on, nitty-gritty experiences of actual data collection rather than theoretical...
The Archival / Preservation Education SIG panel engages with interconnected external pressures and curricular goals in the archival classroom. Four moderated presentations focus on innovative classroom pedagogy, including modeling and...
This paper discusses levels of access in language archives and their implications for assessment. In the absence of well-established criteria, part of the evaluation of language archives is often based on accessibility; roughly, the more...
Linguistic data collection typically involves conducting interviews with participants in close proximity. The safety precautions related to the COVID-19 pandemic brought such data collection to an abrupt halt: Social distancing forced...
This article includes a glossed text in Pima Bajo and describes the current state of linguistic documentation for this language.
This paper provides remarks for a management plan for Brazilian linguistic documentation repositories in order to contribute to their conservation. The depreciation, authorship, sharing, and financing problems are discussed, pointing...
This volume brings together nine studies focused on language documentation, which result from research anchored in projects for the preservation of languages and their cultural heritage. The aim is to present the reader with...
This paper is a position statement on reproducible research in linguistics, including data citation and attribution, that represents the collective views of some 41 colleagues. Reproducibility can play a key role in increasing...
Language archives are not only a valuable resource for language communities to tell their stories and to create lasting records of their ways of life, but also for those interested in anthropology, linguistics, agriculture, or art...
Preservation and revitalization of Indigenous and endangered languages supports a resilient future. Funding agencies have extensively supported efforts aimed at preserving and providing online access to unique and valuable collections of...
The book brings together nine studies focused on language documentation, which result from research anchored in projects for the preservation of languages and their cultural heritage. The aim is to present the reader with perspectives and...
Slides from the symposium and panel discussion at the event "Data Citation and Attribution for Reproducible Research in Linguistics," Annual Meeting of the Linguistic Society of America, Austin, TX, 5 January 2017.
Language archiving involves the collection and curation of a variety of language materials. As an emerging language archive, CoRSAL caters to a range of different language depositors with different research needs. As such, we have...
Ramari Dongosaro, or Sonsorolese (ISO 639-3: sov), is a nuclear Micronesian language spoken in the South-West islands of the Republic of Palau. Sonsorolese is the official language of the State of Sonsorol but still lacks in status and...
Over the last two decades there has been a surge in activists, linguists, anthropologists, and documenters digitally recording endangered language use. These unique records often are uploaded to corporate social media sites or to privately...
for other reasons as well. The project of writing a grammar is substantially larger in scale than many other research projects. The grammar writer is called upon to have comprehensive knowledge of a language, from its phonetics to its...
Technologies. Storage technology for data preservation, J.-Y. Nief. Requirements and solutions for archiving scientific data at CINES, S. Coutin. Virtual environments for data preservation, V
ethics, and linguistic data. While the causes of language endangerment are many and complex, social and cultural dislocation due to unequal power relations between minority communities and majority populations have played major roles in...
We describe a structured task for gathering enriched language data for descriptive, comparative, and documentary purposes, focusing on the domain of social cognition. The task involves collaborative narrative problem-solving and retelling...
Collaborating remotely comes with its own set of challenges.
This dissertation uses raw data of Ramari Hatohobei or Tobian (ISO 639-3 tox), an endangered Micronesian language, archived at the Endangered Languages Archive (ELAR) for the description of its prosodic patterns. The primary aim is to...
This paper gives a detailed overview of the archived language documentation materials for the two languages spoken in northern Ambrym, Vanuatu: North Ambrym and Fanbyak. I discuss the speakers and the language situation in northern...
Circum-Baikal Asia is a linguistically and culturally rich area where many languages representing a number of linguistic families have survived into modern times. In terms of indigenous population it is occupied predominantly by the Buryat,...