Papers by Anthony Aristar
The Internet: an introduction
On the syntactic incorporation of linguistic units
Homepage: http://linguistlist.org/aristar/. Degree Awarded: University of Texas at Austin, Department of Linguistics. Degree Date: 1984. Linguistic Subfield(s): Typology. Subject Language(s): Awngi. Director(s): Winfred Lehmann. ...
The Semitic Jussive and the Implications for Aramaic
Maarav
Evolution based approaches to the preservation of endangered natural languages
The 2003 Congress on Evolutionary Computation, 2003. CEC '03.
Abstract: Cultural algorithms, a form of evolutionary programming, employ a dual inheritance mechanism at population and knowledge levels to support problem solving, reasoning and knowledge extraction. Domain knowledge is extracted and separated from individuals within a population and is placed in a belief space. Hierarchical structures employed in the belief space help to accelerate and guide population evolution. The structure of the cultural algorithm lends itself well to a data-rich but knowledge-poor distributed environment. In ...

The Treatment of Grammatical Categories and Word Order in Machine Translation
Machine Translation (MT) systems are now in active use around the world. This paper investigates the applicability of current and foreseeable MT technology to translation between one or more modern Western European languages and Arabic. After an introduction in which we briefly sketch the history and current status of MT and comment on the situation vis-a-vis Arabic translation, we present some general design constraints for state-of-the-art MT systems before proceeding to consider problems posed by the Arabic language in particular. We then outline some approaches to the solution of such problems, and indicate what special constraints these place on an MT system's design. We next present an architecture for a system that could handle Arabic, and draw some conclusions regarding the prospects for near-term application of such a system. We will not in this paper survey the field of MT; several such treatments are available elsewhere (e.g., [Slocum, 1984]).

Language data is central to the research of a large social sciences community, including not only linguists, but also anthropologists, archeologists, historians, sociologists, and political scientists interested in the culture of indigenous peoples. Members of this research community are currently faced with two urgent situations: the number of languages in the world is rapidly diminishing, while the number of initiatives to create digital archives of language data is rapidly multiplying as a result of the increasing availability and sophistication of web technology. The latter might seem to be an unalloyed good in the face of the former, but there are two ways things may go wrong without adequate collaboration among archivists, linguists, and language engineers. First, a common standard for the digitization of linguistic data may never be agreed upon. And the resulting variation in archiving practices and language representation would seriously inhibit data access, searching, and scien...
Studies in Language, 1997
Silverstein (1976) showed that the grammatical cases take varying kinds of case-marking according to the hierarchical value of the nominal being marked. This paper demonstrates that such hierarchical marking occurs in non-grammatical cases as well. Moreover, these cases typically take nominals of a specific hierarchical value as arguments. Analysis of the data according to classic marking theory reveals that departures from the typical pattern often take extra morphological marking. Since the new forms appear in atypical contexts, they are prone to being pragmatically reinterpreted. And the combination of marking and reinterpretation will produce new cases in the language.
2009 International Multiconference on Computer Science and Information Technology, 2009
The Language and Location: Map Annotation Project (LL-MAP) has been funded by the US National Science Foundation to build a database of linguistic information integrated into a Web-based geographical information system. LL-MAP embodies several innovative concepts of computational linguistics, such as spatial data engine driven architecture, dynamic joining of linguistic information with related cultural and geographic data, multi-layered and linked visualization, real time online data harvesting, collaborative toolboxes for linguistic studies, quick search of digital gazetteers, and toponymical analysis. This paper will demonstrate these LL-MAP functions and discuss their disciplinary implications in linguistic studies.
Journal of Linguistics, 1992

International Journal on Semantic Web and Information Systems, 2005
The development of the Semantic Web, the next-generation Web, greatly relies on the availability of ontologies and powerful annotation tools. However, there is a lack of ontology-based annotation tools for linguistic multimedia data. Existing tools either lack ontology support or provide limited support for multimedia. To fill the gap, we present an ontology-based linguistic multimedia annotation tool, OntoELAN, which features: (1) support for OWL ontologies; (2) the management of language profiles, which allow the user to choose a subset of ontological terms for annotation; (3) the management of ontological tiers, which can be annotated with language profile terms and, therefore, corresponding ontological terms; and (4) storage of OntoELAN annotation documents in XML format based on multimedia and domain ontologies. To the best of our knowledge, OntoELAN is the first audio/video annotation tool in the linguistic domain that provides support for ontology-based annotation. It is expected t...
arXiv preprint arXiv:0902.3027, Feb 18, 2009
Abstract: There is an increasing interest and effort in preserving and documenting endangered languages. Language data are valuable only when they are well-cataloged, indexed and searchable. Many language data, particularly those of lesser-spoken languages, are collected as audio and video recordings. While multimedia data provide more channels and dimensions to describe a language's function, and give a better presentation of the cultural system associated with the language of that community, they ...

Using the E-MELD School of Best Practices to create lasting digital documentation
Practice and values, 2010
The School of Best Practices is an online resource that describes how to create lasting digital documentation according to the standards developed by a community of linguists, archivists, and computer scientists. The School was developed as part of the NSF-sponsored E-MELD (Electronic Metastructure for Endangered Languages Data) project, an initiative undertaken to develop digital infrastructure for the documentation of endangered languages. Users of the School can access recommendations of digital best practices and simple instructions for using the technologies recommended. The School also offers case studies of exemplary digitization processes, and searchable databases of tools and bibliographic resources. This paper describes how to use the School in creating digital language documentation, emphasizing its role among existing documentation and archiving initiatives.
Many languages are being lost as smaller populations disappear in the face of the encroaching mega-cultures of our time, and numerous documentation projects have recently been initiated in an attempt to preserve as much linguistic and cultural information as possible. This response to the threat of language attrition can only be applauded, but it has also drawn attention to the need for more information about optimal formats for digital documentation. Irreplaceable language documentation is often being stored in digital formats vulnerable to hardware and software obsolescence. Moreover, the heterogeneity of formats currently in use limits the accessibility and repurposing of the data.
Language Engineering for the Semantic Web: A Digital Library for Endangered Languages
Information …, 2004
Many languages are in serious danger of being lost and if nothing is done to prevent it, half of the world's approximately 6,500 languages will disappear in the next 100 years. Language data are central to the research of a large social science community, including linguists, ...

Unification and the computational analysis of Arabic
Computers and Translation, 1987
1. Introduction
The field of computational linguistics possesses some remarkable lacunae. A great deal of work has been devoted to the efficient, reasoned parsing of syntax; as a result all but a very few syntactic theories have been at least partially implemented in an attempt to arrive at this goal. Morphological analysis has conversely been deemphasized, in part because of the prevailing emphasis on syntax in linguistics, but mainly because the vast majority of work in natural language processing has been done by English-speakers on English, a language which has the interesting and relatively rare peculiarity of having very little morphology. In an analytic language like English, it is a perfectly feasible option, given the memory capacities and the power of modern computational machinery, simply to list all possible forms of a word, and allow the machine to access these forms directly in the lexicon, as if they were unanalyzable. Even when morphological analysis is incorporated as part of an English system, only a few rules are needed to handle almost all inflectional variants of an English word. Systems with a grand total of six rules (one to handle -s noun plurals, one to handle the third singular present suffix -s, one to handle -ing participials, one to handle the past tense -ed, one to handle the comparative -er, and one to handle the superlative -est) can account for all regular morphology in the language. The instant one goes beyond English, however, efficient morphological analysis becomes vitally important. In a language such as Russian, or even Spanish, the number of possible forms becomes so large that it is no longer reasonable to list them.
They have to be analyzed from the surface form to a form which can be looked up in a lexicon, typically by means of rules which match partial word-patterns against candidate strings, strip off affixes, and modify stems to conform to their canonical dictionary entries. It is not surprising that the more innovative recent proposals regarding morphological processing have come from speakers of languages other than English, most prominently the Finn Koskenniemi (1983), who uses parallel rules and mini-lexica that are in essence the continuation classes for each type of rule operation. Unfortunately, both pattern matchers and orthodox versions of Koskenniemi's two-level system suffer from three major problems. First, they exhibit an extreme dichotomy between morphology and syntax. The morphological component exists only to provide data to the syntactic component, data which are used independently of the morphological component. Changes to the syntactic component are quite independent of changes to the morphology, and any alterations made in one have to be made laboriously in the other. Obviously, input from the morphological component must be acceptable to the syntax. In evolving systems with very large morphological rule bases, ensuring consistency between syntax and morphology becomes a real problem. If a system at one stage utilizes a feature such as '3sgm', for example, and then at a later stage splits this feature into two features, '3sg' and 'm', so that the syntax can access these two features separately, this change will have to be made painstakingly in every rule which referred to the older feature.
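The six-rule approach to English described in the excerpt above can be illustrated with a minimal sketch. This is not the paper's system: the toy lexicon, the rule table layout, and the function name analyze are all assumptions made here for illustration, and the sketch only strips suffixes and looks up the remaining stem, without the stem modification (spelling changes) that the excerpt also mentions.

```python
# Hypothetical toy lexicon mapping stems to a category (an assumption
# for illustration; not taken from the paper).
LEXICON = {"walk": "verb", "cat": "noun", "tall": "adj"}

# The six regular English suffix rules named in the excerpt:
# (suffix, category the stem must have, feature contributed by the suffix).
RULES = [
    ("s", "noun", "plural"),
    ("s", "verb", "3sg-present"),
    ("ing", "verb", "participle"),
    ("ed", "verb", "past"),
    ("er", "adj", "comparative"),
    ("est", "adj", "superlative"),
]


def analyze(word):
    """Return every (stem, category, feature) analysis of a surface form."""
    analyses = []
    # A form listed directly in the lexicon is its own base form.
    if word in LEXICON:
        analyses.append((word, LEXICON[word], "base"))
    # Otherwise try stripping each suffix and looking up what is left.
    for suffix, category, feature in RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if LEXICON.get(stem) == category:
                analyses.append((stem, category, feature))
    return analyses


if __name__ == "__main__":
    for form in ["walked", "cats", "tallest", "walks"]:
        print(form, analyze(form))
```

Because the feature labels live only in the rule table, renaming or splitting a feature here means editing every rule that mentions it, which is a small-scale version of the consistency problem the excerpt raises for large rule bases.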
On diachronic sources and synchronic pattern: An investigation into the origin of linguistic universals
Language, 1991
This paper explains the Greenbergian universals of modifier and adposition ordering as accidental side-effects of diachronic derivation, arguing that disparate diachronic processes can conspire to give the effect of synchronic universals. The ordering of modifiers, for ...
GIS2 Colloquium: Language and Location: A Map Annotation Project (LL-MAP)
LINGOES: A Linguistic Ontology Management System
Journal of Digital Information Management, Dec 1, 2005
Abstract: LINGuistic Ontology managEment System (LINGOES) is a framework to enable linguists to take full advantage of the Semantic Web technologies. Together with OntoGloss, a text annotation tool, and an RDF database with versioning and querying capabilities, it allows a linguist to markup any document with classes in one or more ontologies at the morpheme's level. Textual documents can be in any language as long as they are accessible via a URI (Universal Resource Identifier). The annotated data can be queried ...