We describe a new approach to converting written tokens to their spoken form, which can be shared by automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems. Both ASR and TTS need to map from the written to the spoken domain, and we present an approach that enables us to share verbalization grammars between the two systems while exploiting linguistic commonalities to provide simple default verbalizations. We also describe improvements to an induction system for number names grammars. Between these shared ASR/TTS verbalizers and the improved induction system for number names grammars, we achieve significant gains in development time and scalability across languages.
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
It is standard practice in speech & language technology to rank systems according to performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which reports state-of-the-art performance on a widely-used "standard split". We fail to reliably reproduce some rankings using randomly generated splits. We suggest that randomly generated splits should be used in system comparison.
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that "translationese" is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.
Transactions of the Association for Computational Linguistics
We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.
We present a system for automatically detecting and classifying phonologically anomalous productions in the speech of individuals with aphasia. Working from transcribed discourse samples, our system identifies neologisms, and uses a combination of string alignment and language models to produce a lattice of plausible words that the speaker may have intended to produce. We then score this lattice according to various features, and attempt to determine whether the anomalous production represented a phonemic error or a genuine neologism. This approach has the potential to be expanded to consider other types of paraphasic errors, and could be applied to a wide variety of screening and therapeutic applications.
Algorithmic Classification of Five Characteristic Types of Paraphasias
American Journal of Speech-Language Pathology, 2016
Purpose: This study was intended to evaluate a series of algorithms developed to perform automatic classification of paraphasic errors (formal, semantic, mixed, neologistic, and unrelated errors). Method: We analyzed 7,111 paraphasias from the Moss Aphasia Psycholinguistics Project Database (Mirman et al., 2010) and evaluated the classification accuracy of 3 automated tools. First, we used frequency norms from the SUBTLEXus database (Brysbaert & New, 2009) to differentiate nonword errors and real-word productions. Then we implemented a phonological-similarity algorithm to identify phonologically related real-word errors. Last, we assessed the performance of a semantic-similarity criterion that was based on word2vec (Mikolov, Yih, & Zweig, 2013). Results: Overall, the algorithmic classification replicated human scoring for the major categories of paraphasias studied with high accuracy. The tool that was based on the SUBTLEXus frequency norms was more than 97% accurate in making lexicali...
Revisiting frequency and storage in morphological processing
The balance between storage and computation of complex words is a major point of departure both for theories of lexical representation (e.g., Goldberg 2006, Halle & Marantz 1993, Jackendoff 1975) and processing (e.g., Baayen et al. 1997, Butterworth 1983, Taft 2004). The atoms of lexical memory that are implicated in lexical processing experiments (be they whole words, roots and affixes, or some combination thereof) must ultimately coincide with the units of morphological theory if the latter are to be theories of the mental ...
A sociolinguist who has gathered so much data that it has become difficult to make sense of the raw observations can turn to graphical presentation, and to descriptive statistics, techniques for distilling a collection of data into a few key numerical values, allowing the researcher to focus on specific, meaningful properties of the data set. A sociolinguist evaluates hypotheses about the connections between linguistic behavior, speakers, and society. The researcher begins this process by gathering data with the potential to falsify the hypotheses under consideration. Inferential statistics allow the researcher to compute the probability that a hypothesized property of the data is due to chance, and to estimate the magnitude of the hypothesized effect. This chapter compares inferential methods appropriate for sociolinguistic data in terms of these assumptions. It examines elements of qualitative analysis and methods for binary analysis, multinomial variables, and continuous variables.
Earlier intervocalic s is found in archaic inscriptions (e.g., Lases for later Lares 'local deities'; Baldi 2002: 213f.) and implicated by comparative reconstruction (e.g., Latin flōrale 'floral' vs. Vestinian flusare; Watkins 1970). This change was actuated no later than the 4th century BCE, as indicated by Cicero's comment that L. Papirius Crassus, consul in 336 BCE and dictator in 339 BCE, was the first of his line to spell his cōgnōmen as Papirius rather than the ancestral Papisius. Numerous s-r alternations in Classical Latin derive from this sound ...
Abstract: Bakovic (2005) proposes that patterns of sufficiently-similar segment avoidance are the result of interacting agreement and antigemination constraints, a pattern known as cross-derivational feeding (CDF). The bleeding interactions between epenthesis and assimilation which prevent adjacent sufficiently-similar segments in English are shown to follow, however, from extragrammatical considerations. Several case studies provide evidence against the major predictions of CDF.
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
We conduct a manual error analysis of the CoNLL-SIGMORPHON 2017 Shared Task on Morphological Reinflection. In this task, systems are given a word in citation form (e.g., hug) and asked to produce the corresponding inflected form (e.g., the simple past hugged). This design lets us analyze errors much like we might analyze children's production errors. We propose an error taxonomy and use it to annotate errors made by the top two systems across twelve languages. Many of the observed errors are related to inflectional patterns sensitive to inherent linguistic properties such as animacy or affect; many others are failures to predict truly unpredictable inflectional behaviors. We also find nearly one quarter of the residual "errors" reflect errors in the gold data.
Appropriate use of discourse markers (DMs) such as 'and', 'ok', or 'wait' is important for conversational reciprocity. Because deficits in social communication and interaction are core symptoms of autism spectrum disorder (ASD), we hypothesize atypical use of conversational DMs in ASD. Plausibly because of the effort of annotating conversations, few studies have tested this hypothesis. However, new computational text analysis tools exist that may be adapted for quantitative characterization of DM use in ASD. Objectives: (1) To develop text analysis tools for detecting DMs and for determining whether the examiner asked a question and, if so, whether it is a yes/no (YN) or a WH (e.g., 'who', 'where') question. (2) To apply these tools to transcripts of ADOS conversations involving children with ASD or typical development (TD).
Memory in language-impaired children with and without autism
Journal of Neurodevelopmental Disorders, 2015
A subgroup of young children with autism spectrum disorders (ASD) have significant language impairments (phonology, grammar, vocabulary), although such impairments are not considered to be core symptoms of and are not unique to ASD. Children with specific language impairment (SLI) display similar impairments in language. Given evidence for phenotypic and possibly etiologic overlap between SLI and ASD, it has been suggested that language-impaired children with ASD (ASD + language impairment, ALI) may be characterized as having both ASD and SLI. However, the extent to which the language phenotypes in SLI and ALI can be viewed as similar or different depends in part upon the age of the individuals studied. The purpose of the current study is to examine differences in memory abilities, specifically those that are key "markers" of heritable SLI, among young school-age children with SLI, ALI, and ALN (ASD + language normal). In this cross-sectional study, three groups of children between ages 5 and 8 years participated: SLI (n = 18), ALI (n = 22), and ALN (n = 20). A battery of cognitive, language, and ASD assessments was administered as well as a nonword repetition (NWR) test and measures of verbal memory, visual memory, and processing speed. NWR difficulties were more severe in SLI than in ALI, with the largest effect sizes in response to nonwords with the shortest syllable lengths. Among children with ASD, NWR difficulties were not associated with the presence of impairments in multiple ASD domains, as reported previously. Verbal memory difficulties were present in both SLI and ALI groups relative to children with ALN.
Performance on measures related to verbal but not visual memory or processing speed was significantly associated with the relative degree of language impairment in children with ASD, supporting the role of verbal memory difficulties in language impairments among early school-age children with ASD. The primary difference between children with SLI and ALI was in NWR performance, particularly in repeating two- and three-syllable nonwords, suggesting that shared difficulties in early language learning found in previous studies do not necessarily reflect the same underlying mechanisms.
Hierarchical Regression and the Stratification of (Neg)
ling.upenn.edu
A class of generalized linear models known as hierarchical (or random-effects) models is, I argue, a way to deal with the well-known fact that while speech communities share similar internal and external constraints on variation (e.g., Guy 1980), subject variation may be sufficient to ...
Papers by Kyle Gorman