Kyle Gorman

Oregon Health and Science University, Center for Spoken Language Understanding, Post-Doc


I study phonology and morphology with formal and quantitative tools.
Supervisor: Charles Yang


Papers by Kyle Gorman

NeMo Inverse Text Normalization: From Development to Production

Interspeech 2021

Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion

Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Unified Verbalization for Speech Recognition & Synthesis Across Languages

Interspeech 2019

We describe a new approach to converting written tokens to their spoken form, which can be shared by automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems. Both ASR and TTS need to map from the written to the spoken domain, and we present an approach that enables us to share verbalization grammars between the two systems while exploiting linguistic commonalities to provide simple default verbalizations. We also describe improvements to an induction system for number-name grammars. Between these shared ASR/TTS verbalizers and the improved induction system for number-name grammars, we achieve significant gains in development time and scalability across languages.


We Need to Talk about Standard Splits

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

It is standard practice in speech & language technology to rank systems according to performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which reports state-of-the-art performance on a widely used "standard split". We fail to reliably reproduce some rankings using randomly generated splits. We suggest that randomly generated splits should be used in system comparison.
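The evaluation protocol recommended above can be sketched in Python as follows. This is a minimal illustration, not the paper's code. The function names `random_splits` and `win_rate` are hypothetical, and the 10% test fraction is an arbitrary default. Each system is assumed to be wrapped in a callable that trains on one set of indices and returns accuracy on the other. The sketch only measures ranking stability and does not perform a significance test.

```python
import random
from typing import Callable, List, Tuple

# A split is a pair of index lists: (training items, test items).
Split = Tuple[List[int], List[int]]


def random_splits(n_items: int, n_splits: int,
                  test_fraction: float = 0.1, seed: int = 0) -> List[Split]:
    """Draws n_splits random train/test partitions of range(n_items)."""
    rng = random.Random(seed)  # fixed seed so the splits can be reproduced
    n_test = max(1, round(n_items * test_fraction))
    splits = []
    for _ in range(n_splits):
        order = list(range(n_items))
        rng.shuffle(order)
        splits.append((sorted(order[n_test:]), sorted(order[:n_test])))
    return splits


def win_rate(splits: List[Split],
             score_a: Callable[[List[int], List[int]], float],
             score_b: Callable[[List[int], List[int]], float]) -> float:
    """Fraction of splits on which system A scores strictly higher than B.

    score_a and score_b are hypothetical callables that train a system on
    the training indices and return its accuracy on the test indices.
    """
    wins = sum(score_a(train, test) > score_b(train, test)
               for train, test in splits)
    return wins / len(splits)
```

Suppose a system outranks its competitor on the standard split but has a win rate well below 1.0 across random splits. That would suggest the reported ranking is not stable.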


What Kind of Language Is Hard to Language-Model?

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that "translationese" is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.


Neural Models of Text Normalization for Speech Applications

Computational Linguistics

Minimally Supervised Number Normalization

Transactions of the Association for Computational Linguistics

We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.
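The following Python sketch makes the verbalization task concrete by mapping an integer to its English spoken form with hand-written rules. It is neither of the two models described above: no neural network or finite-state transducer is involved, and nothing is learned from data. The function name `verbalize`, the 0 to 999,999 range, and the American style without "and" are all assumptions chosen for brevity.

```python
from typing import List

# English number names; 0-19 are irregular, so they are listed in full.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
# Decade words; indices 0 and 1 are unused placeholders.
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def _below_1000(n: int) -> List[str]:
    """Words for 0 <= n < 1000; returns [] for n == 0."""
    words: List[str] = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest:
        words.append(ONES[rest])
    return words


def verbalize(n: int) -> str:
    """Spoken English form of an integer in the range 0 to 999,999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only covers 0 to 999,999")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    words: List[str] = []
    if thousands:
        words += _below_1000(thousands) + ["thousand"]
    words += _below_1000(rest)
    return " ".join(words)


print(verbalize(1993))  # one thousand nine hundred ninety three
```

Even this small English fragment needs special cases for the teens and the decades. Other languages place such irregularities elsewhere.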

Target word prediction and paraphasia classification in spoken discourse

BioNLP 2017

We present a system for automatically detecting and classifying phonologically anomalous productions in the speech of individuals with aphasia. Working from transcribed discourse samples, our system identifies neologisms and uses a combination of string alignment and language models to produce a lattice of plausible words that the speaker may have intended to produce. We then score this lattice according to various features and attempt to determine whether the anomalous production represents a phonemic error or a genuine neologism. This approach could be extended to other types of paraphasic errors and applied in a wide variety of screening and therapeutic settings.
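A simplified Python sketch can illustrate the string-alignment step. It ranks entries of a hypothetical lexicon by their phoneme-level Levenshtein distance to an anomalous production. This is a stand-in built on assumptions, not the system described above. It uses plain unit-cost edit distance and a toy ARPAbet-style lexicon, with no language model and no lattice scoring. The names `edit_distance`, `candidate_targets`, and `LEXICON` are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def candidate_targets(production: Sequence[str],
                      lexicon: Dict[str, List[str]],
                      k: int = 3) -> List[Tuple[int, str]]:
    """Returns the k closest lexicon words as (distance, word) pairs."""
    scored = sorted((edit_distance(production, pron), word)
                    for word, pron in lexicon.items())
    return scored[:k]


# Hypothetical toy lexicon with ARPAbet-style transcriptions.
LEXICON = {
    "cat": ["K", "AE", "T"],
    "hat": ["HH", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "dog": ["D", "AO", "G"],
}

print(candidate_targets(["T", "AE", "T"], LEXICON))
# [(1, 'cat'), (1, 'hat'), (2, 'cap')]
```

Ties such as "cat" and "hat" are one reason a fuller system would also need contextual evidence, such as a language model, to choose among candidates.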


Algorithmic Classification of Five Characteristic Types of Paraphasias

American Journal of Speech-Language Pathology, 2016

Purpose: This study was intended to evaluate a series of algorithms developed to perform automatic classification of paraphasic errors (formal, semantic, mixed, neologistic, and unrelated errors). Method: We analyzed 7,111 paraphasias from the Moss Aphasia Psycholinguistics Project Database (Mirman et al., 2010) and evaluated the classification accuracy of 3 automated tools. First, we used frequency norms from the SUBTLEXus database (Brysbaert & New, 2009) to differentiate nonword errors and real-word productions. Then we implemented a phonological-similarity algorithm to identify phonologically related real-word errors. Last, we assessed the performance of a semantic-similarity criterion that was based on word2vec (Mikolov, Yih, & Zweig, 2013). Results: Overall, the algorithmic classification replicated human scoring for the major categories of paraphasias studied with high accuracy. The tool that was based on the SUBTLEXus frequency norms was more than 97% accurate in making lexicali...
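The three-stage procedure in the Method section can be outlined as a decision cascade. The Python sketch below simplifies that procedure under stated assumptions and is not the authors' tools. A plain set of known words stands in for the SUBTLEXus frequency norms. `difflib.SequenceMatcher` similarity over phoneme lists stands in for the phonological-similarity algorithm. Cosine similarity over toy two-dimensional vectors stands in for word2vec. Both 0.5 thresholds, the function `classify`, and all data are hypothetical.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(target: str, response: str, known_words: Set[str],
             prons: Dict[str, List[str]], vectors: Dict[str, List[float]],
             phon_threshold: float = 0.5, sem_threshold: float = 0.5) -> str:
    # Stage 1: lexicality. A response absent from the word list is a nonword.
    if response not in known_words:
        return "neologistic"
    # Stage 2: phonological relatedness of the real-word response to the target.
    phon_sim = SequenceMatcher(None, prons[target], prons[response]).ratio()
    # Stage 3: semantic relatedness via embedding similarity.
    sem_sim = cosine(vectors[target], vectors[response])
    phon, sem = phon_sim >= phon_threshold, sem_sim >= sem_threshold
    if phon and sem:
        return "mixed"
    if phon:
        return "formal"
    if sem:
        return "semantic"
    return "unrelated"


# Hypothetical toy data.
KNOWN = {"cat", "hat", "dog", "rat", "pen"}
PRONS = {"cat": ["K", "AE", "T"], "hat": ["HH", "AE", "T"],
         "dog": ["D", "AO", "G"], "rat": ["R", "AE", "T"],
         "pen": ["P", "EH", "N"]}
VECTORS = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "rat": [0.8, 0.2],
           "hat": [0.0, 1.0], "pen": [0.1, -1.0]}

for r in ["hat", "dog", "rat", "pen", "blick"]:
    print(r, classify("cat", r, KNOWN, PRONS, VECTORS))
# hat formal / dog semantic / rat mixed / pen unrelated / blick neologistic
```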

Revisiting frequency and storage in morphological processing

The balance between storage and computation of complex words is a major point of departure both for theories of lexical representation (e.g., Goldberg 2006, Halle & Marantz 1993, Jackendoff 1975) and for theories of processing (e.g., Baayen et al. 1997, Butterworth 1983, Taft 2004). The atoms of lexical memory that are implicated in lexical processing experiments (be they whole words, roots and affixes, or some combination thereof) must ultimately coincide with the units of morphological theory if the latter are to be theories of the mental ...

Quantitative Analysis

Oxford Handbooks Online, 2013

A sociolinguist who has gathered so much data that the raw observations have become difficult to interpret can turn to graphical presentation and to descriptive statistics: techniques for distilling a collection of data into a few key numerical values, allowing the researcher to focus on specific, meaningful properties of the data set. A sociolinguist evaluates hypotheses about the connections between linguistic behavior, speakers, and society. The researcher begins this process by gathering data with the potential to falsify the hypotheses under consideration. Inferential statistics allow the researcher to compute the probability that a hypothesized property of the data is due to chance, and to estimate the magnitude of the hypothesized effect. This chapter compares inferential methods appropriate for sociolinguistic data in terms of their assumptions. It examines elements of qualitative analysis and methods for binary analysis, multinomial variables, and continuous variables.

Exceptions to rhotacism

Earlier intervocalic s is found in archaic inscriptions (e.g., Lases for later Lares 'local deities'; Baldi 2002: 213f.) and is implied by comparative reconstruction (e.g., Latin flōrale 'floral' vs. Vestinian flusare; Watkins 1970). This change was actuated no later than the 4th century BCE, as indicated by Cicero's comment that L. Papirius Crassus, consul in 336 BCE and dictator in 339 BCE, was the first of his line to spell his cōgnōmen as Papirius rather than the ancestral Papisius. Numerous s~r alternations in Classical Latin derive from this sound ...

Cross-derivational feeding is epiphenomenal

Abstract: Bakovic (2005) proposes that patterns of sufficiently similar segment avoidance are the result of interacting agreement and antigemination constraints, a pattern known as cross-derivational feeding (CDF). The bleeding interactions between epenthesis and assimilation that prevent adjacent sufficiently similar segments in English are shown, however, to follow from extragrammatical considerations. Several case studies provide evidence against the major predictions of CDF.


Weird Inflects but OK: Making Sense of Morphological Generation Errors

Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We conduct a manual error analysis of the CoNLL-SIGMORPHON 2017 Shared Task on Morphological Reinflection. In this task, systems are given a word in citation form (e.g., hug) and asked to produce the corresponding inflected form (e.g., the simple past hugged). This design lets us analyze errors much like we might analyze children's production errors. We propose an error taxonomy and use it to annotate errors made by the top two systems across twelve languages. Many of the observed errors are related to inflectional patterns sensitive to inherent linguistic properties such as animacy or affect; many others are failures to predict truly unpredictable inflectional behaviors. We also find nearly one quarter of the residual "errors" reflect errors in the gold data.


Uh and um in children with autism spectrum disorders or language impairment

Autism Research, 2016

Atypical pragmatic language is often present in individuals with autism spectrum disorders (ASD), along with delays or deficits in structural language. This study investigated the use of the "fillers" uh and um by children aged 4-8 during the Autism Diagnostic Observation Schedule. Fillers reflect speakers' difficulties with planning and delivering speech, but they also serve communicative purposes, such as negotiating control of the floor or conveying uncertainty. We hypothesized that children with ASD would use different patterns of fillers compared to peers with typical development or with specific language impairment (SLI), reflecting differences in social ability and communicative intent. Regression analyses revealed that children in the ASD group were much less likely to use um than children in the other two groups. Filler use is an easy-to-quantify feature of behavior that, in concert with other observations, may help to distinguish ASD from SLI. Autism Res 2016. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.

Automated morphological analysis of clinical language samples

Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015

Children's Differing Patterns of Discourse Marker Use in ASD and Typical Development

Appropriate use of discourse markers (DMs) such as 'and', 'ok', or 'wait' is important for conversational reciprocity. Because deficits in social communication and interaction are core symptoms of autism spectrum disorder (ASD), we hypothesize atypical use of conversational DMs in ASD. Few studies have tested this hypothesis, plausibly because of the effort required to annotate conversations. However, new computational text analysis tools exist that may be adapted for quantitative characterization of DM use in ASD. Objectives: (1) To develop text analysis tools for detecting DMs and for determining whether the examiner asked a question and, if so, whether it is a yes/no (YN) or a WH (e.g., 'who', 'where') question. (2) To apply these tools to transcripts of ADOS conversations involving children with ASD or typical development (TD).


Memory in language-impaired children with and without autism

Journal of Neurodevelopmental Disorders, 2015

A subgroup of young children with autism spectrum disorders (ASD) has significant language impairments (phonology, grammar, vocabulary), although such impairments are not considered to be core symptoms of, and are not unique to, ASD. Children with specific language impairment (SLI) display similar impairments in language. Given evidence for phenotypic and possibly etiologic overlap between SLI and ASD, it has been suggested that language-impaired children with ASD (ASD + language impairment, ALI) may be characterized as having both ASD and SLI. However, the extent to which the language phenotypes in SLI and ALI can be viewed as similar or different depends in part upon the age of the individuals studied. The purpose of the current study is to examine differences in memory abilities, specifically those that are key "markers" of heritable SLI, among young school-age children with SLI, ALI, and ALN (ASD + language normal). In this cross-sectional study, three groups of children between ages 5 and 8 years participated: SLI (n = 18), ALI (n = 22), and ALN (n = 20). A battery of cognitive, language, and ASD assessments was administered, as well as a nonword repetition (NWR) test and measures of verbal memory, visual memory, and processing speed. NWR difficulties were more severe in SLI than in ALI, with the largest effect sizes in response to nonwords with the shortest syllable lengths. Among children with ASD, NWR difficulties were not associated with the presence of impairments in multiple ASD domains, as reported previously. Verbal memory difficulties were present in both SLI and ALI groups relative to children with ALN.
Performance on measures related to verbal memory, but not visual memory or processing speed, was significantly associated with the relative degree of language impairment in children with ASD, supporting the role of verbal memory difficulties in language impairments among early school-age children with ASD. The primary difference between children with SLI and ALI was in NWR performance, particularly in repeating two- and three-syllable nonwords, suggesting that shared difficulties in early language learning found in previous studies do not necessarily reflect the same underlying mechanisms.

Hierarchical Regression and the Stratification of (Neg)

ling.upenn.edu

A class of generalized linear models known as hierarchical (or random-effects) models are, I argu...
