slides (pdf)
Abstract
Slide excerpt: Slav Petrov, Leon Barrett and Dan Klein, "Non-Local Modeling with a Mixture of PCFGs". The "Empirical Motivation" slide shows a treebank verb-phrase parse of the fragment "increased 11 % to # 2.5 billion from # 2.25 billion" (a VP expanding to VBD NP PP PP, with QP number phrases inside the prepositional objects), under the heading "Verb Phrase Expansion: capture with lexicalization."
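A minimal sketch of the PCFG machinery behind the slide example may help: it reads a bracketed treebank-style tree and estimates rule probabilities by relative frequency. The tree string is the slide's VP fragment; the function names are our own, and this is plain PCFG estimation, not the paper's mixture-of-PCFGs training.

```python
from collections import Counter

def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples;
    leaf tokens stay plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                          # consume "("
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos]); pos += 1
        pos += 1                          # consume ")"
        return (label, children)

    return read()

def count_rules(node, counts):
    """Count one production per internal node: LHS -> sequence of child labels/terminals."""
    label, children = node
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

# The VP fragment from the slide, re-bracketed.
example = ("(VP (VBD increased) (NP (CD 11) (NN %)) "
           "(PP (TO to) (NP (QP (# #) (CD 2.5) (CD billion)))) "
           "(PP (IN from) (NP (QP (# #) (CD 2.25) (CD billion)))))")

counts = Counter()
count_rules(parse_tree(example), counts)

lhs_totals = Counter()
for (lhs, _), c in counts.items():
    lhs_totals[lhs] += c

# Relative-frequency estimate P(LHS -> RHS | LHS), the standard PCFG estimator.
for (lhs, rhs), c in sorted(counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}    {c / lhs_totals[lhs]:.2f}")
```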
Related papers
2001
The present paper investigates the word order alternation of English transitive phrasal verbs such as to pick up the book versus to pick the book up. It builds on traditional monofactorial analyses, but argues that previously used methods of analysis are grossly inadequate to describe, explain and predict the word order choice by native speakers. A hypothesis integrating virtually all relevant variables ever postulated is proposed and investigated from a multifactorial perspective (using GLM, linear discriminant analysis and CART). As a result, more than 84% of native speakers' choices can be predicted. Further implications (linguistic and methodological) are discussed. [Footnote 1: The grammatical notation is not committed to any particular grammatical framework and serves expository reasons only. Likewise, the choice of terminology in terms of movement processes is not meant to truly imply any such processes; it merely reflects that these phenomena have most frequently been dealt with within the transformational-generative paradigm.]
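As a rough illustration of the multifactorial approach described above, the sketch below fits a logistic regression (one member of the GLM family mentioned in the abstract) to a handful of invented particle-placement examples. The predictors, the toy data, and the resulting coefficients are purely illustrative and are not the study's actual variables or findings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical predictors per example:
#   [length of the direct object in words,
#    object is a pronoun (1/0),
#    object was mentioned in prior discourse (1/0)]
X = np.array([
    [1, 1, 1],   # "pick it up"            -> split order
    [1, 1, 0],
    [2, 0, 1],
    [3, 0, 0],   # "pick up the old book"  -> joined order
    [5, 0, 0],
    [2, 0, 0],
])
# 1 = split order (pick the book up), 0 = joined order (pick up the book)
y = np.array([1, 1, 1, 0, 0, 1])

model = LogisticRegression().fit(X, y)
print("accuracy on the toy data:", model.score(X, y))
print("coefficients:", model.coef_)   # sign and size hint at each factor's pull
```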
2009
Principles and parameters theory (PPT), as developed by Chomsky and other linguists, aims to explain and account for both the similarities and differences exhibited by the grammatical structures of the world's human languages. The principles are certain generic properties which grammars of all languages are thought to possess. The parameters are aspects of grammatical structures that have limited variability, and are fixed in one of a limited number of possible configurations. The lexicon, which is a list of words with their meanings, pronunciations and various properties, also has a significant role to play in this model. This paper outlines each of these components, and then examines them in depth to reveal the nature of some of the interactions. The rationale for the model and its major components are discussed, as well as the implications of the relevant modules and elements of the theory.
Sergi Torner and Elisenda Bernal (eds.): Collocations and other lexical combinations in Spanish, 2017
In this chapter, we provide an overview of one of the theoretical frameworks that encode selectional constraints in the lexicon, the Generative Lexicon theory. We will review the different compositional mechanisms put forward in GL (with special attention to type shifting, or coercion) and apply them to analyze a set of predicate-argument (verb-argument) and modification (adjectival modifier-noun) constructions in Spanish.
2002
In this paper we investigate the phenomenon of verb-particle constructions, discussing their characteristics and the challenges that they present for a computational grammar. We concentrate our discussion on the treatment adopted in a wide-coverage HPSG grammar: the LinGO ERG. Given the constantly growing number of verb-particle combinations, possible ways of extending this treatment are investigated, taking into account the regular patterns found in some productive combinations of verbs and particles. We analyse possible ways of identifying regular patterns using different resources. One possible way to try to capture these is by means of lexical rules, and we discuss the difficulties encountered when adopting such an approach. We also investigate how to restrict the productivity of lexical rules to deal with subregularities and exceptions to the patterns found.
2020
Sentence formation is a highly structured, history-dependent, and sample-space reducing (SSR) process. While the first word in a sentence can be chosen from the entire vocabulary, typically, the freedom of choosing subsequent words gets more and more constrained by grammar and context as the sentence progresses. This sample-space reducing property offers a natural explanation of Zipf's law in word frequencies; however, it fails to capture the structure of the word-to-word transition probability matrices of English text. Here we adopt the view that grammatical constraints (such as subject–predicate–object) locally re-order the words in sentences that are sampled by the word generation process. We demonstrate that superimposing grammatical structure, as a local word re-ordering (permutation) process, on a sample-space reducing word generation process is sufficient to explain both word frequencies and word-to-word transition probabilities. We compare the performance of the grammat...
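The SSR mechanism described above can be illustrated with a few lines of simulation. The sketch below is a toy sample-space reducing word generation process; the vocabulary size, the number of sentences, and the uniform restart rule are arbitrary choices of ours, not the paper's exact model. Its rank-frequency output is approximately Zipfian.

```python
import random
from collections import Counter

random.seed(0)
V = 1000              # vocabulary: word ids 1..V
N_SENTENCES = 20000
counts = Counter()

for _ in range(N_SENTENCES):
    w = random.randint(1, V)          # first word: whole vocabulary available
    while True:
        counts[w] += 1
        if w == 1:                    # smallest id reached: sentence ends
            break
        w = random.randint(1, w - 1)  # next word drawn from a reduced sample space

# Rank-frequency list; an SSR process yields an approximately Zipfian
# (exponent close to 1) distribution of word frequencies.
freqs = sorted(counts.values(), reverse=True)
for rank in (1, 10, 100, 1000):
    if rank <= len(freqs):
        print(f"rank {rank:>4}: frequency {freqs[rank - 1]}")
```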
We propose a statistical measure for the degree of acceptability of light verb constructions, such as take a walk, based on their linguistic properties. Our measure shows good correlations with human ratings on unseen test data. Moreover, we find that our measure correlates more strongly when the potential complements of the construction (such as walk, stroll, or run) are separated into semantically similar classes. Our analysis demonstrates the systematic nature of the semi-productivity of these constructions.
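A minimal sketch of the evaluation idea, under the assumption that agreement with human ratings is measured as a rank correlation: toy acceptability scores and judgments (all numbers invented) are correlated both pooled and within one semantically similar complement class.

```python
from scipy.stats import spearmanr

constructions = ["take a walk", "take a stroll", "take a run",
                 "take a decision", "take a look", "take a shower"]
model_score  = [0.92, 0.85, 0.70, 0.40, 0.88, 0.35]   # hypothetical measure
human_rating = [4.8, 4.5, 3.9, 2.8, 4.6, 3.1]         # hypothetical judgments

rho_all, _ = spearmanr(model_score, human_rating)
print(f"pooled correlation: {rho_all:.2f}")

# Restrict to one semantically similar complement class
# (motion nouns: walk, stroll, run) and correlate within it.
motion = [0, 1, 2]
rho_motion, _ = spearmanr([model_score[i] for i in motion],
                          [human_rating[i] for i in motion])
print(f"within the motion class: {rho_motion:.2f}")
```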
2012
Abstract: Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naïve, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, is examined here.
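As a concrete (and deliberately tiny) illustration of what an n-gram model is, the sketch below estimates a bigram model with add-one smoothing from a three-sentence toy corpus and scores two word sequences. The corpus, the smoothing choice, and the function names are our illustrative assumptions, not those of any deployed system.

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    unigrams.update(tokens[:-1])              # context counts
    bigrams.update(zip(tokens, tokens[1:]))   # word-pair counts

vocab = {w for line in corpus for w in line.split()} | {"</s>"}
V = len(vocab)

def bigram_logprob(prev, word):
    """log P(word | prev) with add-one (Laplace) smoothing."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + V))

def sentence_logprob(sentence):
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(bigram_logprob(p, w) for p, w in zip(tokens, tokens[1:]))

print(sentence_logprob("the cat sat on the rug"))   # mostly seen word pairs
print(sentence_logprob("rug the on sat cat the"))   # same words, worse order
```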
An Integrated View of Language Development. Papers in Honor of Henning Wode, eds. P. Burmeister, T. Piske & A. Rohde, pp. 109-134. Wissenschaftlicher Verlag Trier, 2002
Table 1. The 20 most frequent verb types in the corpus (tokens and share of all verb tokens):

Rank  Verb     Gloss         Tokens   %
1     vara     'be'          24094    13.7
2     ha       'have'        13826     7.8
3     kunna    'can'          7265     4.1
4     ska      'shall'        5606     3.1
5     få       'get; may'     4588     2.6
6     komma    'come'         3348     1.9
7     bli      'become'       3113     1.7
8     säga     'say'          2868     1.6
9     göra     'make; do'     2669     1.5
10    se       'see'          2592     1.4
11    gå       'go'           2476     1.4
12    finnas   'there is'     2382     1.3
13    ta       'take'         2189     1.2
14    vilja    'want'         1536     0.8
15    ge       'give'         1399     0.7
16    måste    'must'         1251     0.7
17    stå      'stand'        1105     0.6
18    känna    'feel'         1067     0.6
19    veta     'know'         1032     0.5
20    gälla    'apply to'      995     0.5

Total, 1-20 most frequent verb types     85401    48.7
Total, 1-50 most frequent verb types    104 327   59.5
Total, 1-100 most frequent verb types   119 537   68.2
Total corpus                            175 255  100

One important observation that can be made by inspecting Table 1 is the extreme dominance, in terms of frequency, of a small number of verbs. The 20 most frequent verb types cover close to 50% of all the verb tokens, and the 100 most frequent verbs close to 70%, in spite of the fact that the corpus contains close to 4000 verb types and larger printed dictionaries of Swedish list up to 10 000 verb types.

1.1 Nuclear verbs
Some of the basic verbs are language-specific in the sense that they tend to lack an equivalent in other languages. One example of that in Swedish is the verb få 'get; may', with rank 5 in the table (see Viberg 2001a). There is, however, an important set of verb meanings that tend to ...
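The coverage percentages discussed above follow from simple arithmetic over the token counts. The sketch below reproduces the "Total, 1-20" figure from the twenty counts in the table and the corpus total of 175 255 tokens; the function and variable names are our own.

```python
# Cumulative-coverage arithmetic behind Table 1: what share of all verb
# tokens do the k most frequent verb types cover?
top20_counts = [24094, 13826, 7265, 5606, 4588, 3348, 3113, 2868, 2669, 2592,
                2476, 2382, 2189, 1536, 1399, 1251, 1105, 1067, 1032, 995]
corpus_total = 175255   # total verb tokens, from the running text

def coverage(counts, k, total):
    """Share of all tokens covered by the k most frequent types."""
    return sum(sorted(counts, reverse=True)[:k]) / total

print(f"top 20 verbs cover {coverage(top20_counts, 20, corpus_total):.1%}")
# -> roughly 48.7%, matching the 'Total, 1-20' row of the table
```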
… Acquisition and Development: Proceedings of GALA …, 2006

References (1)
- Recent development: "Learning Accurate, Compact, and Interpretable Tree Annotation", Petrov et al., ACL 2006: F1 = 90.2%. A more flexible learning framework; split-and-merge training keeps the grammar compact. Similar in spirit to Klein & Manning (2003) and Matsuzaki et al. (2005).