Papers by Gabriel Infante-Lopez
Grupo de Procesamiento de Lenguaje Natural
This paper provides a survey of some ongoing research projects in computational linguistics within the group of Natural Language Processing at the University of Córdoba, Argentina. We outline our future plans and spotlight some opportunities for collaboration.
Submitted under the seal of the Université Européenne de Bretagne for the degree of
We introduce a technique for inducing a refinement of the set of part-of-speech tags related to verbs. We cluster verbs according to their syntactic behavior in a dependency structure setting. The set of clusters is automatically determined by means of a quality measure over the probabilistic automata that describe words in a bilexical grammar. Each of the resulting clusters defines a new part-of-speech tag. We try out the resulting tag set in a state-of-the-art phrase structure parser and we show that the induced part-of-speech tags significantly improve the accuracy of the parser.
We present a heuristic for automated lemma discovery that generates lemmas that might help ACL2 in proving theorems of the form ∀x : t₁(x) = t₂(x). This heuristic exploits manually created examples of x. These examples are used to produce ground terms t′₁ and t′₂, for which semantic models are built. In order to generate useful intermediate lemmas, we search for a specific pattern in these two models. The lemmas suggested by our heuristic are of the form ∀x : h(g₁(x)) = h(f₁(x)). A lemma is suggested if and only if t′₁ and t′₂ can be rewritten as terms containing subterms h(g₁(a)) and h(f₁(a)) respectively, such that h(g₁(a)) = h(f₁(a)) but g₁(a) ≠ f₁(a). We explain how to search for these patterns and how to build lemmas from a collection of ground equalities.
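The pattern search this abstract describes can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's ACL2 implementation: the tuple term representation, the `interp` evaluator, and all function names are assumptions.

```python
# Sketch: ground terms are nested tuples ('head', arg), leaves are values.
# `interp` maps each function symbol to a Python callable (an assumed stand-in
# for the semantic models the heuristic builds).

def evaluate(term, interp):
    """Evaluate a ground term bottom-up under an interpretation."""
    if not isinstance(term, tuple):
        return term
    head, *args = term
    return interp[head](*[evaluate(a, interp) for a in args])

def subterms(term):
    """Yield every subterm of a term, including the term itself."""
    yield term
    if isinstance(term, tuple):
        for a in term[1:]:
            yield from subterms(a)

def candidate_lemmas(t1, t2, interp):
    """Search both ground terms for subterm pairs h(g1(a)), h(f1(a)) whose
    values agree although the values of their arguments differ."""
    found = []
    for s1 in subterms(t1):
        for s2 in subterms(t2):
            if (isinstance(s1, tuple) and isinstance(s2, tuple)
                    and s1[0] == s2[0] and len(s1) == len(s2) == 2):
                v1, v2 = evaluate(s1, interp), evaluate(s2, interp)
                a1, a2 = evaluate(s1[1], interp), evaluate(s2[1], interp)
                if v1 == v2 and a1 != a2:
                    # Pattern matched: suggest the lemma h(g1(x)) = h(f1(x)).
                    found.append((s1[0], s1[1], s2[1]))
    return found
```

For instance, with h interpreted as x mod 3, the terms h(g(2)) and h(f(2)) can evaluate to the same value while g(2) ≠ f(2), triggering a suggestion.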
2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), 2019
Machine learning (ML) classifiers are vulnerable to adversarial examples. An adversarial example is an input sample that is slightly modified to induce misclassification in an ML classifier. In this work, we investigate white-box and grey-box evasion attacks against an ML-based malware detector and conduct performance evaluations in a real-world setting. We compare the defense approaches in mitigating the attacks. We propose a framework for deploying grey-box and black-box attacks against malware detection systems.
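A minimal sketch of the kind of grey-box evasion discussed above, assuming a hypothetical linear malware scorer over binary features; the feature names, weights, and greedy strategy are illustrative, not the attack framework of the paper.

```python
def greedy_evasion(features, weights, bias, threshold, mutable):
    """Greedy grey-box evasion sketch against an assumed linear scorer:
    drop the mutable features that contribute most to the malicious score
    until the sample falls below the decision threshold."""
    adv = set(features)

    def score(sample):
        return bias + sum(weights.get(f, 0.0) for f in sample)

    # Remove mutable features in order of descending positive weight.
    for f in sorted(adv & mutable, key=lambda f: -weights.get(f, 0.0)):
        if score(adv) < threshold:
            break
        if weights.get(f, 0.0) > 0:
            adv.discard(f)
    return adv, score(adv) < threshold
```

The grey-box assumption here is knowledge of the feature set and weights but not of the training data; a black-box variant would have to probe the detector instead of reading `weights`.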

Research in Computing Science, 2013
Unsupervised dependency parsing is acquiring great relevance in the area of Natural Language Processing due to the increasing number of utterances that become available on the Internet. Most current work is based on the Dependency Model with Valence (DMV) [12] or Extended Valence Grammars (EVGs) [11]; in both cases the dependencies between words are modeled using a fixed structure of automata. We present a framework for unsupervised induction of dependency structures based on CYK parsing that uses a simple rewriting technique on the training material. Our model is implemented by means of a k-best CYK parser, an inductor for Probabilistic Bilexical Grammars (PBGs) [8], and a simple technique that rewrites the treebank from the k trees and their probabilities. An important contribution of our work is that the framework accepts any existing algorithm for automata induction, making the automata structure fully modifiable. Our experiments showed that it is the training size that influences parameterization in a predictable manner. Such flexibility produced good performance results in 8 different languages, in some cases comparable to the state of the art.
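A compact Viterbi CYK chart parser for a PCFG in Chomsky normal form gives the flavor of the k-best CYK component mentioned above; this sketch returns only the single best analysis, and the grammar encoding is a hypothetical simplification.

```python
import math

def viterbi_cyk(words, lexicon, binary):
    """Viterbi CYK for a PCFG in Chomsky normal form.
    lexicon[(A, w)] and binary[(A, B, C)] give rule probabilities."""
    n = len(words)
    chart = {}  # (i, j, A) -> (log probability, backpointer)
    # Fill the diagonal with lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1, A)] = (math.log(p), w)
    # Combine adjacent spans bottom-up, keeping the best split per label.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    if (i, k, B) in chart and (k, j, C) in chart:
                        lp = (math.log(p) + chart[(i, k, B)][0]
                              + chart[(k, j, C)][0])
                        if (i, j, A) not in chart or lp > chart[(i, j, A)][0]:
                            chart[(i, j, A)] = (lp, (k, B, C))
    return chart.get((0, n, 'S'))
```

A k-best variant would keep a sorted list of the top k entries per chart cell instead of a single maximum.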
We present a tool that inspects and analyzes ACL2 books. The tool provides useful information that might help the user to improve or to optimize a book. For any event e in a book, the tool looks for all the subsets of events in the book that make e admissible by ACL2. All sets that are found have the particular property that no subset with one fewer element makes e admissible. We show that our algorithm is exponential in the number of sets that have this property. We also show that it is correct, and we prove that if the events in the book behave monotonically, the algorithm finds all such sets and that those sets are in fact the smallest sets that make an event admissible. We also describe some uses this information might have; in particular, we show that there are books in the ACL2 standard distribution in which 40% of the local lemmas can be eliminated.
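The minimality property of the reported sets can be captured with a brute-force sketch, where `admissible` stands in for the oracle (an ACL2 admissibility check); this is not the tool's own algorithm, which the abstract only characterizes as exponential in the number of such sets.

```python
from itertools import combinations

def minimal_supports(events, admissible):
    """Enumerate subsets S of `events` such that admissible(S) holds but
    admissible(S - {e}) fails for every e in S. `admissible` is an assumed
    predicate on frozensets, standing in for running ACL2 on the book."""
    out = []
    for k in range(len(events) + 1):
        for combo in combinations(events, k):
            s = frozenset(combo)
            # Keep S only if removing any single element breaks admissibility.
            if admissible(s) and not any(admissible(s - {e}) for e in s):
                out.append(s)
    return out
```

If `admissible` is monotone (adding events never breaks admissibility), these locally minimal sets are exactly the globally smallest supports, mirroring the result stated above.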
The book collects selected papers and workshops from JALIMI 2005 (Jornadas Argentinas de Lingüística Informática: Modelización e Ingeniería) and is organized into nine chapters and an appendix. Although there are substantive differences in the approaches, the methodologies, the specific properties studied, and the applications proposed or projected, all chapters report research results that aim to contribute to the long-term goal of computational linguistics, namely: to emulate in cybernetic terms the extraordinary human capacity to produce and understand natural language texts.
Advances in Natural Language Processing, 2004
We define Probabilistic Constrained W-grammars (PCW-grammars), a two-level formalism capable of capturing the grammatical frameworks used in two state-of-the-art parsers, namely bilexical grammars and stochastic tree substitution grammars. We provide embeddings of these parser formalisms into PCW-grammars, which allows us to derive properties about their expressive power and consistency, and relations between the formalisms studied.
Lecture Notes in Computer Science, 2001
Recent investigations have shown that the automated verification of continuous-time Markov chains (CTMCs) against CSL (Continuous Stochastic Logic) can be performed in a rather efficient manner. The state holding time distributions in CTMCs are restricted to negative exponential distributions. This paper investigates model checking of semi-Markov chains (SMCs), a model in which state holding times are governed by general distributions. We report on the semantic issues of adopting CSL for specifying properties of SMCs and present model checking algorithms for this logic.
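The contrast between CTMCs and SMCs can be illustrated with a small Monte Carlo sketch estimating a time-bounded reachability probability, the kind of property CSL's bounded until expresses, for a chain whose holding times come from arbitrary samplers. The data structures are illustrative assumptions, and actual CSL model checking is numerical rather than simulation-based.

```python
import random

def simulate_reach(P, holding, start, goal, bound, runs=2000, seed=7):
    """Monte Carlo estimate of Pr[reach `goal` within `bound` time units]
    in a semi-Markov chain. P[s] lists (next_state, prob) pairs; holding[s]
    samples the state holding time from any distribution, not only the
    exponential one a CTMC would require."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        s, t = start, 0.0
        while t <= bound:
            if s == goal:
                hits += 1
                break
            t += holding[s](rng)          # general holding-time distribution
            if t > bound:
                break
            r, acc = rng.random(), 0.0    # embedded DTMC transition
            for nxt, p in P[s]:
                acc += p
                if r <= acc:
                    s = nxt
                    break
    return hits / runs
```

Swapping `holding[s]` for `lambda rng: rng.expovariate(rate)` recovers the CTMC special case.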
Computational Linguistics and Intelligent Text Processing, 2006
We compare the contributions made by sequences of part-of-speech tags and sequences of phrase labels for the task of grammatical relation finding. Both are used for grammar induction, and we show that English labels of grammatical relations follow a very strict sequential order, though not as strict as that of POS tags, resulting in better performance of the latter on the relation finding task.

Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference - CLAGI '09, 2009
Unambiguous Non-Terminally Separated (UNTS) grammars have properties that make them attractive for grammatical inference. However, these properties do not state the maximal performance they can achieve when they are evaluated against a gold treebank that is not produced by an UNTS grammar. In this paper we investigate such an upper bound. We develop a method to find an upper bound for the unlabeled F1 performance that any UNTS grammar can achieve over a given treebank. Our strategy is to characterize all possible versions of the gold treebank that UNTS grammars can produce and to find the one that optimizes a metric we define. We show a way to translate this score into an upper bound for the F1. In particular, we show that the F1 parsing score of any UNTS grammar cannot exceed 82.2% when the gold treebank is the WSJ10 corpus.
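The unlabeled F1 score the bound concerns is computed from constituent brackets; a minimal sketch, with spans encoded as `(start, end)` pairs (an assumed representation):

```python
from collections import Counter

def f1_brackets(gold, pred):
    """Unlabeled bracket F1 between two parses of the same sentence.
    `gold` and `pred` are iterables of (start, end) spans; duplicates
    are handled as multisets."""
    g, p = Counter(gold), Counter(pred)
    tp = sum(min(g[b], p[b]) for b in g)   # matched brackets
    if tp == 0:
        return 0.0
    prec = tp / sum(p.values())
    rec = tp / sum(g.values())
    return 2 * prec * rec / (prec + rec)
```

The paper's upper bound says this quantity, averaged over WSJ10 in the usual corpus-level way, stays below 82.2% for every UNTS grammar.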
Lecture Notes in Computer Science, 2010
Lecture Notes in Computer Science, 2008
Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04, 2004
We compare two approaches for describing and generating bodies of rules used for natural language parsing. In today's parsers rule bodies do not exist a priori but are generated on the fly, usually with methods based on n-grams, which are one particular way of inducing probabilistic regular languages. We compare two approaches for inducing such languages. One is based on n-grams, the other on minimization of the Kullback-Leibler divergence. The inferred regular languages are used for generating bodies of rules inside a parsing procedure. We compare the two approaches along two dimensions: the quality of the probabilistic regular language they produce, and the performance of the parser they were used to build. The second approach outperforms the first one along both dimensions.
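A hedged sketch of the n-gram side of the comparison: inducing a smoothed bigram model (one simple probabilistic regular language) over symbol sequences, and estimating the Kullback-Leibler divergence between two such models from a sample. The additive smoothing and boundary markers are illustrative assumptions, not the paper's exact setup.

```python
import math
from collections import Counter

def bigram_model(strings, alpha=0.1):
    """Induce an additively smoothed bigram model over sequences of symbols,
    using '^' and '$' as assumed start/end markers."""
    counts, ctx = Counter(), Counter()
    vocab = {'$'}
    for s in strings:
        seq = ['^'] + list(s) + ['$']
        vocab.update(seq[1:])
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
            ctx[a] += 1
    V = len(vocab)
    return lambda a, b: (counts[(a, b)] + alpha) / (ctx[a] + alpha * V)

def log_prob(model, s):
    """Log probability of a string under a bigram model."""
    seq = ['^'] + list(s) + ['$']
    return sum(math.log(model(a, b)) for a, b in zip(seq, seq[1:]))

def kl_estimate(p_model, q_model, sample):
    """Monte Carlo estimate of KL(P || Q) from a sample drawn from P."""
    return sum(log_prob(p_model, s) - log_prob(q_model, s)
               for s in sample) / len(sample)
```

The KL-minimization approach of the paper goes the other way: it searches for an automaton whose divergence from the empirical distribution is small, rather than fixing the n-gram structure in advance.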
Lecture Notes in Computer Science, 2012
We present iSat, a Python command line tool to analyze and find structure in propositional satisfiability problems. iSat offers an interactive shell to control propositional solvers and to generate graph representations of the internal structure of the search space they explore, with the final aim of providing a unified environment for experimentation with propositional solving. iSat was designed to allow the simple integration of both new provers and new visualization graphs and statistics with a minimum of coding overhead.
Head Finders Inspection: An Unsupervised Optimization Approach
Lecture Notes in Computer Science, 2010
Martín A. Domínguez¹ and Gabriel Infante-Lopez¹,². ¹ Grupo de Procesamiento de Lenguaje Natural, Universidad Nacional de Córdoba, Argentina, {mdoming,gabriel}@famaf.unc.edu.ar. ² Consejo Nacional de Investigaciones Científicas y Técnicas.
Journal of Logic, Language and Information, 2006
We examine the expressive power of probabilistic context-free grammars (PCFGs), with a special focus on the use of probabilities as a mechanism for reducing ambiguity by filtering out unwanted parses. Probabilities in PCFGs induce an ordering relation among the set of trees that yield a given input sentence. PCFG parsers return the trees bearing the maximum probability for a given sentence, discarding all other possible trees. This mechanism is naturally viewed as a way of defining a new class of tree languages. We formalize the tree language thus defined, study its expressive power, and show that the latter goes beyond context-freeness. While the increased expressive power offered by PCFGs helps to reduce ambiguity, we show that, in general, it cannot be decided whether a PCFG removes all ambiguities.
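The filtering mechanism this abstract formalizes, keeping only the maximum-probability trees for a sentence, can be sketched as follows; the tuple tree encoding and the toy rule table are illustrative assumptions.

```python
def tree_prob(tree, rules):
    """Probability of a parse tree = product of its rule probabilities.
    A tree is (label, child, ...) with string leaves; `rules` maps
    (lhs, (rhs...)) to a probability."""
    if not isinstance(tree, tuple):
        return 1.0  # leaf: no rule applied
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    p = rules[(label, rhs)]
    for c in children:
        p *= tree_prob(c, rules)
    return p

def best_parse(trees, rules):
    """The PCFG filter: among all trees yielding a sentence, keep the
    one with maximum probability, discarding the rest."""
    return max(trees, key=lambda t: tree_prob(t, rules))
```

The tree language the paper studies is exactly the set of trees this filter lets through, over all sentences, and that set can fail to be context-free.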