Papers by Gabriel Infante-Lopez
Grupo de Procesamiento de Lenguaje Natural
This paper provides a survey of some ongoing research projects in computational linguistics within the group of Natural Language Processing at the University of Córdoba, Argentina. We outline our future plans and spotlight some opportunities for collaboration.
Submitted under the seal of the Université Européenne de Bretagne for the degree of
We introduce a technique for inducing a refinement of the set of part-of-speech tags related to verbs. We cluster verbs according to their syntactic behavior in a dependency structure setting. The set of clusters is automatically determined by means of a quality measure over the probabilistic automata that describe words in a bilexical grammar. Each of the resulting clusters defines a new part-of-speech tag. We try out the resulting tag set in a state-of-the-art phrase structure parser and we show that the induced part-of-speech tags significantly improve the accuracy of the parser.
We present a heuristic for automated lemma discovery that generates lemmas that might help ACL2 in proving theorems of the form ∀x : t₁(x) = t₂(x). This heuristic exploits manually created examples of x. These examples are used to produce ground terms t′₁ and t′₂, for which semantic models are built. In order to generate useful intermediate lemmas, we search for a specific pattern in these two models. The lemmas suggested by our heuristic are of the form ∀x : h(g₁(x)) = h(f₁(x)). A lemma is suggested if and only if t′₁ and t′₂ can be rewritten as terms containing subterms h(g₁(a)) and h(f₁(a)) respectively, such that h(g₁(a)) = h(f₁(a)) but g₁(a) ≠ f₁(a). We explain how to search for these patterns and how to build lemmas from a collection of ground equalities.
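The pattern search this abstract describes can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's ACL2 implementation: the tuple term representation, the `interp` evaluator, and all function names are assumptions.

```python
# Sketch: ground terms are nested tuples ('head', arg), leaves are values.
# `interp` maps each function symbol to a Python callable (an assumed stand-in
# for the semantic models the heuristic builds).

def evaluate(term, interp):
    """Evaluate a ground term bottom-up under an interpretation."""
    if not isinstance(term, tuple):
        return term
    head, *args = term
    return interp[head](*[evaluate(a, interp) for a in args])

def subterms(term):
    """Yield every subterm of a term, including the term itself."""
    yield term
    if isinstance(term, tuple):
        for a in term[1:]:
            yield from subterms(a)

def candidate_lemmas(t1, t2, interp):
    """Search both ground terms for subterm pairs h(g1(a)), h(f1(a)) whose
    values agree although the values of their arguments differ."""
    found = []
    for s1 in subterms(t1):
        for s2 in subterms(t2):
            if (isinstance(s1, tuple) and isinstance(s2, tuple)
                    and s1[0] == s2[0] and len(s1) == len(s2) == 2):
                v1, v2 = evaluate(s1, interp), evaluate(s2, interp)
                a1, a2 = evaluate(s1[1], interp), evaluate(s2[1], interp)
                if v1 == v2 and a1 != a2:
                    # Pattern matched: suggest the lemma h(g1(x)) = h(f1(x)).
                    found.append((s1[0], s1[1], s2[1]))
    return found
```

For instance, with h interpreted as x mod 3, the terms h(g(2)) and h(f(2)) can evaluate to the same value while g(2) ≠ f(2), triggering a suggestion.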
2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), 2019
Machine learning (ML) classifiers are vulnerable to adversarial examples. An adversarial example is an input sample that is slightly modified to induce misclassification in an ML classifier. In this work, we investigate white-box and grey-box evasion attacks against an ML-based malware detector and conduct performance evaluations in a real-world setting. We compare the defense approaches in mitigating the attacks. We propose a framework for deploying grey-box and black-box attacks against malware detection systems.
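A minimal sketch of the kind of grey-box evasion discussed above, assuming a hypothetical linear malware scorer over binary features; the feature names, weights, and greedy strategy are illustrative, not the attack framework of the paper.

```python
def greedy_evasion(features, weights, bias, threshold, mutable):
    """Greedy grey-box evasion sketch against an assumed linear scorer:
    drop the mutable features that contribute most to the malicious score
    until the sample falls below the decision threshold."""
    adv = set(features)

    def score(sample):
        return bias + sum(weights.get(f, 0.0) for f in sample)

    # Remove mutable features in order of descending positive weight.
    for f in sorted(adv & mutable, key=lambda f: -weights.get(f, 0.0)):
        if score(adv) < threshold:
            break
        if weights.get(f, 0.0) > 0:
            adv.discard(f)
    return adv, score(adv) < threshold
```

The grey-box assumption here is knowledge of the feature set and weights but not of the training data; a black-box variant would have to probe the detector instead of reading `weights`.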

Research in Computing Science, 2013
Unsupervised dependency parsing is acquiring great relevance in the area of Natural Language Processing due to the increasing number of utterances that become available on the Internet. Most current work is based on the Dependency Model with Valence (DMV) [12] or Extended Valence Grammars (EVGs) [11]; in both cases the dependencies between words are modeled using a fixed structure of automata. We present a framework for unsupervised induction of dependency structures based on CYK parsing that uses a simple rewriting technique on the training material. Our model is implemented by means of a k-best CYK parser, an inductor for Probabilistic Bilexical Grammars (PBGs) [8], and a simple technique that rewrites the treebank from the k trees and their probabilities. An important contribution of our work is that the framework accepts any existing algorithm for automata induction, making the automata structure fully modifiable. Our experiments showed that it is the training size that influences parameterization in a predictable manner. Such flexibility produced good performance results in 8 different languages, in some cases comparable to the state of the art.
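A compact Viterbi CYK chart parser for a PCFG in Chomsky normal form gives the flavor of the k-best CYK component mentioned above; this sketch returns only the single best analysis, and the grammar encoding is a hypothetical simplification.

```python
import math

def viterbi_cyk(words, lexicon, binary):
    """Viterbi CYK for a PCFG in Chomsky normal form.
    lexicon[(A, w)] and binary[(A, B, C)] give rule probabilities."""
    n = len(words)
    chart = {}  # (i, j, A) -> (log probability, backpointer)
    # Fill the diagonal with lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1, A)] = (math.log(p), w)
    # Combine adjacent spans bottom-up, keeping the best split per label.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    if (i, k, B) in chart and (k, j, C) in chart:
                        lp = (math.log(p) + chart[(i, k, B)][0]
                              + chart[(k, j, C)][0])
                        if (i, j, A) not in chart or lp > chart[(i, j, A)][0]:
                            chart[(i, j, A)] = (lp, (k, B, C))
    return chart.get((0, n, 'S'))
```

A k-best variant would keep a sorted list of the top k entries per chart cell instead of a single maximum.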
We present a tool that inspects and analyzes ACL2 books. The tool provides useful information that might help the user to improve or to optimize a book. For any event e in a book, the tool looks for all the subsets of events in the book that make e admissible by ACL2. All sets that are found have the particular property that no subset with one fewer element makes e admissible. We show that our algorithm is exponential in the number of sets that have this property. We also show that it is correct, and we prove that if the events in the book behave monotonically, the algorithm finds all such sets and that those sets are in fact the smallest sets that make an event admissible. We also describe some uses this information might have; in particular, we show that there are books in the ACL2 standard distribution in which 40% of the local lemmas can be eliminated.
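The minimality property of the reported sets can be captured with a brute-force sketch, where `admissible` stands in for the oracle (an ACL2 admissibility check); this is not the tool's own algorithm, which the abstract only characterizes as exponential in the number of such sets.

```python
from itertools import combinations

def minimal_supports(events, admissible):
    """Enumerate subsets S of `events` such that admissible(S) holds but
    admissible(S - {e}) fails for every e in S. `admissible` is an assumed
    predicate on frozensets, standing in for running ACL2 on the book."""
    out = []
    for k in range(len(events) + 1):
        for combo in combinations(events, k):
            s = frozenset(combo)
            # Keep S only if removing any single element breaks admissibility.
            if admissible(s) and not any(admissible(s - {e}) for e in s):
                out.append(s)
    return out
```

If `admissible` is monotone (adding events never breaks admissibility), these locally minimal sets are exactly the globally smallest supports, mirroring the result stated above.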
The book collects selected papers and workshops from JALIMI 2005 (Jornadas Argentinas de Lingüística Informática: Modelización e Ingeniería) and is organized into nine chapters and an appendix. Although there are substantive differences in the approaches, the methodologies, the specific properties studied, and the applications proposed or projected, all chapters report research results that aim to contribute to the long-term goal of computational linguistics, namely: to emulate in cybernetic terms the extraordinary human capacity to produce and understand natural language texts.
Advances in Natural Language Processing, 2004
We define Probabilistic Constrained W-grammars (PCW-grammars), a two-level formalism capable of capturing the grammatical frameworks used in two state-of-the-art parsers, namely bilexical grammars and stochastic tree substitution grammars. We provide embeddings of these parser formalisms into PCW-grammars, which allows us to derive properties about their expressive power and consistency, and relations between the formalisms studied.
Lecture Notes in Computer Science, 2001
Recent investigations have shown that the automated verification of continuous-time Markov chains (CTMCs) against CSL (Continuous Stochastic Logic) can be performed in a rather efficient manner. The state holding time distributions in CTMCs are restricted to negative exponential distributions. This paper investigates model checking of semi-Markov chains (SMCs), a model in which state holding times are governed by general distributions. We report on the semantic issues of adopting CSL for specifying properties of SMCs and present model checking algorithms for this logic.
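The contrast between CTMCs and SMCs can be illustrated with a small Monte Carlo sketch estimating a time-bounded reachability probability, the kind of property CSL's bounded until expresses, for a chain whose holding times come from arbitrary samplers. The data structures are illustrative assumptions, and actual CSL model checking is numerical rather than simulation-based.

```python
import random

def simulate_reach(P, holding, start, goal, bound, runs=2000, seed=7):
    """Monte Carlo estimate of Pr[reach `goal` within `bound` time units]
    in a semi-Markov chain. P[s] lists (next_state, prob) pairs; holding[s]
    samples the state holding time from any distribution, not only the
    exponential one a CTMC would require."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        s, t = start, 0.0
        while t <= bound:
            if s == goal:
                hits += 1
                break
            t += holding[s](rng)          # general holding-time distribution
            if t > bound:
                break
            r, acc = rng.random(), 0.0    # embedded DTMC transition
            for nxt, p in P[s]:
                acc += p
                if r <= acc:
                    s = nxt
                    break
    return hits / runs
```

Swapping `holding[s]` for `lambda rng: rng.expovariate(rate)` recovers the CTMC special case.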
Computational Linguistics and Intelligent Text Processing, 2006
We compare the contributions made by sequences of part-of-speech tags and sequences of phrase labels for the task of grammatical relation finding. Both are used for grammar induction, and we show that English labels of grammatical relations follow a very strict sequential order, though not as strict as that of POS tags, resulting in better performance of the latter on the relation finding task.

Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference - CLAGI '09, 2009
Unambiguous Non-Terminally Separated (UNTS) grammars have properties that make them attractive for grammatical inference. However, these properties do not state the maximal performance they can achieve when they are evaluated against a gold treebank that is not produced by an UNTS grammar. In this paper we investigate such an upper bound. We develop a method to find an upper bound for the unlabeled F1 performance that any UNTS grammar can achieve over a given treebank. Our strategy is to characterize all possible versions of the gold treebank that UNTS grammars can produce and to find the one that optimizes a metric we define. We show a way to translate this score into an upper bound for the F1. In particular, we show that the F1 parsing score of any UNTS grammar cannot exceed 82.2% when the gold treebank is the WSJ10 corpus.
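The unlabeled F1 score the bound concerns is computed from constituent brackets; a minimal sketch, with spans encoded as `(start, end)` pairs (an assumed representation):

```python
from collections import Counter

def f1_brackets(gold, pred):
    """Unlabeled bracket F1 between two parses of the same sentence.
    `gold` and `pred` are iterables of (start, end) spans; duplicates
    are handled as multisets."""
    g, p = Counter(gold), Counter(pred)
    tp = sum(min(g[b], p[b]) for b in g)   # matched brackets
    if tp == 0:
        return 0.0
    prec = tp / sum(p.values())
    rec = tp / sum(g.values())
    return 2 * prec * rec / (prec + rec)
```

The paper's upper bound says this quantity, averaged over WSJ10 in the usual corpus-level way, stays below 82.2% for every UNTS grammar.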
Lecture Notes in Computer Science, 2010
Lecture Notes in Computer Science, 2008
Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04, 2004
We compare two approaches for describing and generating bodies of rules used for natural language parsing. In today's parsers rule bodies do not exist a priori but are generated on the fly, usually with methods based on n-grams, which are one particular way of inducing probabilistic regular languages. We compare two approaches for inducing such languages. One is based on n-grams, the other on minimization of the Kullback-Leibler divergence. The inferred regular languages are used for generating bodies of rules inside a parsing procedure. We compare the two approaches along two dimensions: the quality of the probabilistic regular language they produce, and the performance of the parser they were used to build. The second approach outperforms the first one along both dimensions.
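A hedged sketch of the n-gram side of the comparison: inducing a smoothed bigram model (one simple probabilistic regular language) over symbol sequences, and estimating the Kullback-Leibler divergence between two such models from a sample. The additive smoothing and boundary markers are illustrative assumptions, not the paper's exact setup.

```python
import math
from collections import Counter

def bigram_model(strings, alpha=0.1):
    """Induce an additively smoothed bigram model over sequences of symbols,
    using '^' and '$' as assumed start/end markers."""
    counts, ctx = Counter(), Counter()
    vocab = {'$'}
    for s in strings:
        seq = ['^'] + list(s) + ['$']
        vocab.update(seq[1:])
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
            ctx[a] += 1
    V = len(vocab)
    return lambda a, b: (counts[(a, b)] + alpha) / (ctx[a] + alpha * V)

def log_prob(model, s):
    """Log probability of a string under a bigram model."""
    seq = ['^'] + list(s) + ['$']
    return sum(math.log(model(a, b)) for a, b in zip(seq, seq[1:]))

def kl_estimate(p_model, q_model, sample):
    """Monte Carlo estimate of KL(P || Q) from a sample drawn from P."""
    return sum(log_prob(p_model, s) - log_prob(q_model, s)
               for s in sample) / len(sample)
```

The KL-minimization approach of the paper goes the other way: it searches for an automaton whose divergence from the empirical distribution is small, rather than fixing the n-gram structure in advance.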
Lecture Notes in Computer Science, 2012
We present iSat, a Python command line tool to analyze and find structure in propositional satisfiability problems. iSat offers an interactive shell to control propositional solvers and to generate graph representations of the internal structure of the search space they explore, with the final aim of providing a unified environment for experimentation with propositional solving. iSat was designed to allow the simple integration of both new provers and new visualization graphs and statistics with a minimum of coding overhead.
Head Finders Inspection: An Unsupervised Optimization Approach
Lecture Notes in Computer Science, 2010
Martín A. Domínguez¹ and Gabriel Infante-Lopez¹,². ¹ Grupo de Procesamiento de Lenguaje Natural, Universidad Nacional de Córdoba, Argentina, {mdoming,gabriel}@famaf.unc.edu.ar. ² Consejo Nacional de Investigaciones Científicas y Técnicas.
Journal of Logic, Language and Information, 2006
We examine the expressive power of probabilistic context-free grammars (PCFGs), with a special focus on the use of probabilities as a mechanism for reducing ambiguity by filtering out unwanted parses. Probabilities in PCFGs induce an ordering relation among the set of trees that yield a given input sentence. PCFG parsers return the trees bearing the maximum probability for a given sentence, discarding all other possible trees. This mechanism is naturally viewed as a way of defining a new class of tree languages. We formalize the tree language thus defined, study its expressive power, and show that the latter goes beyond context-freeness. While the increased expressive power offered by PCFGs helps to reduce ambiguity, we show that, in general, it cannot be decided whether a PCFG removes all ambiguities.
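The filtering mechanism this abstract formalizes, keeping only the maximum-probability trees for a sentence, can be sketched as follows; the tuple tree encoding and the toy rule table are illustrative assumptions.

```python
def tree_prob(tree, rules):
    """Probability of a parse tree = product of its rule probabilities.
    A tree is (label, child, ...) with string leaves; `rules` maps
    (lhs, (rhs...)) to a probability."""
    if not isinstance(tree, tuple):
        return 1.0  # leaf: no rule applied
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    p = rules[(label, rhs)]
    for c in children:
        p *= tree_prob(c, rules)
    return p

def best_parse(trees, rules):
    """The PCFG filter: among all trees yielding a sentence, keep the
    one with maximum probability, discarding the rest."""
    return max(trees, key=lambda t: tree_prob(t, rules))
```

The tree language the paper studies is exactly the set of trees this filter lets through, over all sentences, and that set can fail to be context-free.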