Ian Lewin

Dynamic quantification in logic and computational semantics

Dynamic Interpretation can be characterized by the idea that when we understand some part of language, we not only use contextual features to help us in interpreting it but we also, as a result of interpreting it, generate a new context in which further interpretation can take place. The output context for one expression can become the input context for another. Contexts are thereby threaded through the interpretation of language. This thesis extends current ideas in Dynamic Semantics in two directions: first, by defining a precise dynamic truth definition for a fragment of English, rather than a strictly logical language; and secondly, by defining a dynamic logic with binary structured quantifiers where information from the first argument is threaded into the second.

Entity Recognition in Parallel Multi-lingual Biomedical Corpora: The CLEF-ER Laboratory Overview

Lecture Notes in Computer Science, 2013

Flexible dialogue management and cost-models

Flexibility in dialogue management requires not just the ability to understand and respond to a greater range of user utterance types (or moves), but also the ability to generate them and to do so strategically in accordance with some notion of costs and benefits. We explore this issue in the context of the Information State Update model of dialogue. We add costs and preferences to a simple instantiation of the model and explore the added flexibility this brings, and also link the inclusion of costs to other developments of the model. We compare this work to the work in reinforcement learning, which also includes a notion of cost and reward.

Inference in the Resolution of Ellipsis

We discuss the treatment of ellipsis in a spoken language route planning enquiry service which uses the Core Language Engine (CLE) as its linguistic processor. We show how use of the CLE allows us to separate the interpretation of ellipsis in a dialogue context from the more general issue of dialogue management in a dialogue context and, especially, to factor out the linguistic influences on such interpretation and place them where they belong - in the linguistic processor. The route planning...

Using hand-crafted rules and machine learning to infer SciXML document structure

SciXML is designed to represent the standard hierarchical structure of scientific articles and represents a candidate common document representation framework for text-mining. Such a framework can greatly facilitate interoperability of text-mining tools. However, no publisher actually generates SciXML. We describe a new framework for inferring SciXML from a presentational level of description, such as PDF, using general purpose components such as Optical Character Recognition and expert hand-coded rules and then using supervised machine learning to provide the per-journal adaptation required for the different publication styles embodied in different journals. Adaptation via supervised machine learning can often be hampered by the effort involved in generating the necessary gold standard training material. In our framework, the effort required is substantially reduced by a) the initial processing by expert hand-coded rules which produces a reasonable "first draft" tagging and b) the ...

Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)

PLoS ONE, 2013

Motivation: Biomedical entities, their identifiers and names, are essential in the representation...

Language-processing strategies and mixed-initiative dialogues

Electronic Transactions on Artificial Intelligence, 1999

... BibTeX. @MISC{Boye99language-processingstrategies, author = {Johan Boye and Mats Wirén and Ma...

UKPMC: a full text article resource for the life sciences

Nucleic Acids Research, 2011

UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which, in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

Journal of Biomedical Semantics, 2011

Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, asking the participants to annotate the corpus with their text processing solutions.

Dialogue moves in negotiative dialogues

Project deliverable, 2000

Comparing grammar-based and robust approaches to speech understanding: a case study

Seventh European …, 2001

Previous work has demonstrated the success of statistical language models when enough training da...

Language processing for spoken dialogue systems: Is shallow parsing enough?

ESCA Tutorial and …, 1999

With maturing speech technology, spoken dialogue systems are increasingly moving from research pr...

Natural Language Processing in aid of FlyBase curators

BMC Bioinformatics, 2008

Background: Despite increasing interest in applying Natural Language Processing (NLP) to biomedic...

Harmonization of gene/protein annotations: towards a gold standard MEDLINE

Bioinformatics, 2012

The recognition of named entities (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best performing solutions combine the outputs from selected annotation solutions measured against a single corpus. However, little effort has been spent on a systematic analysis of methods harmonizing the annotation results and measuring against a combination of Gold Standard Corpora (GSCs). Results: We present Totum, a machine learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. The performed experiments show that our approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving ≈70%) in exact alignment and 22% (achieving ≈82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and it is an important contribution towards a homogeneous annotation of MEDLINE abstracts. Availability and implementation: Totum is implemented in Java and its resources are available at

Retrieving Hierarchical Text Structure from Typeset Scientific Articles: a Prerequisite for E-Science Text Mining

Proc. of the 4th UK E-Science …, 2005

Despite the growth and development of the web in scientific publishing, there remain significant ...

Parsing as deduction: rules versus principles

Adding intelligent help to mixed-initiative spoken dialogue systems

Proc. ICSLP, 2002

The rapidly expanding voice recognition industry has so far shown a preference for grammar-based language modelling, despite the better overall performance of statistical language modelling. Given that the advantages of the grammar-based approach make it unlikely to be replaced as the primary solution in the near future, it is natural to wonder whether some combination of the two approaches may prove useful. Here, we describe an implemented system that uses statistical language modelling and a decision-tree classifier to provide the user with some feedback when grammar-based recognition fails. Users of this system had more successful interactions than did users of a control system.

CALBC: Releasing the Final Corpora

A number of gold standard corpora for named entity recognition are available to the publ...

Siridus System Architecture and Interface Report (Baseline)

EU Fifth Framework …

Ian Lewin, CJ Rupp, Jim Hieronymus, D...

Survey of Existing Interactive Systems

Deliverable D, 1999
