A Spoken Dialog System to Access a Newspaper Web Site
2004
Abstract
In this paper we present a spoken dialog system which provides speech access to the information stored in a newspaper web site. The user can access the contents through query and browse mechanisms. The system is based on an Interaction Model, which describes how the interaction with the user is carried out, and an Information Model, which describes the information that supports that interaction. A decision tree and inverted indexes are used depending on the interaction modality chosen by the user.
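The query modality described above typically rests on an inverted index mapping terms to the articles that contain them. The following is a minimal sketch of that data structure; the article ids, texts, and function names are invented for illustration and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical mini-corpus standing in for newspaper articles.
ARTICLES = {
    1: "government approves new budget for public transport",
    2: "local team wins the regional football championship",
    3: "new budget cuts hit public schools",
}

def build_inverted_index(articles):
    """Map each term to the set of article ids containing it."""
    index = defaultdict(set)
    for doc_id, text in articles.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def query(index, terms):
    """Conjunctive query: return articles containing every term."""
    result = None
    for term in terms:
        postings = index.get(term.lower(), set())
        result = postings if result is None else result & postings
    return result or set()

index = build_inverted_index(ARTICLES)
print(query(index, ["new", "budget"]))  # articles 1 and 3 match
```

A real system would add tokenization, stopword removal, and ranking (e.g. the vector space model), but the lookup pattern is the same.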
Related papers
Annual Conference of the International Speech Communication Association, 2007
Currently there are no dialog systems that enable purely voice-based access to the unstructured information on websites such as Wikipedia. Such systems could be revolutionary for non-literate users in the developing world. To investigate interface issues in such a system, we developed VoicePedia, a telephone-based dialog system for searching and browsing Wikipedia. In this paper, we present the system, as
In this paper we propose the use of multilingual multichannel dialogue systems to improve the usability of web contents. To improve both the communication and the portability of these dialogue systems, we propose separating the general components from the application-specific, language-specific and channel-specific aspects. This paper describes the multilingual dialogue system for accessing web contents that we developed following this proposal. It focuses particularly on two main components of the system: the dialogue manager and the natural language generator.
2004
What would you say if your refrigerator told you, "You're having some friends round for hot chocolate later. Maybe you should order two cartons of milk"? Of course, in Spoken Dialogue Technology, Michael McTear will not give an answer to the question of whether talking to domestic appliances makes sense, but he indicates that even a normal household, for instance, may offer a wide field of application for spoken-language dialogue systems in the near future. Consequently, his book primarily focuses on the theory and practice of these systems. Addressing undergraduate students as well as postgraduate researchers and practitioners in human-computer interfaces, the book is subdivided into three parts which meet the readers' needs: "Background to Spoken Dialogue Technology" (Chapters 1-5), "Developing Spoken Dialogue Applications" (Chapters 6-11), and "Advanced Applications" (Chapters 12-14). Chapter 1, "Talking with Computers: Fact or Fiction," and Chapter 2, "Spoken Dialogue Applications: Research Directions and Commercial Deployment," present recent products and aspects of dialogue technology as well as historical linguistic and artificial intelligence approaches to dialogue and simulated conversation. Aspects of present-day commercial use of spoken dialogue technology are also discussed. In Chapter 3, "Understanding Dialogue," the term dialogue is defined, and four of its key characteristics (dialogue as discourse, dialogue as purposeful activity, dialogue as collaborative activity, and utterances in dialogue) and its structures and processes are described in detail. Chapter 4 gives an overview of the components of a spoken language dialogue system: speech recognition, language understanding, language generation, and text-to-speech synthesis. The central component (i.e., dialogue management) is specified in Chapter 5.
Here, dialogue initiative (system initiative, user initiative, and mixed initiative), dialogue control (finite-state-based, frame-based, and agent-based control), and grounding (how to process the user's input) are described. Furthermore, knowledge sources (dialogue history, task record, world knowledge model, domain model, generic model, and user model) and problems that arise when interacting with an external knowledge source are discussed. The second part starts with dialogue engineering, which can be subdivided into analysis and specification of requirements, design, implementation, testing, and evaluation of a dialogue system. The use-case analysis includes user profile (type of user, language, user's experience level, etc.) and usage profile (frequency of use, input/output device type, environment, etc.). The spoken-language requirements can
2007
Interacting with machines that listen, understand and react to human stimuli has been for many years the holy grail of scientists across disciplines. In the last three decades scientists have made great contributions to the training, design and testing of conversational systems. In this paper we present the fundamentals of Spoken Dialog Systems (SDS), from Automatic Speech Recognition, to Spoken Language Understanding, to Text-to-Speech Synthesis. We report on the spoken dialog system architecture and experiments within a university help-desk application.
… International Conference on …, 2000
This paper describes a dialogue manager and its interaction with semantics and context tracking in a spoken dialogue system developed for general information retrieval and transaction applications. The dialogue system supports the following basic functionality: electronic form filling, database query, result navigation, attribute-value pair referencing, and value and reference resolution. General data structures and algorithms for representing and resolving ambiguity in a spoken dialogue system and a parsimonious parameterization for all application-dependent semantic and dialogue information are proposed. Dialogue management algorithms examine the semantics and dialogue state and adapt to the user's needs and task necessities. These algorithms are applied to a travel reservation application developed under the auspices of the DARPA Communicator project. The proposed algorithms are application-independent and facilitate ease of developing new spoken dialogue systems by changing only the semantics encoded in the prototype tree and the domain-dependent templates used by such components as the parser and the prompt generator.
Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996
A popular approach to dialogue management is based on a finite-state model, where user utterances trigger transitions between the dialogue states, and these states, in turn, determine the system's response. This paper describes an alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form." Each slot has associated prompts that guide the user through the dialogue, and a priority that determines the order in which the system tries to acquire information. These slots can be optional or mandatory. However, the user is not restricted to follow the system's lead, and is free to ignore the prompts and take the initiative in the dialogue. The E-form-based dialogue planner has been used in an application to search a database of used car advertisements. The goal is to assist the user in selecting, from this database, a small list of cars which match their constraints. For a large number of dialogues collected from over 600 naive users, we found over 70% compliance in answering specific system prompts.
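The E-form idea of prioritized, optional/mandatory slots can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the slot names, prompts, and priorities below are invented for the used-car scenario it describes.

```python
# Hypothetical slot definitions for a used-car search E-form:
# each slot has a prompt, a priority (lower = asked first), and
# a flag marking it mandatory or optional.
SLOTS = [
    {"name": "make",  "prompt": "What make of car are you looking for?",
     "priority": 1, "mandatory": True},
    {"name": "price", "prompt": "What is your maximum price?",
     "priority": 2, "mandatory": True},
    {"name": "color", "prompt": "Any preferred color?",
     "priority": 3, "mandatory": False},
]

def next_prompt(filled):
    """Return the prompt of the best unfilled slot (mandatory
    slots first, then by priority), or None when all are filled."""
    unfilled = [s for s in SLOTS if s["name"] not in filled]
    if not unfilled:
        return None
    unfilled.sort(key=lambda s: (not s["mandatory"], s["priority"]))
    return unfilled[0]["prompt"]

# Mixed initiative: the user may ignore the prompt and fill any slot.
filled = {}
print(next_prompt(filled))      # system asks for the make first
filled["price"] = 8000          # user answers a different slot instead
print(next_prompt(filled))      # system returns to the make
```

The key contrast with a finite-state model is visible here: the dialogue state is just the set of filled slots, so any user answer advances the form regardless of which prompt was played.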
In spoken dialog systems, information must be presented sequentially, making it difficult to quickly browse through a large number of options. Recent studies have shown that user satisfaction is negatively correlated with dialog duration, suggesting that systems should be designed to maximize the efficiency of the interactions. Analysis of the logs of 2000 dialogs between users and nine different dialog systems reveals that a large percentage of the time is spent on the information presentation phase, and thus there is potentially a large pay-off to be gained from optimizing information presentation in spoken dialog systems. This article proposes a method that improves the efficiency of coping with large numbers of diverse options by selecting options and then structuring them based on a model of the user’s preferences. This enables the dialog system to automatically determine trade-offs between alternative options that are relevant to the user and present these trade-offs explicitly. Multiple attractive options are thereby structured such that the user can gradually refine her request to find the optimal trade-off. To evaluate and challenge our approach, we conducted a series of experiments that test the effectiveness of the proposed strategy. Experimental results show that basing the content structuring and content selection process on a user model increases the efficiency and effectiveness of the user’s interaction. Users complete their tasks more successfully and more quickly. Furthermore, user surveys revealed that participants found that the user-model based system presents complex trade-offs understandably and increases overall user satisfaction. The experiments also indicate that presenting users with a brief overview of options that do not fit their requirements significantly improves the user’s overview of available options, also making them feel more confident in having been presented with all relevant options.
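One common way to realize the user-model-based selection step described above is additive multi-attribute scoring: each option is scored by the user's attribute weights, and only the top-scoring, mutually trading-off options are presented. The weights, option names, and values below are invented; this is a sketch of the general technique, not the article's exact model.

```python
# Hypothetical user model: attribute weights expressing preferences.
USER_MODEL = {"price": 0.6, "duration": 0.4}

# Candidate options with normalized attribute scores in [0, 1]
# (1 = best on that attribute). Values are invented.
OPTIONS = {
    "A": {"price": 0.9, "duration": 0.3},
    "B": {"price": 0.4, "duration": 0.9},
    "C": {"price": 0.2, "duration": 0.2},
}

def rank_options(options, model):
    """Score each option by the weighted sum of its attributes and
    return option names sorted best-first."""
    def score(attrs):
        return sum(model[a] * v for a, v in attrs.items())
    return sorted(options, key=lambda name: score(options[name]),
                  reverse=True)

print(rank_options(OPTIONS, USER_MODEL))  # A and B lead, C trails
```

Here A (cheap but slow) and B (fast but pricier) are the explicit trade-off to present to the user, while a dominated option like C can be summarized briefly in the overview of rejected options.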
Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05, 2005
This paper addresses a dialogue strategy to clarify and constrain the queries for speech-driven document retrieval systems. In spoken dialogue interfaces, users often make utterances before the query is completely generated in their mind; thus input queries are often vague or fragmental. As a result, usually many items are matched. We propose an efficient dialogue framework, where the system dynamically selects an optimal question based on information gain (IG), which represents reduction of matched items. A set of possible questions is prepared using various knowledge sources. As a bottom-up knowledge source, we extract a list of words that can take a number of objects and potentially causes ambiguity, using a dependency structure analysis of the document texts. This is complemented by top-down knowledge sources of metadata and handcrafted questions. An experimental evaluation showed that the method significantly improved the success rate of retrieval, and all categories of the prepared questions contributed to the improvement.
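The information-gain criterion above can be made concrete with a small computation. Assuming a uniform prior over the currently matched items, a question's IG is the entropy of the matched set minus the expected entropy after the user's answer; the example documents and question splits below are invented for illustration.

```python
import math
from collections import Counter

def information_gain(answers):
    """IG of a candidate question over the matched items, assuming
    each matched item is equally likely to be the target.
    `answers` lists, per matched item, the answer it implies."""
    n = len(answers)
    before = math.log2(n)  # entropy of the uniform matched set
    after = sum((c / n) * math.log2(c)
                for c in Counter(answers).values())
    return before - after

# Hypothetical: 8 matched documents, two candidate questions.
# Question 1 splits them 4/4, question 2 splits them 6/2.
q1 = ["sports"] * 4 + ["politics"] * 4
q2 = ["today"] * 6 + ["archive"] * 2

print(information_gain(q1))  # 1.0 bit: the even split wins
print(information_gain(q2))  # ~0.81 bit
```

The system would ask the 4/4 question first, since the even split reduces the matched set the most in expectation, which is exactly the selection rule the paper's dialogue framework applies at each turn.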
The user's interaction with internet information services is usually based on dealing with a graphical interface, but because of the improvements in speech technology over the last years more and more phone speech information services appear. In March 2004 the W3C published the VoiceXML 2.0 recommendation to bring the advantages of web-based development and content delivery to interactive voice response information systems. Although classical modeling strategies for building graphical user interfaces can also be used for their voice based counterparts, there are certain aspects where voice response systems need special attention. For example, an intuitive navigation structure is crucial due to limited possibilities in expressing information to the user. We present a systematic development process which integrates conceptual modeling, rapid prototyping, simulation and system documentation for voice based information services. Following this approach we have implemented a voice based interface for an e-Democracy portal which will be used as an example within this paper.
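For readers unfamiliar with VoiceXML 2.0, a form-based dialog of the kind discussed above looks roughly like the following sketch. The grammar file and submit target are hypothetical placeholders, not taken from the described system.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="section_menu">
    <field name="section">
      <prompt>Which section would you like: news, sports, or culture?</prompt>
      <!-- sections.grxml is a hypothetical SRGS grammar file -->
      <grammar type="application/srgs+xml" src="sections.grxml"/>
      <filled>
        <prompt>Going to <value expr="section"/>.</prompt>
        <!-- browse.php is a hypothetical server-side target -->
        <submit next="browse.php" namelist="section"/>
      </filled>
    </field>
  </form>
</vxml>
```

The flat prompt/grammar/filled structure is what makes the navigation design mentioned above so critical: everything the user can say at each point must be anticipated in the grammar.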
Information Processing & Management, 1989
The work described in this article is part of a two-year research program to investigate and implement a voice interface for the British Library Blaise Online Information Retrieval System. Preliminary work consisted of the evaluation of an existing voice-accessed document database system to gain insights into the problems of voice interface design for online searching. Following that, a study was made to determine the syntax rules of the Blaise query language. Using this information, the new interface has been designed and software implementation of its core achieved. The main lessons learned from the evaluation of the existing speech interface are: (1) take full advantage of the syntax of the query language to limit the difficulty of the speech recognition process; and (2) avoid antagonizing the user by providing full control of the configuration of the interface, enabling varying degrees of audio reinforcement of visually presented data. Other considerations suggested the use of PROLOG as the most suitable development language for such an interface. Results of the evaluation show that the use of currently available speech recognition and synthesis hardware, along with intelligent software, can provide an interface well suited to the needs of online information retrieval systems.
