Speech Technology

https://doi.org/10.1007/978-0-387-73819-2

References

  1. Allen, J. (2002). From Lord Rayleigh to Shannon: How do we decode speech? In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Orlando, FL, http://www.auditorymodels.org/jba/PAPERS/ICASSP/Plenary_Allen.asp.html.
  2. ATIS Technical Reports (1995). Proc. ARPA Spoken Language Systems Technology Workshop, Austin, TX, 241-280.
  3. Beek, B., Neuberg, E., Hodge, D. (1977). An assessment of the technology of automatic speech recognition for military applications. IEEE Trans. Acoust., Speech, Signal Process., 25, 310-322.
  4. Bridle, J. S., Brown, M. D. (1979). Connected word recognition using whole word templates. In: Proc. Inst. Acoustics Autumn Conf., 25-28.
  5. Chou, W. (2003). Minimum classification error (MCE) approach in pattern recognition. Chou, W., Juang, B.-H. (eds) Pattern Recognition in Speech and Language Processing. CRC Press, New York, 1-49.
  6. Chow, Y. L., Dunham, M. O., Kimball, O. A. (1987). BYBLOS, the BBN continuous speech recognition system. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Dallas, TX, 89-92.
  7. Davis, K. H., Biddulph, R., Balashek, S. (1952). Automatic recognition of spoken digits. J. Acoust. Soc. Am., 24 (6), 637-642.
  8. Ferguson, J. (ed) (1980). Hidden Markov Models for Speech. IDA, Princeton, NJ.
  9. Forgie, J. W., Forgie, C. D. (1959). Results obtained from a vowel recognition computer program. J. Acoust. Soc. Am., 31 (11), 1480-1489.
  10. Fry, D. B., Denes, P. (1959). Theoretical aspects of mechanical speech recognition. The design and operation of the mechanical speech recognizer at University College London. J. British Inst. Radio Eng., 19 (4), 211-229.
  11. Furui, S. (1986). Speaker independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. Acoust., Speech, Signal Process., 34, 52-59.
  12. Furui, S. (2004). Speech-to-text and speech-to-speech summarization of spontaneous speech. IEEE Trans. Speech Audio Process., 12, 401-408.
  13. Furui, S. (2004). Fifty years of progress in speech and speaker recognition. In: Proc. 148th Acoustical Society of America Meeting, San Diego, CA, 2497.
  14. Furui, S. (2005). Recent progress in corpus-based spontaneous speech recognition. IEICE Trans. Inf. Syst., E88-D (3), 366-375.
  15. Gales, M. J. F., Young, S. J. (1993). Parallel model combination for speech recognition in noise. Technical Report, CUED/F-INFENG/TR135.
  16. Itakura, F. (1975). Minimum prediction residual applied to speech recognition. IEEE Trans. Acoust., Speech, Signal Process., 23, 67-72.
  17. Jelinek, F. (1985). The development of an experimental discrete dictation recognizer. Proc. IEEE, 73 (11), 1616-1624.
  18. Jelinek, F., Bahl, L., Mercer, R. (1975). Design of a linguistic statistical decoder for the recognition of continuous speech. IEEE Trans. Inf. Theory, 21, 250-256.
  19. Juang, B. H., Furui, S. (2000). Automatic speech recognition and understanding: A first step toward natural human-machine communication. Proc. IEEE, 88 (8), 1142-1165.
  20. Juang, B. H., Rabiner, L. R. (2005). Automatic speech recognition: History. In: Brown, K. (ed) Encyclopedia of Language and Linguistics, Second Edition. Elsevier, Oxford, 11, 806-819.
  21. Junqua, J. C., Haton, J. P. (1996). Robustness in Automatic Speech Recognition. Kluwer, Boston.
  22. Katagiri, S. (2003). Speech pattern recognition using neural networks. Chou, W., Juang, B. H. (eds) Pattern Recognition in Speech and Language Processing. CRC Press, New York, 115-147.
  23. Kawahara, T., Lee, C. H., Juang, B. H. (1998). Key-phrase detection and verification for flexible speech understanding. IEEE Trans. Speech Audio Process, 6, 558-568.
  24. Klatt, D. (1977). Review of the ARPA speech understanding project. J. Acoust. Soc. Am., 62 (6), 1324-1366.
  25. Koo, M. W., Lee, C. H., Juang, B. H. (2001). Speech recognition and utterance verification based on a generalized confidence score. IEEE Trans. Speech Audio Process, 9, 821-832.
  26. Lee, C. H., Giachin, E., Rabiner, L. R., Pieraccini, R., Rosenberg, A. E. (1990). Acoustic modeling for large vocabulary speech recognition. Comput. Speech Lang., 4, 127-165.
  27. Lee, C. H., Rabiner, L. R. (1989). A frame synchronous network search algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process, 37, 1649-1658.
  28. Lee, K. F., Hon, H., Reddy, R. (1990). An overview of the SPHINX speech recognition system. IEEE Trans. Acoust., Speech, Signal Process, 38, 600-610.
  29. Leggetter, C. J., Woodland, P. C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput. Speech Lang., 9, 171-185.
  30. Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Mag., 4 (2), 4-22.
  31. Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22, 1-15.
  32. Liu, Y., Shriberg, E., Stolcke, A., Peskin, B., Ang, J., Hillard, D., Ostendorf, M., Tomalin, M., Woodland, P. C., Harper, M. (2005). Structural metadata research in the EARS program. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Montreal, Canada, V, 957-960.
  33. Lowerre, B. (1980). The HARPY speech understanding system. Lea, W (ed) Trends in Speech Recognition. Prentice Hall, NJ, 576-586.
  34. Martin, T. B., Nelson, A. L., Zadell, H. J. (1964). Speech recognition by feature abstraction techniques. Technical Report AL-TDR-64-176, Air Force Avionics Lab.
  35. Moore, R. C. (1997). Using natural-language knowledge sources in speech recognition. Ponting, K. (ed) Computational Models of Speech Pattern Processing. Springer, Berlin, 304-327.
  36. Myers, C. S., Rabiner, L. R. (1981). A level building dynamic time warping algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process., 29, 284-297.
  37. Nagata, K., Kato, Y., Chiba, S. (1963). Spoken digit recognizer for Japanese language. NEC Res. Develop., 6.
  38. Olson, H. F., Belar, H. (1956). Phonetic typewriter. J. Acoust. Soc. Am., 28 (6), 1072-1081.
  39. Paul, D. B. (1989). The Lincoln robust continuous speech recognizer. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Glasgow, Scotland, 449-452.
  40. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77 (2), 257-286.
  41. Rabiner, L. R., Juang, B. H. (1993). Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs, NJ.
  42. Rabiner, L. R., Levinson, S. E., Rosenberg, A. E. (1979). Speaker independent recognition of isolated words using clustering techniques. IEEE Trans. Acoust., Speech, Signal Process., 27, 336-349.
  43. Reddy, D. R. (1966). An approach to computer speech recognition by direct analysis of the speech wave. Technical Report No. C549, Computer Science Department, Stanford University, Stanford.
  44. Sakai, T., Doshita, S. (1962). The phonetic typewriter, information processing. In: Proc. IFIP Congress, Munich.
  45. Sakoe, H. (1979). Two-level DP matching: A dynamic programming based pattern matching algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process., 27, 588-595.
  46. Sakoe, H., Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust., Speech, Signal Process., 26, 43-49.
  47. Shinoda, K., Lee, C. H. (2001). A structural Bayes approach to speaker adaptation. IEEE Trans. Speech Audio Process., 9, 276-287.
  48. Soltau, H., Kingsbury, B., Mangu, L., Povey, D., Saon, G., Zweig, G. (2005). The IBM 2004 conversational telephone system for rich transcription. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Montreal, Canada, I, 205-208.
  49. Suzuki, J., Nakata, K. (1961). Recognition of Japanese vowels: Preliminary to the recognition of speech. J. Radio Res. Lab., 37 (8), 193-212.
  50. Tappert, C., Dixon, N. R., Rabinowitz, A. S., Chapman, W. D. (1971). Automatic recognition of continuous speech utilizing dynamic segmentation, dual classification, sequential decoding and error recovery. Rome Air Development Center, Rome, NY, Technical Report TR 71-146.
  51. Varga, P., Moore, R. K. (1990). Hidden Markov model decomposition of speech and noise. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Albuquerque, New Mexico, 845-848.
  52. Velichko, V. M., Zagoruyko, N. G. (1970). Automatic recognition of 200 words. Int. J. Man-Machine Studies, 2, 223-234.
  53. Vintsyuk, T. K. (1968). Speech discrimination by dynamic programming. Kibernetika, 4 (2), 81-88.
  54. Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory, 13, 260-269.
  55. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K. (1989). Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust., Speech, Signal Process., 37, 393-404.
  56. Weintraub, M., Murveit, H., Cohen, M., Price, P., Bernstein, J., Bell, G. (1989). Linguistic constraints in hidden Markov model based speech recognition. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Glasgow, Scotland, 699-702.
  57. Zue, V., Glass, J., Phillips, M., Seneff, S. (1989). The MIT summit speech recognition system, a progress report. In: Proc. DARPA Speech and Natural Language Workshop, Philadelphia, PA, 179-189.
  58. Zweig, G. (1998). Speech recognition with dynamic Bayesian networks. Ph.D. Thesis, University of California, Berkeley.
  59. Baum, L., Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Statistics, 37, 1554-1563.
  60. Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philos. Trans. Roy. Soc. Lond., 53, 370-418.
  61. Bell, A. (1922). Prehistoric telephone days. Natl. Geographic Mag., 41, 223-242.
  62. Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, USA.
  63. Bennett, C., Black, A. (2006). The Blizzard Challenge 2006. In: Blizzard Challenge Workshop, Pittsburgh, USA.
  64. Bennett, W. (1983). Secret telephony as a historical example of spread-spectrum communications. IEEE Trans. Commun., 31 (1), 98-104.
  65. Beutnagel, M., Conkie, A., Schroeter, J., Stylianou, Y., Syrdal, A. (2006). The AT&T NextGen TTS system. In: Proc. TC-Star Workshop, Barcelona, Spain.
  66. Black, A., Tokuda, K. (2005). Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets. In: Proc. Interspeech, Lisbon, Portugal.
  67. Black, A., Zen, H., Tokuda, K. (2007). Statistical parametric speech synthesis. In: Proc. ICASSP, Honolulu, USA.
  68. Bonafonte, A., Höge, H., Tropf, H., Moreno, A., v. d. Heuvel, H., Sündermann, D., Ziegenhain, U., Pérez, J., Kiss, I. (2005). TC-Star: Specifications of language resources for speech synthesis. Technical Report.
  69. Butler, E. (1948). The Myth of the Magus. Cambridge University Press, Cambridge, UK.
  70. Darlington, O. (1947). Gerbert, the teacher. Am. Historical Rev., 52, 456-476.
  71. Darwin, E. (1806). The Temple of Nature. J. Johnson, London, UK.
  72. Dudley, H., Tarnoczy, T. (1950). The speaking machine of Wolfgang von Kempelen. J. Acoust. Soc. Am., 22(2), 151-166.
  73. Dutoit, T. (1997). An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht, Netherlands.
  74. Duxans, H., Erro, D., Pérez, J., Diego, F., Bonafonte, A., Moreno, A. (2006). Voice conversion of non-aligned data using unit selection. In: Proc. TC-Star Workshop, Barcelona, Spain.
  75. Flanagan, J. (1972). Voices of men and machines. J. Acoust. Soc. Am., 51, 1375-1387.
  76. Fraser, M., King, S. (2007). The Blizzard Challenge 2007. In: Proc. ISCA Workshop on Speech Synthesis, Bonn, Germany.
  77. Hand, D., Smyth, P., Mannila, H. (2001). Principles of Data Mining. MIT Press, Cambridge, USA.
  78. Höge, H. (2002). Project proposal TC-STAR: Make speech-to-speech translation real. In: Proc. LREC, Las Palmas, Spain.
  79. Holmes, J., Holmes, W. (2001). Speech Synthesis and Recognition. Taylor and Francis, London, UK.
  80. Hunt, A., Black, A. (1996). Unit selection in a concatenative speech synthesis system using a large speech database. In: Proc. ICASSP, Atlanta, USA.
  81. Kacic, Z. (2004-2007). Proc. 11th-14th Int. Workshops on Advances in Speech Technology. University of Maribor, Maribor, Slovenia.
  82. Kain, A., Macon, M. (1998). Spectral voice conversion for text-to-speech synthesis. In: Proc. ICASSP, Seattle, USA.
  83. Kaszczuk, M., Osowski, L. (2006). Evaluating Ivona speech synthesis system for Blizzard Challenge 2006. In: Blizzard Challenge Workshop, Pittsburgh, USA.
  84. Kominek, J., Black, A. (2004). The CMU arctic speech databases. In: Proc. ISCA Workshop on Speech Synthesis, Pittsburgh, USA.
  85. Kostelanetz, R. (1996). Classic Essays on Twentieth-Century Music. Schirmer Books, New York, USA.
  86. Ladefoged, P. (1998). A Course in Phonetics. Harcourt Brace Jovanovich, New York, USA.
  87. Leonard, R., Doddington, G. (1982). A Speaker-Independent Connected-Digit Database. Texas Instruments, Dallas, USA.
  88. Levenshtein, V. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys. Dokl., 10, 707-710.
  89. Lindsay, D. (1997). Talking head. Am. Heritage Invention Technol., 13(1), 57-63.
  90. Maia, R., Toda, T., Zen, H., Nankaku, Y., Tokuda, K. (2007). An excitation model for HMM-based speech synthesis based on residual modeling. In: Proc. ISCA Workshop on Speech Synthesis, Bonn, Germany.
  91. Markel, J., Gray, A. (1976). Linear Prediction of Speech. Springer, New York, USA.
  92. Mashimo, M., Toda, T., Shikano, K., Campbell, N. (2001). Evaluation of cross-language voice conversion based on GMM and STRAIGHT. In: Proc. Eurospeech, Aalborg, Denmark.
  93. Masuko, T. (2002). HMM-based speech synthesis and its applications. PhD thesis, Tokyo Institute of Technology, Tokyo, Japan.
  94. Mostefa, D., Garcia, M.-N., Hamon, O., Moreau, N. (2006). TC-Star: D16 Evaluation Report. Technical Report.
  95. Mostefa, D., Hamon, O., Moreau, N., Choukri, K. (2007). TC-Star: D30 Evaluation Report. Technical Report.
  96. Moulines, E., Sagisaka, Y. (1995). Voice conversion: State of the art and perspectives. Speech Commun., 16 (2), 125-126.
  97. Ni, J., Hirai, T., Kawai, H., Toda, T., Tokuda, K., Tsuzaki, M., Sakai, S., Maia, R., Nakamura, S. (2007). ATRECSS: ATR English speech corpus for speech synthesis. In: Proc. ISCA Workshop on Speech Synthesis, Bonn, Germany.
  98. Nurminen, J., Popa, V., Tian, J., Tang, Y., Kiss, I. (2006). A parametric approach for voice conversion. In: Proc. TC-Star Workshop, Barcelona, Spain.
  99. Pallett, D. (1987). Test procedures for the March 1987 DARPA Benchmark Tests. In: Proc. DARPA Speech Recognition Workshop, San Diego, USA.
  100. Pérez, J., Bonafonte, A., Hain, H.-U., Keller, E., Breuer, S., Tian, J. (2006). ECESS inter-module interface specification for speech synthesis. In: Proc. LREC, Genoa, Italy.
  101. Pfitzinger, H. (2006). Five dimensions of prosody: Intensity, intonation, timing, voice quality, and degree of reduction. In: Proc. Speech Prosody, Dresden, Germany.
  102. Poe, E. (1836). Maelzel's Chess Player. Southern Literary Messenger, 2(5), 318-326.
  103. Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2), 257-286.
  104. Rabiner, L., Rosenberg, A., Levinson, S. (1978). Considerations in dynamic time warping algorithms for discrete word recognition. IEEE Trans. Acoustics, Speech Signal Process., 26(6), 575-582.
  105. Ritter von Kempelen, W. (1791). Mechanismus der menschlichen Sprache nebst der Beschreibung einer sprechenden Maschine. J. V. Degen, Vienna, Austria.
  106. Stork, D. (1996). HAL's Legacy: 2001's Computer as Dream and Reality. MIT Press, Cambridge, USA.
  107. Stylianou, Y., Cappé, O., Moulines, E. (1995). Statistical methods for voice quality transformation. In: Proc. Eurospeech, Madrid, Spain.
  108. Stylianou, Y., Laroche, J., Moulines, E. (1995). High-quality speech modification based on a harmonic + noise model. In: Proc. Eurospeech, Madrid, Spain.
  109. Suendermann, D., Raeder, H. (1997). Digital Emperator: Out of O2. d.l.h.-productions, Cologne, Germany.
  110. Sündermann, D., Bonafonte, A., Ney, H., Höge, H. (2005). A study on residual prediction techniques for voice conversion. In: Proc. ICASSP, Philadelphia, USA.
  111. Sündermann, D., Höge, H., Bonafonte, A., Ney, H., Hirschberg, J. (2006). TC-Star: Cross- language voice conversion revisited. In: Proc. TC-Star Workshop, Barcelona, Spain.
  112. Young, S., Woodland, P., Byrne, W. (1993). The HTK Book, Version 1.5. Cambridge University Press, Cambridge, UK.
  113. Zen, H., Toda, T. (2005). An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005. In: Proc. Interspeech, Lisbon, Portugal.
  114. Ai, H., Raux, A., Bohus, D., Eskenazi, M., Litman, D. (2007). Comparing spoken dialog corpora collected with recruited subjects versus real users. In: Proc. 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium.
  115. Allen, J., Byron, D., Dzikovska, M., Ferguson, G., Galescu, L., Stent, A. (2000). An architecture for a generic dialog shell. Nat. Lang. Eng., 6 (3), 1-16.
  116. Allen, J., Perrault, C.R. (1980). Analyzing intention in utterances. Artif. Intell., 15, 143-178.
  117. Allen, J. F., Schubert, L. K., Ferguson, G., Heeman, P., Hwang, C. H., Kato, T., Light, M., Martin, N. G., Miller, B. W. Poesio, M., Traum, D. R. (1995). The TRAINS Project: A case study in building a conversational planning agent. J. Exp. Theor. AI, 7, 7-48. Also available as TRAINS Technical Note 94-3 and Technical Report 532, Computer Science Department, University of Rochester, September 1994.
  118. Allwood, J. (1976). Linguistic Communication as Action and Cooperation. Department of Linguistics, University of Göteborg. Gothenburg Monographs in Linguistics, 2.
  119. Allwood, J. (1977). A critical look at speech act theory. In: Dahl, Ö. (ed.) Logic, Pragmatics, and Grammar, Studentlitteratur, Lund.
  120. Allwood, J. (1994). Obligations and options in dialogue. Think Q., 3, 9-18.
  121. Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. In: Martin, J. C., Paggio, P., Kuenlein, P., Stiefelhagen, R., Pianesi, F. (eds) Multimodal Corpora for Modelling Human Multimodal Behaviour. Int. J. Lang. Res. Eval. (Special Issue), 41 (3-4), 273-287.
  122. Allwood, J., Traum, D., Jokinen, K. (2000). Cooperation, dialogue, and ethics. Int. J. Hum. Comput. Studies, 53, 871-914.
  123. Anderson, A. H., Bader, M., Bard, E. G., Boyle, E., Doherty, G., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Miller, J., Sotillo, C., Thompson, H. S., Weinert, R. (1991). The HCRC map task corpus. Lang. Speech, 34 (4), 351-366.
  124. Appelt, D. E. (1985). Planning English Sentences. Cambridge University Press, Cambridge.
  125. Aust, H., Oerder, M., Seide, F., Steinbiss, V. (1995). The Philips automatic train timetable information system. Speech Commun., 17, 249-262.
  126. Austin, J. L. (1962). How to do Things with Words. Clarendon Press, Oxford.
  127. Axelrod, R. (1984). The Evolution of Cooperation. Basic Books, New York.
  128. Ballim, A., Wilks, Y. (1991). Artificial Believers. Lawrence Erlbaum Associates, Hillsdale, NJ.
  129. Black, W., Allwood, J., Bunt, H., Dols, F., Donzella, C., Ferrari, G., Gallagher, J., Haidan, R., Imlah, B., Jokinen, K., Lancel, J.-M., Nivre, J., Sabah, G., Wachtel, T. (1991). A pragmatics based language understanding system. In: Proc. ESPRIT Conf. Brussels, Belgium.
  130. Bolt, R.A. (1980). Put-that-there: Voice and gesture at the graphic interface. Comput. Graphics, 14 (3), 262-270.
  131. Bos, J., Klein, E., Oka, T. (2003). Meaningful conversation with a mobile robot. In: Proceedings of the Research Note Sessions of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL'03), Budapest, 71-74.
  132. Brown, P., Levinson, S. C. (1999) [1987]. Politeness: Some universals in language usage. In: Jaworski, A., Coupland, N. (eds) The Discourse Reader. Routledge, London, 321-335.
  133. Bunt, H. C. (1990). DIT: Dynamic interpretation in text and dialogue. In: Kálmán, L., Pólos, L. (eds) Papers from the Second Symposium on Language and Logic. Akademiai Kiadó, Budapest.
  134. Bunt, H. C. (2000). Dynamic interpretation and dialogue theory. In: Taylor, M. M., Néel, F., Bouwhuis, D. G. (eds) The Structure of Multimodal Dialogue II. John Benjamins, Amsterdam, 139-166.
  135. Bunt, H. C. (2005). A framework for dialogue act specification. In: Fourth Workshop on Multimodal Semantic Representation (ACL-SIGSEM and ISO TC37/SC4), Tilburg.
  136. Carberry, S. (1990). Plan Recognition in Natural Language Dialogue. MIT Press, Cambridge, MA.
  137. Carletta, J. (2006). Announcing the AMI Meeting Corpus. ELRA Newslett., 11 (1), 3-5.
  138. Carletta, J., Dahlbäck, N., Reithinger, N., Walker, M. (eds) (1997). Standards for Dialogue Coding in Natural Language Processing. Dagstuhl-Seminar Report 167.
  139. Carlson, R. (1996). The dialog component in the Waxholm system. In: LuperFoy, S., Nijholt, A., Veldhuijzen van Zanten, G. (eds) Proc. Twente Workshop on Language Technology. Dialogue Management in Natural Language Systems (TWLT 11), Enschede, The Netherlands, 209-218.
  140. Chin, D. (1989). KNOME: Modeling what the user knows in UC. In: Kobsa, A., Wahlster, W. (eds) User Modeling in Dialogue Systems. Springer-Verlag Berlin, Heidelberg, 74-107.
  141. Chomsky, N. (1957). Syntactic Structures. Mouton, The Hague/Paris.
  142. Chu-Carroll, J., Brown, M. K. (1998). An evidential model for tracking initiative in collaborative dialogue interactions. User Model. User-Adapted Interact., 8 (3-4), 215-253.
  143. Chu-Carroll, J., Carpenter, B. (1999). Vector-based natural language call routing. Comput. Linguist., 25 (3), 361-388.
  144. Clark, H. H., Schaefer, E. F. (1989). Contributing to discourse. Cogn. Sci., 13, 259-294.
  145. Clark, H. H., Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1-39.
  146. Cohen, P. R., Levesque, H. J. (1990a). Persistence, intention, and commitment. In: Cohen, P. R., Morgan, J., Pollack, M. E. (eds) Intentions in Communication. The MIT Press, Cambridge, MA, 33-69.
  147. Cohen, P. R., Levesque, H. J. (1990b). Rational interaction as the basis for communication. In: Cohen, P. R., Morgan, J., Pollack, M. E. (eds) Intentions in Communication. The MIT Press, Cambridge, MA, 221-255.
  148. Cohen, P. R., Levesque, H. J. (1991). Teamwork. Nous, 25 (4), 487-512.
  149. Cohen, P. R., Morgan, J., Pollack, M. (eds) (1990). Intentions in Communication. MIT Press, Cambridge.
  150. Cohen, P. R., Perrault, C. R. (1979). Elements of plan-based theory of speech acts. Cogn. Sci., 3, 177-212.
  151. Cole, R. A., Mariani, J., Uszkoreit, H., Zaenen, A., Zue, V. (eds) (1996). Survey of the State of the Art in Human Language Technology. Also available at http://www.cse.ogi.edu/CSLU/HLTSurvey/
  152. Core, M. G., Allen, J. F. (1997). Coding dialogs with the DAMSL annotation scheme. In: Working Notes of AAAI Fall Symposium on Communicative Action in Humans and Machines, Boston, MA.
  153. Danieli, M., Gerbino, E. (1995). Metrics for evaluating dialogue strategies in a spoken language system. In: Proc. AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation, Stanford University, 34-39.
  154. Dybkjaer, L., Bernsen, N. O., Dybkjaer, H. (1996). Evaluation of spoken dialogue systems. In: Proc. 11th Twente Workshop on Language Technology, Twente.
  155. Erman, L. D., Hayes-Roth, F., Lesser, V. R., Reddy, D. R. (1980). The HEARSAY-II speech understanding system: Integrating knowledge to resolve uncertainty. Comput. Surv., 12 (2), 213-253.
  156. Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) (2010). Development of Multimodal Interfaces: Active Listening and Synchrony. Springer, Berlin.
  157. Galliers, J. R. (1989). A theoretical framework for computer models of cooperative dialogue, acknowledging multi-agent conflict. Technical Report 17.2, Computer Laboratory, University of Cambridge.
  158. Gmytrasiewicz, P. J., Durfee, E. H. (1993). Elements of a utilitarian theory of knowledge and action. In: Proc. 13th Int. Joint Conf. on Artificial Intelligence, Chambéry, France, 396-402.
  159. Gmytrasiewicz, P. J., Durfee, E. H., Rosenschein, J. S. (1995). Towards rational communicative behavior. In: AAAI Fall Symp. on Embodied Language, AAAI Press, Cambridge, MA.
  160. Goodwin, C. (1981). Conversational Organization: Interaction between Speakers and Hearers. Academic Press, New York.
  161. Gorin, A. L., Riccardi, G., Wright, J. H. (1997). How may I help you? Speech Commun., 23 (1/2), 113-127.
  162. Grice, H. P. (1975). Logic and conversation. In: Cole, P., Morgan, J. L. (eds) Syntax and Semantics. Vol 3: Speech Acts. Academic Press, New York, 41-58.
  163. Grosz, B. J. (1977). The Representation and Use of Focus in Dialogue Understanding. SRI Stanford Research Institute, Stanford, CA.
  164. Grosz, B. J., Hirschberg, J. (1992). Some intonational characteristics of discourse. In: Proc. Second Int. Conf. on Spoken Language Processing (ICSLP'92), Banff, Alberta, Canada, 429-432.
  165. Grosz, B. J., Kraus, S. (1995). Collaborative plans for complex group action. Technical Report TR-20-95, Harvard University, Center for Research in Computing Technology.
  166. Grosz, B. J., Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Comput. Linguist., 12 (3), 175-203.
  167. Grosz, B. J., Sidner, C. L. (1990). Plans for discourse. In: Cohen, P. R., Morgan, J., Pollack, M. E. (eds) Intentions in Communication. The MIT Press. Cambridge, MA, 417-444.
  168. Guinn, C. I. (1996). Mechanisms for mixed-initiative human-computer collaborative discourse. In: Proc. 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, California, USA, 278-285.
  169. Hasida, K., Den, Y., Nagao, K., Kashioka, H., Sakai, K., Shimazu, A. (1995). Dialeague: A proposal of a context for evaluating natural language dialogue systems. In: Proc. 1st Annual Meeting of the Japanese Natural Language Processing Society, Tokyo, Japan, 309-312 (in Japanese).
  170. Heeman, P. A., Allen, J. F. (1997). Intonational boundaries, speech repairs, and discourse markers: Modelling spoken dialog. In: Proc. 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, Spain.
  171. Hirasawa, J., Nakano, M., Kawabata, T., Aikawa, K. (1999). Effects of system barge-in responses on user impressions. In: Sixth Eur. Conf. on Speech Communication and Technology, Budapest, Hungary, 3, 1391-1394.
  172. Hirschberg, J., Litman, D. (1993). Empirical studies on the disambiguation of cue phrases. Comput. Linguist., 19 (3), 501-530.
  173. Hirschberg, J., Nakatani, C. (1998). Acoustic indicators of topic segmentation. In: Proc. Int. Conf. on Spoken Language Processing, Sydney, Australia, 976-979.
  174. Hobbs, J. (1979). Coherence and coreference. Cogn. Sci., 3 (1), 67-90.
  175. Hovy, E. H. (1988). Generating Natural Language under Pragmatic Constraints. Lawrence Erlbaum Associates, Hillsdale, NJ.
  176. Isard, A., McKelvie, D., Cappelli, B., Dybkjaer, L., Evert, S., Fitschen, A., Heid, U., Kipp, M., Klein, M., Mengel, A., Møller, M. B., Reithinger, N. (1998). Specification of workbench architecture. MATE Deliverable D3.1.
  177. Jekat, S., Klein, A., Maier, E., Maleck, I., Mast, M., Quantz, J. (1995). Dialogue acts in VERBMOBIL. Technical Report 65, BMBF Verbmobil Report.
  178. Jokinen, K. (1996). Goal formulation based on communicative principles. In: Proc. 16th Int. Conf. on Computational Linguistics (COLING-96), Copenhagen, Denmark, 598-603.
  179. Jokinen, K. (2009). Constructive Dialogue Modelling: Speech Interaction and Rational Agents. John Wiley, Chichester.
  180. Jokinen, K., Hurtig, T. (2006). User expectations and real experience on a multimodal inter- active system. In: Proc. 9th Int. Conf. on Spoken Language Processing (Interspeech 2006 - ICSLP) Pittsburgh, US.
  181. Jokinen, K., Hurtig, T., Hynnä, K., Kanto, K., Kerminen, A., Kaipainen, M. (2001). Self-organizing dialogue management. In: Isahara, H., Ma, Q. (eds) NLPRS2001 Proc. 2nd Workshop on Natural Language Processing and Neural Networks, Tokyo, Japan, 77-84.
  182. Joshi, A., Webber, B. L., Weischedel, R. M. (1984). Preventing false inferences. In: Proc. 10th Int. Conf. on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics, Stanford, California, USA, 134-138.
  183. Jurafsky, D., Shriberg, E., Fox, B., Curl, T. (1998). Lexical, prosodic, and syntactic cues for dialog acts. In: ACL/COLING-98 Workshop on Discourse Relations and Discourse Markers. 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics Montreal, Quebec, Canada.
  184. Kearns, M., Isbell, C., Singh, S., Litman, D., Howe, J. (2002). CobotDS: A spoken dialogue system for chat. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence, Edmonton, Alberta.
  185. Keizer, S., Akker, R. op den, Nijholt, A. (2002). Dialogue act recognition with Bayesian networks for Dutch dialogues. In: Jokinen, K., McRoy, S. (eds) Proc. 3rd SIGDial Workshop on Discourse and Dialogue, Philadelphia, US.
  186. Kerminen, A., Jokinen, K. (2003). Distributed dialogue management. In: Jokinen, K., Gambäck, B., Black, W. J., Catizone, R., Wilks, Y. (eds.) Proc. EACL Workshop on Dialogue Systems: Interaction, Adaptation and Styles of Management. Budapest, Hungary.
  187. Kendon, A. (2004). Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge.
  188. Kipp, M. (2001). Anvil: A generic annotation tool for multimodal dialogue. In: Proc. 7th Eur. Conf. on Speech Communication and Technology (Eurospeech), Aalborg, Denmark, 1367-1370.
  189. Koller, A., Kruijff, G.-J. (2004). Talking robots with LEGO MindStorms. In: Proc. 20th COLING, Geneva.
  190. Koiso, H., Horiuchi, Y., Tutiya, S., Ichikawa, A., Den, Y. (1998). An analysis of turn taking and backchannels based on prosodic and syntactic features in Japanese Map Task dialogs. Lang. Speech, 41 (3-4), 295-321.
  191. Krahmer, E., Swerts, M., Theune, M., Weegels, M. (1999). Problem spotting in human-machine interaction. In: Proc. Eurospeech '99, Budapest, Hungary, 3, 1423-1426.
  192. Lemon, O., Bracy, A., Gruenstein, A., Peters, S. (2001). The WITAS multi-modal dialogue system I. In: Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech), Aalborg, Denmark.
  193. Lendvai, P., Bosch, A. van den, Krahmer, E. (2003). Machine learning for shallow interpretation of user utterances in spoken dialogue systems. In: Jokinen, K., Gambäck, B., Black, W. J., Catizone, R., Wilks, Y. (eds) Proc. ACL Workshop on Dialogue Systems: Interaction, Adaptation and Styles of Management, Budapest, Hungary, 69-78.
  194. Lesh, N., Rich, C., Sidner, C. L. (1998). Using plan recognition in human-computer collaboration. MERL Technical Report.
  195. Levesque, H. J., Cohen, P. R., Nunes, J. H. T. (1990). On acting together. In: Proc. AAAI-90, 94-99. Boston, MA.
  196. Levin, E., Pieraccini, R. (1997). A stochastic model of computer-human interaction for learning dialogue strategies. In: Proc. Eurospeech, Rhodes, Greece, 1883-1886.
  197. Levin, E., Pieraccini, R., Eckert, W. (2000). A stochastic model of human-machine interaction for learning dialog strategies. IEEE Trans. Speech Audio Process., 8 (1), 11-23.
  198. Levinson, S. (1983). Pragmatics. Cambridge University Press, Cambridge.
  199. Litman, D. J., Allen, J. (1987). A plan recognition model for subdialogues in conversation. Cogn. Sci., 11(2), 163-200.
  200. Litman, D., Kearns, M., Singh, S., Walker, M. (2000). Automatic optimization of dialogue management. In: Proc. 18th Int. Conf. on Computational Linguistics (COLING 2000), Saarbrücken, Germany, 502-508.
  201. Lopez Cozar, R., Araki, M. (2005). Spoken, multilingual and multimodal dialogue systems. Wiley, New York, NY.
  202. Majaranta, P., Räihä, K. (2002). Twenty years of eye typing: Systems and design issues. In: Proc. 2002 Symp. on Eye Tracking Research & Applications (ETRA 02), ACM, New York, 15-22.
  203. Martin, D., Cheyer, A., Moran, D. (1998). Building distributed software systems with the Open Agent Architecture. In: Proc. 3rd Int. Conf. on the Practical Application of Intelligent Agents and Multi-Agent Technology, Blackpool, UK. The Practical Application Company, Ltd.
  204. McCoy, K. F. (1988). Reasoning on a highlighted user model to respond to misconceptions. Comput. Linguist., 14 (3), 52-63.
  205. McGlashan, S., Fraser, N. M., Gilbert, N., Bilange, E., Heisterkamp, P., Youd, N. J. (1992). Dialogue management for telephone information services. In: Proc. Int. Conf. on Applied Language Processing, Trento, Italy.
  206. McRoy, S. W., Hirst, G. (1995). The repair of speech act misunderstandings by abductive inference. Comput. Linguist., 21 (4), 435-478.
  207. McTear, M. (2004). Spoken Dialogue Technology: Toward the Conversational User Interface. Springer Verlag, London.
  208. Miikkulainen, R. (1993). Sub-symbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. MIT Press, Cambridge.
  209. Minsky, M. (1974). A Framework for Representing Knowledge. AI Memo 306. M.I.T. Artificial Intelligence Laboratory, Cambridge, MA.
  210. Moore, J. D., Swartout, W. R. (1989). A reactive approach to explanation. In: Proc. 11th Int. Joint Conf. on Artificial Intelligence (IJCAI), Detroit, MI, 20-25.
  211. Moto-oka, T., Kitsuregawa, M. (1985). The Fifth Generation Computer: The Japanese Challenge (trans. Apps, F. D. R.). Wiley, New York, NY.
  212. Möller, S. (2002). A new taxonomy for the quality of telephone services based on spoken dialogue systems. In: Jokinen, K., McRoy, S. (eds) Proc. 3rd SIGdial Workshop on Discourse and Dialogue, Philadelphia, PA, 142-153.
  213. Nagata, M., Morimoto, T. (1994). First steps towards statistical modeling of dialogue to predict the speech act type of the next utterance. Speech Commun., 15 (3-4), 193-203.
  214. Nakano, M., Miyazaki, N., Hirasawa, J., Dohsaka, K., Kawabata, T. (1999). Understanding unsegmented user utterances in real-time spoken dialogue systems. In: Proc. 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, Maryland, USA, 200-207.
  215. Nakano, M., Miyazaki, N., Yasuda, N., Sugiyama, A., Hirasawa, J., Dohsaka, K., Aikawa, K. (2000). WIT: Toolkit for building robust and real-time spoken dialogue systems. In: Dybkjaer, L., Hasida, K., Traum, D. (eds) Proc. 1st SIGDial Workshop on Discourse and Dialogue, Hong Kong, 150-159.
  216. Nakatani, C., Hirschberg, J. (1993). A speech-first model for repair detection and correction. In: Proc. 31st Annual Meeting on Association for Computational Linguistics, Columbus, OH, 46-53.
  217. Nakatani, C., Hirschberg, J., Grosz, B. (1995). Discourse structure in spoken language: Studies on speech corpora. In: Working Notes of the AAAI-95 Spring Symposium on Empirical Methods in Discourse Interpretation, Palo Alto, CA.
  218. Newell, A., Simon, H. (1976). Computer science as empirical inquiry: Symbols and search. Commun. ACM, 19, 113-126.
  219. Nielsen, J. (1994). Heuristic evaluation. In: Nielsen, J., Mack, R. L. (eds) Usability Inspection Methods, Chapter 2, John Wiley, New York.
  220. Norman, D. A., Draper, S. W. (eds) (1986). User Centered System Design: New Perspectives on Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.
  221. Paek, T., Pieraccini, R. (2008). Automating spoken dialogue management design using machine learning: An industry perspective. In: McTear, M. F., Jokinen, K., Larson, J. (eds) Evaluating New Methods and Models for Advanced Speech-Based Interactive Systems. Special Issue of Speech Commun., 50 (8-9).
  222. Paris, C. L. (1988). Tailoring object descriptions to a user's level of expertise. Comput. Linguist., 14 (3), 64-78.
  223. Power, R. (1979). Organization of purposeful dialogue. Linguistics, 17, 107-152.
  224. Price, P., Hirschman, L., Shriberg, E., Wade, E. (1992). Subject-based evaluation measures for interactive spoken language systems. In: Proc. Workshop on Speech and Natural Language, Harriman, New York, 34-39.
  225. Reichman, R. (1985). Getting Computers to Talk Like You and Me. Discourse Context, Focus, and Semantics (An ATN Model). The MIT Press, Cambridge, MA.
  226. Reithinger, N., Maier, E. (1995). Utilizing statistical dialogue act processing in Verbmobil. In: Proc. 33rd Annual Meeting of ACL, MIT, Cambridge, US, 116-121.
  227. Ries, K. (1999). HMM and neural network based speech act detection. In: Proc. ICASSP. Also available at citeseer.nj.nec.com/ries99hmm.html
  228. Roy, N., Pineau, J., Thrun, S. (2000). Spoken dialog management for robots. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong.
  229. Rudnicky, A., Thayer, E., Constantinides, P., Tchou, C., Shern, R., Lenzo, K., Xu, W., Oh, A. (1999). Creating natural dialogs in the Carnegie Mellon Communicator System. In: Proc. 6th Eur. Conf. on Speech Communication and Technology (Eurospeech-99), Budapest, 1531-1534.
  230. Sacks, H., Schegloff, E., Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50 (4), 696-735.
  231. Sadek, D., Bretier, P., Panaget, F. (1997). ARTIMIS: Natural dialogue meets rational agency. In: Proc. IJCAI-97, Nagoya, Japan, 1030-1035.
  232. Samuel, K., Carberry, S., Vijay-Shanker, K. (1998). Dialogue act tagging with transformation-based learning. In: Proc. 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (ACL-COLING), Montreal, Quebec, Canada, 1150-1156.
  233. Schank, R. C., Abelson, R. P. (1977). Scripts, Plans, Goals, and Understanding. Lawrence Erlbaum Associates, Hillsdale, NJ.
  234. Schatzmann, J., Weilhammer, K., Stuttle, M. N., Young, S. (2006). A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. Knowledge Eng. Rev., 21 (2), 97-126.
  235. Scheffler, K., Young, S. (2000). Probabilistic simulation of human-machine dialogues. In: Proc. IEEE ICASSP, Istanbul, Turkey, 1217-1220.
  236. Searle, J. R. (1979). Expression and Meaning: Studies in the Theory of Speech Acts. Cambridge University Press, Cambridge.
  237. Seneff, S., Hurley, E., Lau, R., Pao, C., Schmid, P., Zue, V. (1998). GALAXY-II: A reference architecture for conversational system development. In: Proc. 5th Int. Conf. on Spoken Language Processing (ICSLP 98), Sydney, Australia.
  238. Shriberg, E., Bates, R., Taylor, P., Stolcke, A., Jurafsky, D., Ries, K., Coccaro, N., Martin, R., Meteer, M., Van Ess-Dykema, C. (1998). Can prosody aid the automatic classification of dialog acts in conversational speech? Lang. Speech, 41 (3-4), 439-487.
  239. Sinclair, J. M., Coulthard, R. M. (1975). Towards an Analysis of Discourse: The English Used by Teachers and Pupils. Oxford University Press, Oxford.
  240. Smith, R. W. (1998). An evaluation of strategies for selectively verifying utterance meanings in spoken natural language dialog. Int. J. Hum. Comput. Studies, 48, 627-647.
  241. Smith, R. W., Hipp, D. R. (1994). Spoken Natural Language Dialog Systems: A Practical Approach. Oxford University Press, New York, NY.
  242. Stent, A., Dowding, J., Gawron, J. M., Owen-Bratt, E., Moore, R. (1999). The CommandTalk spoken dialogue system. In: Proc. 37th Annual Meeting of the Association for Computational Linguistics, College Park, Maryland, US, 20-26.
  243. Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Martin, R., Van Ess-Dykema, C., Meteer, M. (2000). Dialogue act modeling for automatic tagging and recognition of conversational speech. Comput. Linguist., 26 (3), 339-373.
  244. Suhm, B., Geutner, P., Kemp, T., Lavie, A., Mayfield, L., McNair, A. E., Rogina, I., Schultz, T., Sloboda, T., Ward, W., Woszczyna, M., Waibel, A. (1995). JANUS: Towards multilingual spoken language translation. In: Proc. ARPA Spoken Language Workshop, Austin, TX.
  245. Swerts, M., Hirschberg, J., Litman, D. (2000). Correction in spoken dialogue systems. In: Proc. Int. Conf. on Spoken Language Processing (ICSLP-2000), Beijing, China, 615-618.
  246. Takezawa, T., Morimoto, T., Sagisaka, Y., Campbell, N., Iida, H., Sugaya, F., Yokoo, A., Yamamoto, S. (1998). A Japanese-to-English speech translation system: ATR-MATRIX. In: Proc. (ICSLP-98), Sydney, Australia, 957-960.
  247. Traum, D. R. (2000). 20 questions on dialogue act taxonomies. J. Semantics, 17, 7-30.
  248. Traum, D. R., Allen, J. F. (1994). Discourse obligations in dialogue processing. In: Proc. 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, USA, 1-8.
  249. Traum, D., Roque, A., Leuski, A., Georgiou, P., Gerten, J., Martinovski, B., Narayanan, S., Robinson, S., Vaswani, A. (2007). A virtual human for tactical questioning. In: Proc. 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium, 71-74.
  250. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.
  251. Wahlster, W. (ed) (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Springer, Berlin.
  252. Wahlster, W., Marburger, H., Jameson, A., Busemann, S. (1983). Over-answering yes-no questions: Extended responses in a NL interface to a vision system. In: Proc. 8th Int. Joint Conf. on Artificial Intelligence (IJCAI 83), Karlsruhe, 643-646.
  253. Walker, M. A., Fromer, J. C., Narayanan, S. (1998). Learning optimal dialogue strategies: A case study of a spoken dialogue agent for email. In: Proc. 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Montreal, Quebec, Canada.
  254. Walker, M. A., Hindle, D., Fromer, J., Di Fabbrizio, G., Mestel, G. (1997a). Evaluating competing agent strategies for a voice email agent. In: Proc. 5th Eur. Conf. on Speech Communication and Technology. (Eurospeech 97), Rhodes, Greece.
  255. Walker, M. A., Litman, D. J., Kamm, C. A., Abella, A. (1997b). Evaluating spoken dialogue agents with PARADISE: Two case studies. Comput. Speech Lang., 12 (3), 317-347.
  256. Wallace, M. D., Anderson, T. J. (1993). Approaches to interface design. Interacting Comput., 5 (3), 259-278.
  257. Ward, N., Tsukahara, W. (2000). Prosodic features which cue back-channel responses in English and Japanese. J. Pragmatics, 32, 1177-1207.
  258. Weinschenk, S., Barker, D. (2000). Designing Effective Speech Interfaces. Wiley, London.
  259. Weiser, M. (1991). The computer for the twenty-first century. Sci. Am., September 1991 (Special Issue: Communications, Computers and Networks), 265(3), 94-104.
  260. Weizenbaum, J. (1966). ELIZA: A computer program for the study of natural language communication between man and machine. Commun. ACM, 9, 36-45.
  261. Wermter, S., Weber, V. (1997). SCREEN: Learning a flat syntactic and semantic spoken language analysis using artificial neural networks. J. Artif. Intell. Res., 6 (1), 35-85.
  262. Williams, J. D., Young, S. J. (2007). Partially observable Markov decision processes for spoken dialog systems. Comput. Speech Lang., 21 (2), 393-422.
  263. Winograd, T. (1972). Understanding Natural Language. Academic Press, New York.
  264. Woods, W. A., Kaplan, R. N., Webber, B. N. (1972). The lunar sciences natural language information system: Final Report. BBN Report 2378, Bolt Beranek and Newman Inc., Cambridge, MA.
  265. Yankelovich, N. (1996). How do users know what to say? Interactions, 3 (6), 32-43.
  266. Young, S. L., Hauptmann, A. G., Ward, W. H., Smith, E. T., Werner, P. (1989). High-level knowledge sources in usable speech recognition systems, Commun. ACM, 32 (2), 183-194.
  267. Zock, M., Sabah, G. (eds) (1988). Advances in Natural Language Generation: An Interdisciplinary Perspective. Pinter Publishers, London.
  269. Davis, K. H., Biddulph, R., Balashek, S. (1952). Automatic recognition of spoken digits. J. Acoust. Soc. Am., 24 (6), 637-642.
  270. Flanagan, J. L., Levinson, S. E., Rabiner, L. R., Rosenberg, A. E. (1980). Techniques for expanding the capabilities of practical speech recognizers. In: Trends in Speech Recognition, Prentice Hall, Englewood Cliffs, NJ.
  271. Price, P., Fisher, W. M., Bernstein, J., Pallett, D. S. (1988). The DARPA 1000-word Resource Management database for continuous speech recognition. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing.
  272. Hirschman, L. (1992). Multi-site data collection for a spoken language corpus. In: Proc. 5th DARPA Speech and Natural Language Workshop. Defense Advanced Research Projects Agency.
  273. Walker, M., Rudnicky, A., Aberdeen, J., Bratt, E. O., Garofolo, J., Hastie, H., Le, A., Pellom, B., Potamianos, A., Passonneau, R., Prasad, R., Roukos, S., Sanders, G., Seneff, S., Stallard, D. (2002). DARPA Communicator: Cross system results for the 2001 evaluation. In: Proc. ICSLP 2002.
  274. Barnard, E., Halberstadt, A., Kotelly, C., Phillips, M. (1999). A consistent approach to designing spoken-dialogue systems. In: IEEE Workshop. Keystone, CO.
  275. Zue, V. (1997). Conversational interfaces: Advances and challenges. In: Eurospeech 97. Rhodes, Greece.
  276. Pieraccini, R., Huerta, J. (2005). Where do we go from here? Research and commercial spoken dialogue systems. In: Proc. SIGdial Workshop on Discourse and Dialogue, 1-10.
  277. Gorin, A. L., Riccardi, G., Wright, J. H. (1997). How may I help you? Speech Commun., 23, 113-127.
  278. Chu-Carroll, J., Carpenter, B. (1999). Vector-based natural language call routing. Comput. Linguist., 25 (3), 361-388.
  279. Oviatt, S. L. (1995). Predicting spoken disfluencies during human-computer interaction. Comput. Speech Lang., 19-35.
  280. Voice Extensible Markup Language (VoiceXML) 2.1. (2005). W3C Candidate Recommendation 13 June 2005.
  281. Media Resource Control Protocol (MRCP) Introduction.
  282. Standard ECMA-262 (1999). ECMAScript Language Specification, 3rd Edition.
  283. Speech Recognition Grammar Specification Version 1.0. (2004). W3C Recommendation.
  284. Voice Browser Call Control: CCXML Version 1.0. (2005). W3C Working Draft.
  285. Speech Synthesis Markup Language (SSML), Version 1.0. (2004).
  286. State Chart XML (SCXML) State Machine Notation for Control Abstraction. (2006). W3C Working Draft.
  287. Harel, D., Politi, M. (1998). Modeling Reactive Systems with Statecharts: The STATEMATE Approach. McGraw-Hill, New York, NY.
  288. O'Reilly, T. (2004). What is Web 2.0: Design patterns and business models for the next generation of software.
  289. Acomb, K., Bloom, J., Dayanidhi, K., Hunter, P., Krogh, P., Levin, E., Pieraccini, R. (2007). Technical support dialog systems, issues, problems, and solutions. In: Bridging the Gap, Academic and Industrial Research in Dialog Technology. Rochester, NY.
  290. DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., Cooper, H. (2003). Cues to deception. Psychol. Bull., 129, 74-118.
  291. Frank, M. G., Feeley, T. H. (2003). To catch a liar: Challenges for research in lie detection training. J. Appl. Commun. Res., 31(1), 58-75.
  292. Vrij, A. (1994). The impact of information and setting on detection of deception by police detectives. J. Nonverbal Behav., 18(2), 117-136.
  293. Aamodt, M. G., Mitchell, H. (2004). Who can best detect deception: A meta-analysis. Paper presented at the Annual Meeting of the Society for Police and Criminal Psychology, Rome.
  294. Ekman, P., Friesen, W. V. (1976). Pictures of Facial Affect. Consulting Psychologists Press, Palo Alto, CA.
  295. Burgoon, J. K., Buller, D. B. (1994). Interpersonal deception: III. Effects of deceit on perceived communication and nonverbal behavior dynamics. J. Nonverbal Behav., 18 (2), 155-184.
  296. Horvath, F. (1973). Verbal and nonverbal clues to truth and deception during polygraph examinations. J. Police Sci. Admin., 1 (2), 138-152.
  297. Haddad, D., Ratley, R. (2002). Investigation and evaluation of voice stress analysis technology. Technical report, National Criminal Justice Reference Service.
  298. Hopkins, C. S., Ratley, R. J., Benincasa, D. S., Grieco, J. J. (2005). Evaluation of voice stress analysis technology. In: Proc. 38th Hawaii Int. Conf. on System Sciences, Hilton Waikoloa Village Island of Hawaii.
  299. Cowie, R., Douglas-Cowie, E., Campbell, N. (eds) (2003). Speech Communication: Special issue on speech and emotion, 40 (1-2).
  300. Lee, C. M., Narayanan, S., Pieraccini, R. (2002). Combining acoustic and language information for emotion recognition. In: Proc. Int. Conf. on Spoken Language Processing 2002, Denver, 873-876.
  301. Ang, J., Dhillon, R., Krupski, A., Shriberg, E., Stolcke, A. (2002). Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In: Proc. Int. Conf. on Spoken Language Processing, Denver, 2037-2039.
  302. Batliner, A., Fischer, R., Huber, R., Spilker, J., Nöth, E. (2003). How to find trouble in communication. Speech Commun., 40(1-2), 117-143.
  303. Litman, D., Forbes-Riley, K. (2004). Predicting student emotions in computer-human dia- logues. In: Proc. ACL-2004, Barcelona.
  304. Liscombe, J., Venditti, J., Hirschberg, J. (2005). Detecting certainness in spoken tutorial dialogues. In: Proc. INTERSPEECH 2005, Lisbon.
  305. Hirschberg, J., Benus, S., Brenier, J. M., Enos, F., Friedman, S., Gilman, S., Girand, C., Graciarena, M., Kathol, A., Michaelis, L., Pellom, B., Shriberg, E., Stolcke, A. (2005). Distinguishing deceptive from non-deceptive speech. In: Proc. INTERSPEECH 2005, Lisbon.
  306. Liu, X. (2005). Voice stress analysis: Detection of deception. Master's Thesis at the University of Sheffield, Department of Computer Science.
  307. Fadden, L. (2006). The prosody of suspects' responses during police interviews. In: Speech Prosody 2006, Dresden.
  308. Enos, F., Benus, S., Cautin, R. L., Graciarena, M., Hirschberg, J., Shriberg, E. (2006). Personality factors in human deception detection: Comparing human to machine performance. In: Proc. INTERSPEECH 2006, Pittsburgh, PA.
  309. Graciarena, M., Shriberg, E., Stolcke, A., Enos, F., Hirschberg, J., Kajarekar, S. (2006). Combining prosodic, lexical and cepstral systems for deceptive speech detection. In: Proc. ICASSP 2006, Toulouse.
  310. Ekman, P. (1992). Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage. Norton, New York, NY.
  311. Frank, M. G., Ekman, P. (1997). The ability to detect deceit generalizes across different types of high stake lies. J. Personality Social Psychol., 72, 1429-1439.
  312. Mehrabian, A. (1971). Nonverbal betrayal of feeling. J. Exp. Res. Personality, 5, 64-73.
  313. Harrison, A. A., Hwalek, M., Raney, D. F., Fritz, J. G. (1978). Cues to deception in an interview situation. Social Psychol., 41, 156-161.
  314. Baskett, G. D., Freedle, R. O. (1974). Aspects of language pragmatics and the social perception of lying. J. Psycholinguist. Res., 117-130.
  315. Vrij, A., Edward, K., Roberts, K. P., Bull, R. (2000). Detecting deceit via analysis of verbal and nonverbal behavior. J. Nonverbal Behav., 24(4), 239-263.
  316. Gozna, L. F., Babooram, N. (2004). Nontraditional interviews: Deception in a simulated customs baggage search. Paper presented at the 14th European Conference of Psychology and Law, Krakow, Poland, July 7-10.
  317. Ekman, P., Friesen, W. V., Scherer, K. R. (1976). Body movement and voice pitch in deceptive interaction. Semiotica, 16(1), 23-77.
  318. Streeter, L. A., Krauss, R. M., Geller, V., Olson, C., Apple, W. (1977). Pitch changes during attempted deception. J. Personality Social Psychol., 35(5), 345-350.
  319. Benus, S., Enos, F., Hirschberg, J., Shriberg, E. (2006). Pauses in deceptive speech. In: Speech Prosody 2006, Dresden.
  320. Wiener, M., Mehrabian, A. (1968). Language within Language: Immediacy, a Channel in Verbal Communication. Appleton-Century-Crofts, New York, NY.
  321. Zuckerman, M., DePaulo, B. M., Rosenthal, R. (1981). Verbal and Nonverbal Communication of Deception. Academic Press, New York, NY, 1-59.
  322. Zaparniuk, J., Yuille, J. C., Taylor, S. (1995). Assessing the credibility of true and false statements. Int. J. Law Psychiatry, 18, 343-352.
  323. Steller, M., Koehnken, G. (1989). Criteria Based Content Analysis. Springer-Verlag, New York, NY, 217-245.
  324. Masip, J., Sporer, S. L., Garrido, E., Herrero, C. (2005). The detection of deception with the Reality Monitoring approach: A review of the empirical evidence. Psychology, Crime, Law, 11(1), 99-122.
  325. Reid, J. E. and Associates (2000). The Reid Technique of Interviewing and Interrogation. Chicago: John E. Reid and Associates, Inc.
  326. Adams, S. H. (1996). Statement analysis: What do suspects' words really reveal? FBI Law Enforcement Bull., October 1996.
  327. Pennebaker, J. W., Francis, M. E., Booth, R. J. (2001). Linguistic Inquiry and Word Count. Erlbaum Publishers, Mahwah, NJ.
  328. Newman, M. L., Pennebaker, J. W., Berry, D. S., Richards, J. M. (2003). Lying words: Predicting deception from linguistic style. Personality Social Psychol. Bull., 29, 665-675.
  329. Qin, T., Burgoon, J. K., Nunamaker, J. F. (2004). An exploratory study on promising cues in deception detection and application of decision tree. In: Proc. 37th Annual Hawaii Int. Conf. on System Sciences, 23-32. Big Island, Hawaii, USA.
  330. NIST (2004). Fall 2004 Rich Transcription (RT-04F) evaluation plan.
  331. Moore, R. K. (2005). Research challenges in the automation of spoken language interaction. In: Proc. COST278 and ISCA Tutorial and Research Workshop on Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005): Aalborg University, Denmark, 10-11.
  332. Huang, X. D. (2002). Making speech mainstream. Microsoft Speech Technologies Group.
  333. Henton, C. (2002). Fiction and reality of TTS. Speech Technology Magazine, 7 (1).
  334. Moore, R. K. (2003). A comparison of the data requirements of automatic speech recognition systems and human listeners. In: Proc. EUROSPEECH'03, Geneva, Switzerland, September 1-4, 2582-2584.
  335. Gorin, A., Riccardi, G., Wright, J. (1997). How may I help you? Speech Commun., 23, 113-127.
  336. Young, S. J. (2006). Using POMDPs for dialog management. In: Proc. IEEE/ACL Workshop on Spoken Language Technology, Aruba Marriott, Palm Beach, Aruba, December 10-13, 8-13.
  337. Maslow, A. H. (1943). A theory of human motivation. Psychol. Rev., 50, 370-396.
  338. Scherer, K. R., Schorr, A., Johnstone, T. (2001). Appraisal Processes in Emotion: Theory, Methods, Research. Oxford University Press, New York and Oxford.
  339. Broadbent, D. E. (1958). Perception and Communication. Pergamon Press, London.
  340. Toates, F. (2006). A model of the hierarchy of behaviour, cognition and consciousness. Consciousness Cogn., 15, 75-118.
  341. Brunswik, E. (1952). The conceptual framework of psychology. International Encyclopaedia of Unified Science, vol. 1, University of Chicago Press, Chicago.
  342. Figueredo, A. J., Hammond, K. R., McKierman, E. C. (2006). A Brunswikian evolutionary developmental theory of preparedness and plasticity. Intelligence, 34, 211-227.
  343. Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Commun., 40, 227-256.
  344. Rizzolatti, G., Craighero, L. (2004). The mirror-neuron system. Annu. Rev. Neurosci., 27, 169-192.
  345. Powers, W. T. (1973). Behaviour: The Control of Perception. Aldine, Hawthorne, NY.
  346. Wilson, M., Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychol. Bull., 131, 460-473.
  347. Becchio, C., Adenzato, M., Bara, B. G. (2006). How the brain understands intention: Different neural circuits identify the componential features of motor and prior intentions. Consciousness Cogn., 15, 64-74.
  348. Grush, R. (2004). The emulation theory of representation: Motor control, imagery, and perception. Behav. Brain Sci., 27, 377-442.
  349. Hawkins, J. (2004). On Intelligence. Times Books, New York, NY.
350. Alexandrov, Y. I., Sams, M. E. (2005). Emotion and consciousness: Ends of a continuum. Cogn. Brain Res., 25, 387-405.
351. Taylor, M. M. (1992). Strategies for speech recognition and understanding using layered protocols. In: Speech Recognition and Understanding - Recent Advances. NATO ASI Series F75, Springer-Verlag, Berlin, Heidelberg.
  352. Gerdes, V. G. J., Happee, R. (1994). The use of an internal representation in fast goal-directed movements: A modeling approach. Biol. Cybernet., 70, 513-524.
  353. Wilson, S. M., Saygin, A. P., Sereno, M. I., Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nat. Neurosci., 7, 701-702.
  354. Gopnik, A., Meltzoff, A. N., Kuhl, P. K. (2001). The Scientist in the Crib. Perennial, New York, NY.
355. Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nat. Rev. Neurosci., 5, 831-843.
  356. Cowley, S. J. (2004). Simulating others: The basis of human cognition. Lang. Sci., 26, 273-299.
  357. Weber, C., Wermter, S., Elshaw, M. (2006). A hybrid generative and predictive model of the motor cortex. Neural Netw., 19, 339-353.
  358. Mountcastle, V. B. (1978). An organizing principle for cerebral function: The unit model and the distributed system. In: Edelman, G. M., Mountcastle, V. B. (eds) The Mindful Brain, MIT Press, Cambridge, MA.
  359. Hawkins, J., George, D. (2006). Hierarchical Temporal Memory: Concepts, Theory, and Terminology. Numenta Inc., Redwood City, CA.
360. Chartrand, T. L., Bargh, J. A. (1999). The chameleon effect: The perception-behavior link and social interaction. J. Pers. Soc. Psychol., 76, 893-910.
361. Meltzoff, A. N., Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Dev. Parenting, 6, 179-192.
  362. Brass, M., Bekkering, H., Wohlschlager, A., Prinz, W. (2000). Compatibility between observed and executed finger movements: Comparing symbolic, spatial, and imitative cues. Brain Cogn., 44, 124-143.
  363. Kerzel, D., Bekkering, H. (2000). Motor activation from visible speech: Evidence from stimulus response compatibility. J. Exp. Psychol. [Hum. Percept.], 26, 634-647.
  364. Rizzolatti, G., Fadiga, L., Gallese, V., Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Res., 3, 131-141.
365. Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., Mazziotta, J. C., Rizzolatti, G. (2005). Grasping the intentions of others with one's own mirror system. PLoS Biol., 3, 529-535.
  366. Gallese, V., Keysers, C., Rizzolatti, G. (2004). A unifying view of the basis of social cognition. Trends Cogn. Sci., 8(9), 396-403.
  367. Baron-Cohen, S., Leslie, A. M., Frith, U. (1985). Does the autistic child have a "theory of mind"? Cognition, 21, 37-46.
368. Baron-Cohen, S. (1997). Mindblindness: An Essay on Autism and Theory of Mind. MIT Press, Cambridge, MA.
  369. Kohler, E., Keysers, C., Umilta, M. A., Fogassi, L., Gallese, V., Rizzolatti, G. (2002). Hearing sounds, understanding actions: Action representation in mirror neurons. Science, 297, 846-848.
370. Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nat. Rev. Neurosci., 6, 576-582.
  371. Rizzolatti, G., Arbib, M. A. (1998). Language within our grasp. Trends Neurosci., 21, 188-194.
  372. Pacherie, E., Dokic, J. (2006). From mirror neurons to joint actions. Cogn. Syst. Res., 7, 101-112.
373. Studdert-Kennedy, M. (2002). Mirror neurons, vocal imitation, and the evolution of particulate speech. In: Stamenov, M. I., Gallese, V. (eds) Mirror Neurons and the Evolution of Brain and Language. Benjamins, Philadelphia, 207-227.
374. Arbib, M. A. (2005). From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behav. Brain Sci., 28, 105-167.
  375. Aboitiz, F., Garcia, R. R., Bosman, C., Brunetti, E. (2006). Cortical memory mechanisms and language origins. Brain Lang., 40-56.
  376. Newell, A. (1990). Unified Theories of Cognition. Harvard University Press, Cambridge, MA.
  377. Rosenbloom, P. S., Laird, J. E., Newell, A. (1993). The SOAR Papers: Research on Integrated Intelligence. MIT Press, Cambridge, MA.
  378. Anderson, J. R. (1996). ACT: A simple theory of complex cognition. American Psychol., 51(4), 355-365.
  379. Bratman, M. E. (1987). Intention, Plans, and Practical Reason, Harvard University Press, Cambridge, MA.
380. Rao, A., Georgeff, M. (1995). BDI agents: From theory to practice. Technical Report TR-56. Australian Artificial Intelligence Institute, Melbourne.
381. Winograd, T. (2006). Shifting viewpoints: Artificial intelligence and human-computer interaction. Artif. Intell., 170, 1256-1258.
  382. Brooks, R. A. (1991). Intelligence without representation. Artif. Intell., 47, 139-159.
  383. Brooks, R. A. (1991). Intelligence without reason. In: Proc. 12th Int. Joint Conf. on Artificial Intelligence, Sydney, Australia, 569-595.
  384. Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE J. Rob. Autom. 2, 4-23.
  385. Prescott, T. J., Redgrave, P., Gurney, K. (1999). Layered control architectures in robots and vertebrates. Adaptive Behav., 7, 99-127.
386. Roy, D., Reiter, E. (2005). Connecting language to the world. Artif. Intell., 167, 1-12.
  387. Roy, D. K., Pentland, A. P. (2002). Learning words from sights and sounds: A computational model. Cogn. Sci., 26, 113-146.
  388. Roy, D. (2005). Semiotic schemas: A framework for grounding language in action and perception. Artif. Intell., 167, 170-205.
  389. Wang, Y. (2003). Cognitive informatics: A new transdisciplinary research field. Brain Mind, 4, 115-127.
  390. Wang, Y. (2003). On cognitive informatics. Brain Mind, 4, 151-167.
  391. Moore, R. K. (2005). Cognitive informatics: The future of spoken language processing? In: Proc. SPECOM -10th Int. Conf. on Speech and Computer, Patras, Greece, October 17-19.
392. Moore, R. K. (2007). Spoken language processing: Piecing together the puzzle. Speech Commun., 49, 418-435.
  393. Moore, R. K. (2005). Towards a unified theory of spoken language processing. In: Proc. 4th IEEE Int. Conf. on Cognitive Informatics, Irvine, CA, USA, 8-10 August, 167-172.
  394. The Japan Science & Technology Agency. (2000-2005). Core Research for Evolutional Science & Technology.
395. Campbell, N. (2007). On the use of nonverbal speech sounds in human communication. In: Verbal and Nonverbal Communication Behaviors. LNAI Vol. 4775, Springer, Berlin, Heidelberg, 117-128.
  396. Campbell, N., Mokhtari, P. (2003). Voice quality is the 4th prosodic parameter. In: Proc. 15th ICPhS, Barcelona, 203-206.
397. Alku, P., Bäckström, T., Vilkman, E. (2002). Normalized amplitude quotient for parametrization of the glottal flow. J. Acoust. Soc. Am., 112(2), 701-710.
  398. Hanson, H. M. (1995). Glottal characteristics of female speakers. Ph.D. dissertation, Harvard University.
399. Cahn, J. (1989). The generation of affect in synthesised speech. J. Am. Voice I/O Soc., 8, 251-256.
SSML, The Speech Synthesis Markup Language, www.w3.org/TR/speech-synthesis/
  400. Campbell, N. (2005). Getting to the heart of the matter; speech as expression of affect rather than just text or language, Lang. Res. Eval., 39 (1), 109-118.
  401. Calzolari, N. (2006). Introduction of the Conference Chair. In: Proc. 5th Int. Conf. on Language Resources and Evaluation, Genoa, I-IV.
  402. ICSI meeting corpus web page, http://www.icsi.berkeley.edu/speech/mr. As of May 2010.
403. AMI: Augmented Multi-party Interaction (http://www.amiproject.org). As of May 2010.
  404. Noma, T., Zhao, L., Badler, N. I. (2000). Design of a virtual human presenter. IEEE Comput. Graphics Appl., 20, 79-85.
405. André, E., Rist, T., Müller, J. (1999). Employing AI methods to control the behavior of animated interface agents. Appl. Artif. Intell., 13, 415-448.
  406. André, E., Concepcion, K., Mani, I., van Guilder, L. (2005). Autobriefer: A system for authoring narrated briefings. In: Stock, O., Zancanaro, M., (eds) Multimodal Intelligent Information Presentation. Springer, Berlin, 143-158.
  407. Weizenbaum, J. (1967). Contextual understanding by computers. Commun. ACM, 10, 474-480.
  408. Gustafson, J., Lindberg, N., Lundeberg, M. (1999). The August spoken dialog system. In: Proc. Eurospeech'99, Budapest, Hungary.
  409. Cassell, J., Nakano, Y. I., Bickmore, T. W., Sidner, C. L., Rich, C. (2001). Non-verbal cues for discourse structure. ACL, 106-115.
410. Pelachaud, C., Carofiglio, V., Carolis, B. D., de Rosis, F., Poggi, I. (2002). Embodied contextual agent in information delivering application. In: AAMAS '02: Proc. 1st Int. Joint Conf. on Autonomous Agents and Multiagent Systems, ACM Press, New York, NY, 758-765.
411. Kopp, S., Jung, B., Leßmann, N., Wachsmuth, I. (2003). Max - A multimodal assistant in virtual reality construction. Künstliche Intelligenz, 4(3), 11-17.
  412. Wahlster, W. (2003). Towards symmetric multimodality: Fusion and fission of speech, gesture, facial expression. KI, 1-18.
  413. André, E., Rist, T., van Mulken, S., Klesen, M., Baldes, S. (2000). The automated design of believable dialogues for animated presentation teams. In: Cassell, J., Prevost, S., Sullivan, J., Churchill, E. (eds) Embodied Conversational Agents. MIT Press, Cambridge, MA, 220-255.
  414. Prendinger, H., Ishizuka, M. (2001). Social role awareness in animated agents. In: AGENTS '01: Proc. 5th Int. Conf. on Autonomous Agents, ACM Press, New York, NY, 270-277.
415. Pynadath, D. V., Marsella, S. (2005). PsychSim: Modeling theory of mind with decision-theoretic agents. IJCAI, 1181-1186.
416. Rehm, M., André, E., Nischt, M. (2005). Let's come together - Social navigation behaviors of virtual and real humans. INTETAIN, 124-133.
  417. Traum, D., Rickel, J. (2002). Embodied agents for multi-party dialogue in immersive virtual worlds. In: AAMAS '02: Proc. 1st Int. Joint Conf. on Autonomous Agents and Multiagent Systems, ACM Press, New York, NY, 766-773.
  418. Rickel, J., Johnson, W. L. (1999). Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Appl. Artif. Intell., 13, 343-382.
419. Gebhard, P., Kipp, M., Klesen, M., Rist, T. (2003). Authoring scenes for adaptive, interactive performances. In: AAMAS '03: Proc. 2nd Int. Joint Conf. on Autonomous Agents and Multiagent Systems, ACM Press, New York, NY, 725-732.
  420. Laurel, B. (1993). Computers as Theatre. Addison Wesley, Boston, MA, USA.
  421. Paiva, A., Dias, J., Sobral, D., Aylett, R., Sobreperez, P., Woods, S., Zoll, C., Hall, L. (2004). Caring for agents and agents that care: Building empathic relations with synthetic agents. In: AAMAS '04: Proc. 3rd Int. Joint Conf. on Autonomous Agents and Multiagent Systems, IEEE Computer Society, Washington, DC, USA, 194-201.
  422. Isbister, K., Nakanishi, H., Ishida, T., Nass, C. (2000). Helper agent: Designing an assistant for human-human interaction in a virtual meeting space. In: CHI '00: Proc. SIGCHI Conf. on Human Factors in Computing Systems, ACM Press, New York, NY, 57-64.
423. Rist, T., André, E., Baldes, S. (2003). A flexible platform for building applications with life-like characters. In: IUI '03: Proc. 8th Int. Conf. on Intelligent User Interfaces, ACM Press, New York, NY, 158-168.
  424. Cassell, J., Vilhjálmsson, H. H., Bickmore, T. W. (2001). BEAT: the Behavior Expression Animation Toolkit. SIGGRAPH, 477-486.
  425. Larsson, S., Traum, D. R. (2000). Information state and dialogue management in the TRINDI dialogue move engine toolkit. Nat. Lang. Eng., 6, 323-340.
426. Rich, C., Sidner, C. (1998). Collagen - A collaboration manager for software interface agents. User Model. User-Adapted Interact., 8, 315-350.
  427. Rickel, J., Lesh, N., Rich, C., Sidner, C. L., Gertner, A. S. (2002). Collaborative discourse theory as a foundation for tutorial dialogue. Intell. Tutoring Syst., 542-551.
  428. Sidner, C. L., Lee, C., Kidd, C. D., Lesh, N., Rich, C. (2005). Explorations in engagement for humans and robots. Artif. Intell., 166, 140-164.
  429. Jan, D., Traum, D. R. (2005). Dialog simulation for background characters. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 65-74.
  430. Bales, R. F. (1951). Interaction Process Analysis. Chicago University Press, Chicago.
  431. Guye-Vuillième, A., Thalmann, D. (2001). A high level architecture for believable social agents. Virtual Reality J., 5, 95-106.
  432. Prada, R., Paiva, A. (2005). Intelligent virtual agents in collaborative scenarios. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 317-328.
433. Poggi, I. (2003). Mind markers. In: Rector, M., Poggi, I., Trigo, N. (eds) Gestures. Meaning and Use. University Fernando Pessoa Press, Oporto, Portugal.
434. Chovil, N. (1991). Social determinants of facial displays. J. Nonverbal Behav., 15, 141-154.
435. Condon, W., Ogston, W. (1971). Speech and body motion synchrony of the speaker-hearer. In: Horton, D., Jenkins, J. (eds) The Perception of Language. Academic Press, New York, NY, 150-184.
  436. Kendon, A. (1974). Movement coordination in social interaction: Some examples described. In: Weitz, S. (ed) Nonverbal Communication. Oxford University Press, Oxford.
  437. Scheflen, A. (1964). The significance of posture in communication systems. Psychiatry, 27, 316-331.
  438. Ekman, P. (1979). About brows: Emotional and conversational signals. In: von Cranach, M., Foppa, K., Lepenies, W., Ploog, D. (eds) Human Ethology: Claims and Limits of a New Discipline: Contributions to the Colloquium. Cambridge University Press, Cambridge, England; New York, 169-248.
  439. Cavé, C., Guaitella, I., Bertrand, R., Santi, S., Harlay, F., Espesser, R. (1996). About the relationship between eyebrow movements and f0-variations. In: Proc. ICSLP'96: 4th Int. Conf. on Spoken Language Processing, Philadelphia, PA.
440. Krahmer, E., Swerts, M. (2004). More about brows. In: Ruttkay, Z., Pelachaud, C. (eds) From Brows to Trust: Evaluating Embodied Conversational Agents. Kluwer, Dordrecht.
  441. McNeill, D. (1992) Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago.
  442. Knapp, M., Hall, J. (1997). Nonverbal Communication in Human Interaction, Fourth edition. Harcourt Brace, Fort Worth, TX.
  443. Pelachaud, C., Bilvi, M. (2003). Computational model of believable conversational agents. In: Huget, M. P. (ed) Communication in Multiagent Systems. Volume 2650 of Lecture Notes in Computer Science. Springer, Berlin, 300-317.
  444. Pelachaud, C. (2005). Multimodal Expressive Embodied Conversational Agent. ACM Multimedia, Brave New Topics session, Singapore.
445. DeCarolis, B., Pelachaud, C., Poggi, I., Steedman, M. (2004). APML, a mark-up language for believable behavior generation. In: Prendinger, H., Ishizuka, M. (eds) Life-Like Characters. Tools, Affective Functions and Applications. Springer, Berlin, 65-85.
446. Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjálmsson, H., Yan, H. (1999). Embodiment in conversational interfaces: Rea. CHI'99, Pittsburgh, PA, 520-527.
  447. Kopp, S., Wachsmuth, I. (2004). Synthesizing multimodal utterances for conversational agents. J. Comput. Anim. Virtual Worlds, 15, 39-52.
448. Kopp, S., Gesellensetter, L., Krämer, N. C., Wachsmuth, I. (2005). A conversational agent as museum guide - Design and evaluation of a real-world application. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 329-343.
  449. Heylen, D. (2005). Challenges ahead. Head movements and other social acts in conversa- tion. In: AISB -Social Presence Cues Symposium. University of Hertfordshire, Hatfield, England.
  450. Ortony, A., Clore, G., Collins, A. (1988). The Cognitive Structure of Emotions. Cambridge University Press, Cambridge.
  451. Scherer, K. (2000). Emotion. In: Hewstone, M., Stroebe, W. (eds) Introduction to Social Psychology: A European Perspective. Oxford University Press, Oxford, 151-191.
  452. Ekman, P. (2003). The Face Revealed. Weidenfeld & Nicolson, London.
  453. DeCarolis, B., Carofiglio, V., Bilvi, M., Pelachaud, C. (2002). APML, a mark-up language for believable behavior generation. In: Embodied Conversational Agents -Let's Specify and Evaluate Them! Proc. AAMAS'02 Workshop, Bologna, Italy.
454. Ball, G., Breese, J. (2000). Emotion and personality in a conversational agent. In: Cassell, J., Prevost, S., Sullivan, J., Churchill, E. (eds) Embodied Conversational Agents. MIT Press, Cambridge, MA, 189-219.
  455. Tanguy, E., Bryson, J. J., Willis, P. J. (2006). A dynamic emotion representation model within a facial animation system. Int. J. Humanoid Robotics, 3, 293-300.
  456. Pandzic, I., Forchheimer, R. (2002). MPEG4 Facial Animation -The Standard, Implementations and Applications. Wiley, New York, NY.
457. de Rosis, F., Pelachaud, C., Poggi, I., Carofiglio, V., Carolis, B. D. (2003). From Greta's mind to her face: Modelling the dynamics of affective states in a conversational embodied agent. Int. J. Hum. Comput. Studies, Special Issue on Applications of Affective Computing in HCI, 59, 81-118.
  458. Bui, T. D. (2004). Creating emotions and facial expressions for embodied agents. PhD thesis, University of Twente, Department of Computer Science, Enschede.
459. Tsapatsoulis, N., Raouzaiou, A., Kollias, S., Cowie, R., Douglas-Cowie, E. (2002). Emotion recognition and synthesis based on MPEG-4 FAPs in MPEG-4 facial animation. In: Pandzic, I. S., Forchheimer, R. (eds) MPEG4 Facial Animation - The Standard, Implementations and Applications. Wiley, New York, NY.
460. Albrecht, I., Schröder, M., Haber, J., Seidel, H. P. (2005). Mixed feelings - Expression of nonbasic emotions in a muscle-based talking head. Virtual Reality - Special Issue on Language, Speech and Gesture for VR, 8(4).
461. Whissell, C. M. (1989). The dictionary of affect in language. In: Plutchik, R., Kellerman, H. (eds) Emotion: Theory, Research and Experience, Vol. 4: The Measurement of Emotions. Academic Press, New York.
462. Plutchik, R. (1980). Emotion: A Psychoevolutionary Synthesis. Harper and Row, New York, NY.
463. Ruttkay, Z., Noot, H., ten Hagen, P. (2003). Emotion disc and emotion squares: Tools to explore the facial expression space. Comput. Graph. Forum, 22, 49-53.
  464. Schlosberg, H. A. (1952). A description of facial expressions in terms of two dimensions. J. Exp. Psychol., 44, 229-237.
  465. Ekman, P., Friesen, W. (1975). Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues. Prentice-Hall, Inc, Englewood Cliffs, NJ.
466. Rehm, M., André, E. (2005). Catch me if you can: Exploring lying agents in social settings. In: Proc. 4th Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS), Utrecht, Netherlands, ACM, New York, NY, 937-944.
  467. Ochs, M., Niewiadomski, R., Pelachaud, C., Sadek, D. (2005). Intelligent expressions of emotions. In: 1st Int. Conf. on Affective Computing and Intelligent Interaction ACII, China.
  468. Martin, J. C., Niewiadomski, R., Devillers, L., Buisine, S., Pelachaud, C. (2006). Multimodal complex emotions: Gesture expressivity and blended facial expressions. Int. J. Humanoid Robotics. Special issue on "Achieving Human-Like Qualities in Interactive Virtual and Physical Humanoids", 3(3).
469. Wehrle, T., Kaiser, S., Schmidt, S., Scherer, K. R. (2000). Studying the dynamics of emotional expression using synthesized facial muscle movements. J. Pers. Soc. Psychol., 78, 105-119.
470. Kaiser, S., Wehrle, T. (2006). Modeling appraisal theory of emotion and facial expression. In: Magnenat-Thalmann, N. (ed) Proc. 19th Int. Conf. on Computer Animation and Social Agents, CASA 2006, Geneva, Computer Graphics Society (CGS).
  471. Wehrle, T. (1996). The Geneva Appraisal Manipulation Environment (GAME). University of Geneva, Switzerland. Unpublished computer software edn.
  472. Perlin, K., Goldberg, A. (1996). Improv: A system for interactive actors in virtual worlds. In: Computer Graphics Proc., Annual Conference Series, ACM SIGGRAPH, New Orleans, Lousiana, USA, 205-216.
  473. Bruderlin, A., Williams, L. (1995). Motion signal processing. In: Proc. 22nd Annual Conf. on Computer Graphics and Interactive Techniques, ACM Press, New York, NY, 97-104.
  474. Chi, D. M., Costa, M., Zhao, L., Badler, N. I. (2000). The EMOTE model for effort and shape. In: Akeley, K. (ed) Siggraph 2000, Computer Graphics Proc., ACM Press/ACM SIGGRAPH/Addison Wesley Longman, 173-182.
  475. Laban, R., Lawrence, F. (1974). Effort: Economy in Body Movement. Plays, Inc., Boston.
  476. Wallbott, H. G., Scherer, K. R. (1986). Cues and channels in emotion recognition. J. Pers. Soc. Psychol., 51, 690-699.
  477. Gallaher, P. E. (1992). Individual differences in nonverbal behavior: Dimensions of style. J. Pers. Soc. Psychol., 63, 133-145.
478. Hartmann, B., Mancini, M., Pelachaud, C. (2005). Implementing expressive gesture synthesis for embodied conversational agents. In: Gesture Workshop, Vannes.
  479. Egges, A., Magnenat-Thalmann, N. (2005). Emotional communicative body animation for multiple characters. In: V-Crowds'05, Lausanne, Switzerland, 31-40.
  480. Stocky, T., Cassell, J. (2002). Shared reality: Spatial intelligence in intuitive user interfaces. In: IUI '02: Proc. 7th Int. Conf. on Intelligent User Interfaces, ACM Press, New York, NY, 224-225.
481. Chopra-Khullar, S., Badler, N. I. (2001). Where to look? Automating attending behaviors of virtual human characters. Autonomous Agents Multi-Agent Syst., 4, 9-23.
482. Nakano, Y. I., Reinstein, G., Stocky, T., Cassell, J. (2003). Towards a model of face-to-face grounding. In: Proc. 41st Annual Meeting on Association for Computational Linguistics (ACL'03), Sapporo, Japan, 553-561.
  483. Peters, C. (2005). Direction of attention perception for conversation initiation in virtual environments. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 215-228.
  484. Baron-Cohen, S. (1994). How to build a baby that can read minds: Cognitive Mechanisms in Mind-Reading. Cah. Psychol. Cogn., 13, 513-552.
485. Batliner, A., Huber, R., Niemann, H., Nöth, E., Spilker, J., Fischer, K. (2005). The recognition of emotion. In: Wahlster, W. (ed) Verbmobil: Foundations of Speech-to-Speech Translation. Springer, Berlin, 122-130.
  486. Maatman, R. M., Gratch, J., Marsella, S. (2005). Natural behavior of a listening agent. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 25-36.
487. Bickmore, T., Cassell, J. (2005). Social dialogue with embodied conversational agents. In: van Kuppevelt, J., Dybkjaer, L., Bernsen, N. O. (eds) Advances in Natural, Multimodal Dialogue Systems. Springer, Berlin.
  488. Brown, P., Levinson, S. C. (1987). Politeness -Some Universals in Language Usage. Cambridge University Press, Cambridge.
489. Walker, M. A., Cahn, J. E., Whittaker, S. J. (1997). Improvising linguistic style: Social and affective bases for agents. In: Proc. 1st Int. Conf. on Autonomous Agents, Marina del Rey, CA, USA, ACM, New York, NY, 96-105.
490. Johnson, W. L., Rizzo, P., Bosma, W., Kole, S., Ghijsen, M., van Welbergen, H. (2004). Generating socially appropriate tutorial dialog. In: Affective Dialogue Systems, Tutorial and Research Workshop, ADS 2004, Kloster Irsee, Germany, June 14-16, 2004. Springer, Lecture Notes in Computer Science, Vol. 3068, 254-264.
  491. Johnson, L., Mayer, R., André, E., Rehm, M. (2005). Cross-cultural evaluation of politeness in tactics for pedagogical agents. In: Proc. of the 12th Int. Conf. on Artificial Intelligence in Education (AIED), Amsterdam, Netherlands.
492. Rehm, M., André, E. (2006). Informing the design of embodied conversational agents by analysing multimodal politeness behaviours in human-human communication. In: Nishida, T. (ed) Engineering Approaches to Conversational Informatics. Wiley, Chichester, UK.
  493. Cassell, J. (2006). Body language: Lessons from the near-human. In: Riskin, J. (ed) The Sistine Gap: History and Philosophy of Artificial Intelligence. University of Chicago, Chicago.
  494. Martin, J. C., Abrilian, S., Devillers, L., Lamolle, M., Mancini, M., Pelachaud, C. (2005). Levels of representation in the annotation of emotion for the specification of expressivity in ECAs. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 405-417.
  495. Kipp, M. (2005). Gesture generation by imitation: from human behavior to computer character animation. Dissertation.com, Boca Raton, FL.
496. Stone, M., DeCarlo, D., Oh, I., Rodriguez, C., Stere, A., Lees, A., Bregler, C. (2004). Speaking with hands: Creating animated conversational characters from recordings of human performance. ACM Trans. Graph., 23, 506-513.
497. Buisine, S., Abrilian, S., Niewiadomski, R., Martin, J. C., Devillers, L., Pelachaud, C. (2006). Perception of blended emotions: From video corpus to expressive agent. In: The 6th Int. Conf. on Intelligent Virtual Agents, Marina del Rey, USA.
498. Ruttkay, Z., Pelachaud, C. (2004). From Brows to Trust: Evaluating Embodied Conversational Agents (Human-Computer Interaction Series). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
  499. Buisine, S., Abrilian, S., Martin, J. C. (2004). Evaluation of multimodal behaviour of embod- ied agents. In: Ruttkay, Z., Pelachaud, C. (eds) From Brows to Trust: Evaluating Embodied Conversational Agents. Kluwer, Norwell, MA, 217-238.
  500. Lee, K. M., Nass, C. (2003). Designing social presence of social actors in human computer interaction. In: CHI '03: Proc. SIGCHI Conf. on Human Factors in Computing Systems, ACM Press, New York, NY, 289-296.
  501. Nass, C., Gong, L. (2000). Speech interfaces from an evolutionary perspective. Commun. ACM, 43, 36-43.
  502. Vinayagamoorthy, V., Garau, M., Steed, A., Slater, M. (2004). An eye gaze model for dyadic interaction in an immersive virtual environment: Practice and experience. Comput. Graph. Forum, 23, 1-12.
  503. Lee, S. P., Badler, J. B., Badler, N. I. (2002). Eyes alive. In: SIGGRAPH '02: Proc. 29th Annual Conf. on Computer Graphics and Interactive Techniques, ACM Press, New York, NY, 637-644.
  504. Rehm, M., André, E. (2005). Where do they look? Gaze behaviors of multiple users inter- acting with an embodied conversational agent. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 241-252.
  505. Cowell, A. J., Stanney, K. M. (2003) Embodiment and interaction guidelines for designing credible, trustworthy embodied conversational agents. In: Int. Conf. on Intelligent Virtual Agents, Kos, Greece, 301-309.
506. Picard, R. W. (1997). Affective Computing. MIT Press, Cambridge, MA.
507. James, W. (1884). What is an emotion? Mind, vol. 9(34), 188-205.
508. Oatley, K. (1987). Cognitive science and the understanding of emotions. Cogn. Emotion, 3(1), 209-216.
509. Bigun, E. S., Bigun, J., Duc, B., Fischer, S. (1997). Expert conciliation for multimodal person authentication systems using Bayesian statistics. In: Int. Conf. on Audio and Video-Based Biometric Person Authentication (AVBPA), Crans-Montana, Switzerland, 291-300.
510. Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychol. Bull., vol. 99(2), 143-165.
  511. Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Commun., 40, 227-256.
  512. Scherer, K. R., Banse, R., Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. J. Cross-Cultural Psychol., 32 (1), 76-92.
  513. Johnstone, T., van Reekum, C. M., Scherer, K. R. (2001). Vocal correlates of appraisal processes. In: Scherer, K. R., Schorr, A., Johnstone, T. (eds) Appraisal Processes in Emotion: Theory, Methods, Research. Oxford University Press, New York and Oxford, 271-284.
514. Petrushin, V. A. (2000). Emotion recognition in speech signal: Experimental study, development and application. In: 6th Int. Conf. on Spoken Language Processing, ICSLP2000, Beijing, 222-225.
  515. Gobl, C., Chasaide, A. N. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Commun., 40(1-2), 189-212.
  516. Tato, R., Santos, R., Kompe, R., Pardo, J. M. (2002). Emotional space improves emotion recognition. In: ICSLP2002, Denver, CO, 2029-2032.
  517. Dellaert, F., Polzin, T., Waibel, A. (1996). Recognizing emotion in speech. In: ICSLP 1996, Philadelphia, PA, 1970-1973.
  518. Lee, C. M., Narayanan, S., Pieraccini, R. (2001). Recognition of negative emotion in the human speech signals. In: Workshop on Automatic Speech Recognition and Understanding.
  519. Yu, F., Chang, E., Xu, Y. Q., Shum H. Y. (2001). Emotion detection from speech to enrich multimedia content. In: The 2nd IEEE Pacific-Rim Conf. on Multimedia, Beijing, China, 550-557.
520. Campbell, N. (2004). Perception of affect in speech - towards an automatic processing of paralinguistic information in spoken conversation. In: ICSLP2004, Jeju, 881-884.
  521. Cahn, J. E. (1990). The generation of affect in synthesized speech. J. Am. Voice I/O Soc., vol. 8, 1-19.
522. Schröder, M. (2001). Emotional speech synthesis: A review. In: Eurospeech 2001, Aalborg, Denmark, 561-564.
523. Campbell, N. (2004). Synthesis units for conversational speech - using phrasal segments. In: Proc. Autumn Meeting of the Acoustical Society of Japan, 337-338.
524. Schröder, M., Breuer, S. (2004). XML representation languages as a way of interconnecting TTS modules. In: 8th Int. Conf. on Spoken Language Processing, ICSLP'04, Jeju, Korea.
  525. Eide, E., Aaron, A., Bakis, R., Hamza, W., Picheny, M., Pitrelli, J. (2002). A corpus-based approach to <ahem/> expressive speech synthesis. In: IEEE Speech Synthesis Workshop, Santa Monica, 79-84.
526. Chuang, Z. J., Wu, C. H. (2002). Emotion recognition from textual input using an emotional semantic network. In: Int. Conf. on Spoken Language Processing, ICSLP 2002, Denver, 177-180.
527. Tao, J. (2003). Emotion control of Chinese speech synthesis in natural environment. In: Eurospeech2003, Geneva.
  528. Moriyama, T., Ozawa, S. (1999). Emotion recognition and synthesis system on speech. In: IEEE Int. Conf. on Multimedia Computing and Systems, Florence, Italy, 840-844.
  529. Massaro, D. W., Beskow, J., Cohen, M. M., Fry, C. L., Rodriguez, T. (1999). Picture my voice: Audio to visual speech synthesis using artificial neural networks. In: AVSP'99, Santa Cruz, CA, 133-138.
  530. Darwin, C. (1872). The Expression of the Emotions in Man and Animals. University of Chicago Press, Chicago.
  531. Etcoff, N. L., Magee, J. J. (1992). Categorical perception of facial expressions. Cognition, vol. 44, 227-240.
  532. Ekman, P., Friesen, W. V. (1997). Manual for the Facial Action Coding System. Consulting Psychologists Press, Palo Alto, CA.
  533. Yamamoto, E., Nakamura, S., Shikano, K. (1998). Lip movement synthesis from speech based on Hidden Markov Models. Speech Commun., vol. 26, 105-115.
  534. Tekalp, A. M., Ostermann, J. (2000). Face and 2-D mesh animation in MPEG-4. Signal Process.: Image Commun., vol. 15, 387-421.
535. Lyons, M. J., Akamatsu, S., Kamachi, M., Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In: 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, 200-205.
536. Calder, A. J., Burton, A. M., Miller, P., Young, A. W., Akamatsu, S. (2001). A principal component analysis of facial expression. Vis. Res., vol. 41, 1179-1208.
  537. Kobayashi, H., Hara, F. (1992). Recognition of six basic facial expressions and their strength by neural network. In: Intl. Workshop on Robotics and Human Communications, New York, 381-386.
  538. Bregler, C., Covell, M., Slaney, M. (1997). Video rewrite: Driving visual speech with audio. In: ACM SIGGRAPH'97, Los Angeles, CA, 353-360.
  539. Cosatto, E., Potamianos, G., Graf, H. P. (2000). Audio-visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE Int. Conf. on Multimedia and Expo, New York, 619-622.
  540. Ezzat, T., Poggio, T. (1998). MikeTalk: A talking facial display based on morphing visemes. In: Computer Animation Conf., Philadelphia, PA, 456-459.
  541. Gutierrez-Osuna, R., Rundomin, J. L. (2005). Speech-driven facial animation with realistic dynamics. IEEE Trans. Multimedia, vol. 7, 33-42.
  542. Hong, P. Y., Wen, Z., Huang, T. S. (2002). Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Netw., vol. 13, 916-927.
  543. Verma, A., Subramaniam, L. V., Rajput, N., Neti, C., Faruquie, T. A. (2004). Animating expressive faces across languages. IEEE Trans Multimedia, vol. 6, 791-800.
544. Collier, G. (1985). Emotional Expression. Lawrence Erlbaum Associates, Hillsdale, NJ. http://faculty.uccb.ns.ca/~gcollier/
  545. Argyle, M. (1988). Bodily Communication. Methuen & Co, New York, NY.
  546. Siegman, A. W., Feldstein, S. (1985). Multichannel Integrations of Nonverbal Behavior, Lawrence Erlbaum Associates, Hillsdale, NJ.
  547. Feldman, R. S., Philippot, P., Custrini, R. J. (1991). Social competence and nonverbal behav- ior. In: Rimé, R. S. F. B. (ed) Fundamentals of Nonverbal Behavior. Cambridge University Press, Cambridge, 329-350.
  548. Knapp, M. L., Hall, J. A. (2006). Nonverbal Communication in Human Interaction, 6th edn. Thomson Wadsworth, Belmont, CA.
  549. Go, H. J., Kwak, K. C., Lee, D. J., Chun, M. G. (2003). Emotion recognition from facial image and speech signal. In: Int. Conf. Society of Instrument and Control Engineers, Fukui, Japan, 2890-2895.
550. Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M. et al. (2004). Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Int. Conf. on Multimodal Interfaces, State College, PA, 205-211.
551. Song, M., Bu, J., Chen, C., Li, N. (2004). Audio-visual based emotion recognition - A new approach. In: Int. Conf. on Computer Vision and Pattern Recognition, Washington, DC, USA, 1020-1025.
  552. Zeng, Z., Tu, J., Liu, M., Zhang, T., Rizzolo, N., Zhang, Z., Huang, T. S., Roth, D., Levinson, S. (2004). Bimodal HCI-related emotion recognition. In: Int. Conf. on Multimodal Interfaces, State College, PA, 137-143.
553. Zeng, Z., Tu, J., Pianfetti, B., Huang, T. S. (2008). Audio-visual affective expression recognition through multi-stream fused HMM. IEEE Trans. Multimedia, vol. 10(4), 570-577.
  554. Zeng, Z., Tu, J., Liu, M., Huang, T. S., Pianfetti, B., Roth D., Levinson, S. (2007). Audio- visual affect recognition. IEEE Trans. Multimedia, 9 (2), 424-428.
  555. Wang, Y., Guan, L. (2005). Recognizing human emotion from audiovisual information. In: ICASSP, Philadelphia, PA, Vol. II, 1125-1128.
  556. Hoch, S., Althoff, F., McGlaun, G., Rigoll, G. (2005). Bimodal fusion of emotional data in an automotive environment. In: ICASSP, Philadelphia, PA, Vol. II, 1085-1088.
  557. Fragopanagos, F., Taylor, J. G. (2005). Emotion recognition in human-computer interaction. Neural Netw., 18, 389-405.
  558. Pal, P., Iyer, A. N., Yantorno, R. E. (2006). Emotion detection from infant facial expressions and cries. In: Proc. Int'l Conf. on Acoustics, Speech & Signal Processing, Philadelphia, PA, 2, 721-724.
  559. Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Paouzaiou, A., Karpouzis, K. (2006). Modeling naturalistic affective states via facial and vocal expression recognition. In: Int. Conf. on Multimodal Interfaces, Banff, Alberta, Canada, 146-154.
  560. Karpouzis, K., Caridakis, G., Kessous, L., Amir, N., Raouzaiou, A., Malatesta, L., Kollias, S. (2007). Modeling naturalistic affective states via facial, vocal, and bodily expression recognition. In: Lecture Notes in Artificial Intelligence, vol. 4451, 91-112.
  561. Chen, C. Y., Huang, Y. K., Cook, P. (2005). Visual/Acoustic emotion recognition. In: Proc. Int. Conf. on Multimedia and Expo, Amsterdam, Netherlands, 1468-1471.
  562. Picard, R. W. (2003). Affective computing: Challenges. Int. J. Hum. Comput. Studies, vol. 59, 55-64.
  563. Ortony, A., Clore, G. L., Collins, A. (1990). The Cognitive Structure of Emotions. Cambridge University Press, Cambridge.
  564. Carberry, S., de Rosis, F. (2008). Introduction to the Special Issue of UMUAI on 'Affective Modeling and Adaptation', International Journal of User Modeling and User-Adapted Interaction, vol. 18, 1-9.
  565. Esposito, A., Balodis, G., Ferreira, A., Cristea, G. (2006). Cross-Modal Analysis of Verbal and Non-verbal Communication. Proposal for a COST Action.
566. Yin, P. R., Tao, J. H. (2005). Dynamic mapping method based speech driven face animation system. In: The 1st Int. Conf. on Affective Computing and Intelligent Interaction (ACII2005), Beijing, 755-763.
567. O'Brien, J. F., Bodenheimer, B., Brostow, G., Hodgins, J. (2000). Automatic joint parameter estimation from magnetic motion capture data. In: Graphics Interface 2000, Montreal, Canada, 53-60.
  568. Aggarwal, J. K., Cai, Q. (1999). Human motion analysis: A review. Comput. Vision Image Understand., vol. 73(3), 428-440.
  569. Gavrila, D. M. (1999). The visual analysis of human movement: A survey. Comput. Vision Image Understand., vol. 73(1), 82-98.
  570. Azarbayejani, A., Wren, C., Pentland, A. (1996). Real-time 3-D tracking of the human body. In: IMAGE'COM 96, Bordeaux, France.
  571. Camurri, A., Poli, G. D., Leman, M., Volpe, G. (2001). A multi-layered conceptual framework for expressive gesture applications. In: Intl. EU-TMR MOSART Workshop, Barcelona.
  572. Cowie, R. (2001). Emotion recognition in human-computer interaction. IEEE Signal Process. Mag., vol. 18(1), 32-80.
573. Brunelli, R., Falavigna, D. (1995). Person identification using multiple cues. IEEE Trans. Pattern Anal. Mach. Intell., vol. 17(10), 955-966.
  574. Kumar, A., Wong, D. C., Shen, H. C., Jain, A. K. (2003). Personal verification using palmprint and hand geometry biometric. In: 4th Int. Conf. on Audio-and Video-based Biometric Person Authentication, Guildford, UK, 668-678.
575. Frischholz, R. W., Dieckmann, U. (2000). BioID: A multimodal biometric identification system. IEEE Comput., vol. 33(2), 64-68.
  576. Jain, A. K., Ross, A. (2002). Learning user-specific parameters in a multibiometric system. In: Int. Conf. on Image Processing (ICIP), Rochester, New York, 57-60.
577. Ho, T. K., Hull, J. J., Srihari, S. N. (1994). Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell., vol. 16(1), 66-75.
578. Kittler, J., Hatef, M., Duin, R. P. W., Matas, J. (1998). On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell., vol. 20(3), 226-239.
579. Dieckmann, U., Plankensteiner, P., Wagner, T. (1997). Sesam: A biometric person identification system using sensor fusion. Pattern Recognit. Lett., vol. 18, 827-833.
580. De Silva, L. C., Miyasato, T., Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In: Proc. Int. Conf. on Information, Communications and Signal Processing (ICICS), Singapore, 397-401.
  584. Adams, D. (1979). The Hitchhiker's Guide to the Galaxy, London: Pan Books.
  585. Levinson, S., Liberman, M. (1981). Speech recognition by computer. Sci. Am., 64-76.
  586. Weinstein, C., McCandless, S., Mondshein, L., Zue, V. (1975). A system for acoustic-phonetic analysis of continuous speech. IEEE Trans. Acoust. Speech Signal Process., 54-67.
  587. Bernstein, J., Franco, H. (1996). Speech recognition by computer. In: Principles of Experimental Phonetics, St. Louis: Mosby, 408-434.
  588. Young, S. (1996). A review of large-vocabulary continuous-speech recognition. IEEE Signal Process. Mag., 45-57.
  589. Ehsani, F., Knodt, E. (1998). Speech technology in computer-aided language learning: strengths and limitations of a new CALL paradigm. Language Learning and Technology, 2, 45-60. Available online, February 2010: http://llt.msu.edu/vol2num1/article3/index.html.
590. Deng, L., Huang, X. (2004). Challenges in adopting speech recognition. Commun. ACM, 47(1), 69-75.
  591. Nyberg, E., Mitamura, T. (1992). The KANT system: fast, accurate, high-quality translation in practical domains. In: Proc. 14th Conf. on Computational Linguistics, Nantes, France.
592. Cavalli-Sforza, V., Czuba, K., Mitamura, T., Nyberg, E. (2000). Challenges in adapting an interlingua for bidirectional English-Italian translation. In: Proc. 4th Conf. Assoc. Machine Translation in the Americas on Envisioning Machine Translation in the Information Future, 169-178.
  593. Somers, H. (1999). Review article: Example-based machine translation. Machine Translation, 113-157.
594. Brown, R. (1996). Example-based machine translation in the Pangloss system. In: Proc. 16th Int. Conf. on Computational Linguistics (COLING-96), Copenhagen, Denmark.
  595. Trujillo, A. (1999). Translation engines: Techniques for machine translation. London: Springer.
  596. Brown, P., Cocke, J., Della Pietra, S., Della Pietra, V., Jelinek, F., Lafferty, J., Mercer, R., Roossin, P. (1990). A statistical approach to machine translation, Comput. Linguistics, 16(2), 79-85.
  597. Berger, A., Della Pietra, V., Della Pietra, S. (1996). A maximum entropy approach to natural language processing. Comput. Linguistics, 22(1), 39-71.
  598. Brown, R., Frederking, R. (1995). Applying statistical English language modeling to symbolic machine translation. In: Proc. 6th Int. Conf. on Theoretical and Methodological Issues in Machine Translation (TMI-95): Leuven, Belgium, 221-239.
  599. Knight, K. (1999). A statistical MT tutorial workbook. Unpublished. Available online, May 2010: http://www.isi.edu/natural-language/mt/wkbk.rtf.
  600. Koehn, P., Knight, K. (2001). Knowledge sources for word-level translation models. In: Proc. EMNLP 2001 Conf. on Empirical Methods in Natural Language Processing, Pittsburgh, PA, 27-35.
  601. Brown, R. (1999). Adding linguistic knowledge to a lexical example-based translation system. In: Proc. TMI-99, Chester, England.
  602. Yamada, K., Knight, K. (2001). A syntax-based statistical translation model. In: Proc. 39th Annual Meeting on Association for Computational Linguistics, Toulouse, France, 523-530.
  603. Alshawi, H., Douglas, S., Bangalore, S. (2000). Learning dependency translation models as collections of finite-state head transducers. Comput. Linguistics, 26(1), 45-60.
  604. Wu, D. (1997). Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Comput. Linguistics, 23(3), 377-403.
  605. Wang, Y. (1998). Grammar inference and statistical machine translation. Ph.D. thesis, Carnegie Mellon University.
606. Och, F. J., Tillmann, C., Ney, H. (1999). Improved alignment models for statistical machine translation. In: Proc. Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora, University of Maryland, College Park, MD, 20-28.
607. Galley, M., Graehl, J., Knight, K., Marcu, D., DeNeefe, S., Wang, W., Thayer, I. (2006). Scalable inference and training of context-rich syntactic translation models. In: Proc. 21st Int. Conf. on Computational Linguistics, Sydney, 961-968.
  608. Venugopal, A. (2007). Hierarchical and Syntax Structured Models, MT Marathon, Edinburgh, Scotland.
609. Chiang, D. (2007). Hierarchical phrase-based translation. Comput. Linguistics, 33(2), 201-228.
  610. Schultz, T., Black, A. (2006). Challenges with rapid adaptation of speech translation systems to new language pairs. In: Proc. ICASSP2006, Toulouse, France.
  611. Waibel, A. (1996). Interactive translation of conversational speech. Computer, 29(7), 41-48.
  612. Woszczyna, M., Coccaro, N., Eisele, A., Lavie, A., McNair, A., Polzin, T., Rogina, I., Rose, C., Sloboda, T., Tomita, T., Tsutsumi, J., Aoki-Waibel, N., Waibel, A., Ward, W. (1993). Recent advances in JANUS: A speech translation system. In: Proc. Workshop on Human Language Technology, Princeton, NJ.
  613. Wahlster, W. (2002). Verbmobil: Foundations of Speech-to-Speech Translation, Springer, Berlin.
614. Rayner, M., Alshawi, H., Bretan, I., Carter, D., Digalakis, V., Gambäck, B., Kaja, J., Karlgren, J., Lyberg, B., Price, P., Pulman, S., Samuelsson, C. (1993). A speech to speech translation system built from standard components. In: Proc. 1993 ARPA Workshop on Human Language Technology, Princeton, NJ.
  615. Rayner, M., Carter, D. (1997). Hybrid language processing in the spoken language translator. In: Proc. ICASSP'97, Munich, Germany.
  616. Rayner, M., Carter, D., Bouillon, P., Wiren, M., Digalakis, V. (2000). The Spoken Language Translator, Cambridge University Press, Cambridge.
617. Isotani, R., Yamabana, K., Ando, S., Hanazawa, K., Ishikawa, S., Iso, K. (2003). Speech-to-Speech Translation Software on PDAs for Travel Conversation. NEC Research and Development.
618. Yasuda, K., Sugaya, F., Takezawa, T., Yamamoto, S., Yanagida, M. (2003). An automatic evaluation method of translation quality using translation answer candidates queried from a parallel corpus. In: Proc. Machine Translation Summit VIII, Santiago de Compostela, Spain, 373-378.
  619. Metze, F., McDonough, J., Soltau, H., Waibel, A., Lavie, A., Burger, S., Langley, C., Laskowski, K., Levin, L., Schultz, T., Pianesi, F., Cattoni, R., Lazzari, G., Mana, N., Pianta, E., Besacier, L., Blanchon, H., Vaufreydaz, D., Taddei, L. (2002). The NESPOLE! Speech-to-speech translation system. In: Proc. HLT 2002, San Diego, CA.
  620. Bangalore, S., Riccardi, G. (2000). Stochastic finite-state models for spoken language machine translation. In: NAACL-ANLP 2000 Workshop on Embedded Machine Translation Systems, Seattle, WA, 52-59.
  621. Zhang, Y. (2003). Survey of Current Speech Translation Research. Unpublished. Available online, May 2010: http://projectile.sv.cmu.edu/research/public/talks/speechTranslation/sst- survey-joy.pdf
  622. Agnas, M. S., Alshawi, H., Bretan, I., Carter, D. M., Ceder, K., Collins, M., Crouch, R., Digalakis, V., Ekholm, B., Gambäck, B., Kaja, J., Karlgren, J., Lyberg, B., Price, P., Pulman, S., Rayner, M., Samuelsson, C., Svensson, T. (1994). Spoken language translator: first year report. SRI Technical Report CRC-043.
  623. Digalakis, V., Monaco, P. (1996). Genones: Generalized mixture tying in continuous hidden Markov model-based speech recognizers. IEEE Trans. Speech Audio Process., 4(4), 281-289.
  624. Alshawi, H. (1992). The Core Language Engine. MIT Press, Cambridge, MA.
  625. Alshawi, H., van Eijck, J. (1989). Logical forms in the core language engine. In: Proc. 27th Annual Meeting on Association for Computational Linguistics, Vancouver, British Columbia, Canada, 25-32.
  626. Alshawi, H., Carter, D. (1994). Training and scaling preference functions for disambiguation. Comput. Linguistics, 20(4), 635-648.
  627. Rayner, M., Samuelsson, C. (1994). Grammar Specialisation. In: [39], 39-52.
628. Samuelsson, C. (1994). Fast natural-language parsing using explanation-based learning. PhD thesis, Royal Institute of Technology, Stockholm, Sweden.
  629. Frederking, R., Nirenburg, S. (1994). Three heads are better than one. In: Proc. 4th Conf. on Applied Natural Language Processing, Stuttgart, Germany.
  630. Rayner, M., Bouillon, P. (2002). A flexible speech to speech phrasebook translator. In: Proc. ACL Workshop on Speech-to-Speech Translation, Philadelphia, PA.
  631. Rayner, M., Hockey, B. A., Bouillon, P. (2006). Putting Linguistics into Speech Recognition: The Regulus Grammar Compiler. CSLI Press, Stanford, CA.
  632. Rayner, M., Bouillon, P., Santaholma, M., Nakao, Y. (2005). Representational and architec- tural issues in a limited-domain medical speech translator. In: Proc. TALN 2005, Dourdan, France.
  633. Chatzichrisafis, N., Bouillon, P., Rayner, M., Santaholma, M., Starlander, M., Hockey, B. A. (2006). Evaluating task performance for a unidirectional controlled language medical speech translation system. In: Proc. 1st Int. Workshop on Medical Speech Translation, HLT-NAACL, New York, NY.
  634. Sarich, A. (2004). Development and fielding of the phraselator phrase translation system. In: Proc. 26th Conf. on Translating and the Computer, London.
  635. Frederking, R., Rudnicky, A., Hogan, C., Lenzo, K. (2000). Interactive speech translation in the Diplomat project. Machine Translation J., Special Issue on Spoken Language Translation, 15(1-2), 27-42.
636. Huang, X., Alleva, F., Hon, H. W., Hwang, M. Y., Lee, K. F., Rosenfeld, R. (1993). The SPHINX-II speech recognition system: An overview. Comput. Speech Lang., 7(2), 137-148.
  637. Lenzo, K., Hogan, C., Allen, J. (1998). Rapid-deployment text-to-speech in the DIPLOMAT system. In: Proc. 5th Int. Conf. on Spoken Language Processing (ICSLP-98), Sydney, Australia.
  638. Frederking, R., Brown, R. (1996). The Pangloss-Lite machine translation system. In: Proc. Conf. Assoc. for Machine Translation in the Americas (AMTA).
  639. Nielsen, J. (1993). Usability Engineering. AP Professional, Boston, MA.
  640. Rudnicky, A. (1995). Language modeling with limited domain data. In: Proc. ARPA Workshop on Spoken Language Technology, Morgan Kaufmann, San Francisco, CA, 66-69.
  641. Gates, D., Lavie, A., Levin, L., Waibel, A., Gavaldà, M., Mayfield, L., Woszczyna, M., Zhan, P. (1996). End-to-end evaluation in JANUS: A speech-to-speech translation system. In: Workshop on Dialogue Processing in Spoken Language Systems. Lecture Notes in Computer Science, Springer, Berlin.
  642. Black, A., Brown, R., Frederking, R., Lenzo, K., Moody, J., Rudnicky, A., Singh, R., Steinbrecher, E. (2002). Rapid development of speech-to-speech translation systems. In: Proc. ICSLP-2002, Denver.
643. Black, A., Lenzo, K. (2000). Building voices in the Festival speech synthesis system. Unpublished. Available online, May 2010: http://www.festvox.org/festvox/index.html.
  644. Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines. Dissertation, Kluwer.
  645. Klinkenberg, R., Joachims, T. (2000). Detecting concept drift with support vector machines. In: Proc. 17th Int. Conf. on Machine Learning (ICML), Morgan Kaufmann, San Francisco, CA.
  646. Zens, R., Ney, H. (2004). Improvements in phrase-based statistical machine translation. In: Proc. Human Language Technology Conf. (HLT-NAACL), Boston, MA, 257-264.
647. Tillmann, C., Zhang, T. (2005). A localized prediction model for statistical machine translation. In: Proc. 43rd Annual Meeting on Association for Computational Linguistics, Ann Arbor, MI, 557-564.
  648. Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P. (2007). Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, Cambridge.
  649. Schlenoff et al. (2007). Transtac July 2007 Evaluation Report, NIST Internal Document. Published in September 2007.
  650. Baker, D. W., Parker, R. M., Williams, M. V., Coates, W. C., Pitkin, K. (1996). Use and effectiveness of interpreters in an emergency department. JAMA, 275, 783-788.
  651. Cameron, H. (2000). Speech at the interface. In: Workshop on "Voice Operated Telecom Services". Ghent, Belgium, COST 249.
652. Heisterkamp, P. (2001). Linguatronic - Product-level speech system for Mercedes-Benz Cars. In: Proc. HLT, San Diego, CA, USA.
  653. Hamerich, S. W. (2007). Towards advanced speech driven navigation systems for cars. In: 3rd IET Int. Conf. on Intelligent Environments, IE07, Sept. 24-25, Ulm, Germany.
  654. Goose, S., Djennane, S. (2002). WIRE3: Driving around the information super-highway. Pers. Ubiquitous Comput., 6, 164-175.
  655. Nass, C., Jonsson, I.-M., Harris, H., Reaves, B., Endo, J., Brave, S., Takayama, L. (2005). Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI '05 Extended Abstracts on Human factors in Computing Systems. ACM Press, New York, NY.
656. Nass, C., Brave, S. B. (2005). Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. MIT Press, Cambridge, MA.
  657. Bishop, R. (2005). Intelligent Vehicle Technology and Trends. Artech House, Boston.
658. van de Weijer, C. (2008). Keynote 1: Dutch connected traffic in practice and in the future. In: IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, June 4-6.
659. Gardner, M. (2008). Nomadic device integration in AIDE. In: Proc. AIDE Final Workshop and Exhibition, April 15-16, Göteborg, Sweden.
  660. Johansson, E., Engström, J., Cherri, C., Nodari, E., Toffetti, A., Schindhelm, R., Gelau, C. (2004). Review of existing techniques and metrics for IVIS and ADAS assessment. EU Information Society Technology (IST) program IST-1-507674-IP: Adaptive Integrated Driver-Vehicle Interface (AIDE).
661. Lee, J. D., Caven, B., Haake, S., Brown, T. L. (2001). Speech-based interaction with in-vehicle computer: The effect of speech-based e-mail on driver's attention to the roadway. Hum. Factors, 43, 631-640.
  662. Barón, A., Green, P. (2006). Safety and Usability of Speech Interfaces for In-Vehicle Tasks while Driving: A Brief Literature Review. Transportation Research Institute (UMTRI), The University of Michigan.
  663. Saad, F., Hjälmdahl, M., Cañas, J., Alonso, M., Garayo, P., Macchi, L., Nathan, F., Ojeda, L., Papakostopoulos, V., Panou, M., Bekiaris. E. (2004). Literature review of behavioural effects. EU Information Society Technology (IST) program: IST-1-507674-IP, Adaptive Integrated Driver-Vehicle Interface (AIDE).
  664. Treffner, P. J., Barrett, R. (2004). Hands-free mobile phone speech while driving degrades coordination and control. Transport. Res. F, 7, 229-246.
  665. Esbjörnsson, M., Juhlin, O., Weilenmann, A. (2007). Drivers using mobile phones in traffic: An ethnographic study of interactional adaption. Int. J. Hum. Comput. Inter., Special Issue on: In-Use, In-Situ: Extending Field Research Methods, 22 (1), 39-60.
  666. Jonsson, I.-M., Chen, F. (2006). How big is the step for driving simulators to driving a real car? In: IEA 2006 Congress, Maastricht, The Netherlands, July 10-14.
667. Chen, F., Jordan, P. (2008). Zonal adaptive workload management system: Limiting secondary task while driving. In: IEEE Intelligent Vehicles Symposium (IV'08), Eindhoven, The Netherlands, June 2-6.
  668. Esbjörnsson, M., Brown, B., Juhlin, O., Normark, D., Östergren, M., Laurier, E. (2006). Watching the cars go round and round: designing for active spectating. In: Proc. SIGCHI Conf. on Human Factors in computing systems, Montréal, Québec, Canada, 2006.
  669. Recarte, M. A., Nunes, L. M. (2003). Mental workload while driving: Effects on visual search, discrimination, and decision making. J. Exp. Psychol.: Appl., 9 (2), 119-137.
  670. Victor, T. W., Harbluk, J. L., Engström, J. A. (2005). Sensitivity of eye-movement measures to in-vehicle task difficulty. Transport. Res. Part F, 8 (2), 167-190.
671. Hart, S. G., Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Hancock, P. A., Meshkati, N. (eds) Human Mental Workload. Elsevier Science Publishers B.V., North-Holland, 139-183.
672. Pauzie, A., Sparpedon, A., Saulnier, G. (2007). Ergonomic evaluation of a prototype guidance system in an urban area: Discussion about methodologies and data collection tools. In: Proc. Vehicle Navigation and Information Systems Conf. (6th Int. VNIS), in conjunction with the Pacific Rim TransTech Conf., "A Ride into the Future", Seattle, WA, USA.
673. Wang, E., Chen, F. (2008). A new measurement for simulator driving performance in situations without interference from other vehicles, Int. J. Transportation Systems F. In: Applied Human Factors and Ergonomics (AHFE 2008), 2nd Int. Conf., Las Vegas, USA, July 14-17.
674. Wilson, G. F., Lambert, J. D., Russell, C. A. (2002). Performance enhancement with real-time physiologically controlled adaptive aiding. In: HFA Workshop: Psychophysiological Application to Human Factors, March 11-12, 2002. Swedish Center for Human Factors in Aviation.
  675. Wilson, G. F. (2002). Psychophysiological test methods and procedures. In: HFA Workshop: Psychophysiological Application to Human Factors, March 11-12, 2002. Swedish Center for Human Factors in Aviation.
  676. Lai, J., Cheng, K., Green, P., Tsimhoni, O. (2001). On the road and on the web? Comprehension of synthetic and human speech while driving. In: Conf. on Human Factors and Computing Systems, CHI 2001, 31 March-5 April 2001. Seattle, Washington, USA.
  677. Hermansky, H., Morgan, N. (1994). RASTA processing of speech. IEEE Trans. Speech Audio Process., 2 (4), 578-589.
678. Kermorvant, C. (1999). A comparison of noise reduction techniques for robust speech recognition. IDIAP research report, IDIAP-RR-99-10, Dalle Molle Institute for Perceptual Artificial Intelligence, Valais, Switzerland.
  679. Furui, S. (1986). Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. Acoustics, Speech Signal Process., 34 (1), 52-59.
  680. Mansour, D., Juang, B.-H. (1989). The short-time modified coherence representation and noisy speech recognition. IEEE Trans. Acoustics Speech Signal Process., 37 (6), 795-804.
  681. Hernando, J., Nadeu, C. (1997). Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition. IEEE Trans. Speech Audio Process., 5 (1), 80-84.
  682. Chen, J., Paliwal, K. K., Nakamura, S. (2003). Cepstrum derived from differentiated power spectrum for robust speech recognition. Speech Commun., 41 (2-3), 469-484.
683. Yuo, K.-H., Wang, H.-C. (1997). Robust features derived from temporal trajectory filtering for speech recognition under the corruption of additive and convolutional noises. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, April 21-24, 1997, Munich, Bavaria, Germany.
  684. Yuo, K.-H., Wang, H.-C. (1999). Robust features for noisy speech recognition based on temporal trajectory filtering of short-time autocorrelation sequences. Speech Commun., 28, 13-24.
685. Lebart, K., Boucher, J. M. (2001). A new method based on spectral subtraction for speech dereverberation. Acta Acustica united with Acustica, 87, 359-366.
  686. Lee, C.-H., Soong, F. K., Paliwal, K. K. (1996). Automatic Speech and Speaker Recognition. Kluwer, Norwell.
  687. Gales, M. J. F., Young, S. J. (1995). Robust speech recognition in additive and convolutional noise using parallel model combination. Comput. Speech Lang., 9, 289-307.
  688. Gales, M. J. F., Young, S. J. (1996). Robust continuous speech recognition using parallel model combination. IEEE Trans. Speech Audio Process., 4 (5), 352-359.
  689. Acero, A., Deng, L., Kristjansson, T., Zhang, J. (2000). HMM adaptation using vector Taylor series for noisy speech recognition. In: Proc. ICASSP, June 05-09, 2000, Istanbul, Turkey.
  690. Kim, D. Y., Un, C. K., Kim, N. S. (1998). Speech recognition in noisy environments using first-order vector Taylor series. Speech Commun., 24 (1), 39-49.
  691. Visser, E., Otsuka, M., Lee, T.-W. (2003). A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments. Speech Commun., 41, 393-407.
  692. Farahani, G., Ahadi, S. M., Homayounpour, M. M. (2007). Features based on filtering and spectral peaks in autocorrelation domain for robust speech recognition. Comput. Speech Lang., 21, 187-205.
693. Choi, E. H. C. (2004). Noise robust front-end for ASR using spectral subtraction, spectral flooring and cumulative distribution mapping. In: Proc. 10th Australian Int. Conf. on Speech Science & Technology. Macquarie University, Sydney, December 8-10.
694. Fernandez, R., Corradini, A., Schlangen, D., Stede, M. (2007). Towards reducing and managing uncertainty in spoken dialogue systems. In: The Seventh International Workshop on Computational Semantics (IWCS-7). Tilburg, The Netherlands, Jan 10-12.
  695. Skantze, G. (2005). Exploring human error recovery strategies: Implications for spoken dialogue systems. Speech Commun., 45 (3), 325-341.
696. Gellatly, A. W., Dingus, T. A. (1998). Speech recognition and automotive applications: Using speech to perform in-vehicle tasks. In: Proc. Human Factors and Ergonomics Society 42nd Annual Meeting, October 5-9, 1998, Hyatt Regency Chicago, Chicago, Illinois.
697. Greenberg, J., Tijerina, L., Curry, R., Artz, B., Cathey, L., Grant, P., Kochhar, D., Kozak, K., Blommer, M. (2003). Evaluation of driver distraction using an event detection paradigm. In: Proc. Transportation Research Board Annual Meeting, January 12-16, 2003, Washington, DC.
698. McCallum, M. C., Campbell, J. L., Richman, J. B., Brown, J. (2004). Speech recognition and in-vehicle telematics devices: Potential reductions in driver distraction. Int. J. Speech Technol., 7, 25-33.
  699. Bernsen, N. O., Dybkjaer, L. (2002). A multimodal virtual co-driver's problems with the driver. In: ISCA Tutorial and Research Workshop on Multi-Modal Dialogue in Mobile Environments Proceedings. Kloster Irsee, Germany, June 17-19.
700. Geutner, P., Steffens, F., Manstetten, D. (2002). Design of the VICO Spoken Dialogue System: Evaluation of User Expectations by Wizard-of-Oz Experiments. In: Proc. 3rd Int. Conf. on Language Resources and Evaluation (LREC 2002). Las Palmas, Spain, May.
701. Villing, J., Larsson, S. (2006). Dico: A multimodal menu-based in-vehicle dialogue system. In: The 10th Workshop on the Semantics and Pragmatics of Dialogue, brandial'06 (SemDial 10). Potsdam, Germany, Sept 11-13.
702. Larsson, S. (2002). Issue-based dialogue management. PhD Thesis, Göteborg University.
703. Bringert, B., Ljunglöf, P., Ranta, A., Cooper, R. (2005). Multimodal dialogue system grammars. In: DIALOR'05, 9th Workshop on the Semantics and Pragmatics of Dialogue. Nancy, France, June 9-11, 2005.
704. Oviatt, S. (2004). When do we interact multimodally? Cognitive load and multimodal communication patterns. In: Proc. 6th Int. Conf. on Multimodal Interfaces. Pennsylvania, Oct 14-15.
705. Bernsen, N. O., Dybkjaer, L. (2001). Exploring natural interaction in the car. In: Proc. CLASS Workshop on Natural Interactivity and Intelligent Interactive Information Representation, Verona, Italy, Dec 2001.
706. Esbjörnsson, M., Juhlin, O., Weilenmann, A. (2007). Drivers using mobile phones in traffic: An ethnographic study of interactional adaptation. Int. J. Hum. Comput. Interact., Special Issue on In-Use, In-Situ: Extending Field Research Meth., 22 (1), 39-60.
707. Jonsson, I.-M., Nass, C., Endo, J., Reaves, B., Harris, H., Ta, J. L., Chan, N., Knapp, S. (2004). Don't blame me, I am only the driver: Impact of blame attribution on attitudes and attention to driving task. In: CHI '04 Extended Abstracts on Human Factors in Computing Systems, Vienna, Austria.
708. Jonsson, I.-M., Zajicek, M. (2005). Selecting the voice for an in-car information system for older adults. In: Human-Computer Interaction Int., Las Vegas, Nevada, USA.
709. Jonsson, I.-M., Zajicek, M., Harris, H., Nass, C. I. (2005). Thank you, I did not see that: In-car speech-based information systems for older adults. In: Conf. on Human Factors in Computing Systems. ACM Press, Portland, OR.
710. Jonsson, I. M., Nass, C. I., Harris, H., Takayama, L. (2005). Got Info? Examining the consequences of inaccurate information systems. In: Int. Driving Symp. on Human Factors in Driver Assessment, Training, and Vehicle Design. Rockport, Maine.
711. Gross, J. J. (1999). Emotion and emotion regulation. In: Pervin, L. A., John, O. P. (eds) Handbook of Personality: Theory and Research. Guilford, New York, 525-552.
  712. Picard, R. W. (1997). Affective Computing. MIT Press, Cambridge, MA.
713. Clore, G. C., Gasper, K. (2000). Feeling is believing: Some affective influences on belief. In: Frijda, N. H., Manstead, A. S. R., Bem, S. (eds) Emotions and Beliefs: How Feelings Influence Thoughts. Cambridge University Press/Editions de la Maison des Sciences de l'Homme (jointly published), Cambridge/Paris, 10-44.
714. Gross, J. J. (1998). Antecedent- and response-focused emotion regulation: Divergent consequences for experience, expression, and physiology. J. Personality Social Psychol., 74, 224-237.
715. Davidson, R. J. (1994). On emotion, mood, and related affective constructs. In: Ekman, P., Davidson, R. J. (eds) The Nature of Emotion. Oxford University Press, New York, 51-55.
716. Bower, G. H., Forgas, J. P. (2000). Affect, memory, and social cognition. In: Eich, E., Kihlstrom, J. F., Bower, G. H., Forgas, J. P., Niedenthal, P. M. (eds) Cognition and Emotion. Oxford University Press, Oxford, 87-168.
  717. Groeger, J. A. (2000). Understanding Driving: Applying Cognitive Psychology to a Complex Everyday Task. Psychology Press, Philadelphia, PA.
  718. Lunenfeld, H. (1989). Human factor considerations of motorist navigation and information systems. In: Proc. Vehicle Navigation and Information Systems, September 11-13, Toronto, Canada.
  719. Srinivasan, R., Jovanis, P. (1997). Effect of in-vehicle route guidance systems on driver workload and choice of vehicle speed: Findings from a driving simulator experiment. In: Ian Noy, Y. (ed) Ergonomics and Safety of Intelligent Driver Interfaces, Lawrence Erlbaum Associates Inc., Publishers, Mahwah, New Jersey, 97-114.
  720. Horswill, M., McKenna, F. (1999). The effect of interference on dynamic risk-taking judgments. Br. J. Psychol., 90, 189-199.
  721. Strayer, D., Drews, F., Johnston, W. (2003). Cell phone induced failures of visual attention during simulated driving. J. Exp. Psychol.: Appl., 9 (1), 23-32.
722. Merat, N., Jamson, A. H. (2005). Shut up I'm driving! Is talking to an inconsiderate passenger the same as talking on a mobile telephone? In: 3rd Int. Driving Symp. on Human Factors in Driver Assessment, Training, and Vehicle Design. Rockport, Maine.
  723. Nass, C. et al. (2005). Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI '05 Extended Abstracts on Human Factors in Computing Systems. ACM Press, New York, NY.
724. Brouwer, W. H. (1993). Older drivers and attentional demands: consequences for human factors research. In: Proc. Human Factors and Ergonomics Society-Europe, Chapter on Aging and Human Factors. Soesterberg, Netherlands, 93-106.
  725. Ponds, R. W., Brouwer, W. H., Wolffelaar, P. C. (1988). Age differences in divided attention in a simulated driving task. J. Gerontol., 43 (6), 151-156.
726. Zajicek, M., Hall, S. (1999). Solutions for elderly visually impaired people using the Internet. In: The 'Technology Push' and The User Tailored Information Environment, 5th Eur. Research Consortium for Informatics and Mathematics (ERCIM) Workshop, Dagstuhl, Germany, November 28-December 1.
727. Zajicek, M., Morrissey, W. (2001). Speech output for older visually impaired adults. In: Blandford, A., Vanderdonckt, J., Gray, P. (eds) People and Computers XV -Interacting without Frontiers. Springer Verlag, 503-513.
  728. Fiske, S., Taylor, S. (1991). Social Cognition. McGraw-Hill, New York, NY.
729. Lazarsfeld, P., Merton, R. (1948). Mass communication, popular taste and organized social action. In: Bryson, L. (ed) The Communication of Ideas. Institute for Religious and Social Studies, New York.
730. Rogers, E., Bhowmik, D. (1970). Homophily-heterophily: Relational concepts for communication research. Public Opinion Q., 34, 523.
  731. Dulude, L. (2002). Automated telephone answering systems and aging. Behav. Inform. Technol., 21, 171-184.
  732. Van Der Laan, J., Heino, A., De Waard, D. (1997). A simple procedure for the assessment of acceptance of advanced transport telematics. Transport Res. C, 5 (1), 1-10.
  733. Dybkjaer, L., Bernsen, N. O., Minker, W. (2004). Evaluation and usability of multimodal spoken language dialogue systems. Speech Commun., 43, 33-54.
  734. Graham, R., Aldridge, L., Carter, C., Lansdown, T. C. (1999). The design of in-car speech recognition interfaces for usability and user acceptance. In: Harris, D. (ed) Engineering Psychology and Cognitive Ergonomics, Ashgate, Aldershot, 313-320.
  735. Larsen, L. B. (2003). Assessment of spoken dialogue system usability -what are we really measuring? In: 8th Eur. Conf. on Speech Communication and Technology -Eurospeech 2003. September 1-4, Geneva, Switzerland.
  736. Zajicek, M., Jonsson, I. M. (2005). Evaluation and context for in-car speech systems for older adults. In: The 2nd Latin American Conf. on Human-Computer Interaction, CLIHC, Cuernavaca, México, October 23-26, 2005.
737. Chen, F. (2004). Speech interaction system -how to increase its usability. In: The 8th Int. Conf. on Spoken Language Processing, Interspeech/ICSLP, Jeju Island, Korea, Oct 4-8, 2004.
  738. Norman, D. (2007). The Design of Future Things. Basic Books, New York.
739. Jordan, P. W. (2000). Designing Pleasurable Products. Taylor & Francis, London and New York.
740. Aist, G., Dowding, J., Hockey, B. A., Hieronymus, J. (2002). An intelligent procedure assistant for astronaut training and support. In: Proc. 40th Annual Meeting of the Association for Computational Linguistics (demo track), Philadelphia, PA, 5-8.
  741. Martin, D., Cheyer, A., Moran, D. (1999). The open agent architecture: a framework for building distributed software systems. Appl. Artif. Intell., 13 (1-2), 92-128.
  742. Nuance (2006). http://www.nuance.com. As of 15 November 2006.
  743. Knight, S., Gorrell, G., Rayner, M., Milward, D., Koeling, R., Lewin, I. (2001). Comparing grammar-based and robust approaches to speech understanding: a case study. In: Proc. Eurospeech 2001, Aalborg, Denmark, 1779-1782.
  744. Rayner, M., Hockey, B. A., Bouillon, P. (2006). Putting Linguistics into Speech Recognition: The Regulus Grammar Compiler. CSLI, Chicago, IL.
  745. Regulus (2006). http://sourceforge.net/projects/regulus/. As of 15 November 2006.
746. Pulman, S. G. (1992). Syntactic and semantic processing. In: Alshawi, H. (ed) The Core Language Engine. MIT Press, Cambridge, MA, 129-148.
  747. van Harmelen, T., Bundy, A. (1988). Explanation-based generalization -partial evaluation (research note). Artif. Intell., 36, 401-412.
748. Rayner, M. (1988). Applying explanation-based generalization to natural-language processing. In: Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Japan, 1267-1274.
  749. Rayner, M., Hockey, B. A. (2003). Transparent combination of rule-based and data-driven approaches in a speech understanding architecture. In: Proc. 10th Conf. Eur. Chapter of the Association for Computational Linguistics, Budapest, Hungary, 299-306.
  750. Yarowsky, D. (1994). Decision lists for lexical ambiguity resolution. In: Proc. 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, 88-95.
751. Carter, D. (2000). Choosing between interpretations. In: Rayner, M., Carter, D., Bouillon, P., Digalakis, V., Wirén, M. (eds) The Spoken Language Translator. Cambridge University Press, Cambridge, 78-97.
  752. Dowding, J., Hieronymus, J. (2003). A spoken dialogue interface to a geologist's field assistant. In: Proc. HLT-NAACL 2003: Demo Session, Edmonton, Alberta, 9-10.
  753. Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In: Proc. 10th Eur. Conf. on Machine Learning, Chemnitz, Germany, 137-142.
  754. Joachims, T. (2006). http://svmlight.joachims.org/. As of 15 November 2006.
755. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C. (2002). Text classification using string kernels. J. Machine Learn. Res., 2, 419-444.
  756. Shawe-Taylor, J., Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge.
  757. Navia-Vázquez, A., Pérez-Cruz, F., Artés-Rodríguez, A., Figueiras-Vidal, A. R. (2004). Advantages of unbiased support vector classifiers for data mining applications. J. VLSI Signal Process. Syst., 37 (1-2), 1035-1062.
758. Bennett, P. (2003). Using asymmetric distributions to improve text classifier probability estimates. In: Proc. 26th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Toronto, Ontario, 111-118.
759. Zadrozny, B., Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. In: Proc. 8th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, 694-699.
760. Allen, J., Byron, D., Dzikovska, M., Ferguson, G., Galescu, L., Stent, A. (2000). An architecture for a generic dialogue shell. Natural Language Engineering, Special Issue on Best Practice in Spoken Language Dialogue Systems Engineering, 6, 1-16.
  761. Larsson, S., Traum, D. (2000). Information state and dialogue management in the TRINDI dialogue move engine toolkit. Natural Language Engineering, Special Issue on Best Practice in Spoken Language Dialogue Systems Engineering, 6, 323-340.
762. Stent, A., Dowding, J., Gawron, J., Bratt, E., Moore, R. (1999). The CommandTalk spoken dialogue system. In: Proc. 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, MD, 183-190.
  763. Haffner, P., Cortes, C., Mohri, M. (2003). Lattice kernels for spoken-dialog classification. In: Proc. 2003 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, 628-631.
  764. Cortes, C., Haffner, P., Mohri, M. (2004). Rational kernels: Theory and algorithms. J. Machine Learn. Res., 5, 1035-1062.
  765. Anderson, T., Pigeon, S., Swail, C., Geoffrois, E., Bruckner, C. (2004). Implications of multilingual interoperability of speech technology for military use. NATO Research and Technology Organization, Report RTO-TR-IST-011, AC/323(IST-011)TP/26.
  766. Baber, C., Noyes, J. M. (1996). Automatic speech recognition in adverse environments. Hum. Factors, 38, 142-155.
  767. Benincasa, D. S., Smith, S. E., Smith, M. J. (2004). Impacting the war on terrorism with language translation. In: Proc. IEEE Aerospace Conf., Big Sky, MT, USA, 3283-3288.
  768. Bolia, R. S., Slyh, R. E. (2003). Perception of stress and speaking style for selected elements of the SUSAS database. Speech Commun., 40, 493-501.
769. Calhoun, G., Draper, M. H. (2006). Multi-sensory interfaces for remotely operated vehicles. In: Cooke, N. J., Pringle, H. L., Pedersen, H. K., Connor, O. (eds) Advances in Human Performance and Cognitive Engineering Research, vol. 7: Human Factors of Remotely Operated Vehicles, 149-163.
770. Canadian Broadcasting Corporation (CBC) News (2006). Women in the military -International. In: CBC News Online, May 30, 2006. Available online: http://www.cbc.ca/news/background/militry-international/
771. Carr, O. (2002). Interfacing COTS speech recognition and synthesis software to a Lotus Notes military command and control database. Defence Science and Technology Organisation, Information Sciences Laboratory, Edinburgh, Australia. Research Report AR-012-484. Available online, May 2006: http://www.dsto.defence.gov.au/corporate/reports/DSTO-TR-1358.pdf.
  772. Chengguo, L., Jiqing, H., Wang, C. (2005). Stressful speech recognition method based on difference subspace integrated with dynamic time warping. Acta Acoust., 30 (3), 229-234.
  773. Cresswell, Starr, A. F. (1993). Is control by voice the right answer for the avionics environ- ment? In: Baber, C., Noyes, J. M. (eds) Interactive Speech Technology: Human Factors Issues in the Application of Speech Input/Output to Computers. Taylor & Francis, London, 85-97.
  774. Deng, L., Huang, X. (2004). Challenges in adopting speech recognition. Commun. ACM, 47 (1), 69-73.
775. Deng, L., O'Shaughnessy, D. (2003). Speech Processing -A Dynamic and Optimization-Oriented Approach. Marcel Dekker, NY.
776. Doddington, G., Liggett, W., Martin, A., Przybocki, M., Reynolds, D. (1998). SHEEP, GOATS, LAMBS and WOLVES: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. In: Proc. IEEE Int. Conf. on Spoken Language Processing, ICSLP '98, Sydney, Australia, 608-611.
  777. Draper, M., Calhoun, G., Ruff, H., Williamson, D., Barry, T. (2003). Manual versus speech input for unmanned aerial vehicle control station operations. In: Proc. 47th Annual Meeting of the Human Factors and Ergonomics Society, Denver, CO, USA, 109-113.
  778. Francis, A. L., Nusbaum, H. C. (1999). Evaluating the quality of synthetic speech. In: Gardner-Bonneau, D. (ed) Human Factors and Voice Interactive Systems. Kluwer, Norwell, MA, 63-97.
779. Frederking, R. E., Black, A. W., Brown, R. D., Moody, J., Steinbrecher, E. (2002). Field testing the Tongues speech-to-speech machine translation system, 160-164. Available online, May 2006: http://www.cs.cmu.edu/~awb/papers/lrec2002/tongues-eval.pdf.
  780. Frigola, M., Fernandez, J., Aranda, J. (2003). Visual human machine interface by gestures. In: Proc. IEEE Int. Conf. on Robotics & Automation, Taipei, Taiwan, 386-391.
  781. Fuegen, C., Rogina, I. (2000). Integrating dynamic speech modalities into context decision trees. In: Proc. IEEE Int. Conf. of Acoustic Speech Signal Processing, Istanbul, Turkey. ICASSP 2000, vol. 3, 1277-1280.
782. Goffin, V., Allauzen, C., Bocchieri, E., Hakkani-Tür, D., Ljolje, A., Parthasarathy, S., Rahim, M., Riccardi, G., Saraclar, M. (2005). The AT&T Watson speech recogniser. In: Proc. IEEE Int. Conf. on Spoken Language Processing, ICSLP 2005, Philadelphia, PA, I-1033-I-1036.
  783. Haas, E., Shankle, R., Murray, H., Travers, D., Wheeler, T. (2000). Issues relating to automatic speech recognition and spatial auditory displays in high noise, stressful tank environments. In: Proc. IEA 2000/HFES 2000 Congress. Human Factors and Ergonomics Society, Santa Monica, CA, vol. 3, 754-757.
784. Halverson, C. A., Horn, D. B., Karat, C. M., Karat, J. (1999). The beauty of errors: Patterns of error correction in desktop speech systems. In: Sasse, M. A., Johnson, C. (eds) Proc. Human-Computer Interaction -INTERACT '99. IOS Press, Amsterdam.
785. Hu, C., Meng, M. Q., Liu, P. X., Wang, X. (2003). Visual gesture recognition for human-machine interface of robot teleoperation. In: Proc. 2003 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Las Vegas, NV, USA, 1560-1565.
786. Huang, X., Acero, A., Hon, H.-W. (2001). Spoken Language Processing -A Guide to Theory, Algorithms, and System Development. Prentice Hall, NY.
  787. Jokinen, K. (2006). Constructive dialogue management for speech-based interaction systems. In: Proc. Intelligent User Interfaces'06, Sydney, Australia. ACM Press, New York, NY.
  788. Junqua, J. (2000). Robust Speech Recognition in Embedded Systems and PC Applications. Kluwer, Norwell, MA.
789. Kane, T. (2006). Who are the recruits? The demographic characteristics of U.S. Military enlistment, 2003-2005. The Heritage Foundation, Washington, DC.
790. Kirchhoff, K., Vergyri, D. (2004). Cross-dialectal acoustic data sharing for Arabic speech recognition. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2004, vol. 1, 765-768.
  791. Kudo, I., Nakama, T., Watanabe, T., Kameyama, R. (1996). Data collection of Japanese dialects and its influence into speech recognition. In: Proc. 4th Int. Conf. on Spoken Language Processing (ICSLP), vol. 4, 2021-2024.
  792. Lai, J., Wood, D., Considine, M. (2000). The effect of task conditions on the comprehensibility of synthetic speech. CHI Lett., 2, 321-328.
  793. Leeks, C. (1986). Operation of a speech recogniser under whole body vibration (Technical Memorandum FDS(F) 634). RAE, Farnborough, UK.
  794. Leggatt, A. P., Noyes, J. M. (2004). A holistic approach to the introduction of automatic speech recognition technology in ground combat vehicles. Mil. Psychol., 16, 81-97.
  795. Lippmann, R. (1997). Speech recognition by machines and humans. Speech Commun., 22, 1-15.
796. Littlefield, J., Hashemi-Sakhtsari, A. (2002). The effects of background noise on the performance of an Automatic Speech Recogniser. Defence Science and Technology Organisation, Information Sciences Laboratory, Edinburgh, Australia. Research Report AR-012-500. Available online, May 2006: http://www.dsto.defence.gov.au/corporate/reports/DSTO-RR-0248.pdf.
  797. Marshall, S. L. (2005). Concept of operations (CONOPS) for foreign language and speech translation technologies in a coalition military environment. Unpublished Master's Thesis, Naval Postgraduate School, Monterey, CA.
  798. McCarty, D. (2000). Building the business case for speech in call centers: Balancing customer experience and cost. In: Proc. SpeechTEK, New York, 15-26.
  799. Minker, W., Bühler, D., Dybkjaer, L. (2005). Spoken Multimodal Human-Computer Dialogue in Mobile Environments. Springer, Dordrecht.
  800. Mitsugami, I., Ukita, N., Kidode, M. (2005). Robot navigation by eye pointing. In: Proc. 4th Int. Conf. on Entertainment Computing (ICEC), Sanda, Japan, 256-267.
801. Moore, T. J., Bond, Z. S. (1987). Acoustic-phonetic changes in speech due to environmental stressors: Implications for speech recognition in the cockpit. In: Proc. 4th Int. Symp. on Aviation Psychology, Aviation Psychology Laboratory, Columbus, OH.
  802. Murray, I. R., Baber, C., South, A. (1996). Towards a definition and working model of stress and its effects on speech. Speech Commun., 20, 3-12.
  803. Myers, B., Hudson, S. E., Pausch, R. (2000). Past, present, and future of user interface software tools. ACM Trans. Comput. Hum. Interact., 7, 3-28.
  804. Neely, H. E., Belvin, R. S., Fox, J. R., Daily, J. M. (2004). Multimodal interaction techniques for situational awareness and command of robotic combat entities. In: Proc. IEEE Aerospace Conf., Big Sky, MT, USA, 3297-3305.
  805. Newman, D. (2000). Speech interfaces that require less human memory. In: Basson, S. (ed) AVIOS Proc. Speech Technology & Applications Expo, San Jose, CA, 65-69.
806. North, R. A., Bergeron, H. (1984). Systems concept for speech technology application in general aviation. In: Proc. 6th Digital Avionics Systems Conf. (A85-17801 06-01). American Institute of Aeronautics and Astronautics, New York, AIAA-84-2639, 184-189.
  807. North Atlantic Treaty Organisation (NATO) Committee for Women in the NATO Forces (2006). Personnel comparison in deployments 2006. Available online, December 2006: http://www.nato.int/issues/women_nato/index.html
  808. Noyes, J. M., Hellier, E., Edworthy, J. (2006). Speech warnings: A review. Theor. Issues Ergonomics Sci., 7 (6), 551-571.
  809. Oberteuffer, J. (1994). Commercial applications of speech interface technology: An industry at the threshold. In: Roe, R., Wilpon, J. (eds) Voice Communication Between Humans and Machines. National Academy Press, Washington DC, 347-356.
  810. Oviatt, S. L. (2000). Multimodal system processing in mobile environments. CHI Lett., 2 (2), 21-30.
  811. Paper, D. J., Rodger, J. A., Simon, S. J. (2004). Voice says it all in the Navy. Commun. ACM, 47, 97-101.
  812. Pearce, D., Hirsch, H. G. (2000). The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: Proc. 6th Int. Conf. on Spoken Language Processing, ICSLP 2000, Beijing, China.
813. Pellom, B., Hacioglu, K. (2003). Recent improvements in the CU Sonic ASR system for noisy speech: The SPINE task. In: Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP 2003, Hong Kong, China, I-4-I-7.
  814. Perzanowski, D., Brock, D., Blisard, S., Adams, W., Bugajska, M., Schultz, A. (2003). Finding the FOO: A pilot study for a multimodal interface. In: Proc. 2003 IEEE Conf. on Systems, Man and Cybernetics, vol. 4, 3218-3223.
  815. Picone, J. (1990). The demographics of speaker independent digit recognition. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 1990, vol. 1, 105-108.
816. Ralston, J. V., Pisoni, D. B., Lively, S. E., Greene, B. G., Mullennix, J. W. (1991). Comprehension of synthetic speech produced by rule: Word monitoring and sentence-by-sentence listening times. Hum. Factors, 33, 471-491.
817. Rodger, J. A., Pendharkar, P. C., Paper, D. C., Trank, T. V. (2001). Military applications of natural language processing and software. In: Proc. 7th Americas Conf. on Information Systems, Boston, MA, USA, 1188-1193.
  818. Rodger, J. A., Pendharkar, P. C. (2004). A field study of the impact of gender and user's technical experience on the performance of voice-activated medical tracking application. Int. J. Hum. Comput. Studies, 60 (5-6), 529-544.
  819. Rodger, J. A., Trank, T. V., Pendharkar, P. C. (2002). Military applications of natural language processing and software. Ann. Cases Inf. Technol., 5, 12-28.
820. Sawhney, N., Schmandt, C. (2000). Nomadic Radio: Speech and audio interaction for contextual messaging in nomadic environments. ACM Trans. Comput. Hum. Interact., 7 (3), 353-383.
  821. Shneiderman, B. (2000). The limits of speech recognition. Commun. ACM, 43, 63-65.
  822. Singh, R., Seltzer, M. L., Raj, B., Stern, R. M. (2001). Speech in noisy environments: Robust automatic segmentation, feature extraction, and hypothesis combination. In: Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP 2001, Salt Lake City, UT, vol. 1, 273-276.
  823. Strand, O. M., Holter, T., Egeberg, A., Stensby, S. (2003). On the feasibility of ASR in extreme noise using the PARAT earplug communication terminal. IEEE Workshop on Automatic Speech Recognition and Understanding, St. Thomas, Virgin Islands, 315-320.
  824. Tashakkori, R., Bowers, C. (2003). Similarity analysis of voice signals using wavelets with dynamic time warping. Proc. SPIE, 5102, 168-177.
825. Viswanathan, M., Viswanathan, M. (2005). Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale. Comput. Speech Lang., 19, 55-83.
  826. Wagner, M. (1997). Speaker characteristics in speech and speaker recognition. In: Proc. 1997 IEEE TENCON Conf., Brisbane, Australia, part 2, 626.
  827. Weimer, C., Ganapathy, S. K. (1989). A synthetic visual environment with hand gesturing and voice input. In: HCI International 89: 3rd International Conference on Human-Computer Interaction September 18-22, 1989, Boston, MA, USA.
828. Weinstein, C. J. (1995). Military and government applications of human-machine communication by voice. Proc. Natl Acad. Sci. USA, 92, 10011-10016. (Reprint of Military and government applications of human-machine communications by voice. In: Roe, R., Wilpon, J. (eds) Voice Communication Between Humans and Machines. National Academy Press, Washington DC, 357-370).
  829. White, R. W., Parks, D. L., Smith, W. D. (1984). Potential flight applications for voice recognition and synthesis systems. In: Proc. 6th AIAA/IEEE Digital Avionics System Conf., 84-2661-CP.
  830. Williamson, D. T., Draper, M. H., Calhoun, G. L., Barry, T. P. (2005). Commercial speech recognition technology in the military domain: Results of two recent research efforts. Int. J. Speech Technol., 8, 9-16.
  831. Wilpon, J. G., Jacobsen, C. N. (1996). A study of speech recognition for children and the elderly. In: Proc. IEEE Conf. Acoustics, Speech and Signal Processing (ICASSP), Atlanta, GA, USA, vol. 1, 349-352.
  832. Yoshizaki, M., Kuno, Y., Nakamura, A. (2002). Human-robot interface based on the mutual assistance between speech and vision. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Swiss Federal Institute of Technology, Lausanne, Switzerland, 1308-1313.
  833. Zhou, G., Hansen, J. H. L., Kaiser, J. F. (2001). Nonlinear feature based classification of speech under stress. IEEE Trans. Speech Audio Process., 9 (3), 201-216.
  834. Zue, V. (2004). Eighty challenges facing speech input/output technologies. In: Proc. from Sound to Sense: 50+ Years of Discovery in Speech Communication, MIT, Boston, MA, USA, B179-B195.
835. Raman, T. V. (1997). Auditory User Interfaces. Kluwer, Dordrecht.
836. Raman, T. V. (1998). Conversational gestures for direct manipulation on the audio desktop. In: Proc. 3rd Int. ACM SIGACCESS Conf. on Assistive Technologies, Marina del Rey, CA, 51-58.
837. Potamianos, A., Narayanan, S. S. (2003). Robust recognition of children's speech. IEEE Trans. Speech Audio Process., 11 (6), 603-616.
838. Boll, S. F. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust., Speech, Signal Process., 27 (2), 113-120.
839. Junqua, J. C., Haton, J. P. (1996). Robustness in Automatic Speech Recognition -Fundamentals and Applications. Kluwer Academic Publishers, Dordrecht.
840. Gales, M., Young, S. (1996). Robust continuous speech recognition using parallel model combination. IEEE Trans. Speech Audio Process., 4 (5), 352-359.
841. Moura, A., Pêra, V., Freitas, D. (2006). An automatic speech recognition system for persons with disability. In: Proc. Conf. IBERDISCAP'06, Vitória-ES, Brazil, 20-22.
  842. Roe, P. (ed) (2007). Towards an Inclusive Future -Impact and Wider Potential of Information and Communication Technologies. COST, European Commission.
843. Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Sison, J., Mashari, A., Zhou, J. (2000). Audio-Visual Speech Recognition. Final Workshop 2000 Report. Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD.
844. Turunen, M., Hakulinen, J., Räihä, K.-J., Salonen, E.-P., Kainulainen, A., Prusi, P. (2005). An architecture and applications for speech-based accessibility systems. IBM Syst. J., 44 (3), 485-504.
  845. Jokinen, K., Kerminen, A., Kaipainen, M., Jauhiainen, T., Wilcock, G., Turunen, M., Hakulinen, J., Kuusisto, J., Lagus, K. (2002). Adaptive dialogue systems -interaction with Interact. In: Jokinen, K., McRoy, S. (eds) Proc. 3rd SIGdial Workshop on Discourse and Dialogue, Philadelphia, 64-73.
846. Ferreira, H. (2005). Audiomath, developed as part of the graduation and MSc theses, LSS, FEUP.
847. Freitas, D., Ferreira, H., Carvalho, V., Fernandes, D., Pedrosa, F. (2003). A prototype application for teaching numbers. In: Proc. 10th Int. Conf. on Human-Computer Interaction, HCII-2003, Crete, Greece.
848. Freitas, D., Ferreira, H., Fernandes, D. (2003). A. Q. N., A Quinta dos Números, um projecto em desenvolvimento [a project under development]. In: Proc. "8º Baú da Matemática", Ermesinde, Portugal.
849. Roe, P. (ed) (2001). Bridging the Gap? COST219bis, European Commission.
  850. Hirschman L., Thompson, H. (1997). Overview of evaluation in speech and natural language processing. In: Survey of the State of the Art in Human Language Technology, Cambridge University Press and Giardini Editori, Pisa, 409-414.
  851. Mariani, J. (2002). The Aupelf-Uref evaluation-based language engineering actions and related projects. In: Proc. 1st Int. Conf. on Language Resources and Evaluation (LREC'98), Granada, 123-128.
  852. Steeneken, H., van Leeuwen, D. (1995). Multi-lingual assessment of speaker independent large vocabulary speech-recognition systems: The SQALE-project. In: Proc. 4th Eur. Conf. on Speech Communication and Technology (EUROSPEECH'95), Madrid, 1271-1274.
853. Young, S., Adda-Decker, M., Aubert, X., Dugast, C., Gauvain, J., Kershaw, D., Lamel, L., Leeuwen, D., Pye, D., Robinson, A., Steeneken, H., Woodland, P. (1997). Multilingual large vocabulary speech recognition: The European SQALE project. Comput. Speech Lang., 11 (1), 73-89.
854. Jacquemin, C., Mariani, J., Paroubek, P. (eds) (2005). Using evaluation within HLT programs: Results and trends. In: Proc. CLASS Pre-Conf. Workshop to LREC 2000, Athens.
855. Bernsen, N. O., Dybkjaer, L. (1997). The DISC concerted action. In: Proc. Speech and Language Technology (SALT) Club Workshop on Evaluation in Speech and Language Technology, Sheffield, 35-42.
  856. Gibbon, D., Moore, R., Winski, R. (eds) (1997). Handbook on Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin.
  857. Fraser, N. (1997). Assessment of Interactive Systems. Handbook on Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin, 564-615.
858. van Leeuwen, D., Steeneken, H. (1997). Assessment of Recognition Systems. Handbook on Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin, 381-407.
  859. Bimbot, F., Chollet, G. (1997). Assessment of Speaker Verification Systems. Handbook on Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin, 408-480.
  860. van Bezooijen, R., van Heuven, V. (1997). Assessment of Synthesis Systems. Handbook on Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin, 481-563.
  861. Gibbon, D., Mertins, I., Moore, R. (2000). Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation. Kluwer, Boston, MA.
862. ITU-T Recommendation P.85 (1994). A Method for Subjective Performance Assessment of the Quality of Speech Voice Output Devices. International Telecommunication Union, Geneva.
863. ITU-T Recommendation P.851 (2003). Subjective Quality Evaluation of Telephone Services Based on Spoken Dialogue Systems. International Telecommunication Union, Geneva.
  864. ITU-T Supplement 24 to P-Series Recommendations (2005). Parameters Describing the Interaction With Spoken Dialogue Systems. International Telecommunication Union, Geneva.
865. Jekosch, U. (2000). Sprache hören und beurteilen: Ein Ansatz zur Grundlegung der Sprachqualitätsbeurteilung. Habilitation thesis (unpublished), Universität/Gesamthochschule Essen.
866. Jekosch, U. (2005). Voice and Speech Quality Perception: Assessment and Evaluation. Springer, Berlin.
  867. ISO 9241-11 (1998). Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs). Part 11: Guidance on Usability. International Organization for Standardization, Geneva.
  868. Möller, S. (2005). Quality of Telephone-based Spoken Dialogue Systems. Springer, New York, NY.
  869. Möller, S. (2002). A new taxonomy for the quality of telephone services based on spoken dialogue systems. In: Proc. 3rd SIGdial Workshop on Discourse and Dialogue. Philadelphia, PA, 142-153.
  870. Pallett, D., Fourcin, A. (1997). Speech input: Assessment and evaluation. In: Survey of the State of the Art in Human Language Technology, Cambridge University Press and Giardini Editori, Pisa, 425-429.
  871. Pallett, D., Fiscus, J., Fisher, W., Garofolo, J. (1993). Benchmark tests for the DARPA spoken language program. In: Proc. DARPA Human Language Technology Workshop, Princeton, NJ, 7-18.
  872. Young, S. (1997). Speech recognition evaluation: A review of the ARPA CSR programme. In: Proc. Speech and Language Technology (SALT) Club Workshop on Evaluation in Speech and Language Technology, Sheffield, 197-205.
  873. Pallett, D. (1998). The NIST role in automatic speech recognition benchmark tests. In: Proc. 1st Int. Conf. on Language Resources and Evaluation (LREC'98), Granada, 327-330.
874. Picone, J., Goudie-Marshall, K., Doddington, G., Fisher, W. (1986). Automatic text alignment for speech system evaluation. IEEE Trans. Acoust., Speech, Signal Process. 34(4), 780-784.
  875. Picone, J., Doddington, G., Pallett, D. (1990). Phone-mediated word alignment for speech recognition evaluation. IEEE Trans. Acoust., Speech, Signal Process. 38(3), 559-562.
  876. Strik, H., Cucchiarini, C., Kessens, J. (2000). Comparing the recognition performance of CSRs: In search of an adequate metric and statistical significance test. In: Proc. 6th Int. Conf. on Spoken Language Processing (ICSLP2000), Beijing, 740-743.
  877. Strik, H., Cucchiarini, C., Kessens, J. (2001). Comparing the performance of two CSRs: How to determine the significance level of the differences. In: Proc. 7th Eur. Conf. on Speech Communication and Technology (EUROSPEECH 2001 -Scandinavia), Aalborg, 2091-2094.
  878. Price, P. (1990). Evaluation of spoken language systems: The ATIS domain. In: Proc. DARPA Speech and Natural Language Workshop, Hidden Valley, PA, 91-95.
  879. Glass, J., Polifroni, J., Seneff, S., Zue, V. (2000). Data collection and performance evaluation of spoken dialogue systems: The MIT experience. In: Proc. 6th Int. Conf. on Spoken Language Processing (ICSLP 2000), Beijing, 1-4.
880. Grice, H. (1975). Logic and conversation. In: Cole, P., Morgan, J. L. (eds) Syntax and Semantics, vol. 3: Speech Acts. Academic, New York, NY, 41-58.
  881. Bernsen, N., Dybkjaer, H., Dybkjaer, L. (1998). Designing Interactive Speech Systems: From First Ideas to User Testing. Springer, Berlin.
882. Francis, A., Nusbaum, H. (1999). Evaluating the quality of synthetic speech. In: Gardner-Bonneau, D. (ed) Human Factors and Voice Interactive Systems. Kluwer, Boston, MA, 63-97.
  883. Sityaev, D., Knill, K., Burrows, T. (2006). Comparison of the ITU-T P.85 standard to other methods for the evaluation of Text-to-Speech systems. In: Proc. 9th Int. Conf. on Spoken Language Processing (Interspeech 2006 -ICSLP), Pittsburgh, PA, 1077-1080.
  884. Viswanathan, M. (2005). Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale. Comput. Speech Lang. 19(1), 55-83.
  885. Oulasvirta, A., Möller, S., Engelbrecht, K., Jameson, A. (2006). The relationship of user errors to perceived usability of a spoken dialogue system. In: Proc. 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, 61-67.
  886. ISO 9241-110 (2006). Ergonomics of human-system interaction. Part 110: Dialogue princi- ples. International Organization for Standardization, Geneva.
  887. Constantinides, P., Rudnicky, A. (1999). Dialog analysis in the Carnegie Mellon Communicator. In: Proc. 6th Eur. Conf. on Speech Communication and Technology (EUROSPEECH'99), Budapest, 243-246.
888. Billi, R., Castagneri, G., Danieli, M. (1996). Field trial evaluations of two different information inquiry systems. In: Proc. 3rd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA'96), Basking Ridge, NJ, 129-134.
889. Boros, M., Eckert, W., Gallwitz, F., Görz, G., Hanrieder, G., Niemann, H. (1996). Towards understanding spontaneous speech: Word accuracy vs. concept accuracy. In: Proc. 4th Int. Conf. on Spoken Language Processing (ICSLP'96), IEEE, Piscataway, NJ, 1009-1012.
890. Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Comput. Linguist. 22(2), 249-254.
  891. Cookson, S. (1988). Final evaluation of VODIS -Voice Operated Database Inquiry System. In: Proc. SPEECH'88, 7th FASE Symposium, Edinburgh, 1311-1320.
892. Danieli, M., Gerbino, E. (1995). Metrics for evaluating dialogue strategies in a spoken language system. Empirical Methods in Discourse Interpretation and Generation. Papers from the 1995 AAAI Symposium, Stanford, CA. AAAI Press, Menlo Park, CA, 34-39.
893. Gerbino, E., Baggia, P., Ciaramella, A., Rullent, C. (1993). Test and evaluation of a spoken dialogue system. In: Proc. Int. Conf. on Acoustics Speech and Signal Processing (ICASSP'93), IEEE, Piscataway, NJ, 135-138.
894. Goodine, D., Hirschman, L., Polifroni, J., Seneff, S., Zue, V. (1992). Evaluating interactive spoken language systems. In: Proc. 2nd Int. Conf. on Spoken Language Processing (ICSLP'92), Banff, 201-204.
  895. Hirschman, L., Pao, C. (1993). The cost of errors in a spoken language system. In: Proc. 3rd Eur. Conf. on Speech Communication and Technology (EUROSPEECH'93), Berlin, 1419-1422.
  896. Kamm, C., Litman, D., Walker, M. (1998). From novice to expert: The effect of tutorials on user expertise with spoken dialogue systems. In: Proc. 5th Int. Conf. on Spoken Language Processing (ICSLP'98), Sydney, 1211-1214.
897. Polifroni, J., Hirschman, L., Seneff, S., Zue, V. (1992). Experiments in evaluating interactive spoken language systems. In: Proc. DARPA Speech and Natural Language Workshop, Harriman, NY, 28-33.
898. Price, P., Hirschman, L., Shriberg, E., Wade, E. (1992). Subject-based evaluation measures for interactive spoken language systems. In: Proc. DARPA Speech and Natural Language Workshop, Harriman, NY, 34-39.
  899. San-Segundo, R., Montero, J., Colás, J., Gutiérrez, J., Ramos, J., Pardo, J. (2001). Methodology for dialogue design in telephone-based spoken dialogue systems: A Spanish train information system. In: Proc. 7th Eur. Conf. on Speech Communication and Technology (EUROSPEECH 2001-Scandinavia), Aalborg, 2165-2168.
  900. Simpson, A., Fraser, N. (1993). Black box and glass box evaluation of the SUNDIAL system. In: Proc. 3rd Eur. Conf. on Speech Communication and Technology (EUROSPEECH'93), Berlin, 1423-1426.
  901. Skowronek, J. (2002). Entwicklung von Modellierungsansätzen zur Vorhersage der Dienstequalität bei der Interaktion mit einem natürlichsprachlichen Dialogsystem. Diploma thesis (unpublished), Institut für Kommunikationsakustik, Ruhr-Universität Bochum.
902. Walker, M., Litman, D., Kamm, C., Abella, A. (1997). PARADISE: A framework for evaluating spoken dialogue agents. In: Proc. of the ACL/EACL 35th Ann. Meeting of the Assoc. for Computational Linguistics, Madrid, 271-280.
  903. Walker, M., Litman, D., Kamm, C., Abella, A. (1998). Evaluating spoken dialogue agents with PARADISE: Two case studies. Comput. Speech Lang. 12(4), 317-347.
  904. Zue, V., Seneff, S., Glass, J., Polifroni, J., Pao, C., Hazen, T., Hetherington, L. (2000). JUPITER: A telephone-based conversational interface for weather information. IEEE Trans. Speech Audio Process. 8(1), 85-96.
  905. Hone, K., Graham, R. (2000). Towards a tool for the subjective assessment of speech system interfaces (SASSI). Nat. Lang. Eng. 6(3-4), 287-303.
  906. Hone, K. S., Graham, R. (2001). Subjective assessment of speech-system interface usability. In: Proc. 7th Eur. Conf. on Speech Communication and Technology (EUROSPEECH 2001- Scandinavia), Aalborg, 2083-2086.
  907. Möller, S., Smeele, P., Boland, H., Krebber, J. (2007). Evaluating spoken dialogue systems according to de-facto standards: A case study. Comput. Speech Lang. 21(1), 26-53.
908. Möller, S., Smeele, P., Boland, H., Krebber, J. (2006). Messung und Vorhersage der Effizienz bei der Interaktion mit Sprachdialogdiensten. In: Fortschritte der Akustik -DAGA 2006: Plenarvortr., Braunschweig, 463-464.
  909. Walker, M., Kamm, C., Litman, D. (2000). Towards developing general models of usability with PARADISE. Nat. Lang. Eng. 6(3-4), 363-377.
  910. Walker, M., Kamm, C., Litman, D. (2005). Towards generic quality prediction models for spoken dialogue systems -A case study. In: Proc. 9th Eur. Conf. on Speech Communication and Technology (Interspeech 2005), Lisboa, 2489-2492.
  911. Dybkjaer, L., Bernsen, N. O., Minker, W. (2004). Evaluation and usability of multimodal spoken language dialogue systems. Speech Commun. 43(1-2), 33-54.
912. Beringer, N., Louka, K., Penide-Lopez, V., Türk, U. (2002). End-to-end evaluation of multimodal dialogue systems: Can we transfer established methods? In: Proc. 3rd Int. Conf. on Language Resources and Evaluation (LREC 2002), Las Palmas, 558-563.
  913. Bernsen, N., Dybkjaer, L., Kiilerich, S. (2004). Evaluating conversation with Hans Christian Andersen. In: Proc. 4th Int. Conf. on Language Resources and Evaluation (LREC 2004), Lisbon, 1011-1014.
914. Araki, M., Doshita, S. (1997). Automatic evaluation environment for spoken dialogue systems. In: Dialogue Processing in Spoken Language Systems. Proc. ECAI'96 Workshop, Budapest. Springer, Berlin, 183-194.
  915. López-Cozar, R., de la Torre, A., Segura, J., Rubio, A. (2003). Assessment of dialogue systems by means of a new simulation technique. Speech Commun. 40(3), 387-407.
  916. Walker, M. (1994). Experimentally evaluating communicative strategies: The effect of the task. In: Proc. Conf. Am. Assoc. Artificial Intelligence (AAAI'94), Assoc. for Computing Machinery (ACM), New York, NY, 86-93.
  917. Walker, M. (1992). Risk Taking and Recovery in Task-Oriented Dialogue. PhD thesis, University of Edinburgh.
  918. Möller, S., Englert, R., Engelbrecht, K., Hafner, V., Jameson, A., Oulasvirta, A., Raake, A., Reithinger, N. (2006). MeMo: Towards automatic usability evaluation of spoken dialogue services by user error simulations. In: Proc. 9th Int. Conf. on Spoken Language Processing (Interspeech 2006 -ICSLP), Pittsburgh, PA, 1786-1789.
919. Möller, S., Heimansberg, J. (2006). Estimation of TTS quality in telephone environments using a reference-free quality prediction model. In: Proc. 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, 56-60.
920. Compagnoni, B. (2006). Development of Prediction Models for the Quality of Spoken Dialogue Systems. Diploma thesis (unpublished), IfN, TU Braunschweig.

Index

Bayes' theorem, Bayesian, 28
BDI (Belief-Desire-Intention) agent, 36, 38, 49, 96
Behaviour, 35, 41, 43, 45, 47-49, 90-100, 123-124, 129-131, 133-137, 139-144, 152, 156-157, 162, 233, 239, 263, 287, 293, 296-297
Behaviour expressivity, 135
Beliefs, 34, 36-37, 44-45, 49, 96-97, 127, 130, 212
Bilingual text, 170-171, 186-188
Blizzard Challenge, 25-26, 28-29
Body posture, 41, 48, 130-131
Brazen head, 21
Broadcast news (BN), 6
Gaussian mixture model (GMM), 29, 156
Gaze, 41, 81, 130-133, 136-137, 139, 142-143, 156, 254-255, 265, 281-283, 296
Gender, 172, 209, 214, 252
Generalized probabilistic descent (GPD), 6, 8
General public, 253
G-forces, 256, 264-265
Globalisation, 267
GMM-based voice conversion, 30
Grammar, 5, 13, 35, 43-45, 63-69, 71-72, 118, 169-170, 173-178, 182, 185, 224, 228-232, 244, 247-248, 253, 310
Grammar-based language model (GLM), 169, 224, 228, 233, 237, 247-249
Grammar specification language (GSL), 228
Graphical user interface (GUI), 34, 50, 182, 185, 273-274
Grounding, 38, 47, 91, 103, 136-137, 142
Headset, 182, 188, 201, 263, 283
Helicopters, 256
Hidden Markov Model (HMM), 4, 13, 28, 155-156, 169, 201, 318
Hierarchical control, 93
Higher-order logic, 174
HMM-based speech synthesis, 29
Hosting, 69-70
How may I help you (HMIHY), 40, 64
Human-computer interaction (HCI), 33-35, 50, 100, 142, 151-157, 209
Human factors, 181-182, 198, 251-267, 313
Human-machine interface, 1, 34, 273
Human-machine speech communication, 262
Human speech, 1, 6, 10-11, 15, 90, 106, 117-119, 171, 251, 256
Hybrid architecture, 173, 185
Ideal Cooperation, see Cooperation, cooperativity
Imitation, 94-96, 98
Information State Update (ISU), 204
Infotainment, 195, 197, 199, 206
Integration, 74, 99, 105, 154, 159, 161, 168, 185, 197, 233, 244, 264-267, 291, 294, 296
Intelligent interactive systems, 52
Intelligibility, 7, 20, 29, 254, 264, 303-304, 308
Intention, 36, 43-45, 93, 98, 117, 259
Interaction, 13-14, 33-53, 63, 66-67, 69, 71-74, 89-92, 97-100, 105-108, 116-117, 123, 125, 128, 130, 136, 142-143, 151-157, 159-162, 182, 184, 195-199, 202, 204-205, 207-210, 212, 215, 225, 272-274, 287-288, 291, 294, 305, 308-318
  parameter, 309, 311, 313-318
  problem, 309-310, 314
Interactive system, 33-35, 50, 52, 123, 181, 251, 257, 259, 264, 267, 273, 301-318
Interface
  design, 50, 312
  speech-enabled command, 33
  tactile, 34
Interlingua, 170, 178-179, 189
International Standardisation Organisation (ISO), 40, 310
Internet, 30, 35, 66, 70, 74, 162, 189, 197, 267, 284, 286, 292
Machine translation, statistical (SMT), 170-171, 182-189
Machine translation, transfer-based, 170
Maximum Likelihood Linear Regression (MLLR), 6, 12
Maximum Mutual Information (MMI), 6, 8
Meaning representation, 170
Measure, measurement, 82, 155, 200, 301-303, 311, 316
Medical applications, 189
Medical domain, 179, 189-190
Memory, 19, 27, 46, 83-84, 91-92, 94-96, 198, 253-254, 264, 274-275, 287, 292-293, 296, 313, 317
Meta-communication, 307-308, 311
Microphone arrays, 202, 205, 263
Microphone-based technologies, 263
Microphones, close-talking noise cancelling, 263
Microtremors, 84
Military applications, 251-267
Military domain, 189-190, 251-257, 260, 262, 266-267
Military personnel, 252, 255, 262, 265
Military police, 251
Mimics, 123
Minimum Classification Error (MCE), 6, 8
Mirror neurons, 92, 94-96
Misrecognition, 176, 202, 223, 264
Mixed initiative, 38, 49, 63-65, 67, 73, 204
Mobility, 259, 261, 264-265, 271, 273, 280-282, 290, 294
Mock-theft paradigm, 80
Model/modelling
  excitation, 28
  harmonic + noise, 28
  probabilistic, 13, 39
  statistical, 1, 4, 8, 13, 29, 61, 89, 169
Mood, 110, 114, 157, 162, 208-210
Moore's law, 189
Multimodal information, 151-162
Multimodal interaction, 34, 37, 41, 51, 204, 207-208, 265
Multimodality, 205, 207-208, 264-265
Multimodal system, 36, 151-152, 156-157, 207-208, 315
Multiparty dialogue, 41
Multiparty interaction, 41
Multi-Role Fighter aircraft, 258
MUMIN annotation scheme, 48
Native speakers, 85, 168, 172, 182, 263, 265
Natural communication, 34, 41
Natural disasters, 70, 259
Natural interaction, 33, 42, 50-53
Natural language, 6, 33, 35-37, 40, 42, 50, 62-65, 73-75, 91, 100, 157, 170, 198, 204, 253, 272, 302, 304, 306-308
Natural language understanding, 62, 64-65, 198, 204, 302, 304, 306-307, 314
Naturalness, 20, 41, 64, 153, 182, 202, 254, 272, 303, 305, 308
Navy personnel, 262
Neural net, 5, 43, 46, 92, 155-156, 169, 186, 318
N-gram, 5, 8, 13, 46, 177, 186, 231-233, 236, 246-248
Noise, 6, 10-12, 23, 28, 135, 155, 161, 182, 195-196, 200-202, 205, 243-244, 256-257, 261, 263-265, 267, 283, 284, 305, 310, 316
Non-native speakers, 263, 265
Non-speech warnings, 258
Non-Verbal Behaviour (NVB) coding, 81, 130, 143
Objectivity, 20, 302
Open Agent Architecture (OAA), 38, 223
Open Source, 41, 177, 228-229
Parallel corpus, 171, 182, 186-187
Parallel model composition (PMC), 6
Parameter, 9, 11-12, 99, 124, 135, 153, 176, 235-237, 248-249, 288, 308, 314
Paraphrase, 111, 179, 184, 186, 188, 225
Parse preferences, 175
Parser, 13, 173, 175, 186-187, 228
Parser, robust, 187
Parse tree, 170
Passive noise reduction, 263
Perception, 15, 51, 80-81, 86, 91-93, 97, 137, 141-142, 151-152, 156, 160, 162, 199, 212, 273, 302, 304, 312-313
Perceptive behaviour, 136-137
Perceptual control system, single-layered, 93
Personality, 127, 129, 140, 142, 208-209, 211, 213-214, 305
Personification, 123
Phonetic analysis, 252-253
Phrase alignment, 171
Platform, 21, 26, 41, 65-66, 68-71, 73, 172, 174, 177, 181-183, 188-189, 206, 224, 228, 235, 292
Portability, 40, 172, 204
Portable devices, 197
PREdictive SENsorimotor Control and Emulation (PRESENCE), 98-99
Preference, 141
Probabilistic training, 177, 230
Procedures, navigation of, 221, 225
Procedures, rendering in voice, 244
Professional services, 69
Prolog, 224-225, 240
Prototype, 75, 176-177, 184, 204-205, 208, 222, 245, 260, 266-267, 294
Pulse Code Modulation (PCM), 23, 28
Push-to-talk, 195, 202, 206, 254
W3C, 40, 68, 276-277, 288, 292
White lies, 80
Wireless technology, 261
Wizard-of-Oz experiments, 182
Word Error Rate (WER), 169, 177, 233, 247, 252, 303, 306
Workload, 198-201, 203, 212, 257, 265, 267