ROBOT AUDITION: ITS RISE AND PERSPECTIVES
2015, Proceedings of 2015 International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)
https://doi.org/10.1109/ICASSP.2015.7179045
Abstract
The ability of robots to listen to several things at once with their own “ears”, that is, robot audition, is an important factor in improving interaction and symbiosis between humans and robots. The critical issues in robot audition are real-time processing and robustness against noisy environments, together with enough flexibility to support various kinds of robots and hardware configurations. This paper first overviews activities and issues related to robot audition. It then presents the “HARK” robot audition software, which provides three primary functions for robot audition (sound source localization, sound source separation, and recognition of separated sounds), and reports their performance. Finally, it discusses future directions both in robotics and in promising new areas.