Discriminative kernel-based phoneme sequence recognition

2006

Abstract

We describe a new method for recognizing the phoneme sequence of a speech utterance that is not based on hidden Markov models (HMMs). In contrast to HMM-based approaches, our method uses a discriminative kernel-based training procedure in which learning is tailored to minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor maps the speech utterance, together with a proposed phoneme sequence, to a vector space endowed with an inner product that is realized by a Mercer kernel. Building on large-margin techniques for predicting whole sequences, we devise a learning algorithm that reduces to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer, along with an efficient implementation of it. We present encouraging initial experimental results on the TIMIT corpus and compare the proposed m...
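The Levenshtein distance named in the abstract can be illustrated with a minimal sketch. The function name `levenshtein` and the example phoneme labels are assumptions made for illustration; this is the standard dynamic-programming edit distance with unit costs, not the authors' training procedure or implementation.

```python
def levenshtein(predicted, correct):
    """Edit distance between two phoneme sequences given as lists of labels.

    Illustrative helper (not from the paper): counts the minimum number of
    substitutions, deletions, and insertions, each with cost 1.
    """
    # prev[j] is the distance between the already-processed prefix of
    # `predicted` and the first j symbols of `correct`.
    prev = list(range(len(correct) + 1))
    for i, p in enumerate(predicted, start=1):
        curr = [i]  # distance from i predicted symbols to an empty prefix
        for j, c in enumerate(correct, start=1):
            substitution = prev[j - 1] + (p != c)  # free if labels match
            deletion = prev[j] + 1                 # drop p from predicted
            insertion = curr[j - 1] + 1            # add c to predicted
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical label sequences, chosen only to exercise the function.
correct = ["sil", "k", "ae", "t", "sil"]
predicted = ["sil", "k", "eh", "t", "s", "sil"]
print(levenshtein(predicted, correct))  # prints 2: one substitution, one extra phoneme
```

A distance of zero means the predicted sequence matches the correct one exactly, so a learner that drives this quantity down is directly targeting recognition errors at the phoneme level.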
