Machine Learning for Speech Processing

Phonetic Recognition

Title

Large Margin Discriminative Semi-Markov Model for Phonetic Recognition

Authors

Sungwoong Kim, Sungrack Yun and Chang D. Yoo

Abstract

This paper considers a large margin discriminative semi-Markov model (LMSMM) for phonetic recognition. The hidden Markov model (HMM) framework often used for phonetic recognition assumes only local statistical dependencies between adjacent observations and predicts a label for each observation without explicit phone segmentation. In contrast, the semi-Markov model (SMM) framework segments and labels sequential data simultaneously, using a segment-based Markovian structure that assumes statistical dependencies among all the observations within a phone segment. Because phonetic recognition is inherently a joint segmentation and labeling problem, the SMM framework has the potential to outperform the HMM framework at the cost of a slight increase in computational complexity. The SMM framework considered in this paper is based on a non-probabilistic discriminant function that is linear in a joint feature map intended to capture long-range statistical dependencies among observations. The parameters of the discriminant function are estimated within a large margin learning framework for structured prediction. This estimation leads to an optimization problem with many margin constraints, which is solved using a stochastic gradient descent algorithm. The proposed LMSMM outperformed the large margin discriminative HMM on the TIMIT phonetic recognition task.
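Joint segmentation and labeling with a segment-level linear score can be illustrated with a semi-Markov Viterbi search. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the discriminant function is assumed to decompose into per-segment terms that are linear in hypothetical parameters `W_emit` (label-specific weights on the summed frames of a segment), `W_dur` (a label-specific duration weight), and `W_trans` (label-transition weights), and segment length is capped at a hypothetical `max_dur` frames. Because each segment score sees all frames in the segment, the search must consider every segment of up to `max_dur` frames ending at each frame, which is the source of the extra cost relative to a frame-level Viterbi search.

```python
import numpy as np

def segment_score(x, start, end, label, W_emit, W_dur):
    """Hypothetical linear segment score: a label-specific weight vector applied
    to the summed frames x[start:end], plus a label-specific duration weight."""
    dur = end - start
    return float(W_emit[label] @ x[start:end].sum(axis=0) + W_dur[label, dur - 1])

def semi_markov_viterbi(x, W_emit, W_dur, W_trans, max_dur):
    """Jointly segment and label frames x (T x D).

    Returns a list of (start, end, label) segments and the total score.
    """
    T, K = x.shape[0], W_emit.shape[0]
    # V[t, y]: best score of x[:t] whose last segment has label y and ends at t.
    V = np.full((T + 1, K), -np.inf)
    back = {}  # (t, y) -> (segment start, previous label or None)
    for t in range(1, T + 1):
        for y in range(K):
            for d in range(1, min(max_dur, t) + 1):
                s = t - d
                seg = segment_score(x, s, t, y, W_emit, W_dur)
                if s == 0:  # first segment: no incoming transition
                    cand, prev = seg, None
                else:
                    prev = int(np.argmax(V[s] + W_trans[:, y]))
                    cand = V[s, prev] + W_trans[prev, y] + seg
                if cand > V[t, y]:
                    V[t, y] = cand
                    back[(t, y)] = (s, prev)
    # Trace back from the best label at the final frame.
    y = int(np.argmax(V[T]))
    score = float(V[T, y])
    segments, t = [], T
    while t > 0:
        s, prev = back[(t, y)]
        segments.append((s, t, y))
        t, y = s, prev
    return segments[::-1], score

if __name__ == "__main__":
    # Toy input: three frames equal to [1, 0] followed by three equal to [0, 1].
    x = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
    W_emit = np.eye(2)                  # label 0 favors dim 0, label 1 favors dim 1
    W_dur = np.zeros((2, 3))            # no duration preference
    W_trans = np.array([[-1.0, 0.0],    # penalize consecutive segments
                        [0.0, -1.0]])   # that repeat the same label
    segs, score = semi_markov_viterbi(x, W_emit, W_dur, W_trans, max_dur=3)
    print(segs, score)  # [(0, 3, 0), (3, 6, 1)] 6.0
```

With the penalty on repeated labels, the search returns two three-frame segments instead of six one-frame segments, showing how segment boundaries and labels are decided together.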

Figure 1. An undirected graph of the discriminative SMM.

Figure 2. The circle, rectangle, and triangle denote the values of the discriminant function for the correct segment sequence and for two incorrect segment sequences, respectively. With margin scaling, the rectangle, which has a high loss, is pushed farther from the circle than the triangle, which has a low loss.
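The margin-scaling idea in this caption can be written as a structured hinge loss: the score of the correct segment sequence must exceed the score of each incorrect sequence by at least that sequence's loss. The sketch below is a generic illustration, assuming each candidate segment sequence has already been mapped to a feature vector and a loss value (in practice the most violating sequence would be found by loss-augmented decoding rather than by enumeration). `margin_scaled_hinge`, `sgd_step`, and the learning-rate and regularization values are hypothetical, and the L2-regularized subgradient step is a standard choice, not necessarily the exact update used in the paper.

```python
import numpy as np

def margin_scaled_hinge(w, phi_true, rivals):
    """Structured hinge loss with margin scaling.

    rivals: list of (phi, loss) pairs, one per incorrect segment sequence,
    where loss >= 0 measures how wrong that sequence is.
    Returns the hinge value and the feature vector of the most violating
    rival (None if every margin constraint is satisfied).
    """
    true_score = float(w @ phi_true)
    worst, worst_phi = 0.0, None
    for phi, loss in rivals:
        # Required gap equals the rival's loss, so high-loss rivals are pushed farther.
        violation = loss + float(w @ phi) - true_score
        if violation > worst:
            worst, worst_phi = violation, phi
    return worst, worst_phi

def sgd_step(w, phi_true, rivals, lr=0.1, reg=0.01):
    """One subgradient step on reg/2 * ||w||^2 + margin_scaled_hinge."""
    _, phi_hat = margin_scaled_hinge(w, phi_true, rivals)
    grad = reg * w
    if phi_hat is not None:
        grad = grad + phi_hat - phi_true
    return w - lr * grad

if __name__ == "__main__":
    phi_circle = np.array([1.0, 0.0])            # correct segment sequence
    rivals = [(np.array([0.0, 1.0]), 2.0),       # "rectangle": high loss
              (np.array([0.5, 0.5]), 0.5)]       # "triangle": low loss
    w = np.zeros(2)
    for _ in range(100):
        w = sgd_step(w, phi_circle, rivals)
    s = [w @ phi_circle] + [w @ phi for phi, _ in rivals]
    print("gap to rectangle:", s[0] - s[1], "gap to triangle:", s[0] - s[2])
```

After training on this toy example, the score gap between the correct sequence and the high-loss rival ends up roughly twice the gap to the low-loss rival, mirroring the picture in the figure.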

Figure 3. Evolution of phone error rates on the development set with the hard-max and soft-max (LMSMM, 1-mix).
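The caption compares hard-max and soft-max variants. Assuming, as is common in large margin training of HMMs, that the hard-max takes the single most violating competing sequence while the soft-max replaces that max with a log-sum-exp over all competitors, the two losses can be sketched as follows. The function names and the `temperature` parameter are illustrative assumptions, not taken from the paper. Since a log-sum-exp is never smaller than the max, the soft-max loss upper-bounds the hard-max loss, and it approaches the hard-max loss as the temperature goes to zero.

```python
import numpy as np

def hard_max_loss(true_score, rival_scores, rival_losses):
    """Hinge on the single most violating rival (loss-augmented max)."""
    a = np.asarray(rival_losses, dtype=float) + np.asarray(rival_scores, dtype=float)
    return max(0.0, float(np.max(a)) - true_score)

def soft_max_loss(true_score, rival_scores, rival_losses, temperature=1.0):
    """Same hinge with the max replaced by a temperature-scaled log-sum-exp."""
    a = np.asarray(rival_losses, dtype=float) + np.asarray(rival_scores, dtype=float)
    a = a / temperature
    m = np.max(a)  # subtract the max before exponentiating, for numerical stability
    lse = temperature * (m + np.log(np.sum(np.exp(a - m))))
    return max(0.0, float(lse) - true_score)

if __name__ == "__main__":
    scores, losses = [0.0, 0.5], [2.0, 0.5]
    print(hard_max_loss(1.0, scores, losses))         # 1.0
    print(soft_max_loss(1.0, scores, losses))         # about 1.313
    print(soft_max_loss(1.0, scores, losses, 0.01))   # about 1.0
```

The soft-max version spreads the gradient over all competitors in proportion to how strongly they violate the margin, whereas the hard-max version updates only against the worst one.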

Related Papers

1. Sungwoong Kim, Sungrack Yun, and Chang D. Yoo, "Large Margin Discriminative Semi-Markov Model for Phonetic Recognition," to appear in IEEE Transactions on Audio, Speech, and Language Processing, November 2011.

2. Sungwoong Kim, Sungrack Yun, and Chang D. Yoo, "Large margin training of semi-Markov model for phonetic recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Dallas, USA, 2010.

3. Sungwoong Kim, Sungrack Yun, and Chang D. Yoo, "Margin-Enhanced Maximum Mutual Information Estimation for Hidden Markov Models," in Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE 2009).