Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification
Sungrack Yun, Chang D. Yoo
This paper considers a learning framework for
speech emotion classification using a discriminant function based
on Gaussian mixture models (GMMs). The GMM parameter set
is estimated by margin scaling with a loss function to reduce the
risk of predicting emotions with high loss. Here, the loss function
is defined as a function of a distance metric based on Watson and
Tellegen’s emotion model. Margin scaling is known to have good
generalization ability and can be considered appropriate for emotion
modeling where the parameter set is likely to be over-fitted to
the training data set whose characteristics may differ from those
of the testing data set. Our learning framework is formulated as a
constrained optimization problem which is solved using semi-definite
programming. Three tasks were evaluated: acted emotion
classification, natural emotion classification, and cross-database
emotion classification. In each task, four loss functions were evaluated.
In all experiments, results consistently show that margin
scaling improves classification accuracy over learning
frameworks based on maximum likelihood, maximum mutual
information, and max-margin learning without margin scaling.
Experimental results also show that margin scaling substantially
reduces the overall loss compared to the max-margin framework
without margin scaling.
The discriminant functions of
the true label and the two incorrect labels are denoted by a circle,
a rectangle, and a triangle, respectively. Suppose the loss between the circle
and the rectangle is larger than that between the circle and the triangle.
Scaling the separation margin by the loss places the rectangle
farther from the circle than the triangle.
This reduces the risk of predicting
the rectangle, which has high loss.
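The effect described in this caption can be sketched numerically. The following is a minimal illustration, not the authors' semi-definite programming formulation: it assumes a hinge-style constraint in which the discriminant score of the true label must exceed the score of each incorrect label by at least the loss between them. The function name `margin_violations`, the scores, and the loss values are all hypothetical.

```python
def margin_violations(scores, true_label, loss):
    """Return the hinge violation of each incorrect label under margin scaling.

    scores: dict mapping label -> discriminant value (higher means preferred)
    loss:   dict mapping incorrect label -> loss of predicting it instead of
            true_label (hypothetical values, not taken from the paper)
    """
    violations = {}
    for label, score in scores.items():
        if label == true_label:
            continue
        # Margin scaling: the required separation grows with the loss.
        required_margin = loss[label]
        achieved_margin = scores[true_label] - score
        violations[label] = max(0.0, required_margin - achieved_margin)
    return violations


# Both incorrect labels are equally far from the true label in score,
# but the rectangle carries the larger loss.
scores = {"circle": 2.0, "rectangle": 1.0, "triangle": 1.0}
loss = {"rectangle": 3.0, "triangle": 0.5}
violations = margin_violations(scores, "circle", loss)
```

With these toy numbers the triangle already satisfies its small required margin, while the rectangle still violates its larger one, so training under this assumption would push the rectangle further away.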
Watson and Tellegen’s model (WTM). It shows the trait or tendency of a person in expressing an emotion.
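As a rough illustration of a loss defined through a distance metric on an emotion model, the sketch below places labels at points in a two-dimensional plane and uses the Euclidean distance between them as the loss. The labels `A`, `B`, `C`, their coordinates, and the choice of Euclidean distance are assumptions made for illustration only; they are not the placements or the loss functions used in the paper.

```python
import math

# Hypothetical 2-D positions of three emotion labels on an emotion plane.
# These coordinates are placeholders, NOT values from the WTM or the paper.
POSITIONS = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "C": (1.0, 0.0),
}


def distance_loss(true_label, predicted_label):
    """Loss of predicting predicted_label when true_label is correct,
    taken here to be the Euclidean distance between the two positions."""
    return math.dist(POSITIONS[true_label], POSITIONS[predicted_label])


loss_ab = distance_loss("A", "B")  # far apart: large loss
loss_ac = distance_loss("A", "C")  # close together: small loss
loss_aa = distance_loss("A", "A")  # correct prediction: zero loss
```

Under this assumption, confusing two labels that lie far apart costs more than confusing two neighbouring labels, which is the property margin scaling exploits.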
AVERAGE ACCURACY (%) OF CORRECT CLASSIFICATION ON THE TESTING DATA SET OF THE EMO-DB
AVERAGE ACCURACY (%) OF CORRECT CLASSIFICATION ON THE TESTING DATA SET OF THE SUSAS
AVERAGE ACCURACY (%) OF CORRECT CLASSIFICATION ON THE TESTING DATA SET OF THE DES
Sungrack Yun and Chang D. Yoo, "Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 585-598, February 2012.