Machine Learning for Speech Processing

Emotion Recognition

Title

Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification

Authors

Sungrack Yun, Chang D. Yoo

Abstract

This paper considers a learning framework for speech emotion classification using a discriminant function based on Gaussian mixture models (GMMs). The GMM parameter set is estimated by margin scaling with a loss function to reduce the risk of predicting emotions with high loss. Here, the loss function is defined as a function of a distance metric on Watson and Tellegen’s emotion model. Margin scaling is known to generalize well and is therefore appropriate for emotion modeling, where the parameter set is likely to be over-fitted to a training data set whose characteristics may differ from those of the testing data set. The learning framework is formulated as a constrained optimization problem, which is solved using semi-definite programming. Three tasks were evaluated: acted emotion classification, natural emotion classification, and cross-database emotion classification. In each task, four loss functions were evaluated. In all experiments, margin scaling consistently improves classification accuracy over learning frameworks based on maximum likelihood, maximum mutual information, and max-margin without margin scaling. Experimental results also show that margin scaling substantially reduces the overall loss compared to the max-margin framework without margin scaling.

 


The discriminant functions of the true label and of two incorrect labels are denoted by a circle, a rectangle, and a triangle, respectively. Let the loss between the circle and the rectangle be larger than that between the circle and the triangle. When the separation margin is scaled by the loss, the rectangle is placed further from the circle than the triangle is. This reduces the risk of predicting the rectangle, which has a high loss.
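The margin-scaling idea in this caption can be illustrated with a small sketch. It is not the paper's method: the paper uses GMM-based discriminant functions and solves the resulting problem with semi-definite programming, whereas the function name `margin_scaled_hinge`, the scores, and the loss values below are hypothetical and chosen only to show how a larger loss demands a larger separation margin.

```python
def margin_scaled_hinge(scores, true_label, loss):
    """Return the largest margin violation under margin scaling.

    scores: dict mapping label -> discriminant value (higher is better)
    loss:   dict mapping (true_label, other_label) -> non-negative loss
    """
    violations = []
    for label, score in scores.items():
        if label == true_label:
            continue
        # The required margin grows with the loss of confusing the two labels.
        required_margin = loss[(true_label, label)]
        actual_margin = scores[true_label] - score
        violations.append(max(0.0, required_margin - actual_margin))
    return max(violations, default=0.0)


# Hypothetical values: both incorrect labels score equally,
# but confusing circle with rectangle costs more.
scores = {"circle": 2.0, "rectangle": 0.5, "triangle": 0.5}
loss = {("circle", "rectangle"): 2.0, ("circle", "triangle"): 1.0}

hinge = margin_scaled_hinge(scores, "circle", loss)
print(hinge)
```

With these values the triangle already satisfies its margin, while the rectangle still violates its larger one, so training would keep pushing the rectangle further from the circle.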

 


Watson and Tellegen’s model (WTM), which shows a person’s trait or tendency in expressing an emotion.
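The abstract defines the loss as a function of a distance metric on this model. A minimal sketch of one such loss follows, assuming each emotion is a point in a two-dimensional plane and using Euclidean distance. The coordinates, the emotion set, and the name `distance_loss` are hypothetical and for illustration only; they are not taken from the paper, which evaluates four loss functions.

```python
import math

# Hypothetical 2-D positions for a few emotions (illustration only,
# not the placements used in the paper).
coords = {
    "neutral": (0.0, 0.0),
    "happy": (1.0, 0.0),
    "angry": (0.0, 1.0),
    "sad": (-1.0, 0.0),
}


def distance_loss(true_emotion, predicted_emotion):
    """Loss of predicting one emotion when another is true,
    taken here as the Euclidean distance between their positions."""
    (xt, yt), (xp, yp) = coords[true_emotion], coords[predicted_emotion]
    return math.hypot(xt - xp, yt - yp)


print(distance_loss("happy", "sad"))
print(distance_loss("happy", "angry"))
```

Under this assumption, emotions that lie far apart in the plane incur a larger loss when confused than emotions that lie close together, and a correct prediction incurs zero loss.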

 


Average accuracy (%) of correct classification on the EMO-DB testing data set.

 


Average accuracy (%) of correct classification on the SUSAS testing data set.

 


Average accuracy (%) of correct classification on the DES testing data set.

Related Papers

Sungrack Yun and Chang D. Yoo, "Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification", IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 585-598, February 2012.