Machine Learning for Speech Processing

Phonetic Recognition

Title

Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Authors

Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim and Chang D. Yoo

Abstract

This paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS) for phoneme classification. The proposed model overcomes the limitations of the classical hidden Markov model (HMM) in modeling speech data by adopting a nonlinear and nonparametric model. The Gaussian process (GP) prior on the dynamics function allows the complex dynamic structure of speech to be represented better than with an HMM, while the GP prior on the emission function models the global dependency over the observations. Additionally, we introduce a variance constraint to the original VGPDS to mitigate the sparse approximation error of the kernel matrix. The effectiveness of the proposed model is demonstrated with three sets of experimental results: parameter estimation, and classification performance on synthetic and benchmark datasets.

[Figure: 123.jpg]

Graphical representations of (left) the left-to-right HMM and (right) the VGPDS. In the left figure, y_n and x_n are the observation and the discrete latent state. In the right figure, y_ni, f_ni, x_nj, g_nj, and t_n are the observation, emission function point, latent state, dynamics function point, and time, respectively. All function points in the same plate are fully connected.
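The generative structure on the VGPDS side of the figure can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes squared-exponential kernels, additive Gaussian noise at both levels, and illustrative hyperparameters, and the names rbf_kernel and sample_gp_dynamical_system are hypothetical. It only draws a sample from the prior (time to dynamics function points to latent states to emission function points to observations) and performs no variational inference.

```python
import numpy as np


def rbf_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between rows of a (N, D) and b (M, D)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by round-off before exponentiating.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def sample_gp_dynamical_system(num_frames=50, latent_dim=2, obs_dim=5,
                               noise_std=0.05, jitter=1e-6, seed=0):
    """Draw one sequence from a GP dynamical system prior (assumed kernels)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, num_frames)[:, None]  # time stamps t_n

    # Dynamics: each latent dimension j has function points g_nj drawn
    # from a GP over time; the latent state x_nj is g_nj plus noise (assumed).
    K_t = rbf_kernel(t, t, lengthscale=0.2) + jitter * np.eye(num_frames)
    g = rng.multivariate_normal(np.zeros(num_frames), K_t, size=latent_dim).T
    x = g + noise_std * rng.standard_normal(g.shape)

    # Emission: each observed dimension i has function points f_ni drawn
    # from a GP over the latent states; y_ni is f_ni plus noise (assumed).
    K_x = rbf_kernel(x, x, lengthscale=1.0) + jitter * np.eye(num_frames)
    f = rng.multivariate_normal(np.zeros(num_frames), K_x, size=obs_dim).T
    y = f + noise_std * rng.standard_normal(f.shape)
    return t, x, y


if __name__ == "__main__":
    t, x, y = sample_gp_dynamical_system()
    print(t.shape, x.shape, y.shape)
```

Because the kernel matrix K_x couples every pair of frames, each observed dimension depends on the whole latent trajectory, which is the global dependency that the abstract contrasts with the frame-wise emissions of an HMM.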


Phonetic classification results on the TIMIT database

Classification accuracy [%]    HMM      VGPDS    CVGPDS
100 segments (10-fold CV)      49.19    48.17    49.36
TIMIT core test                57.83    61.44    61.54
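To make the comparison with the HMM baseline easier to read, the accuracies reported above can be turned into absolute gains and relative error-rate reductions. The short sketch below only does arithmetic on the reported numbers; the helper name gain_over_hmm and the choice of relative error reduction as a summary statistic are assumptions made for illustration, not part of the paper.

```python
# Accuracies [%] copied from the TIMIT results reported above.
accuracy = {
    "100 segments (10-fold CV)": {"HMM": 49.19, "VGPDS": 48.17, "CVGPDS": 49.36},
    "TIMIT core test": {"HMM": 57.83, "VGPDS": 61.44, "CVGPDS": 61.54},
}


def gain_over_hmm(row, model):
    """Return (absolute gain in points, relative error reduction in %)."""
    hmm, other = row["HMM"], row[model]
    absolute = other - hmm
    # Error rate of the baseline is 100 - accuracy.
    relative_error_reduction = 100.0 * absolute / (100.0 - hmm)
    return absolute, relative_error_reduction


if __name__ == "__main__":
    for setting, row in accuracy.items():
        for model in ("VGPDS", "CVGPDS"):
            absolute, relative = gain_over_hmm(row, model)
            print(f"{setting}: {model} vs HMM: {absolute:+.2f} points, "
                  f"{relative:+.1f}% relative error reduction")
```

On the TIMIT core test, for example, CVGPDS is 3.71 percentage points above the HMM, while in the 100-segment setting the unconstrained VGPDS falls 1.02 points below it.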




Related Papers


1. Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim and Chang D. Yoo, “Phoneme Classification using Constrained Variational Gaussian Process Dynamical System,” to appear in NIPS 2012.