Phoneme Classification using Constrained Variational Gaussian Process Dynamical System
Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim and Chang D. Yoo
This paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS) for phoneme classification. By adopting a nonlinear and nonparametric model, the proposed model overcomes the limitations of the classical HMM in modeling speech data. The GP prior on the dynamics function represents the complex dynamic structure of speech better than an HMM can, while the GP prior on the emission function models the global dependency over the observations. Additionally, we introduce a variance constraint on the original VGPDS to mitigate the sparse-approximation error of the kernel matrix. The effectiveness of the proposed model is demonstrated through three experiments: parameter estimation, and classification on synthetic and benchmark datasets.
Graphical representations of (left) the left-to-right HMM and (right) the VGPDS: In the left figure, y_n and x_n denote the observation and the discrete latent state, respectively. In the right figure, y_ni, f_ni, x_nj, g_nj, and t_n denote the observation, emission function point, latent state, dynamics function point, and time, respectively. All function points in the same plate are fully connected.
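The two-layer generative structure described above can be sketched as follows. This is a minimal illustrative sample from a GPDS-style prior, assuming RBF kernels for both layers; the variable names, kernel hyperparameters, and noise level are illustrative assumptions, not the paper's actual settings, and the variational sparse approximation and variance constraint are omitted.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel; inputs may be 1-D (time) or 2-D (latent points).
    if a.ndim == 1:
        sq = (a[:, None] - b[None, :]) ** 2
    else:
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

rng = np.random.default_rng(0)
N, Q, D = 50, 2, 5                      # frames, latent dims, observed dims
t = np.linspace(0.0, 1.0, N)            # time inputs t_n

# Dynamics layer: each latent dimension x_{:,j} is a GP draw over time.
K_t = rbf_kernel(t, t, lengthscale=0.2) + 1e-6 * np.eye(N)
X = rng.multivariate_normal(np.zeros(N), K_t, size=Q).T     # shape (N, Q)

# Emission layer: each observed dimension y_{:,i} is a GP draw over the
# latent trajectory X, plus i.i.d. Gaussian observation noise.
K_x = rbf_kernel(X, X, lengthscale=1.0) + 1e-6 * np.eye(N)
F = rng.multivariate_normal(np.zeros(N), K_x, size=D).T     # shape (N, D)
Y = F + 0.1 * rng.standard_normal((N, D))
```

The nested GPs make every observation depend on the whole latent trajectory through the emission kernel, which is the "global dependency" the abstract contrasts with the HMM's Markov structure.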
[Table: Phonetic classification results on the TIMIT DB, reported on 100 segments (10-fold) and on the TIMIT core test set.]
1. Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim and Chang D. Yoo, "Phoneme Classification using Constrained Variational Gaussian Process Dynamical System," to appear in NIPS 2012.