Machine Learning for Speech Processing

Speech Enhancement

Title

Psychoacoustically constrained and distortion minimized speech enhancement

Authors

Seokhwan Jo, Chang D. Yoo

Abstract

This paper considers a psychoacoustically constrained and distortion minimized speech enhancement algorithm. Noise reduction, in general, leads to speech distortion, and a balanced tradeoff between noise reduction and speech distortion must be attained. A constrained optimization problem is set to reduce noise so that speech distortion is minimized while the sum of speech distortion and residual noise is kept below the masking threshold of the clean speech. Obtaining a solution to the optimization problem may be infeasible under certain conditions, and a slack variable is introduced to allow certain deviation from the constraint conditions. To estimate the power spectral density and also the masking threshold of clean speech, a speech model that assumes coexisting deterministic and stochastic components in speech is used. Experimental results show that the considered algorithm outperforms some of the more popular algorithms in terms of improvement in segmental signal-to-noise ratio (SegSNR), spectral distance (SD), modified Bark spectral distortion (MBSD), and mean opinion score (MOS).
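The trade-off described in the abstract can be illustrated with a toy per-frequency-bin example. This is a minimal sketch under simplifying assumptions, not the formulation from the paper: a single real gain g is applied to the noisy spectrum, speech distortion is modelled as (1 - g)^2 * ps, residual noise as g^2 * pn, and the function name constrained_gain and its parameters ps, pn and threshold are hypothetical. The sketch picks the largest gain (least speech distortion) whose total error stays under the masking threshold, and reports a positive slack when no gain can satisfy the constraint.

```python
import math


def constrained_gain(ps, pn, threshold):
    """Toy per-bin gain selection (illustrative assumption, not the paper's method).

    ps        -- estimated clean-speech power in this bin
    pn        -- estimated noise power in this bin
    threshold -- masking threshold of the clean speech in this bin

    Returns (gain, slack). slack is 0.0 when the constraint
    (1 - g)^2 * ps + g^2 * pn <= threshold can be met, otherwise it is
    the smallest amount by which the constraint has to be relaxed.
    """
    total = ps + pn
    if total <= 0.0:
        # Nothing to enhance in an empty bin: pass it through unchanged.
        return 1.0, 0.0

    # The constraint is the quadratic (ps + pn) g^2 - 2 ps g + (ps - threshold) <= 0.
    # Its discriminant (divided by 4) decides whether any gain is feasible.
    disc = threshold * total - ps * pn
    if disc < 0.0:
        # Infeasible: the Wiener gain minimises the total error, so it needs
        # the least slack, namely (minimum total error) - threshold.
        wiener = ps / total
        slack = ps * pn / total - threshold
        return wiener, slack

    # Feasible: the larger root gives the least speech distortion.
    # A gain above 1 would amplify the signal, so clip it to 1.
    gain = min(1.0, (ps + math.sqrt(disc)) / total)
    return gain, 0.0


if __name__ == "__main__":
    for ps, pn, thr in [(4.0, 1.0, 0.9), (1.0, 1.0, 2.0), (1.0, 1.0, 0.25)]:
        g, slack = constrained_gain(ps, pn, thr)
        print(f"ps={ps} pn={pn} threshold={thr} -> gain={g:.4f} slack={slack:.4f}")
```

In this toy model a gain of 1 is returned whenever the noise is already below the masking threshold, and a nonzero slack signals the bins where the constraint has to be relaxed.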

 

Figure (speech_enhan_3.jpg): MBSD of the considered algorithm and other algorithms in (a) white Gaussian noise, (b) F16 cockpit noise, (c) babble noise, (d) car noise, and (e) helicopter noise.

Related Papers

1. Seokhwan Jo and Chang D. Yoo, “Psychoacoustically constrained and distortion minimized speech enhancement,” IEEE Transactions on Audio, Speech and Language Processing, vol. 18, pp. 2099-2110, November 2010. (IF: 1.848)
2. Seokhwan Jo and Chang D. Yoo, “Psychoacoustically constrained and distortion minimized speech enhancement algorithm,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, pp. 4669-4672, April 2009.

Title

Speech enhancement based on the decomposition of speech into deterministic and stochastic components and psychoacoustic model

Authors

Seokhwan Jo and Chang D. Yoo

Abstract

A novel speech enhancement algorithm based on both a decomposition of speech into coexisting deterministic and stochastic components and a psychoacoustic model is proposed. Noisy speech is first decomposed into deterministic and stochastic components, and each component is then enhanced in a way that preserves its individual characteristics. A psychoacoustic model is taken into account when enhancing the stochastic component, which usually has much lower energy than the deterministic component. Simulation results show that the proposed algorithm performs better than some of the more popular algorithms in terms of segmental signal-to-noise ratio (SegSNR) and speech recognition rate.
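For readers unfamiliar with the segmental SNR measure mentioned in the abstract, the following is a minimal sketch of one common way to compute it. The frame length, the clamping range of -10 dB to 35 dB, and the function name segmental_snr are assumptions made for illustration; the exact settings used in the experiments are not stated here.

```python
import math


def segmental_snr(clean, processed, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average per-frame SNR in dB between a clean and a processed signal.

    Both signals are sequences of floats that are assumed to be time-aligned.
    Frame length and the [lo_db, hi_db] clamping range are illustrative
    defaults, not values taken from the paper.
    """
    n = min(len(clean), len(processed))
    frame_values = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, n - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        if signal_energy == 0.0:
            # Skip silent reference frames, where SNR is undefined.
            continue
        if error_energy == 0.0:
            snr_db = hi_db  # perfect reconstruction: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_values.append(min(hi_db, max(lo_db, snr_db)))
    if not frame_values:
        raise ValueError("no usable frames: signals too short or all silent")
    return sum(frame_values) / len(frame_values)


if __name__ == "__main__":
    clean = [1.0] * 8
    noisy = [0.5] * 8
    enhanced = [0.9] * 8
    gain_db = (segmental_snr(clean, enhanced, frame_len=4)
               - segmental_snr(clean, noisy, frame_len=4))
    print(f"SegSNR improvement: {gain_db:.2f} dB")
```

A SegSNR improvement is then the difference between the score of the enhanced signal and the score of the noisy input, both measured against the same clean reference.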


Figure (speech_enhan_1.jpg): Block diagram for dual excitation speech enhancement using psychoacoustic model.

Figure (speech_enhan_2.jpg): (a) SegSNR improvement of the proposed algorithm and other algorithms in white Gaussian noise. (b) SegSNR improvement in F16 cockpit noise.

Related Papers

1. Seokhwan Jo and Chang D. Yoo, “Speech enhancement based on the decomposition of speech into deterministic and stochastic components and psychoacoustic model,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, USA, vol. 4, pp. 897-900, May 2007.