Feature-based speech intelligibility enhancement in high noise levels

Robert James Conway, Marquette University

Abstract

Background noise can reduce the intelligibility of speech to the point where it is no longer understandable. Conventional signal restoration techniques have not proven adequate for enhancing intelligibility. Therefore, four techniques that exploit the unique properties of speech are proposed. The techniques use features extracted from the noise-corrupted speech that are generally accepted as important factors affecting intelligibility, and they emphasize those features in the noise-corrupted speech with the goal of enhancing its intelligibility. The first technique, referred to as the "cepstral algorithm," uses the cepstrum to isolate the vocal tract and pitch components of the speech. Based on estimates of the first three formants, the vocal tract spectrum is processed to emphasize the presence of the formants. Based on an estimate of the pitch, the high-time portion of the cepstrum is processed to emphasize the "pitch peaks." The second technique, referred to as the "bandpass algorithm," employs a bank of three bandpass filters to emphasize the formants in the speech spectrum, again based on the formant estimates. The pitch estimate is used to construct a comb filter, which is applied to the spectrum to emphasize the pitch harmonics. The third technique, referred to as the "postfiltering algorithm," uses the feature estimates to compute a pole-zero model for the uncorrupted speech. A filter based on this model is then applied to the corrupted speech to emphasize the formants and pitch. The fourth technique, referred to as the "Wiener filtering algorithm," employs a Wiener filter to process the speech. The Wiener filter is constructed to minimize the mean square error with respect to a pole-zero model computed from the feature estimates. The techniques have been used to process speech corrupted with Gaussian white noise at SNRs ranging from -12 dB to +12 dB.
Results of extensive intelligibility tests indicate that with accurate estimates of the features (particularly the formants), all four techniques will enhance the intelligibility of the noise-corrupted speech, providing improvements as high as 25%. The results also show that intelligibility enhancement can be achieved by processing only the spectral magnitude of the speech.
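
As an illustration of the cepstral algorithm's pitch step described above, the following Python sketch computes the real cepstrum of a single frame, splits it into low-time (vocal tract) and high-time (pitch) portions, and scales the cepstral bins around the pitch peak before rebuilding the frame from the modified magnitude and the original phase. It is not the dissertation's implementation: the function names, the low-time cutoff, the 60 to 400 Hz search range, the gain, and the peak width are all assumptions chosen for illustration, and formant emphasis is omitted.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.fft.ifft(log_mag).real


def split_cepstrum(cep, cutoff):
    """Split a cepstrum into low-time and high-time portions.

    The cepstrum of a real frame is symmetric, so bins below `cutoff`
    and their mirror images near the end are kept together.
    `cutoff` (in samples) is an assumed parameter.
    """
    n = len(cep)
    low = np.zeros(n)
    low[:cutoff] = cep[:cutoff]
    low[n - cutoff + 1:] = cep[n - cutoff + 1:]
    high = cep - low
    return low, high


def find_pitch_peak(high, fs, fmin=60.0, fmax=400.0):
    """Return the quefrency index (in samples) of the largest high-time peak.

    The search range fmin..fmax is an assumption, not taken from the source.
    """
    qmin = int(fs / fmax)
    qmax = int(fs / fmin)
    return qmin + int(np.argmax(high[qmin:qmax + 1]))


def emphasize_pitch_peak(cep, peak, gain=2.0, width=2):
    """Scale the cepstral bins around the pitch peak and its mirror image.

    `gain` and `width` are hypothetical tuning parameters.
    """
    out = cep.copy()
    n = len(cep)
    for q in range(peak - width, peak + width + 1):
        out[q] *= gain
        out[n - q] *= gain
    return out


def rebuild_frame(frame, cep):
    """Rebuild a frame from a (possibly modified) cepstrum.

    Only the spectral magnitude is changed; the phase of the input
    frame is reused as-is.
    """
    phase = np.angle(np.fft.fft(frame))
    magnitude = np.exp(np.fft.fft(cep).real)
    return np.fft.ifft(magnitude * np.exp(1j * phase)).real
```

The pitch estimate in hertz is the sampling rate divided by the index returned by `find_pitch_peak`. With a gain of 1.0 the rebuilt frame matches the input, which gives a quick sanity check. Only the spectral magnitude is altered, in line with the observation above that magnitude-only processing can enhance intelligibility.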

This paper has been withdrawn.