Date of Award
Spring 1994
Document Type
Dissertation - Restricted
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical and Computer Engineering
First Advisor
Niederjohn, Russell J.
Second Advisor
Belfore, Lee A.
Third Advisor
Josse, Fabien
Abstract
In many situations where speech is used as a means of transmitting information, the presence of background noise has the effect of reducing the intelligibility of the speech. Often, the speech signal becomes corrupted to the point where it is no longer understandable and the intended message is lost. It is desirable in these situations to have a means of enhancing the noise-corrupted speech so that its intelligibility is restored. Past and current research has indicated that conventional signal restoration techniques are not adequate for this purpose. Therefore, four techniques that exploit the unique properties of speech are proposed. The techniques utilize information about features extracted from the noise-corrupted speech. These features are generally accepted as important factors affecting the intelligibility of speech. The proposed techniques emphasize these features in the noise-corrupted speech signal with the goal of enhancing its intelligibility.
The first technique, referred to as the "cepstral algorithm," utilizes the power cepstrum to isolate the vocal tract and pitch components of the speech. Based on estimates of the first three formants, the vocal tract spectrum is processed to emphasize the presence of the formants. Based on an estimate of the pitch, the high-time portion of the cepstrum is processed to emphasize the "pitch peaks." The second technique, referred to as the "bandpass algorithm," employs a bank of three bandpass filters to emphasize the formants in the speech spectrum, again based on the estimates of the formants. The pitch estimate is used to construct a comb filter which is applied to the spectrum to emphasize the pitch harmonics. The third technique, referred to as the "postfiltering algorithm," utilizes the formant and pitch estimates to compute a pole-zero model for the uncorrupted speech. A filter based on this model is then applied to the corrupted speech to emphasize the formants and pitch. The fourth technique, referred to as the "Wiener filtering algorithm," employs a Wiener filter to process the speech. The Wiener filter is constructed to minimize the mean square error with respect to a pole-zero model computed from the formant and pitch estimates.
The techniques have been used to process speech corrupted with Gaussian white noise at SNRs ranging from -12 dB to +12 dB. Results of extensive intelligibility tests indicate that with accurate estimates of the speech features (particularly the formants), all four techniques will enhance the intelligibility of the noise-corrupted speech. Of the four algorithms, the cepstral algorithm is the most flexible and exhibits the greatest potential as an intelligibility enhancement technique. At an SNR of -12 dB, the cepstral algorithm is shown to provide a 25% increase in intelligibility. This finding is particularly noteworthy considering that, as far as this author is aware, no technique developed thus far has been reported to provide significant levels of intelligibility enhancement for speech that has been degraded by high levels of noise. In addition, the results show that intelligibility enhancement can be achieved by processing only the spectral magnitude of the speech signal. Other techniques have been developed based upon this assumption, but again, as far as this author is aware, no previous work has provided evidence to support the notion that intelligibility enhancement can be achieved without any phase processing. The work presented here provides such evidence.
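As an illustration of the separation step that the cepstral algorithm relies on, the following is a minimal sketch of how a power cepstrum can split one voiced frame into a low-time (vocal tract) part and a high-time (pitch) part. It is not the dissertation's implementation. The function names, the 8 kHz sampling rate, the cutoff of 30 samples, the 60 to 400 Hz pitch search range, and the single-resonance synthetic test frame are all assumptions made for this example, and the formant and pitch emphasis steps themselves are not shown.

```python
import numpy as np


def power_cepstrum(frame, n_fft=1024):
    """Inverse FFT of the log power spectrum of one Hamming-windowed frame."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed, n_fft)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # small floor avoids log(0)
    return np.fft.irfft(log_power, n_fft)


def split_cepstrum(cepstrum, cutoff):
    """Split a cepstrum into low-time and high-time parts.

    `cutoff` (assumed >= 1) is a quefrency index chosen to lie below the
    shortest expected pitch period, in samples.
    """
    low = np.zeros_like(cepstrum)
    low[:cutoff] = cepstrum[:cutoff]
    if cutoff > 1:
        # The cepstrum of a real log spectrum is symmetric: keep the mirror bins.
        low[-(cutoff - 1):] = cepstrum[-(cutoff - 1):]
    return low, cepstrum - low


def estimate_pitch_hz(high_time, fs, f_min=60.0, f_max=400.0):
    """Pick the largest high-time peak inside the assumed pitch range."""
    q_min = int(fs / f_max)
    q_max = int(fs / f_min)
    peak = q_min + int(np.argmax(high_time[q_min:q_max + 1]))
    return fs / peak


def synthetic_voiced_frame(fs=8000, pitch_hz=100.0, formant_hz=500.0, n=1024):
    """Hypothetical test signal: impulse train through one two-pole resonance."""
    period = int(round(fs / pitch_hz))
    excitation = np.zeros(n)
    excitation[::period] = 1.0
    r = 0.95  # pole radius, sets the bandwidth of the resonance
    a1 = 2.0 * r * np.cos(2.0 * np.pi * formant_hz / fs)
    a2 = -r * r
    y = np.zeros(n)
    for i in range(n):
        y[i] = excitation[i]
        if i >= 1:
            y[i] += a1 * y[i - 1]
        if i >= 2:
            y[i] += a2 * y[i - 2]
    return y[256:768]  # skip the start-up transient


fs = 8000
frame = synthetic_voiced_frame(fs=fs)
cep = power_cepstrum(frame)
low, high = split_cepstrum(cep, cutoff=30)
pitch = estimate_pitch_hz(high, fs)
```

In this sketch the low-time part carries the smooth spectral envelope, which is where formant emphasis would be applied, and the high-time part carries the pitch peak. Because only the log power spectrum is used, the phase of the frame is never touched, which is consistent with the magnitude-only processing noted in the abstract.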