Feature-based speech enhancement techniques based on spectral subtraction and Wiener filtering
Abstract
Speech is a means of communicating information. Background noise can reduce the intelligibility of speech, sometimes to the point that it can no longer be understood and the information it carries is lost. It is therefore useful to have some means of enhancing the intelligibility of corrupted speech so that its information is preserved. A number of techniques have been proposed for enhancing speech in a noisy environment. Among the more successful are spectral subtraction, Wiener filtering, iterative Wiener filtering, and constrained iterative Wiener filtering. These techniques concentrate on eliminating the noise in the corrupted signal, on estimating the clean speech model, or on restoring the clean speech model from the noisy signal, but they do not take into account how important speech features such as the formant frequencies, their bandwidths, and their amplitudes are to intelligibility. This dissertation presents four new feature-based speech enhancement techniques and demonstrates, objectively and in some cases subjectively, their improvement over the existing methods; in doing so, it also addresses the significance of using such speech features in the enhancement process. The new techniques are feature-based spectral subtraction, feature-based Wiener filtering, iterative feature-based Wiener filtering, and constrained iterative feature-based Wiener filtering.
In addition, this dissertation addresses two important speech enhancement issues. The first is the usage and limitation of line-spectrum frequencies in speech enhancement. It is shown that, as the signal-to-noise ratio decreases, the line-spectrum frequencies converge to a predictable set of values, determined by the order of estimation, that corresponds to the pure noise case. Heuristically, this study also provides a range of signal-to-noise ratios in which meaningful speech information can be retrieved and a range in which no processing is necessary. The second issue is the termination of iterative Wiener filtering. Previously proposed techniques terminate after a predetermined, fixed number of iterations, so the result may or may not be the best available. The second study extends these techniques to include a termination criterion. The results indicate that, in most cases, the self-terminating techniques combined with the existing iterative processes perform better than simply using a fixed number of iterations.
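The statement that the line-spectrum frequencies converge to values fixed by the order of estimation can be illustrated with a minimal sketch, under assumptions that are not taken from the dissertation: the noise is white, the "speech" is a synthetic pair of sinusoids, and the linear-prediction polynomial A(z) comes from the Yule-Walker equations. For white noise alone A(z) = 1, so the line-spectrum frequencies of an order-p analysis fall at kπ/(p+1), k = 1, ..., p. All function names below are hypothetical.

```python
import numpy as np

def lpc_from_autocorr(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = np.asarray(r, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx]                                   # Toeplitz autocorrelation matrix
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate([[1.0], a])

def lsf_from_lpc(a):
    """Line-spectrum frequencies in radians, strictly inside (0, pi)."""
    ext = np.concatenate([np.asarray(a, dtype=float), [0.0]])
    p_poly = ext + ext[::-1]                     # symmetric polynomial P(z)
    q_poly = ext - ext[::-1]                     # antisymmetric polynomial Q(z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    eps = 1e-6                                   # drop trivial roots at z = 1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

def uniform_lsf(order):
    """Line-spectrum frequencies of A(z) = 1, the white-noise limit."""
    return np.pi * np.arange(1, order + 1) / (order + 1)

def noisy_autocorr(snr_db, order, freqs=(0.3, 1.2)):
    """Autocorrelation of unit-amplitude sinusoids (assumed stand-in for speech)
    plus white noise whose variance is set by the requested SNR."""
    lags = np.arange(order + 1)
    r = sum(0.5 * np.cos(w * lags) for w in freqs)
    r[0] += r[0] / 10 ** (snr_db / 10)           # white noise only affects lag 0
    return r

if __name__ == "__main__":
    order = 4
    for snr_db in (20, 0, -20, -40):
        a = lpc_from_autocorr(noisy_autocorr(snr_db, order), order)
        deviation = np.abs(lsf_from_lpc(a) - uniform_lsf(order)).max()
        print(f"SNR {snr_db:>4} dB: max deviation from noise-only values = {deviation:.4f} rad")
```

In this toy setting the deviation from the evenly spaced noise-only values shrinks as the signal-to-noise ratio drops. It does not reproduce the dissertation's experiments or its signal-to-noise ranges.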
This paper has been withdrawn.