THE EXTRACTION OF FEATURES FROM A SPEECH SIGNAL CORRUPTED BY ADDITIVE NOISE AND THEIR USE FOR SPEECH ENHANCEMENT

JACEK STANISLAW WALICKI, Marquette University

Abstract

The accurate extraction of two particular features from a speech signal affected by additive white noise is investigated. Reliable detection of the fundamental frequency of vocal cord vibration provides important information on the characteristics of the input excitation in the assumed digital model of speech production. The determination of the fundamental (pitch) frequency is meaningful only for the voiced segments of speech; therefore the categorization of speech segments as voiced or unvoiced becomes another important issue. If the speech signal is affected by interfering noise, extraction of these features becomes a very difficult problem. The pitch tracking method consists of determining an integer frequency, between 70 and 300 Hz, which maximizes the weighted sum of the magnitudes of spectral lines in the Discrete Fourier Transform of each analyzed frame of speech. Some additional conditions are added to compensate for the influence of the formant structure on the harmonic sum function. The processing is dependent on the signal-to-noise ratio of the speech material. Extensive testing (for different signal-to-noise ratios) has shown that the pitch detector performs very well for both clean and noisy speech. The voiced/unvoiced (V/UV) classifier uses a pattern recognition approach, specifically the nearest neighbor technique. The parameters used in the V/UV decision are obtained as a byproduct of the pitch tracking procedure. The energy of the noisy signal is one parameter and the variance of the modified harmonic sum function is the second. The variance of this function has the remarkable property of being very resistant to the addition of noise to the speech signal. These parameters create a two-dimensional pattern space in which the two-class discrimination process takes place. The reference sets of measurements are obtained for different signal-to-noise ratios and are then used as the prototype sets in the classification process.
The categorization technique was tested on non-noisy speech and on speech affected by additive noise. . . . (Author's abstract exceeds stipulated maximum length. Discontinued here with permission of school.) UMI
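
As an illustration only, the two steps described in the abstract might be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names, the uniform harmonic weights, the number of harmonics, the Hann window, and the 1 Hz DFT bin spacing are all assumptions, and the formant compensation and signal-to-noise-dependent processing mentioned above are omitted.

```python
import numpy as np


def harmonic_sum_pitch(frame, fs, f_min=70, f_max=300, n_harmonics=5, weights=None):
    """Pick the integer frequency (Hz) whose harmonics carry the largest
    weighted sum of DFT magnitudes. Weights and harmonic count are assumed."""
    frame = np.asarray(frame, dtype=float)
    # Assumption: zero-pad the DFT to fs points so that bin k lies at k Hz.
    # The frame must therefore be shorter than one second.
    n_fft = int(fs)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft))
    if weights is None:
        weights = np.ones(n_harmonics)  # assumed uniform weighting
    candidates = np.arange(f_min, f_max + 1)
    scores = np.zeros(len(candidates))
    for k, w in enumerate(weights, start=1):
        bins = candidates * k              # bin index of the k-th harmonic
        valid = bins < len(spectrum)       # ignore harmonics above Nyquist
        scores[valid] += w * spectrum[bins[valid]]
    return int(candidates[np.argmax(scores)]), scores


def vuv_features(frame, scores):
    """Two-dimensional feature vector: frame energy and the variance of the
    harmonic sum function (here the unmodified version from above)."""
    frame = np.asarray(frame, dtype=float)
    return np.array([np.sum(frame ** 2), np.var(scores)])


def nearest_neighbor_vuv(features, prototypes, labels):
    """Return the label of the prototype closest (Euclidean) to `features`."""
    prototypes = np.asarray(prototypes, dtype=float)
    features = np.asarray(features, dtype=float)
    distances = np.linalg.norm(prototypes - features, axis=1)
    return labels[int(np.argmin(distances))]
```

In this sketch the prototype set passed to `nearest_neighbor_vuv` would be measured separately for each signal-to-noise ratio, as the abstract describes. The two features are left unscaled here; whether and how the original work normalized them is not stated in the abstract.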

This paper has been withdrawn.
