The application of moment and cumulant spectra to formant tracking of speech embedded in noise
Abstract
The estimation of formant frequencies of speech embedded in noise was investigated. Higher-order spectra were chosen as the spectral estimators. Moments and cumulants are defined, and the theoretical foundation of moment and cumulant spectra (polyspectra) is presented. Initial investigations of the properties of polyspectra were conducted using computer-generated, synthetic speech signals of known pitch and formant content. In particular, the computation and interpretation of the bispectrum and trispectrum were studied. The processing of speech using both pitch-synchronous and fixed-duration analysis frames was investigated in detail. An interpretation of the spectra of fixed-duration analysis frames was developed, and a heuristic algorithm for formant frequency estimation using three spectral peaks was established. Several formant tracking algorithms were designed and tested using formant frequency data from three of the utterances in the Marquette University speech database. Both theoretical and empirical bases were developed for using a weighted average (over the analysis frame) of known formant frequencies as target frequencies. Six algorithms were then evaluated using thirty utterances with known formant content from the speech database. The algorithms developed for this work proved superior to existing algorithms in their ability to estimate formant frequencies correctly, particularly at low signal-to-noise ratios (+6 to $-$30 dB). Two of the algorithms were then used to generate formant data for a speech enhancement algorithm. The results of listening tests indicated that the algorithms developed for this work are superior to the existing algorithms.
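For readers unfamiliar with the bispectrum mentioned above, the following is a minimal sketch of a textbook direct (FFT-based) estimator, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over non-overlapping segments. It is not the method used in this work: the function name `bispectrum_direct`, the segment length `nfft`, and the lack of windowing are all assumptions made for illustration. Higher-order spectra are attractive for noisy signals because the third-order cumulants of a Gaussian process are zero, so additive Gaussian noise tends to average out of the estimate.

```python
import numpy as np

def bispectrum_direct(x, nfft=64):
    """Illustrative direct bispectrum estimate (hypothetical helper).

    Splits x into non-overlapping segments of length nfft, removes each
    segment's mean, and averages X(f1) * X(f2) * conj(X(f1 + f2)).
    """
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // nfft
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")

    idx = np.arange(nfft)
    # Bin index of f1 + f2 (wrapped modulo nfft) for every frequency pair.
    sum_idx = (idx[:, None] + idx[None, :]) % nfft

    B = np.zeros((nfft, nfft), dtype=complex)
    for k in range(n_seg):
        seg = x[k * nfft:(k + 1) * nfft]
        seg = seg - seg.mean()          # remove the segment mean
        X = np.fft.fft(seg)
        B += X[:, None] * X[None, :] * np.conj(X[sum_idx])
    return B / n_seg
```

A signal with components at bins 4, 6, and 10 (where 10 = 4 + 6, with matching phases) should produce a large value at `B[4, 6]`, while pairs of bins that carry no energy give values near zero.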
This paper has been withdrawn.