Dr. Dolittle Project: A Framework for Classification and Understanding of Animal Vocalizations
IEEE International Conference on Acoustics, Speech and Signal Processing, 2007: ICASSP; Honolulu, Hawaii, April 15-20, 2007
In this paper, we evaluate the use of appended jitter and shimmer speech features for the classification of human speaking styles and of animal vocalization arousal levels. Jitter and shimmer features are extracted from the fundamental frequency contour and added to baseline spectral features, specifically Mel-frequency cepstral coefficients (MFCCs) for human speech and Greenwood function cepstral coefficients (GFCCs) for animal vocalizations. Hidden Markov models (HMMs) with Gaussian mixture model (GMM) state distributions are used for classification. The appended jitter and shimmer features increase classification accuracy for several illustrative datasets, including the SUSAS dataset for human speaking styles as well as vocalizations labeled by arousal level for African elephant and rhesus monkey species.
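As a rough illustration of the feature-appending idea described in the abstract, the sketch below computes jitter and shimmer with one commonly used definition (mean absolute cycle-to-cycle difference divided by the mean) and appends them to a matrix of baseline spectral features. The abstract does not state the exact formulas or windowing, so the function names, the definitions, and the per-frame layout here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def relative_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    normalized by the mean period (an assumed, commonly used definition)."""
    periods = np.asarray(periods, dtype=float)
    if periods.size < 2:
        raise ValueError("need at least two pitch periods")
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def relative_shimmer(amplitudes):
    """Mean absolute difference between consecutive peak amplitudes,
    normalized by the mean amplitude (same assumed definition as jitter)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    if amplitudes.size < 2:
        raise ValueError("need at least two peak amplitudes")
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


def append_jitter_shimmer(base_features, jitter, shimmer):
    """Append per-frame jitter and shimmer columns to a
    (n_frames, n_coeffs) matrix of baseline features such as MFCCs or GFCCs."""
    base_features = np.asarray(base_features, dtype=float)
    extra = np.column_stack([jitter, shimmer])
    if extra.shape[0] != base_features.shape[0]:
        raise ValueError("jitter/shimmer must have one value per frame")
    return np.hstack([base_features, extra])


# Hypothetical values: four pitch periods (seconds) and peak amplitudes.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [1.0, 0.9, 1.0, 0.9]
jit = relative_jitter(periods)
shim = relative_shimmer(amplitudes)

# Placeholder 3-frame, 12-coefficient baseline matrix (zeros, not real MFCCs).
frames = np.zeros((3, 12))
augmented = append_jitter_shimmer(frames, [jit] * 3, [shim] * 3)
```

The augmented matrix has two extra columns per frame and could then be passed to whatever HMM/GMM classifier is in use in place of the baseline features.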
Li, Xi; Tao, Jidong; Johnson, Michael T.; Soltis, Joseph; Savage, Anne; Leong, Kirsten; and Newman, John D., "Stress and Emotion Classification Using Jitter and Shimmer Features" (2007). Dr. Dolittle Project: A Framework for Classification and Understanding of Animal Vocalizations. 9.