Document Type
Conference Proceeding
Language
eng
Format of Original
5 p.
Publication Date
2014
Publisher
Institute of Electrical and Electronics Engineers
Source Publication
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Original Item ID
doi: 10.1109/ICASSP.2014.6853886
Abstract
This paper introduces the use of three physiologically-motivated features for speaker identification: Residual Phase Cepstrum Coefficients (RPCC), Glottal Flow Cepstrum Coefficients (GLFCC), and Teager Phase Cepstrum Coefficients (TPCC). These features capture speaker-discriminative characteristics from different aspects of glottal source excitation patterns. The proposed physiologically-driven features give better results with lower model complexity, and also provide complementary information that can improve overall system performance even for larger amounts of data. Results on speaker identification using the YOHO corpus demonstrate that these physiologically-driven features are both more accurate than and complementary to traditional mel-frequency cepstral coefficients (MFCC). In particular, the incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.
Recommended Citation
Wang, Jianglin and Johnson, Michael T., "Physiologically-motivated Feature Extraction for Speaker Identification" (2014). Electrical and Computer Engineering Faculty Research and Publications. 61.
https://epublications.marquette.edu/electric_fac/61
Comments
Accepted version. Published as part of the proceedings of the conference, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014: 1690-1694. DOI: 10.1109/ICASSP.2014.6853886. © 2014 Institute of Electrical and Electronics Engineers (IEEE). Used with permission.