Electrical and Computer Engineering Faculty Research and Publications

Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition

Wei-Qiang Zhang, Tsinghua University
Liang He, Tsinghua University
Yan Deng, Tsinghua University
Jia Liu, Tsinghua University
Michael T. Johnson, Marquette University

Document Type

Article

Language

eng

Format of Original

11 p.

Publication Date

2-2011

Publisher

Institute of Electrical and Electronics Engineers

Source Publication

IEEE Transactions on Audio, Speech, and Language Processing

Source ISSN

1558-7916

Original Item ID

doi: 10.1109/TASL.2010.2047680

Abstract

The shifted delta cepstrum (SDC) is a widely used feature extraction for language recognition (LRE). With a high context width due to incorporation of multiple frames, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, which is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability through a heteroscedastic linear discriminant analysis (HLDA) on the full cepstrum matrix. By utilizing block diagonal matrix constraints, the large HLDA problem is then reduced to several smaller HLDA problems, creating a block diagonal HLDA (BDHLDA) algorithm which has much lower computational complexity. The BDHLDA method is finally extended to the GMM domain, using the simpler TFC features during re-estimation to provide significantly improved computation speed. Experiments on NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA results in lower equal error rate (EER) and minimum average cost (Cavg) than either TFC or SDC approaches.
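
The TFC construction described in the abstract (a temporal DCT over a block of cepstral frames, followed by zigzag selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the unnormalized DCT-II scaling, the JPEG-style zigzag direction, the block length, and the number of retained elements are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II of a sequence (the scaling is an assumption)."""
    m = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / m) for n in range(m))
        for k in range(m)
    ]


def zigzag_indices(rows, cols):
    """Index pairs of a rows x cols matrix in JPEG-style zigzag order.

    The scan direction is an assumption; the abstract only says that
    elements are selected "in a zigzag scan order".
    """
    order = []
    for s in range(rows + cols - 1):
        # All (i, j) on the anti-diagonal i + j == s, with i ascending.
        diag = [(i, s - i) for i in range(rows) if 0 <= s - i < cols]
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def tfc_vector(block, num_keep):
    """Hypothetical TFC vector for one block of consecutive cepstral frames.

    block    -- list of M frames, each a list of N cepstral coefficients
    num_keep -- how many zigzag-ordered elements to retain (assumed parameter)
    """
    n_ceps = len(block[0])
    n_frames = len(block)
    # Rows index the cepstral coefficient, columns the temporal DCT index.
    transformed = [
        dct_ii([frame[c] for frame in block]) for c in range(n_ceps)
    ]
    order = zigzag_indices(n_ceps, n_frames)
    return [transformed[i][j] for i, j in order[:num_keep]]


# Usage: four identical frames with two cepstral coefficients each.
example_block = [[1.0, 2.0] for _ in range(4)]
example_tfc = tfc_vector(example_block, 3)
```

Because the zigzag scan starts at the low-index corner of the transformed matrix, truncating it keeps the low-order cepstral and low-order temporal DCT terms first, which is the usual motivation for this kind of ordering.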

Comments

Accepted version. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 2 (February 2011): 266-276. DOI: 10.1109/TASL.2010.2047680. © 2011 Institute of Electrical and Electronics Engineers (IEEE). Used with permission.

Recommended Citation

Zhang, Wei-Qiang; He, Liang; Deng, Yan; Liu, Jia; and Johnson, Michael T., "Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition" (2011). Electrical and Computer Engineering Faculty Research and Publications. 57.
https://epublications.marquette.edu/electric_fac/57

Included in

Computer Engineering Commons, Electrical and Computer Engineering Commons
