Document Type

Article

Language

eng

Format of Original

10 p.

Publication Date

5-2014

Publisher

Acoustical Society of America

Source Publication

Journal of the Acoustical Society of America

Source ISSN

0001-4966

Original Item ID

doi: 10.1121/1.4869088

Abstract

Recent studies on binary masking techniques make the assumption that each time-frequency (T-F) unit contributes an equal amount to the overall intelligibility of speech. The present study demonstrated that the importance of each T-F unit to speech intelligibility varies in accordance with speech content. Specifically, T-F units are categorized into two classes, speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is highly related to the loudness of its target component, while the importance of each speech-absent T-F unit varies according to the loudness of its masker component. Two types of mask errors are also considered, which include miss and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance between the two types of error is conditioned on the SNR level of the input speech signal. Based on these observations, a mask-based objective measure, the loudness weighted hit-false, is proposed for predicting speech intelligibility. The proposed objective measure shows significantly higher correlation with intelligibility compared to two existing mask-based objective measures.
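
As a rough illustration of the kind of mask-based measure the abstract describes, the sketch below computes a weighted hit minus false-alarm score for a binary mask. It is not the paper's definition of the loudness weighted hit-false measure. The function name `weighted_hit_fa`, the flattened-list mask layout, and the per-unit weights `target_w` and `masker_w` (standing in for target and masker loudness) are all assumptions made for this example. It also assumes that speech-present units are those where the ideal mask equals 1.

```python
def weighted_hit_fa(ideal, estimated, target_w, masker_w):
    """Weighted hit rate minus weighted false-alarm rate (illustrative only).

    ideal, estimated: flattened binary masks (1 = keep the T-F unit).
    target_w: hypothetical per-unit weights for speech-present units.
    masker_w: hypothetical per-unit weights for speech-absent units.
    """
    if not (len(ideal) == len(estimated) == len(target_w) == len(masker_w)):
        raise ValueError("all inputs must have the same length")

    hit_num = hit_den = fa_num = fa_den = 0.0
    for i, e, tw, mw in zip(ideal, estimated, target_w, masker_w):
        if i == 1:
            # Speech-present unit: keeping it is a hit, dropping it is a miss.
            hit_den += tw
            if e == 1:
                hit_num += tw
        else:
            # Speech-absent unit: keeping it is a false alarm.
            fa_den += mw
            if e == 1:
                fa_num += mw

    hit = hit_num / hit_den if hit_den > 0 else 0.0
    fa = fa_num / fa_den if fa_den > 0 else 0.0
    return hit - fa
```

With uniform weights this reduces to the plain hit minus false-alarm rate, in which every T-F unit counts equally. With non-uniform weights, an error on a heavily weighted unit moves the score more than an error on a lightly weighted one.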

Comments

Published version. Journal of the Acoustical Society of America, Vol. 135, No. 5 (May 2014): 3007-3016. DOI: 10.1121/1.4869088. © Acoustical Society of America 2014. Used with permission.
