Date of Award

Summer 1992

Degree Type

Thesis - Restricted

Degree Name

Master of Science (MS)

Department

Electrical and Computer Engineering

First Advisor

Niederjohn, Russell J.

Second Advisor

Heinen, James A.

Third Advisor

Mulligan, Michael

Abstract

The use of speech and voice recognition is becoming prevalent throughout the telephone network. Such technology is useful for automated information systems and for secure access to prevent fraudulent use of the telephone network. Automated systems using such technology provide messages to the user to prompt that individual for a response. It is desirable to allow customers to provide a response while hearing the messages. This is known as talk-through. As a result, the customer's speech is corrupted when the messages are present, and it is necessary to process this speech to enhance the recognition process. In this thesis, a method for improving talk-through for speech recognition over telephone lines is investigated. This research stems from work done by researchers at AT&T Bell Laboratories who employed echo cancellation, a common telephone signal processing technology utilizing an adaptive filter, at the front end of an automated speech recognizer. The feasibility of this method is investigated and, based upon drawbacks of the method, a modified version of echo cancellation for improved operation is proposed. The new algorithm, called the modified least-mean-square or MLMS algorithm, introduces delay into the updating of filter coefficients to avoid a condition known as divergence, i.e., maladjustment of coefficients, when both prompting messages and customer speech are present. A demonstration system constructed by US WEST Advanced Technologies implementing speaker-independent word recognition was enhanced by adding versions of conventional echo cancellation and modified echo cancellation. Various tests were then conducted to evaluate the performance of the two methods, and the results are presented here. Results show that both versions of talk-through yield speech recognition results comparable to those of a clean speech benchmark, indicating that talk-through is achievable for a speaker-independent speech recognizer.
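The two echo-cancellation variants described above can be illustrated with a minimal sketch. It is not the thesis's MLMS algorithm, whose exact update rule is not given in this abstract. It assumes a normalized LMS adaptive filter and models the "delay" as applying each coefficient update a fixed number of samples after it was computed. The function name `cancel_echo`, the parameters `taps`, `mu`, and `update_delay`, and the three-tap echo path are all hypothetical choices made for illustration.

```python
from collections import deque
import random


def cancel_echo(prompt, mic, taps=4, mu=0.2, update_delay=0, eps=1e-8):
    """Subtract an adaptive estimate of the prompt echo from the mic signal.

    update_delay=0 gives a conventional normalized LMS canceller.
    update_delay=D applies each coefficient update D samples after it was
    computed (an assumed reading of "delayed updating", not the thesis's rule).
    """
    w = [0.0] * taps                      # adaptive filter coefficients
    x = deque([0.0] * taps, maxlen=taps)  # recent prompt samples, newest first
    pending = deque()                     # updates computed but not yet applied
    residual = []
    for p, d in zip(prompt, mic):
        x.appendleft(p)                   # oldest sample drops off the right
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_estimate             # echo-reduced output sample
        residual.append(e)
        power = sum(xi * xi for xi in x) + eps
        pending.append([mu * e * xi / power for xi in x])
        if len(pending) > update_delay:   # release the oldest queued update
            update = pending.popleft()
            w = [wi + ui for wi, ui in zip(w, update)]
    return residual


def energy(signal):
    return sum(v * v for v in signal)


# Synthetic demo: white-noise "prompt" through a made-up echo path,
# with no customer speech present.
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.1]
prompt = [rng.gauss(0.0, 1.0) for _ in range(3000)]
mic = [
    sum(h * prompt[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(prompt))
]

conventional = cancel_echo(prompt, mic, update_delay=0)
delayed = cancel_echo(prompt, mic, update_delay=4)
```

In this demo both variants only have to learn a fixed echo path from the prompt alone. The case the abstract is concerned with, where customer speech overlaps the prompt, would require adding a near-end signal to `mic`, and the sketch makes no claim about how either variant behaves there.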
Test results, however, show little difference between the two talk-through mechanisms. Also presented in this thesis is a novel way of measuring the performance of a speech recognizer based upon concepts from communications and information theory. The measure, called the mutual information to entropy ratio, is shown to have some advantages over conventional statistical measures of performance. Specifically, it incorporates all possible outcomes of a recognition process (correct, deleted, inserted, and substituted) into one measure.
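A measure of this kind can be sketched from a table of recognition outcomes. The abstract does not give the thesis's exact definition, so the code below assumes the ratio is I(X;Y) / H(X), where X is what was spoken and Y is what the recognizer output, and it assumes deletions and insertions are handled by adding a "nothing" symbol to each side. The function name `mi_entropy_ratio` and the example counts are hypothetical.

```python
import math


def mi_entropy_ratio(counts):
    """Return I(X;Y) / H(X) for a table of outcome counts.

    counts[i][j] is how often reference symbol i was recognized as symbol j.
    An extra "nothing" row holds insertions and an extra "nothing" column
    holds deletions (an assumed convention, not taken from the thesis).
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("counts table is empty")
    p_ref = [sum(row) / total for row in counts]
    p_hyp = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]

    # Entropy of the reference (spoken) symbols, in bits.
    h_ref = -sum(p * math.log2(p) for p in p_ref if p > 0)
    if h_ref == 0:
        raise ValueError("reference entropy is zero; ratio is undefined")

    # Mutual information between reference and recognized symbols, in bits.
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                p = c / total
                mi += p * math.log2(p / (p_ref[i] * p_hyp[j]))
    return mi / h_ref


# Made-up counts. Rows: spoken "yes", spoken "no", nothing spoken.
# Columns: recognized "yes", recognized "no", nothing recognized.
example = [
    [18, 1, 1],  # 18 correct, 1 substitution, 1 deletion
    [2, 17, 1],  # 2 substitutions, 17 correct, 1 deletion
    [1, 1, 0],   # 2 insertions
]
ratio = mi_entropy_ratio(example)
```

Under these assumptions the ratio is 1 when the recognizer output determines the spoken word exactly and 0 when the output is statistically independent of it, so every outcome type feeds a single number between those extremes.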
