Date of Award

Spring 2017

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical and Computer Engineering

First Advisor

Johnson, Michael T.

Second Advisor

Richie, James E.

Third Advisor

Berry, Jeffrey J.

Abstract

The Real-time Articulatory Speech Synthesizer (RASS) is a research tool in the Marquette Speech and Swallowing lab that simultaneously collects acoustic and articulatory data from human participants. The system is used to study acoustic-to-articulatory inversion, articulatory-to-acoustic synthesis mapping, and the effects of real-time acoustic feedback. Electromagnetic Articulography (EMA) is used to collect position data via sensors placed in a subject’s mouth. These kinematic data are then converted into a set of synthesis parameters that controls an articulatory speech synthesizer, which in turn generates an acoustic waveform matching the associated kinematics. Independently from RASS, the synthesized acoustic waveform can be further modified before it is returned to the subject, creating the opportunity for involuntary learning through controlled acoustic feedback. To maximize the impact of involuntary learning, the characteristics of the synthetically generated speech need to closely match those of the participant. Several synthesis parameters cannot be directly controlled by a subject’s articulatory movements, including fundamental frequency and parameters corresponding to physiological measures such as vocal tract length and overall vocal tract size. The goal of this work is to develop a mechanism for automatically determining the RASS internal synthesis parameters that most closely match a subject’s acoustic characteristics, ultimately increasing the system’s positive effect on involuntary learning. The methods detailed in this thesis examine the effects of altering both time-independent and time-dependent synthesis parameters to increase the acoustic similarity between subjects’ real and synthesized speech. In particular, the fundamental frequency and first two formant values are studied across multiple vowels to determine the time-independent parameter settings.
Time-dependent parameter analysis is performed using a real-time parameter-tracking configuration. The results of this work provide a way to adapt the Maeda synthesis parameters in RASS to individual speakers and to individualize the study of auditory feedback. This investigation will allow researchers to better customize RASS for individual subjects and alter involuntary learning outcomes.
