Quantifying speech production and perception in different communication contexts
Speech communication takes multiple forms and styles depending on the speaking context and on the communicative needs of speakers and perceivers. In noisy environments, or when interacting with hearing-impaired or non-native perceivers, speakers often adopt a clarified, exaggerated speech style. This changes both their articulatory movements and the resulting acoustic signal. Such modifications may enhance intelligibility, because perceivers can use clear speech cues from the speaker's face and voice. However, clear speech effects are not always positive. Speech that is exaggerated to the point where different sound categories overlap, or that draws perceivers' attention to the wrong cues, may hinder intelligibility. In this seminar, I present a series of studies that explore clear speech effects by comparing speech sounds produced in clear and casual speech styles. To quantify articulatory features, we analyze videos of speakers' mouth and facial movements during clear and casual productions using computer vision and image processing techniques. Acoustic measurements characterize the physical properties of clear and casual speech. We then test the auditory and visual intelligibility of clear versus casual speech sounds by presenting perceivers with the speaker's voice (acoustic input), the speaker's face (visual input), or both. Finally, we attempt to relate the articulatory, acoustic, and perceptual data in statistical models. The goal is to determine which articulatory and acoustic clear speech cues contribute to enhanced intelligibility and which may be inhibitory. Preliminary results reveal that clear speech modulates compensatory articulatory and acoustic features to increase the contrastivity of different speech sounds, thereby enhancing intelligibility. Challenges in quantifying articulatory and acoustic features, and in establishing production-perception links, are discussed.