November 15, 2023: Abhijit Roy

Considering language-specificity in hearing aid prescription algorithms

Current standards in hearing aid signal processing are not language-specific. A language-aggregated long-term average speech spectrum (LTASS) forms the core of much of the reasoning behind hearing aid amplification protocols and clinical procedures. More recent studies have found this reasoning to be contentious. Various recording procedures (among other factors) can lead to spectral coloration of the signal. The aggregated LTASS in use may suffer from such colorations as well. Here, a language-aggregated LTASS was derived from the ALLSTAR corpus and also from the GoogleTTS AI speech corpus. Results were compared to the original aggregated LTASS. The impact of recording decisions on the expected speech spectrum is also discussed.

November 8, 2023: Lisa Davidson

The phonetic details of word-level prosodic structure: evidence from Hawaiian

Previous research has shown that the segmental and phonetic realization of consonants can be sensitive to word-internal prosodic and metrical boundaries (e.g., Vaysman 2009, Bennett 2018, Shaw 2007). At the same time, other work has shown that prosodic prominence, such as stressed or accented syllables, has a separate effect on phonetic implementation (e.g., Cho and Keating 2009, Garellek 2014, Katsika and Tsai 2021). This talk focuses on the word-level factors affecting glottal and oral stops in Hawaiian. We first investigate whether word-internal prosodic or metrical factors, or prosodic prominence such as stressed syllables, account for the realization of glottal stop, and then we extend the same analysis to the realization of voice onset time (VOT) in oral stops. Data comes from the 1970s-80s radio program Ka Leo Hawaiʻi. Using a variant of Parker Jones’ (2010) computational prosodic grammar, stops were automatically coded for (lexical) word position, syllable stress, syllable weight, and Prosodic Word position. Results show that word-internal metrical structures do condition phonetic realization, but prosodic prominence does not for either kind of stop. Rather, what are often taken to be the “stronger” articulations (i.e., full closure in glottal stops and longer VOT in oral stops) are instead associated with word-internal boundaries or other prosodically weak positions, which may reflect the recruitment of phonetic correlates to disambiguate or enhance potentially less perceptible elements in Hawaiian. (Work in collaboration with ‘Ōiwi Parker Jones)

October 4, 2023: Midam Kim

Trusting Unreliable Genius

Broad availability of Large Language Models is revolutionizing how conversational AI systems can interact with humans, yet the factors that influence user trust in conversational systems, especially systems prone to errors or ‘hallucinations’, remain complex and understudied. In this talk titled “Trusting Unreliable Genius”, we delve into the nuances of trust in AI, focusing on trustability factors like competency, benevolence, and reliability. We begin by examining human conversation dynamics, including the role of interactive alignment and Gricean Maxims. These principles are then juxtaposed with Conversational AI interactions with several state-of-the-art LLM chatbots, offering insights into how trust is cultivated or eroded in this context. We also shed light on the necessity for transparency in AI development and deployment, the need for continuous improvement in reliability and predictability, and the significance of aligning AI with user values and ethical considerations. Building trust in AI is a multifaceted process involving a blend of technology, sociology, and ethics. We invite you to join us as we unravel the complexities of trust in Conversational AI and explore strategies to enhance it.

September 27, 2023: Jacie McHaney, Kevin Sitek

Neural tracking of acoustic and linguistic information in challenging listening conditions and the relationship with speech perception in noise difficulties
Jacie R. McHaney

Communicating in noisy environments is a difficult task that our brains perform exceptionally well. While most listeners can perform this task with relative ease, some individuals particularly struggle with speech perception in noise, and it becomes even more difficult with advancing age. In a series of studies, we examined neural tracking of continuous speech in challenging listening conditions to understand how neural representations of acoustic and linguistic information may impact speech perception in noise difficulties in younger, middle-aged, and older adults. The findings from these studies can help inform our understanding of the mechanism driving speech perception difficulties in aging and in adults with clinically normal hearing.

Mapping the human auditory system and its contributions to speech communication
Kevin R. Sitek

June 7, 2023: Maria Gavino

The thematic coding of heritage bilinguals’ open-ended responses

To better understand the language attitudes and experiences of heritage bilinguals, and how these may affect their performance in a behavioral task, I asked participants in my thesis studies to answer open-ended questions. This talk will discuss the process of thematically coding the responses, as well as the analysis of the thematic coding, which aimed to identify patterns within participants and to see whether their attitudes correlated with their performance in the behavioral task.

April 19, 2023: Seung-Eun Kim

Planning for the future and reacting to the present: proactive and reactive F0 adjustments in speech

A number of studies have examined whether speakers initiate longer utterances with higher F0. However, evidence for such effects is mixed and is mostly based on point estimates of F0 at the beginning of the utterance. Moreover, it is unknown whether utterance length can influence F0 control solely at the response onset or also during the response. We conducted a sentence production task to investigate how control of pitch register – F0 ceiling, floor, and span – is influenced by utterance length. Specifically, we test whether speakers adjust register both in relation to an initially planned utterance length – proactive F0 control – and in response to changes in utterance length that occur after response onset – reactive F0 control. Target sentences in the experiment had one, two, or three subject noun phrases, which were cued with visual stimuli. A novel manipulation was tested in which some visual stimuli were delayed until after participants initiated the response. Evidence for both proactive and reactive control of register was observed. Participants adopted a higher register ceiling and floor as well as a broader span in longer utterances. Furthermore, they decreased the amount of ceiling compression upon encountering delayed stimuli. These findings suggest the existence of a mechanism in which speakers continuously estimate the remaining length of the utterance and use that information to adjust pitch register.

March 29, 2023: Thomas Sostarics

Pitch Accent Variation and the Interpretation of Rising Declaratives

A well-known property of English is the encoding of pragmatic speech act meaning in the pitch pattern at the end of a prosodic phrase. A phrase with declarative syntax that ends in a falling pitch trajectory conveys an assertion, while a final rising pitch trajectory conveys a question. The pitch contour across this region of the phrase is the phonetic implementation of an abstract, phonologically specified representation called the nuclear tune, which is made up of the concatenation of high and low primitives: pitch accents and edge tones.

In this work from my dissertation, I test competing accounts of the locus of intonational encoding of the question/assertion contrast. I examine both rising and falling tunes in the context of ongoing work on rising declaratives to determine which part of the pitch contour encodes this aspect of meaning: is it the region spanning the pitch accent or the region of the edge tones? Furthermore, in light of the pervasive variation in intonational form, I also investigate the degree to which variation in the phonetic implementation of the pitch accent and/or edge tones influences listener interpretation. Across three perception experiments, I find that the pitch accent of a tune does not contribute to assertive force. Rather, the distinction between assertive and inquisitive interpretations is cued primarily by the final F0 of the pitch contour regardless of the pitch accent, although increased overall pitch prominence may trigger a salient focus interpretation that interferes with question/assertion interpretation. These results provide empirical support for leading compositional theories of intonational meaning.