Detailed program for Framing Speech
Framing Speech: Celebrating 40 years of inquiry with Stefanie Shattuck-Hufnagel
14:20 – 14:40 Welcome
14:40 – 16:00 Framing Speech: Oral session 1
Evidence from speech timing patterns for theories of phonology and speech motor control
This paper presents evidence that challenges intrinsic timing theories of phonology. The evidence supports theories based on symbolic phonological representations, and suggests a phonetic planning component that uses general-purpose timekeeping mechanisms to specify the timing of surface intervals and speech movements.
Kids and codas
Since 2007, Stefanie and I have been carrying out acoustic studies of the production of coda stop consonants by children in the age range 2;6 to 3;6. Our goal is to contribute to a developmental model of planning and production, in particular a model in which feature contrasts are signaled by a number of acoustic cues. We have shown that children are much more likely than adults to pre-aspirate unvoiced coda stops, and argued that the pre-aspiration is not due to poor motor control of the larynx. Rather, we suggest that children deliberately produce it because they have trouble producing voicing during the closure of voiced coda stops (they are much less likely than adults to produce such closure voicing). The pre-aspiration serves to increase or enhance the voicing contrast, in a sense “filling in” for the fact that closure voicing is usually absent for voiced coda stops. On the other hand, the subjects are experts at using vowel length to signal the coda voicing contrast, just as adults do. These two results support Ken Stevens’ feature-cue-based model, and contribute to its developmental implications by showing that children master some cues to a contrast early on, while other cues take longer to master. We are now looking at codas other than /p, b, k, g/, specifically /s, t, d/, and hope to provide some data for these sounds.
What entrainment reveals about the cognitive encoding of prosody and its relation to discourse function
Jennifer Cole and Uwe Reichel
What are the units of prosodic encoding in the mental representation of words and phrases, and how do those units contribute to signaling linguistic meaning? We consider evidence from the analysis of prosodic entrainment, whereby conversation partners become more similar to one another in their prosodic expression. Entrained prosody reveals those properties of a speaker’s prosody that are perceptually salient to her conversation partner and subsequently reproduced on the basis of a stored mental representation. We report on a study of prosodic entrainment in American English from game interactions under cooperative and competitive play. Entrainment is evaluated in relation to dialog act condition and measured through coarse, utterance-level f0 measures (mean, s.d.) and more fine-grained measures of global (phrasal) and local (pitch accent) f0 contours obtained from superpositional parametric f0 stylization (Reichel 2014). Linear mixed-effects models confirm predicted effects of dialog condition, with more entrainment in cooperative dialogs for the stylization-based parameters of the f0 contour, but not for the utterance-level parameters. Entrainment effects are found primarily at the beginning of talker turns. These findings suggest that prosody is linked to dialog act in cognitive representation, with an encoding in terms of the f0 contour rather than coarse utterance-level measures.
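The coarse utterance-level measures mentioned in the abstract can be illustrated with a small sketch. This is not the authors’ pipeline (their fine-grained measures come from superpositional parametric f0 stylization, which is not reproduced here); it only shows one simple way to quantify how close two partners’ utterance-level f0 features are, where values nearer to zero mean greater similarity:

```python
import statistics

def utterance_f0_stats(f0_values):
    """Coarse utterance-level f0 features: mean and standard deviation."""
    return statistics.mean(f0_values), statistics.stdev(f0_values)

def proximity(speaker_a_f0, speaker_b_f0):
    """Negated absolute difference of partners' mean f0:
    higher (closer to 0) means the partners are more similar."""
    mean_a, _ = utterance_f0_stats(speaker_a_f0)
    mean_b, _ = utterance_f0_stats(speaker_b_f0)
    return -abs(mean_a - mean_b)
```

Entrainment is then the change in such proximity over the course of the dialog, compared across the cooperative and competitive conditions.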
So, how many pitch accents are there on MASSaCHUsetts?
We will revisit some of the ‘behind-the-scenes’ discussions from the numerous workshops that led to the development of the ToBI labeling conventions some years ago.
16:00 – 17:00 Framing Speech: Poster session
Speech and manual gesture coordination in a pointing task
Jelena Krivokapic, Mark Tiede, Martha Tyrone and Dolly Goldenberg
This study explores the coordination between manual pointing gestures and gestures of the vocal tract. Using a novel methodology that allows for concurrent collection of audio, kinematic body and speech articulator trajectories, we ask 1) which particular speech gesture (vowel gesture, consonant gesture, or tone gesture) the pointing gesture is coordinated with, and 2) at which landmarks the two gestures are coordinated (for example, whether the pointing gesture is aligned with the onset or the maximum displacement of the speech gesture). Preliminary results indicate coordination of the intonation gesture and the pointing gesture.
A cod pod thawed out: effects of superimposed prosodic structure on speech errors in alternating CVC sequences observed kinematically
Argyro Katsika, Mark Tiede, Christine Mooshammer, Louis Goldstein and Stefanie Shattuck-Hufnagel
Electromagnetic articulometry (EMA) has been used to record the production of alternating CVC sequences in differing prosodic contexts. These ranged over simple alternation, superimposed phrasal boundaries, imitation of sentential stress, and within-sentence embedding. EMA movement data were obtained at 200 Hz from sensors placed on the tongue, lips and mandible, corrected for head movement and aligned to the occlusal plane. Synchronized audio was recorded at 16 kHz. Pouplier (2008) has shown that alternating CVCs result in (possibly incomplete) inappropriate constrictions of the anti-phase articulator (“intrusions”; e.g., tongue dorsum constriction coincident with the target bilabial closure in “pod” of a “pod cod” sequence), as well as unachieved constrictions of the in-phase target articulator (“reductions”). “Substitutions” represent a combination of complete intrusion and complete reduction. The hypothesis tested in this work is that as sequences become more sentence-like, the rate of substitution errors as a percentage of all observed errors will increase. To test this we examine the phase and amplitude deviations of EMA trajectories from individual trials measured from their cross-trial aggregate, aligned using nonlinear time-warping.
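The deviation measurement described above can be sketched in miniature. This is only an illustration of the distance idea: the study’s alignment uses nonlinear time warping to extract separate phase and amplitude deviations, which is more involved than the single classic dynamic-time-warping distance computed here between each trial and the cross-trial mean trajectory:

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance between two 1-D trajectories."""
    n, m = len(x), len(y)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def deviation_from_aggregate(trials):
    """Score each trial by its DTW distance from the pointwise mean
    trajectory across all trials (trials assumed equal length here)."""
    template = [sum(vals) / len(vals) for vals in zip(*trials)]
    return [dtw_distance(trial, template) for trial in trials]
```

A trial whose articulator trajectory drifts toward an intrusion or reduction would show a larger deviation from the aggregate than a canonical production.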
Structure from Process: How Lexical Influences on Speech Perception Shape Phonotactic Constraints
This work explores the relationship between linguistic structure and language processing, a primary theme in Stefanie Shattuck-Hufnagel’s work. Top-down lexical influences on speech perception have long been a controversial topic in psycholinguistics, in large part because standard research paradigms do not support strong inferences about how different types of representations interact in processing. We will review a series of studies that use Granger causality analysis of high spatiotemporal resolution imaging data to reveal the nature of these interactions. Across a series of studies, we have identified a pattern of influence by the supramarginal gyrus, a brain area involved in lexical representation, on the posterior superior temporal gyrus, a region involved in acoustic-phonetic processing, in instances of apparent top-down lexical influence on speech perception. The same pattern is found in cases in which speech perception and word processing are influenced by phonotactic constraints on phonological structure. We hypothesize that phonotactic constraints are a byproduct of top-down influences on speech perception that facilitate the perception of spoken language.
The importance of being lexical: The case of the post-lexical Persian word accent
In contrast to all previous accounts of Persian prosody, we argue that the word accent in Persian, earlier described as ‘word stress’, is exclusively governed by the morphosyntax and is assigned postlexically.
Are there boundary tones in Mandarin Chinese echo questions?
Edward Flemming and Helen Nie
Mandarin Chinese echo questions are distinguished from declaratives by intonation alone, but it is not obvious that the intonational distinction can be characterized in terms of the familiar elements of intonation, i.e. pitch accents and boundary tones, because the F0 trajectory at the end of a question is determined primarily by the lexical tone of the final syllable. Echo questions are marked by an optional increase in overall pitch range and modifications to the final tone that have been characterized as a further expansion of pitch range. We explore an account according to which these modifications to the final tone are due to the presence of a high boundary tone, whose realization differs from familiar boundary tones because it is realized simultaneously with the final lexical tone. The conflict between the simultaneous demands of lexical tone and boundary tone is resolved by compromise between their conflicting targets.
Voice Quality Changes in Words with Stød
Gert Foget Hansen
Stød is a prosodic feature occurring in Danish. The most conspicuous acoustic trait of stød in its prototypical form is a short stretch of irregular vocal fold vibrations, i.e. creak. However, creak is neither necessary nor sufficient to characterize stød: the occurrence of creak is not limited to syllables with stød, and distinct and clear realizations of stød need not exhibit creak. To account for the inconsistent occurrence of irregular vocal fold vibrations in stød, it is hypothesized that stød could be explained as a relative and dynamic voice quality movement in the form of a brief change from less to more compressed voice, potentially but not necessarily involving creaky voice. To test the hypothesis, changes in voice quality are traced over the course of comparable syllables with and without stød using a set of voice quality related acoustic measures. The results demonstrate that the timing of the peak level of compression need not coincide with the occurrence of irregular vibrations. As a consequence of these findings, the proposed stød hypothesis is rejected. Moreover, the results challenge the underlying models of voice quality, as they do not conform to generally accepted assumptions about the relation between creaky voice and compression.
Glottalization in LAGS: Exploring a Potential Prosodic Marker in a Historical Speech Corpus
Glottalization in vowel-initial words has been shown to occur frequently at the start of intonational phrase units (IPU) and intermediate phrases (ip), and on pitch-accented words, indicating that glottalization serves as an acoustic correlate of prosodic structure (Dilley, Shattuck-Hufnagel, & Ostendorf 1996; Garellek 2013; Pierrehumbert 1995). Building on previous work analyzing Boston radio news speech and California lab speech, this study utilizes the Linguistic Atlas of the Gulf States (LAGS), an extensive sociolinguistic corpus (Pederson et al. 1986), to examine glottalization of vowel-initial words in conversational speech in the southern U.S. The speech examined here was produced in 1972 by 10 informants (5 M; M=63.7 years; ~36 hours of speech transcribed by Renwick & Olsen 2016) in southeast Georgia. Commonly used vowel-initial words (n=200) were annotated for glottalization (+/-), prosodic phrase position (start of IPU, start of ip, mid-phrase), and pitch accent (+/-). In line with previous work, glottalization rates closely mirror prosodic phrase prominence, with the highest rates occurring at the start of IPUs and the lowest mid-phrase. Furthermore, glottalization at the start of IPUs occurs even on non-pitch-accented words, whereas glottalization of non-pitch-accented words is not frequent elsewhere, suggesting that phrase position outranks stress in determining glottalization.
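The rate comparison described here reduces to a cross-tabulation of annotated tokens. A minimal sketch (the position labels and token format are illustrative, not the corpus’s actual annotation scheme):

```python
from collections import defaultdict

def glottalization_rates(tokens):
    """tokens: (position, glottalized) pairs, where position is one of
    'IPU-initial', 'ip-initial', 'mid-phrase'. Returns the proportion
    of glottalized tokens per phrase position."""
    counts = defaultdict(lambda: [0, 0])  # position -> [glottalized, total]
    for position, glottalized in tokens:
        counts[position][0] += int(glottalized)
        counts[position][1] += 1
    return {pos: g / n for pos, (g, n) in counts.items()}
```

Comparing the resulting rates across positions (and splitting tokens by pitch accent) gives the prominence-mirroring pattern the abstract reports.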
Perceptual and acoustic study of voice quality in high-pitched heavy metal singing
This paper studies the high registers of heavy metal singing using the voice profile analysis scheme (VPAS) and acoustic correlates. The f0 range varied from 366 to 666 Hz. The application of VPAS to singing is unique in the phonetic literature. Two professional and two amateur singers sang Iron Maiden’s Aces High with instrumental playback through headphones. Two very high-register excerpts were selected from this song to examine the vocal strategies used by experienced singers while singing at the extremes of their vocal range. Experienced judges (vocal coaches, speech therapists, phoneticians) analyzed their vocal productions by perceptual analysis of voice quality and voice dynamics with VPAS. The acoustic analyses were run with the software VoiceSauce, which automatically extracted thirteen long-term parameters (H1, H1H2, H1A3, CPP, Energy, HNR05, HNR15, HNR25, HNR35, F1, F2, B1, B2). Results indicate that the two groups of singers use distinct articulatory strategies when singing at high registers and that these setting strategies influence the acoustic measures. Although both groups of singers used tense vocal tract and larynx settings, open jaw and raised larynx settings were found only in the professional voices. These different articulatory settings were statistically corroborated by the acoustic analysis.
Charisma in business speeches: A contrastive acoustic-prosodic analysis of Steve Jobs and Mark Zuckerberg
Oliver Niebuhr, Alexander Brem, Eszter Novák-Tót and Jana Voße
Based on the prosodic features of charisma identified in previous prosodic studies, we provide the first-ever acoustic profiles of Steve Jobs’ and Mark Zuckerberg’s business speeches. We analyzed a sample of about 45 minutes from iPhone/iPad or “F8” presentations. Our results show that Jobs and Zuckerberg both stand out against a reference sample of ordinary speakers from the prosodic literature. However, Jobs stands out even more, and thus significantly differs from Zuckerberg, in almost all prosodic parameters known from previous studies to be associated with charisma. In addition, both CEOs’ prosody differed significantly between the customer-oriented and investor-oriented sections of their speeches, albeit mostly in opposite directions. In summary, we show that the prosodic features of charisma in political speeches also apply to business speeches. Consistent with public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences between Jobs and Zuckerberg and between the investor- and customer-related sections of their speeches support the modern understanding of charisma as a gradual, multiparametric, and context-sensitive concept.
Development of phonetic variants (allophones) in 2-year-olds learning American English: A study of alveolar stop /t, d/ codas
Jae Yung Song, Stefanie Shattuck-Hufnagel and Katherine Demuth
This study examined the emergence of the phonetic variants (often called allophones) of alveolar phonemes in the speech production of 2-year-olds. Our specific question was: Does the child start by producing a “canonical” form of a phoneme (e.g., /t/ with a clear closure and a release burst), only later learning to produce its other phonetic variants (e.g., unreleased stop, flap, and glottal stop)? Or, does the child start by producing the appropriate phonetic variants in the appropriate contexts and only later learn that they are phonetic variants of the same phoneme? In order to address this question, we investigated the production of three phonetic variants (unreleased stop, flap, and glottal stop) of the alveolar stop codas /t, d/ in the spontaneous speech of 6 American-English-speaking mother-child dyads, using both acoustic and perceptual coding. The results showed that 2-year-old children produced all three variants significantly less often than their mothers, and produced acoustic cues to canonical /t, d/ more often. This supports the view that young children start out by producing a fully articulated canonical variant of a phoneme in contexts where an adult would produce non-canonical forms. The implications of these findings for early phonological representations are discussed.
Analyzing the Prosody of Young Children
Young children acquire the complex prosodic contours necessary to transmit essential semantic, pragmatic, and affective information. As with other aspects of language acquisition, children learn these distinctions and produce them in their own speech with no overt instruction. While toddlers and young children are able to express intent and meaning, there is still ‘something’ distinct about their prosodic abilities in comparison to the adult model. How can we quantify these differences between child and adult prosody? This study begins to tease apart how we can analyze the intonation of young children using both an acoustic and a phonological approach.
The development of the phonetics and phonology of speech prosody in adolescents and young adults with cochlear implants
Heike Lehnert-Lehouillier and Linda Spencer
This study investigates the relationship between the acoustically analyzable aspects of speech prosody and its linguistic and cognitive organization by looking at the phonetics and phonology of prosody production in adolescents and young adults with cochlear implants (CI). Sentence productions from 24 CI users were analyzed for this study. Nine of those 24 young adults were implanted before the age of 4.0 years (early implantation group), 10 were implanted between the ages of 4.1 and 10.0 years (mid implantation group), and 5 were implanted past the age of 10.1 years (late implantation group). Utterance-final F0 rise in questions was analyzed as a phonetic correlate of sentence prosody, and phrase accent/boundary tone combinations as a correlate of its phonological aspects. While no evidence of a group difference with respect to the phonetic aspects of sentence prosody was found, differences between the groups in the linguistic organization of sentence prosody may exist.
Gender Identification from Whispered Mandarin
Li Jiao, Yi Xu, Qiuwu Ma and Marjoleine Sloos
Previous studies have found that speaker sex can be identified in whispered English and Swedish. It is unknown whether listeners can also identify speaker gender from whispered Mandarin. We asked forty Mandarin listeners to judge the sex of six Mandarin speakers from phonated and whispered monosyllabic words. Results revealed a main effect of phonation, with lower performance on whispers than on normally phonated utterances. But the identification rate was still well above chance for whispers. There was no main effect of speaker gender, but from normal to whispered speech, female identification rates dropped whereas males’ increased. It appears that in phonated speech some male speakers’ pitch may extend into the female range, but when pitch is naturally absent in whispers, the remaining spectral cues for female voices overlap more with those of males. Thus it is somewhat paradoxical that, though a whispery quality may make a female voice more feminine, true whispers may make it more male-like. In conclusion, spectral cues remain in whispered Mandarin for gender identification despite the lack of F0. But these cues are not as effective as F0, and more female voices were heard as male-like than the other way around.
How to be kind with prosody
What was said is often interpreted relative to what was left unsaid. Evaluative statements such as ‘That’s good’ can sound negative, because the speaker could have said ‘great’ instead. ‘That’s great’, on the other hand, might be interpreted as ‘not so great’, if we believe the speaker was just being nice. How, then, can we ever credibly convey our true intentions when making evaluative statements? We present evidence showing that prosody can be used to modulate the interpretation of evaluative statements, and can specifically be used to preempt inferences that shift positive evaluations toward a more negative interpretation. It is less able to modulate negative evaluations. The observed asymmetry makes sense if we tend to be kind to each other, and inflate our evaluative statements toward the nicer end of the spectrum.
Does working memory predict individual differences in both implicit and explicit prosodic phrasing?
Speakers can differ with respect to how they group the same utterance into prosodic phrases. When prosody is explicit (i.e., overtly spoken), this is readily observed via analysis of the speech output itself; when prosody is implicit (i.e., generated sub-vocally during reading), it can arguably be inferred from differences in how sentences are parsed. Such variation suggests that both explicit and implicit phrasing are influenced by factors outside of the grammar, factors more specific to production and processing mechanisms. The goal of the present study is to explore how individual differences in working memory capacity, which may influence both production and processing strategies, predict individual differences in prosodic phrasing. Sixty-five native English speakers participated in two reading tasks, one in which a short passage was read aloud, and one in which another short passage was read silently. Explicit boundaries from the spoken passage were identified by ToBI annotators, who labeled both intermediate phrase and intonational phrase boundaries; implicit boundaries from the silently-read passage were identified by the participants themselves in an implicit version of the “Rapid Prosody Transcription” task. Preliminary results from this in-progress study are presented, and implications for research on implicit prosody and planning in speech production are discussed.
Tonal targets and phonetic variability
Argyro Katsika and Amalia Arvaniti
The assumption that tonal targets are always localized and exhibit stable scaling and alignment was tested with Greek pitch accents H*, L+H* and H*+L. Speakers (N = 13) read mini-dialogues in which the accents were examined with respect to tonal crowding, phrase length and stress location. The F0 signal of the last three syllables of each test word was extracted at 10 ms steps and the Lucero et al. (1997) nonlinear time warping technique was used to compute the normalized alignment of the F0 signals; the resulting averaged signals were compared across conditions. The data show systematic differences in the scaling and alignment of the accents’ peaks, but also consistent differences that depend on the greater (non-immediate) context. Further, there is systematic variation involving non-localized effects on F0 as well as effects on other phonetic properties, such as duration. These results indicate that accents can be eminently variable and that cues to their realization are not limited to localized targets. Overall the results point towards a view of accents as distributions of values – in line with all phonetic categories – rather than as invariable prototypes or sets of discrete “allotones” as is often the practice in AM.
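Before averaging across trials, each F0 signal must be mapped onto a common time base. The sketch below uses simple linear time normalization as a stand-in; the study itself uses the Lucero et al. (1997) nonlinear warping, which aligns corresponding events rather than stretching uniformly:

```python
def resample(contour, n_points):
    """Linearly interpolate a contour onto n_points equally spaced
    positions, giving a time-normalized version of the trajectory."""
    m = len(contour)
    out = []
    for k in range(n_points):
        pos = k * (m - 1) / (n_points - 1)  # fractional index into contour
        i = int(pos)
        frac = pos - i
        if i + 1 < m:
            out.append(contour[i] * (1 - frac) + contour[i + 1] * frac)
        else:
            out.append(contour[-1])
    return out

def average_contour(contours, n_points=20):
    """Time-normalize each contour, then average pointwise across trials."""
    normed = [resample(c, n_points) for c in contours]
    return [sum(v) / len(v) for v in zip(*normed)]
```

The averaged signals per condition can then be compared for differences in peak scaling and alignment.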
Representing Pitch Accents: A Case for Tonemes and Allo-Tones
Despite the general property of phonology (and perhaps all linguistic domains) that surface forms are not identical to mental representations, much mainstream work on English intonation (e.g., in the ToBI framework) treats pitch accents differently. In particular, it seems to be generally assumed that the surface representation of a pitch accent simply is the mental representation of that pitch accent. By investigating the pitch accent manifestation of semantic focus in Yes/No Questions, we find good evidence for a single underlying pitch accent toneme (/L*/) being realized as (at least) two allo-tones: [L*] and [H+L*]. This raises new possibilities in exploring tonal inventories. In particular, the discovery of an [H+L*] pitch accent in the acoustic signal does not by itself indicate that we need or want an /H+L*/ pitch accent in the tonal inventory of mainstream American English (cf. Beckman, Hirschberg, and Shattuck-Hufnagel 2006). Moreover, this approach may help resolve debates about whether H* and L+H* are the same or distinct pitch accents.
Effects of rhetorical stress on item and content recall in Spanish
Christopher Eager, José Ignacio Hualde and Jennifer Cole
A common feature of Spanish public speech is the frequent use of rhetorical stress (RS), marked by a high pitch accent on a lexically unstressed syllable. The expanded pitch range and enhanced rhythmicity of RS suggest it may serve to attract listeners’ attention. We look for evidence that RS facilitates speech comprehension by testing recall of information in heard passages with and without RS. 30 Spanish speakers listened to 20 short radio news passages from the Glissando corpus, with an average of 7.5 RS-marked words per passage. In half of the passages, F0 contours were digitally manipulated to remove the RS pitch movement. After listening to each passage, participants wrote down everything they could remember. Responses were coded by two judges who assigned 2 points for each content word recalled verbatim, and 1 point for words recalled using a semantically related word. Regression results show a significant effect of passage length on recall, but no significant effect of RS. Exploratory analysis of individual recall scores reveals the predicted effect of RS on recall accuracy, but only for those participants with overall lower recall accuracy. Further modeling and follow-up experiments will explore the effect of marked accent patterns on spoken language comprehension.
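The scoring scheme (2 points for verbatim recall, 1 point for a semantically related word) can be sketched as follows; the synonym mapping here stands in for the human judges’ semantic-relatedness decisions and is purely hypothetical:

```python
def recall_score(target_content_words, response_words, synonyms=None):
    """Score a participant's written response against a passage's content
    words: 2 points for verbatim recall, 1 point for recall via a
    semantically related word (per the abstract's coding scheme)."""
    synonyms = synonyms or {}
    response = set(response_words)
    score = 0
    for word in target_content_words:
        if word in response:
            score += 2
        elif any(s in response for s in synonyms.get(word, ())):
            score += 1
    return score
```

Per-passage scores from both judges would then feed the regression on passage length and RS condition.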
Testing the predictions of metrical theory: variability in reported word-level stress
Amelia Kimball and Jennifer Cole
There is wide agreement that the regular pattern of English word-level primary stress targets the right edge of the word, and that secondary stress can be analyzed with trochaic feet, resulting in alternating strong and weak syllables and avoiding two adjacent strong syllables (a clash). Yet empirical evidence for alternating stress patterns is limited, and studies of clash resolution through stress shift report conflicting results from acoustic measures (Shattuck-Hufnagel 1988, 1991; Grabe and Warren 1995; Vogel et al. 1995). We test the predictions of metrical theory by asking listeners to mark stressed syllables on a transcript as they listen to a phrase. Our results confirm that syllables at higher levels of metrical structure are more frequently marked, and that strong and weak syllables usually alternate. However, our results also reveal variability in listeners’ ratings, even for the subset of listeners who can correctly mark stressed syllables in individual words. This variability is not predicted by metrical theory, which assigns stress deterministically. Instead, our data suggest that the projection of word-level stresses in phrasal contexts results in clash, and that clash resolution is stochastic, with listeners differing in their tolerance for clash and in the locations where clash is perceptually resolved.
The benefits of beat-prosody integration for word memorization and discourse comprehension in preschoolers
Pilar Prieto, Alfonso Igualada, Núria Esteve-Gibert, Olga Kushch and Judith Llanes
Gesture and prosody are important precursors of children’s early language development. For example, prosodic patterns have been shown to be important in the early acquisition of speech act information, and pointing gestures have been shown to be important in the transition to word acquisition and multiword speech. However, it is unclear whether gestural and prosodic integration abilities can boost preschoolers’ memory and linguistic abilities. While researchers have shown that adults can benefit from the presence of beat gestures in word recall tasks, studies have failed to conclusively replicate these findings with preschool children. This work investigates whether accompanying words with beat gestures and prosodic prominence can help preschoolers improve word memorization in lists of words (Experiment 1) and also improve memorization and discourse comprehension of contrastively focused words in discourse (Experiment 2). Results from Experiment 1 with one hundred and six 3-to-5-year-old children showed that children recalled the target word significantly better when it was accompanied by a beat gesture than when it was not, indicating a local recall effect. Pilot results from Experiment 2 with 10 children also indicate clear effects of observing beat gestures and prosodic prominence on the recall of the target focused items.
17:00 – 18:00 Framing Speech: Oral session 2
Forays into child speech: From prosodic structure to speech planning and production
How and when do children become competent speakers of a language, prosodically speaking? Is the two-word stage of development composed of simple Prosodic Words? Or are these also represented at higher levels of structure, such as a Phonological Phrase or Intonational Phrase? Which acoustic cues do young speakers use to signal such structures? What are the processes involved in understanding how children plan larger utterances, and how might this be tested? What would a developmental model of speech planning and production look like? These are some of the many questions that have engaged my collaborations with Stefanie Shattuck-Hufnagel over the past several years, tapping our complementary skill sets to bring insight into these fundamental questions in language development. This talk will briefly explore answers to these questions, and outline areas for further research.
Effects of meter and predictability on word durations in The Cat in the Hat
The current study was designed to investigate whether hierarchical rhythmic structure and rhyme predictability account for inter-word interval and word duration over and above other linguistic features in productions of Dr. Seuss’s The Cat in the Hat. We first built two control regression models, predicting inter-word interval and word duration, respectively, as a function of 1) number of phonemes, 2) lexical frequency, 3) word class, 4) syntactic structure, and 5) font emphasis. Consistent with prior findings, factors that led to both longer inter-word intervals and longer word durations included a) more phonemes, b) lower frequency, c) open class status, d) alignment with a syntactic boundary, and e) capitalization. We then tested whether hierarchical metrical structure improved model fit by testing model parameters corresponding to metrical grid structure. Inter-word interval duration was strongly predicted by hierarchical metrical structure, such that duration increased linearly with metrical grid height. Conversely, word duration was only weakly predicted by metrical structure but strongly by predictability, such that rhyme resolutions were significantly shorter in duration than rhyme antecedents. These results further our understanding of the interacting factors that affect speakers’ word durations. In addition, they demonstrate the vast array of cues that children receive about lexical, syntactic, and rhythmic structure from nursery rhymes, which may begin to explain these texts’ value in reading instruction.
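The model comparison described above amounts to regression with and without the metrical predictors. A minimal ordinary-least-squares sketch with just two of the predictors (NumPy is assumed; the actual study presumably fits fuller models with all five control factors before adding metrical grid height):

```python
import numpy as np

def fit_duration_model(durations, n_phonemes, grid_height):
    """Ordinary least squares: duration ~ intercept + phoneme count
    + metrical grid height. Returns [intercept, b_phonemes, b_grid]."""
    X = np.column_stack([np.ones(len(durations)), n_phonemes, grid_height])
    beta, *_ = np.linalg.lstsq(X, np.asarray(durations, dtype=float), rcond=None)
    return beta
```

Comparing the fit of this model against one without the grid-height column is one simple way to ask whether metrical structure explains duration over and above the control features.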
All the prosody work we’re not doing
Once upon a time prosody was considered “around the edge of language”, as Bolinger put it, and was seriously understudied in comparison to other linguistic topics. The Speech Prosody conference series and its ever-growing popularity convincingly demonstrate that this is no longer the case. Nonetheless, there are still many areas of prosodic research that have received very little attention, and a few that are still untrodden terrain!
Page updated 5/17/16