Extraterrestrials and Speech Therapy

I am somewhat disheartened to stumble across another paper, based on a case study of a single child, advising speech-language pathologists to focus treatment on speech production while ignoring the child’s obvious difficulties with speech perception (McAllister Byun, 2012). The problem that I have with this paper is not that it is a case study (these can be very useful in clinical and research contexts) but that the conclusions are based entirely on phonetic transcriptions of a child’s speech and of the speech stimuli used to assess the child’s perceptual abilities. I believe that this leads to what is probably an erroneous conclusion about the child’s speech production accuracy. This may be alarming to clinical readers since phonetic transcription is your primary tool for describing children’s phonological knowledge. However, in a recent paper Munson et al. (2010) explain why it is not clear that “alien anthropologists would come up with anything remotely like phonetic transcription to characterize human speech”. Extraterrestrials with a different communicative apparatus may be better placed to realize that phonetic transcription provides a highly biased and often inaccurate picture of what the child is doing when articulating the phonemes that we are testing in our assessments. Munson and colleagues present compelling data to that effect but concede in their conclusions that the extraterrestrials may also come to realize that humans usually don’t have the time or resources to obtain unbiased data via instrumental analyses. However, if we must use phonetic transcription, we must at least be aware of its limitations so that we can avoid the error that is made in McAllister Byun’s case study. For I believe that there is a significant error that will do harm to children in speech therapy unless we understand the points made in Munson’s fantasy about extraterrestrial anthropologists.

McAllister Byun begins by acknowledging that it is now well accepted that speech perception difficulties are associated with speech production errors, which is a good thing because I am really tired of devoting research time to proving that over and over again. In fact, I got bored with that question a long time ago and went on to the next step: establishing direction of causality. Theoretically, difficulties with speech production accuracy could precede and cause misperception of speech sounds, and in fact I was taught that this was so when I studied speech therapy at the University of Alberta in the 1970s. McAllister Byun updates the idea with an intriguing explanation for this hypothetical effect involving the role of the child’s own productions in the population of exemplars that contribute to the child’s perceptual knowledge of the target phoneme. The clinical implications of this hypothesis (if true) are clear: if the child misarticulates /k/ → [t], teach the child to articulate /k/ correctly and any misperception of the contrast will correct itself. On the other hand, misperception of the /k/-/t/ contrast could precede and cause the failure to acquire the appropriate articulatory gestures for accurate production of the /k/ phoneme. I think that this latter hypothesis makes sense because the infant’s speech perception skills begin to develop at least six months in advance of the production of speech-like articulation (in the form of babble). I therefore think that speech perception development typically precedes speech production development, although there is a reciprocal relationship in the acquisition of precision in both domains throughout childhood. This hypothesis is also consistent with the DIVA model of speech motor control as described in Shiller, Rvachew & Brosseau-Lapré (2010).
I have supported this hypothesis with four types of studies: (1) linear structural equation modeling showing good fit to the “perception leads production” hypothesis and poor fit for the alternative (Rvachew & Grawburg, 2006); (2) a longitudinal study showing that perception skills predict growth in articulation accuracy but not the reverse (Rvachew, 2006); (3) single subject experiments showing that treating speech perception increases speech production accuracy (Jamieson & Rvachew, 1992); and (4) randomized controlled trials showing that speech perception and speech production training combined is much more efficient and effective than speech production training alone (for review see Chapter 9, Rvachew & Brosseau-Lapré, 2012). Furthermore, in one of these trials I showed specifically that speech production training did not lead to improved speech perception (Rvachew, 1994). Therefore, I recommend that speech perception and speech production treatment procedures be conducted in parallel, with the “input oriented” activities preceding the “output oriented” activities to a greater or lesser extent depending upon the needs of the child. Should I reconsider these recommendations, based on over 30 years of clinical practice and research findings, after reading McAllister Byun’s paper? Not at all. Let’s look at it carefully.

McAllister Byun describes a 4-year-old boy who was given a “provisional diagnosis of CAS…based on the presence of characteristics including atypical prosody, inconsistent errors and vowel errors…” (p. 402). The child fronted velars in syllable onsets (referred to as “strong position”) but not in syllable codas (referred to as “weak position”). This is thought to be an anomaly because implicational relationships dictate that accuracy in the weak position implies accuracy in the strong position. Redford & Diehl (1999) is cited as evidence for greater perceptual prominence of the onset position (making it the strong position). If you read Redford and Diehl, however, you find that the adults in their study did not find perception of /k/ to be easier in the onset compared to the coda (these relationships were phoneme specific and therefore gross generalizations about positional prominence should not be made). More to the point, the child’s perception of /k/-/t/ was tested using a perceptual test based on same-different judgments of recorded natural speech stimuli. The results revealed equally poor discrimination performance for the /k/-/t/ contrast in onsets and codas. The author concluded that, in this case, production accuracy was “leading” the child’s acquisition of perceptual knowledge of the contrast. The author further concluded that, for this particular case, the deficit in perception could be attributed to “a primary deficit in production” and therefore “motor-oriented therapy may be optimal”. If you believe that speech development is an “either-or” affair where the phoneme contrast is discriminated or not discriminated in the perceptual domain and the target phoneme is produced correctly or not in the articulatory domain, I suppose that this might make sense. However, speech development is a process of gradually acquiring knowledge of multiple phonetic characteristics that are distributed in a continuous fashion across the category.
Studies of children’s phonetic knowledge of phoneme categories show that it is not a safe assumption that this child had achieved articulatory accuracy for /k/ in the coda position in advance of perceptual knowledge of the /k/-/t/ contrast.

In our book, Françoise and I stress repeatedly that it is not enough to ask if the child perceives any given contrast. Rather, we want to know “how” the child perceives the contrast: “Phonetic categories are an emergent property of the distribution of acoustic information across parametric phonetic space, built up over time as the language learner stores detailed memory traces of experienced words. Each language learner must discover a strategy for abstracting phonetic structure from the input that is adapted to the nature of the input that is received. Assessing the language learner’s perceptual knowledge requires sophisticated tools that reveal the listener’s perceptual strategies for making sense of highly complex and variable input …” (p. 46). The test used by McAllister Byun clearly does not meet this standard: we have no way of knowing which acoustic cues the child was attending to when completing the task. The acoustic cues for perception of /k/ include all the spectral moments (mean, variance, skewness and kurtosis) that can be measured for the stop burst (Forrest et al., 1990) as well as many acoustic characteristics of the formant transitions that tie the release burst to the vowel (Dorman et al., 1977; Nguyen et al., 2009). Adults and children with normally developing speech differentiate /k/ and /t/ in production largely on the basis of the spectral mean. Three different patterns are seen among children with speech disorders: (1) they may not differentiate the phonemes at all (i.e., they have no contrast) or (2) they may produce a covert contrast (their /k/ targets are perceived as [t] even though they are acoustically different from /t/ targets) or (3) they may produce a perceptible /k/-/t/ contrast that is differentiated on the basis of nonstandard cues. Nonstandard cues in the latter two situations may include skewness and kurtosis in the burst; alternatively, the child may ignore the burst and manipulate the slope of the formant frequency transitions.
Reliance on non-standard cues or cue-weighting strategies in perception may lead to variable performance in perception and production.
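
For readers who have not worked with spectral moments, here is a minimal sketch of the idea. It is not the procedure used in any of the studies cited here. It treats a burst spectrum as a distribution of energy over frequency and computes the mean, variance, skewness and kurtosis of that distribution. The function name `spectral_moments`, the choice of excess kurtosis (kurtosis minus 3), and the two toy spectra are all assumptions made for illustration; the numbers are invented, not measured data.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a spectrum.

    The power values are normalized to sum to 1 so that the spectrum can be
    treated like a probability distribution over frequency.
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]

    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    if variance == 0:
        raise ValueError("all energy is at one frequency; higher moments undefined")

    sd = math.sqrt(variance)
    skewness = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    # Excess kurtosis (minus 3) is one common convention; assumed here.
    kurtosis = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights)) / sd ** 4 - 3.0
    return mean, variance, skewness, kurtosis


# Hypothetical toy spectra (invented values, five frequency bins).
FREQS_HZ = [1000, 2000, 3000, 4000, 5000]
LOWER_ENERGY_BURST = [1.0, 4.0, 2.0, 1.0, 0.5]   # energy concentrated lower
HIGHER_ENERGY_BURST = [0.5, 1.0, 2.0, 4.0, 3.0]  # energy concentrated higher

for label, spectrum in [("lower", LOWER_ENERGY_BURST), ("higher", HIGHER_ENERGY_BURST)]:
    mean, variance, skewness, kurtosis = spectral_moments(FREQS_HZ, spectrum)
    print(label, round(mean), round(variance), round(skewness, 2), round(kurtosis, 2))
```

In this toy example the spectrum with its energy concentrated at higher frequencies has the higher spectral mean. A child who relies on skewness or kurtosis rather than the mean would be separating the two categories along a different dimension of the same distribution, which a transcription alone cannot reveal.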

How might a child with incomplete knowledge of the acoustic properties of this contrast achieve perceptually accurate production in codas and inaccurate production in onsets? Using electropalatography, Gibbon & Wood (2002) describe “articulatory drift” whereby placement of an undifferentiated lingual gesture at onset is different from the placement at release, resulting in variable perceptual outcomes for alveolar and velar targets, such that /t/ → [t, k] and /k/ → [t, k]. Gibbon (1999) demonstrated how a child can learn to control the release phase of the gesture to achieve the contrast without fundamentally changing the undifferentiated lingual gesture itself. In this case, the adult listener believes that the child has acquired the contrast productively but the child’s underlying articulatory patterns continue to be immature.

I actually think it makes sense that the child’s own productions might have some sort of downstream effect on the child’s perception of a phoneme contrast. Perhaps McAllister Byun’s case is an example of that, especially given the “provisional diagnosis of CAS” in this case. However, the assessment information provided is inadequate to prove the hypothesis. We do not know which acoustic cues the child attended to when differentiating /k/ from /t/ in the perceptual domain. We do not know the topography of the child’s articulatory gestures when producing the contrast given that the primary data in the paper is phonetic transcription. In our book Françoise and I describe cases like McAllister Byun’s in which the child received “motor oriented therapy” and failed to make measurable progress over three years! My interpretation of this case is that the child was probably attending to the formant frequency transitions in perception, which would result in erratic perceptual performance in both onset and coda. Productively the child may manipulate the timing of the release of the undifferentiated lingual gesture so as to produce [t] in the onset but a perceptually accurate but phonetically inaccurate [k] in the coda. His phonetic knowledge of the contrast is incomplete in the perceptual and articulatory domains in both the onset and the coda. The treatment program needs to address his perceptual, articulatory and phonological knowledge of the /k/ phoneme. SLPs, not having access to EPG and speech synthesizers and other research tools for precisely mapping the child’s phonetic knowledge at all levels of phonological representation, can only guess as to the status of the child’s knowledge in these domains. The safest assumption is that the child’s knowledge is incomplete at all levels and the most prudent course of action is to address all three. Your therapy will be more effective and efficient in the long run.


Dorman, M., M. Studdert-Kennedy, et al. (1977). “Stop-consonant recognition: Release bursts and formant transitions as functionally equivalent, context-dependent cues.” Attention, Perception, & Psychophysics 22(2): 109-122. http://www.springerlink.com/content/8583238315777761/

Forrest, K., G. Weismer, et al. (1990). “Statistical analysis of word-initial /k/ and /t/ produced by normal and phonologically disordered children.” Clinical Linguistics & Phonetics 4(4): 327-340. http://informahealthcare.com/doi/abs/10.3109/02699209008985495

Gibbon, F. E. (1999). “Undifferentiated lingual gestures in children with articulation/phonological disorders.” Journal of Speech, Language, and Hearing Research 42: 382-397. http://bit.ly/Nj4VIf

Gibbon, F. and S. E. Wood (2002). “Articulatory drift in the speech of children with articulation and phonological disorders.” Perceptual and Motor Skills 95: 295-307.

Jamieson, D. G. and S. Rvachew (1992). “Remediation of speech production errors with sound identification training.” Journal of Speech-Language Pathology and Audiology 16: 201-210. [OPEN ACCESS]


McAllister Byun, T. (2012). “Bidirectional perception–production relations in phonological development: evidence from positional neutralization.” Clinical Linguistics & Phonetics 26(5): 397-413.


Munson, B., J. Edwards, et al. (2010). “Deconstructing phonetic transcription: Covert contrast, perceptual bias, and an extraterrestrial view of Vox Humana.” Clinical Linguistics & Phonetics 24: 245-260. http://informahealthcare.com/doi/abs/10.3109/02699200903532524

Nguyen, V. S., E. Castelli, et al. (2009). Vietnamese final stop consonants /p, t, k/ described in terms of formant transition slopes. 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009. Singapore: 86-90. [OPEN ACCESS]

Redford, M. A. and R. L. Diehl (1999). “The relative perceptual distinctiveness of initial and final consonants in CVC syllables.” The Journal of the Acoustical Society of America 106(3): 1555-1565. http://asadl.org/jasa/resource/1/jasman/v106/i3/p1555_s1

Rvachew, S. (1994). “Speech perception training can facilitate sound production learning.” Journal of Speech and Hearing Research 37: 347-357. http://bit.ly/Qt0Piv

Rvachew, S. (2006). “Longitudinal prediction of implicit phonological awareness skills.” American Journal of Speech-Language Pathology 15: 165-176. http://bit.ly/RMcfMZ

Rvachew, S. and F. Brosseau-Lapré (2012). Developmental Phonological Disorders: Foundations of Clinical Practice. San Diego, CA, Plural Publishing, Inc. http://bit.ly/vIliz2

Rvachew, S. and M. Grawburg (2006). “Correlates of phonological awareness in preschoolers with speech sound disorders.” Journal of Speech, Language, and Hearing Research 49: 74-87. http://bit.ly/RsQ2ER

Shiller, D. M., S. Rvachew, et al. (2010). “Importance of the auditory perceptual target to the achievement of speech production accuracy.” Canadian Journal of Speech-Language Pathology and Audiology 34: 181-192. http://bit.ly/PSlmXk