Extraterrestrials and Speech Therapy

I am somewhat disheartened to stumble across another paper, based on a case study of a single child, advising speech-language pathologists to focus treatment on speech production while ignoring the child’s obvious difficulties with speech perception (McAllister Byun, 2012). The problem that I have with this paper is not that it is a case study (these can be very useful in clinical and research contexts) but that the conclusions are based entirely on phonetic transcriptions of a child’s speech and of the speech stimuli used to assess the child’s perceptual abilities. I believe that this leads to what is probably an erroneous conclusion about the child’s speech production accuracy. This may be alarming to clinical readers since phonetic transcription is your primary tool for describing children’s phonological knowledge. However, in a recent paper Munson et al. (2010) explain why it is not clear that “alien anthropologists would come up with anything remotely like phonetic transcription to characterize human speech”. Extraterrestrials with a different communicative apparatus may be better placed to realize that phonetic transcription provides a highly biased and often inaccurate picture of what the child is doing when articulating the phonemes that we are testing in our assessments. Munson and colleagues present compelling data to that effect but concede in the conclusions that the extraterrestrials may also come to realize that humans usually don’t have the time or resources to obtain unbiased data via instrumental analyses. However, if we must use phonetic transcription we must at least be aware of the limitations so that we can avoid the error that is made in McAllister Byun’s case study. For I believe that there is a significant error that will do harm to children in speech therapy unless we understand the points made in Munson’s fantasy about extraterrestrial anthropologists.

McAllister Byun begins by acknowledging that it is now well accepted that speech perception difficulties are associated with speech production errors which is a good thing because I am real tired of devoting research time to proving that over and over again. In fact I got bored with that question a long time ago and went on to the next step – establishing direction of causality. Theoretically, difficulties with speech production accuracy could precede and cause misperception of speech sounds and in fact I was taught that this was so when I studied speech therapy at the University of Alberta in the 1970s. McAllister Byun updates the idea with an intriguing explanation for this hypothetical effect involving the role of the child’s own productions in the population of exemplars that contribute to the child’s perceptual knowledge of the target phoneme. The clinical implications of this hypothesis (if true) are clear; if the child misarticulates /k/ → [t], teach the child to articulate /k/ correctly and any misperception of the contrast will correct itself. On the other hand, misperception of the /k/-/t/ contrast could precede and cause the failure to acquire the appropriate articulatory gestures for accurate production of the /k/ phoneme. I think that this latter hypothesis makes sense because the infant’s speech perception skills begin to develop at least six months in advance of the production of speech-like articulation (in the form of babble) and therefore I think that speech perception typically precedes speech production development although there is a reciprocal relationship in the acquisition of precision in both domains throughout childhood. This hypothesis is also consistent with the DIVA model of speech motor control as described in Shiller, Rvachew & Brosseau-Lapré (2010). 
I have supported this hypothesis with four types of studies: (1) linear structural equation modeling showing good fit to the “perception leads production” hypothesis and poor fit for the alternative (Rvachew & Grawburg, 2006); (2) a longitudinal study showing that perception skills predict growth in articulation accuracy but not the reverse (Rvachew, 2006); (3) single subject experiments showing that treating speech perception increases speech production accuracy (Jamieson & Rvachew, 1992); and (4) randomized controlled trials showing that speech perception and speech production training combined is much more efficient and effective than speech production training alone (for review see Chapter 9, Rvachew & Brosseau-Lapré, 2012). Furthermore, in one of these trials I showed specifically that speech production training did not lead to improved speech perception (Rvachew, 1994). Therefore, I recommend that speech perception and speech production treatment procedures be conducted in parallel, with the “input oriented” activities preceding the “output oriented” activities to a greater or lesser extent depending upon the needs of the child. Should I reconsider these recommendations, based on over 30 years of clinical practice and research findings, after reading McAllister Byun’s paper? Not at all – let’s look at it carefully.

McAllister Byun describes a 4-year-old boy who was given a “provisional diagnosis of CAS…based on the presence of characteristics including atypical prosody, inconsistent errors and vowel errors…” (p. 402). The child fronted velars in syllable onsets (referred to as “strong position”) but not in syllable codas (referred to as “weak position”). This is thought to be an anomaly because implicational relationships dictate that accuracy in the weak position implies accuracy in the strong position. Redford & Diehl (1999) is cited as evidence for greater perceptual prominence of the onset position (making it the strong position). If you read Redford and Diehl however you find that the adults in their study did not find perception of /k/ to be easier in the onset compared to the coda (these relationships were phoneme specific and therefore gross generalizations about positional prominence should not be made). More to the point, the child’s perception of /k/-/t/ was tested using a perceptual test based on same-different judgments of recorded natural speech stimuli. The results revealed equally poor discrimination performance for the /k/-/t/ contrast in onsets and codas. The author concluded that, in this case, production accuracy was “leading” the child’s acquisition of perceptual knowledge of the contrast. The author further concludes that, for this particular case, the deficit in perception could be attributed to “a primary deficit in production” and therefore “motor-oriented therapy may be optimal”. If you believe that speech development is an “either-or” affair where the phoneme contrast is discriminated or not discriminated in the perceptual domain and the target phoneme is produced correctly or not in the articulatory domain, I suppose that this might make sense. However, speech development is a process of gradually acquiring knowledge of multiple phonetic characteristics that are distributed in a continuous fashion across the category. 
Studies of children’s phonetic knowledge of phoneme categories show that it is not a safe assumption that this child had achieved articulatory accuracy for /k/ in the coda position in advance of perceptual knowledge of the /k/-/t/ contrast.

In our book, Françoise and I stress repeatedly that it is not enough to ask if the child perceives any given contrast. Rather, we want to know “how” the child perceives the contrast: “Phonetic categories are an emergent property of the distribution of acoustic information across parametric phonetic space, built up over time as the language learner stores detailed memory traces of experienced words. Each language learner must discover a strategy for abstracting phonetic structure from the input that is adapted to the nature of the input that is received. Assessing the language learner’s perceptual knowledge requires sophisticated tools that reveal the listener’s perceptual strategies for making sense of highly complex and variable input …” (p. 46). The test used by McAllister Byun clearly does not meet this standard: we have no way of knowing which acoustic cues the child was attending to when completing the task. The acoustic cues for perception of /k/ include all the spectral moments (mean, variance, skewness and kurtosis) that can be measured for the stop burst (Forrest et al., 1990) as well as many acoustic characteristics of the formant transitions that tie the release burst to the vowel (Dorman et al., 1977; Nguyen et al., 2009). Adults and children with normally developing speech differentiate /k/ and /t/ in production largely on the basis of the spectral mean. Three different patterns are seen among children with speech disorders: (1) they may not differentiate the phonemes at all (i.e., they have no contrast) or (2) they may produce a covert contrast (their /k/ targets are perceived as [t] even though they are acoustically different from /t/ targets) or (3) they may produce a perceptible /k/-/t/ contrast that is differentiated on the basis of nonstandard cues. Nonstandard cues in the latter two situations may include skewness and kurtosis in the burst; alternatively the child may ignore the burst and manipulate the slope of the formant frequency transitions. 
Reliance on non-standard cues or cue-weighting strategies in perception may lead to variable performance in perception and production.
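
For readers who have not met the spectral moments mentioned above, the following is a minimal sketch in Python of how they can be computed once a power spectrum of the stop burst is in hand. It treats the normalised spectrum as a probability distribution over frequency. The function name `spectral_moments` and the two toy spectra are hypothetical and invented purely for illustration; they are not data from any of the studies cited here, and published analyses differ in details such as windowing, frequency scale and whether excess kurtosis is reported.

```python
import math

def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    The spectrum is normalised so that it sums to 1 and is then treated
    as a probability distribution over frequency.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    p = [x / total for x in power]

    # First moment: spectral mean (centre of gravity), in Hz.
    mean = sum(f * w for f, w in zip(freqs_hz, p))
    # Second central moment: variance, in Hz^2.
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, p))
    if var == 0:
        # All energy in one bin: skewness and kurtosis are undefined.
        # Returning 0.0 here is a convenience choice for this sketch.
        return mean, 0.0, 0.0, 0.0

    sd = math.sqrt(var)
    # Third and fourth standardised moments.
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, p)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, p)) / sd ** 4 - 3.0
    return mean, var, skew, kurt

# Toy, made-up burst spectra (arbitrary power units) on a coarse frequency grid.
freqs = [1000, 2000, 3000, 4000, 5000]
k_like = [1, 4, 3, 1, 1]   # energy concentrated lower: velar-like (assumed shape)
t_like = [1, 1, 2, 4, 3]   # energy concentrated higher: alveolar-like (assumed shape)

k_moments = spectral_moments(freqs, k_like)
t_moments = spectral_moments(freqs, t_like)
print("velar-like spectral mean (Hz):", round(k_moments[0]))
print("alveolar-like spectral mean (Hz):", round(t_moments[0]))
```

In this toy case the two spectra are separated mainly by the first moment, which corresponds to the standard cue described above. A child producing a covert contrast might instead show similar means for /k/ and /t/ targets with a difference only in the skewness or kurtosis values.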

How might a child with incomplete knowledge of the acoustic properties of this contrast achieve perceptually accurate production in codas and inaccurate production in onsets?  Using electropalatography, Gibbon & Wood (2002) describe “articulatory drift” whereby placement of an undifferentiated lingual gesture at onset is different from the placement at release, resulting in variable perceptual outcomes for alveolar and velar targets, such that /t/ → [t, k] and /k/ → [t, k]. Gibbon (1999) demonstrated how a child can learn to control the release phase of the gesture to achieve the contrast without fundamentally changing the undifferentiated lingual gesture itself. In this case, the adult listener believes that the child has acquired the contrast productively but the child’s underlying articulatory patterns continue to be immature.

I actually think it makes sense that the child’s own productions might have some sort of downstream effect on the child’s perception of phoneme contrasts. Perhaps McAllister Byun’s case is an example of that, especially given the “provisional diagnosis of CAS” in this case. However, the assessment information provided is inadequate to prove the hypothesis. We do not know which acoustic cues the child attended to when differentiating /k/ from /t/ in the perceptual domain. We do not know the topography of the child’s articulatory gestures when producing the contrast given that the primary data in the paper is phonetic transcription. In our book Françoise and I describe cases like McAllister Byun’s who received “motor oriented therapy” and failed to make measurable progress in therapy over three years! My interpretation of this case is that the child was probably attending to the formant frequency transitions in perception, which results in erratic perceptual performance in both onset and coda. Productively the child may manipulate the timing of the release of the undifferentiated lingual gesture so as to produce [t] in the onset but a perceptually accurate but phonetically inaccurate [k] in the coda. His phonetic knowledge of the contrast is incomplete in the perceptual and articulatory domains in both the onset and the coda. The treatment program needs to address his perceptual, articulatory and phonological knowledge of the /k/ phoneme. SLPs, not having access to EPG and speech synthesizers and other research tools for precisely mapping the child’s phonetic knowledge at all levels of phonological representation, can only guess as to the status of the child’s knowledge in these domains. The safest assumption is that the child’s knowledge is incomplete at all levels and the most prudent course of action is to address all three. Your therapy will be more effective and efficient in the long run.


Dorman, M., M. Studdert-Kennedy, et al. (1977). “Stop-consonant recognition: Release bursts and formant transitions as functionally equivalent, context-dependent cues.” Attention, Perception, & Psychophysics 22(2): 109-122. http://www.springerlink.com/content/8583238315777761/

Forrest, K., G. Weismer, et al. (1990). “Statistical analysis of word-initial /k/ and /t/ produced by normal and phonologically disordered children.” Clinical Linguistics & Phonetics 4(4): 327-340. http://informahealthcare.com/doi/abs/10.3109/02699209008985495

Gibbon, F. E. (1999). “Undifferentiated lingual gestures in children with articulation/phonological disorders.” Journal of Speech, Language, and Hearing Research 42: 382-397. http://bit.ly/Nj4VIf

Gibbon, F. and S. E. Wood (2002). “Articulatory drift in the speech of children with articulation and phonological disorders.” Perceptual and Motor Skills 95: 295-307.

Jamieson, D. G. and S. Rvachew (1992). “Remediation of speech production errors with sound identification training.” Journal of Speech-Language Pathology and Audiology 16: 201-210.[OPEN ACCESS]


McAllister Byun, T. (2012). “Bidirectional perception–production relations in phonological development: evidence from positional neutralization.” Clinical Linguistics & Phonetics 26(5): 397-413.


Nguyen, V. S., E. Castelli, et al. (2009). Vietnamese final stop consonants /p, t, k/ described in terms of formant transition slopes. 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009. Singapore: 86-90. [OPEN ACCESS]


Munson, B., J. Edwards, et al. (2010). “Deconstructing phonetic transcription: Covert contrast, perceptual bias, and an extraterrestrial view of Vox Humana.” Clinical Linguistics & Phonetics 24: 245-260. http://informahealthcare.com/doi/abs/10.3109/02699200903532524

Redford, M. A. and R. L. Diehl (1999). “The relative perceptual distinctiveness of initial and final consonants in CVC syllables.” The Journal of the Acoustical Society of America 106(3): 1555-1565. http://asadl.org/jasa/resource/1/jasman/v106/i3/p1555_s1

Rvachew, S. (1994). “Speech perception training can facilitate sound production learning.” Journal of Speech and Hearing Research 37: 347-357. http://bit.ly/Qt0Piv

Rvachew, S. (2006). “Longitudinal prediction of implicit phonological awareness skills.” American Journal of Speech-Language Pathology 15: 165-176. http://bit.ly/RMcfMZ

Rvachew, S. and F. Brosseau-Lapré (2012). Developmental Phonological Disorders: Foundations of Clinical Practice. San Diego, CA, Plural Publishing, Inc. http://bit.ly/vIliz2

Rvachew, S. and M. Grawburg (2006). “Correlates of phonological awareness in preschoolers with speech sound disorders.” Journal of Speech, Language, and Hearing Research 49: 74-87. http://bit.ly/RsQ2ER

Shiller, D. M., S. Rvachew, et al. (2010). “Importance of the auditory perceptual target to the achievement of speech production accuracy.” Canadian Journal of Speech-Language Pathology and Audiology 34: 181-192. (http://bit.ly/PSlmXk)



  1. Hi Susan—Thank you for your comments on my paper. I do regret to see that it was a “disheartening” read for you, since I admire your work and have always encouraged my students to include perception-oriented therapies in their intervention for clients with phonological disorder. If I may, I’d like to clarify a few points about my paper.

    I am concerned that your readers might take away a mistaken impression of my paper based on your statement that I “advis[e] speech-language pathologists to focus treatment on speech production while ignoring the child’s obvious difficulties with speech perception.” As you note elsewhere in your discussion, I do not question the finding that perceptual limitations can give rise to deficits in production. I undertook this study to investigate the possibility that the opposite direction of causation might also be at play in the very specific context of “neutralization in strong position” (where children produce overt errors on a contrast like /t/-/k/ in initial position but not in final position.) The results of my investigation did suggest that my case study subject’s pattern of neutralization in strong position arose from a primary deficit in the production domain—more on this below. However, I made sure to state that “further investigation is needed to establish whether the pattern of perception documented in [case study subject] Ben can be observed in children with neutralization in strong position more generally,” and I did not make any comment at all on patterns that do not involve neutralization in strong position. I would urge your readers to regard my paper as an exploration of a theoretical issue—with implications for clinical practice in a highly circumscribed subset of cases—rather than a blanket statement of support for output-oriented approaches to intervention.

    Your readers might be wondering why I am so interested in this topic of neutralization in strong position in child speech. If you look at it from the perspective of adult phonological typology, it is a truly strange phenomenon. Besides Redford & Diehl (1999), numerous experimental studies have documented that, independent of place or manner of articulation, speech sound contrasts in prevocalic contexts have greater perceptual salience than postvocalic or coda contrasts (Dorman, Raphael, Liberman, & Repp, 1975; Fujimura, Macchi & Streeter, 1978; Ohala, 1990). Adult phonologies generally behave accordingly, preserving contrast in contexts where it is easiest to perceive and neutralizing in contexts of low perceptual salience. From a perceptual standpoint, it is intriguing that child patterns like positional velar fronting (e.g. Inkelas & Rose, 2007) turn this bias on its head, preferentially neutralizing contrast in a context where it has the greatest perceptual salience. One possible explanation, pursued by Dinnsen & Farris-Trimble (2008), is that children with neutralization in strong position have a qualitatively different perceptual bias than typical adult listeners. This hypothesis was not supported by my investigation, since case study subject Ben (who exhibited neutralization in strong position for both velar-alveolar and fricative-glide contrasts) discriminated phonemic contrasts with significantly greater accuracy in initial than final position. The other logical possibility is that neutralization in strong position is the product of limitations in the production domain. Regrettably, there was not enough room in the Clinical Linguistics & Phonetics paper for me to lay out my reasons for believing that patterns of neutralization in strong position are driven by characteristics of the immature motor control system. 
However, I make a complete case for an articulatory interpretation of Ben’s pattern of positional velar fronting in a paper in the Journal of Child Language (McAllister Byun, 2012); readers may also be interested in the original articulatory account of positional velar fronting put forward by Inkelas & Rose (2007).

    Regarding Munson’s extraterrestrial listener, I should mention that I did measure both burst properties and formant transitions in my case study subject’s velar and alveolar productions, and I failed to uncover evidence of covert contrast. However, I did not discuss this in my paper, since I don’t regard my own failure to detect a measurable contrast as proof that no contrast was present! Fortunately, my analysis does not depend on the assumption of total neutralization. Even if my case study subject’s fronted velars were in fact intermediate between /t/ and /k/ targets, these productions would still have the effect of bringing the distribution of acoustic outcomes associated with /k/ targets closer to the acoustic distribution associated with /t/ targets. Since targets that are acoustically well-separated are easier to learn than targets that are close together (e.g. Kuhl et al., 1997), this change in the acoustic distribution could have a negative impact on perceptual discrimination accuracy, even if we are looking at a case of near-merger rather than complete neutralization.

    Lastly, I do agree with you that “the safest assumption is that the child’s knowledge is incomplete at all levels and the most prudent course of action is to address all three.” However, I still believe that it is theoretically interesting to explore the possibility that different children embody different directions of causality, and as technological advances like EPG make us more accurate and more efficient in interpreting the evidence from any given child, I hope that we will be able to fine-tune our intervention to a child’s particular profile.

    Thank you for providing a forum to discuss these fascinating issues. I look forward to hearing you and Francoise present on input-oriented intervention at ASHA.

    Tara McAllister Byun

    Dinnsen, D. A., & Farris-Trimble, A. W. (2008). The prominence paradox. In D. A. Dinnsen & J. A. Gierut (Eds.), Optimality Theory, Phonological Acquisition and Disorders (pp. 277-308). London: Equinox Publishing Ltd.

    Dorman, M. F., Raphael, L. J., Liberman, A. M., & Repp, B. (1975). Some maskinglike phenomena in speech perception. Haskins Laboratories Status Report on Speech Research, SR-42/43, 265-276.

    Fujimura, O., Macchi, M. J., and Streeter, L. (1978). Perception of stop consonants with conflicting transitional cues: A cross-linguistic study. Language and Speech, 21, 337-346.

    Inkelas, S., & Rose, Y. (2007). Positional Neutralization: A Case Study from Child Language. Language, 83, 707-736.

    Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., et al. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277, 684-686.

    McAllister Byun, T. (2012). Positional velar fronting: An updated articulatory account. Journal of Child Language, doi:10.1017/S0305000911000468.

    Ohala, J. J. (1990). The phonetics and phonology of aspects of assimilation. In J. Kingston & M. Beckman (Eds.), Papers in Laboratory Phonology I: Between the grammar and the physics of speech (pp. 258-275). Cambridge: Cambridge University Press.

    • Hello Tara, Thank you for your extended reply to my post. I do think that it is interesting for readers to have a look at the theoretical perspective within which you framed your argument that this particular child presented with a problem that was primarily motoric in origin, leading in turn to the very clear difficulties with perception that you documented. I did indeed find the theoretical parts of your paper to be a very interesting read and it appears that we agree on many points. Ultimately however I believe that the final conclusion you came to was that neutralization in strong position is a sign of a motoric problem, requiring a motoric approach to treatment on the part of the speech-language pathologist. On this point I cannot agree. Susan Nittrouer has shown that acoustic cues to phoneme identity and cue weighting strategies differ within phonemes and across position – I don’t think that it is helpful at all to designate the onset as “strong position” for all phonemes, all languages and all listeners. Secondly, for the reasons mentioned in my post, I do not think that your perception test or the articulatory data, as presented in the paper, confirms that the problem is fundamentally motoric. I have acknowledged that there may be cases where consistent misarticulations due to a motoric problem contribute to fuzzy perceptual representations but more adequate perceptual testing and a different research design would be required to establish that for any given child. With respect to your discussion about covert contrast, I would be interested to see how those acoustic cues were distributed across /k/ and /t/ targets in the onset and coda positions. Munson et al. present covert contrast as an example of the limitations of phonetic transcription as a description of a child’s articulatory knowledge. 
In the covert contrast example, phonetic transcription suggests no contrast but acoustic analysis or EPG imaging shows that the child has systematically different articulatory gestures for contrasting targets (thus demonstrating that indeed the child does have knowledge of the contrast in question). In Ben’s case we have kind of the opposite of covert contrast. We have an apparent contrast in “weak position” (the coda) despite the apparent absence of a contrast in the “strong position” (the onset). I have questioned whether the child is producing the /k/ correctly in either position (onset or coda); without EPG and other data describing the topography of his articulatory gestures it is difficult to be sure. I have suggested that he may have partial knowledge of the contrast in both perceptual and articulatory domains in both positions. This is quite a different conclusion to yours with a different clinical implication. It does worry me that clinicians might avoid perceptual approaches to intervention every time that they see neutralization in strong position, a pattern that is very common for the velars. Thank you again for your detailed response. Susan

  2. Hi Susan–Thanks for your reply, and just a quick follow-up: My colleague Adam Buchwald and I are launching a study that will collect ultrasound and acoustic data from children with positional velar fronting. If your readers don’t mind staying tuned for a year or so, we can revisit this question with a bit more data!


