Speech Therapy and Theories of Speech Motor Control: Part 2

In Part 1 of this blog series I described the theoretical basis of Dynamic Temporal and Tactile Cueing as recently published by Edy Strand. Specifically, the treatment is founded on Schmidt’s Schema Theory in which generalized motor programs are learned. During speech production the child must select the right program and apply the correct parameters before implementing it all at once. If the parameters are selected incorrectly, a speech error will occur. It is rather like making toast. If you forget to reset your settings after toasting bagels, your Wonderbread will come out black! The problem as stated by Schmidt is that by the time you realize that your toast settings are wrong and your motor gestures are off track, it’s too late— the toast is burned and you have said “Trat! Doast!” Learning occurs by “trial and error” — after much experience with your toaster you learn the settings (parameters) for getting the right amount of toastiness for different items. Learning to operate your toaster is similar to acquiring one “generalized motor program.” Speech motor learning is assumed to operate this way because sensory feedback is too slow to support on-line adjustments to the parameters in a direct way. I used a different analogy in the previous blog — once you have committed to swinging your golf club, you tend to follow through.

The problem with this model of speech motor control is that we know for certain that real time modification of vocal tract movements occurs in response to somatosensory and auditory feedback. Strangely we have known since the early eighties that the speech system is highly sensitive to error on-line; therefore, I don’t know why this idea of open-loop control persists. The proof comes from studies in which (typically) an adult is asked to repeatedly produce a particular syllable or disyllable and then experiences a perturbation in sensory feedback (either somatosensory feedback or auditory feedback). An early example of this paradigm involved productions of “aba”: during 15% of trials a mechanism placed an unexpected load on the talker’s lower lip. Here is where it gets interesting: the research participants corrected for this perturbation in the articulatory trajectory of the bottom lip very rapidly with compensatory actions of the top and the bottom lip (the bottom lip would need to exert greater upward force and the top lip would need to produce greater downward extent in order to produce the labial closure and the expected transitions into and out of the consonantal closure). Decades of experiments have followed involving many other perturbations in the domain of articulatory gestures, somatosensory (skin) sensations, and auditory feedback. For example, while the research participants are repeatedly saying “bed” you can trick their ear into thinking they are saying “bad” which leads to compensatory adjustments in articulation to get the expected auditory percept.

This kind of dynamic compensation across the entire vocal tract is made possible by an “internal model” — a neural model that simulates the behavior of a sensorimotor system in relation to its environment. The internal model can generate a prediction of the sensory consequences of implementing a motor plan via simulation. For speech, future outputs in the somatosensory and auditory domains are simulated; furthermore, the simulator takes into account delayed sensory feedback, noise in the perceptual system and other variables so that when feedback arrives it can be compared with the prediction and provide reliable error messages. Continuous tracking of the vocal tract state is thus permitted and forms the basis for ongoing planning of movements as speech unfolds. If an unexpected event occurs, as in the perturbation experiments that I have described, error corrections are dynamic across the entire system; therefore, if the predicted trajectory of acoustic formant transitions from the [a] into the [b] closure is not occurring, lower lip, upper lip, jaw and tongue movements can all be harnessed to produce the desired outcome.
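To make the idea of prediction plus delayed feedback concrete, here is a toy sketch in Python. It is only an illustration under my own simplifying assumptions (a one-dimensional "lip aperture", a perfect forward model, a fixed sensory delay, no noise). The function name and all parameter values are hypothetical, and this is not the model described by Houde and Nagarajan.

```python
from collections import deque


def simulate_closure(correction_gain, steps=40, delay=3,
                     load_step=10, load=0.3, control_gain=0.3):
    """Return the final lip aperture (the target is 0.0, i.e. closure).

    correction_gain = 0.0 ignores sensory feedback (open-loop plan);
    correction_gain = 1.0 fully corrects the internal estimate.
    All values are hypothetical and chosen only for illustration.
    """
    target = 0.0
    aperture = 1.0   # true state of the toy vocal tract
    estimate = 1.0   # internal model's running estimate of that state

    # Buffers hold the last delay+1 true and predicted states, oldest first.
    sensed_buffer = deque([aperture] * (delay + 1), maxlen=delay + 1)
    predicted_buffer = deque([estimate] * (delay + 1), maxlen=delay + 1)

    for t in range(steps):
        # Plan the next movement from the internal estimate, not from
        # (late-arriving) sensation.
        command = control_gain * (target - estimate)

        # An unexpected load hits the lip on one step only.
        disturbance = load if t == load_step else 0.0
        aperture = aperture + command + disturbance

        # Forward model: predict the consequence of the command.
        estimate = estimate + command
        sensed_buffer.append(aperture)
        predicted_buffer.append(estimate)

        # Delayed feedback: compare what was sensed `delay` steps ago
        # with what had been predicted for that same moment.
        error = sensed_buffer[0] - predicted_buffer[0]
        shift = correction_gain * error

        # Correct the current estimate and the stored predictions so the
        # same error is not counted again when later feedback arrives.
        estimate += shift
        predicted_buffer = deque((p + shift for p in predicted_buffer),
                                 maxlen=delay + 1)
    return aperture
```

With these settings, `simulate_closure(1.0)` ends very close to closure despite the load, whereas `simulate_closure(0.0)`, which ignores feedback and so behaves like a pre-planned open-loop movement, ends about 0.3 units away from closure.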

As Houde and Nagarajan (2011) explain, “speech motor control is not an example of pure feedback control or feedforward control” (p. 11). The acquisition of speech motor control is dependent upon the development of the internal model of vocal tract function as well as detailed knowledge of auditory targets. This understanding has implications for the treatment of childhood apraxia of speech. I will explore these implications further in the next and final blog in this series.


Abbs, J. H., & Gracco, V. L. (1983). Sensorimotor actions in the control of multi-movement speech gestures. Trends in Neurosciences, 6, 391-395.

Houde, J. F., & Jordan, M. I. (2002). Sensorimotor adaptation of speech I: Compensation and adaptation. Journal of Speech, Language & Hearing Research, 45(2), 295-310.

Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in Human Neuroscience, 5, doi: 10.3389/fnhum.2011.00082.

Tourville, J. A., Reilly, K. J., & Guenther, F. H. (2008). Neural mechanisms underlying auditory feedback control of speech. NeuroImage, 39, 1429-1443.

Speech Therapy and Theories of Speech Motor Control: Part I

Edy Strand recently published a detailed description of her Dynamic Temporal and Tactile Cueing treatment strategy. As she says, this is a hugely valuable paper because it provides a complete description of a treatment designed for severe speech sound disorders, especially Childhood Apraxia of Speech, and more importantly, it summarizes in one place the theoretical foundation for the treatment. I think that, on the whole, this is an efficacious treatment. However, some procedures, derived directly from the outdated theoretical underpinnings, are questionable, and therefore I am going to devote several blogs to more recent theory and basic science research on the development of speech motor control and apraxia of speech. In this first blog, I review Schema Theory, even though this theory is just not right! But it has a long history and remains popular across almost all clinically-oriented papers on motor speech disorders.

The theory that is referenced in Edy Strand’s paper is Richard Schmidt’s “Schema Theory of Discrete Motor Skill Learning,” published in Psychological Review in 1975 and subsequently brought to speech-language pathology by Ray Kent and others as a useful framework for thinking about speech therapy. The important idea underlying this theory is that motor skills are made up of brief, discrete motor acts that are executed all-at-once as open-loop generalized motor programs, adapted with specific response specifications (called parameters) for the current conditions. The theory assumes “open-loop” control because sensory feedback is often too slow to impact movement after it has started. According to this theory feedback is processed after the movement is over and incorporated into the schema for the future execution of the generalized motor program. I have used golf as an example before; even though I haven’t played much in years let’s do it again: if we are adopting this theory we would think of practice sessions as developing different generalized motor programs for each type of shot, a long drive, a short 7-iron shot, the up-and-down pitch onto the green, and the putt into the hole. Which shot you choose depends upon your recall schema: what is your target and which type of shot is likely to achieve it? I personally recall that when close to the green my pitch is better than my chip (whereas my husband has the opposite preference). How you address the ball depends upon the initial conditions (flat ground, hill, tall grass etc.). The motor control parameters (also known as response specifications) depend upon the distance to the target (how high to lift the club, speed of follow through, force applied and so on). 
Based on the initial conditions and the desired outcome, I launch the shot with my wedge, expecting a certain “feel” as I hit the ball based on past experience with the sensory consequences of hitting this shot; I can always “recognize” a good hit even before I see the ball land (often I just turn my back on the ball, I don’t even want to see it land!). But in any case, the actual outcome is important for updating the “recall” schema; specifically, if I have actually achieved my target, I add all this information, the initial conditions, the response specifications, the recognition schema and the recall schema to my memory. The generalized motor program is an abstraction across all these remembered practice trials, permitting correct specification of the response parameters in future shots. Furthermore, I should be able to adapt the generalized motor program to similar shots, even if the ball is a little further or closer to the green for example.
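The logic of the recall schema can be sketched in a few lines of Python. This is a toy illustration of my golf example, not Schmidt's formal model: I assume, purely for simplicity, that shot distance is proportional to the force applied. The names `fit_schema` and `select_force` and all of the numbers are hypothetical.

```python
def fit_schema(trials):
    """Abstract a rule from remembered (force, distance) practice trials.

    Least-squares slope through the origin: distance ~= slope * force.
    The proportional relation is an assumption made for illustration.
    """
    numerator = sum(force * distance for force, distance in trials)
    denominator = sum(force * force for force, _ in trials)
    return numerator / denominator


def select_force(trials, target_distance):
    """Choose the response parameter before the shot is launched."""
    return target_distance / fit_schema(trials)


# Remembered practice trials: (force applied, distance achieved).
practice = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]

# A target distance that was never practised.
force_for_new_shot = select_force(practice, 10.0)

# Feedback arrives only after the shot is over; it is stored for next time.
practice.append((force_for_new_shot, 10.0))
```

The point of the sketch is the ordering: the parameter is fixed before the movement starts, and the outcome can only influence the next shot, never the one in progress.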

When applied to CAS, in which current research suggests unreliable or degraded somatosensory feedback, the use of this model focuses attention on the child’s processing of initial conditions, inaccurate planning or programming of the movement due to poor selection of response specifications, and/or poor recognition schema (not knowing when the movement “feels right”). Therefore, certain procedures are recommended. DTTC providers use manual or gestural cues to shape the child’s articulators into the “initial position” and encourage the child to “hold” the position momentarily so as to fully process those initial conditions before launching the movement. During the initial stages of therapy, the SLP uses a slow rate and co-production so that the child is getting extra feedback during the practice trial, presumably with the goal of stabilizing the recognition schema. Imitative models support the child’s knowledge of the target which, when combined with copious knowledge of results feedback, should support the development of recall schema. And finally, a great deal of practice with an errorless approach ensures that the child lays down many memory traces of correctly executed motor programs.

The recommendations that are provided make a certain amount of sense given the context of schema theory (even though there is in fact no evidence for the specific efficacy of any one of these particular procedures). The problem is that it is not clear that schema theory is a reasonable foundation for modern speech therapy practice.

First, Richard Schmidt himself cautioned in 2003 that “schema theory was intended to be an account of discrete actions. Hence, continuous actions, such as steering a car or juggling, which are both of longer duration (allowing time for response-produced feedback to have a role) and more based on the performer’s interactions with the environment were outside the area for schema theory…long-duration actions might be based on interplay between open-loop subactions and feedback-based corrections… . Interestingly, tasks such as juggling seem appropriate for analysis in terms of the dynamical systems perspective” (p. 367). I would argue that our understanding not only of juggling but also of speech motor control has benefited immensely from the dynamical systems perspective and I will come back to that in the next blog. If juggling is considered too complex and continuous to be explained by schema theory, probably speech is not a good fit either.

Second, modern theories of speech motor control have shown that on-line correction of motor action even over short durations occurs despite the limitations of feedback control. The explanation lies in the continuous operation of feedforward control mechanisms. More on feedforward control in another blog.


Rvachew, S., & Brosseau-Lapré, F. (2012). Developmental Phonological Disorders: Foundations of Clinical Practice. San Diego, CA: Plural Publishing.

Schmidt, R. A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82(4), 225-260. doi:10.1037/h0076770

Schmidt, R. A. (2003). Motor schema theory after 27 years: Reflections and implications for a new theory. Research Quarterly for Exercise and Sport, 74(4), 366-375.

Strand, E. A. (2019, Early View). Dynamic Temporal and Tactile Cueing: A Treatment Strategy for Childhood Apraxia of Speech. American Journal of Speech-Language Pathology. doi:10.1044/2019_AJSLP-19-0005

Using Phonetics to Teach Phonology

Francoise and I have been working on the second edition of our book for some time now and the book is finally in the production stage – counting down to a December 2016 release date. One of the decisions we have had to make is whether to keep all the figures that were in the first edition – we must pay the copyright holders (note: not the authors!) in order to gain the right to reproduce all those figures and tables in our book. It is a difficult decision for each and every figure given that the costs vary from approximately $100 to $1000 per figure and there are 99 of them in the book!

Consider the figure shown at the bottom of this post – it illustrates data from some research by Goffman and Malin (1999) in which adults and children produced nonsense words with either a trochaic stress pattern (strong-weak) or an iambic stress-pattern (weak-strong). Kinematic tracings of lower lip movements are shown. The surprise was that the children modulated the stress pattern of the iambic words in a fairly adult-like manner, albeit with less consistency than the adults. The children did not modulate the stress pattern of the trochaic word, producing it like a spondee, with equal stress on both syllables, which was an unexpectedly immature pattern. Why did I choose to keep this figure in a book on phonology? Surely the whole point of phonology is to convert speech to an abstract form like this: [ˈpʌpəp] and [pəˈpʌp]? In the end I decided that I wanted to keep it because I so much want my students to see it – it encapsulates so many primary themes in our book, as follows:

  1. Basic concepts are essential to understand, and for multilingual students in particular, the figure provides a beautiful visual representation of trochee, spondee and iamb that is much more effective than a string of phonetic symbols.
  2. What you get is not always what you hear! If you were to transcribe the child saying the word “puppet” with the kinematics shown in the lower left quadrant of the figure, the odds are that you would produce [ˈpʌpət] which would represent what you expect to hear rather than exactly what the child said. I spend quite a bit of time talking about the limits of phonetic transcription in the first chapter of the book.
  3. The development of prosody is fundamental to the development of phonology: prosodic frames – word templates made up of syllable shapes and stress patterns that are characteristic of ambient language – emerge early and support the acquisition of phonemes. These two levels of the phonological hierarchy are intimately interconnected – it really is time to stop teaching linear phonology.
  4. Phonology is fully dependent upon phonetics – you cannot understand phonological development without understanding the articulatory and perceptual substrates.
  5. Having said that, it is not true that phonological development is determined by maturation of the motor system. If it were, the trochaic pattern would emerge first, before the iambic stress pattern, whereas the reverse is shown in the figure. This demonstration can be the trigger for an interesting discussion of competing approaches to intervention.
  6. The figure is a beautiful illustration of the operation of lexical contrast. Why does the child learn to modulate the strong-strong stress pattern to produce a weak-strong iamb before properly mastering the (for English) canonical strong-weak pattern? Because they must do that in order to produce a contrast between these two word templates in the mind of the listener.
  7. The figure is a lovely illustration of how phonology emerges from the dynamic interplay of phonetic, semantic, and social factors with a dynamic systems approach to development being a coherent thread throughout the book.

The thing about a book, however, is that I can only build possibilities into it – the teaching and the learning are constrained by the imagination of the teachers and the learners. I don’t know how many readers will discover in a paragraph on the development of “interarticulator coordination” a plethora of important messages about the development of phonology.

Figure 3–7. Time and amplitude normalized kinematic tracings of displacement of the lower lip during productions of the nonsense words [ˈpʌpəp] (left) and [pəˈpʌp] (right), recorded from an adult (top) and child (bottom). The corresponding spatiotemporal indexes for the repeat productions shown are: (A) adult trochee STI = 8.56, (B) adult iamb STI = 8.99, (C) child trochee STI = 18.15, and (D) child iamb STI = 14.24. Adapted from Goffman & Malin (1999). Metrical effects on speech movements in children and adults. Journal of Speech, Language, and Hearing Research, 42, Figure 5, p. 1009. Used with permission of the American Speech-Language-Hearing Association.
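For readers unfamiliar with the spatiotemporal index (STI) values quoted in the caption, here is a minimal Python sketch of the general idea. It assumes the usual description of the measure: each repetition is amplitude normalized (z-scored) and time normalized by linear interpolation, and the standard deviations across repetitions at fixed points in relative time are summed. The function names, the choice of 50 points, and the example traces are my own assumptions for illustration; this is not the analysis code used by Goffman and Malin.

```python
from statistics import mean, pstdev, stdev


def z_normalize(trace):
    """Amplitude-normalize one displacement record (mean 0, SD 1).

    Assumes the trace is not constant (its SD must be non-zero).
    """
    centre = mean(trace)
    spread = pstdev(trace)
    return [(value - centre) / spread for value in trace]


def resample(trace, n_points):
    """Time-normalize by linear interpolation onto n_points samples."""
    last = len(trace) - 1
    resampled = []
    for i in range(n_points):
        position = i * last / (n_points - 1)
        low = int(position)
        high = min(low + 1, last)
        fraction = position - low
        resampled.append(trace[low] * (1 - fraction) + trace[high] * fraction)
    return resampled


def spatiotemporal_index(traces, n_points=50):
    """Sum of across-repetition SDs at each point in relative time.

    Needs at least two repetitions; lower values mean more consistent
    movement patterns across repetitions.
    """
    normalized = [resample(z_normalize(t), n_points) for t in traces]
    return sum(stdev(column) for column in zip(*normalized))


# Two hypothetical repetitions with the same shape but different size,
# and one repetition with a different shape.
small = [0.0, 1.0, 0.0, -1.0, 0.0]
large = [0.0, 2.0, 0.0, -2.0, 0.0]
other = [0.0, 1.0, 2.0, 1.0, 0.0]
```

Because of the normalization, repetitions that differ only in overall size (`small` and `large`) give an index of essentially zero, while repetitions that differ in shape (`small` and `other`) give a larger value.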


Rvachew, S., & Brosseau-Lapré, F. (2018). Developmental Phonological Disorders: Foundations of Clinical Practice (Second Edition). San Diego, CA: Plural Publishing. http://pluralpublishing.com/publication_dpd2e.htm

Goffman, L., & Malin, C. (1999). Metrical effects on speech movements in children and adults. Journal of Speech, Language, and Hearing Research, 42, 1003-1015.

(edited on August 26, 2016 to correct copyright date for DPD2e. The second edition will be released in December 2016)

Support for Speech Perception Interventions in Speech Therapy

I am writing a third blog on this strange experimental protocol in which the talker produces a syllable repeatedly and the talker’s speech output is altered in a systematic fashion so that the talker hears him or herself say something that does not correspond to their own articulatory gestures. I am fascinated by these experiments because they are a window onto feedback control which is essential for a successful speech therapy outcome. Initially in traditional speech therapy the SLP is providing a lot of external feedback about the child’s articulatory gestures (knowledge of performance feedback) and the correctness of the child’s speech output (knowledge of results feedback). But given that the SLP cannot follow the child around outside the clinic room, eventually the child must learn to use self-generated feedback for speech motor learning to occur. Can children use auditory feedback to change their own speech?

In a previous blog, On Birds and Speech Therapy, I discussed interesting work from Queen’s University suggesting that toddlers do not use feedback control like adults do during speech motor learning. These researchers found that adults will compensate for perturbations of their own speech by adjusting their articulation to get the desired auditory feedback. In contrast, very young children do not compensate in this way. I suggested that this may be because toddlers do not perceive speech with the same degree of precision as adults. This hypothesis was supported by another study in which speakers of French and English did not show the same compensation effect to a perturbation that made their vowels sound like a French vowel. The English talkers did not respond to a perturbation to which they were not perceptually sensitive (see Feedback Control and Speech Therapy Revisited).

Recently, I was delighted to find another study involving children that provides even stronger confirmation that perceptual representations play a key role in the child’s ability to use feedback for speech motor learning. Shiller and Rochon (2014) randomly assigned 5- to 7-year-old children with typical speech to two training conditions: the control group received speech perception training for the /b/-/d/ contrast; the experimental group received speech perception training for the /ɛ/-/æ/ contrast. Prior to and subsequent to this training both groups experienced the perturbation experiment: both groups repeatedly said “Beb” while their own speech was altered to sound more like “Bab”. Prior to perceptual training, both groups showed a small compensation for this perturbation in the feedback of their own speech. After speech perception training the experimental group showed twice as much compensation as before whereas the control group showed no change in the amount of compensation. The results show that children can indeed use feedback for speech motor adaptation; furthermore, this ability improves as perceptual boundaries between phoneme categories become better defined, whether with age or with training.

The conclusions of the study are very gratifying. Citing my own work on the importance of speech perception training as a strategy to facilitate speech production learning by children with speech sound disorders, the authors conclude:

“The results of the present study complement this work nicely, demonstrating that improvements in children’s auditory perceptual abilities do not simply improve motor performance, but also alter the capacity for auditory-feedback based speech motor learning—a process that is central to the clinical treatment of speech production disorders.” (p. 1314)

No surprise that I like this study a lot!

Feedback Control and Speech Therapy Revisited

In August 2012 I posted a comment about MacDonald, E. N., Johnson, E. K., Forsyth, J., Plante, P., & Munhall, K. G. (2012). Children’s development of self-regulation in speech production. Current Biology, 22, 113-117. (see On Birds and Speech Therapy). In this paper the authors reported that toddlers did not compensate for perturbations of their own vowel formants and they concluded that toddlers “do not monitor their own voice when speaking in the same way as adults do”. I was skeptical of this claim since it is hard to imagine how children learn to talk at all if they do not have access to feedback control mechanisms. I suggested that perceptual explanations would make more sense and now there is published evidence that this is indeed the case, interestingly from a paper including Munhall as author, specifically, Mitsuya, T., Samson, F., Ménard, L., & Munhall, K. (2013). Language dependent vowel representation in speech production. Journal of the Acoustical Society of America, 133, 2993-3003.

The paper is fascinating because it shows that English and French talkers do not show the same compensation effect when participating in this experimental paradigm and when the vowels involve French rounded vowel categories (i.e., English talkers do not change their own speech to compensate for a perturbation that makes their own speech sound more like a French vowel whereas French speakers do). Furthermore, the amount of compensation that a talker produces is related to the talker’s underlying phonological representation of the vowel space, as represented in acoustic-phonetic terms. In this study, when the English listeners did not respond to the particular perturbation of their vowel formants that was used, the researchers did not conclude that English people are incapable of using feedback control mechanisms! Rather they concluded that “the function of error reduction itself appears to be language universal, while detection of error is language specific.” However, the use of feedback for error reduction is dependent upon the talker’s perception of the feedback which in turn is related to the listener’s phonological representations (previously this was not clear because the research participants are not always consciously aware of the way that the experimenters are manipulating their speech).

Obviously the same logic should be applied to the toddlers’ apparent failure to use feedback control in a similar experimental manipulation in which the toddler’s speech was changed from one English vowel to sound a little bit more like another English vowel. In fact, a perceptually motivated interpretation is favoured in Mitsuya et al.; when referring back to MacDonald et al. they say “a stable phonemic representation is required for error detection and correction in speech, and sometime between 2 and 4 yr of age such a representation emerges and stabilizes.” This is not the interpretation that made the headline in Science Daily but it is the conclusion that makes more sense to me.

What are the implications for speech therapy? The research clearly supports my view that it is essential to ensure that your clients with speech sound errors have stable perceptual and phonological representations – this is a critical component of a treatment program aimed at establishing speech motor control and speech articulation accuracy. As Mitsuya et al. suggest, the acoustic target for speech is not just the phonetic category itself but the target category in relation to its neighbors. The treatment approach that I have always advocated is focused on phonemic perception: the important procedures include presenting the child with a large population of variable exemplars of the target category. These exemplars should identify the centre of the category, highlighting the important cues and the prototypical characteristics, while also allowing the child to explore the edges of the category so that the child can experience it in relation to similar but contrasting categories. Thus SAILS presents the child with a task in which highly variable stimuli are judged to be the TARGET or NOT THE TARGET and some of the stimuli are rather ambiguous. SLPs do not always like the fact that not all of the stimuli are prototypical exemplars of the target category but in fact this amount of variability is important for the establishment of phonological representations. Mitsuya et al.’s paper is important because it reinforces the point that stable acoustic-phonetic representations for speech targets are essential for the use of feedback control in speech motor learning.

On Golf and Speech Therapy

Last weekend I didn’t write a blog post because the weather was spectacular and I was having too much fun enjoying the new deck at my cabin and playing golf. This weekend it is raining so I have plenty of time to reflect on the decline in my golf skills since I gave it up three years ago to devote my weekends to writing a book. My daughter says I should go get a lesson but I am too embarrassed to do that because I am in such poor physical shape. I am not convinced it would help in any case because my husband and I used to go get tune-up lessons in the spring and I was never convinced that these were a good investment. The instructors would give me a bucket of balls and leave me by myself while concentrating on my husband. This used to annoy me no end – I thought it was some sort of sex discrimination thing until I complained one time and the instructor said, “no, no – you really don’t need my help, you have a perfect swing, just keep practicing as you are”. Imagine my surprise! If I had a perfect swing, why was my score so awful? (Even before I gave up golf to write a book my scores were pretty awful but I had a terrific 200 yard drive so my score didn’t bother me so much.) Anyway, when I was writing chapter three, I had to study up on theories of speech motor control and I figured out why my perfect swing wasn’t much good to me.

Practice Conditions

The problem is related to the vast difference in the practice conditions for golf relative to the actual playing conditions. Golf lessons and most practice sessions occur at a driving range or a golf dome as illustrated here: the terrain is perfectly flat and the practice mats are positioned to ensure that your body is aligned square to the target line. During practice it is common to hit many balls with the same club, concentrating on executing the same motor plan with a high degree of precision.

Play Conditions

Our playing conditions are vastly different since our cabin is located 5 minutes from a course where, for $1500 a year the whole family can play as often as we have time for, with carts and no tee times – just show up and play nine holes when we feel like it, it’s wonderful. There is a hitch though and that is that it’s pretty much the worst golf course in the world except for the scenery. There isn’t a flat spot on it and that includes the tee boxes. Every time you hit your ball you are likely to end up with a bad lie like this one (ball below feet, basically hitting off gravel). All the precision in the world with my perfect swing is not going to help me hit this ball. What I need to do is process the initial conditions accurately and select a motor plan that is going to get the club to the ball given those conditions. Looking at the picture I can tell that my ball is too far back in my stance but at the time I was quite unaware that I had positioned myself incorrectly relative to the ball – often my problem is one of poor information processing that leads me to essentially select the wrong initial conditions for the purpose of predicting which motor plan will have the desired effect. Poor execution of my swing is not the problem. Unfortunately when I achieve the inevitable bad result I start to adjust my swing which just makes everything worse. Instead, I need to focus on processing aspects of the context so that I can adapt my set-up to the initial conditions: First, what is the gradient of the slope between my feet and the ball? How much of an incline is there in the lie of the ball? Where is the target relative to the ball? And then, where are my feet relative to the ball? How wide is my stance? How bent are my knees? Where is my centre of gravity? Are my shoulders aligned with the slope of the hill?

So what has golf got to do with speech therapy? Given that speech is also a motor learning problem, the same principles of motor learning that apply to golf apply to speech learning. I have spent a bit of time this summer watching therapy videos as part of the treatment fidelity process in the randomized control trial that Françoise and I recently completed. I see student clinicians and sometimes the experienced SLPs conducting therapy sessions like practice at the golf dome – aiming for precision rather than dynamic stability. I think the students should know better because I taught them the principles of motor learning using Maas et al.’s (2008) excellent Tutorial http://bit.ly/Ta9STv in which the authors stress the distinction between learning and performance. Performance during practice may or may not transfer to untrained movements in nonpractice contexts. Maas et al. discuss a number of different strategies to enhance transfer of training to similar but unpracticed movements. Although the research findings are complex and often difficult to interpret, it appears that overall it is best to practice under conditions that afford a wide range of experience with varying initial conditions and movement outcomes. At the golf dome the best one can do is switch clubs and targets often. In speech therapy, practicing the target phoneme in many different words so as to vary phonetic context is often a good strategy. I think that novice and experienced SLPs know that variable practice conditions are important but it is not always easy to implement this principle for two reasons. The first is that performance levels are higher under constant than variable practice conditions and it is reinforcing to both clinician and patient to achieve high levels of accuracy during therapy sessions (the distinction between practice performance and actual learning is hard to keep in mind). 
The second is more fundamental: the goal of the therapy exercise is not itself clear to SLP or patient. In some ways, Maas et al.’s Tutorial contributes to this confusion of aims by focusing on motor programming and motor programs. Therapy sessions are conducted as if the goal is to perfect the specification and execution of a particular motor program. I prefer Wolpert’s approach to motor learning http://bit.ly/OE8VT5 (take a look at this if only for the Calvin and Hobbes cartoons). Wolpert and colleagues (2001) describe motor control “as the process of transforming sensory inputs into consequent motor outputs. The problem of motor learning is one of mastering and adapting such sensorimotor transformations” (p. 488). We can think of speech therapy as the process of helping the patient to process sensory inputs so as to transform them into the desired motor outputs. An approach to motor learning that takes into account information theory and information processing is the “challenge point framework”, described by Guadagnoli and Lee (2004) http://www.tandfonline.com/doi/abs/10.3200/JMBR.36.2.212-224#preview. Françoise and I are going to teach a seminar about how to apply this framework to speech therapy at ASHA 2012 in Atlanta this fall:

Topic Area: Speech Sound Disorders in Children (SLP)
Session Number: 1530

Title: Application of the Challenge Point Concept to Developmental Phonological Disorders

Session Format: Oral Session (Seminar 2-Hours)
Day/Time: Saturday, Nov 17 — 03:00 PM – 05:00 PM

Authors: Susan Rvachew, McGill U; Francoise Brosseau-Lapre, McGill U

On Monkeys and Speech Therapy

A few months back Science Daily published yet another article about the possible evolutionary origins of speech (see Monkey Lip Smacks Provide New Insights into the Evolution of Human Speech, May 31, 2012: http://www.sciencedaily.com/releases/2012/05/120531135641.htm). Speculating about the evolutionary origins of speech and language is an academic parlour game of some interest to me, but as with any other sport I find it more entertaining to watch than to play. However, as with other sports, the game sometimes spills over into real life and causes some damage to innocent bystanders, and thus I find it necessary to comment in this case.

The Science Daily article is based on a study by Ghazanfar and colleagues that used x-ray movies to observe the functional coordination of vocal tract structures during the production of lip smacks and chewing in adult monkeys (http://www.sciencedirect.com/science/article/pii/S0960982212004757). Another study that reported the rhythmic structure of lip smacks and chewing in infant, juvenile and adult monkeys is also relevant (http://onlinelibrary.wiley.com/doi/10.1111/j.1467-7687.2012.01149.x/abstract). The authors are following from the frame/content theory put forward by MacNeilage (1998: http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=29997). MacNeilage emphasizes the syllable as “an organizational superstructure for the distribution of consonants and vowels” that “evolved from ingestive cyclicities (e.g., chewing).” Then he goes further and suggests that since “ontogeny recapitulates phylogeny in the realm of human motor function”, speech must arise from ingestive cyclicities in developmental time as well. This is where the parlour game gets dangerous. I don’t think that it is any accident that shortly after this time a whole host of speech therapy approaches, books, kits, videos, workshops and websites devoted to “oral motor therapies” sprang up with the express purpose of providing “a stable foundation for speech by first addressing instability in the jaw, lips and tongue” (http://speech-language-pathology-audiology.advanceweb.com/article/oral-motor.aspx). The explicit rationale for these approaches is that “motor skills in feeding and non-speech movements act as prerequisites to speech clarity. Feeding and non-speech activities are targeted prior to speech production tasks to ensure adequate muscle functioning is available”. It has taken a decade of kinematic and electromyographic studies in infants and young children to both prove and transmit the message that chewing and speech are not related to each other.
As Françoise and I describe in detail in our book (http://www.pluralpublishing.com/publication_dpd.htm), the muscle activation patterns for these two functions are completely different, with reciprocal activation of agonist and antagonist mandibular muscle groups during chewing versus coactivation of these muscle groups during speech. More importantly, as shown by Steeve et al. (2008) (http://www.ncbi.nlm.nih.gov/pubmed/18664699?dopt=Abstract), muscle activation patterns for chewing and babble are both uncoordinated in young infants, and thus it is not true that speech emerges from a previously established “ingestive cyclicity”. Rather, speech and nonspeech oral behaviors involve distinct coordinative structures that develop along divergent but parallel paths. Clinical research is now emerging on the foundation of this basic research, with some small sample studies showing that nonspeech oral motor exercises are not efficacious (http://www.uwo.ca/fhs/csd/ebp/reviews/2011-12/Peter.pdf).

Now, back to Science Daily. The thing is that MacNeilage (1998) also proposed that “an evolutionary route from ingestive cyclicities to speech is suggested by the existence of a putative intermediate form present in many other higher primates, namely, visuofacial communicative cyclicities such as lipsmacks, tonguesmacks, and teeth chatters.” The hypothesis of these intermediate forms must explain why the adherents of this theory are not at all concerned about a decade of research showing that speech and chewing in humans are not functionally or developmentally related in any fashion. In fact, the study trumpeted by Science Daily makes the point that the functional coordination of vocal tract structures is distinct during chewing versus lip smacks. Furthermore, this research team claims that chewing and lip smacks develop along divergent paths in the monkey, with chewing achieving a slow stable rhythm at a young age whereas lip smacks require a longer period to achieve stability at a faster rhythm. Notwithstanding the whole “ontogeny recapitulates phylogeny” thing, this is taken as evidence for the frame/content theory because speech in the infant shows a similar developmental trajectory, beginning with a slow and variable rhythm and finishing with a fast and stable rhythm. The fact that silent jaw wags, proposed by MacNeilage as a human equivalent of lip smacks, are actually slow and not fast doesn’t seem to bother them. In terms of clinical implications, the fact remains that the coordinative structures for communicative and ingestive behaviors develop along divergent paths in monkeys and in humans (for further evidence see Shepherd et al. http://www.jneurosci.org/content/32/18/6105.abstract). Practice in one domain does not generalize to the achievement of motor control in the other domain.

I must admit that I found MacNeilage’s argument hard to follow the first time – it is even less clear now. But as I say, speculating on events that occurred two to six million years ago is a game best left to those who play it often. For myself, my concern is for children whose speech therapists believe unwisely that chewing (or lipsmacking) is a prerequisite for speech development. The notion that some level of oral-motor maturation is required for speech therapy is persistent and leads to two harmful practices – waiting too long to implement therapy to improve speech production accuracy or preceding speech therapy with useless exercises directed at jaw stability, tongue strength and the like. Throughout our book Françoise and I stress that “maturation of articulatory and neurophysiological structures and developmental changes in sensory feedback systems are not the key explanatory factors in speech development.” Rather than viewing structure as a limit on function, we believe that it is the child’s drive to function like other members of the human community that motivates practice, and practice itself causes the development of speech motor control.