Speech Therapy and Speech Motor Control: Part 3

In two previous blogs I discussed a recent paper by Strand in which she outlines in detail the theoretical foundation and procedural details of Dynamic Temporal and Tactile Cueing (DTTC) as a treatment for Childhood Apraxia of Speech (CAS). In Part 1 I suggested that the theoretical base, being Schmidt’s “Schema Theory of Discrete Motor Skill Learning,” was outdated. In Part 2 I discussed modern theories of speech motor control that assume a dynamic interplay of feedforward and feedback control mechanisms. In this blog I will discuss the implications for speech therapy, in relation to critical aspects of DTTC.

First, let us consider the core element of DTTC, “the focus on the movement (rather than the sound or phoneme) in terms of modeling, cueing, feedback, and target selection” (p. 4). I believe that all of us who strive to help children with CAS acquire intelligible speech agree that speech movements are the focus of speech therapy, as opposed to phonological contrasts. Nonetheless, this statement raises questions about the nature of “speech movements.” What is the goal of a speech movement? The answer to this question is controversial: it may be to achieve a somatosensory target involving specific articulators, such as bringing the margins of the tongue blade into contact with the upper first molars; or it may be to produce a particular vocal tract shape, such as a large back cavity separated from a small front cavity by a narrow constriction; or it may be to produce an acoustic output that will be perceived as the vowel [i]. DTTC is structured to promote precise and consistent movements of the articulators, and therefore the first scenario is presumed. Furthermore, the origin of CAS is hypothesized to be a deficit in proprioceptive processing that arises from an impairment in cerebellar mechanisms. In an updated theory, this hypothesis would implicate feedforward control, which, according to Guenther and Vladusich (2012), “projects directly from the speech sound map [in left ventral premotor cortex and posterior Broca’s area] to articulatory control units in cerebellum and primary motor cortex” (p. 2). However, new research (Liégeois et al., 2019) identifies the locus of structural and functional impairments underlying CAS as being along a dorsal pathway of cortical structures, specifically, reduced white matter and fMRI activations in sensory motor cortex and along the arcuate fasciculus, and reduced grey matter and fMRI activations in superior temporal gyrus and angular gyrus.
They explain that “this route links auditory input/representation to articulatory systems … and transforms phonological representations into motor programs … In contrast, the speech execution white matter pathway (corticobulbar) and the ventral language route (IFOF) were not altered in this family” [that showed multigenerational impairments in speech praxis]. My point is that although the cerebellum is important to speech motor control and CAS may well involve impairments in proprioceptive feedback, speech is clearly a sensory motor skill that requires a close connection between articulatory and auditory representations for sounds and syllables.

In Part 2 of this blog series I indicated that adults can compensate for unexpected perturbations to articulatory trajectories or auditory feedback very rapidly by drawing on their internal model of vocal tract function. It is interesting to consider that throughout speech development children cope with perturbations to articulatory gestures and expected acoustic outputs because their vocal tract is changing shape, sometimes quite dramatically, throughout childhood. Callan et al. (2000) showed how the developing child can adapt to the changing vocal tract by aiming for relatively stable auditory targets (conceived of as regions in auditory space) and using auditory feedback and simulations of auditory outputs to achieve those targets even as vocal tract structure is changing. The key to this remarkable ability is a learned mapping between articulator movements, vocal tract shapes and auditory outputs. The learning and updating of this internal model of vocal tract function arises from an unsupervised learning mechanism, essentially Hebbian learning: young infants engage in a great deal of unstructured vocal play as well as somewhat more structured babbling, a form of speech practice that allows them to learn the necessary correspondences without having specific speech goals. Infants with CAS are widely believed to skip this period of speech development; therefore, it is likely that they begin speech therapy without an internal model of vocal tract function, which is foundational for goal-directed speech practice. For this reason, precise, repeated, consistent speech movements may not be the best place to start a treatment program for severe CAS; a program of unstructured vocal play that targets highly varied playful vocalizations is a better starting place for many children.
Subsequently, high intensity practice with babble (repetitive syllable production) will stabilize the mappings between articulatory gestures and the resulting vocal tract configurations and somatosensory and auditory outcomes.

One of the advantages of a well-tuned internal model of vocal tract function is that it supports “motor-equivalent speech production” given commonly occurring constraints on speech production. In other words, there are many different articulatory gestures that will achieve the same acoustic-phonetic goal. When the child has a stable acoustic-phonetic target and is able to process auditory feedback in relation to that target, various articulatory solutions can be found to adapt to changing vocal tract structure or to constraints such as talking while eating or holding a pen between the teeth. Developmental changes in the way that articulators are coordinated to produce the same phoneme are well documented in the literature. Similarly, speech production varies with phonetic context. Motor equivalent trading relations between tongue body height and lip rounding are well known for production of the vowel [u] and the consonant [ʃ], for example, and the front-back positioning of the constriction in these phonemes is highly variable across speakers and phonetic contexts. The precision with which these phonemes are produced is related to the talker’s perceptual acuity: for example, adults who have sharp perceptual boundaries between [ʃ] and [s] produce them with greater articulatory consistency as well as greater acoustic contrast between the phoneme categories. Perkell et al. (2004) speculated “In learning to maximize intelligibility, the child with higher acuity is better able to reject poor exemplars of each phoneme (as in the DIVA model), and thus will adopt sensory goals for producing those phonemes that are further apart than the child with lower acuity.” The implications for speech therapy are that, even in the case of CAS, ensuring stable acoustic-phonetic targets for speech therapy goals is essential, whereas insisting upon SLP-defined articulatory parameters may be counter-productive.
The goal is not absolute consistency in the production of specific motor movements but rather dynamic stability in the achievement of speaking goals.

Although it is speculated that feedforward control is weighted more heavily than feedback control in adult speech, feedback is critical to speech learning during infancy and childhood. Furthermore, auditory feedback plays a crucial role. The initial goal is an auditory target. Guenther and Vladusich (2012) explain that “the auditory feedback control subsystem [helps to] shape the ongoing attempt to produce the sound by transforming auditory errors into corrective motor commands via the feedback control map in right ventral premotor cortex” (p. 2). They further explain that repeated practice of this type eventually leads to the development of somatosensory goal regions. A particular frustration for children with CAS is perseveration, the difficulty of changing a well-learned articulatory pattern to a new one that is more appropriate. This problem with perseveration highlights the need to engage the feedback control system. Two strategies are essential. The first is a high degree of variation in the practice materials, which can be introduced by practicing nonsense syllables with a carefully graded increase in difficulty but variation in the combination of syllables within difficulty levels. The second strategy is to provide just the right amount of scaffolding along the integral stimulation hierarchy so that the child will be successful more often than not while experiencing a certain amount of error. Some error ensures that corrective motor commands will be generated from time to time. Imagine practicing syllables that combine four consonants [b], [m], [w], [f] with four vowels [i], [u], [æ], [ɑ] and five diphthongs [ei], [ou], [ɑi], [au], [oi], presented at random so that the child imitates the first syllable (Say [bi]) and then repeats it twice more (Say it again… and again…), before proceeding to another syllable. You will have a great many targets in your session but created from a small number of elements.
Imagine further that you progress to a more difficult level (reduplicated syllables, [bubu], [mimi]) as soon as the child achieves 80% correct production of the single syllables. You can see that you will also be allowing the child to produce quite a bit of error. We call this the challenge point. Tanya Matthews, Francoise Brosseau-Lapré and I are working on a paper to describe how to do this and to share our experiences with the approach. You will see that it is very different from working on five words and requiring that the child achieve 15 to 20 correct productions at the imitative word level before proceeding to delayed imitation and then again before proceeding to spontaneous productions. Errorless learning is a fundamental aspect of DTTC and has a long history in speech therapy practice. However, it is not clear that it is well motivated from the perspective of developmental science.
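For readers who like to see a procedure written out, here is a minimal sketch in Python of the practice schedule described above. It is an illustration only, not a description of our clinical materials: the function names, the number of targets per session, and the way the 80% criterion is scored over a block of trials are all assumptions made for the example.

```python
import random
from itertools import product

# Elements taken from the example in the text.
CONSONANTS = ["b", "m", "w", "f"]
VOWELS = ["i", "u", "æ", "ɑ"]
DIPHTHONGS = ["ei", "ou", "ɑi", "au", "oi"]


def build_syllables(consonants=CONSONANTS, nuclei=VOWELS + DIPHTHONGS):
    """Pair every consonant with every vowel or diphthong (CV syllables)."""
    return [c + v for c, v in product(consonants, nuclei)]


def reduplicate(syllables):
    """Next difficulty level in the example: reduplicated syllables ([bubu])."""
    return [s + s for s in syllables]


def session_trials(syllables, n_targets, rng, productions_per_target=3):
    """Draw targets at random; each target is imitated once and then
    repeated, giving three productions in a row (assumed block structure)."""
    trials = []
    for _ in range(n_targets):
        target = rng.choice(syllables)
        trials.extend([target] * productions_per_target)
    return trials


def should_move_up(outcomes, criterion=0.8):
    """Challenge point rule: move to the next level once the proportion of
    correct productions reaches the criterion. `outcomes` is a list of
    True/False scores; the scoring window is a hypothetical choice."""
    if not outcomes:
        return False
    return sum(outcomes) / len(outcomes) >= criterion


if __name__ == "__main__":
    level_1 = build_syllables()
    print(len(level_1), "single-syllable targets")
    print(session_trials(level_1, n_targets=5, rng=random.Random()))
    print(should_move_up([True, True, True, True, False]))
```

Four consonants combined with nine vowels and diphthongs yield 36 distinct syllables, which is how a small number of elements produces a great many targets. The `should_move_up` rule returns true at 4 correct out of 5, so the child moves up while still producing some error.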

To summarize, there are many aspects of DTTC that are shared across all sensory-motor approaches to the treatment of CAS. In particular, high-intensity speech practice is well motivated and likely to be effective with all forms of moderate and severe speech sound disorder. Nonetheless, there are some significant differences between Strand’s approach and the approach that I recommend based on an updated theory of speech motor control. There is still a great deal of research to do because very few of our specific speech therapy practices have received empirical validation, even though speech therapy in general has been shown to be efficacious. As a guide to future research (hopefully using randomized and thus interpretable designs), I provide a table of procedures that are similar and different across the two theoretical approaches.




Treatment Procedures that are Similar

High intensity practice
Focus on speech movements (not phonemes)
Practice syllable sized units (not isolated sounds)
Attend to temporal aspects of trial structure (delayed imitation, delayed provision of feedback)
Integral stimulation hierarchy (attend to visual and auditory aspects of target)

Treatment Procedures that are Different

DTTC (Strand) | Approach based on updated theory
Focus on precise, consistent movements | Focus on dynamic stability
Over-practice: accuracy over 10-20 trials | Variable practice when possible
Errorless learning | Challenge point: 4/5 correct, then move up
Behavioral shaping of accurate movements | Motor equivalent movements
Tactile and gestural cues to ensure accuracy | Sharpen knowledge of auditory target
“Hold” initial configurations | Encourage vocal play, develop internal model


Callan, D. E., Kent, R. D., Guenther, F. H., & Vorperian, H. K. (2000). An auditory-feedback-based neural network model of speech production that is robust to developmental changes in the size and shape of the articulatory system. Journal of Speech, Language, and Hearing Research, 43, 721-738.

Guenther, F. H., & Vladusich, T. (2012). A neural theory of speech acquisition and production. Journal of Neurolinguistics, 25(5), 408-422.

Liégeois, F. J., Turner, S. J., Mayes, A., Bonthrone, A. F., Boys, A., Smith, L., . . . Morgan, A. T. (2019). Dorsal language stream anomalies in an inherited speech disorder. Brain, 142(4), 966-977.

Perkell, J., Matthies, M., Lane, H., Guenther, F. H., Wilhelms-Tricarico, R., Wozniak, J., & Guiod, P. (1997). Speech motor control: Acoustic goals, saturation effects, auditory feedback and internal models. Speech Communication, 22, 227-250.

Perkell, J., Matthies, M. L., Tiede, M., Lane, H., Zandipour, M., Marrone, M., . . . Guenther, F. H. (2004). The distinctness of speakers’ /s/-/ʃ/ contrast is related to their auditory discrimination and use of an articulatory saturation effect. Journal of Speech, Language, and Hearing Research, 47, 1259-1269.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13.

Rvachew, S., & Matthews, T. (2019). An N-of-1 randomized controlled trial of interventions for children with inconsistent speech sound errors. Journal of Speech, Language, and Hearing Research, 62, 3183-3203.



  1. Aravind Kumar

     /  February 3, 2020

    this is brilliantly written up! great insights …loved reading it. thank you

  2. I appreciate your explanation of the importance of stabilizing the mappings between articulatory gestures and the resulting vocal tract configurations and somatosensory and auditory outcomes for children with CAS. The idea of “dynamic stability in the achievement of speaking goals” is very helpful in conceptualizing how treatment should be directed. I look forward to reading the paper you are working on with Tanya Matthews and Francoise Brosseau-Lapré.

    • Thank you for your kind comment. I didn’t learn about dynamic stability until I was shockingly old and it changed my whole mindset about speech therapy. I am so happy you picked up on that and appreciate your comment.

  3. Gabrielle Miller

     /  February 23, 2020

    I reread this three-part blog every few days (especially after slogging through a difficult article on neuro-computational models like DIVA or FACTS); rereading the blog helps me to refocus on the bigger picture. I look forward to reading your paper detailing all of this information!

    • Thank you for your comment Gabrielle. I am very happy that you (and so many others) have found the blog posts to be useful. The paper that Tanya is writing is focused on the challenge point framework so it will be very practical.

