Speech Therapy and Speech Motor Control: Part 2

Speech Therapy and Theories of Speech Motor Control: Part 2

In Part 1 of this blog series I described the theoretical basis of Dynamic Temporal and Tactile Cueing as recently published by Edy Strand. Specifically, the treatment is founded on Schmidt’s Schema Theory, in which generalized motor programs are learned. During speech production the child must select the right program and apply the correct parameters before implementing it all at once. If the parameters are selected incorrectly, a speech error will occur. It is rather like making toast. If you forget to reset your settings after toasting bagels, your Wonder Bread will come out black! The problem as stated by Schmidt is that by the time you realize that your toast settings are wrong and your motor gestures are off track, it’s too late: the toast is burned and you have said “Trat! Doast!” Learning occurs by “trial and error”: after much experience with your toaster you learn the settings (parameters) for getting the right amount of toastiness for different items. Learning to operate your toaster is similar to acquiring one “generalized motor program.” Speech motor learning is assumed to operate this way because, on this account, sensory feedback is too slow to support on-line adjustments to the parameters in a direct way. I used a different analogy in the previous blog: once you have committed to swinging your golf club, you tend to follow through.

The problem with this model of speech motor control is that we know for certain that real-time modification of vocal tract movements occurs in response to somatosensory and auditory feedback. Strangely, we have known since the early eighties that the speech system is highly sensitive to error on-line; therefore, I don’t know why this idea of open-loop control persists. The proof comes from studies in which (typically) an adult is asked to repeatedly produce a particular syllable or disyllable and then experiences a perturbation in sensory feedback (either somatosensory or auditory). An early example of this paradigm involved productions of “aba”: during 15% of trials a mechanism placed an unexpected load on the talker’s lower lip. Here is where it gets interesting: the research participants corrected for this perturbation in the articulatory trajectory of the lower lip very rapidly with compensatory actions of both the upper and the lower lip (the lower lip would need to exert greater upward force and the upper lip would need to produce greater downward extent in order to produce the labial closure and the expected transitions into and out of the consonantal closure). Decades of experiments have followed involving many other perturbations in the domain of articulatory gestures, somatosensory (skin) sensations, and auditory feedback. For example, while the research participants are repeatedly saying “bed” you can trick their ear into thinking they are saying “bad,” which leads to compensatory adjustments in articulation to get the expected auditory percept.

This kind of dynamic compensation across the entire vocal tract is made possible by an “internal model” — a neural model that simulates the behavior of a sensorimotor system in relation to its environment. The internal model can generate a prediction of the sensory consequences of implementing a motor plan via simulation. For speech, future outputs in the somatosensory and auditory domains are simulated; furthermore, the simulator takes into account delayed sensory feedback, noise in the perceptual system and other variables so that when feedback arrives it can be compared with the prediction and provide reliable error messages. Continuous tracking of the vocal tract state is thus permitted and forms the basis for ongoing planning of movements as speech unfolds. If an unexpected event occurs, as in the perturbation experiments that I have described, error corrections are dynamic across the entire system; therefore, if the predicted trajectory of acoustic formant transitions from the [a] into the [b] closure is not occurring, lower lip, upper lip, jaw and tongue movements can all be harnessed to produce the desired outcome.
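A toy simulation may help make the idea concrete. The sketch below is only an illustration under strong simplifying assumptions, not the model that Houde and Nagarajan describe: the "vocal tract" is a single number (how far the lower lip has moved toward closure), sensory feedback is treated as instantaneous and noise-free, and the names `simulate_closure`, `control_gain`, and `feedback_gain` are invented for this example. The internal model predicts where the lip should be after each command, and the difference between that prediction and the sensed position is used to correct the estimate that drives the next command.

```python
def simulate_closure(target=1.0, load=0.0, feedback_gain=1.0,
                     control_gain=0.5, steps=20):
    """Move a one-dimensional 'lip' toward `target` and return its trajectory.

    load          -- unexpected push applied at every step (the perturbation);
                     the internal model knows nothing about it
    feedback_gain -- 0.0 ignores sensory feedback (open loop);
                     1.0 fully trusts the sensed position
    All values are hypothetical and unitless.
    """
    position = 0.0   # true lip position
    estimate = 0.0   # internal model's estimate of the lip position
    trajectory = [position]
    for _ in range(steps):
        # Plan the next movement from the estimated state, not the true one.
        command = control_gain * (target - estimate)
        # Internal model: simulate the expected sensory consequence.
        predicted = estimate + command
        # The real articulator also feels the unexpected load.
        position = position + command + load
        # Prediction error: sensed position minus predicted position.
        error = position - predicted
        # Correct the estimate in proportion to the prediction error.
        estimate = predicted + feedback_gain * error
        trajectory.append(position)
    return trajectory


unperturbed = simulate_closure(load=0.0)
open_loop = simulate_closure(load=-0.05, feedback_gain=0.0)
closed_loop = simulate_closure(load=-0.05, feedback_gain=1.0)

for label, run in [("no load", unperturbed),
                   ("load, feedback ignored", open_loop),
                   ("load, feedback used", closed_loop)]:
    print(f"{label}: final position {run[-1]:.3f}")
```

With the load applied, the run that ignores feedback drifts steadily away from the target because its internal estimate never learns that anything went wrong, whereas the run that uses the prediction error ends much closer to the target. In this toy version the compensation is incomplete because the controller is deliberately simple.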

As Houde and Nagarajan (2011) explain, “speech motor control is not an example of pure feedback control or feedforward control” (p. 11). The acquisition of speech motor control is dependent upon the development of the internal model of vocal tract function as well as detailed knowledge of auditory targets. This understanding has implications for the treatment of childhood apraxia of speech. I will explore these implications further in the next and final blog in this series.


Abbs, J. H., & Gracco, V. L. (1983). Sensorimotor actions in the control of multi-movement speech gestures. Trends in Neurosciences, 6, 391-395.

Houde, J. F., & Jordan, M. I. (2002). Sensorimotor adaptation of speech I: Compensation and adaptation. Journal of Speech, Language & Hearing Research, 45(2), 295-310.

Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in Human Neuroscience, 5, doi: 10.3389/fnhum.2011.00082.

Tourville, J. A., Reilly, K. J., & Guenther, F. H. (2008). Neural mechanisms underlying auditory feedback control of speech. NeuroImage, 39, 1429-1443.

Using SAILS to Assess Speech Perception in Children with SSD

I am very excited to see an Australian replication of the finding that children with a Speech Sound Disorder (SSD) have difficulty with speech perception when tested with a word identification test implemented with recordings of children’s speech. Hearnshaw, Baker, and Munro (2018) created a task modeled on my Speech Assessment and Interactive Learning (SAILS) program. A different software platform was used to present the stimuli and record the children’s responses. The critical elements of SAILS were otherwise replicated but there were some significant differences as shown in the table below.

[Table: comparison of the Hearnshaw et al. (2018) task with SAILS]

The most important differences are the younger age of the children and the targeting of phonemes with older expected ages of acquisition. Furthermore, there are 12 stimuli per block and two target words per target phoneme in Hearnshaw versus 10 stimuli per block and one target word per target phoneme in my own assessment studies. In Hearnshaw the practice procedures involved fewer stimuli and less training on the task. Finally, the response mode was more complex in Hearnshaw and the response alternatives do not replicate mine. Therefore this study does not constitute a direct replication of my own studies, and I might expect lower performance levels compared to those observed in the children tested in my own studies (I say this before setting up the next table; let’s see what happens). Nonetheless, we would all expect that children with SSD would underperform their counterparts with typically developing speech, especially given the close matching on age and receptive vocabulary in Hearnshaw and my own studies.

[Table: comparison of SAILS data from Hearnshaw et al. (2018) and my own studies]

Looking at the data in the above table, the performance of the children with SSD is uniformly lower than that of the typically developing comparison groups. Hearnshaw’s SSD group obtained a lower score overall when compared to the large sample that I reported in 2006 but slightly higher when compared to the small sample that I reported in 2003 (that study was actually Alyssa Ohberg’s undergraduate honours thesis). It is not clear that any of these differences are statistically significant so I plotted them with standard error bars below.

[Figure: SAILS scores by diagnostic group and study, with standard error bars]
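For readers who want to make the same kind of comparison with their own data, here is a minimal sketch of how a mean and its standard error can be computed and how two sets of error bars can be checked for overlap. The scores in the example are invented purely for illustration and are not taken from Hearnshaw et al. or from my studies; the function names are likewise hypothetical. Non-overlapping standard error bars are only a rough visual guide and do not replace a proper statistical test.

```python
import math
import statistics


def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for a list of scores."""
    mean = statistics.mean(scores)
    # Sample standard deviation divided by the square root of the sample size.
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, standard_error


def error_bars_overlap(group_a, group_b):
    """True if the mean +/- 1 SE intervals of the two groups overlap."""
    mean_a, se_a = mean_and_standard_error(group_a)
    mean_b, se_b = mean_and_standard_error(group_b)
    return (mean_a - se_a) <= (mean_b + se_b) and (mean_b - se_b) <= (mean_a + se_a)


# Invented percent-correct scores, for illustration only.
typical_scores = [80, 75, 85, 78, 82]
ssd_scores = [60, 65, 58, 70, 62]

for label, scores in [("typically developing", typical_scores),
                      ("SSD", ssd_scores)]:
    mean, se = mean_and_standard_error(scores)
    print(f"{label}: mean {mean:.1f}, standard error {se:.2f}")

print("error bars overlap:", error_bars_overlap(typical_scores, ssd_scores))
```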

The chart does reinforce the impression that the differences between diagnostic groups are significant. It is less clear about the differences across studies. It is possible that the children that Alyssa tested were more severely impaired than all the others (the GFTA is not the same as the DEAP so it is difficult to compare) or, more likely, the best estimate is in the third study with the largest sample size. Nonetheless, the message is clear that typically developing children in this age range will achieve scores above 70% accurate whereas children with SSD are more likely to achieve scores below 70% accurate, which suggests that they are largely guessing when making judgements about incorrectly produced exemplars of the target words. Hearnshaw et al. and I both emphasize the within-group variance in perceptual performance by children with SSD. Therefore, it is important to assess these children’s speech perception abilities in order to plan the most suitable intervention.

And with that I am happy to announce that the iPad version of SAILS is now available with all four modules necessary to compare to the normative data that is presented below for three age groups.

[Table: SAILS normative data for three age groups (RBL 2018)]

Specifically, the modules that are currently available for purchase ($5.49 CAD per module) are as follows:

-“k”: cat (free)

-“l”: lake

-“r”: rat, rope, door

-“s”: Sue, soap, bus

Please see www.dialspeech.com for more information from me and Alex Herbay who wrote the software, or go directly to the app store: SAILS by Susan Rvachew and Alex Herbay