Speech Therapy and Speech Motor Control: Part 3

In two previous blogs I discussed a recent paper by Strand in which she outlines in detail the theoretical foundation and procedural details of Dynamic Temporal and Tactile Cueing (DTTC) as a treatment for Childhood Apraxia of Speech (CAS). In Part 1 I suggested that the theoretical base, being Schmidt’s “Schema Theory of Discrete Motor Skill Learning,” was outdated. In Part 2 I discussed modern theories of speech motor control that assume a dynamic interplay of feedforward and feedback control mechanisms. In this blog I will discuss the implications for speech therapy, in relation to critical aspects of DTTC.

First, let us consider the core element of DTTC, “the focus on the movement (rather than the sound or phoneme) in terms of modeling, cueing, feedback, and target selection” (p. 4). I believe that all of us who strive to help children with CAS acquire intelligible speech agree that speech movements are the focus of speech therapy, as opposed to phonological contrasts. Nonetheless, this statement raises questions about the nature of “speech movements.” What is the goal of a speech movement? The answer to this question is controversial: it may be a somatosensory target involving specific articulators, for example, bringing the margins of the tongue blade into contact with the upper first molars; or it may be to produce a particular vocal tract shape such as a large back cavity separated from a small front cavity by a narrow constriction; or it may be to produce an acoustic output that will be perceived as the vowel [i]. DTTC is structured to promote precise and consistent movements of the articulators, and therefore the first scenario is presumed. Furthermore, the origin of CAS is hypothesized to be a deficit in proprioceptive processing that arises from an impairment in cerebellar mechanisms. In an updated theory, this hypothesis would implicate feedforward control, which, according to Guenther and Vladusich (2012), “projects directly from the speech sound map [in left ventral premotor cortex and posterior Broca’s area] to articulatory control units in cerebellum and primary motor cortex” (p. 2). However, new research (Liégeois et al., 2019) identifies the locus of structural and functional impairments underlying CAS as being along a dorsal pathway of cortical structures, specifically: reduced white matter and fMRI activations in sensory motor cortex and along the arcuate fasciculus and reduced grey matter and fMRI activations in superior temporal gyrus and angular gyrus.
They explain that “this route links auditory input/representation to articulatory systems … and transforms phonological representations into motor programs … In contrast, the speech execution white matter pathway (corticobulbar) and the ventral language route (IFOF) were not altered in this family” [that showed multigenerational impairments in speech praxis]. My point is that although the cerebellum is important to speech motor control and CAS may well involve impairments in proprioceptive feedback, speech is clearly a sensory motor skill that requires close connection among articulatory and auditory representations for sounds and syllables.

In Part 2 of this blog series I indicated that adults can compensate for unexpected perturbations to articulatory trajectories or auditory feedback very rapidly by drawing on their internal model of vocal tract function. It is interesting to consider that throughout speech development children cope with perturbations to articulatory gestures and expected acoustic outputs because their vocal tract is changing shape, sometimes quite dramatically, throughout childhood. Callan et al. (2000) showed how the developing child can adapt to the changing vocal tract by aiming for relatively stable auditory targets (conceived of as regions in auditory space) and using auditory feedback and simulations of auditory outputs to achieve those targets even as vocal tract structure is changing. The key to this remarkable ability is a learned mapping between articulator movements, vocal tract shapes and auditory outputs. The learning and updating of this internal model of vocal tract function arises from an unsupervised learning mechanism, essentially Hebbian learning: young infants engage in a great deal of unstructured vocal play as well as somewhat more structured babbling – speech practice that allows them to learn the necessary correspondences without having specific speech goals. Infants with CAS are widely believed to skip this period of speech development; therefore, it is likely they begin speech therapy without an internal model of vocal tract function, which is foundational for goal-directed speech practice. Therefore, precise, repeated, consistent speech movements may not be the best place to start a treatment program for severe CAS; a program of unstructured vocal play that targets highly varied playful vocalizations is a better starting place for many children.
Subsequently, high intensity practice with babble (repetitive syllable production) will stabilize the mappings between articulatory gestures and the resulting vocal tract configurations and somatosensory and auditory outcomes.
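To make the idea of learning an internal model from vocal play more concrete, here is a toy sketch in Python. It is not the neural network of Callan et al. (2000). The one-parameter "vocal tract" (sound = gain × articulator position), the error-driven update (a simple delta rule standing in for Hebbian-style learning), the learning rate, and all function names are assumptions made purely for illustration.

```python
import random


def babble(model_gain, true_gain, n_trials, rng, rate=0.1):
    """Tune a one-parameter forward model from goal-free vocal play.

    Each trial: make a random movement, hear the result, and nudge the
    model toward predicting what was actually heard.
    """
    for _ in range(n_trials):
        position = rng.uniform(-1.0, 1.0)    # random articulator position
        heard = true_gain * position         # sound the (toy) vocal tract made
        predicted = model_gain * position    # sound the model expected
        model_gain += rate * (heard - predicted) * position
    return model_gain


def plan_movement(auditory_target, model_gain):
    """Invert the learned model to find a movement for a desired sound."""
    return auditory_target / model_gain


rng = random.Random(0)

# Early vocal play: the model starts out knowing nothing (gain 0.0).
gain = babble(model_gain=0.0, true_gain=2.0, n_trials=2000, rng=rng)

# The vocal tract "grows": the same movement now yields a different sound,
# and continued babbling re-tunes the model (hypothetical new gain of 3.0).
gain_after_growth = babble(model_gain=gain, true_gain=3.0, n_trials=2000, rng=rng)

# The auditory target stays the same; the movement that reaches it changes.
movement_before = plan_movement(1.5, gain)
movement_after = plan_movement(1.5, gain_after_growth)
```

In this sketch the auditory target is held constant while the movement needed to reach it changes as the toy vocal tract changes, which is the general point made above about stable auditory targets and an updated internal model.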

One of the advantages of a well-tuned internal model of vocal tract function is that it supports “motor-equivalent speech production” given commonly occurring constraints on speech production. In other words, there are many different articulatory gestures that will produce the same acoustic-phonetic goal. When the child has a stable acoustic-phonetic target and is able to process auditory feedback in relation to that target, various articulatory solutions can be found to adapt to changing vocal tract structure or constraints such as talking while eating or holding a pen between the teeth. Developmental changes in the way that articulators are coordinated to produce the same phoneme are well documented in the literature. Similarly, speech production varies with phonetic context. Motor equivalent trading relations between tongue body height and lip rounding are well known for production of the vowel [u] and the consonant [ʃ], for example, and the front-back positioning of the constriction in these phonemes is highly variable across speakers and phonetic contexts. The precision with which these phonemes are produced is related to the talker’s perceptual acuity: for example, adults who have sharp perceptual boundaries between [ʃ] and [s] produce them with greater articulatory consistency as well as greater acoustic contrast between the phoneme categories. Perkell et al. (2004) speculated: “In learning to maximize intelligibility, the child with higher acuity is better able to reject poor exemplars of each phoneme (as in the DIVA model), and thus will adopt sensory goals for producing those phonemes that are further apart than the child with lower acuity.” The implications for speech therapy are that, even in the case of CAS, ensuring stable acoustic-phonetic targets for speech therapy goals is essential, whereas insisting upon SLP-defined articulatory parameters may be counter-productive.
The goal is not absolute consistency in the production of specific movements, but rather dynamic stability in the achievement of speaking goals.

Although it is speculated that feedforward control is weighted more heavily than feedback control in adult speech, feedback is critical to speech learning during infancy and childhood. Furthermore, auditory feedback plays a crucial role. The initial goal is an auditory target. Guenther and Vladusich (2012) explain that “the auditory feedback control subsystem [helps to] shape the ongoing attempt to produce the sound by transforming auditory errors into corrective motor commands via the feedback control map in right ventral premotor cortex” (p. 2). They further explain that repeated practice of this type eventually leads to the development of somatosensory goal regions. A particular frustration for children with CAS is perseveration, the difficulty of changing a well-learned articulatory pattern to a new one that is more appropriate. This problem with perseveration highlights the need to engage the feedback control system. Two strategies are essential. The first is a high degree of variation in the practice materials, which can be introduced by practicing nonsense syllables with a carefully graded increase in difficulty but variation in the combination of syllables within difficulty levels. The second strategy is to provide just the right amount of scaffolding along the integral stimulation hierarchy so that the child will be successful more often than not while experiencing a certain amount of error. Some error ensures that corrective motor commands will be generated from time to time. Imagine practicing syllables that combine four consonants [b, m, w, f] with four vowels [i], [u], [æ], [ɑ] and five diphthongs [ei], [ou], [ɑi], [au], [oi], presented at random so that the child imitates the first syllable (Say [bi]) and then repeats it again twice (Say it again… and again…), before proceeding to another syllable. You will have a great many targets in your session but created from a small number of elements.
Imagine further that you progress to a more difficult level (reduplicated syllables, [bubu], [mimi]) as soon as the child achieves 80% correct production of the single syllables. You can see that you will also be allowing the child to produce quite a bit of error. We call this the challenge point. Tanya Matthews, Francoise Brosseau-Lapré and I are working on a paper that describes how to do this and our experiences with the approach. You will see that it is very different from working on five words and requiring that the child achieve 15 to 20 correct productions at the imitative word level before proceeding to delayed imitation and then again before proceeding to spontaneous productions. Errorless learning is a fundamental aspect of DTTC and has a long history in speech therapy practice. However, it is not clear that it is well-motivated from the perspective of developmental science.
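The variable practice schedule and the challenge point rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the rolling window of five trials is one possible reading of "80% correct" (it matches the 4/5 criterion in the table that follows), and the random ordering allows the same syllable to come up more than once.

```python
import itertools
import random

CONSONANTS = ["b", "m", "w", "f"]
VOWELS = ["i", "u", "æ", "ɑ"]
DIPHTHONGS = ["ei", "ou", "ɑi", "au", "oi"]


def build_syllables():
    """Cross every consonant with every vowel and diphthong (CV syllables)."""
    nuclei = VOWELS + DIPHTHONGS
    return [c + v for c, v in itertools.product(CONSONANTS, nuclei)]


def session_plan(syllables, n_targets, rng):
    """Randomly ordered targets; each is imitated once, then repeated twice."""
    targets = [rng.choice(syllables) for _ in range(n_targets)]
    return [(t, ["imitate", "again", "again"]) for t in targets]


def ready_to_move_up(results, criterion=0.8, window=5):
    """Challenge point: advance once the last `window` trials meet the criterion.

    `results` is a list of booleans (True = correct production).
    """
    if len(results) < window:
        return False
    recent = results[-window:]
    return sum(recent) / window >= criterion


syllables = build_syllables()
plan = session_plan(syllables, n_targets=12, rng=random.Random(1))
```

Four consonants crossed with the nine vowels and diphthongs listed above give 36 distinct single syllables, so even a short session can draw on many targets built from a small number of elements. With this rule, four correct productions out of the last five are enough to move up, which means some error is tolerated by design.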

To summarize, there are many aspects of DTTC that are similar across all sensory-motor approaches to the treatment of CAS. In particular, high intensity speech practice is well motivated and likely to be effective with all forms of moderate and severe speech sound disorder. Nonetheless, there are some significant differences between Strand’s approach and the approach that I recommend based on an updated theory of speech motor control. There is still a great deal of research to do because very few of our specific speech therapy practices have received empirical validation, even though speech therapy in general has been shown to be efficacious. As a guide to future research (hopefully using randomized and thus interpretable designs), I provide a table of procedures that are similar and different across the two theoretical approaches.

 

SCHEMA THEORY | AUDITORY FEEDBACK CONTROL

Treatment Procedures that are Similar (both approaches)

High intensity practice
Focus on speech movements (not phonemes)
Practice syllable sized units (not isolated sounds)
Attend to temporal aspects of trial structure (delayed imitation, delayed provision of feedback)
Integral stimulation hierarchy (attend to visual and auditory aspects of target)

Treatment Procedures that are Different

Focus on precise, consistent movements | Focus on dynamic stability
Over-practice: accuracy over 10-20 trials | Variable practice when possible
Errorless learning | Challenge point: 4/5 correct, then move up
Behavioral shaping of accurate movements | Motor equivalent movements
Tactile and gestural cues to ensure accuracy | Sharpen knowledge of auditory target
“Hold” initial configurations | Encourage vocal play, develop internal model

Readings:

Callan, D. E., Kent, R. D., Guenther, F. H., & Vorperian, H. K. (2000). An auditory-feedback-based neural network model of speech production that is robust to developmental changes in the size and shape of the articulatory system. Journal of Speech, Language, and Hearing Research, 43, 721-738.

Guenther, F. H., & Vladusich, T. (2012). A neural theory of speech acquisition and production. Journal of Neurolinguistics, 25(5), 408-422.

Liégeois, F. J., Turner, S. J., Mayes, A., Bonthrone, A. F., Boys, A., Smith, L., . . . Morgan, A. T. (2019). Dorsal language stream anomalies in an inherited speech disorder. Brain, 142(4), 966-977.

Perkell, J., Matthies, M., Lane, H., Guenther, F. H., Wilhelms-Tricarico, R., Wozniak, J., & Guiod, P. (1997). Speech motor control: Acoustic goals, saturation effects, auditory feedback and internal models. Speech Communication, 22, 227-250.

Perkell, J., Matthies, M. L., Tiede, M., Lane, H., Zandipour, M., Marrone, M., . . . Guenther, F. H. (2004). The distinctness of speakers’ /s/-/ʃ/ contrast is related to their auditory discrimination and use of an articulatory saturation effect. Journal of Speech, Language, and Hearing Research, 47, 1259-1269.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13.

Rvachew, S., & Matthews, T. (2019). An N-of-1 Randomized Controlled Trial of Interventions for Children With Inconsistent Speech Sound Errors. Journal of Speech, Language, and Hearing Research, 62, 3183–3203.

Speech Therapy and Theories of Speech Motor Control: Part 2

In Part 1 of this blog series I described the theoretical basis of Dynamic Temporal and Tactile Cueing as recently published by Edy Strand. Specifically, the treatment is founded on Schmidt’s Schema Theory in which generalized motor programs are learned. During speech production the child must select the right program and apply the correct parameters before implementing it all at once. If the parameters are selected incorrectly, a speech error will occur. It is rather like making toast. If you forget to reset your settings after toasting bagels, your Wonderbread will come out black! The problem as stated by Schmidt is that by the time you realize that your toast settings are wrong and your motor gestures are off track, it’s too late— the toast is burned and you have said “Trat! Doast!” Learning occurs by “trial and error” — after much experience with your toaster you learn the settings (parameters) for getting the right amount of toastiness for different items. Learning to operate your toaster is similar to acquiring one “generalized motor program.” Speech motor learning is assumed to operate this way because sensory feedback is too slow to support on-line adjustments to the parameters in a direct way. I used a different analogy in the previous blog — once you have committed to swinging your golf club, you tend to follow through.

The problem with this model of speech motor control is that we know for certain that real time modification of vocal tract movements occurs in response to somatosensory and auditory feedback. Strangely we have known since the early eighties that the speech system is highly sensitive to error on-line; therefore, I don’t know why this idea of open-loop control persists. The proof comes from studies in which (typically) an adult is asked to repeatedly produce a particular syllable or disyllable and then experiences a perturbation in sensory feedback (either somatosensory feedback or auditory feedback). An early example of this paradigm involved productions of “aba”: during 15% of trials a mechanism placed an unexpected load on the talker’s lower lip. Here is where it gets interesting: the research participants corrected for this perturbation in the articulatory trajectory of the bottom lip very rapidly with compensatory actions of the top and the bottom lip (the bottom lip would need to exert greater upward force and the top lip would need to produce greater downward extent in order to produce the labial closure and the expected transitions into and out of the consonantal closure). Decades of experiments have followed involving many other perturbations in the domain of articulatory gestures, somatosensory (skin) sensations, and auditory feedback. For example, while the research participants are repeatedly saying “bed” you can trick their ear into thinking they are saying “bad” which leads to compensatory adjustments in articulation to get the expected auditory percept.

This kind of dynamic compensation across the entire vocal tract is made possible by an “internal model” — a neural model that simulates the behavior of a sensorimotor system in relation to its environment. The internal model can generate a prediction of the sensory consequences of implementing a motor plan via simulation. For speech, future outputs in the somatosensory and auditory domains are simulated; furthermore, the simulator takes into account delayed sensory feedback, noise in the perceptual system and other variables so that when feedback arrives it can be compared with the prediction and provide reliable error messages. Continuous tracking of the vocal tract state is thus permitted and forms the basis for ongoing planning of movements as speech unfolds. If an unexpected event occurs, as in the perturbation experiments that I have described, error corrections are dynamic across the entire system; therefore, if the predicted trajectory of acoustic formant transitions from the [a] into the [b] closure is not occurring, lower lip, upper lip, jaw and tongue movements can all be harnessed to produce the desired outcome.
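The loop of predicting, comparing, and correcting can be illustrated with a deliberately simplified Python sketch. It is not the state feedback control model of Houde and Nagarajan (2011). The single acoustic dimension (a first formant in Hz), the target of 580 Hz, the 100 Hz upward shift in what the talker hears, the correction gain, and the function name are all hypothetical values chosen for illustration.

```python
def simulate_compensation(target_hz, shift_hz, n_steps, gain=0.3):
    """Show a talker compensating for shifted auditory feedback.

    The internal model keeps an estimate of how far feedback is shifted,
    predicts what will be heard, and corrects the estimate whenever the
    prediction and the actual feedback disagree.
    """
    estimated_shift = 0.0      # internal model's belief about the perturbation
    produced, heard_log = [], []
    for _ in range(n_steps):
        command = target_hz - estimated_shift    # plan using the current belief
        predicted = command + estimated_shift    # forward-model prediction
        heard = command + shift_hz               # actual (perturbed) feedback
        error = heard - predicted                # sensory prediction error
        estimated_shift += gain * error          # update the internal estimate
        produced.append(command)
        heard_log.append(heard)
    return produced, heard_log


produced, heard = simulate_compensation(target_hz=580.0, shift_hz=100.0, n_steps=50)
```

In this toy run the first utterance is heard 100 Hz too high. Over successive steps the produced value drifts downward until what is heard matches the target again, analogous to the "bed"/"bad" example above. Real talkers typically compensate only partially, which this sketch does not attempt to capture.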

As Houde and Nagarajan (2011) explain, “speech motor control is not an example of pure feedback control or feedforward control” (p. 11). The acquisition of speech motor control is dependent upon the development of the internal model of vocal tract function as well as detailed knowledge of auditory targets. This understanding has implications for the treatment of childhood apraxia of speech. I will explore these implications further in the next and final blog in this series.

Readings

Abbs, J. H., & Gracco, V. L. (1983). Sensorimotor actions in the control of multi-movement speech gestures. Trends in Neurosciences, 6, 391-395.

Houde, J. F., & Jordan, M. I. (2002). Sensorimotor adaptation of speech I: Compensation and adaptation. Journal of Speech, Language & Hearing Research, 45(2), 295-310.

Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in Human Neuroscience, 5, doi: 10.3389/fnhum.2011.00082.

Tourville, J. A., Reilly, K. J., & Guenther, F. H. (2008). Neural mechanisms underlying auditory feedback control of speech. NeuroImage, 39, 1429-1443.

Speech Therapy and Theories of Speech Motor Control: Part 1

Edy Strand recently published a detailed description of her Dynamic Temporal and Tactile Cueing treatment strategy. As she says, this is a hugely valuable paper because it provides a complete description of a treatment designed for severe speech sound disorders, especially Childhood Apraxia of Speech, and more importantly, it summarizes in one place the theoretical foundation for the treatment. I think that, on the whole, this is an efficacious treatment. However, some procedures, derived directly from the outdated theoretical underpinnings, are questionable, and therefore I am going to devote several blogs to more recent theory and basic science research on the development of speech motor control and apraxia of speech. In this first blog, I review Schema Theory, even though this theory is just not right! But it has a long history and remains popular across almost all clinically-oriented papers on motor speech disorders.

The theory that is referenced in Edy Strand’s paper is Richard Schmidt’s “Schema Theory of Discrete Motor Skill Learning,” published in Psychological Review in 1975 and subsequently brought to speech-language pathology by Ray Kent and others as a useful framework for thinking about speech therapy. The important idea underlying this theory is that motor skills are made up of brief, discrete motor acts that are executed all-at-once as open-loop generalized motor programs, adapted with specific response specifications (called parameters) for the current conditions. The theory assumes “open-loop” control because sensory feedback is often too slow to impact movement after it has started. According to this theory feedback is processed after the movement is over and incorporated into the schema for the future execution of the generalized motor program. I have used golf as an example before; even though I haven’t played much in years let’s do it again: if we are adopting this theory we would think of practice sessions as developing different generalized motor programs for each type of shot, a long drive, a short 7-iron shot, the up-and-down pitch onto the green, and the putt into the hole. Which shot you choose depends upon your recall schema: what is your target and which type of shot is likely to achieve it? I personally recall that when close to the green my pitch is better than my chip (whereas my husband has the opposite preference). How you address the ball depends upon the initial conditions (flat ground, hill, tall grass etc.). The motor control parameters (also known as response specifications) depend upon the distance to the target (how high to lift the club, speed of follow through, force applied and so on). 
Based on the initial conditions and the desired outcome, I launch the shot with my wedge, expecting a certain “feel” as I hit the ball based on past experience with the sensory consequences of hitting this shot; I can always “recognize” a good hit even before I see the ball land (often I just turn my back on the ball, I don’t even want to see it land!). But in any case, the actual outcome is important for updating the “recall” schema; specifically, if I have actually achieved my target, I add all this information, the initial conditions, the response specifications, the recognition schema and the recall schema to my memory. The generalized motor program is an abstraction across all these remembered practice trials, permitting correct specification of the response parameters in future shots. Furthermore, I should be able to adapt the generalized motor program to similar shots, even if the ball is a little further or closer to the green for example.
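One way to picture a recall schema is as a rule, abstracted from remembered practice trials, that relates desired outcomes to response specifications. The Python sketch below illustrates that idea with a straight-line fit. The single parameter (force), the made-up practice data, the choice of a least-squares line, and the function names are assumptions for illustration, not details taken from Schmidt (1975).

```python
def fit_recall_schema(trials):
    """Fit a least-squares line predicting force from achieved distance.

    `trials` is a list of (force_applied, distance_achieved) pairs
    remembered from earlier practice.
    """
    n = len(trials)
    mean_f = sum(f for f, _ in trials) / n
    mean_d = sum(d for _, d in trials) / n
    slope = (sum((d - mean_d) * (f - mean_f) for f, d in trials)
             / sum((d - mean_d) ** 2 for _, d in trials))
    intercept = mean_f - slope * mean_d
    return slope, intercept


def select_force(schema, target_distance):
    """Open-loop step: choose the parameter before the movement starts."""
    slope, intercept = schema
    return slope * target_distance + intercept


# Hypothetical remembered shots: (force applied, distance achieved in metres).
trials = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
schema = fit_recall_schema(trials)

# A distance never practiced; the schema generalizes to it.
force = select_force(schema, 25.0)
```

Note that nothing in this sketch adjusts the movement once it has been launched. Feedback only enters afterwards, when a new trial is added to memory and the rule is refitted, which is the open-loop assumption of the theory.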

When applied to CAS, in which current research suggests unreliable or degraded somatosensory feedback, the use of this model focuses attention on the child’s processing of initial conditions, inaccurate planning or programming of the movement due to poor selection of response specifications, and/or poor recognition schema (not knowing when the movement “feels right”). Therefore, certain procedures are recommended. DTTC providers use manual or gestural cues to shape the child’s articulators into the “initial position” and encourage the child to “hold” the position momentarily so as to fully process those initial conditions before launching the movement. During the initial stages of therapy, the SLP uses a slow rate and co-production so that the child is getting extra feedback during the practice trial, presumably with the goal of stabilizing the recognition schema. Imitative models support the child’s knowledge of the target, which, when combined with copious knowledge-of-results feedback, should support the development of recall schema. And finally, a great deal of practice with an errorless approach ensures that the child lays down many memory traces of correctly executed motor programs.

The recommendations that are provided make a certain amount of sense given the context of schema theory (even though there is in fact no evidence for the specific efficacy of any one of these particular procedures). The problem is that it is not clear that schema theory is a reasonable foundation for modern speech therapy practice.

First, Richard Schmidt himself cautioned in 2003 that “schema theory was intended to be an account of discrete actions. Hence, continuous actions, such as steering a car or juggling, which are both of longer duration (allowing time for response-produced feedback to have a role) and more based on the performer’s interactions with the environment were outside the area for schema theory…long-duration actions might be based on interplay between open-loop subactions and feedback-based corrections… . Interestingly, tasks such as juggling seem appropriate for analysis in terms of the dynamical systems perspective” (p. 367). I would argue that our understanding not only of juggling but also of speech motor control has benefited immensely from the dynamical systems perspective, and I will come back to that in the next blog. If juggling is considered too complex and continuous to be explained by schema theory, speech is probably not a good fit either.

Second, modern theories of speech motor control have shown that on-line correction of motor action even over short durations occurs despite the limitations of feedback control. The explanation lies in the continuous operation of feedforward control mechanisms. More on feedforward control in another blog.

References

Rvachew, S., & Brosseau-Lapré, F. (2012). Developmental Phonological Disorders: Foundations of Clinical Practice. San Diego, CA: Plural Publishing.

Schmidt, R. A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82(4), 225-260. doi:10.1037/h0076770

Schmidt, R. A. (2003). Motor schema theory after 27 years: Reflections and implications for a new theory. Research Quarterly for Exercise and Sport, 74(4), 366-375.

Strand, E. A. (2019, Early View). Dynamic Temporal and Tactile Cueing: A Treatment Strategy for Childhood Apraxia of Speech. American Journal of Speech-Language Pathology. doi:10.1044/2019_AJSLP-19-0005