Is Acoustic Feedback Effective for Remediating “r” Errors?

I am very pleased to see a third paper published in the speech-language pathology literature using the single-subject randomization design (SSRD) that I have described in two tutorials, the first in 1988 and the second more recently. Tara McAllister Byun used the design to investigate the effectiveness of acoustic biofeedback treatment to remediate persistent “r” errors in 7 children aged 9 to 15 years. She used the single subject randomized alternation design with block randomization, including a few unique elements in her implementation. She and her research team provided one traditional treatment session and one biofeedback treatment session each week for ten weeks; however, the order of the traditional and biofeedback sessions was randomized each week. Interestingly, each session targeted the same items (i.e., “r” was the speech sound target in both treatment conditions): rhotic vowels were tackled first and consonantal “r” was introduced later, in a variety of phonetic contexts. (This procedure differs from my own practice in which, for example, Tanya Matthews and I randomly assign different targets to different treatment conditions.)

Another innovation is the outcome measure: a probe constructed of untreated “r” words was given at the beginning and end of each session, so that change over the session (Mdif) was the outcome measure submitted to statistical analysis (our tutorial explains that an advantage of the SSRD is that a nonparametric randomization test can be used to assess the outcome of the study, yielding a p value). In addition, 3 baseline probes and 3 maintenance probes were collected so that an effect size for overall improvement could be calculated. In this way there are actually 3 time scales for measuring change in this study: (1) change from baseline to maintenance probes; (2) change from baseline to treatment performance, as reflected in the probes obtained at the beginning of each session and plotted over time; and (3) change over a session, reflected in the probes given at the beginning and the end of each session. Furthermore, it is possible to compare within-session change for sessions provided with and without acoustic biofeedback.
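For readers who like to see the mechanics, here is a minimal sketch in Python of the randomization test for a block-randomized alternation design of this kind. The Mdif values are invented for illustration; none of these numbers come from Byun's data.

```python
from itertools import product

def alternation_randomization_test(blocks):
    """Randomization test for a block-randomized alternation design (sketch).

    blocks: one (biofeedback, traditional) pair of within-session gain
    scores (Mdif) per weekly block, in the order actually observed.
    Returns the observed mean difference and a one-tailed p value.
    """
    n = len(blocks)
    observed = sum(bf - trad for bf, trad in blocks) / n
    # Under the null hypothesis the two condition labels are exchangeable
    # within each block, so enumerate all 2**n possible label swaps.
    count = 0
    swaps = list(product([1, -1], repeat=n))
    for signs in swaps:
        stat = sum(s * (bf - trad) for s, (bf, trad) in zip(signs, blocks)) / n
        if stat >= observed:
            count += 1
    return observed, count / len(swaps)

# Entirely hypothetical Mdif values for 10 weeks of paired sessions.
blocks = [(12, 5), (8, 6), (10, 4), (7, 7), (9, 3),
          (11, 6), (6, 5), (10, 2), (8, 4), (9, 5)]
observed, p = alternation_randomization_test(blocks)
print(f"mean difference = {observed:.2f}, p = {p:.4f}")
```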

I was really happy to see the implementation of the design, but it is fair to say that the results were a dog’s breakfast, as summarized below:

[Table: Byun (2017) acoustic biofeedback results, summarized by participant]

The table indicates that two participants (Piper, Clara) showed an effect of biofeedback treatment and generalization learning. Both showed rapid change in overall accuracy after treatment was introduced in both conditions and maintained at least some of that improvement after treatment was withdrawn. Garrat and Ian showed identical trajectories in the traditional and biofeedback conditions, with a late rise in accuracy during treatment sessions, large within-session improvements during the latter part of the treatment period, and good maintenance of those gains. However, neither boy achieved 60% correct responding at any point in the treatment program. Felix, Lucas and Evan demonstrated no change in probe scores across the twenty sessions of the experiment in either condition. Lucas started at a higher level and therefore his probe performance was more variable: because he actually showed a within-session decline during traditional sessions while showing stable performance within biofeedback sessions, the statistics indicate a treatment effect in favour of acoustic biofeedback even though no actual gains were observed.

So, this is a long description of the results that brings me to two conclusions: (1) the alternation design was the wrong choice for the hypothesis in these experiments; and (2) biofeedback was not effective for these children: even in the cases where it looks like there was an effect, the children were responsive to both the biofeedback and the traditional intervention.

In a previous blog post I described the alternation design; however, there is another version of the single subject randomization design that would be more appropriate for Tara’s hypothesis. The thing about acoustic biofeedback is that it is not fundamentally different from traditional speech therapy, involving a similar sequence of events: (i) the SLP says a word as an imitative model; (ii) the child imitates the word; (iii) the SLP provides informative or corrective feedback. In the case of incorrect responses in the traditional condition in Byun’s study, the SLP provided information about articulatory placement and reminded the child that the target involved certain articulatory movements (“make the back part of your tongue go back”). In the case of incorrect responses in the acoustic biofeedback condition, the SLP made reference to the acoustic spectrogram when providing feedback and reminded the child that the target involved certain formant movements (“make the third bump move over”). First, the first two steps overlap completely in both conditions; second, it can be expected that the articulatory cues given in the traditional condition will be remembered and their effects will carry over into the biofeedback sessions. Therefore we can consider acoustic biofeedback to be an add-on to traditional therapy, and what we want to know about is the value added.

For this question the phase design is more appropriate. In this case there would be 20 sessions (2 per week over 10 weeks, as in Byun’s study), each planned with the same format: beginning probe (optional), 100 practice trials with feedback, ending probe. The difference is that the starting point for the introduction of acoustic biofeedback would be selected at random: all the sessions that precede the randomly selected start point would be conducted with traditional feedback and all the remainder would be conducted with acoustic biofeedback. To guarantee a minimum number of sessions in each condition, the first three sessions would always be designated as traditional and the last three as biofeedback, within the 26-session protocol described by Byun. Across the 7 children this would end up looking like a multiple baseline design, except that (1) the duration of the baseline phase would be determined by random selection for each child; and (2) the baseline phase is actually the traditional treatment, with the experimental phase testing the value-added benefit of biofeedback. There are three possible categories of outcome: no change after the introduction of biofeedback, an immediate change, or a late change. As with any single subject design, the change might be in level, trend or variance, and the test statistic can be designed to capture any of those types of change. The statistical analysis asks whether the obtained test statistic is bigger than all possible results given all of the possible random selections of starting points. Rvachew & Matthews (2017) provides a more complete explanation of the statistical analysis.
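A minimal sketch of the corresponding test for the phase design is shown below; the minimum phase length of three sessions per condition follows the logic just described, but the one-tailed counting convention is an assumption for illustration rather than a prescription from our tutorial.

```python
def phase_randomization_test(scores, actual_start, min_per_phase=3):
    """Randomization test for a single subject phase design (sketch).

    scores: chronologically ordered probe scores, one per session.
    actual_start: index of the first biofeedback session; assumed to have
    been drawn at random from the eligible start points.
    min_per_phase: minimum number of sessions required in each phase.
    """
    n = len(scores)
    eligible = range(min_per_phase, n - min_per_phase + 1)

    def statistic(k):
        # Mean of the biofeedback phase minus mean of the traditional phase.
        traditional, biofeedback = scores[:k], scores[k:]
        return (sum(biofeedback) / len(biofeedback)
                - sum(traditional) / len(traditional))

    observed = statistic(actual_start)
    distribution = [statistic(k) for k in eligible]
    # One-tailed p: the proportion of possible start points whose statistic
    # is at least as large as the one actually obtained.
    p = sum(s >= observed for s in distribution) / len(distribution)
    return observed, p
```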

I show below an imaginary result for Clara, using the data presented for her in Byun’s paper, as if the traditional treatment came first and then the biofeedback intervention (a worked code example follows the figure). If we pretend that the randomly selected start point for the biofeedback intervention occurred exactly in the middle of the treatment period, the test statistic is the difference between the M(bf) and M(trad) scores, resulting in -2.308. All other possible random selections of starting points for the intervention lead to 19 other possible mean differences, and 18 of them are bigger than the obtained test statistic, leading to a p value of 18/20 = .9. In this data set the probe scores are actually bigger in the earlier part of the intervention, when the traditional treatment is used, and they do not get bigger when the biofeedback is introduced. These are the beginning probe scores obtained by Clara; Byun obtained a significant result in favour of biofeedback by block randomization and by examining change across each session. However, I am not completely sure that the improvements from beginning to ending probes are a positive sign; this result might reflect a failure to maintain gains from the previous session in one or the other condition.

[Figure: Hypothetical data for Clara in a single subject randomization phase design]
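Applying the phase-design sketch above to made-up probe scores that improve during the traditional phase and then plateau (a pattern like the one just described for Clara, though not her actual data) returns a large, non-significant p value, correctly signalling no value added by biofeedback:

```python
# Hypothetical probe scores: steady improvement during the traditional
# phase (sessions 0-9) and a plateau after biofeedback begins (10-19).
scores = [10, 15, 22, 30, 38, 45, 50, 54, 57, 59,
          60, 59, 61, 60, 62, 61, 60, 62, 61, 62]
observed, p = phase_randomization_test(scores, actual_start=10)
print(f"biofeedback - traditional = {observed:.2f}, p = {p:.2f}")
```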

There are several reasons to think that both interventions used in Byun’s study might result in unsatisfactory generalization and maintenance. We discuss the principles of generalization in relation to theories of motor learning in Developmental Phonological Disorders: Foundations of Clinical Practice. One important principle is that the child needs a well-established representation of the acoustic-phonetic target. All seven of the children in Byun’s study had poor auditory processing skills, but no part of the treatment program addressed phonological processing, phonological knowledge or acoustic-phonetic representations. Second, it is essential that the child have the tools to monitor and use self-produced feedback (auditory, somatosensory) to evaluate success in achieving the target. Both the traditional and the biofeedback intervention put the child in the position of being dependent upon external feedback. The outcome measure focused attention on improvements from the beginning of the practice session to the end. However, the first principle of motor learning is that practice performance is not an indication of learning. The focus should have been on the sometimes large decrements in probe scores from the end of one session to the beginning of the next. The children had no means of maintaining those performance gains. Acoustic feedback may be a powerful means of establishing a new response, but it is a counterproductive tool for maintenance and generalization learning.

Reading

McAllister Byun, T. (2017). Efficacy of Visual–Acoustic Biofeedback Intervention for Residual Rhotic Errors: A Single-Subject Randomization Study. Journal of Speech, Language, and Hearing Research, 60(5), 1175-1193. doi:10.1044/2016_JSLHR-S-16-0038

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13. https://doi.org/10.1016/j.jcomdis.2017.04.003

 


Testing Client Response to Alternative Speech Therapies

Buchwald et al. published one of the many interesting papers in a recent special issue on motor speech disorders in the Journal of Speech, Language, and Hearing Research. In their paper they outline a common approach to speech production, one that is illustrated and discussed in some detail in Chapters 3 and 7 of our book, Developmental Phonological Disorders: Foundations of Clinical Practice; Buchwald et al., however, apply it in the context of Acquired Apraxia of Speech. They distinguish between patients who produce speech errors subsequent to left hemisphere cerebrovascular accident as a consequence of motor planning difficulties versus phonological planning difficulties. Specifically, their study included four such patients, two in each subgroup. Acoustic analysis was used to determine whether their cluster errors arose during phonological planning or in the next stage of speech production, during motor planning. The analysis involves comparing the durations of segments in triads of words like this: /skæmp/ → [skæmp], /skæmp/ → [skæm], /skæm/ → [skæm]. The basic idea is that if segments such as [k] in /sk/ → [k] or [m] in /mp/ → [m] are produced as they would be in a singleton context, then the errors arise during phonological planning; alternatively, if they are produced as they would be in the cluster context, then the deletion errors arise during motor planning. This led the authors to hypothesize that patients with these different error types would respond differently to intervention. They treated all four patients with the same treatment, described as “repetition based speech motor learning practice”. Consistent with their hypothesis, the two patients with motor planning errors responded to this treatment and the two with phonological planning errors did not, as shown in the table of pre- versus post-treatment results.
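The duration logic can be expressed as a simple decision rule. The sketch below is only an illustration of the reasoning described above; the function name, the durations, and the nearest-norm comparison are my assumptions, not the analysis procedure reported by Buchwald et al.

```python
def classify_deletion_error(produced_ms, singleton_norm_ms, cluster_norm_ms):
    """Classify a cluster deletion error by the surviving segment's duration.

    produced_ms: duration of the surviving segment in the error production,
    e.g. the [m] in /skaemp/ -> [skaem].
    singleton_norm_ms: the speaker's typical duration for that segment in a
    singleton context (e.g. the [m] in /skaem/ -> [skaem]).
    cluster_norm_ms: the speaker's typical duration for that segment inside
    the cluster (e.g. the [m] of /mp/ in correct /skaemp/ -> [skaemp]).
    """
    # Singleton-like timing suggests the cluster was already reduced when
    # the plan was assembled: a phonological planning error. Cluster-like
    # timing suggests the segment was lost later, during motor planning.
    if abs(produced_ms - singleton_norm_ms) < abs(produced_ms - cluster_norm_ms):
        return "phonological planning"
    return "motor planning"

# Hypothetical durations (ms): the surviving [m] patterns with the singleton.
print(classify_deletion_error(produced_ms=95, singleton_norm_ms=100,
                              cluster_norm_ms=70))
```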

[Table: Buchwald et al. pre- versus post-treatment results (corrected)]

However, as the authors point out, a significant limitation of this study is that the design is not experimental. Because experimental control was not established either within or across speakers, it is difficult to draw firm conclusions.

I find the paper to be of interest on two accounts nonetheless. First, their hypothesis is exactly the same hypothesis that Tanya Matthews and I posed for children who present with phonological versus motor planning deficits. Second, their hypothesis is fully compatible with the application of a single subject randomization design. Therefore it provides me with an opportunity to follow through on my promise from the previous blog post: to demonstrate how to set up this design for clinical research.

For her dissertation research, Tanya identified 11 children with severe speech disorders and inconsistent speech sound errors who completed our full experimental paradigm. These children were diagnosed with either a phonological planning disorder or a motor planning disorder using the Syllable Repetition Task and other assessments, as described in our recent CJSLPA paper, available open access here. Using those procedures, we found that 6 had a motor planning deficit and 5 had a phonological planning deficit.

Then we hypothesized that the children with motor planning disorders would respond to a treatment that targeted speech motor control. Much like Buchwald et al., it included repetition practice according to the principles of motor learning during the practice part of each session; during prepractice, however, children were taught to identify the target words and to identify mispronunciations of the target words so that they would be better able to integrate feedback and self-correct during repetition practice. Notice that direct and delayed imitation are important procedures in this approach. We called this the auditory-motor integration (AMI) approach.

For children with phonological planning disorders, we hypothesized that they would respond to a treatment based on principles similar to those suggested by Dodd et al. (i.e., the core vocabulary approach). Specifically, the children were taught to segment the target words into phonemes, associating the phonemes with visual cues. Then we taught the children to chain the phonemes back together into a single word. Finally, during the practice component of each session, we encouraged the children to produce the words using the visual cues when necessary. An important component of this approach is that auditory-visual models are not provided prior to the child’s production attempt; the child is forced to construct the phonological plan independently. We called this the phonological memory and planning (PMP) approach.

We also had a control condition that consisted solely of repetition practice (the CON condition).

The big difference between our work and Buchwald et al. is that we tested our hypothesis using a single subject block randomization design, as described in our recent tutorial in the Journal of Communication Disorders. The design was set up so that each of the 11 children experienced all three treatments. We chose 3 treatment targets for each child, randomly assigned the targets to the three treatments, and then randomly assigned the treatments to the three sessions within each week, scheduled to occur on different days, 3 sessions per week for 6 weeks. Each week counts as one block, so there are 6 blocks of 3 sessions, for 18 sessions in total. The randomization scheme was generated blindly and independently for each child using computer software. The diagram below shows the treatment schedule for one of the children with a motor planning disorder (a code sketch of such a scheme follows the diagram).

[Figure: Block randomization treatment schedule for child TASC02]
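A randomization scheme along these lines takes only a few lines of Python to generate; the target names and the seed below are invented for illustration.

```python
import random

def block_randomization_schedule(targets, treatments=("AMI", "PMP", "CON"),
                                 n_blocks=6, seed=None):
    """Generate a block randomization schedule (sketch of the scheme above).

    Randomly assigns one target to each treatment, then randomly orders the
    three treatments within each weekly block of three sessions.
    """
    rng = random.Random(seed)
    shuffled = list(targets)
    rng.shuffle(shuffled)
    assignment = dict(zip(treatments, shuffled))  # treatment -> target
    schedule = []
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # random order of the treatments within the block
        for session, treatment in enumerate(order, start=1):
            schedule.append((block, session, treatment, assignment[treatment]))
    return schedule

# Invented targets; in a real study the seed would be generated blindly.
for row in block_randomization_schedule(["target1", "target2", "target3"],
                                        seed=42):
    print(row)
```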

This design allowed us to compare response to the three treatments within each child using a randomization test. For this child, the randomization test revealed a highly significant difference in favour of the AMI treatment as compared to the PMP treatment, as hypothesized for children with motor planning deficits. I don’t want to scoop Tanya’s thesis because she will finish it soon, before the end of 2017 I’m sure, but the long and the short of it is that we have very clear results in favour of our hypothesis using this fully experimental design and the statistics that are licensed by it. I hope you will check out our tutorial on the application of this design: we show how flexible and versatile it can be for addressing many different questions about speech-language practice. There is much exciting work being done in the area of speech motor control, and this is a design that gives researchers and clinicians an opportunity to obtain interpretable results with small samples of children with rare or idiosyncratic profiles.
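For the statistically curious, here is a minimal sketch of a randomization test for this block randomization design. The outcome values are invented, and the test statistic (a simple difference between condition means) is an assumption for illustration; it is not necessarily the statistic used in Tanya's analyses.

```python
from itertools import permutations, product

def block_randomization_test(blocks, labels=("AMI", "PMP", "CON"),
                             compare=("AMI", "PMP")):
    """Randomization test for a block randomization design (sketch).

    blocks: one tuple of session outcomes per weekly block, aligned with
    `labels`, i.e. blocks[i][j] is the outcome of the session in block i
    that actually received labels[j].
    compare: the two conditions whose means are contrasted.
    """
    a, b = compare

    def statistic(assignment):
        # Pool the outcomes under a hypothetical labelling of the sessions.
        scores = {label: [] for label in labels}
        for outcomes, order in zip(blocks, assignment):
            for label, outcome in zip(order, outcomes):
                scores[label].append(outcome)
        return (sum(scores[a]) / len(scores[a])
                - sum(scores[b]) / len(scores[b]))

    observed = statistic([labels] * len(blocks))
    orders = list(permutations(labels))  # 6 possible orders per block
    distribution = [statistic(assignment)
                    for assignment in product(orders, repeat=len(blocks))]
    # One-tailed p over all 6**n equally likely treatment schedules.
    p = sum(s >= observed for s in distribution) / len(distribution)
    return observed, p

# Entirely hypothetical outcomes for 6 blocks, aligned as (AMI, PMP, CON).
blocks = [(8, 3, 4), (9, 4, 5), (7, 2, 3), (10, 5, 4), (9, 3, 5), (8, 4, 4)]
observed, p = block_randomization_test(blocks)
print(f"AMI - PMP = {observed:.2f}, p = {p:.4f}")
```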

Reading

Buchwald, A., & Miozzo, M. (2012). Phonological and Motor Errors in Individuals With Acquired Sound Production Impairment. Journal of Speech, Language, and Hearing Research, 55(5), S1573-S1586. doi:10.1044/1092-4388(2012/11-0200)

Rvachew, S., & Matthews, T. (2017). Using the Syllable Repetition Task to Reveal Underlying Speech Processes in Childhood Apraxia of Speech: A Tutorial. Canadian Journal of Speech-Language Pathology and Audiology, 41(1), 106-126.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13. https://doi.org/10.1016/j.jcomdis.2017.04.003

 

Single Subject Randomization Design For Clinical Research

During the week of April 23 – 29, 2017, Susan Ebbels curated WeSpeechies on the topic Carrying Out Intervention Research in SLP/SLT Practice. Susan kicked off the week with a link to her excellent paper that discusses the strengths and limitations of various procedures for conducting intervention research in the clinical setting. As we would expect, a parallel groups randomized control design was deemed to provide the best level of experimental control. Many ways of studying treatment-related change within individual clients, with increasing degrees of control, were also discussed. However, all of the ‘within participant’ methods described were vulnerable, to varying degrees, to confounding by threats to internal validity such as history, selection, practice, fatigue, maturation or placebo effects.

One design was missing from the list because it is only just now appearing in the speech-language pathology literature: the single subject randomization design. The design (actually a group of designs in which treatment sessions are randomly allocated to treatment conditions) provides the superior internal validity of the parallel groups randomized control trial by controlling for extraneous confounds through randomization. As an added benefit, the results of a single subject randomization design can be submitted to a statistical analysis, so that clear conclusions can be drawn about the efficacy of the experimental intervention. At the same time, the design can be feasibly implemented in the clinical setting and is perfect for answering the kinds of questions that come up in daily clinical practice. For example, randomized control trials have shown that speech perception training is, on average, an effective adjunct to speech articulation therapy when applied to groups of children, but you may want to know if it is a necessary addition to your therapy program for a specific child.

Furthermore, randomized single subject experiments are now accepted as a high level of research evidence by the Oxford Centre for Evidence-Based Medicine. An evidence hierarchy has been created for rating single subject trials, putting randomized single subject experiments at the top, as shown in the following table, taken from Romeiser Logan et al. (2008).

[Table: Levels of evidence for single subject research designs, from Romeiser Logan et al. (2008)]

Tanya Matthews and I have written a tutorial showing exactly how to implement and interpret two versions of the single subject randomization design, a phase design and an alternation design. The accepted manuscript is available, but behind a paywall, at the Journal of Communication Disorders. In another post I will provide a mini-tutorial showing how the alternation design could be used to answer a clinical question about a single client.

Further Reading

Ebbels, S. H. (2017). Intervention research: Appraising study designs, interpreting findings and creating research in clinical practice. International Journal of Speech-Language Pathology, 1-14.

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15, 124-144.

Romeiser Logan, L., Hickman, R. R., Harris, S. R., & Heriza, C. B. (2008). Single-subject research design: Recommendations for levels of evidence and quality rating. Developmental Medicine and Child Neurology, 50, 99-103.

Rvachew, S. (1988). Application of single subject randomization designs to communicative disorders research. Human Communication Canada (now Canadian Journal of Speech-Language Pathology and Audiology), 12, 7-13. [open access]

Rvachew, S. (1994). Speech perception training can facilitate sound production learning. Journal of Speech and Hearing Research, 37, 347-357.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13.

 

Single Subject Designs and Evidence Based Practice in Speech Therapy

I was really happy to see the tutorial on single subject experimental designs in November’s issue of the American Journal of Speech-Language Pathology, by Byiers, Reichle, and Symons. The paper does not really present anything new, since it covers ground previously published by authors such as Kearns (1986). However, with the current focus on RCTs as the be-all and end-all for evidence based practice, it was a timely reminder that single-subject designs have a lot to offer for EBP in speech therapy. It really irritates me when I see profs tell their students that speech therapy practice does not have an evidentiary base: many of our standard practices are well grounded in good quality single subject research (not to mention some rather nice RCTs from the sixties as well, but that is another story, maybe for another post).

Byiers et al. do a nice job of outlining the primary features of a valid single-subject experiment. The internal validity of the standard designs is completely dependent upon a stable baseline, with no improving trend in the data prior to the introduction of the treatment. They indicate that “by convention, a minimum of three baseline data points are required to establish dependent measure stability.” Furthermore, it is essential that there be no carry-over effects from treatment of one target to a second target prior to the introduction of treatment for the second target; in other words, performance on any given target must remain stable until treatment for that specific target is introduced. The internal validity of the experiment is voided when stable baselines for each target are not established and maintained throughout their respective baseline periods. This is true even for the multiple-probe design, a variation on the multiple-baseline design in which the dependent measure is sampled at irregular intervals tied to the introduction of successive phases of the treatment program (as opposed to the regular and repeated measurement that occurs during each and every session of a multiple baseline design). Even with the multiple probe design, a series of closely spaced baseline probes is required at certain intervals to demonstrate stability of baselines just before a new treatment phase begins. Furthermore, the design is an inappropriate choice unless a “strong a priori assumption of stability can be made” (see Horner and Baer, 1978).
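As a rough illustration of what baseline stability means operationally, one might screen a baseline series for an improving trend with a simple slope check; the three-point minimum reflects the convention Byiers et al. cite, but the slope tolerance here is an arbitrary assumption.

```python
def baseline_is_stable(baseline, max_slope=0.5):
    """Crude screen for an improving trend in a baseline series (sketch).

    baseline: chronologically ordered baseline scores (at least 3, per the
    convention cited by Byiers et al.).
    max_slope: arbitrary tolerance for the least-squares slope per session.
    """
    n = len(baseline)
    if n < 3:
        raise ValueError("need at least three baseline data points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(baseline) / n
    # Ordinary least-squares slope of score on session number.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, baseline))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope <= max_slope

print(baseline_is_stable([5, 4, 6]))    # True: no improving trend
print(baseline_is_stable([2, 10, 18]))  # False: clear improving trend
```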

I am interested in the multiple probe design because it is the preferred design of the research teams that claim that the “complexity approach” to target selection in phonology interventions is effective and efficient. However, it is clear that the design is not appropriate in this context (in fact, given the research question, I would argue that all single subject designs are inappropriate in this context). The reasoning behind the complexity approach is that treating complex targets results in generalization of learning to less complex targets. This is supposed to be more efficient than treating the less complex targets first, because those targets are expected to improve spontaneously without treatment (e.g., as a result of maturation) while not resulting in generalization to more complex targets. The problem, of course, is that improvements in less complex targets while you are treating a more complex one (especially when you get no improvement on the treatment target; see Cummings and Barlow, 2011) cannot be interpreted as a treatment effect. By the logic of a single-subject experiment, this outcome indicates that you do not have experimental control. To make matters worse, these improvements in generalization targets are often observed prior to the introduction of treatment; indeed, the a priori assumption is that these improvements in less complex targets will occur without treatment, which is the whole rationale behind avoiding them as treatment targets! Therefore, by definition, both the multiple baseline and multiple probe designs are invalid approaches to testing the complexity hypothesis. Without a randomized control trial one can only conclude that the changes observed in less complex targets in these studies are the result of maturation or history effects. (If you want to see what happens when you test the efficacy of the complexity approach using a randomized control trial, check out my publications: Rvachew & Nowak, 2001; Rvachew & Nowak, 2003; Rvachew, 2005; Rvachew & Bernhardt, 2010.)

Some recent single subject studies have had some really nice outcomes for some children. Ballard, Robin and McCabe (2010) demonstrated an effective treatment for improving prosody in children with apraxia of speech, showing that work on pseudoword targets generalizes to real word dependent measures. Skelton (2004) showed that you can literally randomize your task sequence and get excellent results for the treatment of /s/, with carryover to the nonclinic environment (in other words, you don’t have to follow the usual isolation-syllable-word-phrase-sentence sequence; rather, you can mix it up by practicing items with random difficulty levels on every trial). Both of these studies showed uneven outcomes for different children, however. Francoise and I suggested at ASHA 2012 that the “challenge point framework” helps to explain variability in outcomes across children. The trick is to teach targets that are at the challenge point for the child: not uniformly complex, but carefully selected to be neither too simple nor too complex for each individual child.

Both of these studies (Ballard et al. and the Skelton study) used a multiple baseline design. This design tends to encourage the selection of complex targets, because a consistent 0% correct is as stable as a baseline can get. If you want to pick targets that are at the “challenge point”, you may be working on targets for which the child demonstrates less stable performance. Fortunately, there is a single subject design that does not require a stable baseline for internal validity: the single subject randomization design. We are using two different variations on this design in our current study of different treatments for childhood apraxia of speech. I will describe our application of the design in another post.