Is Acoustic Feedback Effective for Remediating “r” Errors?

I am very pleased to see a third paper published in the speech-language pathology literature using the single-subject randomization design that I have described in two tutorials, the first in 1988 and the second more recently. Tara McAllister Byun used the design to investigate the effectiveness of acoustic biofeedback treatment to remediate persistent “r” errors in 7 children aged 9 to 15 years. She used the single-subject randomized alternation design with block randomization, including a few unique elements in her implementation of the design. She and her research team provided one traditional treatment session and one biofeedback treatment session each week for ten weeks; however, the order of the traditional and biofeedback sessions was randomized each week. Interestingly, each session targeted the same items (i.e., “r” was the speech sound target in both treatment conditions): rhotic vowels were tackled first and consonantal “r” was introduced later, in a variety of phonetic contexts. (This procedure differs from my own experience in which, for example, Tanya Matthews and I randomly assign different targets to different treatment conditions.) Another innovation is the outcome measure: a probe constructed of untreated “r” words was given at the beginning and end of each session so that change (Mdif) over the session was the outcome measure submitted to statistical analysis (our tutorial explains that the advantage of the SSRD is that a nonparametric randomization test can be used to assess the outcome of the study, yielding a p value). In addition, 3 baseline probes and 3 maintenance probes were collected so that an effect size for overall improvement could be calculated.
In this way there are actually 3 time scales for measuring change in this study: (1) change from baseline to maintenance probes; (2) change from baseline to treatment performance as reflected in the probes obtained at the beginning of each session and plotted over time; and (3) change over a session, reflected in the probes given at the beginning and the end of each session. Furthermore, it is possible to compare within-session change between sessions provided with and without acoustic feedback.
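
To make the logic of the randomization test for the alternation design concrete, here is a minimal Python sketch. It rests on several assumptions of mine: the test statistic is the mean difference in within-session change (biofeedback minus traditional), the test is one-sided, and because the order of the two sessions was randomized within each week, the two condition labels are treated as exchangeable within each week. The function name and the data are hypothetical, and this is not necessarily the exact analysis reported in Byun's paper.

```python
from itertools import product


def alternation_randomization_test(bf_change, trad_change):
    """One-sided randomization test for a block-randomized alternation design.

    bf_change[i] and trad_change[i] are the within-session changes
    (ending probe minus beginning probe) for the biofeedback and the
    traditional session in week i. Under the null hypothesis the two
    labels within a week could be swapped, so every pattern of swaps
    (2 ** number_of_weeks arrangements) is enumerated.
    """
    if len(bf_change) != len(trad_change):
        raise ValueError("one score per condition per week is required")

    diffs = [b - t for b, t in zip(bf_change, trad_change)]
    n_weeks = len(diffs)
    observed = sum(diffs) / n_weeks

    at_least_as_large = 0
    total = 0
    for signs in product((1, -1), repeat=n_weeks):
        # A sign of -1 swaps the condition labels in that week.
        stat = sum(s * d for s, d in zip(signs, diffs)) / n_weeks
        total += 1
        if stat >= observed - 1e-12:  # tolerance for floating point ties
            at_least_as_large += 1

    return observed, at_least_as_large / total


# Hypothetical within-session changes (percentage points) for ten weeks.
bf = [4, 6, 2, 5, 7, 3, 6, 4, 5, 8]
trad = [1, 2, 3, 0, 2, 1, 4, 2, 1, 3]
mean_diff, p_value = alternation_randomization_test(bf, trad)
print(f"Mean difference = {mean_diff:.2f}, p = {p_value:.4f}")
```

With ten weeks there are 1,024 possible arrangements, so full enumeration is cheap; the p value is simply the proportion of arrangements whose statistic is at least as large as the one obtained.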

I was really happy to see the implementation of the design but it is fair to say that the results were a dog’s breakfast, as summarized below:

Byun 2017 acoustic biofeedback

The table indicates that two participants (Piper, Clara) showed an effect of biofeedback treatment and generalization learning. Both showed rapid change in accuracy overall after treatment was introduced in both conditions and maintained at least some of that improvement after treatment was withdrawn. Garrat and Ian showed identical trajectories in the traditional and biofeedback conditions, with a late rise in accuracy during treatment sessions, large within-session improvements during the latter part of the treatment period, and good maintenance of those gains. Neither boy, however, achieved 60% correct responding at any point in the treatment program. Felix, Lucas and Evan demonstrated no change in probe scores across the twenty weeks of the experiment in either condition. Lucas started at a higher level and therefore his probe performance is more variable: because he actually showed a within-session decline during traditional sessions while showing stable performance within biofeedback sessions, the statistics indicate a treatment effect in favour of acoustic biofeedback but in fact no actual gains are observed.

So, this is a long description of the results that brings me to two conclusions: (1) the alternation design was the wrong choice for the hypothesis in these experiments; and (2) biofeedback was not effective for these children; even in those cases where it looks like there was an effect, the children were responsive to both biofeedback and the traditional intervention.

In a previous blog, I described the alternation design; however, there is another version of the single subject randomization design that would be more appropriate for Tara’s hypothesis. The thing about acoustic biofeedback is that it is not fundamentally different from traditional speech therapy, involving a similar sequence of events: (i) SLP says a word as an imitative model; (ii) child imitates the word; (iii) SLP provides informative or corrective feedback. In the case of incorrect responses in the traditional condition in Byun’s study, the SLP provided information about articulatory placement and reminded the child that the target involved certain articulatory movements (“make the back part of your tongue go back”). In the case of incorrect responses in the acoustic biofeedback condition, the SLP made reference to the acoustic spectrogram when providing feedback and reminded the child that the target involved certain formant movements (“make the third bump move over”). First, the first two steps are identical in both conditions; second, it can be expected that the articulatory cues given in the traditional condition will be remembered and that their effects will carry over into the biofeedback sessions. Therefore we can consider the acoustic biofeedback to be an add-on to traditional therapy, and what we want to know about is the value added. For that question the phase design is more appropriate: in this case, there would be 20 sessions (2 per week over 10 weeks as in Byun’s study), and each session would be planned with the same format: beginning probe (optional), 100 practice trials with feedback, ending probe. The difference is that the starting point for the introduction of acoustic biofeedback would be selected at random. All the sessions that precede the randomly selected start point would be conducted with traditional feedback and all the remainder would be conducted with acoustic biofeedback.
The first three sessions would be designated as traditional and the last three as biofeedback, for a 26-session protocol as described by Byun. Across the 7 children this would end up looking like a multiple baseline design except that (1) the duration of the baseline phase would be determined by random selection for each child; and (2) the baseline phase is actually the traditional treatment, with the experimental phase testing the value-added benefit of biofeedback. There are three possible categories of outcomes: no change after introduction of the biofeedback, an immediate change, or a late change. As with any single subject design, the change might be in level, trend or variance and the test statistic can be designed to capture any of those types of changes. The statistical analysis asks whether the obtained test statistic is bigger than all possible results given all of the possible random selections of starting points. Rvachew & Matthews (2017) provides a more complete explanation of the statistical analysis.

I show below an imaginary result for Clara, using the data presented for her in Byun’s paper, as if the traditional treatment came first and then the biofeedback intervention. If we pretend that the randomly selected start point for the biofeedback intervention occurred exactly in the middle of the treatment period, the test statistic is the difference of the M(bf) and the M(trad) scores resulting in -2.308. All other possible random selections of starting points for intervention lead to 19 other possible mean differences, and 18 of them are bigger than the obtained test statistic leading to a p value of 18/20 = .9. In this data set the probe scores are actually bigger in the earlier part of the intervention when the traditional treatment is used and they do not get bigger when the biofeedback is introduced. These are the beginning probe scores obtained by Clara but Byun obtained a significant result in favour of biofeedback by block randomization and by examining change across each session. However, I am not completely sure that the improvements from beginning to ending probes are a positive sign—this result might reflect a failure to maintain gains from the previous session in one or the other condition.

Hypothetical Clara in SSR Phase Design
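
For readers who want to see how the start-point randomization test works mechanically, here is a minimal Python sketch. The assumptions are mine: the test statistic is M(bf) minus M(trad) computed on the session probe scores, every admissible start point leaves at least one session in each phase, and the p value is the proportion of admissible start points whose statistic is at least as large as the obtained one, counting the obtained one itself. Counting conventions vary, so the proportion may not match the hand count above exactly. The function name is hypothetical and the scores are invented; they are not Clara's data.

```python
from statistics import mean


def phase_randomization_test(scores, observed_start, min_phase_len=1):
    """One-sided randomization test for a randomized-start phase design.

    scores: probe scores in session order.
    observed_start: index of the first biofeedback session (0-based), so
        scores[:observed_start] are traditional sessions and
        scores[observed_start:] are biofeedback sessions.
    min_phase_len: smallest number of sessions allowed in either phase.
    """
    n = len(scores)
    starts = range(min_phase_len, n - min_phase_len + 1)
    if observed_start not in starts:
        raise ValueError("observed_start is not an admissible start point")

    # Test statistic for every start point that could have been drawn.
    stats = {k: mean(scores[k:]) - mean(scores[:k]) for k in starts}
    observed = stats[observed_start]

    at_least_as_large = sum(1 for s in stats.values() if s >= observed)
    return observed, at_least_as_large / len(stats)


# Invented probe scores in which accuracy does not rise after the switch.
scores = [12, 14, 15, 13, 16, 15, 14, 13, 12, 13, 12, 11]
difference, p_value = phase_randomization_test(scores, observed_start=6)
print(f"M(bf) - M(trad) = {difference:.3f}, p = {p_value:.2f}")
```

A large p value here means that most of the other start points would have produced a bigger mean difference than the one obtained, which is the pattern described for the hypothetical Clara result.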

There are several reasons to think that both interventions that were used in Byun’s study might result in unsatisfactory generalization and maintenance. We discuss the principles of generalization in relation to theories of motor learning in Developmental Phonological Disorders: Foundations of Clinical Practice. One important principle is that the child needs a well-established representation of the acoustic-phonetic target. All seven of the children in Byun’s study had poor auditory processing skills but no part of the treatment program addressed phonological processing, phonological knowledge or acoustic-phonetic representations. Second, it is essential to have the tools to monitor and use self-produced feedback (auditory, somatosensory) to evaluate success in achieving the target. Both the traditional and the biofeedback intervention put the child in the position of being dependent upon external feedback. The outcome measure focused attention on improvements from the beginning of the practice session to the end. However, the first principle of motor learning is that practice performance is not an indication of learning. The focus should have been on the sometimes large decrements in probe scores from the end of one session to the beginning of the next. The children had no means of maintaining any of those performance gains. Acoustic feedback may be a powerful means of establishing a new response but it is a counterproductive tool for maintenance and generalization learning.
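
The distinction between within-session gains and between-session losses can be made explicit with a small Python sketch. The function name and the probe scores are hypothetical; the only assumption is that each session has one beginning probe and one ending probe, as in the protocol described above.

```python
def session_changes(begin_probes, end_probes):
    """Separate practice performance from retention.

    within[i]  = ending probe minus beginning probe in session i
                 (change during practice).
    between[i] = beginning probe of session i + 1 minus ending probe of
                 session i (negative values are gains lost between sessions).
    """
    if len(begin_probes) != len(end_probes):
        raise ValueError("each session needs a beginning and an ending probe")

    within = [e - b for b, e in zip(begin_probes, end_probes)]
    between = [
        begin_probes[i + 1] - end_probes[i]
        for i in range(len(end_probes) - 1)
    ]
    return within, between


# Hypothetical percent-correct probe scores for four sessions.
begin = [10, 12, 15, 14]
end = [30, 35, 33, 36]
within, between = session_changes(begin, end)
print("Within-session change:", within)
print("Between-session change:", between)
```

In this invented example every session shows a gain of about 20 points during practice, and nearly all of it is gone by the start of the next session, which is the pattern that a focus on within-session change alone would hide.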


McAllister Byun, T. (2017). Efficacy of Visual–Acoustic Biofeedback Intervention for Residual Rhotic Errors: A Single-Subject Randomization Study. Journal of Speech, Language, and Hearing Research, 60(5), 1175-1193. doi:10.1044/2016_JSLHR-S-16-0038

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13. doi:



Speech Perception and Persistent Speech Errors

Jon Preston is, in my opinion, the most interesting researcher to watch in the field of speech sound disorders. His recent studies on structural and functional brain differences in older children with persistent speech errors are very interesting. The two studies complement each other and point to structural and functional differences in superior temporal gyrus, consistent with repeated findings of perceptual deficits in children with speech deficits. Additional differences in supramarginal gyrus implicate integration of auditory and somatosensory information during feedback processes that are important for learning to produce speech sounds as well as monitoring and fine-tuning speech sound production as the articulatory system matures.

Given these neuroimaging findings, it makes sense to look for behavioral indices of perceptual difficulties in this population of children with persistent speech sound errors. Recently Preston et al. (2015) used SAILS to do this with two different groups of school-aged children. SAILS is a tool that I developed for speech therapy with preschoolers. Each module consists of natural speech recordings of adults and children producing a word that begins with a commonly misarticulated phoneme, for example, the word “rat”. Half the words are produced correctly and the remainder are misarticulated, e.g., [wæt], [jæt], [ɹ̮æt]. Each module is designed as a series of blocks in which the contrasts are theoretically closer, e.g., practice [ɹæt] versus [mæt], block 1 [ɹæt] versus [wæt], [jæt] and block 2 [ɹæt] versus [ɹ̮æt]. The child’s task is to identify the words that are “good” representatives of the target word. Although the blocks are numbered, they do not necessarily fall into a linear difficulty scale because each child can be quite idiosyncratic in terms of the features that they attend to. I’ll come back to this point later. After establishing that the tool was effective as an intervention for improving children’s speech perception and speech production skills, I found that it also had some value as an assessment tool (Rvachew & Grawburg, 2006), although I do not feel that the psychometric qualities are particularly good and I certainly did not design it for that purpose.

Now, back to Jon Preston’s study. In the first study, older children with [ɹ] distortions were compared to children with correctly produced [ɹ] and no history of speech delay. They were administered only the “most difficult” levels of SAILS modules including the [ɹ] Level 2 module. Although 1/20 children with typical speech and 6/27 misarticulators failed the [ɹ] SAILS module, the mean difference between groups was not significant. In the second study, a group of 25 children who received speech therapy as preschoolers was tested with SAILS, 3 years later when the speech deficit was resolved except in some cases for a persistent speech sound distortion. Performance on the “most difficult” [s] or [ɹ] module was correlated with their performance on an [s] or [ɹ] production probe. There was no correlation. (I was initially mystified by the perception results because they didn’t look like anything I had seen before but that was before I realized that the children were not presented with the complete test!).

So, how do we interpret these results? I have three comments.

First, Preston, Irwin, & Turcios have done us all a good turn by establishing that SAILS is NOT a good tool for assessing speech perception in 7 to 14 year old children with persistent speech errors. I never intended it for that purpose and I am pleased to have empirical evidence that supports a clear answer to the question when it comes up (we should be grateful to Seminars in Speech and Language for publishing it I suppose, despite the small samples, because rumour has it that ‘negative results’ are hard to publish). Anyway, we need something better for testing speech perception, especially for older children. I invite reader comments on what that “something better” would be. We know from many studies using synthetic speech that this population is at risk for perceptual deficits. We need to be able to identify those children in the clinic.

Second, if you are going to use SAILS for assessment (with children aged 4 to 7) it is very important to administer the complete module to the child, working through all the levels of the module, in order as intended. We cannot be sure that the child’s response to, for example, Level 3 /s/ will mirror that of the normative samples who encountered Level 3 only after first working through Practice, Level 1 and Level 2. I will come back to this in another post in which I will give a sneak peek at the upcoming second edition of our book Developmental Phonological Disorders: Foundations of Clinical Practice.

Third, the relationship between speech perception and speech production is not linear. Even though I have found relationships between speech perception and speech production in the past using some rather fancy statistics with large groups (Rvachew & Grawburg, 2006; Rvachew, 2006), I cannot at the individual child level relate in a simple fashion SAILS score with number of correct productions of a phoneme. The reason is that the child’s production and perception of a phoneme is related to the way in which the child attends to the features associated with phoneme contrasts, and certain features have different information value for perception versus production. We give an example of this in Chapter 4 of DPD (from Alyssa Ohberg’s master’s thesis): preschoolers who were stimulable for /θ/ and /s/ but had not mastered this contrast were administered the SAILS /θ/ assessment module. Some children, in their speech, differentiated /s/-/θ/ by manipulating the duration cue whereas others differentiated /s/-/θ/ by manipulating the spectral cue. As you would expect, manipulating the spectral cue resulted in comparatively better articulatory accuracy; the two groups produced roughly comparable perceptual performance, but with some interesting differences. The children who attended to the spectral cue actually did better on the supposedly “harder” level 3 stimuli than the supposedly “easier” level 2 stimuli, highlighting again that there is not a linear difficulty gradient across the stimulus blocks. The children who attended to the duration cue did surprisingly well at levels 2 and 3. For some stimuli, attention to the duration cue actually provides an advantage. This result occurs because duration is actually a pretty reliable cue for perception of /θ/ but it does not provide any information that helps the child achieve the critical articulatory gestures (e.g., grooved versus nongrooved tongue, interdental versus alveolar tongue tip placement).
In this case, there is no direct linear relationship between the child’s speech perception score and their speech production score on the tests that we gave. However, there is a direct relation between the child’s perceptual focus on only one of the relevant acoustic cues and their inability to produce the phoneme correctly. The only children who achieved good perception scores and good production scores attended to both the duration and the centroid cues.

This example raises a fourth point: drawing on Shuster’s findings, the best test for older children may well involve using the child’s own speech production output. The most important question is, does the child mistakenly believe that their own productions are accurate and acceptable representations of the target category? I cannot recommend Shuster’s brilliant study highly enough for anyone treating this population. If the child does prove to have incomplete perceptual knowledge of /ɹ/ or /s/, however, treatment that includes highly variable (multi-talker) stimuli remains important, as a general rule of perceptual learning.

Further Reading

Preston, J. L., Felsenfeld, S., Frost, S. J., Mencl, W. E., Fulbright, R. K., Grigorenko, E. L., . . . Pugh, K. R. (2012). Functional Brain Activation Differences in School-Age Children With Speech Sound Errors: Speech and Print Processing. Journal of Speech, Language, and Hearing Research, 55(4), 1068-1082. doi: 10.1044/1092-4388(2011/11-0056)

Preston, J. L., Molfese, P. J., Mencl, W. E., Frost, S. J., Hoeft, F., Fulbright, R. K., … & Pugh, K. R. (2014). Structural brain differences in school-age children with residual speech sound errors. Brain and Language, 128(1), 25-33.

Preston, J. L., Irwin, J. R., & Turcios, J. (2015). Perception of Speech Sounds in School-Aged Children with Speech Sound Disorders. Seminars in Speech and Language, 36(04), 224-233. doi: 10.1055/s-0035-1562906

Rvachew, S. (2006). Longitudinal prediction of implicit phonological awareness skills. American Journal of Speech-Language Pathology, 15, 165-176.

Rvachew, S., & Grawburg, M. (2006). Correlates of phonological awareness in preschoolers with speech sound disorders. Journal of Speech, Language, and Hearing Research, 49, 74-87.

Shuster, L. I. (1998). The perception of correctly and incorrectly produced /r/. Journal of Speech, Language, and Hearing Research, 41, 941-950.