Is Acoustic Feedback Effective for Remediating “r” Errors?

I am very pleased to see a third paper published in the speech-language pathology literature using the single-subject randomization design that I have described in two tutorials, the first in 1988 and the second more recently. Tara McAllister Byun used the design to investigate the effectiveness of acoustic biofeedback treatment to remediate persistent “r” errors in 7 children aged 9 to 15 years. She used the single-subject randomized alternation design with block randomization, including a few unique elements in her implementation of the design. She and her research team provided one traditional treatment session and one biofeedback treatment session each week for ten weeks; however, the order of the traditional and biofeedback sessions was randomized each week. Interestingly, each session targeted the same items (i.e., “r” was the speech sound target in both treatment conditions): rhotic vowels were tackled first and consonantal “r” was introduced later, in a variety of phonetic contexts. (This procedure is a departure from my own practice in which, for example, Tanya Matthews and I randomly assign different targets to different treatment conditions.) Another innovation is the outcome measure: a probe constructed of untreated “r” words was given at the beginning and end of each session so that change over the session (Mdif) was the outcome measure submitted to statistical analysis (our tutorial explains that an advantage of the SSRD is that a nonparametric randomization test can be used to assess the outcome of the study, yielding a p value). In addition, 3 baseline probes and 3 maintenance probes were collected so that an effect size for overall improvement could be calculated.
In this way there are actually 3 time scales for measuring change in this study: (1) change from baseline to maintenance probes; (2) change from baseline to treatment performance as reflected in the probes obtained at the beginning of each session and plotted over time; and (3) change over a session, reflected in the probes given at the beginning and end of each session. Furthermore, it is possible to compare differences in within-session change between sessions provided with and without acoustic feedback.
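For readers who want a concrete picture of how the alternation design with block randomization is analyzed, here is a minimal sketch of the randomization test (the function name and data are hypothetical; this is not the analysis code from either paper). Under the null hypothesis the condition labels within each weekly block are exchangeable, so the reference distribution is built by flipping the labels within each block:

```python
from itertools import product
from statistics import mean

def alternation_randomization_test(trad_mdif, bf_mdif):
    """Randomization test for an alternation design with block randomization.

    trad_mdif, bf_mdif: within-session change scores (ending probe minus
    beginning probe), one pair per weekly block. The test statistic is the
    mean biofeedback advantage; the reference distribution flips the
    condition labels within each block (2**n arrangements in total).
    Returns the observed statistic and a one-sided p value.
    """
    pairs = list(zip(trad_mdif, bf_mdif))
    observed = mean(b - t for t, b in pairs)
    stats = []
    for flips in product([False, True], repeat=len(pairs)):
        # A flipped block contributes t - b instead of b - t
        stats.append(mean((t - b) if flip else (b - t)
                          for (t, b), flip in zip(pairs, flips)))
    p = sum(s >= observed for s in stats) / len(stats)
    return observed, p
```

With ten weekly blocks this enumerates 2^10 = 1024 arrangements, which is why the design can yield an exact p value without any distributional assumptions.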

I was really happy to see the implementation of the design, but it is fair to say that the results were a dog’s breakfast, as summarized below:

Byun 2017 acoustic biofeedback

The table indicates that two participants (Piper, Clara) showed an effect of biofeedback treatment and generalization learning. Both showed rapid change in accuracy overall after treatment was introduced in both conditions and maintained at least some of that improvement after treatment was withdrawn. Garrat and Ian showed identical trajectories in the traditional and biofeedback conditions, with a late rise in accuracy during treatment sessions, large within-session improvements during the latter part of the treatment period, and good maintenance of those gains. However, neither boy achieved 60% correct responding at any point in the treatment program. Felix, Lucas and Evan demonstrated no change in probe scores in either condition across the ten weeks of the experiment. Lucas started at a higher level and therefore his probe performance is more variable: because he showed a within-session decline during traditional sessions while showing stable performance within biofeedback sessions, the statistics indicate a treatment effect in favour of acoustic biofeedback even though no actual gains were observed.

So, this is a long description of the results that brings me to two conclusions: (1) the alternation design was the wrong choice for the hypothesis in these experiments; and (2) biofeedback was not effective for these children, because even in those cases where it looks like there was an effect, the children were responsive to both biofeedback and the traditional intervention.

In a previous blog I described the alternation design; however, there is another version of the single-subject randomization design that would be more appropriate for Tara’s hypothesis. The thing about acoustic biofeedback is that it is not fundamentally different from traditional speech therapy, involving a similar sequence of events: (i) the SLP says a word as an imitative model; (ii) the child imitates the word; (iii) the SLP provides informative or corrective feedback. In the case of incorrect responses in the traditional condition in Byun’s study, the SLP provided information about articulatory placement and reminded the child that the target involved certain articulatory movements (“make the back part of your tongue go back”). In the case of incorrect responses in the acoustic biofeedback condition, the SLP made reference to the acoustic spectrogram when providing feedback and reminded the child that the target involved certain formant movements (“make the third bump move over”). First, the first two steps overlap completely in both conditions; second, it can be expected that the articulatory cues given in the traditional condition will be remembered and their effects will carry over into the biofeedback sessions. Therefore we can consider the acoustic biofeedback to be an add-on to traditional therapy, and what we want to know about is the value added. For this purpose the phase design is more appropriate: there would be 20 sessions (2 per week over 10 weeks as in Byun’s study), and each session would follow the same format: beginning probe (optional), 100 practice trials with feedback, ending probe. The difference is that the starting point for the introduction of acoustic biofeedback would be selected at random. All the sessions that precede the randomly selected start point would be conducted with traditional feedback and all the remainder would be conducted with acoustic biofeedback.
The first 3 sessions would always be designated as traditional and the last 3 as biofeedback (so that each phase contains at least 3 sessions), within the 26-session protocol as described by Byun. Across the 7 children this would end up looking like a multiple baseline design except that (1) the duration of the baseline phase would be determined by random selection for each child; and (2) the baseline phase is actually the traditional treatment, with the experimental phase testing the value-added benefit of biofeedback. There are three possible categories of outcomes: no change after introduction of the biofeedback, an immediate change, or a late change. As with any single subject design, the change might be in level, trend or variance, and the test statistic can be designed to capture any of those types of changes. The statistical analysis asks where the obtained test statistic falls in the distribution of results for all possible random selections of the starting point. Rvachew & Matthews (2017) provides a more complete explanation of the statistical analysis.

I show below an imaginary result for Clara, using the data presented for her in Byun’s paper, as if the traditional treatment came first and then the biofeedback intervention. If we pretend that the randomly selected start point for the biofeedback intervention occurred exactly in the middle of the treatment period, the test statistic is the difference between the M(bf) and M(trad) scores, resulting in -2.308. All other possible random selections of starting points for the intervention lead to 19 other possible mean differences, and 18 of them are bigger than the obtained test statistic, leading to a p value of 18/20 = .9. In this data set the probe scores are actually bigger in the earlier part of the intervention, when the traditional treatment is used, and they do not get bigger when the biofeedback is introduced. These are the beginning probe scores obtained by Clara, but Byun obtained a significant result in favour of biofeedback by block randomization and by examining change across each session. However, I am not completely sure that the improvements from beginning to ending probes are a positive sign: this result might reflect a failure to maintain gains from the previous session in one or the other condition.
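To make the logic concrete, here is a minimal sketch of a phase-design randomization test (hypothetical function and scores, not the analysis code from the papers; this sketch follows the common convention of counting the observed arrangement among those at least as extreme, so its p values may differ slightly from a count of strictly bigger statistics):

```python
from statistics import mean

def phase_randomization_test(scores, start, min_phase=3):
    """One-sided randomization test for a single-subject phase design.

    scores: one probe score per session; start: index of the first session
    conducted in phase B (e.g., with biofeedback). Because 'start' was
    chosen at random from the admissible start points, the p value is the
    proportion of admissible start points whose statistic is at least as
    large as the observed one (the observed arrangement included).
    """
    def stat(k):
        # Mean level change when phase B is assumed to begin at session k
        return mean(scores[k:]) - mean(scores[:k])

    admissible = range(min_phase, len(scores) - min_phase + 1)
    observed = stat(start)
    as_extreme = sum(1 for k in admissible if stat(k) >= observed)
    return observed, as_extreme / len(admissible)
```

The test statistic here captures a change in level; a statistic sensitive to trend or variance could be substituted for `stat` without changing the rest of the procedure.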

Hypothetical Clara in SSR Phase Design

There are several reasons to think that both of the interventions used in Byun’s study might result in unsatisfactory generalization and maintenance. We discuss the principles of generalization in relation to theories of motor learning in Developmental Phonological Disorders: Foundations of Clinical Practice. One important principle is that the child needs a well-established representation of the acoustic-phonetic target. All seven of the children in Byun’s study had poor auditory processing skills, but no part of the treatment program addressed phonological processing, phonological knowledge or acoustic-phonetic representations. Second, it is essential for the child to have the tools to monitor and use self-produced feedback (auditory, somatosensory) to evaluate success in achieving the target. Both the traditional and the biofeedback intervention put the child in the position of being dependent upon external feedback. The outcome measure focused attention on improvements from the beginning of the practice session to the end. However, the first principle of motor learning is that practice performance is not an indication of learning. The focus should have been on the sometimes large decrements in probe scores from the end of one session to the beginning of the next. The children had no means of maintaining any of those performance gains. Acoustic feedback may be a powerful means of establishing a new response but it is a counterproductive tool for maintenance and generalization learning.

Reading

McAllister Byun, T. (2017). Efficacy of Visual–Acoustic Biofeedback Intervention for Residual Rhotic Errors: A Single-Subject Randomization Study. Journal of Speech, Language, and Hearing Research, 60(5), 1175-1193. doi:10.1044/2016_JSLHR-S-16-0038

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13. doi:10.1016/j.jcomdis.2017.04.003

 


How effective is phonology treatment?

Previously I asked whether it made sense to calculate effect sizes for phonology therapy at the within subject level. In other words, from the clinical point of view, do we really want to know whether the child’s rate of change is bigger during treatment than it was when the child was not being treated? Or, do we want to know if the child’s rate of change is bigger than the average amount of change observed among groups of children who get treated? If children who get treated typically change quite a bit and your client is not changing much at all, that might indicate a course correction (and note please, not a treatment rest!). From this perspective, group level effect sizes might be useful so I am providing raw and standardized effect sizes here from three of my past studies with a discussion to follow.

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.

The first data set involves 48 four-year-old children who scored at the second percentile, on average, on the GFTA (and 61 percent consonants correct in conversation). They were randomly assigned to receive treatment for relatively early developing stimulable sound targets (ME group, n=24) or late developing unstimulable sound targets (LL group, n=24). Each child received treatment for four sounds over 2 six-week blocks, during 12 treatment sessions of 30 to 40 minutes each. The treatment approach employed traditional articulation therapy procedures. The children did not receive homework or additional speech and language interventions during this 12-week period. Outcome measures included single word naming probes covering all consonants in 3 word positions and percent consonants correct (PCC) in conversation, with 12 to 14 weeks intervening between the pre- and post-test assessments. The table below shows two kinds of effect sizes for the ME group and the LL group: the raw effect size (raw ES) with the associated confidence interval (CI), which indicates the mean pre- to post-change in percent consonants correct on probes and in conversation; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for three outcome measures: single word naming probe scores for unstimulable phonemes, probe scores for stimulable phonemes, and percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor.

Effect size blog figure 2
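The paired effect sizes reported in these tables can be computed as in the following sketch (hypothetical function and scores; the critical t value for the group's degrees of freedom must be supplied):

```python
from math import sqrt
from statistics import mean, stdev

def paired_change_summary(pre, post, t_crit):
    """Summarize pre- to post-treatment change for one group.

    Returns the raw effect size (mean change score), its confidence
    interval (using t_crit, the two-tailed critical t for n - 1 df),
    Cohen's d(z) (mean change divided by the SD of the change scores),
    and the number of children with zero or negative change.
    """
    change = [b - a for a, b in zip(pre, post)]
    raw_es = mean(change)
    sd = stdev(change)
    half_width = t_crit * sd / sqrt(len(change))
    ci = (raw_es - half_width, raw_es + half_width)
    dz = raw_es / sd
    no_change = sum(1 for c in change if c <= 0)
    return raw_es, ci, dz, no_change
```

Note that d(z) standardizes against the variability of the change scores themselves, which is why it can be large even when the raw change in PCC points is modest.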

Some initial conclusions can be drawn from this table. The effect sizes for change in probe scores are all large. However, the group that received treatment for stimulable sounds showed greater improvement for both treated stimulable sounds and untreated unstimulable sounds compared to the group that received treatment for unstimulable sounds. There was almost no change in PCC derived from the conversational samples overall. I can report that 10 children in the ME group and 6 children in the LL group achieved improvements of greater than 5 PCC points, judged to be a “minimally important change” by Thomas-Stonell et al. (2013). However, half the children achieved no change at all in PCC (conversation).

Rvachew, S., Nowak, M., & Cloutier, G. (2004). Effect of phonemic perception training on the speech production and phonological awareness skills of children with expressive phonological delay. American Journal of Speech-Language Pathology, 13, 250-263.

The second data set involves 34 four-year-old children who scored at the second percentile, on average, on the GFTA (and approximately 60 percent consonants correct in conversation). All of the children received 16 hour-long speech therapy sessions, once weekly. The treatment that they received was entirely determined by their SLP with regard to target selection and approach to intervention. Ten SLPs provided the interventions, 3 using the Hodson cycles approach, 1 a sensory-motor approach, and the remainder using a traditional articulation therapy approach. The RCT element of this study is that the children were randomly assigned to an extra treatment procedure that occurred during the final 15 minutes of each session, concealed from their SLP. Children in the control group (n=17) listened to ebooks and answered questions. Children randomly assigned to the PA group (n=17) played a computer game that targeted phonemic perception and phonological awareness covering 8 phonemes in word-initial and then word-final position. Although the intervention lasted 4 months, the interval between pre-treatment and post-treatment assessments was 6 months long. The table below shows two kinds of effect sizes for the control group and the PA group: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor; and PCC-difficult, derived from the same conversations but restricted to phonemes that were produced with less than 60% accuracy at intake, in other words, phonemes that were potential treatment targets, specifically /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/.

Effect size blog figure 3

The sobering finding here is that the control group effect size for potential treatment targets is the smallest, with half the group making no change and the other half making a small change. The effect size for PCC (all) in the control group is more satisfying in that it is better than the minimally important change (i.e., 8% > 5%); 13 children in this group achieved a change of more than 5 points and only 3 made no change at all. The effect sizes are large in the group that received the Speech Perception/PA intervention in addition to their regular SLP program, with good results for both PCC (all) and PCC-difficult. This table shows that the SLP’s choice of treatment procedures makes a difference to speech accuracy outcomes.

Rvachew, S., & Brosseau-Lapré, F. (2015). A randomized trial of twelve week interventions for the treatment of developmental phonological disorder in francophone children. American Journal of Speech-Language Pathology, 24, 637-658. doi:10.1044/2015_AJSLP-14-0056

The third data set involves data from 64 French-speaking four-year-olds who were randomly assigned to receive either an output-oriented intervention (n = 30) or an input-oriented intervention (n = 34) for remediation of their speech sound disorder. Another 10 children who were not treated also provide effect size data here. The children obtained PCC scores of approximately 70% on the Test Francophone de Phonologie, indicating severe speech sound disorder (consonant accuracy is typically higher in French-speaking children, compared to English). The children received other interventions as well, as described in the research report (home programs and group phonological awareness therapy), with the complete treatment program lasting 12 weeks. The table below shows two kinds of effect sizes for the output-oriented group and the input-oriented group: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct with glides excluded (PCC), obtained from the Test Francophone de Phonologie, a single word naming test; and PCC-difficult, derived from the same test but restricted to phonemes that were produced with less than 60% accuracy at intake, specifically /ʃ,ʒ,l,ʁ/. An outcome measure restricted to phonemes that were absent from the inventory at intake is not possible for this group because French-speaking children with speech sound disorders have good phonetic repertoires for the most part, as their speech errors tend to involve syllable structure (see Brosseau-Lapré and Rvachew, 2014).

Effect size blog figure 4

There are two satisfying findings here: first, when we do not treat children with a speech sound disorder, they do not change, and when we do treat them, they do! Second, when children receive an appropriate suite of treatment elements, large changes in PCC can be observed even over an observation interval as short as 12 weeks.

Overall Conclusions

  1. In the introductory blog to this series, I pointed out that Thomas-Stonell and her colleagues had identified a PCC change of 5 points as a “minimally important change”. The data presented here suggest that this goal can be met for most children over a 3- to 6-month period when children are receiving an appropriate intervention. The only case where this minimum standard was not met on average was in Rvachew & Nowak (2001), a study in which a strictly traditional articulation therapy approach was implemented at low intensity with no homework component.
  2. The measure that we are calling PCC-difficult might be more sensitive and more ecologically valid for 3- and 6-month intervals. This is percent consonants correct restricted to potential treatment targets, that is, those consonants produced with less than 60% accuracy at intake. These turn out to be mid- to late-developing, frequently misarticulated phonemes, therefore /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/ in English and /ʃ,ʒ,l,ʁ/ in French for these samples of 4-year-old children with severe and moderate-to-severe primary speech sound disorders. My impression is that when providing an appropriate intervention an SLP should expect at least a 10% change in these phonemes whether assessed with a broad-based single-word naming probe or in conversation; in fact a 15% change is closer to the average. This does not mean that you should treat the most difficult sounds first! Look carefully at the effect size data from Rvachew and Nowak (2001): when we treated stimulable phonemes we observed a 15% improvement in difficult unstimulable sounds. You can always treat a variety of phonemes from different levels of the phonological hierarchy as described in a previous blog.
  3. Approximately 10% of 4-year-old children with severe and moderate-to-severe primary speech sound disorders do not improve at all over a 3 to 6 month period, given adequate speech therapy. If a child is not improving, the SLP and the parent should be aware that this is a rare event that requires special attention.
  4. In a previous blog I cited some research evidence for the conclusion that patients treated as part of research trials achieve better outcomes than patients treated in a usual care situation. There is some evidence for that in these data. The group in Rvachew, Nowak and Cloutier that received usual care obtained a lower effect size (d=0.45) in comparison to the group that received an extra experimental intervention (d=1.31). In practical terms this difference meant that the group that received the experimental intervention made four times more improvement in the production of difficult sounds than the control group that received usual care.
  5. The variation in effect sizes shown in these data indicates that SLP decisions about treatment procedures and service delivery options have implications for success in therapy. What are the characteristics of the interventions that led to relatively large changes in PCC or relatively large standardized effect sizes? (i) Comprehensiveness, that is, the inclusion of intervention procedures that target more than one level of representation, e.g., procedures to improve articulation accuracy and speech perception skills and/or phonological awareness; and (ii) parent involvement, specifically the inclusion of a well-structured and supported home program.

If you see other messages in these data, or have observations from your own practice or research, please write to me in the comments.

 

 

Are effect sizes in research papers useful in SLP practice?

Effect size blog figure 1

Effect sizes are now required in addition to statistical significance reporting in scientific reports. As discussed in a previous blog, effect sizes are useful for research purposes because they can be aggregated across studies to draw conclusions (i.e., in a meta-analysis). However, they are also intended to be useful as an indication of the “practical consequences of the findings for daily life.” Therefore, Gierut, Morrisette, and Dickinson’s paper “Effect Size for Single-Subject Design in Phonological Treatment” was of considerable interest to me when it was published in 2015. They report the distribution of effect sizes for 135 multiple baseline studies, using the pooled standard deviation of the baseline phase as the denominator and the mean of the treatment phase minus the mean of the baseline phase as the numerator when calculating the effect size statistic. In these studies, the mean and the variance of probe scores in the baseline phase are restricted to be very small by design, because the treatment targets and generalization probe targets must show close to stable 0% correct performance during the baseline phase. The consequence of this restriction is that the effect size number will be very large even when the raw amount of performance change is not so great. The figure above shows hypothetical data that yields exactly their average effect size of 3.66 (specifically, [8.57% − 1.25%]/2% = 3.66). This effect size is termed a medium effect size in their paper, but I leave it to the reader to decide whether a change of not quite 9% accuracy in speech sound production is an acceptable level of change. It may be, because in these studies a treatment effect is operationalized as probe scores (single word naming task) for all the phonemes that were absent from the child’s repertoire at intake.
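The effect size metric in question can be sketched as follows (a simplified per-case version with hypothetical probe scores; the published analysis pools baseline standard deviations across the legs of each multiple baseline study):

```python
from statistics import mean, stdev

def gierut_style_es(baseline, treatment):
    """Single-subject effect size of the kind reported by Gierut et al. (2015):
    (treatment-phase mean - baseline-phase mean) / SD of the baseline probes.
    Because stable, near-zero baselines make the denominator tiny, even a
    small raw change can yield a numerically large effect size.
    """
    return (mean(treatment) - mean(baseline)) / stdev(baseline)
```

For example, probe accuracies of 0%, 2% and 4% at baseline followed by 9% in each treatment-phase probe give an effect size of 3.5, despite the raw gain being only a few percentage points.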
From the research point of view this paper provides very important information: it permits researchers to compare effect sizes and explore variables that account for between-case differences in effect sizes in those cases where the researchers have used a multiple baseline design and treatment intensities similar to those reported in this paper (5 to 19 one-hour sessions typically delivered 3 times per week).

The question I am asking myself is whether the distribution of effect sizes reported in this paper is helpful to clinicians who are concerned with the practical significance of these studies. I ask this because I am starting to see manuscripts reporting clinical case studies in which the data are used to claim “large treatment effects” for a single case (using Gierut et al.’s standard of an effect size of 6.32 or greater). Indeed, in the clinical setting SLPs will be asked to consider whether their clients are making “enough” progress. For example, in Rvachew and Nowak (2001) we asked parents to rate their agreement with the statement “My child’s communication skills are improving as fast as can be expected.” (This question was on our standard patient satisfaction questionnaire, so in fact we asked every parent this question, not just the ones in this RCT.) But the parent responses in the RCT showed significant between-group differences on this question that aligned with the dramatic differences in child response to the traditional versus complexity approach to target selection tested in that study (e.g., 34% vs. 17% of targets mastered in these groups respectively). It seems to me that when parents ask themselves this question they have multiple frames of reference: not only do they consider the child’s communicative competence before and after the introduction of therapy, they consider whether their child would make more or less change with other hypothetical SLPs and other treatment approaches, given that parents actually have choices about these things. Therefore, an effect size that says, effectively, that the child made more progress with treatment than with no treatment is not really answering the parent’s question. However, with a group design it is possible to calculate an effect size that reflects change relative to the average amount of change one might expect, given therapy.
To my mind this kind of effect size comes closer to answering the questions about practical significance that a parent or employer might ask.
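One simple way to operationalize this kind of effect size, sketched here with a hypothetical function (not a published metric), is to express the client's change score as a z-score against the distribution of change scores in a treated reference group:

```python
from statistics import mean, stdev

def change_vs_treated_peers(child_change, peer_changes):
    """Express one client's pre- to post-treatment change relative to the
    change observed among treated peers, in SD units of the peers' change
    scores. Strongly negative values flag a child who is changing much less
    than treated children typically do, suggesting a course correction.
    """
    return (child_change - mean(peer_changes)) / stdev(peer_changes)
```

A child who made no change at all, measured against peers who gained 5 to 15 PCC points under therapy, would score well below zero on this index, which answers the parent's question more directly than a treatment-versus-no-treatment comparison.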

This still leaves us with the question of what kind of change to describe. It is unfortunate that there are few if any controlled studies that have reported functional measures. However, I can think of some examples of descriptive studies that reported functional measures. First, Campbell (1999) reported that good functional outcomes were achieved when preschoolers with moderate and severe Speech Delay received twice-weekly therapy over a 90- to 120-day period (i.e., on average the children’s speech intelligibility improved from approximately 50% to 75% intelligible as reported by parents). Second, there are a number of studies reporting ASHA-NOMS (functional communication measures provided by treating SLPs) for children receiving speech and language therapy. However, Thomas-Stonell et al. (2007) found that improvement on the ASHA-NOMS was not as sensitive as parental reports of “real life communication change” over a 3 to 6 month interval. Therefore, Thomas-Stonell and her colleagues developed the FOCUS to document parental reports of functional outcomes in a reliable and standardized manner.

Thomas-Stonell et al (2013) report changes in FOCUS scores for 97 preschool aged children who received an average of 9 hours of SLP service in Canada, comparing change during the waiting period (60 day interval) to change during the treatment period (90 day interval). FOCUS assessments demonstrated significantly more change during treatment (about 18 FOCUS points on average) than during the wait period (about 6 FOCUS points on average). Then they compared minimally important changes in PCC, the Children’s Speech Intelligibility Measure, and FOCUS scores for 28 preschool aged children. The FOCUS measure was significantly correlated with the speech accuracy and intelligibility measures but there was not perfect agreement among these measures. For example, 21/28 children obtained a minimally important change of at least 16 points on the FOCUS but 4 of those children did not show significant change on PCC/CSIM. In other words speech accuracy, speech intelligibility and functional improvements are related but not completely aligned; each provides independent information about change over time.

In controlled studies, some version of percent consonants correct is a very common treatment outcome used to assess the efficacy of phonology therapy. Gierut et al. (2015) focused specifically on change in those phonemes that are late developing and produced with very low accuracy, if not completely absent from the child’s repertoire at intake. This strikes me as a defensible measure of treatment outcome. Regardless of whether one chooses to treat a complex sound, an early developing sound, a medium-difficulty sound (or one of each, as I demonstrated in a previous blog), presumably the SLP wants to have dramatic effects across the child’s phonological system. Evidence that the child is adding new sounds to the repertoire is a good indicator of that kind of change. Alternatively the SLP might count increases in correct use of all consonants that were potential treatment targets prior to the onset of treatment. Or, the SLP could count percent consonants correct for all consonants, because this measure is associated with intelligibility and takes into account the fact that there can be regressions in previously mastered sounds when phonological reorganization is occurring. The number of choices suggests that it would be valuable to have effect size data for a number of possible indicators of change. More to the point, Gierut et al.’s single-subject effect size implies that almost any change above “no change” is an acceptable level of change in a population that receives intervention because they are stalled without it. I am curious to know if this is a reasonable position to take. In my next blog post I will report effect sizes for these speech accuracy measures taken from my own studies going back to 2001. I will also discuss the clinical significance of the effect sizes that I aggregate.
I am going to calculate the effect size for paired mean differences along with the corresponding confidence intervals for groups of preschoolers treated in three different studies. I haven’t done the calculations yet, so, for those readers who are at all interested in this, you can hold your breath with me.

References

Campbell, T. F. (1999). Functional treatment outcomes in young children with motor speech disorders. In A. Caruso & E. A. Strand (Eds.), Clinical Management of Motor Speech Disorders in Children (pp. 385-395). New York: Thieme Medical Publishers, Inc.

Gierut, J. A., Morrisette, M. L., & Dickinson, S. L. (2015). Effect Size for Single-Subject Design in Phonological Treatment. Journal of Speech, Language, and Hearing Research, 58(5), 1464-1481. doi:10.1044/2015_JSLHR-S-14-0299

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 1-12. doi:10.3389/fpsyg.2013.00863

Thomas-Stonell, N., McConney-Ellis, S., Oddson, B., Robertson, B., & Rosenbaum, P. (2007). An evaluation of the responsiveness of the pre-kindergarten ASHA NOMS. Canadian Journal of Speech-Language Pathology and Audiology, 31(2), 74-82.

Thomas-Stonell, N., Oddson, B., Robertson, B., & Rosenbaum, P. (2013). Validation of the Focus on the Outcomes of Communication under Six outcome measure. Developmental Medicine and Child Neurology, 55(6), 546-552. doi:10.1111/dmcn.12123

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.


Research Engagement with SLPs

I still have days when I miss my former job as a research coordinator in a hospital speech-language department. As a faculty researcher, I try to embed my research in clinical settings as often as I can, but it is not easy. Administrators in particular, and speech-language pathologists on occasion, may be leery of the time requirement and often worry that the project might shine too bright a light on everyday clinical practices that may not be up to the highest evidence-based standard. I always try to design projects that are mutually beneficial to the research team and the clinical setting. As a potential support to the promise of mutual benefit, I was pleased to read a recent paper in BMJ Open, “Does the engagement of clinicians and organizations in research improve healthcare performance: a three-stage review”. On the basis of an hour-glass shaped review, using an interpretive synthesis of the literature on the topic, Boaz, Hanney, Jones, and Soper drew the following conclusions:

Some papers reported an association between hospital participation in research and improved patient outcomes. Some of these findings were quite striking, such as significantly worse survival from ovarian cancer in “non-study hospitals” versus hospitals involved in research trials (my sister-in-law died from this terrible disease this month so I couldn’t help but notice this).

A majority of papers reported an association between hospital participation in research and improved processes of healthcare. This includes the adoption of innovative treatments as well as better compliance with best practice guidelines.

Different causal mechanisms may account for these findings when examining impacts at the clinician versus organization level. For example, involvement in a clinical trial may include staff training and other experiences that change clinician attitudes and behaviors. Higher up, participation in the trial may require the organization to acquire new infrastructure or adopt new policies.

The direction of cause and effect may be difficult to discern. Specifically, a hospital that is open to involvement in research may have a higher proportion of research-active staff who have unique skills, specialization or personal characteristics. These characteristics may jointly improve healthcare outcomes in that setting and make those staff more amenable to engagement with research.

This last point resonates well with my experience at the Alberta Children’s Hospital in the 80’s and 90’s. The hospital had a very large SLP department, up to 30 SLPs, permitting considerable specialization among us. Furthermore, as a teaching hospital we had a good network of linkages to the two universities in the province and to a broad array of referral sources. Our working model, which was based on multidisciplinary teams, also supported involvement in research. Currently, in Montreal, I am able to set up research clinics in healthcare and educational settings from time to time, but none of them have the resources that we enjoyed in Alberta three decades ago.

Of course, direct involvement in research is not the only way for SLPs to engage with research evidence. Another paper, published in Research in Developmental Disabilities used a survey to explore “Knowledge acquisition and research evidence in autism.” Carrington et al found that researchers and practitioners had somewhat different perspectives. The researcher group (n=256) and the practitioner group (n=422) identified sources of information that they used to stay up to date with current information on autism. Researchers were more likely to identify scientific journals and their colleagues whereas practitioners were more likely to identify conferences/PD workshops and non-academic journals. Respondents also identified sources of information that they thought would help practitioners translate research to practice. Researchers thought that nontechnical summaries and interactions with researchers would be most helpful. Practitioners identified academic journals as the best source of information (but the paper doesn’t explain why they were not using these journals as their primary source).

Finally, the most interesting finding for me was that neither group used or suggested social media as a helpful source of information. I thought this was odd because social media is a potential access point to academic journal articles or summaries of those articles, as well as a way of interacting directly with scientists.

The authors concluded that knowledge translation requires that practitioners be engaged with research and researchers. For that to happen they suggest that “research should focus on priority areas that meet the needs of the research-user community” and that “attempts to bridge the research-practice gap need to involve greater collaboration between autism researchers and research-users.”

Given that the research shows that the involvement of practitioners in research actually improves care and outcomes for our clients and patients, I would say that it is past time to bring down barriers to researcher-SLP collaboration and bring research right into the clinical setting.

Maternal Responsiveness to Babbling

Over the course of my career the most exciting change in speech-language pathology practice has been the realization that we can have an impact on speech and language development by working with the youngest patients, intervening even before the child “starts to talk”. Our effectiveness with these young patients depends upon the growing body of research on the developmental processes that underlie speech development during the first year of life. Now that we know that the emergence of babbling is a learned behavior, influenced by auditory and social inputs, this kind of research has mushroomed, although our knowledge remains constrained because these studies are hugely expensive, technically difficult and time consuming to conduct. Therefore I was very excited to see a new paper on the topic in JSLHR this month:

Fagan, M. K., & Doveikis, K. N. (2017). Ordinary Interactions Challenge Proposals That Maternal Verbal Responses Shape Infant Vocal Development. Journal of Speech, Language, and Hearing Research, 60(10), 2819-2827. doi:10.1044/2017_JSLHR-S-16-0005

The purpose of this paper was to examine the hypothesis that maternal responses to infant vocalizations are a primary cause of the age related change in the maturity of infant speech during the period 4 through 10 months of age. This time period encompasses three stages of infant vocal development: (1) the expansion stage, that is, producing vowels and a broad variety of vocalizations that are not speech-like but nonetheless exercise vocal parameters such as pitch, resonance and vocal tract closures; (2) the canonical babbling stage, that is, producing speech-like CV syllables, singly or in repetitive strings; and (3) the integrative stage, that is, producing a mix of babbling and meaningful words. In the laboratory, contingent verbal responses from adults increase the production rate of mature syllables by infants. Fagan and Doveikis asked whether this shaping mechanism, demonstrated in the laboratory, explains the course of infant speech development in natural interactions in real world settings. They coded five and a quarter hours of natural interactions recorded between mothers and infants in the home environment from 35 dyads in a cross-sectional study. Their analysis focused on maternal behaviors in the 3-second interval following an infant vocalization, defined as a speech-like vowel or syllable type utterance. They were specifically interested in whether maternal vocalizations in this interval would be responsive (prompt, contingent, relevant to the infant’s vocal behavior, e.g., affirmations, questions, imitations) or nonresponsive (prompt but not meaningfully related to the infant’s vocal behavior, e.g., activity comment, unrelated comment, redirect). This is a summary of their findings:

  • Mothers vocalized 3 times more frequently than infants.
  • One quarter of maternal vocalizations fell within the 3 sec interval after an infant vocalization.
  • About 40% of the prompt maternal vocalizations were responsive and the remainder were nonresponsive, according to definitions derived from Bornstein et al. (2008).
  • Within the category of responsive maternal vocalizations, the most common were questions and affirmations.
  • A maternal vocalization of some kind occurred promptly after 85% of all infant utterances.
  • Imitations of the infant utterance (also in the responsive category) occurred after approximately 11% of infant utterances (my estimate from their data).
  • Mothers responded preferentially to speech-like vocalizations but not differentially to CV syllables versus vowel-only syllables. In other words, it did not appear that maternal reinforcement or shaping of mature syllables could account for the emergence and increase in this behavior with infant age.

One reason I like this paper so much is that some of the results accord with data that we are collecting in my lab in a project coordinated by my doctoral student Pegah Athari, who is showing great skill and patience, having worked her way through 10 hours of recordings from 5 infants in a longitudinal study (3 months of recording from each infant but covering ages 6 through 14 months overall). The study is designed to explore mimicry specifically as a responsive utterance that may be particularly powerful (mimicry involves full or partial imitation of the preceding utterance). We want to be able to predict when mimicry will occur and to understand its function. In our study we examine the 2 second intervals that precede and follow each infant utterance. Another important difference is that we record the interactions in the lab but there are no experimental procedures; we arrange the setting and materials to support interactions that are as naturalistic as possible. These are some of our findings:

  • Mothers produced 1.6 times as many utterances as their infants.
  • Mothers said something after the vast majority of the infant’s vocalizations just as observed by Fagan and Doveikis.
  • Instances in which one member of the dyad produced an utterance that is similar to the other were rare, but twice as common in the direction of mother mimicking the infant (10%), compared to the baby mimicking the mother (5%).
  • Infant mimicry of the mother is significantly (but not completely) contingent on the mother modeling one of the infant’s preferred sounds in her utterance (mean contingency coefficient = .34).
  • Maternal mimicry is significantly (but not completely) contingent on perceived meaningfulness of the child’s vocalization (mean contingency coefficient = .35). In other words, it seems that the mother is not specifically responding to the phonetic character of her infant’s speech output; rather, she makes a deliberate attempt to teach meaningful communication throughout early development.
  • The number of utterances that the mother perceives to be meaningful increases with the infant’s age, although this is not a hard and fast rule because regressions occur when the infant is ill and the canonical babbling ratio declines. Mothers will also respond to nonspeechlike utterances in the precanonical stage as being meaningful (animal noises, kissing and so forth).

We want to replicate our findings with another 5 infants before we try to publish our data, but I feel confident that our conclusions will be subtly different from Fagan and Doveikis’, despite general agreement with their suggestion that self-motivation factors and access to auditory feedback of their own vocal output play a primary role in infant vocal development. I think that maternal behavior may yet prove to have an important function, however. It is necessary to think about learning mechanisms in which low frequency random inputs are actually helpful. I have talked about this before on this blog in a post about the difference between exploration and exploitation in learning. Exploration is a phase during which trial and error actions help to define the boundaries of the effective action space and permit discovery of actions that are most rewarding. Without exploration one might settle on a small repertoire of actions that are moderately rewarding and never discover others that will be needed as one’s problems become more complex. Exploitation is the phase during which you use the actions that you have learned to accomplish increasingly complex goals.

The basic idea behind the exploration-exploitation paradox is that long term learning is supported by using an exploration strategy early in the learning process. Specifically, many studies have shown that more variable responding early in learning is associated with easier learning of difficult skills later in the learning process. For early vocal learning, the expansion stage corresponds to this principle nicely: the infant produces a broad variety of vocalizations—squeals, growls, yells, raspberries, vowels, quasiresonants, fully resonant vowels and combinations called marginal babbles. These varied productions lay the foundations for the production of speech like syllables during the coming canonical babbling stage. Learning theorists have demonstrated that environmental inputs can support this kind of free exploration. Specifically, a high reinforcement rate will promote a high response rate but it is important to reinforce variable responses early in the learning process.

In the context of mother-infant interactions, it may be that the mother reinforces many different kinds of infant vocalizations in the early stages because she is trying to teach words, but the infant is not really capable of producing real words, so she has to work with what she hears. She does do something after almost every infant utterance, however, so she encourages many different practice trials on the part of the infant. It is also possible (although not completely proven) that imitative responses on the part of the mother are particularly reinforcing to the infant. In the short excerpt of a “conversation” between a mum and her 11 month old infant shown here, it can be seen that she responds to every one of the infant’s utterances, encouraging a number of variable responses, specifically mimicking those that are most closely aligned with her intentions.

IDV11E03A EXCERPT

It is likely that when alone in the crib, the infant’s vocalizations will be more repetitive, permitting more specific practice of preferred phonetic forms such as “da” (infants are known to babble more when alone than in dyadic interactions, especially when scientists feed back their vocalizations over loudspeakers). The thing is, the infant’s goals are not aligned with the mother’s. In my view, the most likely explanation for infant vocal learning is self-supervised learning. The infant is motivated to produce specific utterances and finds achievement of those utterances to be intrinsically rewarding. What kind of utterances does the infant want to produce? Computer models of this process have settled on two factors: salience and learning progress. That is, the infant enjoys producing sounds that are interesting and that are not yet mastered. The mother’s goals are completely different (teach real words) but her behaviors in this regard serve the infant’s goals nonetheless by: (1) supporting perceptual learning of targets that correspond to the ambient language; (2) encouraging sound play/practice by responding to the infant’s attempts with a variety of socially positive behaviors; (3) reinforcing variable productions by modeling a variety of forms and accepting a variety of attempts as approximations of meaningful utterances when possible; and (4) increasing the salience of speech-like utterances through mimicry of these rare utterances. The misalignment of the infant’s and the mother’s goals is helpful to the process because if the mother were trying to teach the infant specific phonetic forms (CV syllables, for example), the exploration process might be curtailed prematurely and self-motivation mechanisms might be hampered.

What are the clinical implications of these observations? I am not sure yet. I need a lot more data to feel confident that I can predict maternal behavior in relation to infant behavior. In the meantime, it strikes me that SLPs engage in a number of parent teaching practices that assume that responsiveness by the parent is a “good thing”, even though it is not certain that parents typically respond to their infant’s vocalizations in quite the ways that we expect. For now, procedures to encourage vocal play are a valuable part of your tool box, as described in Chapter 10 of our book:

Rvachew, S., & Brosseau-Lapre, F. (2018). Developmental Phonological Disorders: Foundations of Clinical Practice (Second ed.). San Diego, CA: Plural Publishing, Inc.

 

Testing Client Response to Alternative Speech Therapies

Buchwald et al. published one of the many interesting papers in a recent special issue on motor speech disorders in the Journal of Speech, Language, and Hearing Research. In their paper they outline a common approach to speech production, one that is illustrated and discussed in some detail in Chapters 3 and 7 of our book, Developmental Phonological Disorders: Foundations of Clinical Practice; Buchwald et al. apply it in the context of Acquired Apraxia of Speech, however. They distinguish between patients who produce speech errors subsequent to left hemisphere cerebrovascular accident as a consequence of motor planning difficulties versus phonological planning difficulties. Their study included four such patients, two in each subgroup. Acoustic analysis was used to determine whether their cluster errors arose during phonological planning or in the next stage of speech production, motor planning. The analysis involves comparing the durations of segments in triads of words like this: /skæmp/ → [skæmp], /skæmp/ → [skæm], /skæm/ → [skæm]. The basic idea is that if segments such as [k] in /sk/ → [k] or [m] in /mp/ → [m] are produced as they would be in a singleton context, then the errors arise during phonological planning; alternatively, if they are produced as they would be in the cluster context, then the deletion errors arise during motor planning. This led the authors to hypothesize that patients with these different error types would respond differently to intervention. They treated all four patients with the same treatment, described as “repetition based speech motor learning practice”. Consistent with their hypothesis, the two patients with motor planning errors responded to this treatment and the two with phonological planning errors did not, as shown in the table of pre- versus post-treatment results.
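The logic of the duration comparison can be sketched as a simple decision rule. This is my own illustration of the idea only; Buchwald et al.’s actual analysis compared distributions of durations statistically rather than classifying single tokens, and the durations and threshold here are invented.

```python
def classify_deletion_error(err_dur, singleton_dur, cluster_dur):
    """Classify a cluster-reduction error by comparing the duration (ms) of
    the surviving segment (e.g., [k] in /sk/ -> [k]) with its duration in
    singleton (/k/) and cluster (/sk/) contexts.  If the error token
    patterns with the singleton context, the deletion is attributed to
    phonological planning; if it patterns with the cluster context, to
    motor planning.  Nearest-context matching is a deliberate
    simplification of the published analysis."""
    if abs(err_dur - singleton_dur) <= abs(err_dur - cluster_dur):
        return "phonological planning"
    return "motor planning"

# An error token of 80 ms patterns with an 82 ms singleton, not a 60 ms cluster
label = classify_deletion_error(err_dur=80, singleton_dur=82, cluster_dur=60)
```

In other words, the error token is asked: do you look like a segment that was planned as a singleton, or like a cluster plan that failed in execution?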

Buchwald et al results corrected table

However, as the authors point out, a significant limitation of this study is that the design is not experimental. Because experimental control was not established either within or across speakers, it is difficult to draw firm conclusions.

I find the paper to be of interest on two counts nonetheless. Firstly, their hypothesis is exactly the same hypothesis that Tanya Matthews and I posed for children who present with phonological versus motor planning deficits. Secondly, their hypothesis is fully compatible with the application of a single subject randomization design. Therefore it provides me with an opportunity to follow through with my promise from the previous blog, to demonstrate how to set up this design for clinical research.

For her dissertation research, Tanya identified 11 children with severe speech disorders and inconsistent speech sound errors who completed our full experimental paradigm. These children were diagnosed with either a phonological planning disorder or a motor planning disorder using the Syllable Repetition Task and other assessments as described in our recent CJSLPA paper, available open access here. Using those procedures, we found that 6 had a motor planning deficit and 5 had a phonological planning deficit.

Then we hypothesized that the children with motor planning disorders would respond to a treatment that targeted speech motor control: much like Buchwald et al., it included repetition practice according to the principles of motor practice during the practice parts of the session, but during prepractice, children were taught to identify the target words and to identify mispronunciations of the target words so that they would be better able to integrate feedback and self-correct during repetition practice. Notice that direct and delayed imitation are important procedures in this approach. We called this the auditory-motor integration (AMI) approach.

For children with phonological planning disorders, we hypothesized that they would respond to a treatment based on principles similar to those suggested by Dodd et al. (i.e., the core vocabulary approach). Specifically, the children were taught to segment the target words into phonemes, associating the phonemes with visual cues. Then we taught the children to chain the phonemes back together into a single word. Finally, during the practice component of each session, we encouraged the children to produce the words using the visual cues when necessary. An important component of this approach is that auditory-visual models are not provided prior to the child’s production attempt: the child is forced to construct the phonological plan independently. We called this the phonological memory and planning (PMP) approach.

We also had a control condition that consisted solely of repetition practice (CON condition).

The big difference between our work and Buchwald et al. is that we tested our hypothesis using a single subject block randomization design, as described in our recent tutorial in the Journal of Communication Disorders. The design was set up so that each of the 11 children experienced all three treatments. We chose 3 treatment targets for each child, randomly assigned the targets to each of the three treatments, and then randomly assigned the treatments to each of three sessions, scheduled to occur on different days of the week, 3 sessions per week for 6 weeks. You can see from the table below that each week counts as one block, so there are 6 blocks of 3 sessions for 18 sessions in total. The randomization scheme was generated blindly and independently using computer software for each child. The diagram below shows the treatment schedule for one of the children with a motor planning disorder.

Block Randomization TASC02 DPD Blog
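For readers curious about the mechanics, a randomization scheme of this kind is easy to generate. The sketch below uses the treatment names from our study, but the function itself is my illustration; our actual schedules were generated with different software.

```python
import random

def block_randomization_schedule(targets, treatments, n_blocks, seed=None):
    """Randomly assign one target to each treatment condition, then, within
    each weekly block, randomly order the treatments across that block's
    sessions (one session per treatment per block)."""
    rng = random.Random(seed)
    shuffled = targets[:]
    rng.shuffle(shuffled)
    assignment = dict(zip(treatments, shuffled))  # treatment -> target
    schedule = []
    for block in range(1, n_blocks + 1):
        order = treatments[:]
        rng.shuffle(order)  # session order within the block
        schedule.append((block, order))
    return assignment, schedule

# Three hypothetical targets, three treatments, six weekly blocks
assignment, schedule = block_randomization_schedule(
    ["target1", "target2", "target3"], ["AMI", "PMP", "CON"],
    n_blocks=6, seed=1)
```

The key design feature is that every block contains exactly one session of each treatment, while the order within blocks is left to chance; it is this known randomization scheme that licenses the randomization test described below.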

This design allowed us to compare response to the three treatments within each child using a randomization test. For this child, the randomization test revealed a highly significant difference in favour of the AMI treatment as compared to the PMP treatment, as hypothesized for children with motor planning deficits. I don’t want to scoop Tanya’s thesis because she will finish it soon, before the end of 2017 I’m sure, but the long and the short of it is that we have very clear results in favour of our hypothesis using this fully experimental design and the statistics that are licensed by it. I hope you will check out our tutorial on the application of this design: we show how flexible and versatile this design can be for addressing many different questions about speech-language practice. There is much exciting work being done in the area of speech motor control and this is a design that gives researchers and clinicians an opportunity to obtain interpretable results with small samples of children with rare or idiosyncratic profiles.

Reading

Buchwald, A., & Miozzo, M. (2012). Phonological and Motor Errors in Individuals With Acquired Sound Production Impairment. Journal of Speech, Language, and Hearing Research, 55(5), S1573-S1586. doi:10.1044/1092-4388(2012/11-0200)

Rvachew, S., & Matthews, T. (2017). Using the Syllable Repetition Task to Reveal Underlying Speech Processes in Childhood Apraxia of Speech: A Tutorial. Canadian Journal of Speech-Language Pathology and Audiology, 41(1), 106-126.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13. doi:https://doi.org/10.1016/j.jcomdis.2017.04.003

 

Single Subject Randomization Design For Clinical Research

During the week of April 23–29, 2017, Susan Ebbels curated WeSpeechies on the topic Carrying Out Intervention Research in SLP/SLT Practice. Susan kicked off the week with a link to her excellent paper that discusses the strengths and limitations of various procedures for conducting intervention research in the clinical setting. As we would expect, a parallel groups randomized control design was deemed to provide the best level of experimental control. Many ways of studying treatment related change within individual clients, with increasing degrees of control, were also discussed. However, all of the ‘within participant’ methods described were vulnerable, to varying degrees, to confounding by threats to internal validity such as history, selection, practice, fatigue, maturation or placebo effects.

One design was missing from the list because it is only just now appearing in the speech-language pathology literature, specifically the Single Subject Randomization Design. The design (actually a group of designs in which treatment sessions are randomly allocated to treatment conditions) provides the superior internal validity of the parallel groups randomized control trial by controlling for extraneous confounds through randomization. As an added benefit, the results of a single subject randomization design can be submitted to a statistical analysis, so that clear conclusions can be drawn about the efficacy of the experimental intervention. At the same time, the design can be feasibly implemented in the clinical setting and is perfect for answering the kinds of questions that come up in daily clinical practice. For example, randomized control trials have shown that speech perception training is an effective adjunct to speech articulation therapy on average when applied to groups of children, but you may want to know if it is a necessary addition to your therapy program for a specific child.

Furthermore, randomized single subject experiments are now accepted as a high level of research evidence by the Oxford Centre for Evidence-Based Medicine. An evidence hierarchy has been created for rating single subject trials, putting randomized single subject experiments at the top of the evidence hierarchy as shown in the following table, taken from Romeiser Logan et al. (2008).

Romeiser Logan Levels of Evidence table

Tanya Matthews and I have written a tutorial showing exactly how to implement and interpret two versions of the Single Subject Randomization Design, a phase design and an alternation design. The accepted manuscript is available but behind a paywall at the Journal of Communication Disorders. In another post I will provide a mini-tutorial showing how the alternation design could be used to answer a clinical question about a single client.
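As a preview of the logic, here is a minimal sketch of the randomization test for a two-condition alternation design with block randomization (one session per condition per block). The observed difference between condition means is compared against the differences obtainable under every permissible re-labeling of sessions within blocks; the outcome data below are invented purely for illustration.

```python
from statistics import mean

def randomization_test(blocks):
    """One-sided randomization test for a two-condition alternation design
    with block randomization.  `blocks` holds one (condition_A, condition_B)
    outcome pair per block.  Under the null hypothesis the A/B labels are
    exchangeable within each block, so the p value is the proportion of the
    2**n possible label assignments yielding a mean A-minus-B difference at
    least as large as the one observed."""
    n = len(blocks)
    observed = mean(a - b for a, b in blocks)
    count = 0
    for mask in range(2 ** n):  # each bit flips the labels in one block
        diff = mean((b - a) if (mask >> i) & 1 else (a - b)
                    for i, (a, b) in enumerate(blocks))
        if diff >= observed:
            count += 1
    return count / 2 ** n

# Invented within-session change scores for 5 blocks, condition A vs. B
p = randomization_test([(5, 1), (6, 2), (4, 1), (7, 3), (5, 2)])
```

Note that with five blocks there are only 2^5 = 32 possible assignments, so the smallest achievable p value is 1/32, about .03; more blocks buy a finer-grained test.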

Further Reading

Ebbels, S. H. (2017). Intervention research: Appraising study designs, interpreting findings and creating research in clinical practice. International Journal of Speech-Language Pathology, 1-14.

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15(2), 124-144.

Romeiser Logan, L., Hickman, R. R., Harris, S. R., & Heriza, C. B. (2008). Single-subject research design: Recommendations for levels of evidence and quality rating. Developmental Medicine and Child Neurology, 50(2), 99-103.

Rvachew, S. (1988). Application of single subject randomization designs to communicative disorders research. Human Communication Canada (now Canadian Journal of Speech-Language Pathology and Audiology), 12, 7-13. [open access]

Rvachew, S. (1994). Speech perception training can facilitate sound production learning. Journal of Speech and Hearing Research, 37, 347-357.

Rvachew, S., & Matthews, T. (in press). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders.

 

How to choose a control condition for speech therapy research

This post is an addendum to a previous post “What is a control group?”, inspired by a recently published new paper (“Control conditions for randomized trials of behavioral interventions in psychiatry: a decision framework” Early View, Lancet Psychiatry, March 2017). Following a brief review of the literature on effect sizes associated with different types of control conditions, a framework for choosing an appropriate control condition in behavioral trials is offered. The types of control conditions discussed are as follows:

  • Active comparator
  • Minimal treatment control
  • Nonspecific factors control
  • No-treatment control
  • Patient choice
  • Pill placebo
  • Specific factors component control
  • Treatment as usual
  • Waitlist control

The considerations for choosing one of these control conditions for testing a behavioral intervention are (1) participant risk; (2) trial phase; and (3) available resources. With respect to participant risk, more active interventions should be provided as the control condition when the risk of withholding treatment (especially when known effective treatments are available) is high. Therefore, when making this decision characteristics of the participant population and characteristics of the available treatments will play a role in the decision making process.

Regarding trial phase, early stage exploratory trials should be concerned with the risk of Type II error; in other words the researcher will want to maximize the chances of finding a benefit of a potentially helpful new intervention. Therefore, a waitlist control group might be appropriate at this stage of the research process given that waitlist controls are associated with large effect sizes in behavioral trials. In the later stages of the research program, the researcher should strive to minimize Type I error; in other words it is important to guard against concluding that an ineffective treatment is helpful. In this case an active comparator would be a logical choice although the sample size would need to be large given that the effect size is likely to be small in this case.
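The sample size point can be made concrete with a back-of-envelope power calculation. This is my own illustration, not a figure from the Lancet Psychiatry paper, using the standard normal approximation for a two-group comparison of means.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for detecting a standardized mean
    difference d between two independent groups:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# Large effects are plausible against a waitlist control; small effects
# are plausible against an active comparator.
large_effect_n = n_per_group(0.8)
small_effect_n = n_per_group(0.2)
```

Under these assumptions, the trial against an active comparator needs roughly fifteen times as many participants per group as the trial against a waitlist control, which is exactly why resources constrain the choice of control condition.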

Finally, the resources available to the researchers will influence the choice of control condition. For example, in a late stage trial an active comparator provided by trained and monitored study personnel would be the best choice in most circumstances; however, in this case the provision of the control may be at least as expensive as the provision of the experimental treatment. When sufficient resources are lacking, the cost effective alternative might be to ask the usual community provider to administer treatment as usual although every effort should be made to describe the control intervention in detail.

A very nice graphic is provided (Figure 2) to illustrate the decision framework, and it can be applied to speech therapy trials. There are a number of interventions that have been in use or are emerging in speech therapy practice with a minimal evidence base. We can consider the choice of appropriate control condition for the assessment of these interventions.

Ultrasound intervention for school-aged children with residual speech errors has been examined in quite a number of single-subject studies but is now overdue for a randomized controlled trial. Given that the exploratory work has been completed in single-subject trials, I would say that we could proceed to a phase 3 RCT. The risk to the participant population is more difficult to conceptualize. You could say that it is low because these children are not at particular risk for poor school outcomes or other harmful sequelae of non-intervention, and the likelihood of a good speech outcome will not change much after the age of nine. The cost of providing an active control will be high because these children are often low priority for intervention in the school setting. Therefore, according to Figure 2, a no-treatment control would be appropriate when you make this assumption. On the other hand, you could argue that the participant risk of NOT improving is very high: all the evidence demonstrates that the residual errors do not improve without treatment after this age. If you consider the participant risk to be higher, especially considering community participation and psychosocial factors, then the appropriate control condition would be something more vigorous: patient choice, an active comparator, a nonspecific factors component control, or a specific factors component control. Given the relatively early days of this research, small trials utilizing these control conditions, in that order, might be advisable.

Metaphon as a treatment for four-year-olds with severe phonological delay and associated difficulties with phonological processing has not, to my knowledge, been tested in a large-scale RCT. The population would be high risk by definition, due to the likelihood of experiencing delays in the acquisition of literacy skills if the speech delay is not resolved prior to school entry. Effective treatment options are known to exist. Therefore, the appropriate control condition would be an active comparator, in other words, another treatment that is known to be effective with this population. Another option would be a specific factors component control that examines the efficacy of specific components of the Metaphon approach. For example, the meaningful minimal pairs procedure could be compared directly to the full Metaphon approach, with speech and phonological processing skills as the outcome variables. Similar trials have been conducted by Anne Hesketh and in my own lab (although not involving Metaphon specifically).

PROMPT has still not been tested in good-quality single-subject or parallel-groups research. If a phase 2 trial were planned for three-year-olds with suspected apraxia of speech, treatment as usual would be the appropriate control condition according to Figure 2. The speech condition is too severe to ethically withhold treatment, and the research program is not advanced enough for a specific factors component control, although this would be the next step.

Finally, an RCT of the effectiveness of Speech Buddies to stimulate /s/ in 3-year-olds with speech delay could be implemented. In this case, the participant group would be low risk due to the likelihood of spontaneous resolution of the speech delay. Given a phase 2 trial, either a no-treatment or waitlist control could be implemented.

The authors of this framework conclude by recommending that researchers justify their choice of control condition in every trial protocol. They further recommend that a waitlist control is acceptable only when it is the only ethical choice, and state that “no behavioral treatment should be included in treatment guidelines if it is only supported by trials using a waitlist control group or meta-analytic evidence driven by such trials.” To me, this is eminently sensible advice for speech and language research as well.

And this I believe concludes my trilogy of posts on the control group!

Further Reading

What is a control group? Developmental Phonological Disorders blog post, February 5, 2017

Using effect sizes to choose a speech therapy approach, Developmental Phonological Disorders blog post, January 31, 2017

Gold, S. M., Enck, P., Hasselmann, H., Friede, T., Hegerl, U., Mohr, D. C., & Otte, C. (2017). Control conditions for randomised trials of behavioural interventions in psychiatry: a decision framework. The Lancet Psychiatry. doi:10.1016/S2215-0366(17)30153-0

Hesketh, A., Dima, E., & Nelson, V. (2007). Teaching phoneme awareness to pre-literate children with speech disorder: a randomized controlled trial. International Journal of Language and Communication Disorders, 42(3), 251-271.

Rvachew, S., & Brosseau-Lapré, F. (2015). A Randomized Trial of 12-Week Interventions for the Treatment of Developmental Phonological Disorder in Francophone Children. American Journal of Speech-Language Pathology, 24(4), 637-658. doi:10.1044/2015_AJSLP-14-0056

Who to refer for speech therapy?

Morgan et al. have recently published a very important paper: Who to refer for speech therapy at 4 years of age versus who to “watch and wait”? This longitudinal study reports speech outcomes at age 7 years for children who received GFTA and DEAP assessments at age 4. The children were drawn from an Australian community cohort study (the Early Language in Victoria Study) that recruited almost 2000 infants between 7 and 10 months of age for long-term follow-up.

The data reported in Morgan et al. are interesting in their own right, as follows:

  1. Eleven percent of 1496 children tested at age 4 had speech errors qualifying the child for repeat assessment at age 7 years (the 11% finding interested me because we settled on 11% as the best estimate for prevalence of developmental phonological disorders at school entry in the review that we reported in DPD).
  2. At age 7 years, approximately 40% of the children who had speech errors at age 4 still had speech errors.
  3. Children at age 4 who had speech delay (typical speech errors; 60% of the sample) were most likely to show resolution of the speech problem. Specifically, 70% of these children were classed as “resolved” and 30% as “persistent” at age 7 years.
  4. Children at age 4 who had a speech disorder (atypical speech errors; 40% of the sample) were less likely to show resolution of the speech problem. Specifically, 40% of these children were classed as resolved and 60% as persistent.
  5. None of the other variables in the study (sex, SES, family history, language skills, nonverbal IQ) predicted speech outcome; neither did they predict “delay” versus “disorder” group membership.
  6. Apparently, reliable data on receipt of SLP services and outcomes were not available, but there was some suggestion that children with “speech delay” who received therapy were more likely to resolve than children with “speech disorder” who received therapy.
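These proportions are internally consistent; a quick arithmetic check using the percentages in points 3 and 4 recovers the overall persistence rate reported in point 2:

```python
# Shares of the age-4 sample (points 3 and 4)
delay_share, disorder_share = 0.60, 0.40
# Persistence rates at age 7 within each group
delay_persist, disorder_persist = 0.30, 0.60

# Weighted average: 0.60*0.30 + 0.40*0.60 = 0.18 + 0.24
overall_persist = delay_share * delay_persist + disorder_share * disorder_persist
print(round(overall_persist, 2))  # 0.42, i.e., approximately 40% (point 2)
```

Note also that the persistent cases are split almost evenly between the two groups (18% of the cohort from the "delay" group versus 24% from the "disorder" group), a point that matters for the referral argument below.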

Therefore, in this paper, published in a journal for pediatricians, the conclusion was that “our data call into question whether the ‘watch and wait’ approach should be universally applied to all preschool children. Rather these data suggest an efficient model may guide children with disorder at age 4 years to be fast-tracked for speech therapy…”.

The data provided in this paper are exceptionally important for SLPs and the development of service delivery guidelines, but I am a little uncomfortable with the conclusions that were drawn. The first assumption, I suppose, is that doctors are not referring any 4-year-olds, so if we could get them to refer some, that would help. The second assumption seems to be that the reason we refer 4-year-olds with speech errors to speech therapy is to eliminate the speech errors. This is only partially true. More importantly, we have the goal of preventing the sequelae that are known to be associated with delayed/disordered speech at school entry. These are mostly in the area of literacy but also in the psychosocial domains. It is clear that children who show early speech delays are at risk for persistent literacy difficulties regardless of whether the speech problem resolves by age 7. The important cut-off is resolution of the speech problem before school entry. The risk for literacy difficulties is predicted by direct measures of phonological processing and not by an examination of speech error types. Certain speech error types are associated with phonological processing difficulties and a heightened risk for literacy problems, but they are poor predictors of this risk. I will come back to this point with some case histories below.

The second problem that I have with the conclusions is that they are delivered to pediatricians, who are in no way qualified to differentiate typical from atypical speech errors. In fact, SLPs themselves find this hard enough to do reliably. The difference between speech delay and speech disorder is both qualitative and quantitative; in other words, the dividing line between delay and disorder is a very large grey area. Family doctors should not attempt to make this differentiation. In the paper, Morgan et al. do point out that the real issue is intelligibility. When the child is unintelligible past the age of 3 or 4, the physician should refer to an SLP, who should determine the best course of action. In our review of the literature for SAC, Susan Rafaat and I proposed wait time recommendations for children who are “producing so many speech sound errors that speech intelligibility falls below expectations given the speaker’s age and experience with the language being spoken.” All children in the 4 to 6 year age group were considered by us to be high priority for a rapid assessment by an SLP. Any child with speech intelligibility problems who is expected to start school in the year of referral and/or presenting with phonological processing difficulties would be considered a high priority for immediate intervention.

Now to some case studies that I draw directly from our DPD text (Rvachew & Brosseau-Lapré), showing only portions here to make a point about speech delay, speech disorder, and literacy outcomes. The first example is a clear case of speech disorder (data shown from the age 7;4 assessment).

Complete information is provided in DPD (Case Study 8-4), showing that two years earlier this child also presented with a severe speech disorder and severely delayed phonological processing skills. His error types were atypical and inconsistent throughout the longitudinal follow-up period, despite much speech therapy targeting motor aspects of his speech. At age 7 his nonword reading skills were slightly below normal limits and 14 points below his receptive vocabulary scores. We can predict that he will struggle with the acquisition of reading and spelling in addition to continuing to have highly unintelligible speech for some time. Interestingly, his mother reported that his speech accuracy finally started to improve after a systematic phonics program was instituted to help him with his reading in second grade. The outcomes reported at age 7 will not surprise anyone.

The interesting findings for me were associated with the children with milder speech delay. The second child shown here (age 6;9 assessment) had a mild speech delay at age 4 but a severe delay in phonological processing skills that was, fortunately for him, treated appropriately by the SLP program in the local children’s hospital. At age 7 his speech delay is more-or-less resolved. His nonword reading skills are borderline normal, but there is a 28 point gap between his nonword reading score and his receptive vocabulary score (Case Study 8-1 in DPD). I think that this child is essentially dyslexic. He is coping well because he is exceptionally bright with excellent inputs from his family and the community service providers. That does not mean that the outcome would have been as good without those services, however. The 30% of kids with speech delay who don’t resolve by themselves? Someone has to watch out for those kids, especially since they are numerically the larger group of kids. As an SLP, I make it my job to worry about them.


What is a control group?

I have a feeling that my blog might become less popular in the next little while because you may notice an emerging theme: a shift toward research design and away from speech therapy procedures specifically! But it is important to know how to identify evidence-based procedures, and doing that requires knowledge of research design. It has come to my attention, as part of the process of publishing two randomized controlled trials (RCTs) this past year, that there are a lot of misperceptions about what an RCT is in the SLP and education communities, among both clinicians and researchers. Therefore, I am happy to draw your attention to this terrific blog by Edzard Ernst, and in particular to an especially useful post, “How to differentiate good from bad research”. The writer points out that a proper treatment of this topic “must inevitably have the size of a book” because each of the indicators that he provides “is far too short to make real sense.” So I have taken it upon myself in this blog to expand upon one of his indicators of good research, one that I know causes some confusion, specifically:

  • Use of a placebo in the control group where possible.

Recently the reviewers (and editor) of one of my studies were convinced that my design was not an RCT because the children in both groups received an intervention. In the absence of a “no-treatment control,” they said, the study could not be an RCT! I was mystified about the source of this strange idea until I read Ernst’s blog and realized that many people, recalling their research courses from university, must be mistaking “placebo control” for “no-treatment control.” However, a placebo control condition is not at all like the absence of treatment. Consider the classic example of a placebo control: in a drug trial, each patient randomized to the treatment arm visits the nurse, who hands him or her a white paper cup holding 2 pink pills containing active ingredient X along with other ingredients that do not impact the patient’s disease (i.e., inactive ingredients); each patient randomized to the control arm also visits the nurse, who hands him or her a white paper cup holding 2 pink pills containing only the inactive ingredients. In other words, the experiment is designed so that all patients are “treated” exactly the same except that only patients randomized to treatment receive (unknowingly) the active ingredient. Therefore, all changes in patient behavior that are due to those aspects of the treatment that are not the active treatment (visiting the nice nurse, expecting the pills to make a difference, etc.) are equalized across arms of the study. These are called the “common factors” or “nonspecific factors.”

In the case of a behavioral treatment it is important to equalize the common factors across all arms of the study. Therefore, in my own studies I deliberately avoid “no treatment” controls. In my very first RCT (Rvachew, 1994), for example, the treatment conditions in the two arms of the study were as follows:

  • Experimental: 10 minutes of listening to sheet vs Xsheet recordings and judging correct vs incorrect “sheet” items (active ingredient) in a computer game format followed by 20 minutes of traditional “sh” articulation therapy, provided by a person blind to the computer game target.
  • Control: 10 minutes of listening to Pete vs meat recordings and judging correct vs incorrect “Pete” items in a computer game format followed by 20 minutes of traditional “sh” articulation therapy, provided by a person blind to the computer game target.

It can be seen that the study was designed to ensure that all participants experienced exactly the same treatment except for the active ingredient that was reserved for children randomly assigned to the experimental treatment arm: exposure to the experience of listening to and making perceptual judgments about a variety of correct versions and distorted versions of words beginning with “sh” (the sound that the children misarticulated). Subsequently I have conducted all my randomized controlled studies in a similar manner. But, as I said earlier, I run across readers who vociferously assert that the studies are not RCTs because an RCT requires a “no treatment” control. In fact, a “no treatment” control is a very poor control indeed, as argued in this blog post that explains why the frequently used “wait list control group” is inappropriate. For example, a recent trial on the treatment of tinnitus claimed that a wait list control had merit because “While this comparison condition does not control for all potential placebo effects (e.g., positive expectation, therapeutic contact, the desire to please therapists), the wait-list control does account for the natural passing of time and spontaneous remission.” In fact, it is impossible to control for common factors when using a wait list control, and it is unlikely that patients are actually “just waiting” when you randomize them to the “wait list control” condition; therefore Hesser et al.’s defense of the wait list control is optimistic, although their effort to establish how much change you get in this condition is worthwhile.

We had experience with a “wait list” comparison condition in a recent trial (Rvachew & Brosseau-Lapré, 2015). Most of the children were randomly assigned to one of four different treatment conditions, matched on all factors except the specific active ingredients of interest. However, we also had a nonexperimental wait list comparison group* to estimate change for children outside of the trial. We found that parents were savvy about maximizing the treatment that their children could receive in any given year. Our trial lasted six weeks, the public health system entitled them to six weeks of treatment, and their private insurance entitled them to six to 12 weeks of therapy depending on the plan. Parents would agree to enroll their child in the trial with randomization to a treatment arm if their child was waiting for the public service, OR they would agree to be assessed in the “wait list” arm if their child was currently enrolled in the public service. They would use their private insurance when all other options had been exhausted. Therefore the children in the “wait list” arm were actually being treated. Interestingly, we found that the parents expected their children to obtain better results from the public service because it was provided by a “real” SLP rather than the student SLPs who provided our experimental treatments, even though the public service was considerably less intense! (As an aside, we were not surprised to find that the reverse was true.) Similarly, as I have mentioned in previous blogs, Yoder et al. (2005) found that the children in their “no treatment” control accessed more treatment from other sources than did the children in their treatment arm. And parents randomized to the “watchful waiting” arm of the Glogowska et al. (2000) trial sometimes dropped out, because parents will do what they must to meet their child’s needs.

In closing, a randomized controlled trial is simply a study in which participants are randomly assigned to an experimental treatment and a control condition (even in a cross-over design, in which all participants experience all conditions, as in Rvachew et al., in press). The nature of the control should be determined after careful thought about the factors that you are attempting to control, which can be many: placebo, Hawthorne, fatigue, practice, history, maturation, and so on. These will vary from trial to trial, obviously. Placebo control does not mean “no treatment” but rather a treatment that excludes everything except the “active ingredient” that is the subject of your trial. As an SLP, when you are reading about studies that test the efficacy of a treatment, you need to pay attention to what happens to the control group as well as the treatment group. The trick is to ask in every case: What is the active ingredient that explains the effect seen in the treatment group? What else might account for the effects seen in the treatment arm of this study? If I implement this treatment in my own practice, how likely am I to get a better result compared to the treatment that my caseload is currently receiving?
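The defining ingredient, random assignment, is mechanically simple. Here is a minimal sketch (function and participant names are hypothetical), using permuted blocks so that the arms stay balanced throughout recruitment, as in block-randomized designs:

```python
import random

def block_randomize(participants, arms=("experimental", "control"), seed=None):
    """Assign participants to arms in permuted blocks of size len(arms),
    so that group sizes remain balanced as recruitment proceeds."""
    rng = random.Random(seed)
    assignment = {}
    for i in range(0, len(participants), len(arms)):
        block = list(arms)
        rng.shuffle(block)  # random order of arms within this block
        for person, arm in zip(participants[i:i + len(arms)], block):
            assignment[person] = arm
    return assignment

# With 8 participants and 2 arms, exactly 4 land in each arm.
groups = block_randomize([f"child_{n}" for n in range(8)], seed=1)
```

The point of the sketch is that randomization says nothing about what the control arm receives; that decision (placebo, active comparator, treatment as usual) is the separate design question discussed above.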

* A colleague sent me a paper (Mercer et al., 2007) in which a large number of researchers, advocating for the acceptance of a broader array of research designs in order to focus more attention on external validity and translational research, got together to discuss the merits of various designs. During the symposium it emerged that there was disagreement about the use of the terms “control” and “comparison” group. I use the terms in accordance with a minority of the attendees, as follows: “control group” means that the participants were randomly assigned to a group that did not experience the “active ingredient” of the experimental treatment; “comparison group” means that the participants were not randomly assigned to the group that did not experience the experimental intervention, a group that may or may not have received a treatment. This definition was ultimately not adopted by the attendees; I don’t know why. Somehow they decided on a different definition that didn’t make any sense at all; I invite you to consult p. 141 and see if you can figure it out!

References

Glogowska, M., Roulstone, S., Enderby, P., & Peters, T. (2000). Randomised controlled trial of community based speech and language therapy in preschool children. British Medical Journal, 321, 923-928.

Hesser, H., Weise, C., Rief, W., & Andersson, G. (2011). The effect of waiting: A meta-analysis of wait-list control groups in trials for tinnitus distress. Journal of Psychosomatic Research, 70(4), 378-384. doi:http://dx.doi.org/10.1016/j.jpsychores.2010.12.006

Mercer, S. L., DeVinney, B. J., Fine, L. J., Green, L. W., & Dougherty, D. (2007). Study Designs for Effectiveness and Translation Research: Identifying Trade-offs. American Journal of Preventive Medicine, 33(2), 139-154.e132. doi:http://dx.doi.org/10.1016/j.amepre.2007.04.005

Rvachew, S. (1994). Speech perception training can facilitate sound production learning. Journal of Speech and Hearing Research, 37, 347-357.

Rvachew, S., & Brosseau-Lapré, F. (2015). A randomized trial of twelve week interventions for the treatment of developmental phonological disorder in francophone children. American Journal of Speech-Language Pathology, 24, 637-658. doi:10.1044/2015_AJSLP-14-0056

Rvachew, S., Rees, K., Carolan, E., & Nadig, A. (in press). Improving emergent literacy with school-based shared reading: Paper versus ebooks. International Journal of Child-Computer Interaction. doi:http://dx.doi.org/10.1016/j.ijcci.2017.01.002

Yoder, P. J., Camarata, S., & Gardner, E. (2005). Treatment effects on speech intelligibility and length of utterance in children with specific language and intelligibility impairments. Journal of Early Intervention, 28(1), 34-49.