How effective is phonology treatment?

Previously I asked whether it made sense to calculate effect sizes for phonology therapy at the within-subject level. In other words, from the clinical point of view, do we really want to know whether the child’s rate of change is bigger during treatment than it was when the child was not being treated? Or do we want to know if the child’s rate of change is bigger than the average amount of change observed among groups of children who get treated? If children who get treated typically change quite a bit and your client is not changing much at all, that might indicate the need for a course correction (and note please, not a treatment rest!). From this perspective, group level effect sizes might be useful, so I am providing raw and standardized effect sizes here from three of my past studies, with a discussion to follow.

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.

The first data set involves 48 four-year-old children who scored at the second percentile, on average, on the GFTA (and 61 percent consonants correct in conversation). They were randomly assigned to receive treatment for relatively early developing stimulable sound targets (ME group, n=24) or late developing unstimulable sound targets (LL group, n=24). Each child received treatment for four sounds over 2 six-week blocks, during 12 treatment sessions of 30 to 40 minutes each. The treatment approach employed traditional articulation therapy procedures. The children did not receive homework or additional speech and language interventions during this 12 week period. Outcome measures included single word naming probes covering all consonants in 3 word positions and percent consonants correct (PCC) in conversation, with 12 to 14 weeks intervening between the pre- and the post-test assessments. The table below shows two kinds of effect sizes for the ME group and the LL group: the raw effect size (raw ES) with the associated confidence interval (CI), which indicates the mean pre- to post-change in percent consonants correct on probes and in conversation; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for three outcome measures: single word naming probe scores for unstimulable phonemes, probe scores for stimulable phonemes, and percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor.

Effect size blog figure 2

Some initial conclusions can be drawn from this table. The effect sizes for change in probe scores are all large. However, the group that received treatment for stimulable sounds showed greater improvement for both treated stimulable sounds and untreated unstimulable sounds compared to the group that received treatment for unstimulable sounds. There was almost no change in PCC derived from the conversational samples overall. I can report that 10 children in the ME group and 6 children in the LL group achieved improvements of greater than 5 PCC points, judged to be a “minimally important change” by Thomas-Stonell et al. (2013). However, half the children achieved no change at all in PCC (conversation).
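
For readers who want to run the same kind of summary on their own caseload data, here is a minimal sketch of the three quantities reported in the table: the raw effect size with its confidence interval, Cohen’s d(z), and the count of children with zero or negative change. The function name, the scores, and the critical t value are my own illustrative assumptions, not the code or data from the study. d(z) is taken here as the mean change divided by the standard deviation of the change scores.

```python
import math
from statistics import mean, stdev


def paired_effect_sizes(pre, post, t_crit):
    """Summarize pre-to-post change for one treatment group.

    pre, post: percent-correct scores for the same children, in the same order.
    t_crit: two-sided critical t value for n - 1 degrees of freedom,
            looked up by the caller (e.g. 2.571 for a 95% CI with n = 6).
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched pre and post scores for at least 2 children")
    changes = [after - before for before, after in zip(pre, post)]
    n = len(changes)
    raw_es = mean(changes)        # mean change, in percentage points
    sd_change = stdev(changes)    # sample SD of the change scores
    if sd_change == 0:
        raise ValueError("d(z) is undefined when every child changes by the same amount")
    half_width = t_crit * sd_change / math.sqrt(n)
    no_change = sum(1 for c in changes if c <= 0)  # 0 and negative change scores
    return {
        "raw_es": raw_es,
        "ci": (raw_es - half_width, raw_es + half_width),
        "d_z": raw_es / sd_change,
        "no_change_n": no_change,
        "no_change_pct": 100.0 * no_change / n,
    }


# Hypothetical scores for six children (not data from any of the studies).
pre_scores = [50, 55, 60, 62, 58, 65]
post_scores = [58, 55, 70, 60, 66, 75]
summary = paired_effect_sizes(pre_scores, post_scores, t_crit=2.571)
print(summary)
```

Passing the critical t value in by hand keeps the sketch free of external libraries; with a statistics package available you would look it up from the t distribution instead.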

Rvachew, S., Nowak, M., & Cloutier, G. (2004). Effect of phonemic perception training on the speech production and phonological awareness skills of children with expressive phonological delay. American Journal of Speech-Language Pathology, 13, 250-263.

The second data set involves 34 four-year-old children who scored at the second percentile, on average, on the GFTA (and approximately 60 percent consonants correct in conversation). All of the children received 16 hour-long speech therapy sessions, once-weekly. The treatment that they received was entirely determined by their SLP with regard to target selection and approach to intervention. Ten SLPs provided the interventions, 3 using the Hodson cycles approach, 1 a sensory motor approach and the remainder using a traditional articulation therapy approach. The RCT element of this study is that the children were randomly assigned to an extra treatment procedure that occurred during the final 15 minutes of each session, concealed from their SLP. Children in the control group (n=17) listened to ebooks and answered questions. Children randomly assigned to the PA group (n=17) played a computer game that targeted phonemic perception and phonological awareness covering 8 phonemes in word initial and then word final position. Although the intervention lasted 4 months, the interval between pre-treatment and post-treatment assessments was 6 months long. The table below shows two kinds of effect sizes for the control group and the PA group: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor; and PCC-difficult, derived from the same conversations but restricted to phonemes that were produced with less than 60% accuracy at intake; in other words, phonemes that were potential treatment targets, specifically /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/.

Effect size blog figure 3

The sobering finding here is that the control group effect size for potential treatment targets is the smallest, with half the group making no change and the other half making a small change. The effect size for PCC (all) in the control group is more satisfying in that it is better than the minimally important change (i.e., 8% > 5%); 13 children in this group achieved a change of more than 5 points and only 3 made no change at all. The effect sizes are large in the group that received the Speech Perception/PA intervention in addition to their regular SLP program with good results for PCC (all) and PCC-difficult. This table shows that the SLP’s choice of treatment procedures makes a difference to speech accuracy outcomes.
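
To make the difference between PCC (all) and PCC-difficult concrete, here is a small sketch of how the two measures could be computed from per-phoneme tallies. The phoneme list is the English one given above; the function names, the dictionary layout, and the tallies are hypothetical and are only meant to illustrate the restriction to potential treatment targets.

```python
# Potential treatment targets for the English-speaking samples, as listed in the text.
ENGLISH_DIFFICULT = {"ŋ", "k", "ɡ", "v", "ʃ", "ʧ", "ʤ", "θ", "ð", "s", "z", "l", "ɹ"}


def pcc(counts):
    """Percent consonants correct.

    counts maps each consonant to a (correct, attempted) pair.
    """
    correct = sum(c for c, _ in counts.values())
    attempted = sum(a for _, a in counts.values())
    if attempted == 0:
        raise ValueError("no consonants were attempted")
    return 100.0 * correct / attempted


def pcc_difficult(counts, targets=ENGLISH_DIFFICULT):
    """PCC restricted to the consonants in `targets`."""
    subset = {p: tally for p, tally in counts.items() if p in targets}
    return pcc(subset)


# Hypothetical tallies from one conversational sample (not study data).
sample = {"m": (10, 10), "s": (2, 10), "k": (3, 10), "l": (5, 10)}
print(pcc(sample))            # all consonants
print(pcc_difficult(sample))  # potential treatment targets only
```

In this made-up sample the early developing /m/ props up the overall PCC, which is why a measure restricted to the difficult phonemes can move even when PCC (all) looks flat.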

Rvachew, S., & Brosseau-Lapré, F. (2015). A randomized trial of twelve week interventions for the treatment of developmental phonological disorder in francophone children. American Journal of Speech-Language Pathology, 24, 637-658. doi:10.1044/2015_AJSLP-14-0056

The third data set involves data from 64 French-speaking four-year-olds who were randomly assigned to receive either an output-oriented intervention (n = 30) or an input-oriented intervention (n = 34) for remediation of their speech sound disorder. Another 10 children who were not treated also provide effect size data here. The children obtained PCC scores of approximately 70% on the Test Francophone de Phonologie, indicating severe speech sound disorder (consonant accuracy is typically higher in French-speaking children, compared to English). The children received other interventions as well, as described in the research report (home programs and group phonological awareness therapy), with the complete treatment program lasting 12 weeks. The table below shows two kinds of effect sizes for the output-oriented group, the input-oriented group, and the untreated group: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct with glides excluded (PCC), obtained from the Test Francophone de Phonologie, a single word naming test; and PCC-difficult, derived from the same test but restricted to phonemes that were produced with less than 60% accuracy at intake, specifically /ʃ,ʒ,l,ʁ/. An outcome measure restricted to phonemes that were absent from the inventory at intake is not possible for this group because French-speaking children with speech sound disorders have good phonetic repertoires for the most part, as their speech errors tend to involve syllable structure (see Brosseau-Lapré and Rvachew, 2014).

Effect size blog figure 4

There are two satisfying findings here: first, when we do not treat children with a speech sound disorder, they do not change, and when we do treat them, they do! Second, when children receive an appropriate suite of treatment elements, large changes in PCC can be observed even over an observation interval as short as 12 weeks.

Overall Conclusions

  1. In the introductory blog to this series, I pointed out that Thomas-Stonell and her colleagues had identified a PCC change of 5 points as a “minimally important change”. The data presented here suggest that this goal can be met for most children over a 3 to 6 month period when children are receiving an appropriate intervention. The only case where this minimum standard was not met on average was in Rvachew & Nowak (2001), a study in which a strictly traditional articulation therapy approach was implemented at low intensity with no homework component.
  2. The measure that we are calling PCC-difficult might be more sensitive and more ecologically valid for 3 and 6 month intervals. This is percent consonants correct, restricted to potential treatment targets, that is, those consonants that are produced with less than 60% accuracy at intake. These turn out to be mid- to late-developing frequently misarticulated phonemes, therefore /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/ in English and /ʃ,ʒ,l,ʁ/ in French for these samples of 4-year-old children with severe and moderate-to-severe primary speech sound disorders. My impression is that when providing an appropriate intervention an SLP should expect at least a 10% change in these phonemes whether assessed with a broad based single word naming probe or in conversation; in fact, a 15% change is closer to the average. This does not mean that you should treat the most difficult sounds first! Look carefully at the effect size data from Rvachew and Nowak (2001): when we treated stimulable phonemes we observed a 15% improvement in difficult unstimulable sounds. You can always treat a variety of phonemes from different levels of the phonological hierarchy as described in a previous blog.
  3. Approximately 10% of 4-year-old children with severe and moderate-to-severe primary speech sound disorders do not improve at all over a 3 to 6 month period, given adequate speech therapy. If a child is not improving, the SLP and the parent should be aware that this is a rare event that requires special attention.
  4. In a previous blog I cited some research evidence for the conclusion that patients treated as part of research trials achieve better outcomes than patients treated in a usual care situation. There is some evidence for that in these data. The group in Rvachew, Nowak and Cloutier that received usual care obtained a lower effect size (d=0.45) in comparison to the group that received an extra experimental intervention (d=1.31). In practical terms this difference meant that the group that received the experimental intervention made four times more improvement in the production of difficult sounds than the control group that received usual care.
  5. The variation in effect sizes that is shown in these data indicates that SLP decisions about treatment procedures and service delivery options have implications for success in therapy. What are the characteristics of the interventions that led to relatively large changes in PCC or relatively large standardized effect sizes? (i) Comprehensiveness, that is the inclusion of intervention procedures that target more than one level of representation, e.g., procedures to improve articulation accuracy and speech perception skills and/or phonological awareness; and (ii) parent involvement, specifically the inclusion of a well-structured and supported home program.
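
The benchmarks in points 1 to 3 can be turned into a quick check on a single client’s change scores. This is only a sketch: the constant names and the function are hypothetical, the “10%” and “15%” figures are read here as percentage points, and treating each threshold as inclusive is my assumption. The numbers themselves are impressions from three studies, not validated cut-offs.

```python
# Benchmarks taken from the conclusions above (read as percentage points).
MIN_IMPORTANT_PCC_CHANGE = 5.0        # "minimally important change" in PCC
MIN_EXPECTED_DIFFICULT_CHANGE = 10.0  # least change to expect in PCC-difficult
TYPICAL_DIFFICULT_CHANGE = 15.0       # closer to the average change in PCC-difficult


def review_progress(pcc_change, pcc_difficult_change):
    """Compare one client's 3 to 6 month change scores with the group benchmarks."""
    return {
        "meets_minimally_important_change": pcc_change >= MIN_IMPORTANT_PCC_CHANGE,
        "meets_expected_difficult_change": pcc_difficult_change >= MIN_EXPECTED_DIFFICULT_CHANGE,
        "at_or_above_typical_difficult_change": pcc_difficult_change >= TYPICAL_DIFFICULT_CHANGE,
        # No improvement on either measure is the rare case that needs special attention.
        "no_improvement": pcc_change <= 0 and pcc_difficult_change <= 0,
    }


# Hypothetical client: PCC up 8 points, PCC-difficult up 4 points.
print(review_progress(8.0, 4.0))
```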

If you see other messages in these data, or have observations from your own practice or research, please write to me in the comments.




Are effect sizes in research papers useful in SLP practice?

Effect size blog figure 1

Effect sizes are now required in addition to statistical significance reporting in scientific reports. As discussed in a previous blog, effect sizes are useful for research purposes because they can be aggregated across studies to draw conclusions (i.e., as in a meta-analysis). However, they are also intended to be useful as an indication of the “practical consequences of the findings for daily life.” Therefore, Gierut, Morrisette, & Dickinson’s paper “Effect Size for Single-Subject Design in Phonological Treatment” was of considerable interest to me when it was published in 2015. They report the distribution of effect sizes for 135 multiple baseline studies using a pooled standard deviation for the baseline phase of the studies as the denominator and the mean of the treatment phase minus the mean of the baseline phase as the numerator in the equation to calculate the effect size statistic. In these studies, the mean and the variance of probe scores in the baseline phase are restricted to be very small by design, because the treatment targets and generalization probe targets must show close to stable 0% correct performance during the baseline phase. The consequence of this restriction is that the effect size number will be very large even when the raw amount of performance change is not so great. Therefore the figure above shows hypothetical data that yield exactly their average effect size of 3.66 (specifically, [8.57%-1.25%]/.02 = 3.66). This effect size is termed a medium effect size in their paper but I leave it to the reader to decide if a change of not quite 9% accuracy in speech sound production is an acceptable level of change. It may be because in these studies, a treatment effect is operationalized as probe scores (single word naming task) for all the phonemes that were absent from the child’s repertoire at intake.
From the research point of view this paper provides very important information: it permits researchers to compare effect sizes and explore variables that account for between-case differences in effect sizes in those cases where the researchers have used a multiple baseline design and treatment intensities similar to those reported in this paper (5 to 19 one-hour sessions typically delivered 3 times per week).
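
The arithmetic behind the hypothetical figure can be checked with a few lines of code. This is a sketch of the calculation as I have described it (treatment-phase mean minus baseline-phase mean, divided by the pooled baseline standard deviation); the function name is mine, and I am assuming the scores are entered as proportions so that the denominator of .02 corresponds to 2 percentage points.

```python
def single_subject_effect_size(baseline_mean, treatment_mean, pooled_baseline_sd):
    """Treatment-phase mean minus baseline-phase mean, over the pooled baseline SD."""
    if pooled_baseline_sd <= 0:
        raise ValueError("the pooled baseline SD must be positive")
    return (treatment_mean - baseline_mean) / pooled_baseline_sd


# Hypothetical values from the figure, entered as proportions correct.
es = single_subject_effect_size(
    baseline_mean=0.0125,      # 1.25% correct during baseline
    treatment_mean=0.0857,     # 8.57% correct during treatment
    pooled_baseline_sd=0.02,
)
raw_change_points = (0.0857 - 0.0125) * 100  # raw change in percentage points

print(round(es, 2))                 # standardized effect size
print(round(raw_change_points, 2))  # raw change
```

Setting the two printed numbers side by side makes the point of the paragraph above: the standardized value looks substantial only because the baseline standard deviation in the denominator is so small.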

The question I am asking myself is whether the distribution of effect sizes that is reported in this paper is helpful to clinicians who are concerned with the practical significance of these studies. I ask this because I am starting to see manuscripts reporting clinical case studies in which the data are used to claim “large treatment effects” for a single case (using Gierut et al’s standard of an effect size of 6.32 or greater). Indeed, in the clinical setting SLPs will be asked to consider whether their clients are making “enough” progress. For example, in Rvachew and Nowak (2001) we asked parents to rate their agreement with the statement “My child’s communication skills are improving as fast as can be expected.” (This question was on our standard patient satisfaction questionnaire so in fact, we asked every parent this question, not just the ones in this RCT). But the parent responses in the RCT showed that there were significant between group differences in response to this question that aligned with the dramatic differences in child response to the traditional versus complexity approach to target selection that was tested in that study (e.g., 34% vs. 17% of targets mastered in these groups respectively). It seems to me that when a parent asks themselves this question they have multiple frames of reference: not only do they consider the child’s communicative competence before and after the introduction of therapy, they consider whether their child would make more or less change with other hypothetical SLPs and other treatment approaches, given that parents actually have choices about these things. Therefore, an effect size that says effectively, the child made more progress with treatment compared to no treatment is not really answering the parent’s question. However, with a group design it is possible to calculate an effect size that reflects change relative to the average amount of change one might expect, given therapy. 
To my mind this kind of effect size comes closer to answering the questions about practical significance that a parent or employer might ask.

This still leaves us with the question of what kind of change to describe. It is unfortunate that there are few if any controlled studies that have reported functional measures. I can think of some examples of descriptive studies that reported functional measures however. First, Campbell (1999) reported that good functional outcomes were achieved when preschoolers with moderate and severe Speech Delay received twice-weekly therapy over a 90- to 120-day period (i.e., on average the children’s speech intelligibility improved from approximately 50% to 75% intelligible as reported by parents). Second, there are a number of studies reporting ASHA-NOMS (functional communication measures provided by treating SLPs) for children receiving speech and language therapy. However, Thomas-Stonell et al (2007) found that improvement on the ASHA-NOMS was not as sensitive as parental reports of “real life communication change” over a 3 to 6 month interval. Therefore, Thomas-Stonell and her colleagues developed the FOCUS to document parental reports of functional outcomes in a reliable and standardized manner.

Thomas-Stonell et al (2013) report changes in FOCUS scores for 97 preschool aged children who received an average of 9 hours of SLP service in Canada, comparing change during the waiting period (60 day interval) to change during the treatment period (90 day interval). FOCUS assessments demonstrated significantly more change during treatment (about 18 FOCUS points on average) than during the wait period (about 6 FOCUS points on average). Then they compared minimally important changes in PCC, the Children’s Speech Intelligibility Measure, and FOCUS scores for 28 preschool aged children. The FOCUS measure was significantly correlated with the speech accuracy and intelligibility measures but there was not perfect agreement among these measures. For example, 21/28 children obtained a minimally important change of at least 16 points on the FOCUS but 4 of those children did not show significant change on PCC/CSIM. In other words speech accuracy, speech intelligibility and functional improvements are related but not completely aligned; each provides independent information about change over time.

In controlled studies, some version of percent consonants correct is a very common treatment outcome that is used to assess the efficacy of phonology therapy. Gierut et al (2015) focused specifically on change in those phonemes that are late developing and produced with very low accuracy, if not completely absent from the child’s repertoire at intake. This strikes me as a defensible measure of treatment outcome. Regardless of whether one chooses to treat a complex sound, an early developing sound, a medium-difficulty sound (or one of each as I demonstrated in a previous blog), presumably the SLP wants to have dramatic effects across the child’s phonological system. Evidence that the child is adding new sounds to the repertoire is a good indicator of that kind of change. Alternatively, the SLP might count increases in correct use of all consonants that were potential treatment targets prior to the onset of treatment. Or, the SLP could count percent consonants correct for all the consonants because this measure is associated with intelligibility and takes into account the fact that there can be regressions in previously mastered sounds when phonological reorganization is occurring. The number of choices suggests that it would be valuable to have effect size data for a number of possible indicators of change. More to the point, Gierut et al’s single subject effect size implies that almost any change above “no change” is an acceptable level of change in a population that receives intervention because they are stalled without it. I am curious to know if this is a reasonable position to take. In my next blog post I will report effect sizes for these speech accuracy measures taken from my own studies going back to 2001. I will also discuss the clinical significance of the effect sizes that I will aggregate.
I am going to calculate the effect size for paired mean differences along with the corresponding confidence intervals for groups of preschoolers treated in three different studies. I haven’t done the calculations yet, so, for those readers who are at all interested in this, you can hold your breath with me.


Campbell, T. F. (1999). Functional treatment outcomes in young children with motor speech disorders. In A. Caruso & E. A. Strand (Eds.), Clinical Management of Motor Speech Disorders in Children (pp. 385-395). New York: Thieme Medical Publishers, Inc.

Gierut, J. A., Morrisette, M. L., & Dickinson, S. L. (2015). Effect Size for Single-Subject Design in Phonological Treatment. Journal of Speech, Language, and Hearing Research, 58(5), 1464-1481. doi:10.1044/2015_JSLHR-S-14-0299

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 1-12. doi:10.3389/fpsyg.2013.00863

Thomas-Stonell, N., McConney-Ellis, S., Oddson, B., Robertson, B., & Rosenbaum, P. (2007). An evaluation of the responsiveness of the pre-kindergarten ASHA NOMS. Canadian Journal of Speech-Language Pathology and Audiology, 31(2), 74-82.

Thomas-Stonell, N., Oddson, B., Robertson, B., & Rosenbaum, P. (2013). Validation of the Focus on the Outcomes of Communication under Six outcome measure. Developmental Medicine and Child Neurology, 55(6), 546-552. doi:10.1111/dmcn.12123

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.




Research Engagement with SLPs

I still have days when I miss my former job as a research coordinator in a hospital speech-language department. As a faculty researcher, I try to embed my research in clinical settings as often as I can but it is not easy. Administrators, in particular, and speech-language pathologists on occasion may be leery of the time requirement and often worry that the project might shine too bright a light on everyday clinical practices that may not be up to the highest evidence based standard. I always try to design projects that are mutually beneficial to the research team and the clinical setting. As a potential support to the promise of mutual benefit, I was pleased to read a recent paper in the British Medical Journal, “Does the engagement of clinicians and organizations in research improve healthcare performance: a three-stage review”. On the basis of an hour-glass shaped review, using an interpretive synthesis of the literature on the topic, Boaz, Hanney, Jones, and Soper drew the following conclusions:

Some papers reported an association between hospital participation in research and improved patient outcomes. Some of these findings were quite striking as for example significantly worse survival from ovarian cancer in “non study hospitals” versus hospitals involved in research trials (my sister-in-law died from this terrible disease this month so I couldn’t help but notice this).

A majority of papers reported an association between hospital participation in research and improved processes of healthcare. This includes the adoption of innovative treatments as well as better compliance with best practice guidelines.

Different causal mechanisms may account for these findings when examining impacts at the clinician versus organization level. For example, involvement in a clinical trial may include staff training and other experiences that change clinician attitudes and behaviors. Higher up, participation in the trial may require the organization to acquire new infrastructure or adopt new policies.

The direction of cause and effect may be difficult to discern. Specifically, a hospital that is open to involvement in research may have a higher proportion of research-active staff who have unique skills, specialization or personal characteristics. These characteristics may jointly improve healthcare outcomes in that setting and make those staff more amenable to engagement with research.

This last point resonates well with my experience at the Alberta Children’s Hospital in the 80’s and 90’s. The hospital had a very large SLP department, up to 30 SLPs, permitting considerable specialization among us. Furthermore, as a teaching hospital we had a good network of linkages to the two universities in the province and to a broad array of referral sources. Our working model, which was based on multidisciplinary teams, also supported involvement in research. Currently, in Montreal I am able to set up research clinics in healthcare and educational settings from time to time, but none of them have the resources that we enjoyed in Alberta three decades ago.

Of course, direct involvement in research is not the only way for SLPs to engage with research evidence. Another paper, published in Research in Developmental Disabilities, used a survey to explore “Knowledge acquisition and research evidence in autism.” Carrington et al found that researchers and practitioners had somewhat different perspectives. The researcher group (n=256) and the practitioner group (n=422) identified sources of information that they used to stay up to date with current information on autism. Researchers were more likely to identify scientific journals and their colleagues whereas practitioners were more likely to identify conferences/PD workshops and non-academic journals. Respondents also identified sources of information that they thought would help practitioners translate research to practice. Researchers thought that nontechnical summaries and interactions with researchers would be most helpful. Practitioners identified academic journals as the best source of information (but the paper doesn’t explain why they were not using these journals as their primary source).

Finally, the most interesting finding for me was that both groups did not use or suggest social media as a helpful source of information. I thought this was odd because social media is a potential access point to academic journal articles or summaries of those articles as well as a way of interacting directly with scientists.

The authors concluded that knowledge translation requires that practitioners be engaged with research and researchers. For that to happen they suggest that “research should focus on priority areas that meet the needs of the research-user community” and that “attempts to bridge the research-practice gap need to involve greater collaboration between autism researchers and research-users.”

Given that the research shows that the involvement of practitioners in research actually improves care and outcomes for our clients and patients, I would say that it is past time to bring down barriers to researcher-SLP collaboration and bring research right into the clinical setting.