How effective is phonology treatment?

Previously I asked whether it made sense to calculate effect sizes for phonology therapy at the within subject level. In other words, from the clinical point of view, do we really want to know whether the child’s rate of change is bigger during treatment than it was when the child was not being treated? Or, do we want to know if the child’s rate of change is bigger than the average amount of change observed among groups of children who get treated? If children who get treated typically change quite a bit and your client is not changing much at all, that might indicate a course correction (and note please, not a treatment rest!). From this perspective, group level effect sizes might be useful so I am providing raw and standardized effect sizes here from three of my past studies with a discussion to follow.

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.

The first data set involves 48 four-year-old children who scored at the second percentile, on average, on the GFTA (and 61 percent consonants correct in conversation). They were randomly assigned to receive treatment for relatively early developing stimulable sound targets (ME group, n=24) or late developing unstimulable sound targets (LL group, n=24). Each child received treatment for four sounds over two six-week blocks, during 12 treatment sessions of 30 to 40 minutes each. The treatment approach employed traditional articulation therapy procedures. The children did not receive homework or additional speech and language interventions during this 12 week period. Outcome measures included single word naming probes covering all consonants in 3 word positions and percent consonants correct (PCC) in conversation, with 12 to 14 weeks intervening between the pre- and the post-test assessments. The table below shows two kinds of effect sizes for the ME group and the LL group: the raw effect size (raw ES) with the associated confidence interval (CI), which indicates the mean pre- to post-change in percent consonants correct on probes and in conversation; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for three outcome measures: single word naming probe scores for unstimulable phonemes, probe scores for stimulable phonemes, and percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor.

Effect size blog figure 2

Some initial conclusions can be drawn from this table. The effect sizes for change in probe scores are all large. However, the group that received treatment for stimulable sounds showed greater improvement for both treated stimulable sounds and untreated unstimulable sounds compared to the group that received treatment for unstimulable sounds. There was almost no change in PCC derived from the conversational samples overall. I can report that 10 children in the ME group and 6 children in the LL group achieved improvements of greater than 5 PCC points, judged to be a “minimally important change” by Thomas-Stonell et al. (2013). However, half the children achieved no change at all in PCC (conversation).
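
For readers who want to run the same kind of summary on their own pre- and post-treatment scores, here is a minimal sketch in Python. It assumes that Cohen’s d(z) is computed in the conventional way for paired data, as the mean change divided by the standard deviation of the change scores (see Lakens, 2013, in the reference list of the earlier post), and that the confidence interval is the usual t-based interval for a mean change. The function name, the argument names, and the five scores are hypothetical illustrations, not data or code from the study.

```python
import math
import statistics


def paired_effect_sizes(pre, post, t_crit):
    """Summarize pre- to post-treatment change for one group of children.

    pre, post: scores (e.g., percent correct) for the same children, same order.
    t_crit: two-sided critical t value for n - 1 degrees of freedom, taken
            from a t table (about 2.069 for a 95% CI when n = 24).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per child")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)   # raw effect size (mean change)
    sd_diff = statistics.stdev(diffs)    # sample SD of the change scores
    half_width = t_crit * sd_diff / math.sqrt(n)
    no_change = sum(1 for d in diffs if d <= 0)  # zero or negative change
    return {
        "raw_es": mean_diff,
        "ci": (mean_diff - half_width, mean_diff + half_width),
        "d_z": mean_diff / sd_diff,      # standardized mean difference
        "n_no_change": no_change,
        "pct_no_change": 100 * no_change / n,
    }


# Hypothetical scores for five children (not data from the study).
pre = [10, 20, 30, 40, 50]
post = [20, 25, 30, 55, 60]
# 2.776 is the two-sided 95% critical t value for 4 degrees of freedom.
summary = paired_effect_sizes(pre, post, t_crit=2.776)
print(summary)
```

The critical t value is passed in rather than computed so that the sketch needs only the standard library; with a statistics package available it could be looked up from the degrees of freedom instead.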

Rvachew, S., Nowak, M., & Cloutier, G. (2004). Effect of phonemic perception training on the speech production and phonological awareness skills of children with expressive phonological delay. American Journal of Speech-Language Pathology, 13, 250-263.

The second data set involves 34 four-year-old children who scored at the second percentile, on average, on the GFTA (and approximately 60 percent consonants correct in conversation). All of the children received 16 hour-long speech therapy sessions, once-weekly. The treatment that they received was entirely determined by their SLP with regard to target selection and approach to intervention. Ten SLPs provided the interventions, 3 using the Hodson cycles approach, 1 a sensory motor approach and the remainder using a traditional articulation therapy approach. The RCT element of this study is that the children were randomly assigned to an extra treatment procedure that occurred during the final 15 minutes of each session, concealed from their SLP. Children in the control group (n=17) listened to ebooks and answered questions. Children randomly assigned to the PA group (n=17) played a computer game that targeted phonemic perception and phonological awareness covering 8 phonemes in word initial and then word final position. Although the intervention lasted 4 months, the interval between pre-treatment and post-treatment assessments was 6 months long. The table below shows two kinds of effect sizes for the control group and the PA group: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor; and PCC-difficult, derived from the same conversations but restricted to phonemes that were produced with less than 60% accuracy at intake; in other words, phonemes that were potential treatment targets, specifically /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/.

Effect size blog figure 3

The sobering finding here is that the control group effect size for potential treatment targets is the smallest, with half the group making no change and the other half making a small change. The effect size for PCC (all) in the control group is more satisfying in that it is better than the minimally important change (i.e., 8% > 5%); 13 children in this group achieved a change of more than 5 points and only 3 made no change at all. The effect sizes are large in the group that received the Speech Perception/PA intervention in addition to their regular SLP program with good results for PCC (all) and PCC-difficult. This table shows that the SLP’s choice of treatment procedures makes a difference to speech accuracy outcomes.
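
To make the difference between PCC (all) and PCC-difficult concrete, here is a small sketch in Python. It assumes that each sample has been tallied as counts of correct and attempted productions per consonant, and it applies the 60% intake threshold described above. The function names and the counts are hypothetical; in particular, the sketch derives the difficult set from one child’s intake counts, whereas a fixed set defined for the whole sample could be passed in instead.

```python
def pcc(counts, phonemes=None):
    """Percent consonants correct over the given phonemes.

    counts maps each consonant to (number correct, number attempted).
    If phonemes is None, every consonant in counts is included.
    """
    if phonemes is None:
        phonemes = counts.keys()
    correct = sum(counts[p][0] for p in phonemes)
    attempted = sum(counts[p][1] for p in phonemes)
    return 100 * correct / attempted


def difficult_phonemes(intake_counts, threshold=60):
    """Consonants produced with less than `threshold` percent accuracy at intake."""
    return {
        p
        for p, (correct, attempted) in intake_counts.items()
        if attempted > 0 and 100 * correct / attempted < threshold
    }


# Hypothetical tallies for one child (not data from the study).
intake = {"m": (9, 10), "s": (2, 10), "k": (5, 10)}
post = {"m": (10, 10), "s": (5, 10), "k": (7, 10)}

targets = difficult_phonemes(intake)  # "s" and "k" fall below 60% here
change_all = pcc(post) - pcc(intake)
change_difficult = pcc(post, targets) - pcc(intake, targets)
print(sorted(targets), round(change_all, 1), round(change_difficult, 1))
```

In this made-up case the change in PCC-difficult is larger than the change in PCC (all) because the already accurate consonant has little room to improve, which is the sense in which the restricted measure may be more sensitive.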

Rvachew, S., & Brosseau-Lapré, F. (2015). A randomized trial of twelve week interventions for the treatment of developmental phonological disorder in francophone children. American Journal of Speech-Language Pathology, 24, 637-658. doi:10.1044/2015_AJSLP-14-0056

The third data set involves data from 64 French-speaking four-year-olds who were randomly assigned to receive either an output-oriented intervention (n = 30) or an input-oriented intervention (n = 34) for remediation of their speech sound disorder. Another 10 children who were not treated also provide effect size data here. The children obtained PCC scores of approximately 70% on the Test Francophone de Phonologie, indicating severe speech sound disorder (consonant accuracy is typically higher in French-speaking children, compared to English-speaking children). The children received other interventions as well, as described in the research report (home programs and group phonological awareness therapy), with the complete treatment program lasting 12 weeks. The table below shows two kinds of effect sizes for the two treatment groups and the untreated group: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct with glides excluded (PCC), obtained from the Test Francophone de Phonologie, a single word naming test; and PCC-difficult, derived from the same test but restricted to phonemes that were produced with less than 60% accuracy at intake, specifically /ʃ,ʒ,l,ʁ/. An outcome measure restricted to phonemes that were absent from the inventory at intake is not possible for this group because French-speaking children with speech sound disorders have good phonetic repertoires for the most part, as their speech errors tend to involve syllable structure (see Brosseau-Lapré and Rvachew, 2014).

Effect size blog figure 4

There are two satisfying findings here: first, when we do not treat children with a speech sound disorder, they do not change, and when we do treat them, they do! Second, when children receive an appropriate suite of treatment elements, large changes in PCC can be observed even over an observation interval as short as 12 weeks.

Overall Conclusions

  1. In the introductory blog to this series, I pointed out that Thomas-Stonell and her colleagues had identified a PCC change of 5 points as a “minimally important change”. The data presented here suggest that this goal can be met for most children over a 3 to 6 month period when children are receiving an appropriate intervention. The only case where this minimum standard was not met on average was in Rvachew & Nowak (2001), a study in which a strictly traditional articulation therapy approach was implemented at low intensity with no homework component.
  2. The measure that we are calling PCC-difficult might be more sensitive and more ecologically valid for 3 and 6 month intervals. This is percent consonants correct, restricted to potential treatment targets, that is, those consonants that are produced with less than 60% accuracy at intake. These turn out to be mid- to late-developing frequently misarticulated phonemes, therefore /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/ in English and /ʃ,ʒ,l,ʁ/ in French for these samples of 4-year-old children with severe and moderate-to-severe primary speech sound disorders. My impression is that when providing an appropriate intervention an SLP should expect at least a 10% change in these phonemes whether assessed with a broad based single word naming probe or in conversation; in fact, a 15% change is closer to the average. This does not mean that you should treat the most difficult sounds first! Look carefully at the effect size data from Rvachew and Nowak (2001): when we treated stimulable phonemes we observed a 15% improvement in difficult unstimulable sounds. You can always treat a variety of phonemes from different levels of the phonological hierarchy as described in a previous blog.
  3. Approximately 10% of 4-year-old children with severe and moderate-to-severe primary speech sound disorders do not improve at all over a 3 to 6 month period, given adequate speech therapy. If a child is not improving, the SLP and the parent should be aware that this is a rare event that requires special attention.
  4. In a previous blog I cited some research evidence for the conclusion that patients treated as part of research trials achieve better outcomes than patients treated in a usual care situation. There is some evidence for that in these data. The group in Rvachew, Nowak and Cloutier that received usual care obtained a lower effect size (d=0.45) in comparison to the group that received an extra experimental intervention (d=1.31). In practical terms this difference meant that the group that received the experimental intervention made four times more improvement in the production of difficult sounds than the control group that received usual care.
  5. The variation in effect sizes that is shown in these data indicates that SLP decisions about treatment procedures and service delivery options have implications for success in therapy. What are the characteristics of the interventions that led to relatively large changes in PCC or relatively large standardized effect sizes? (i) Comprehensiveness, that is the inclusion of intervention procedures that target more than one level of representation, e.g., procedures to improve articulation accuracy and speech perception skills and/or phonological awareness; and (ii) parent involvement, specifically the inclusion of a well-structured and supported home program.

If you see other messages in these data, or have observations from your own practice or research, please write to me in the comments.




Are effect sizes in research papers useful in SLP practice?

Effect size blog figure 1

Effect sizes are now required in addition to statistical significance reporting in scientific reports. As discussed in a previous blog, effect sizes are useful for research purposes because they can be aggregated across studies to draw conclusions (i.e., as in a meta-analysis). However, they are also intended to be useful as an indication of the “practical consequences of the findings for daily life.” Therefore, Gierut, Morrisette, & Dickinson’s paper “Effect Size for Single-Subject Design in Phonological Treatment” was of considerable interest to me when it was published in 2015. They report the distribution of effect sizes for 135 multiple baseline studies using a pooled standard deviation for the baseline phase of the studies as the denominator and the mean of the treatment phase minus the mean of the baseline phase as the numerator in the equation to calculate the effect size statistic. In these studies, the mean and the variance of probe scores in the baseline phase are restricted to be very small by design, because the treatment targets and generalization probe targets must show close to stable 0% correct performance during the baseline phase. The consequence of this restriction is that the effect size number will be very large even when the raw amount of performance change is not so great. Therefore the figure above shows hypothetical data that yield exactly their average effect size of 3.66 (specifically, with scores expressed as proportions, [.0857 - .0125]/.02 = 3.66). This effect size is termed a medium effect size in their paper but I leave it to the reader to decide if a change of not quite 9% accuracy in speech sound production is an acceptable level of change. It may be because in these studies, a treatment effect is operationalized as probe scores (single word naming task) for all the phonemes that were absent from the child’s repertoire at intake.

From the research point of view this paper provides very important information: it permits researchers to compare effect sizes and explore variables that account for between-case differences in effect sizes in those cases where the researchers have used a multiple baseline design and treatment intensities similar to those reported in this paper (5 to 19 one-hour sessions typically delivered 3 times per week).
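
The arithmetic behind that hypothetical example can be checked with a few lines of Python. This is only a sketch of the calculation as described above (treatment phase mean minus baseline phase mean, divided by a pooled baseline standard deviation), with scores expressed as proportions; the function name is my own, and the second call uses a made-up larger standard deviation purely to show how much the statistic depends on the denominator.

```python
def baseline_standardized_effect(treatment_mean, baseline_mean, pooled_baseline_sd):
    """Mean change from baseline to treatment, in units of the baseline SD."""
    return (treatment_mean - baseline_mean) / pooled_baseline_sd


# Values from the hypothetical example in the text, as proportions correct.
es = baseline_standardized_effect(0.0857, 0.0125, 0.02)
print(round(es, 2))  # 3.66

# Same raw change, but with a hypothetical baseline SD of .10 instead of .02.
es_wider = baseline_standardized_effect(0.0857, 0.0125, 0.10)
print(round(es_wider, 2))  # 0.73
```

The raw change is identical in both calls (about 7 percentage points); only the assumed baseline variability differs, which is why a near-zero, near-constant baseline produces such large standardized values.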

The question I am asking myself is whether the distribution of effect sizes that is reported in this paper is helpful to clinicians who are concerned with the practical significance of these studies. I ask this because I am starting to see manuscripts reporting clinical case studies in which the data are used to claim “large treatment effects” for a single case (using Gierut et al’s standard of an effect size of 6.32 or greater). Indeed, in the clinical setting SLPs will be asked to consider whether their clients are making “enough” progress. For example, in Rvachew and Nowak (2001) we asked parents to rate their agreement with the statement “My child’s communication skills are improving as fast as can be expected.” (This question was on our standard patient satisfaction questionnaire so in fact, we asked every parent this question, not just the ones in this RCT). But the parent responses in the RCT showed that there were significant between group differences in response to this question that aligned with the dramatic differences in child response to the traditional versus complexity approach to target selection that was tested in that study (e.g., 34% vs. 17% of targets mastered in these groups respectively). It seems to me that when a parent asks themselves this question they have multiple frames of reference: not only do they consider the child’s communicative competence before and after the introduction of therapy, they consider whether their child would make more or less change with other hypothetical SLPs and other treatment approaches, given that parents actually have choices about these things. Therefore, an effect size that says effectively, the child made more progress with treatment compared to no treatment is not really answering the parent’s question. However, with a group design it is possible to calculate an effect size that reflects change relative to the average amount of change one might expect, given therapy. 
To my mind this kind of effect size comes closer to answering the questions about practical significance that a parent or employer might ask.

This still leaves us with the question of what kind of change to describe. It is unfortunate that there are few if any controlled studies that have reported functional measures. I can think of some examples of descriptive studies that reported functional measures however. First, Campbell (1999) reported that good functional outcomes were achieved when preschoolers with moderate and severe Speech Delay received twice-weekly therapy over a 90- to 120-day period (i.e., on average the children’s speech intelligibility improved from approximately 50% to 75% intelligible as reported by parents). Second, there are a number of studies reporting ASHA-NOMS (functional communication measures provided by treating SLPs) for children receiving speech and language therapy. However, Thomas-Stonell et al (2007) found that improvement on the ASHA-NOMS was not as sensitive as parental reports of “real life communication change” over a 3 to 6 month interval. Therefore, Thomas-Stonell and her colleagues developed the FOCUS to document parental reports of functional outcomes in a reliable and standardized manner.

Thomas-Stonell et al (2013) report changes in FOCUS scores for 97 preschool aged children who received an average of 9 hours of SLP service in Canada, comparing change during the waiting period (60 day interval) to change during the treatment period (90 day interval). FOCUS assessments demonstrated significantly more change during treatment (about 18 FOCUS points on average) than during the wait period (about 6 FOCUS points on average). Then they compared minimally important changes in PCC, the Children’s Speech Intelligibility Measure, and FOCUS scores for 28 preschool aged children. The FOCUS measure was significantly correlated with the speech accuracy and intelligibility measures but there was not perfect agreement among these measures. For example, 21/28 children obtained a minimally important change of at least 16 points on the FOCUS but 4 of those children did not show significant change on PCC/CSIM. In other words speech accuracy, speech intelligibility and functional improvements are related but not completely aligned; each provides independent information about change over time.

In controlled studies, some version of percent consonants correct is a very common treatment outcome that is used to assess the efficacy of phonology therapy. Gierut et al (2015) focused specifically on change in those phonemes that are late developing and produced with very low accuracy, if not completely absent from the child’s repertoire at intake. This strikes me as a defensible measure of treatment outcome. Regardless of whether one chooses to treat a complex sound, an early developing sound, a medium-difficulty sound (or one of each as I demonstrated in a previous blog), presumably the SLP wants to have dramatic effects across the child’s phonological system. Evidence that the child is adding new sounds to the repertoire is a good indicator of that kind of change. Alternatively the SLP might count increases in correct use of all consonants that were potential treatment targets prior to the onset of treatment. Or, the SLP could count percent consonants correct for all the consonants because this measure is associated with intelligibility and takes into account the fact that there can be regressions in previously mastered sounds when phonological reorganization is occurring. The number of choices suggests that it would be valuable to have effect size data for a number of possible indicators of change. More to the point, Gierut et al’s single subject effect size implies that almost any change above “no change” is an acceptable level of change in a population that receives intervention because they are stalled without it. I am curious to know if this is a reasonable position to take. In my next blog post I will report effect sizes for these speech accuracy measures taken from my own studies going back to 2001. I will also discuss the clinical significance of the effect sizes that I will aggregate. I am going to calculate the effect size for paired mean differences along with the corresponding confidence intervals for groups of preschoolers treated in three different studies. I haven’t done the calculations yet, so, for those readers who are at all interested in this, you can hold your breath with me.


Campbell, T. F. (1999). Functional treatment outcomes in young children with motor speech disorders. In A. Caruso & E. A. Strand (Eds.), Clinical Management of Motor Speech Disorders in Children (pp. 385-395). New York: Thieme Medical Publishers, Inc.

Gierut, J. A., Morrisette, M. L., & Dickinson, S. L. (2015). Effect Size for Single-Subject Design in Phonological Treatment. Journal of Speech, Language, and Hearing Research, 58(5), 1464-1481. doi:10.1044/2015_JSLHR-S-14-0299

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 1-12. doi:10.3389/fpsyg.2013.00863

Thomas-Stonell, N., McConney-Ellis, S., Oddson, B., Robertson, B., & Rosenbaum, P. (2007). An evaluation of the responsiveness of the pre-kindergarten ASHA NOMS. Canadian Journal of Speech-Language Pathology and Audiology, 31(2), 74-82.

Thomas-Stonell, N., Oddson, B., Robertson, B., & Rosenbaum, P. (2013). Validation of the Focus on the Outcomes of Communication under Six outcome measure. Developmental Medicine and Child Neurology, 55(6), 546-552. doi:10.1111/dmcn.12123

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.




What is a control group?

I have a feeling that my blog might become less popular in the next little while because you may notice an emerging theme on research design and away from speech therapy procedures specifically! But it is important to know how to identify evidence based procedures, and to do that requires knowledge of research design. It has also come to my attention, as part of the process of publishing two randomized control trials (RCTs) this past year, that there are a lot of misperceptions about what an RCT is in the SLP and education communities, among both clinicians and researchers. Therefore, I am happy to draw your attention to this terrific blog by Edzard Ernst, and in particular to an especially useful post “How to differentiate good from bad research”. The writer points out that a proper treatment of this topic “must inevitably have the size of a book” because each of the indicators that he provides “is far too short to make real sense.” So I have taken it upon myself in this blog to expand upon one of his indicators of good research, one that I know causes some confusion, specifically:

  • Use of a placebo in the control group where possible.

Recently the reviewers (and editor) of one of my studies were convinced that my design was not an RCT because the children in both groups received an intervention. In the absence of a “no-treatment control” they said, the study could not be an RCT! I was mystified about the source of this strange idea until I read Ernst’s blog and realized that many people, recalling their research courses from university, must be mistaking “placebo control” for “no-treatment control.” However, a placebo control condition is not at all like the absence of treatment. Consider the classic example of a placebo control: in a drug trial, each patient randomized to the treatment arm will visit the nurse, who hands him or her a white paper cup holding 2 pink pills containing active ingredient X and some other ingredients that do not impact the patient’s disease, i.e., inactive ingredients; each patient randomized to the control arm will also visit the nurse, who hands him or her a white paper cup holding 2 pink pills containing only the inactive ingredients. In other words, the experiment is designed so that all patients are “treated” exactly the same except that only patients randomized to treatment receive (unknowingly) the active ingredient. Therefore, all changes in patient behavior that are due to those aspects of the treatment that are not the active treatment (visiting the nice nurse, expecting the pills to make a difference etc.) are equalized across arms of the study. These are called the “common factors” or “nonspecific factors”.

In the case of a behavioral treatment it is important to equalize the common factors across all arms of the study. Therefore in my own studies I deliberately avoid “no treatment” controls. In my very first RCT (Rvachew, 1994), for example, the treatment conditions in the two arms of the study were as follows:

  • Experimental: 10 minutes of listening to sheet vs Xsheet recordings and judging correct vs incorrect “sheet” items (active ingredient) in a computer game format followed by 20 minutes of traditional “sh” articulation therapy, provided by a person blind to the computer game target.
  • Control: 10 minutes of listening to Pete vs meat recordings and judging correct vs incorrect “Pete” items in a computer game format followed by 20 minutes of traditional “sh” articulation therapy, provided by a person blind to the computer game target.

It can be seen that the study was designed to ensure that all participants experienced exactly the same treatment except for the active ingredient that was reserved for children who were randomly assigned to the experimental treatment arm, specifically exposure to the experience of listening to and making perceptual judgments about a variety of correct and incorrect versions of words beginning with “sh” or distorted versions of “sh”, the sound that the children misarticulated. Subsequently I have conducted all my randomized control studies in a similar manner. But, as I said earlier, I run across readers who vociferously assert that the studies are not RCTs because an RCT requires a “no treatment” control. In fact, a “no treatment” control is a very poor control indeed as argued in this blog that explains why the frequently used “wait list control group” is inappropriate. For example, a recent trial on the treatment of tinnitus claimed that a wait list control had merit because “While this comparison condition does not control for all potential placebo effects (e.g., positive expectation, therapeutic contact, the desire to please therapists), the wait-list control does account for the natural passing of time and spontaneous remission.” In fact, it is impossible to control for common factors when using a wait list control and it is unlikely that patients are actually “just waiting” when you randomize them to the “wait list control” condition; therefore Hesser et al.’s defense of the wait list control is optimistic although their effort to establish how much change you get in this condition is worthwhile.

We had experience with a “wait list” comparison condition in a recent trial (Rvachew & Brosseau-Lapré, 2015). Most of the children were randomly assigned to one of four different treatment conditions, matched on all factors except the specific active ingredients of interest. However, we also had a nonexperimental wait list comparison group* to estimate change for children outside of the trial. We found that parents were savvy about maximizing the treatment that their children could receive in any given year. Our trial lasted six weeks, the public health system entitled them to six weeks of treatment and their private insurance entitled them to six to 12 weeks of therapy depending on the plan. Parents would agree to enroll their child in the trial with randomization to a treatment arm if their child was waiting for the public service, OR they would agree to be assessed in the “wait list” arm if their child was currently enrolled in the public service. They would use their private insurance when all other options had been exhausted. Therefore the children in the “wait list” arm were actually being treated. Interestingly, we found that the parents expected their children to obtain better results from the public service because it was provided by a “real” SLP rather than the student SLPs who provided our experimental treatments even though the public service was considerably less intense! (As an aside, we were not surprised to find that the reverse was true). Similarly, as I have mentioned in previous blogs, Yoder et al. (2005) found that the children in their “no treatment” control accessed more treatment from other sources than did the children in their treatment arm. And parents randomized to the “watchful waiting” arm of the Glogowska et al. (2000) trial sometimes dropped out because parents will do what they must to meet their child’s needs.

In closing, a randomized control trial is simply a study in which participants are randomly assigned to an experimental treatment and a control condition (even in a cross-over design, in which all participants experience all conditions, as in Rvachew et al., in press). The nature of the control should be determined after careful thought about the factors that you are attempting to control, which can be many – placebo, Hawthorne, fatigue, practice, history, maturation and so on. These will vary from trial to trial obviously. Placebo control does not mean “no treatment” but rather, a treatment that excludes everything except the “active ingredient” that is the subject of your trial. As an SLP, when you are reading about studies that test the efficacy of a treatment, you need to pay attention to what happens to the control group as well as the treatment group. The trick is to think in every case – what is the active ingredient that explains the effect seen in the treatment group? what else might account for the effects seen in the treatment arm of this study? If I implement this treatment in my own practice, how likely am I to get a better result compared to the treatment that my caseload is currently receiving?

* A colleague sent me a paper (Mercer et al., 2007) in which a large number of researchers advocating for the acceptance of a broader array of research designs in order to focus more attention on external validity and translational research, got together to discuss the merits of various designs. During the symposium it arose that there was disagreement about the use of the terms “control” and “comparison” group. I use the terms in accordance with a minority of their attendees, as follows: control group means that the participants were randomly assigned to a group that did not experience the “active ingredient” of the experimental treatment; comparison group means that the participants were not randomly assigned to the group that did not experience the experimental intervention, a group that may or may not have received a treatment. This definition was ultimately not used by the attendees, I don’t know why – somehow they decided on a different definition that didn’t make any sense at all, I invite you to consult p. 141 and see if you can figure it out!


Glogowska, M., Roulstone, S., Enderby, P., & Peters, T. (2000). Randomised controlled trial of community based speech and language therapy in preschool children. British Medical Journal, 321, 923-928.

Hesser, H., Weise, C., Rief, W., & Andersson, G. (2011). The effect of waiting: A meta-analysis of wait-list control groups in trials for tinnitus distress. Journal of Psychosomatic Research, 70(4), 378-384. doi:

Mercer, S. L., DeVinney, B. J., Fine, L. J., Green, L. W., & Dougherty, D. (2007). Study Designs for Effectiveness and Translation Research: Identifying Trade-offs. American Journal of Preventive Medicine, 33(2), 139-154.e132. doi:

Rvachew, S. (1994). Speech perception training can facilitate sound production learning. Journal of Speech and Hearing Research, 37, 347-357.

Rvachew, S., & Brosseau-Lapré, F. (2015). A randomized trial of twelve week interventions for the treatment of developmental phonological disorder in francophone children. American Journal of Speech-Language Pathology, 24, 637-658. doi:10.1044/2015_AJSLP-14-0056

Rvachew, S., Rees, K., Carolan, E., & Nadig, A. (in press). Improving emergent literacy with school-based shared reading: Paper versus ebooks. International Journal of Child-Computer Interaction. doi:

Yoder, P. J., Camarata, S., & Gardner, E. (2005). Treatment effects on speech intelligibility and length of utterance in children with specific language and intelligibility impairments. Journal of Early Intervention, 28(1), 34-49.

Thinking About ‘Dose’ and SLP Practice: Part II

I have been talking about whether it is helpful to think about dose-response relationships as an important aspect of treatment efficacy. During a recent @wespeechies exchange, we discussed whether this “medical” concept should be applied to speech therapy. One objection raised was the idea that treatment efficacy is “all about relationships” and therefore the dosage of specific inputs was not all that relevant to outcomes. In psychotherapy, objections to manualized care protocols that prescribe specific procedures for defined cases are also based on the notion that treatment efficacy is determined not by the specific ingredients of the treatment program but rather by common factors, as I discussed in a previous blog. One of the important common factors is the therapeutic alliance. How important is the therapeutic alliance to treatment outcomes? And does attention to the therapeutic alliance preclude thinking carefully about which procedures to use in which amounts with a given case?

In psychotherapy the therapeutic alliance is defined “as agreement on the goals and tasks of therapy in the context of a positive affective bond between patient and therapist.” Even when working with children, this can be an important aspect of the treatment program. For example, McCormack, McLeod, McAllister and Harrison describe children’s experience of speech impairment in a paper entitled “My Speech Problem, Your Listening Problem, My Frustration…”. This qualitative study illuminates multiple facets of an SSD and further shows that the child’s perspective and the adult’s perspective on the problem and the solution are often not aligned. Shifting the child’s attention to the role of his or her speech problem in communication breakdowns will require a genuine, caring, sensitive and trusting relationship between SLP and child. Establishing common goals and motivating the child to try new tasks to achieve those goals will also be highly dependent upon the therapeutic alliance between child and therapist.

To understand how the therapeutic alliance impacts on therapy outcomes we must return to the psychotherapy literature because I am aware of no scientific studies in the speech therapy arena that have addressed this issue directly. In mental health services, the strength of the therapeutic alliance is measured by asking clients questions about their relationship with their therapist in three domains, specifically goals (e.g., We agree on what is important for me to work on.), tasks (e.g., I agree the way we are working on my problem is correct), and bond (e.g., I believe my therapist likes me).  Very large sample studies have shown that the relationship between therapist and client accounts for about 20% of variance in outcomes. However, the relationship between outcomes and the therapeutic relationship is reciprocal: if the client gets better, they have more trust in the therapist’s guidance regarding goals and tasks. Therefore, the therapeutic relationship is theoretically independent of the techniques and procedures that the therapist uses, but in practice these variables may be related.

To put this in the speech therapy context again, Francoise Brosseau-Lapré and I are in the process of publishing the results of our RCT, Essai Clinique sur les Interventions Phonologique. We found that an input oriented approach (procedures focused on perceptual and phonological knowledge with very little articulatory practice) was as effective as an output oriented approach (all procedures focused on articulation practice) for improving children’s articulation accuracy. Therefore, when working with a very shy child who does not like to imitate or, indeed, talk at all during speech therapy, you and the parent and the child might all agree that the input oriented approach is the ideal way to work on the child’s speech problem. Initially the therapeutic alliance might be strong, but what if the implementation of the approach is not competent? We find, for example, that it is actually quite difficult to teach students to implement the procedures (focused stimulation, error detection tasks and meaningful minimal pairs procedures) correctly. Furthermore, we observed very poor outcomes when procedures were mixed and matched in a way that is not theoretically coherent (for example, input oriented procedures in the clinic but an output oriented home practice program). It is probable that in cases of poor implementation, outcomes and the therapeutic alliance will both suffer. At the very least, as I have found previously, parents are able to identify poor speech outcomes in their children even as they report good relationships with their child’s SLP.

This discussion reminds me of a very interesting article about teacher effectiveness that was circulated on Twitter by @KevinWheldell. Gregory Yates makes the distinction between good teachers and effective teachers. Similarly, SLPs may be readily judged to be good on the basis of personal and moral qualities such as warmth, caring, friendliness and conscientiousness, all of which contribute to positive relationships with clients, coworkers and their institution. Effectiveness, however, requires the skillful application of specific techniques and procedures in relation to client needs and can only be measured in reference to client outcomes. More about this in the next blogpost in this series.

Don’t get tricked: Why it pays to read original sources.

In my last blog post I suggested that you can have confidence in the effectiveness of your clinical practice if you select treatment practices that have been validated by research. Furthermore, I provided links to some resources for summaries of research evidence. In this blog post I want to caution that it is important to read the original sources and to view the summaries, including meta-analyses, with some skepticism. Excellent clinical practice requires a deep knowledge of the basic science that is the foundation for the clinical procedures that you are using. Familiarity with the details of the clinical studies that address the efficacy of those procedures is also essential. I will provide two examples where a lack of familiarity with those details has led to some perverse outcomes.

Two decades ago it was quite common for children who were receiving services from publicly funded providers in Canada to receive 16-week blocks of intervention. Then we went through the recession of the nineties and there was much pressure on managers in health care to cut costs. Fey, Cleave, Long, and Hughes (1993) conveniently published an RCT demonstrating that a parent intervention was just as effective as direct intervention provided by the SLP to improve children’s expressive grammar – the icing on the cake was that the parent-provided service required half as many SLP hours as the direct SLP-provided service. All across Canada, direct service blocks were cut to 8 weeks and parent-consultation services were substituted for the direct therapy model. About a decade after that I made a little money myself giving workshops to SLPs on evidence-based practice. The audiences were always shocked when I presented the actual resource inputs for Fey et al.’s interventions: (1) direct SLP intervention – cost = 40 hours per child over 20 weeks, versus (2) parent administered intervention – cost = 21 hours per child over 20 weeks. So you see, the SLPs had been had by their managers! The SLPs would have been better positioned to resist this harmful change in service delivery model if they had been aware of the source of the claim that you could halve your therapy time by implementing a home program and get the same result. I don’t know that our profession could have changed the situation by being more knowledgeable about the research on service delivery models because the political and financial pressures at the time were extreme – but at least we and our patients would have had a fighting chance!
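As a rough check on the arithmetic, the two resource inputs quoted above can be compared directly. This is only a sketch that uses the figures given in the text (40 and 21 SLP hours per child, each over 20 weeks); the variable names are illustrative and nothing else about the Fey et al. study is assumed.

```python
# SLP hours per child for each service model, as quoted in the text.
direct_hours = 40    # direct SLP intervention
parent_hours = 21    # parent-administered intervention
weeks = 20           # both programs ran over 20 weeks

# Share of the direct-service SLP time that the parent program still required.
ratio = parent_hours / direct_hours

# Average SLP time per child per week under each model.
direct_per_week = direct_hours / weeks
parent_per_week = parent_hours / weeks

print(f"Parent program: {ratio:.1%} of the direct-service SLP hours")
print(f"Direct service: {direct_per_week:.2f} SLP hours per week over {weeks} weeks")
print(f"Parent program: {parent_per_week:.2f} SLP hours per week over {weeks} weeks")
```

The "half as many hours" claim holds only relative to a 40-hour, 20-week program; the parent program itself still drew on about an hour of SLP time per child per week for 20 weeks.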

Another reason that you have to be vigilant is that the authors of research summaries have been known to engage in some sleight of hand. An example of this is the chapter on Complexity Approaches by Baker and Williams in the book Interventions for Speech Sound Disorders in Children. This book is pretty cool because each chapter describes a different approach and is usually accompanied by a video demonstration. Each author was asked to identify all the studies that support the approach and put them on a “levels of evidence” table. As indicated in a previous blog post, the complexity approach to selecting targets for intervention is supposedly supported by a great many studies employing the multiple probe design, which is a fairly low level of evidence because it does not control for maturation or history effects. In the Baker and Williams “levels of evidence” table all of these single subject studies are listed so it looks pretty impressive. The evidence to support the approach looks even more impressive when you notice that two randomized controlled trials are shown at a higher level on the table. This table leads you to believe that the complexity approach is supported by a large amount of data and the highest level of evidence until you realize that neither of those two RCTs, Dodd et al. (2008) and Rvachew and Nowak (2001), supports the complexity approach. Even when you read the text, it is not clear that these RCTs do not provide support for the approach because the authors are a bit waffly about this fact. Before I noticed this table I couldn’t understand why clinicians would tell me proudly that they were using the complexity approach because it is evidence based. It is pretty hard to keep up with the evidence when you have to watch out for tricks like this!

In the comments to my last blog post there were questions about how you can be sure that your treatment is leading to change that is better than maturation alone. An RCT is designed to answer just that question so I am going to discuss the results of Rvachew and Nowak (2001), as detailed in a later paper, Rvachew, S. (2005). Stimulability and treatment success. Topics in Language Disorders. Clinical Perspectives on Speech Sound Disorders, 25(3), 207-219. Unfortunately this paper is hard to get so a lot of SLPs are not aware of the implications of our findings for the central argument that motivates the use of the complexity approach to target selection.  Gierut (2007) grounds the complexity approach on learnability theory, paradoxically the notion that language is essentially unlearnable and thus the structure of language must be innately built in. Complex language inputs are necessary to trigger access to this knowledge. Because of the hierarchical structure of this built-in knowledge, exposure to complex structure will “unlock the whole”, having a cascading effect down through the system. On the other hand, she claims that “it has been shown that simpler input actually makes language learning more difficult because the child is provided with only partial information about linguistic structure (p. 8).”

We tested this hypothesis in our RCT. Each child received a 15 item probe of their ability to produce all the consonants of English in initial, medial and final position of words. The phonemes that they had not mastered were then ordered according to productive phonological knowledge and developmental order. Michele Nowak selected potential treatment targets for each child from both ends of the continuum. I independently (blindly, without access to the child’s test information or knowledge of the targets that Michele had selected) randomly assigned the child to treatment condition, either ME or LL. ME condition means that the child was treated for phonemes for which the child had most knowledge and which are usually early developing. LL condition means that the child was treated for phonemes for which the child had least productive phonological knowledge and which are usually late developing. The children were treated in two six week blocks with a change in treatment targets for the second block using the same procedure to select the targets. The figure below shows probe performance for several actual and potential targets per child: the phoneme being treated in a given block, the phoneme to be treated in the next block (or that was treated in the previous block) and the phonemes that would have been treated if the child had been assigned to the other treatment condition. As a clinician, I am interested in learning and retention of the treated phonemes, relative to maturation. As a scientist who is testing the complexity approach, Gierut is interested in cross-class generalization, regardless of whether the child learns the targeted phoneme. We can look at these two outcomes across the two groups.

Let’s begin with the question of whether the children learned the target phonemes and whether there is any evidence that this learning is greater than what we would see with maturation alone. In the chart, learning during treatment is shown by the solid lines whereas dotted lines indicate periods where those sounds were not being treated. A1 is the assessment before the first treatment block, A2 is the assessment after the first block and before the second block, and A3 is the last assessment after the second treatment block. On the left hand side, we see that the ME group was treated during the first block for phonemes that were mastered in one word position but not in the other two (average score of 6/15 prior to treatment). The slopes of the solid versus dotted lines show you that change from A1 to A2 was greater than change from A2 to A3. This means that these targets showed more change when they were being treated in the first block than when they were not being treated during the second block. During the second block, we treated slightly harder sounds that were not mastered in any word position, with a starting probe score of 3/15 on average. These phonemes improved from A1 to A2 even though they weren’t being treated but the rate of improvement is much higher between A2 and A3 when they were being treated. Interestingly, the slopes of the solid lines and the slopes of the dotted lines are parallel – this is your treatment effect – this is the proof that treatment is more effective than not treating. As further proof we can look at the results for the LL group. We have a similar situation with parallel solid and dotted lines for the phonemes that were treated in the first and second blocks at the bottom of the chart. We don’t have as much improvement for these phonemes because they were very difficult, unstimulable late developing sounds (targets that are consistent with the complexity approach). 
Nonetheless, the outcomes are better while the phonemes are being treated than when they are not (in fact there are slight regressions during the blocks when these sounds are not treated). At the same time, the phonemes for which the children have the most knowledge improve spontaneously (Gierut would attribute this change to cross-class generalization whereas I attribute this change to maturation). The interesting comparison, however, is across groups. Notice that the ME group shows a change of 4 points for treated “most knowledge” phonemes versus a change of 3 points for the untreated “most knowledge” phonemes for the LL group. This is not a very big difference but, nonetheless, treating these phonemes results in slightly faster progress than not treating them.

In our 2001 paper we reported that progress for treated targets was substantially better for children in the ME condition than for children in the LL condition (in the latter group, the children remained unstimulable for 45% of targets after 6 weeks of therapy). However, the proponents of the complexity approach are not interested in this finding. If the child does not learn the hard target that is an acceptable price to pay if cross-class generalization occurs and the child learns easier untreated phonemes. If you look at the right hand side of the chart by itself, the chart can be taken as support for the complexity approach because spontaneous gains are observed for the “most knowledge” phonemes. The problem is that the proponents of this approach have argued that exposure to “simpler input actually makes language learning more difficult” – it is literally supposed to be impossible to facilitate learning of harder targets by teaching simpler targets. Therefore the real test of the complexity approach is not in the right hand chart. We have to compare the rate of change for the unstimulable targets across the two groups. It is apparent that the gain for UNTREATED unstimulable phonemes (ME group, gain = 2) is double that observed for TREATED unstimulable phonemes (LL group, gain = 1). The results shown on the left clearly show that treating the easier sounds first facilitated improvements for the difficult phonemes. I have explained this outcome by reference to dynamic systems theory in Rvachew and Bernhardt (2010). From my perspective, it is not just that my RCT shows that the complexity approach doesn’t work. It’s that my RCT is just part of a growing and broad based literature that invalidates the “learnability approach” altogether. Francoise and I describe and evaluate this evidence while promoting a developmental approach to phonology in our book Developmental Phonological Disorders: Foundations of Clinical Practice.
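The two cross-group comparisons described above can be laid out side by side. The sketch below uses only the four probe-score gains quoted in the text (4 and 3 points for the “most knowledge” phonemes, 2 and 1 for the unstimulable phonemes); the dictionary layout and the function name `compare_groups` are illustrative assumptions and not part of the original analysis.

```python
# Probe-score gains quoted in the text, keyed by (phoneme type, group).
# "treated" records whether that group received treatment for those phonemes.
gains = {
    ("most knowledge", "ME"): {"treated": True, "gain": 4},
    ("most knowledge", "LL"): {"treated": False, "gain": 3},
    ("unstimulable", "ME"): {"treated": False, "gain": 2},
    ("unstimulable", "LL"): {"treated": True, "gain": 1},
}


def compare_groups(phoneme_type):
    """Return (difference, ratio) of the ME gain relative to the LL gain."""
    me = gains[(phoneme_type, "ME")]["gain"]
    ll = gains[(phoneme_type, "LL")]["gain"]
    return me - ll, me / ll


for phoneme_type in ("most knowledge", "unstimulable"):
    diff, ratio = compare_groups(phoneme_type)
    me_status = "treated" if gains[(phoneme_type, "ME")]["treated"] else "untreated"
    ll_status = "treated" if gains[(phoneme_type, "LL")]["treated"] else "untreated"
    print(f"{phoneme_type}: ME ({me_status}) vs LL ({ll_status}): "
          f"difference = {diff}, ratio = {ratio:.2f}")
```

Laid out this way, the ME group comes out ahead on both phoneme types, including the unstimulable phonemes that only the LL group was actually treated for.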


Probe Scores for Treated and Untreated Phonemes


The larger point that I am trying to make here is that SLPs need to know the literature deeply. The evidence summaries tend to take a bit of a “horse race” approach, grading study quality on the basis of sometimes questionable checklists and then making conclusions on the basis of how many studies can be amassed at a given level of the evidence table. This is not always a clinically useful practice. It is necessary to understand the underlying theory, to know the details of the methods used in those studies, and to draw your own conclusions about the applicability of the treatments to your own patients. This means reading the original sources. In order to achieve this level of knowledge we need to reorganize our profession to encourage a greater number of specialists in the field because no individual SLP can have this depth of knowledge about every type of patient that he or she might treat. But it should be possible to encourage the development of specialists who are given the opportunity to stay current with the literature and provide consultation services to generalists on the front lines. Even if we could ensure that SLPs had access to the best evidence as a guide to practice, however, there are some “common factors” that have a large impact on outcomes even when treatment approach is controlled. In my next post I will address the role of the individual clinician in ensuring excellent client outcomes.