Using effect sizes to choose a speech therapy approach

I am quite intrigued by the warning offered by Adrian Simpson in his paper “The misdirection of public policy: comparing and combining standardized effect sizes” (Simpson, 2017).

The context for the paper is the tendency of public policy makers to rely on meta-analyses to make decisions such as, for example, should we improve teachers’ feedback skills or reduce class sizes as a means of raising student performance? Simpson shows that meta-analyses (and meta-analyses of the meta-analyses!) are a poor tool for making these apples-to-oranges comparisons and cannot be relied upon as a source of information when making public policy decisions such as this. He identifies three specific issues with research design that invalidate the combining and comparing of effect sizes. I think that these are good issues to keep in mind when considering effect sizes as a clue to treatment efficacy and a source of information when choosing a speech or language therapy approach.

Recall that an effect size is a standardized mean difference, whereby the difference between means (i.e., the mean outcome of the treatment condition versus the mean outcome of the control condition) is expressed in standard deviation units. The issue is that the standard deviation units, which are supposed to reflect the variation in outcome scores between participants in the intervention trial, actually reflect many different aspects of the research design. Therefore, if you compare the effect size of an intervention as obtained in one treatment trial with the effect size for another intervention as obtained in a different treatment trial, you cannot be sure that the difference is due to differences in the relative effectiveness of the two treatments. And yet, SLPs are asking themselves these kinds of questions every day: should I use a traditional articulation therapy approach or a phonological approach? Should I add nonspeech oral motor exercises to my traditional treatment protocol? Is it more efficient to focus on expressive language or receptive language goals? Should I use a parent training approach or direct therapy? And so on. Why is it unsafe to combine and compare effect sizes across studies to make these decisions?
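To make the role of the denominator concrete, here is a minimal sketch (in Python, using entirely hypothetical trial numbers) of how the same raw mean difference yields very different standardized effect sizes when the outcome variability differs between trials:

```python
import math

def cohens_d(mean_tx, mean_ctl, sd_tx, sd_ctl, n_tx, n_ctl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_tx - 1) * sd_tx**2 + (n_ctl - 1) * sd_ctl**2) / (n_tx + n_ctl - 2)
    return (mean_tx - mean_ctl) / math.sqrt(pooled_var)

# Two hypothetical trials, each showing the same 5-point raw advantage
# for treatment over control, but with different outcome variability.
d_homogeneous = cohens_d(105, 100, 10, 10, 30, 30)    # outcome SD = 10
d_heterogeneous = cohens_d(105, 100, 20, 20, 30, 30)  # outcome SD = 20

print(round(d_homogeneous, 2))    # 0.5
print(round(d_heterogeneous, 2))  # 0.25
```

The raw benefit is identical in both hypothetical trials; only the spread of scores differs, yet the standardized effect size is cut in half.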

The first issue that Simpson raises is that of comparison groups. Many, although not all, treatment trials compare an experimental intervention to either a ‘no treatment’ control group or a ‘usual care’ condition. The characteristics of the ‘no treatment’ and ‘usual care’ controls are inevitably poorly described, if they are described at all. And yet meta-analyses will combine effect sizes across many studies despite having a very poor sense of what the control condition was in the studies that contribute to the final estimate of treatment effect. Control group and intervention descriptions can be so paltry that in some cases the experimental treatment of one study may be equivalent to the control condition of another. The Law et al. (2003) review combined effect sizes for a number of RCTs evaluating phonological interventions. One trial compared a treatment provided in 22 twice-weekly half-hour sessions over a four-month period to a wait-list control (Almost & Rosenbaum, 1998). Another involved monthly 45-minute sessions provided over 8 months, in comparison to a “watchful waiting” control from which many parents “dropped out” (Glogowska et al., 2000). Inadequate information was provided about how much intervention the control group children accessed while they waited – almost anything is possible relative to the experimental condition in the Glogowska trial. For example, Yoder et al. (2005) observed that their control group actually accessed more treatment than the children in their experimental group, which may explain why they did not obtain a main effect of their intervention (or not, who knows?). The point is that it is hard to know whether a small effect size in comparison to a robust control is more or less impressive than a large effect size in comparison to no treatment at all. Certainly, the comparison is not fair.

The second issue concerns range restriction in the population of interest. I realize now that I failed to take this into account when I repeated (in Rvachew & Brosseau-Lapré, 2018) the conclusion that dialogic reading interventions are more effective for low-income children than for children with developmental language impairments (Mol et al., 2008). Effect sizes are inflated when the intervention is provided to only a restricted part of the population and the selection variables are associated with the study outcomes. Moreover, the inflation is greatest for children selected from the middle of the distribution and least for children selected from the tails. This fact may explain why effect sizes for vocabulary size after dialogic reading intervention are highest for middle-class children (.58; Whitehurst et al., 1988), intermediate for lower-class but typically developing children (.33; Lonigan & Whitehurst, 1998), and lowest for children with language impairments (.13; Crain-Thoreson & Dale, 1999). There are other potential explanatory factors in these studies, but restricted range is a variable of obvious importance in treatment trials directed at children with speech and language impairments. The low effect size for dialogic reading obtained by Crain-Thoreson and Dale should not by itself discourage use of dialogic reading with this population.
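The mechanism is easy to see in a toy simulation (all numbers hypothetical): give every child the same raw treatment gain, then compute the effect size within subgroups selected from different parts of the baseline distribution. Because the standard deviation is smaller in the restricted samples, the identical raw gain produces a larger d, and a band from the middle of the distribution is more restricted than a tail:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of baseline scores (mean 100, SD 15) and a
# treatment that raises every child's outcome by the same 4.5 raw points.
baseline = rng.normal(100, 15, size=200_000)
raw_gain = 4.5

def d_within(mask):
    """Effect size computed within a selected subgroup: the fixed raw
    gain divided by that subgroup's own standard deviation."""
    return raw_gain / baseline[mask].std(ddof=1)

d_full = d_within(np.ones_like(baseline, dtype=bool))  # whole population
d_middle = d_within(np.abs(baseline - 100) < 7.5)      # middle +/- 0.5 SD band
d_tail = d_within(baseline < 77.5)                     # low tail, below -1.5 SD

# Identical raw gain, but the standardized effect is inflated in the
# restricted samples, and most inflated in the middle band.
print(round(d_full, 2), round(d_middle, 2), round(d_tail, 2))
```

Under these assumptions the whole-population effect is about .30, while the same raw gain looks several times larger in the middle band and noticeably larger in the low tail, echoing the ordering of the dialogic reading results above.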

Finally, measurement quality plays a huge role, with longer, more reliable and valid tests yielding larger effect sizes than shorter, less valid ones. This might be important when comparing the relative effectiveness of therapy for different types of goals. Law et al. (2003) concluded, for example, that phonology therapy appeared to be more effective than therapy for syntax goals. For some reason the outcome measures in these two groups of studies tend to be very different. Phonology outcomes are typically assessed with picture-naming tasks that include 25 to 100 items, with the outcome often expressed as percent consonants correct, so that at the consonant level many items contribute to the test score. Sometimes the phonology outcome measure is created specifically to probe the child’s progress on the specific target of the phonology intervention. In both cases the outcome measure is likely to be a sensitive measure of the effects of the intervention. Surprisingly, in Law et al., the outcomes of the studies of syntax interventions were quite often omnibus measures of language functioning, such as the Preschool Language Scale or, worse, the Reynell Developmental Language Scales, neither test containing many items targeted specifically at the domain of the experimental intervention. When comparing effect sizes across studies, it is crucial to be sure that the outcome measures have equal reliability and validity as measures of the outcomes of interest.
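The reliability side of this argument can be quantified with the classical-test-theory attenuation approximation: measurement error inflates the outcome standard deviation by 1/√reliability, so the observed standardized effect shrinks to d_true × √reliability. A sketch, with purely hypothetical reliability values chosen for illustration:

```python
import math

def attenuated_d(true_d, reliability):
    """Classical-test-theory approximation: measurement error inflates
    the outcome SD by 1/sqrt(reliability), so the observed standardized
    effect is d_observed = d_true * sqrt(reliability)."""
    return true_d * math.sqrt(reliability)

true_d = 0.6  # hypothetical 'true' effect of the intervention

# A long, targeted probe (e.g., a many-item naming task scored at the
# consonant level) versus a short omnibus scale -- the reliability
# values here are assumptions, not published test statistics.
d_targeted = attenuated_d(true_d, 0.90)
d_omnibus = attenuated_d(true_d, 0.50)

print(round(d_targeted, 2))  # 0.57
print(round(d_omnibus, 2))   # 0.42
```

On these assumed numbers, the very same intervention effect would look about a third smaller simply because it was measured with a noisier instrument, quite apart from the separate question of whether the omnibus test is even a valid measure of the targeted domain.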

My conclusion is that it is important to not make a fetish of meta-analyses and effect sizes. These kinds of studies provide just one kind of information that should be taken into account when making treatment decisions. Their value is only as good as the underlying research—overall, effect sizes are most trustworthy when they come from the same study or a series of studies involving the exact same independent and dependent variables and the same study population. Given that this is a rare occurrence in speech and language research, there is no real substitute for a deep knowledge of an entire literature on any given subject. Narrative reviews from “experts” (a much maligned concept!) still have a role to play.


Almost, D., & Rosenbaum, P. (1998). Effectiveness of speech intervention for phonological disorders: a randomized controlled trial. Developmental Medicine and Child Neurology, 40, 319-325.

Crain-Thoreson, C., & Dale, P. S. (1999). Enhancing linguistic performance: Parents and teachers as book reading partners for children with language delays. Topics in Early Childhood Special Education, 19, 28-39.

Glogowska, M., Roulstone, S., Enderby, P., & Peters, T. (2000). Randomised controlled trial of community based speech and language therapy in preschool children. British Medical Journal, 321, 923-928.

Law, J., Garrett, Z., & Nye, C. (2003). Speech and language therapy interventions for children with primary speech and language delay or disorder (Cochrane Review). Cochrane Database of Systematic Reviews, Issue 3. Art. No.: CD004110. doi:10.1002/14651858.CD004110.

Lonigan, C. J., & Whitehurst, G. J. (1998). Relative efficacy of parent and teacher involvement in a shared-reading intervention for preschool children from low-income backgrounds. Early Childhood Research Quarterly, 13(2), 263-290.

Mol, S. E., Bus, A. G., de Jong, M. T., & Smeets, D. J. H. (2008). Added value of dialogic parent-child book readings: A meta-analysis. Early Education and Development, 19, 7-26.

Rvachew, S., & Brosseau-Lapré, F. (2018). Developmental Phonological Disorders: Foundations of Clinical Practice (Second Edition). San Diego, CA: Plural Publishing.

Simpson, A. (2017). The misdirection of public policy: comparing and combining standardised effect sizes. Journal of Education Policy, 1-17. doi:10.1080/02680939.2017.1280183

Whitehurst, G. J., Falco, F., Lonigan, C. J., Fischel, J. E., DeBaryshe, B. D., Valdez-Menchaca, M. C., & Caulfield, M. (1988). Accelerating language development through picture book reading. Developmental Psychology, 24, 552-558.

Yoder, P. J., Camarata, S., & Gardner, E. (2005). Treatment effects on speech intelligibility and length of utterance in children with specific language and intelligibility impairments. Journal of Early Intervention, 28(1), 34-49.

1. Introduction to the Wait Times Benchmarks Project

Access to speech, language, swallowing and hearing services is a critical concern across Canada. One indicator of the urgency of the problem is lengthy waits for service after a need has been identified in one of these areas. Of course, this issue is not restricted to communication health, as attested by the Wait Times Alliance, which was formed by doctors in 2004 to provide solutions to the problem of long waits for medical care in Canada’s publicly funded health service. Sadly, long waits for speech, language and hearing services are not specific to Canada: reports from Australia and the United Kingdom have highlighted concerns similar to those raised by families of children and adults who need services in Canada.

Although access to service is a multifaceted problem, there are many reasons that wait times in particular invite a common focus by clients, service providers, funders, and politicians as the essential issue to target for improvement. The recent report by the Wait Times Alliance (Time to Close the Gap; Wait Times Alliance, 2014) lists several:

  1. it is established that many other countries with universal health care have succeeded in providing timely access to service and therefore we should not tolerate long waits when they are clearly not necessary;
  2. it can be shown that long waits for necessary services impose a significant burden on patients who are waiting as well as on society in general; and
  3. long waits for service impair health system performance such that improvements to wait times should result in gains for the system as a whole.

These considerations are as crucial for speech, language, swallowing and hearing health as for any other sector of the health care system. One step toward improving wait times is the development of benchmarks that indicate the maximum time an individual should wait for service, taking into account the likelihood of significant clinical consequences should the wait be longer. The Pan Canadian Alliance of Speech-Language Pathology and Audiology Organizations has committed to establishing reasonable wait times benchmarks as the first step toward reducing wait times for services. A series of ad hoc committees recommended benchmark wait times for different diagnostic categories (see the Speech-Language and Audiology Canada (SAC) website). These wait times are being reviewed, reformatted according to a standard template, and released publicly to the clinical community one at a time, along with a published paper that provides the scientific foundation for each benchmark. The Benchmark Wait Times for Pediatric Speech Sound Disorders were released at the SAC Conference in May 2014 and the associated report was published in CJSLPA in Spring 2014. The revised Benchmarks for Pediatric Language Disorders will be released soon and the Benchmarks for Fluency Disorders are in progress.

In addition to releasing the benchmarks and the associated scientific reports, SAC will be providing additional information about benchmarks and their use in this blog, which will be cross-posted to the SAC website. We will be inviting feedback and participation from the SAC membership and other interested commenters with each release. The schedule of upcoming posts is shown below. We hope that you will follow the blog and consider commenting or contributing to this conversation.

Upcoming Posts

2. What is a Benchmark?

3. Approaches to Developing Wait Times Benchmarks

4. Evidence Based but not Evidence Bound

5. Use of Benchmarks by Clinicians and Policy Makers

6. Potential Advantages of Having Wait Times Benchmarks

7. Potential Disadvantages of Having Wait Times Benchmarks

8. Strategies for Achieving Wait Times Benchmarks

9. Factors that Impact on the Achievement of Wait Times Benchmarks

10. Role of the Pan Canadian Alliance and SAC in the Achievement of Wait Times Benchmarks

Wait Times Benchmarks for Speech-Language and Hearing Services

An important statement in the Universal Declaration of Communication Rights (International Communication Project 2014) is “We believe that people with communication disabilities should have access to the support they need to realize their full potential”. Even in those countries where speech-language pathology and audiology services are well established, long waits for service can be a significant barrier to communication for many children and adults. Twitter is a powerful tool for sharing knowledge and strategies for problem solving. This week on @WeSpeechies (see WESPEECHIES) we can share international perspectives on perceived appropriate wait times, actual wait times and strategies for reducing wait times for services around the world. When sharing information about this topic please identify yourself and provide general information about the nature of your clients and service sector while respecting privacy and confidentiality of specific individuals and organizations.

Q1. Approximately how long do your clients with speech-language needs wait for services? #WeSpeechies
Q2. Do you work with established expectations for wait times? How were the wait time benchmarks determined? #WeSpeechies
Q3. Do you think that clients with speech-language needs should have a guaranteed wait time for service? #WeSpeechies
Q4. What kinds of criteria for deciding who gets served first are most fair? #WeSpeechies