Single Subject Randomization Design For Clinical Research

During the week of April 23 – 29, 2017, Susan Ebbels curated WeSpeechies on the topic Carrying Out Intervention Research in SLP/SLT Practice. Susan kicked off the week with a link to her excellent paper discussing the strengths and limitations of various procedures for conducting intervention research in the clinical setting. As we would expect, a parallel groups randomized control design was deemed to provide the best level of experimental control. Many ways of studying treatment-related change within individual clients, with increasing degrees of control, were also discussed. However, all of the ‘within participant’ methods described were vulnerable, to varying degrees, to threats to internal validity such as history, selection, practice, fatigue, maturation, or placebo effects.

One design was missing from the list because it is only now appearing in the speech-language pathology literature: the Single Subject Randomization Design. This design (actually a group of designs in which treatment sessions are randomly allocated to treatment conditions) provides the superior internal validity of the parallel groups randomized control trial by controlling for extraneous confounds through randomization. As an added benefit, the results of a single subject randomization design can be submitted to a statistical analysis, so that clear conclusions can be drawn about the efficacy of the experimental intervention. At the same time, the design can be feasibly implemented in the clinical setting and is perfect for answering the kinds of questions that come up in daily clinical practice. For example, randomized control trials have shown that speech perception training is an effective adjunct to speech articulation therapy on average when applied to groups of children, but you may want to know whether it is a necessary addition to your therapy program for a specific child.
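The core idea, randomly allocating sessions to conditions, can be sketched in a few lines of code. This is a minimal illustration under my own assumptions: the function name and condition labels are hypothetical, and I add only a simple cap on consecutive same-condition sessions, whereas published randomization schemes often impose further restrictions (for example, equal numbers of sessions per condition).

```python
import random

def alternation_schedule(n_sessions, conditions=("treatment", "control"),
                         max_run=2, seed=None):
    """Randomly allocate sessions to conditions for an alternation-type
    design, capping runs of consecutive same-condition sessions so the
    conditions stay interspersed across the study."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_sessions):
        options = list(conditions)
        # If the last `max_run` sessions were all the same condition,
        # force a switch on this session.
        if len(schedule) >= max_run and all(
                x == schedule[-1] for x in schedule[-max_run:]):
            options.remove(schedule[-1])
        schedule.append(rng.choice(options))
    return schedule
```

Because each session's condition is decided by the schedule rather than by the clinician's judgment on the day, confounds such as fatigue, illness, or attention are distributed across conditions at random, which is exactly what licenses the statistical test later on.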

Furthermore, randomized single subject experiments are now accepted as a high level of research evidence by the Oxford Centre for Evidence Based Medicine. An evidence hierarchy has been created for rating single subject trials, placing randomized single subject experiments at the top, as shown in the following table, taken from Romeiser Logan et al. 2008.


Tanya Matthews and I have written a tutorial showing exactly how to implement and interpret two versions of the Single Subject Randomization Design, a phase design and an alternation design. The accepted manuscript is available but behind a paywall at the Journal of Communication Disorders. In another post I will provide a mini-tutorial showing how the alternation design could be used to answer a clinical question about a single client.

Further Reading

Ebbels, Susan H. 2017. ‘Intervention research: Appraising study designs, interpreting findings and creating research in clinical practice’, International Journal of Speech-Language Pathology: 1-14.

Kratochwill, Thomas R., and Joel R. Levin. 2010. ‘Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue’, Psychological Methods, 15: 124-44.

Romeiser Logan, L., R. Hickman, R.R. Harris, S.R. Harris, and C. Heriza. 2008. ‘Single-subject research design: recommendations for levels of evidence and quality rating’, Developmental Medicine and Child Neurology, 50: 99-103.

Rvachew, S. 1988. ‘Application of single subject randomization designs to communicative disorders research’, Human Communication Canada (now Canadian Journal of Speech-Language Pathology and Audiology), 12: 7-13. [open access]

Rvachew, S. 1994. ‘Speech perception training can facilitate sound production learning’, Journal of Speech and Hearing Research, 37: 347-57.

Rvachew, Susan, and Tanya Matthews. in press. ‘Demonstrating Treatment Efficacy using the Single Subject Randomization Design: A Tutorial and Demonstration’, Journal of Communication Disorders.


Auditory Motor Integration Intervention for CAS

In March 2013 I described the research we are conducting in my lab to identify individual differences in response to two different approaches to the treatment of Childhood Apraxia of Speech. I also described the unique single subject randomization design that we are using and presented some data for one child without revealing which intervention corresponded to the condition that worked best for this particular child. We have subsequently replicated this result with another child, so today I am going to write about the features of the intervention that children with difficulties in the area of transcoding appear to benefit from most clearly. Recall that transcoding is revealed in part by addition errors on the Syllable Repetition Task. The child profiled in the previous post added nasal consonants at syllable boundaries when asked to repeat the syllable strings, and he was just as likely to do this for short strings as for long, e.g., “mada” → [bᴂndə] and “manabada” → [mandabad]. This child also had difficulty with multisyllable repetition during the maximum performance tests but no difficulty with the single syllable diadochokinetic rate. Within-word inconsistency was borderline, with inconsistent word productions largely reflecting single feature errors (voicing errors, for example). Altogether the impression is of a true apraxia or motor planning disorder (as opposed to a phonological planning deficit, a more common problem that I will describe in a future post). Thus far we have assessed 18 children in this study and, remarkably, only three have presented with this particular profile.

Two of these children have shown the best response to an intervention that is directed at promoting auditory-motor integration. It includes input-oriented procedures that are described in Chapter 9 of my book combined with output-oriented procedures described in Chapter 10. The procedures are used to promote the consistent use of stimulable phonemes in the context of word shapes that are difficult for the child so that the focus is more on holistic movement patterns at the whole word level than on individual phonemes. In the case described here we taught novel “monster names” that had a strong-weak-strong stress pattern and word internal coda consonants such as “Biftenope” and “Hapnidreem” and assessed for carry-over to phrases with similar structures (pumpkin pie, bat mobile). 

One reason that we designed an intervention approach focused on auditory-motor integration is that there is evidence from the animal literature suggesting that this might be a foundational problem in the case of apraxia. Kurt, Fisher and Ehret examined sensory-motor association learning in mice with two different FoxP2 mutations. The task involved learning to avoid electric shock by leaping a hurdle (or not) to the other compartment of a box in response to varied tones that signaled the location of the shock. Mice with either mutation were impaired on the task, one mutation more severely than the other, in comparison to wild-type mice, which learned the task without difficulty. The second reason that we designed an intervention with an auditory-motor integration component is that the ability to modify motor plans in response to auditory feedback, and in relation to an auditory target, is theoretically essential to the acquisition of speech motor control.

So what does an intervention that focuses on auditory-motor integration look like? Not surprisingly, it has procedures that focus attention on the auditory-perceptual aspects of speech as well as procedures that focus on motor practice, none of the procedures themselves being novel or surprising. During the prepractice portion of each treatment session we ensured that the child had a good perceptual representation for the target words, using auditory bombardment and focused stimulation in meaningful contexts as well as error detection tasks as described in my teaching blog (scroll down to week 22). We also taught the child to monitor his own speech and respond differentially to his own correct or incorrect productions of the target words. For example, an appropriate activity might be for the child to “call” the monster and then place the monster in his sleeping bag in the tent if he heard himself produce the name correctly, or in an alternative sleeping bag out in the rain if he heard himself produce the name incorrectly (our students are endlessly creative and this variation on the game has proved popular with the children this year).

The practice part of the session proceeds, for the most part, as one would expect for any child with CAS, focusing on high intensity practice while the SLP provides just enough stimulation prior to each attempt to elicit a correct response more often than not. However, every effort is made to avoid providing too much feedback. Working in blocks of five trials each, summative knowledge of results is provided whenever possible: the child is given an opportunity to evaluate his own responses in relation to his own auditory goal without interference from SLP input, and then to compare his own judgment with the SLP’s count of correct responses at the end of each 5-trial run.
Edy Strand writes about the importance of giving the child time to integrate feedback in her chapter with Derbertine in Caruso and Strand (1999) and describes precisely how to do this. Given a high rate of responses (over 100 trials per 20-minute practice session) and an average of 70% correct responses, this child was able to make excellent progress as measured by both same day and next day probes (see the green bars on his chart here). A second child with the same profile also showed a significant benefit in favour of this approach. A third child is still being treated; it will be some time before we know whether he completes the protocol, and then many more months before blind coding of his results is finished. But we are hopeful!

Single Subject Randomization Design for CAS Intervention Research

I have recently returned from the excellent Childhood Apraxia of Speech Symposium sponsored by the Childhood Apraxia of Speech Association of North America and held in Atlanta last month. The scientific presentations were wonderful and I hope to have posts related to many of them over the next few months. I begin by highlighting Larry Shriberg’s presentation as it relates to my current CASANA funded intervention study; with some excitement, I am analyzing the data from the first cohort of participants this week, since it is our winter break from teaching.

Dr. Shriberg presented data recently published in Clinical Linguistics and Phonetics (Shriberg, Lohmeier, Strand & Jakielski, 2012). In this paper the authors describe the use of the Syllable Repetition Task (SRT) for the identification of CAS. The paper, the test, and all the information you need for scoring and interpreting the test data are available for download at The Phonology Project website. The SRT consists of 18 items of two to four syllables, made up of the consonants /m, n, b, d/ and the vowel /ɑ/, and is thus designed explicitly for children with speech delay. The task was administered to four large samples of children: Group 1, Typical Speech, Typical Language; Group 2, Speech Delay, Typical Language; Group 3, Speech Delay, Language Impairment; and Group 4, CAS, with this last group subdivided into idiopathic and neurogenetic etiological subtypes for some analyses. The test results were presented in the form of four scores: Competence, the total percentage of correctly repeated consonants; Encoding Processes, the percentage of within-class manner substitutions; Memorial Processes, the ratio of sounds correct in 3-syllable versus 2-syllable items; and Transcoding Processes, the percentage of items containing one or more addition errors, subtracted from 100 for directional clarity. Most interestingly, the latter three scores were not correlated with each other within any of the groups, although they were all moderately correlated with the Competence score. The CAS group showed worse performance than the other three groups on all of these measures, although their performance on the Transcoding measure was most distinctive. The diagnostic usefulness of the Transcoding score is much enhanced by also considering aspects of the children’s prosody in connected speech (inappropriate pauses, slow rate, lexical or phrasal stress errors).
In conclusion, these findings were taken as evidence that CAS is a multiple domain disorder, with low Encoding scores reflecting incomplete or poorly formed phonological representations, low Memorial scores reflecting difficulties with phonological memory, and low Transcoding scores reflecting a motor planning/programming deficit. Given that the paper presents group data, and that the Encoding, Memorial and Transcoding scores are not correlated with each other, it is not clear, however, that all children with CAS will show difficulties in all of these areas. It seems possible, if not likely, that there will be considerable heterogeneity within this population, with different children showing variant profiles across these three speech processes. The purpose of our study is to address this heterogeneity by examining response to three interventions in individual subjects.
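To make the four SRT-derived scores concrete, here is a rough sketch of how they might be computed from item-level transcriptions. To be clear about assumptions: the function name, the item record format, and the exact formulas are my own reading of the score descriptions above, not the official Phonology Project scoring procedure, which should be consulted for real scoring.

```python
# Hypothetical sketch only; see The Phonology Project for the
# actual SRT scoring forms and conventions.

def srt_scores(items):
    """items: one dict per SRT item, with keys:
       'syllables'     - item length (2, 3, or 4 syllables)
       'consonants'    - number of target consonants in the item
       'correct'       - consonants repeated correctly
       'substitutions' - total consonant substitutions
       'manner_subs'   - substitutions that stay within manner class
       'has_addition'  - True if the item contains an addition error
    """
    total_c = sum(i['consonants'] for i in items)
    competence = 100 * sum(i['correct'] for i in items) / total_c

    # Encoding: share of substitutions that preserve manner class
    # (one reading of "percentage of within-class manner substitutions")
    subs = sum(i['substitutions'] for i in items)
    manner = sum(i['manner_subs'] for i in items)
    encoding = 100 * manner / subs if subs else 100.0

    # Memorial: sounds correct in 3-syllable vs. 2-syllable items
    def pct_correct(n_syll):
        sel = [i for i in items if i['syllables'] == n_syll]
        return sum(i['correct'] for i in sel) / sum(i['consonants'] for i in sel)
    memorial = 100 * pct_correct(3) / pct_correct(2)

    # Transcoding: % of items with an addition error, subtracted
    # from 100 so that, like the other scores, higher means better
    additions = sum(1 for i in items if i['has_addition'])
    transcoding = 100 - 100 * additions / len(items)

    return competence, encoding, memorial, transcoding
```

The subtraction from 100 in the Transcoding score is worth noting: addition errors are the raw signal, but flipping the direction makes all four scores read the same way, so a low score always flags a problem.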

In a previous post I mentioned an alternative to traditional single subject designs that does not require a stable baseline while still allowing for statistical analysis. We are using one form of this design in this study, the single subject randomization design, specifically set up as a randomized block experiment as described in my paper on the application of these designs to communication disorders research (Rvachew, 1988). We have six children participating in the study this winter and three more enrolled for the spring. I provide partial data for one child in this post simply as a way of demonstrating the usefulness of this design for research on low incidence disorders. The child is school age with borderline verbal and nonverbal IQ, speech delay, and ADHD. Apraxia of speech was confirmed by administration of the Kaufman Speech Praxis Test and by maximum performance tasks revealing normal single syllable repetition rates but an inability to sequence three syllables consistently and at a normal rate. The results of the Syllable Repetition Task indicated an extremely low Competence score despite Encoding and Memorial processing within the average range for his age. He did have difficulties with Transcoding, however, as indicated by the characteristic addition of nasal consonants.

Three speech targets were selected for this boy: word internal codas, word-initial /l/ clusters, and word-initial velar stops (with baseline performance in single word naming at 50, 29, and 33 percent correct respectively). All targets were addressed via pseudowords linked to nonsense referents in a functional context. All targets received 20 minutes of concentrated practice per week using the integral stimulation hierarchy as described by Christine Gildersleeve-Neuman. However, the prepractice condition (which was implemented for 20 minutes prior to the practice session) varied for each target. The three prepractice conditions being compared in this study were randomly assigned to the targets with the following result: word internal codas were treated with input-oriented prepractice procedures, word-initial /l/ clusters were associated with sham prepractice procedures (the control condition), and velar stops were treated with output-oriented prepractice procedures. The input-oriented prepractice procedures included auditory bombardment and error detection tasks as described by Rvachew and Brosseau-Lapre (see also Chapter 9 of our book). The output-oriented procedures, described by Dodd and colleagues, aim to improve the child’s ability to independently build a phonological plan for a word by linking syllables and phonemes to graphical cues and then chaining the subword units. Phonetic placement was also incorporated into this condition as needed.

Raw Session and Next Day Probe Scores for One Child By Treatment Condition

In keeping with the randomized block design, the child received three treatment sessions per week, with each treatment condition/treatment target pair assigned at random to one of the three days on a week-by-week basis. Two outcome measures were recorded: the child’s responses to imitative phrase probes administered at the end of each session, to assess learning during a given intervention session, and his responses to imitative phrase probes administered at the beginning of the next session, to assess maintenance of learning. The child’s performance on these probes is shown in the figure: pastel bars are the session probes indexing session performance and solid bars are the next day probes indexing maintenance of learning to the next session. Different colours represent different prepractice conditions.

These probe scores were submitted to a nonparametric randomization test as described in Rvachew (1988). The results indicated no difference in probe performance at the end of each session as a function of prepractice condition, F(2,5) = 1.19, p = .392. However, there was a significant effect of prepractice condition on next day probe performance, F(2,5) = 23.01, p = .002. Now, I am going to make you crazy by not revealing which prepractice condition is associated with each colour! The reason is that this is just one child and I want to see the results for the other children. I have observed the responses of the other children and have reason to believe that there are indeed differences in actual learning as a function of prepractice condition, but we will feel more confident after having blinded transcriptions of probe data from more children.
It should be obvious that with this design there are many other variables that can influence the outcome, such as intrinsic differences in the difficulty of the targets, differences associated with the days of the week, and differences in clinician (although some of the same people were in the room during every session, the treating clinician was not the same during every session). Therefore we need to replicate the result many times before we can interpret it with confidence. However, I wanted to introduce readers to the SRT, the notion of CAS as a multiple domain disorder, and the single subject randomization design as a way of looking at the relationship between response to intervention and underlying psycholinguistic profile. I hope that you will stay tuned – we hope to take data from the first six children to ASHA13.
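For readers curious about the mechanics of the randomization test, here is a minimal sketch of a within-block permutation test of the general kind described in Rvachew (1988). The function name, data format, and choice of a simple between/within sum-of-squares ratio as the test statistic are my own assumptions; the published procedure may differ in detail. The key point is that the reference distribution is generated by re-permuting scores only within weeks, mirroring the weekly random assignment of condition/target pairs to session days.

```python
import itertools
import statistics

def block_randomization_test(scores):
    """scores: dict mapping condition name -> list of weekly probe
    scores, one per block (week), same length for every condition.
    Returns the p-value of a randomization test that respects the
    blocking: within each week, scores are shuffled across conditions
    in every possible way, and p is the proportion of arrangements
    whose test statistic is at least as extreme as the observed one."""
    conds = sorted(scores)
    n_blocks = len(scores[conds[0]])
    # One row per block, ordered by condition
    blocks = [[scores[c][b] for c in conds] for b in range(n_blocks)]

    def f_stat(assignments):
        # Simple ANOVA-like ratio: between-condition sum of squares
        # over within-condition sum of squares
        cond_means = [statistics.mean(col) for col in zip(*assignments)]
        grand = statistics.mean(cond_means)
        between = sum((m - grand) ** 2 for m in cond_means)
        within = sum(
            (x - cond_means[j]) ** 2
            for row in assignments for j, x in enumerate(row)
        )
        return between / within if within else float('inf')

    observed = f_stat(blocks)
    perms_per_block = [list(itertools.permutations(b)) for b in blocks]
    count = total = 0
    for combo in itertools.product(*perms_per_block):
        total += 1
        if f_stat(list(combo)) >= observed:
            count += 1
    return count / total
```

With three conditions and six weeks there are 6^6 = 46,656 equally likely arrangements, so the exact distribution is easily enumerated; this is why the design yields a valid p-value without requiring a stable baseline.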

Single Subject Designs and Evidence Based Practice in Speech Therapy

I was really happy to see the tutorial on single subject experimental designs by Byiers, Reichle, and Symons in the November issue of the American Journal of Speech-Language Pathology. The paper does not really present anything new, since it covers ground previously published by authors such as Kearns (1986). However, with the current focus on RCTs as the be-all and end-all of evidence based practice, it was a timely reminder that single-subject designs have a lot to offer for EBP in speech therapy. It really irritates me when I see profs tell their students that speech therapy practice does not have an evidentiary base: many of our standard practices are well grounded in good quality single subject research (not to mention some rather nice RCTs from the sixties, but that is another story, maybe for another post).

Byiers et al. do a nice job of outlining the primary features of a valid single-subject experiment. The internal validity of the standard designs is completely dependent upon a stable baseline, with no improving trend in the data prior to the introduction of the treatment. They indicate that “by convention, a minimum of three baseline data points are required to establish dependent measure stability.” Furthermore, it is essential that treatment of one target not carry over to a second target before treatment for that second target is introduced; in other words, performance on any given target must remain stable until treatment for that specific target begins. The internal validity of the experiment is voided when stable baselines for each target are not established and maintained throughout their respective baseline periods. This is true even for the multiple-probe design, a variation on the multiple-baseline design in which the dependent measure is sampled at irregular intervals tied to the introduction of successive phases of the treatment program (as opposed to the regular, repeated measurement that occurs during each and every session of a multiple-baseline design). Even with the multiple-probe design, a series of closely spaced baseline probes is required at certain intervals to demonstrate stability of the baselines just before each new treatment phase begins. Furthermore, the design is an inappropriate choice unless a “strong a priori assumption of stability can be made” (see Horner and Baer, 1978).

I am interested in the multiple probe design because it is the preferred design of the research teams that claim that the “complexity approach” to target selection in phonology interventions is effective and efficient. However, it is clear that the design is not appropriate in this context (in fact, given the research question, I would argue that all single subject designs are inappropriate in this context).  The reasoning behind the complexity approach is that treating complex targets results in generalization of learning to less complex targets. This is supposed to be more efficient than treating the less complex targets first because these targets are expected to improve spontaneously without treatment (e.g., as a result of maturation) while not resulting in generalization to more complex targets. The problem of course is that improvements in less complex targets while you are treating a more complex one (especially when you get no improvement on the treatment target, see Cummings and Barlow, 2011) cannot be interpreted as a treatment effect. By the logic of a single-subject experiment, this outcome indicates that you do not have experimental control. To make matters worse, these improvements in generalization targets are often observed prior to the introduction of treatment –  and indeed the a priori assumption is that these improvements in less complex targets will occur without treatment – that is the whole rationale behind avoiding them as treatment targets! And therefore, by definition, both the multiple baseline and multiple probe designs are invalid approaches to the test of the complexity hypothesis. Without a randomized control trial one can only conclude that the changes observed in less complex targets in these studies are the result of maturation or history effects. 
(If you want to see what happens when you test the efficacy of the complexity approach using a randomized control trial, check out my publications: Rvachew & Nowak, 2001; Rvachew & Nowak, 2003; Rvachew, 2005; Rvachew & Bernhardt, 2010).

Some recent single subject studies have had some really nice outcomes for some children. Ballard, Robin and McCabe (2010) demonstrated an effective treatment for improving prosody in children with apraxia of speech, showing that work on pseudoword targets generalizes to real word dependent measures. Skelton (2004) showed that you can literally randomize your task sequence and get excellent results for the treatment of /s/ with carryover to the nonclinic environment (in other words you don’t have to follow the usual isolation-syllable- word-phrase-sentence sequence; rather, you can mix it up by practicing items with random difficulty level on every trial). Both of these studies showed uneven outcomes for different children however. Francoise and I suggested at ASHA2012 that the “challenge point framework” helps to explain variability in outcomes across children. The trick is to teach targets that are at the challenge point for the child – not uniformly complex but carefully selected to be neither too simple nor too complex for each individual child.

Both of these studies (Ballard et al. and the Skelton study) used a multiple baseline design. This design tends to encourage the selection of complex targets because consistent 0% correct is as stable as you can get in a baseline. If you want to pick targets that are at the “challenge point” you may be working on targets for which the child is demonstrating less stable performance. Fortunately there is a single subject design that does not require a stable baseline for internal validity – it is called a single subject randomization design. We are using two different variations on this design in our current study of different treatments for childhood apraxia of speech. I will describe our application of the design in another post.