How to score iPad SAILS

As the evidence accrues for the effectiveness of SAILS as a tool for assessing and treating children’s (in)ability to perceive certain phoneme contrasts (see the blog post on the evidence here), the popularity of the new iPad SAILS app is growing. I am now getting questions about how to score the new app, so I provide a brief tutorial here. The norms are not built into the app because most of the modules are not normed. However, four of the modules are associated with normative data that can be used to give a sense of whether a child’s performance is within the expected range for their age/grade level. Those normative data have been published in our text “Developmental Phonological Disorders: Foundations of Clinical Practice” (derived from the sample described in Rvachew, 2007), but I reproduce the table here and show how to use it.

When you administer the modules lake, cat, rat and Sue, you will be provided with an overall score for each Level in the module as well as item-by-item scores on the Results page. As an example, I show the Results page below after administering the rat module.

[Screenshot: SAILS Results page for the rat module]

The screenshot shows the item-by-item performance for Level 2 of the rat module on the right-hand side. On the left-hand side we can see that the total score for Level 2 was 7/10 correct responses and the total score for Level 1 was 9/10 correct responses (we ignore responses to the Practice Level). To determine if the child’s perception of “r” is within normal limits, average performance across Levels 1 and 2: [(9+7)/20]*100 = 80% correct responses. This score can be compared to the normative data provided in Table 5-7 of the second edition of the DPD text, as reproduced below:

[Table: SAILS norms, reproduced from Table 5-7 of the second edition of the DPD text]

Specifically, a z-score should be calculated: (80 - 85.70)/12.61 = -0.45. In other words, if the child is in first grade, the z-score is calculated by taking the obtained score of 80%, subtracting the expected score of 85.70%, and dividing the result by the standard deviation of 12.61, which gives a z-score less than one standard deviation below the mean. Therefore, we are not concerned about this child’s perceptual abilities for the “r” sound. When calculating these scores, note that some modules have one test level, some have two, and some have three. The average score is therefore sometimes based on 10 total responses, sometimes on 20 total responses as shown here, and sometimes on 30 total responses.
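
If you prefer to script this arithmetic rather than work it out by hand, here is a minimal sketch in Python (the function names are my own; the norms shown are simply the first-grade mean and standard deviation for “r” from the example above, so substitute the values for your child’s grade and module):

```python
def percent_correct(level_scores, items_per_level=10):
    """Average performance across the test levels (Practice Level excluded)."""
    return 100 * sum(level_scores) / (items_per_level * len(level_scores))

def z_score(obtained, norm_mean, norm_sd):
    """Standardize the obtained percent-correct score against the norms."""
    return (obtained - norm_mean) / norm_sd

rat_levels = [9, 7]                  # Level 1 and Level 2 scores out of 10
score = percent_correct(rat_levels)  # (9 + 7) / 20 * 100 = 80.0
z = z_score(score, 85.70, 12.61)     # (80 - 85.70) / 12.61 = -0.45
print(f"{score:.1f}% correct, z = {z:.2f}")
```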

The child’s total scores across the four modules lake, cat, rat and Sue can also be averaged (ignoring all the Practice Levels) and compared against the means in the row labeled “all four”. Typically, however, you want to know about the child’s performance on a particular phoneme, because children’s perceptual difficulties are generally linked to the phonemes that they misarticulate.
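
For completeness, that composite is just the mean of the four module percentages, compared against the “all four” row in the same way (the module scores below are hypothetical, for illustration only):

```python
# Hypothetical percent-correct scores for the four normed modules
module_scores = {"lake": 90.0, "cat": 85.0, "rat": 80.0, "Sue": 75.0}

composite = sum(module_scores.values()) / len(module_scores)
print(f"Composite across the four modules: {composite:.1f}%")
# Compare this composite to the mean and standard deviation in the
# "all four" row of the norms table for the child's age/grade level.
```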

Normative data have not been obtained for any of the other modules. As a rule of thumb, however, a score of 7/10 or lower is not a good score: given that this is a two-alternative forced-choice task, a score this low suggests guessing, or responding not much better than chance.
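
To see why 7/10 is unimpressive, consider the probability of reaching that score by guessing alone on a two-alternative forced-choice task with 10 items per level; the quick calculation below (my own illustration, not part of the app) shows that a child who guesses on every trial will score 7/10 or better about 17% of the time:

```python
from math import comb

def prob_at_least(k, n=10, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct guesses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"P(7 or more correct by guessing) = {prob_at_least(7):.2f}")  # about 0.17
```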

We have previously found that children’s performance on this test is useful for treatment planning, in that children with these speech perception problems achieve speech accuracy faster when the underlying speech perception problem is treated. Furthermore, poor overall speech perception performance in children with speech delay is associated with slower development of phonological awareness and early reading skills.

I hope that you and your clients enjoy the SAILS task, which can be found on the App Store, with new modules uploaded from time to time: https://itunes.apple.com/ca/app/sails/id1207583276?mt=8

 


Single Subject Designs and Evidence Based Practice in Speech Therapy

I was really happy to see the tutorial on single-subject experimental designs in November’s issue of the American Journal of Speech-Language Pathology, by Byiers, Reichle, and Symons. The paper does not really present anything new, since it covers ground previously published by authors such as Kearns (1986). However, with the current focus on RCTs as the be-all and end-all of evidence-based practice, it was a timely reminder that single-subject designs have a lot to offer for EBP in speech therapy. It really irritates me when I see profs tell their students that speech therapy practice does not have an evidentiary base: many of our standard practices are well grounded in good-quality single-subject research (not to mention some rather nice RCTs from the sixties as well, but that is another story, maybe for another post).

Byiers et al. do a nice job of outlining the primary features of a valid single-subject experiment. The internal validity of the standard designs is completely dependent upon a stable baseline, with no improving trend in the data prior to the introduction of the treatment. They indicate that “by convention, a minimum of three baseline data points are required to establish dependent measure stability.” Furthermore, it is essential that treatment of one target not carry over to a second target before treatment for that second target is introduced; in other words, performance on any given target must remain stable until treatment for that specific target begins. The internal validity of the experiment is voided when stable baselines for each target are not established and maintained throughout their respective baseline periods. This is true even for the multiple-probe design, a variation on the multiple-baseline design in which the dependent measure is sampled at irregular intervals tied to the introduction of successive phases of the treatment program (as opposed to the regular and repeated measurement that occurs during each and every session of a multiple-baseline design). Even with the multiple-probe design, a series of closely spaced baseline probes is required at certain intervals to demonstrate stability of baselines just before you begin a new treatment phase. Furthermore, the design is an inappropriate choice unless a “strong a priori assumption of stability can be made” (see Horner and Baer, 1978).

I am interested in the multiple-probe design because it is the preferred design of the research teams that claim that the “complexity approach” to target selection in phonology interventions is effective and efficient. However, it is clear that the design is not appropriate in this context (in fact, given the research question, I would argue that all single-subject designs are inappropriate in this context). The reasoning behind the complexity approach is that treating complex targets results in generalization of learning to less complex targets. This is supposed to be more efficient than treating the less complex targets first, because those targets are expected to improve spontaneously without treatment (e.g., as a result of maturation) while not resulting in generalization to more complex targets. The problem, of course, is that improvements in less complex targets while you are treating a more complex one (especially when you get no improvement on the treatment target; see Cummings and Barlow, 2011) cannot be interpreted as a treatment effect. By the logic of a single-subject experiment, this outcome indicates that you do not have experimental control. To make matters worse, these improvements in generalization targets are often observed prior to the introduction of treatment – and indeed the a priori assumption is that these improvements in less complex targets will occur without treatment – that is the whole rationale behind avoiding them as treatment targets! Therefore, by definition, both the multiple-baseline and multiple-probe designs are invalid approaches to testing the complexity hypothesis. Without a randomized controlled trial one can only conclude that the changes observed in less complex targets in these studies are the result of maturation or history effects. (If you want to see what happens when you test the efficacy of the complexity approach using a randomized controlled trial, check out my publications: Rvachew & Nowak, 2001; Rvachew & Nowak, 2003; Rvachew, 2005; Rvachew & Bernhardt, 2010.)

Some recent single-subject studies have had some really nice outcomes for some children. Ballard, Robin and McCabe (2010) demonstrated an effective treatment for improving prosody in children with apraxia of speech, showing that work on pseudoword targets generalizes to real-word dependent measures. Skelton (2004) showed that you can literally randomize your task sequence and get excellent results for the treatment of /s/ with carryover to the nonclinic environment (in other words, you don’t have to follow the usual isolation-syllable-word-phrase-sentence sequence; rather, you can mix it up by practicing items at random difficulty levels on every trial). Both of these studies showed uneven outcomes for different children, however. Françoise and I suggested at ASHA 2012 that the “challenge point framework” helps to explain variability in outcomes across children. The trick is to teach targets that are at the challenge point for the child – not uniformly complex, but carefully selected to be neither too simple nor too complex for each individual child.

Both of these studies (Ballard et al. and the Skelton study) used a multiple-baseline design. This design tends to encourage the selection of complex targets because consistent 0% correct is as stable as a baseline can get. If you want to pick targets that are at the “challenge point”, you may be working on targets for which the child is demonstrating less stable performance. Fortunately, there is a single-subject design that does not require a stable baseline for internal validity – it is called the single-subject randomization design. We are using two different variations on this design in our current study of different treatments for childhood apraxia of speech. I will describe our application of the design in another post.