Is Acoustic Feedback Effective for Remediating “r” Errors?

I am very pleased to see a third paper published in the speech-language pathology literature using the single-subject randomization design that I have described in two tutorials, the first in 1988 and the second more recently. Tara McAllister Byun used the design to investigate the effectiveness of acoustic biofeedback treatment to remediate persistent “r” errors in 7 children aged 9 to 15 years. She used the single-subject randomized alternation design with block randomization, including a few unique elements in her implementation. She and her research team provided one traditional treatment session and one biofeedback treatment session each week for ten weeks; however, the order of the traditional and biofeedback sessions was randomized each week. Interestingly, each session targeted the same items (i.e., “r” was the speech sound target in both treatment conditions): rhotic vowels were tackled first and consonantal “r” was introduced later, in a variety of phonetic contexts. (This procedure differs from my own practice, in which, for example, Tanya Matthews and I randomly assign different targets to different treatment conditions.)

Another innovation is the outcome measure: a probe constructed of untreated “r” words was given at the beginning and end of each session, so that change over the session (Mdif) was the outcome measure submitted to statistical analysis (our tutorial explains that an advantage of the SSRD is that a nonparametric randomization test can be used to assess the outcome of the study, yielding a p value). In addition, 3 baseline probes and 3 maintenance probes were collected so that an effect size for overall improvement could be calculated. There are thus three time scales for measuring change in this study: (1) change from baseline to maintenance probes; (2) change from baseline to treatment performance, as reflected in the probes obtained at the beginning of each session and plotted over time; and (3) change over a session, reflected in the probes given at the beginning and end of each session. Furthermore, it is possible to compare within-session change for sessions provided with and without acoustic biofeedback.
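
To make the logic of that randomization test concrete, here is a minimal sketch in Python. The Mdif scores are invented; only the mechanics (treatment labels exchangeable within each weekly block, an exhaustive reference distribution, a one-tailed p value) reflect the analysis described above.

```python
from itertools import product

# Invented within-session change scores (Mdif) for each weekly block:
# (traditional, biofeedback) pairs for 10 weeks of treatment.
blocks = [(2, 5), (0, 4), (1, 3), (3, 6), (-1, 2),
          (2, 2), (0, 5), (1, 4), (2, 3), (0, 6)]

def statistic(pairs):
    """Mean biofeedback Mdif minus mean traditional Mdif."""
    trad = [t for t, b in pairs]
    bio = [b for t, b in pairs]
    return sum(bio) / len(bio) - sum(trad) / len(trad)

observed = statistic(blocks)

# Under the null hypothesis the two treatment labels within each weekly
# block are exchangeable, so the reference distribution enumerates all
# 2**10 = 1024 within-block label swaps.
extreme = 0
total = 0
for swaps in product((False, True), repeat=len(blocks)):
    permuted = [(b, t) if s else (t, b) for (t, b), s in zip(blocks, swaps)]
    if statistic(permuted) >= observed:
        extreme += 1
    total += 1

print(f"observed difference = {observed:.2f}, one-tailed p = {extreme / total:.3f}")
```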

I was really happy to see the implementation of the design, but it is fair to say that the results were a dog’s breakfast, as summarized below:

[Table: Byun (2017) acoustic biofeedback, summary of results by participant]

The table indicates that two participants (Piper, Clara) showed an effect of biofeedback treatment and generalization learning. Both showed rapid change in accuracy overall after treatment was introduced in both conditions, and both maintained at least some of that improvement after treatment was withdrawn. Garrat and Ian showed identical trajectories in the traditional and biofeedback conditions, with a late rise in accuracy during treatment sessions, large within-session improvements during the latter part of the treatment period, and good maintenance of those gains. However, neither boy achieved 60% correct responding at any point in the treatment program. Felix, Lucas and Evan demonstrated no change in probe scores in either condition across the course of the experiment. Lucas started at a higher level and therefore his probe performance was more variable: because he actually showed a within-session decline during traditional sessions while showing stable performance within biofeedback sessions, the statistics indicate a treatment effect in favour of acoustic biofeedback, but in fact no actual gains were observed.

So, this long description of the results brings me to two conclusions: (1) the alternation design was the wrong choice for the hypothesis in these experiments; and (2) biofeedback was not differentially effective for these children; even in those cases where it looks like there was an effect, the children were responsive to both biofeedback and the traditional intervention.

In a previous blog, I described the alternation design; however, there is another version of the single-subject randomization design that would be more appropriate for Tara’s hypothesis. The thing about acoustic biofeedback is that it is not fundamentally different from traditional speech therapy, involving a similar sequence of events: (i) the SLP says a word as an imitative model; (ii) the child imitates the word; (iii) the SLP provides informative or corrective feedback. In the case of incorrect responses in the traditional condition in Byun’s study, the SLP provided information about articulatory placement and reminded the child that the target involved certain articulatory movements (“make the back part of your tongue go back”). In the case of incorrect responses in the acoustic biofeedback condition, the SLP made reference to the acoustic spectrogram when providing feedback and reminded the child that the target involved certain formant movements (“make the third bump move over”). First, the first two steps overlap completely in both conditions; second, it can be expected that the articulatory cues given in the traditional condition will be remembered and their effects will carry over into the biofeedback sessions. Therefore we can consider acoustic biofeedback to be an add-on to traditional therapy, and what we want to know about is the value added.

For that question the phase design is more appropriate. In this case there would be 20 treatment sessions (2 per week over 10 weeks, as in Byun’s study), and each session would follow the same format: beginning probe (optional), 100 practice trials with feedback, ending probe. The difference is that the starting point for the introduction of acoustic biofeedback would be selected at random: all the sessions that precede the randomly selected start point would be conducted with traditional feedback, and all the remainder would be conducted with acoustic biofeedback. With the baseline and maintenance probes included, this yields a 26-session protocol as described by Byun; the first three sessions would always be designated as traditional and the last three as biofeedback, with the randomly selected switch point falling somewhere in between. Across the 7 children this would end up looking like a multiple baseline design, except that (1) the duration of the baseline phase would be determined by random selection for each child; and (2) the baseline phase is actually the traditional treatment, with the experimental phase testing the value-added benefit of biofeedback. There are three possible categories of outcomes: no change after introduction of the biofeedback, an immediate change, or a late change. As with any single subject design, the change might be in level, trend or variance, and the test statistic can be designed to capture any of those types of changes. The statistical analysis asks where the obtained test statistic falls among the test statistics associated with all of the possible random selections of the starting point. Rvachew & Matthews (2017) provides a more complete explanation of the statistical analysis.

I show below an imaginary result for Clara, using the data presented for her in Byun’s paper, as if the traditional treatment came first and then the biofeedback intervention. If we pretend that the randomly selected start point for the biofeedback intervention occurred exactly in the middle of the treatment period, the test statistic is the difference between the M(bf) and the M(trad) scores, resulting in -2.308. All other possible random selections of starting points for intervention lead to 19 other possible mean differences, and 18 of them are bigger than the obtained test statistic, leading to a p value of 18/20 = .9. In this data set the probe scores are actually bigger in the earlier part of the intervention, when the traditional treatment is used, and they do not get bigger when the biofeedback is introduced. These are the beginning probe scores actually obtained by Clara, yet Byun obtained a significant result in favour of biofeedback by using block randomization and examining change across each session. However, I am not completely sure that the improvements from beginning to ending probes are a positive sign: this result might reflect a failure to maintain gains from the previous session in one or the other condition.

[Figure: Hypothetical Clara in SSR Phase Design]
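
Here is a minimal sketch of the phase-design randomization test in Python, using invented probe scores that loosely echo the pattern described for Clara (gains early, under traditional feedback, then a plateau); the three fixed sessions at each end and the 20 eligible switch points follow the protocol described above.

```python
from statistics import mean

# Invented beginning-of-session probe scores for a 26-session protocol.
scores = [10, 12, 15, 18, 22, 25, 28, 30, 31, 32, 33, 33, 34,
          33, 34, 32, 33, 34, 33, 34, 33, 32, 34, 33, 34, 33]

# The first three sessions are always traditional and the last three are
# always biofeedback, leaving 20 eligible switch points (session indices
# 3 through 22, 0-indexed).
switch_points = range(3, 23)

def mean_diff(k):
    """M(bf) - M(trad) if biofeedback begins at session index k."""
    return mean(scores[k:]) - mean(scores[:k])

actual_switch = 13            # pretend the random draw fell mid-protocol
observed = mean_diff(actual_switch)

# p = proportion of eligible switch points whose statistic is at least as
# large as the observed one (one-tailed, expecting biofeedback to help).
reference = [mean_diff(k) for k in switch_points]
p_value = sum(s >= observed for s in reference) / len(reference)
print(f"observed = {observed:.3f}, p = {p_value:.2f}")
```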

There are several reasons to think that both interventions used in Byun’s study might result in unsatisfactory generalization and maintenance. We discuss the principles of generalization in relation to theories of motor learning in Developmental Phonological Disorders: Foundations of Clinical Practice. One important principle is that the child needs a well-established representation of the acoustic-phonetic target. All seven of the children in Byun’s study had poor auditory processing skills, but no part of the treatment program addressed phonological processing, phonological knowledge or acoustic-phonetic representations. A second principle is that the child must have the tools to monitor and use self-produced feedback (auditory, somatosensory) to evaluate success in achieving the target. Both the traditional and the biofeedback intervention put the child in the position of being dependent upon external feedback. The outcome measure focused attention on improvements from the beginning of the practice session to the end. However, the first principle of motor learning is that practice performance is not an indication of learning. The focus should have been on the sometimes large decrements in probe scores from the end of one session to the beginning of the next: the children had no means of maintaining those performance gains. Acoustic feedback may be a powerful means of establishing a new response, but it is a counterproductive tool for maintenance and generalization learning.

Reading

McAllister Byun, T. (2017). Efficacy of Visual–Acoustic Biofeedback Intervention for Residual Rhotic Errors: A Single-Subject Randomization Study. Journal of Speech, Language, and Hearing Research, 60(5), 1175-1193. doi:10.1044/2016_JSLHR-S-16-0038

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13. doi:10.1016/j.jcomdis.2017.04.003


How effective is phonology treatment?

Previously I asked whether it made sense to calculate effect sizes for phonology therapy at the within-subject level. In other words, from the clinical point of view, do we really want to know whether the child’s rate of change is bigger during treatment than it was when the child was not being treated? Or do we want to know if the child’s rate of change is bigger than the average amount of change observed among groups of children who get treated? If children who get treated typically change quite a bit and your client is not changing much at all, that might indicate a course correction (and note please, a course correction, not a treatment rest!). From this perspective, group-level effect sizes might be useful, so I am providing raw and standardized effect sizes here from three of my past studies, with a discussion to follow.

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.

The first data set involves 48 four-year-old children who scored at the second percentile, on average, on the GFTA (and 61 percent consonants correct in conversation). They were randomly assigned to receive treatment for relatively early developing, stimulable sound targets (ME group, n=24) or late developing, unstimulable sound targets (LL group, n=24). Each child received treatment for four sounds over 2 six-week blocks, during twelve 30- to 40-minute treatment sessions. The treatment approach employed traditional articulation therapy procedures. The children did not receive homework or additional speech and language interventions during this 12-week period. Outcome measures included single-word naming probes covering all consonants in 3 word positions, and percent consonants correct (PCC) in conversation, with 12 to 14 weeks intervening between the pre- and post-test assessments.

The table below shows two kinds of effect sizes for the ME group and the LL group: the raw effect size (raw ES) with the associated confidence interval (CI), which indicates the mean pre- to post-change in percent consonants correct on probes and in conversation; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for three outcome measures: single-word naming probe scores for unstimulable phonemes, probe scores for stimulable phonemes, and percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor.

[Figure: Effect size blog figure 2]
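
For readers who want to compute the same statistics from their own data, here is a minimal sketch in Python; the pre/post scores are invented, not the study data, and the scipy import is used only for the t critical value.

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

# Invented pre/post probe scores (percent correct) for ten children;
# the real values live in the table above.
pre  = [10, 5, 0, 12, 8, 3, 0, 15, 6, 2]
post = [35, 20, 5, 40, 22, 3, 10, 45, 18, 12]

diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)

raw_es = mean(diffs)                    # raw ES: mean pre-to-post change
sd_diff = stdev(diffs)
half_ci = t.ppf(0.975, n - 1) * sd_diff / sqrt(n)   # 95% CI half-width
d_z = raw_es / sd_diff                  # Cohen's d(z): standardized mean
                                        # difference for paired scores
no_change = sum(d <= 0 for d in diffs)  # children with 0 or negative change

print(f"raw ES = {raw_es:.1f} [{raw_es - half_ci:.1f}, {raw_es + half_ci:.1f}], "
      f"d(z) = {d_z:.2f}, no change: {no_change}/{n} ({100 * no_change / n:.0f}%)")
```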

Some initial conclusions can be drawn from this table. The effect sizes for change in probe scores are all large. However, the group that received treatment for stimulable sounds showed greater improvement for both treated stimulable sounds and untreated unstimulable sounds, compared to the group that received treatment for unstimulable sounds. There was almost no change overall in PCC derived from the conversational samples. I can report that 10 children in the ME group and 6 children in the LL group achieved improvements of greater than 5 PCC points, judged to be a “minimally important change” by Thomas-Stonell et al. (2013). However, half the children achieved no change at all in PCC (conversation).

Rvachew, S., Nowak, M., & Cloutier, G. (2004). Effect of phonemic perception training on the speech production and phonological awareness skills of children with expressive phonological delay. American Journal of Speech-Language Pathology, 13, 250-263.

The second data set involves 34 four-year-old children who scored at the second percentile, on average, on the GFTA (and approximately 60 percent consonants correct in conversation). All of the children received 16 hour-long speech therapy sessions, provided once weekly. The treatment that they received was entirely determined by their SLP with regard to target selection and approach to intervention. Ten SLPs provided the interventions: 3 used the Hodson cycles approach, 1 a sensory-motor approach, and the remainder a traditional articulation therapy approach. The RCT element of this study is that the children were randomly assigned to an extra treatment procedure that occurred during the final 15 minutes of each session, concealed from their SLP. Children in the control group (n=17) listened to ebooks and answered questions. Children randomly assigned to the PA group (n=17) played a computer game that targeted phonemic perception and phonological awareness, covering 8 phonemes in word-initial and then word-final position. Although the intervention lasted 4 months, the interval between pre-treatment and post-treatment assessments was 6 months long.

The table below shows two kinds of effect sizes for the control group and the PA group: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct (PCC) obtained from conversations recorded while the child looked at a wordless picture book with the assessor; and PCC-difficult, derived from the same conversations but restricted to phonemes that were produced with less than 60% accuracy at intake; in other words, phonemes that were potential treatment targets, specifically /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/.

[Figure: Effect size blog figure 3]

The sobering finding here is that the control group effect size for potential treatment targets is the smallest, with half the group making no change and the other half making a small change. The effect size for PCC (all) in the control group is more satisfying in that it is better than the minimally important change (i.e., 8% > 5%); 13 children in this group achieved a change of more than 5 points and only 3 made no change at all. The effect sizes are large in the group that received the speech perception/PA intervention in addition to their regular SLP program, with good results for PCC (all) and PCC-difficult. This table shows that the SLP’s choice of treatment procedures makes a difference to speech accuracy outcomes.

Rvachew, S., & Brosseau-Lapré, F. (2015). A randomized trial of 12-week interventions for the treatment of developmental phonological disorder in francophone children. American Journal of Speech-Language Pathology, 24(4), 637-658. doi:10.1044/2015_AJSLP-14-0056

The third data set involves 64 French-speaking four-year-olds who were randomly assigned to receive either an output-oriented intervention (n = 30) or an input-oriented intervention (n = 34) for remediation of their speech sound disorder. Another 10 children who were not treated also provide effect size data here. The children obtained PCC scores of approximately 70% on the Test Francophone de Phonologie, indicating severe speech sound disorder (consonant accuracy is typically higher in French-speaking children, compared to English). The children received other interventions as well, as described in the research report (home programs and group phonological awareness therapy), with the complete treatment program lasting 12 weeks.

The table below shows two kinds of effect sizes for the output-oriented, input-oriented and untreated groups: the raw effect size (raw ES) with the associated confidence interval (CI) indicates the mean pre- to post-change in percent consonants correct; next is the standardized mean difference, Cohen’s d(z); finally, I show the number and percentage of children who did not change (0 and negative change scores). These effect sizes are shown for two outcome measures: percent consonants correct with glides excluded (PCC), obtained from the Test Francophone de Phonologie, a single-word naming test; and PCC-difficult, derived from the same test but restricted to phonemes that were produced with less than 60% accuracy at intake, specifically /ʃ,ʒ,l,ʁ/. An outcome measure restricted to phonemes that were absent from the inventory at intake is not possible for this group because French-speaking children with speech sound disorders have good phonetic repertoires for the most part, their speech errors tending to involve syllable structure (see Brosseau-Lapré and Rvachew, 2014).

[Figure: Effect size blog figure 4]

There are two satisfying findings here: first, when we do not treat children with a speech sound disorder, they do not change, and when we do treat them, they do! Second, when children receive an appropriate suite of treatment elements, large changes in PCC can be observed even over an observation interval as short as 12 weeks.

Overall Conclusions

  1. In the introductory blog to this series, I pointed out that Thomas-Stonell and her colleagues had identified a PCC change of 5 points as a “minimally important change”. The data presented here suggest that this goal can be met for most children over a 3- to 6-month period when children are receiving an appropriate intervention. The only case where this minimum standard was not met on average was in Rvachew & Nowak (2001), a study in which a strictly traditional articulation therapy approach was implemented at low intensity with no homework component.
  2. The measure that we are calling PCC-difficult might be more sensitive and more ecologically valid for 3- and 6-month intervals. This is percent consonants correct restricted to potential treatment targets, that is, consonants produced with less than 60% accuracy at intake. These turn out to be mid- to late-developing, frequently misarticulated phonemes: /ŋ,k,ɡ,v,ʃ,ʧ,ʤ,θ,ð,s,z,l,ɹ/ in English and /ʃ,ʒ,l,ʁ/ in French for these samples of 4-year-old children with severe and moderate-to-severe primary speech sound disorders. My impression is that when providing an appropriate intervention an SLP should expect at least a 10% change in these phonemes, whether assessed with a broad-based single-word naming probe or in conversation; in fact a 15% change is closer to the average. This does not mean that you should treat the most difficult sounds first! Look carefully at the effect size data from Rvachew and Nowak (2001): when we treated stimulable phonemes we observed a 15% improvement in difficult unstimulable sounds. You can always treat a variety of phonemes from different levels of the phonological hierarchy, as described in a previous blog.
  3. Approximately 10% of 4-year-old children with severe and moderate-to-severe primary speech sound disorders do not improve at all over a 3- to 6-month period, given adequate speech therapy. If a child is not improving, the SLP and the parent should be aware that this is a rare event that requires special attention.
  4. In a previous blog I cited some research evidence for the conclusion that patients treated as part of research trials achieve better outcomes than patients treated in a usual care situation. There is some evidence for that in these data. The group in Rvachew, Nowak and Cloutier that received usual care obtained a lower effect size (d=0.45) in comparison to the group that received an extra experimental intervention (d=1.31). In practical terms this difference meant that the group that received the experimental intervention made four times more improvement in the production of difficult sounds than the control group that received usual care.
  5. The variation in effect sizes shown in these data indicates that SLP decisions about treatment procedures and service delivery options have implications for success in therapy. What are the characteristics of the interventions that led to relatively large changes in PCC or relatively large standardized effect sizes? (i) Comprehensiveness, that is, the inclusion of intervention procedures that target more than one level of representation, e.g., procedures to improve articulation accuracy and speech perception skills and/or phonological awareness; and (ii) parent involvement, specifically the inclusion of a well-structured and supported home program.

If you see other messages in these data, or have observations from your own practice or research, please write to me in the comments.


Are effect sizes in research papers useful in SLP practice?

[Figure: Effect size blog figure 1]

Effect sizes are now required in addition to statistical significance reporting in scientific reports. As discussed in a previous blog, effect sizes are useful for research purposes because they can be aggregated across studies to draw conclusions (i.e., in a meta-analysis). However, they are also intended to be useful as an indication of the “practical consequences of the findings for daily life.” Therefore, Gierut, Morrisette, and Dickinson’s paper “Effect Size for Single-Subject Design in Phonological Treatment” was of considerable interest to me when it was published in 2015. They report the distribution of effect sizes for 135 multiple baseline studies; the effect size statistic was calculated with the mean of the treatment phase minus the mean of the baseline phase as the numerator, and the pooled standard deviation of the baseline phase as the denominator. In these studies, the mean and the variance of probe scores in the baseline phase are restricted to be very small by design, because the treatment targets and generalization probe targets must show close to stable 0% correct performance during the baseline phase. The consequence of this restriction is that the effect size number will be very large even when the raw amount of performance change is not so great. Therefore the figure above shows hypothetical data that yield exactly their average effect size of 3.66 (specifically, [.0857 - .0125]/.02 = 3.66). This effect size is termed a medium effect size in their paper, but I leave it to the reader to decide whether a change of not quite 9% accuracy in speech sound production is an acceptable level of change. It may be, given that in these studies a treatment effect is operationalized as probe scores (single-word naming task) for all the phonemes that were absent from the child’s repertoire at intake. From the research point of view this paper provides very important information: it permits researchers to compare effect sizes and explore variables that account for between-case differences in effect sizes, in those cases where the researchers have used a multiple baseline design and treatment intensities similar to those reported in this paper (5 to 19 one-hour sessions, typically delivered 3 times per week).
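
A minimal sketch of that calculation in Python, with invented probe proportions, shows how a near-zero baseline variance inflates the statistic even when the raw gain is modest:

```python
from statistics import mean, stdev

# Invented single-word probe accuracies (proportions correct) for one
# multiple-baseline case.
baseline  = [0.00, 0.02, 0.01, 0.02, 0.00]
treatment = [0.04, 0.07, 0.09, 0.10, 0.12, 0.11]

# The statistic as described above: treatment-phase mean minus
# baseline-phase mean, divided by the baseline standard deviation
# (pooled across a study's baseline probes in the paper).
d = (mean(treatment) - mean(baseline)) / stdev(baseline)
print(f"d = {d:.2f}")  # a large d, even though the raw gain is well
                       # under 10 percentage points
```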

The question I am asking myself is whether the distribution of effect sizes reported in this paper is helpful to clinicians who are concerned with the practical significance of these studies. I ask this because I am starting to see manuscripts reporting clinical case studies in which the data are used to claim “large treatment effects” for a single case (using Gierut et al.’s standard of an effect size of 6.32 or greater). Indeed, in the clinical setting SLPs will be asked to consider whether their clients are making “enough” progress. For example, in Rvachew and Nowak (2001) we asked parents to rate their agreement with the statement “My child’s communication skills are improving as fast as can be expected.” (This question was on our standard patient satisfaction questionnaire, so in fact we asked every parent this question, not just the ones in this RCT.) But the parent responses in the RCT showed significant between-group differences on this question that aligned with the dramatic differences in child response to the traditional versus complexity approach to target selection that was tested in that study (e.g., 34% vs. 17% of targets mastered in these groups, respectively). It seems to me that when parents ask themselves this question they have multiple frames of reference: not only do they consider the child’s communicative competence before and after the introduction of therapy, they consider whether their child would make more or less change with other hypothetical SLPs and other treatment approaches, given that parents actually have choices about these things. Therefore, an effect size that says, effectively, “the child made more progress with treatment than with no treatment” is not really answering the parent’s question. However, with a group design it is possible to calculate an effect size that reflects change relative to the average amount of change one might expect, given therapy. To my mind this kind of effect size comes closer to answering the questions about practical significance that a parent or employer might ask.
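
One way to operationalize that idea, sketched below with invented numbers, is to standardize an individual client’s change score against the distribution of change scores in a treated reference group. This is my illustration of the logic, not a published procedure.

```python
from statistics import mean, stdev

# Invented PCC change scores for a treated reference group, plus one
# client whose progress we want to benchmark.
group_changes = [12, 8, 15, 5, 20, 10, 7, 14, 9, 11]
client_change = 3

# Standardize the client's change against the change that treated
# children typically achieve, rather than against "no treatment".
z = (client_change - mean(group_changes)) / stdev(group_changes)
print(f"z = {z:.2f}")  # well below zero: slower progress than is
                       # typical under treatment, which might prompt
                       # a course correction
```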

This still leaves us with the question of what kind of change to describe. It is unfortunate that there are few if any controlled studies that have reported functional measures. I can think of some examples of descriptive studies that reported functional measures, however. First, Campbell (1999) reported that good functional outcomes were achieved when preschoolers with moderate and severe Speech Delay received twice-weekly therapy over a 90- to 120-day period (i.e., on average the children’s speech intelligibility improved from approximately 50% to 75% intelligible, as reported by parents). Second, there are a number of studies reporting ASHA NOMS (functional communication measures provided by treating SLPs) for children receiving speech and language therapy. However, Thomas-Stonell et al. (2007) found that improvement on the ASHA NOMS was not as sensitive as parental reports of “real life communication change” over a 3- to 6-month interval. Therefore, Thomas-Stonell and her colleagues developed the FOCUS to document parental reports of functional outcomes in a reliable and standardized manner.

Thomas-Stonell et al. (2013) reported changes in FOCUS scores for 97 preschool-aged children who received an average of 9 hours of SLP service in Canada, comparing change during the waiting period (60-day interval) to change during the treatment period (90-day interval). FOCUS assessments demonstrated significantly more change during treatment (about 18 FOCUS points on average) than during the wait period (about 6 FOCUS points on average). They then compared minimally important changes in PCC, the Children’s Speech Intelligibility Measure (CSIM), and FOCUS scores for 28 preschool-aged children. The FOCUS measure was significantly correlated with the speech accuracy and intelligibility measures, but there was not perfect agreement among these measures. For example, 21/28 children obtained a minimally important change of at least 16 points on the FOCUS, but 4 of those children did not show significant change on PCC/CSIM. In other words, speech accuracy, speech intelligibility and functional improvements are related but not completely aligned; each provides independent information about change over time.

In controlled studies, some version of percent consonants correct is a very common treatment outcome used to assess the efficacy of phonology therapy. Gierut et al. (2015) focused specifically on change in those phonemes that are late developing and produced with very low accuracy, if not completely absent from the child’s repertoire at intake. This strikes me as a defensible measure of treatment outcome. Regardless of whether one chooses to treat a complex sound, an early developing sound, or a medium-difficulty sound (or one of each, as I demonstrated in a previous blog), presumably the SLP wants to have dramatic effects across the child’s phonological system. Evidence that the child is adding new sounds to the repertoire is a good indicator of that kind of change. Alternatively, the SLP might count increases in correct use of all consonants that were potential treatment targets prior to the onset of treatment. Or the SLP could count percent consonants correct for all consonants, because this measure is associated with intelligibility and takes into account the fact that there can be regressions in previously mastered sounds when phonological reorganization is occurring. The number of choices suggests that it would be valuable to have effect size data for a number of possible indicators of change. More to the point, Gierut et al.’s single-subject effect size implies that almost any change above “no change” is an acceptable level of change in a population that receives intervention because they are stalled without it. I am curious to know whether this is a reasonable position to take. In my next blog post I will report effect sizes for these speech accuracy measures taken from my own studies going back to 2001. I will also discuss the clinical significance of the effect sizes that I will aggregate. I am going to calculate the effect size for paired mean differences along with the corresponding confidence intervals for groups of preschoolers treated in three different studies. I haven’t done the calculations yet, so, for those readers who are at all interested in this, you can hold your breath with me.

References

Campbell, T. F. (1999). Functional treatment outcomes in young children with motor speech disorders. In A. Caruso & E. A. Strand (Eds.), Clinical Management of Motor Speech Disorders in Children (pp. 385-395). New York: Thieme Medical Publishers, Inc.

Gierut, J. A., Morrisette, M. L., & Dickinson, S. L. (2015). Effect Size for Single-Subject Design in Phonological Treatment. Journal of Speech, Language, and Hearing Research, 58(5), 1464-1481. doi:10.1044/2015_JSLHR-S-14-0299

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 1-12. doi:10.3389/fpsyg.2013.00863

Rvachew, S., & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.

Thomas-Stonell, N., McConney-Ellis, S., Oddson, B., Robertson, B., & Rosenbaum, P. (2007). An evaluation of the responsiveness of the pre-kindergarten ASHA NOMS. Canadian Journal of Speech-Language Pathology and Audiology, 31(2), 74-82.

Thomas-Stonell, N., Oddson, B., Robertson, B., & Rosenbaum, P. (2013). Validation of the Focus on the Outcomes of Communication under Six outcome measure. Developmental Medicine and Child Neurology, 55(6), 546-552. doi:10.1111/dmcn.12123


Maternal Responsiveness to Babbling

Over the course of my career the most exciting change in speech-language pathology practice has been the realization that we can have an impact on speech and language development by working with the youngest patients, intervening even before the child “starts to talk”. Our effectiveness with these young patients depends upon the growing body of research on the developmental processes that underlie speech development during the first year of life. Now that we know that the emergence of babbling is a learned behavior, influenced by auditory and social inputs, this kind of research has mushroomed, although our knowledge remains constrained because these studies are hugely expensive, technically difficult, and time consuming to conduct. Therefore I was very excited to see a new paper on the topic in JSLHR this month:

Fagan, M. K., & Doveikis, K. N. (2017). Ordinary Interactions Challenge Proposals That Maternal Verbal Responses Shape Infant Vocal Development. Journal of Speech, Language, and Hearing Research, 60(10), 2819-2827. doi:10.1044/2017_JSLHR-S-16-0005

The purpose of this paper was to examine the hypothesis that maternal responses to infant vocalizations are a primary cause of the age-related change in the maturity of infant speech during the period 4 through 10 months of age. This time period encompasses three stages of infant vocal development: (1) the expansion stage, that is, producing vowels and a broad variety of vocalizations that are not speech-like but nonetheless exercise vocal parameters such as pitch, resonance and vocal tract closures; (2) the canonical babbling stage, that is, producing speech-like CV syllables, singly or in repetitive strings; and (3) the integrative stage, that is, producing a mix of babbling and meaningful words. In the laboratory, contingent verbal responses from adults increase the production rate of mature syllables by infants. Fagan and Doveikis asked whether this shaping mechanism, demonstrated in the laboratory, explains the course of infant speech development in natural interactions in real-world settings. They coded 5 and a quarter hours of natural interactions recorded between mothers and infants in the home environment, from 35 dyads in a cross-sectional study. Their analysis focused on maternal behaviors in the 3-second interval following an infant vocalization, defined as a speech-like vowel or syllable-type utterance. They were specifically interested to know whether maternal vocalizations in this interval would be responsive (prompt, contingent, relevant to the infant’s vocal behavior, e.g., affirmations, questions, imitations) or nonresponsive (prompt but not meaningfully related to the infant’s vocal behavior, e.g., activity comment, unrelated comment, redirect). This is a summary of their findings:

  • Mothers vocalized 3 times more frequently than infants.
  • One quarter of maternal vocalizations fell within the 3-second interval after an infant vocalization.
  • About 40% of the prompt maternal vocalizations were responsive and the remainder were nonresponsive, according to their definitions (derived from Bornstein et al., 2008).
  • Within the category of responsive maternal vocalizations, the most common were questions and affirmations.
  • A maternal vocalization of some kind occurred promptly after 85% of all infant utterances.
  • Imitations of the infant utterance (also in the responsive category) occurred after approximately 11% of infant utterances (my estimate from their data).
  • Mothers responded preferentially to speech-like vocalizations but not differentially to CV syllables versus vowel-only syllables. In other words, it did not appear that maternal reinforcement or shaping of mature syllables could account for the emergence and increase in this behavior with infant age.

One reason I like this paper so much is that some of the results accord with data that we are collecting in my lab, in a project coordinated by my doctoral student Pegah Athari, who is showing great skill and patience, having worked her way through 10 hours of recordings from 5 infants in a longitudinal study (3 months of recording from each infant, but covering ages 6 through 14 months overall). The study is designed to explore mimicry specifically, as a responsive utterance that may be particularly powerful (mimicry involves full or partial imitation of the preceding utterance). We want to be able to predict when mimicry will occur and to understand its function. In our study we examine the 2-second intervals that precede and follow each infant utterance. Another important difference is that we record the interactions in the lab, but there are no experimental procedures; we arrange the setting and materials to support interactions that are as naturalistic as possible. These are some of our findings:

  • Mothers produced 1.6 times as many utterances as their infants.
  • Mothers said something after the vast majority of the infant’s vocalizations just as observed by Fagan and Doveikis.
  • Instances in which one member of the dyad produced an utterance that is similar to the other were rare, but twice as common in the direction of mother mimicking the infant (10%), compared to the baby mimicking the mother (5%).
  • Infant mimicry of the mother is significantly (but not completely) contingent on the mother modeling one of the infant’s preferred sounds in her utterance (mean contingency coefficient = .34; a sketch of this calculation follows the list).
  • Maternal mimicry is significantly (but not completely) contingent on perceived meaningfulness of the child’s vocalization (mean contingency coefficient = .35). In other words, it seems that the mother is not specifically responding to the phonetic character of her infant’s speech output; rather, she makes a deliberate attempt to teach meaningful communication throughout early development.
  • The number of utterances that the mother perceives to be meaningful increases with the infant’s age, although this is not a hard and fast rule because regressions occur when the infant is ill and the canonical babbling ratio declines. Mothers will also respond to non-speech-like utterances in the precanonical stage as being meaningful (animal noises, kissing and so forth).
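
We have not published the details of our analysis yet, so the sketch below simply illustrates the standard Pearson contingency coefficient applied to an invented 2×2 table of the kind implied by the bullets above.

```python
from math import sqrt
from scipy.stats import chi2_contingency

# Invented 2x2 counts for one dyad: did the mother model one of the
# infant's preferred sounds, and did the infant then mimic her?
table = [[12, 28],   # preferred sound modeled:     mimicry / no mimicry
         [ 3, 57]]   # preferred sound not modeled: mimicry / no mimicry

chi2, p, dof, expected = chi2_contingency(table)
n = sum(sum(row) for row in table)
C = sqrt(chi2 / (chi2 + n))   # Pearson's contingency coefficient
print(f"C = {C:.2f}, p = {p:.3f}")
```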

We want to replicate our findings with another 5 infants before we try to publish our data, but I feel confident that our conclusions will be subtly different from Fagan and Doveikis’, despite general agreement with their suggestion that self-motivation factors and access to auditory feedback of their own vocal output play a primary role in infant vocal development. I think that maternal behavior may yet prove to have an important function, however. It is necessary to think about learning mechanisms in which low-frequency random inputs are actually helpful. I have talked about this before on this blog, in a post about the difference between exploration and exploitation in learning. Exploration is a phase during which trial-and-error actions help to define the boundaries of the effective action space and permit discovery of actions that are most rewarding. Without exploration one might settle on a small repertoire of actions that are moderately rewarding and never discover others that will be needed as one’s problems become more complex. Exploitation is the phase during which you use the actions that you have learned to accomplish increasingly complex goals.

The basic idea behind the exploration-exploitation trade-off is that long-term learning is supported by using an exploration strategy early in the learning process. Specifically, many studies have shown that more variable responding early in learning is associated with easier learning of difficult skills later in the learning process. For early vocal learning, the expansion stage corresponds to this principle nicely: the infant produces a broad variety of vocalizations, including squeals, growls, yells, raspberries, quasiresonant and fully resonant vowels, and combinations called marginal babbles. These varied productions lay the foundations for the production of speech-like syllables during the coming canonical babbling stage. Learning theorists have demonstrated that environmental inputs can support this kind of free exploration. Specifically, a high reinforcement rate will promote a high response rate, but it is important to reinforce variable responses early in the learning process.

In the context of mother-infant interactions, it may be that mothers reinforce many different kinds of infant vocalizations in the early stages because they are trying to teach words: the infant is not really capable of producing real words, so the mother has to work with what she hears. She does do something after almost every infant utterance, however, so she encourages many different practice trials on the part of the infant. It is also possible (although not completely proven) that imitative responses on the part of the mother are particularly reinforcing to the infant. In the short excerpt of a “conversation” between a mum and her 11-month-old infant shown here, it can be seen that she responds to every one of the infant’s utterances, encouraging a number of variable responses and specifically mimicking those that are most closely aligned with her intentions.

[Excerpt: IDV11E03A]

It is likely that when alone in the crib, the infant’s vocalizations will be more repetitive, permitting more specific practice of preferred phonetic forms such as “da” (infants are known to babble more when alone than in dyadic interactions, especially when scientists feed back their vocalizations over loudspeakers). The thing is, the infant’s goals are not aligned with the mother’s. In my view, the most likely explanation for infant vocal learning is self-supervised learning. The infant is motivated to produce specific utterances and finds achievement of those utterances to be intrinsically motivating. What kind of utterances does the infant want to produce? Computer models of this process have settled on two factors: salience and learning progress. That is, the infant enjoys producing sounds that are interesting and that are not yet mastered. The mother’s goals are completely different (teach real words), but her behaviors in this regard serve the infant’s goals nonetheless by: (1) supporting perceptual learning of targets that correspond to the ambient language; (2) encouraging sound play and practice by responding to the infant’s attempts with a variety of socially positive behaviors; (3) reinforcing variable productions by modeling a variety of forms and accepting a variety of attempts as approximations of meaningful utterances when possible; and (4) increasing the salience of speech-like utterances through mimicry of these rare utterances. The misalignment of the infant’s and the mother’s goals is helpful to the process because if the mother were trying to teach the infant specific phonetic forms (CV syllables, for example), the exploration process might be curtailed prematurely and self-motivation mechanisms might be hampered.

What are the clinical implications of these observations? I am not sure yet. I need a lot more data to feel confident that I can predict maternal behavior in relation to infant behavior. But it strikes me that SLPs engage in a number of parent teaching practices that assume that responsiveness by the parent is a “good thing”, when it is not certain that parents typically respond to their infant’s vocalizations in quite the ways that we expect. In the meantime, procedures to encourage vocal play are a valuable part of your tool box, as described in Chapter 10 of our book:

Rvachew, S., & Brosseau-Lapré, F. (2018). Developmental Phonological Disorders: Foundations of Clinical Practice (2nd ed.). San Diego, CA: Plural Publishing, Inc.


Testing Client Response to Alternative Speech Therapies

Buchwald et al. published one of the many interesting papers in a recent special issue on motor speech disorders in the Journal of Speech, Language, and Hearing Research. In their paper they outline a common approach to speech production, one that is illustrated and discussed in some detail in Chapters 3 and 7 of our book, Developmental Phonological Disorders: Foundations of Clinical Practice. Buchwald et al. apply it in the context of Acquired Apraxia of Speech, however. They distinguish between patients who produce speech errors subsequent to left-hemisphere cerebrovascular accident as a consequence of motor planning difficulties versus phonological planning difficulties. Specifically, in their study there are four such patients, two in each subgroup.

Acoustic analysis was used to determine whether their cluster errors arose during phonological planning or in the next stage of speech production, motor planning. The analysis involves comparing the durations of segments in triads of words like this: /skæmp/ → [skæmp], /skæmp/ → [skæm], /skæm/ → [skæm]. The basic idea is that if segments such as [k] in /sk/ → [k] or [m] in /mp/ → [m] are produced as they would be in a singleton context, then the errors arise during phonological planning; alternatively, if they are produced as they would be in the cluster context, then the deletion errors arise during motor planning. This leads the authors to hypothesize that patients with these different error types would respond differently to intervention. So they treated all four patients with the same treatment, described as “repetition based speech motor learning practice”. Consistent with their hypothesis, the two patients with motor planning errors responded to this treatment and the two with phonological planning errors did not, as shown in the table of pre- versus post-treatment results.

[Table: Buchwald et al. results (corrected)]
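
The duration logic can be sketched in a few lines of Python; the durations are invented, and the classification rule (nearest condition mean) is a deliberately crude stand-in for the acoustic analysis that Buchwald et al. actually performed.

```python
from statistics import mean

# Invented [k] durations (ms): correct singleton productions, correct
# cluster productions, and one /sk/ -> [k] error token.
singleton_k = [85, 90, 88, 92]
cluster_k   = [60, 65, 62, 58]
error_k     = 63

# If the error token patterns with singleton [k], the /s/ was never in
# the plan (phonological planning); if it patterns with cluster [k],
# the plan was right but execution failed (motor planning).
def classify(token, singleton, cluster):
    if abs(token - mean(singleton)) < abs(token - mean(cluster)):
        return "phonological planning"
    return "motor planning"

print(classify(error_k, singleton_k, cluster_k))  # -> motor planning
```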

However, as the authors point out, a significant limitation of this study is that the design is not experimental. Because experimental control was not established either within or across speakers, it is difficult to draw conclusions.

I find the paper to be of interest on two accounts nonetheless. First, their hypothesis is exactly the same hypothesis that Tanya Matthews and I posed for children who present with phonological versus motor planning deficits. Second, their hypothesis is fully compatible with the application of a single-subject randomization design. Therefore it provides me with an opportunity to follow through on my promise from the previous blog, to demonstrate how to set up this design for clinical research.

For her dissertation research, Tanya identified 11 children with severe speech disorders and inconsistent speech sound errors who completed our full experimental paradigm. These children were diagnosed with either a phonological planning disorder or a motor planning disorder using the Syllable Repetition Task and other assessments, as described in our recent CJSLPA paper, available open access here. Using those procedures, we found that 6 had a motor planning deficit and 5 had a phonological planning deficit.

Then we hypothesized that the children with motor planning disorders would respond to a treatment that targeted speech motor control. Much like Buchwald et al.’s intervention, it included repetition practice according to the principles of motor practice during the practice parts of the session; during prepractice, however, children were taught to identify the target words and to identify mispronunciations of the target words, so that they would be better able to integrate feedback and self-correct during repetition practice. Notice that direct and delayed imitation are important procedures in this approach. We called this the auditory-motor integration (AMI) approach.

For children with phonological planning disorders, we hypothesized that they would respond to a treatment aligned with the principles suggested by Dodd et al. (i.e., the core vocabulary approach). Specifically, the children were taught to segment the target words into phonemes, associating the phonemes with visual cues. Then we taught the children to chain the phonemes back together into a single word. Finally, during the practice component of each session, we encouraged the children to produce the words, using the visual cues when necessary. An important component of this approach is that auditory-visual models are not provided prior to the child’s production attempt: the child is forced to construct the phonological plan independently. We called this the phonological memory & planning (PMP) approach.

We also had a control condition that consisted solely of repetition practice (CON condition).

The big difference between our work and Buchwald et al.’s is that we tested our hypothesis using a single-subject block randomization design, as described in our recent tutorial in the Journal of Communication Disorders. The design was set up so that each of the 11 children experienced all three treatments. We chose 3 treatment targets for each child, randomly assigned the targets to the three treatments, and then randomly assigned the treatments to each of three sessions, scheduled to occur on different days of the week, 3 sessions per week for 6 weeks. Each week counts as one block, so there are 6 blocks of 3 sessions, for 18 sessions in total. The randomization scheme was generated blindly and independently using computer software for each child. The diagram below shows the treatment schedule for one of the children with a motor planning disorder.

[Diagram: Block randomization treatment schedule, TASC02]
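
Generating a schedule of this kind is straightforward. Here is a minimal sketch in Python; the treatment labels follow the post, while the target names and everything else are placeholders.

```python
import random

# Three targets are randomly assigned to the three treatments once, then
# within each weekly block the treatments are randomly ordered across
# that week's three sessions.
treatments = ["AMI", "PMP", "CON"]
targets = ["target-1", "target-2", "target-3"]

random.shuffle(targets)
assignment = dict(zip(treatments, targets))  # fixed for the whole study

for week in range(1, 7):                     # 6 blocks x 3 sessions = 18
    order = random.sample(treatments, k=3)   # random order within block
    for session, tx in enumerate(order, start=1):
        print(f"week {week}, session {session}: {tx} ({assignment[tx]})")
```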

This design allowed us to compare response to the three treatments within each child using a randomization test. For this child, the randomization test revealed a highly significant difference in favour of the AMI treatment as compared to the PMP treatment, as hypothesized for children with motor planning deficits. I don’t want to scoop Tanya’s thesis, because she will finish it soon, before the end of 2017 I’m sure, but the long and the short of it is that we have very clear results in favour of our hypothesis using this fully experimental design and the statistics that are licensed by it. I hope you will check out our tutorial on the application of this design: we show how flexible and versatile this design can be for addressing many different questions about speech-language practice. There is much exciting work being done in the area of speech motor control, and this is a design that gives researchers and clinicians an opportunity to obtain interpretable results with small samples of children with rare or idiosyncratic profiles.

Reading

Buchwald, A., & Miozzo, M. (2012). Phonological and Motor Errors in Individuals With Acquired Sound Production Impairment. Journal of Speech, Language, and Hearing Research, 55(5), S1573-S1586. doi:10.1044/1092-4388(2012/11-0200)

Rvachew, S., & Matthews, T. (2017). Using the Syllable Repetition Task to Reveal Underlying Speech Processes in Childhood Apraxia of Speech: A Tutorial. Canadian Journal of Speech-Language Pathology and Audiology, 41(1), 106-126.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders, 67, 1-13. doi:10.1016/j.jcomdis.2017.04.003


How to choose a control condition for speech therapy research

This post is an addendum to a previous post, “What is a control group?”, inspired by a recently published paper (“Control conditions for randomised trials of behavioural interventions in psychiatry: a decision framework”, Early View, Lancet Psychiatry, March 2017). Following a brief review of the literature on effect sizes associated with different types of control conditions, the authors offer a framework for choosing an appropriate control condition in behavioral trials. The types of control conditions discussed are as follows:

  • Active comparator
  • Minimal treatment control
  • Nonspecific factors control
  • No-treatment control
  • Patient choice
  • Pill placebo
  • Specific factors component control
  • Treatment as usual
  • Waitlist control

The considerations for choosing one of these control conditions for testing a behavioral intervention are (1) participant risk; (2) trial phase; and (3) available resources. With respect to participant risk, more active interventions should be provided as the control condition when the risk of withholding treatment (especially when known effective treatments are available) is high. Therefore, when making this decision, the characteristics of the participant population and of the available treatments will play a role in the decision-making process.

Regarding trial phase, early-stage exploratory trials should be concerned with the risk of Type II error; in other words, the researcher will want to maximize the chances of finding a benefit of a potentially helpful new intervention. Therefore, a waitlist control group might be appropriate at this stage of the research process, given that waitlist controls are associated with large effect sizes in behavioral trials. In the later stages of the research program, the researcher should strive to minimize Type I error; in other words, it is important to guard against concluding that an ineffective treatment is helpful. In this case an active comparator would be a logical choice, although the sample size would need to be large given that the effect size is likely to be small.

Finally, the resources available to the researchers will influence the choice of control condition. For example, in a late-stage trial an active comparator provided by trained and monitored study personnel would be the best choice in most circumstances; however, in this case the provision of the control may be at least as expensive as the provision of the experimental treatment. When sufficient resources are lacking, the cost-effective alternative might be to ask the usual community provider to administer treatment as usual, although every effort should be made to describe the control intervention in detail.

A very nice graphic is provided (Figure 2) to illustrate the decision framework and can be applied to speech therapy trials. There are a number of interventions that have been in use or are emerging in speech therapy practice with a minimal evidence base. We can consider the choice of appropriate control condition for the assessment of these interventions.

Ultrasound intervention for school-aged children with residual speech errors has been examined in quite a number of single-subject studies but is now overdue for a randomized controlled trial. Given that the exploratory work has been completed in single-subject trials, I would say that we could proceed to a phase 3 RCT. The risk to the participant population is more difficult to conceptualize. You could say that it is low because these children are not at particular risk for poor school outcomes or other harmful sequelae of non-intervention, and the likelihood of a good speech outcome will not change much after the age of nine. The cost of providing an active control will be high because these children are often low priority for intervention in the school setting. Therefore, according to Figure 2, a no-treatment control would be appropriate when you make this assumption. On the other hand, you could argue that the participant risk of NOT improving is very high: all the evidence demonstrates that residual errors do not improve without treatment after this age. If you consider the participant risk to be higher, especially considering community participation and psychosocial factors, then the appropriate control condition would be something more vigorous: patient choice, an active comparator, a nonspecific factors control or a specific factors component control. Given the relatively early days of this research, small trials utilizing these control conditions, in that order, might be advisable.

Metaphon as a treatment for four-year-olds with severe phonological delay and associated difficulties with phonological processing has not, to my knowledge, been tested with a large-scale RCT. The population would be high risk by definition, due to the likelihood of experiencing delays in the acquisition of literacy skills if the speech delay is not resolved prior to school entry. Effective treatment options are known to exist. Therefore, the appropriate control condition would be an active comparator; in other words, another treatment that is known to be effective with this population. Another option would be a specific factors component control that examines the efficacy of specific components of the Metaphon approach. For example, the meaningful minimal pairs procedure could be compared directly to the full Metaphon approach, with speech and phonological processing skills as the outcome variables. Similar trials have been conducted by Anne Hesketh and in my own lab (although not involving Metaphon specifically).

PROMPT has still not been tested in good-quality single-subject or parallel-groups research. If a phase 2 trial were planned for three-year-olds with suspected apraxia of speech, treatment as usual would be the appropriate control condition according to Figure 2. The speech condition is too severe to ethically withhold treatment, and the research program is not advanced enough for a specific factors component control, although this would be the next step.

Finally, an RCT of the effectiveness of Speech Buddies to stimulate /s/ in three-year-olds with speech delay could be implemented. In this case, the participant group would be low risk, given the likelihood of spontaneous resolution of the speech delay. Given a phase 2 trial, either a no-treatment or a waitlist control could be implemented.

The authors of this framework conclude by recommending that researchers justify their choice of control condition in every trial protocol. They further recommend that a waitlist control is acceptable only when it is the only ethical choice, and they state that “no behavioral treatment should be included in treatment guidelines if it is only supported by trials using a waitlist control group or meta-analytic evidence driven by such trials.” To me, this is eminently sensible advice for speech and language research as well.

And this I believe concludes my trilogy of posts on the control group!

Further Reading

What is a control group? Developmental Phonological Disorders blog post, February 5, 2017

Using effect sizes to choose a speech therapy approach, Developmental Phonological Disorders blog post, January 31, 2017

Gold, S. M., Enck, P., Hasselmann, H., Friede, T., Hegerl, U., Mohr, D. C., & Otte, C. (2017). Control conditions for randomised trials of behavioural interventions in psychiatry: a decision framework. The Lancet Psychiatry. doi:10.1016/S2215-0366(17)30153-0

Hesketh, A., Dima, E., & Nelson, V. (2007). Teaching phoneme awareness to pre-literate children with speech disorder: a randomized controlled trial. International Journal of Language and Communication Disorders, 42(3), 251-271.

Rvachew, S., & Brosseau-Lapré, F. (2015). A Randomized Trial of 12-Week Interventions for the Treatment of Developmental Phonological Disorder in Francophone Children. American Journal of Speech-Language Pathology, 24(4), 637-658. doi:10.1044/2015_AJSLP-14-0056

What is a control group?

I have a feeling that my blog might become less popular in the next little while because you may notice an emerging theme on research design, and away from speech therapy procedures specifically! But identifying evidence based procedures requires knowledge of research design, and it has come to my attention, as part of the process of publishing two randomized control trials (RCTs) this past year, that there are a lot of misperceptions in the SLP and education communities, among both clinicians and researchers, about what an RCT is. Therefore, I am happy to draw your attention to this terrific blog by Edzard Ernst, and in particular to an especially useful post, “How to differentiate good from bad research”. The writer points out that a proper treatment of this topic “must inevitably have the size of a book” because each of the indicators that he provides “is far too short to make real sense.” So I have taken it upon myself in this blog to expand upon one of his indicators of good research, one that I know causes some confusion, specifically:

  • Use of a placebo in the control group where possible.

Recently the reviewers (and editor) of one of my studies were convinced that my design was not an RCT because the children in both groups received an intervention. In the absence of a “no-treatment control,” they said, the study could not be an RCT! I was mystified about the source of this strange idea until I read Ernst’s blog and realized that many people, recalling their research courses from university, must be mistaking “placebo control” for “no-treatment control.” However, a placebo control condition is not at all like the absence of treatment. Consider the classic example of a placebo control: in a drug trial, patients randomized to the treatment arm visit the nurse, who hands each one a white paper cup holding 2 pink pills containing active ingredient X and some other ingredients that do not impact the patient’s disease (i.e., inactive ingredients); patients randomized to the control arm also visit the nurse, who hands each one a white paper cup holding 2 pink pills containing only the inactive ingredients. In other words, the experiment is designed so that all patients are “treated” exactly the same, except that only patients randomized to treatment receive (unknowingly) the active ingredient. Therefore, all changes in patient behavior that are due to those aspects of the treatment that are not the active treatment (visiting the nice nurse, expecting the pills to make a difference, etc.) are equalized across arms of the study. These are called the “common factors” or “nonspecific factors.”

In the case of a behavioral treatment it is important to equalize the common factors across all arms of the study. Therefore, in my own studies I deliberately avoid “no treatment” controls. In my very first RCT (Rvachew, 1994), for example, the treatment conditions in the two arms of the study were as follows:

  • Experimental: 10 minutes of listening to sheet vs Xsheet recordings and judging correct vs incorrect “sheet” items (active ingredient) in a computer game format followed by 20 minutes of traditional “sh” articulation therapy, provided by a person blind to the computer game target.
  • Control: 10 minutes of listening to Pete vs meat recordings and judging correct vs incorrect “Pete” items in a computer game format followed by 20 minutes of traditional “sh” articulation therapy, provided by a person blind to the computer game target.

It can be seen that the study was designed to ensure that all participants experienced exactly the same treatment except for the active ingredient, which was reserved for children randomly assigned to the experimental treatment arm: specifically, the experience of listening to and making perceptual judgments about a variety of correct and incorrect (distorted) versions of words beginning with “sh”, the sound that the children misarticulated. Subsequently I have conducted all my randomized control studies in a similar manner. But, as I said earlier, I run across readers who vociferously assert that the studies are not RCTs because an RCT requires a “no treatment” control. In fact, a “no treatment” control is a very poor control indeed, as argued in this blog that explains why the frequently used “wait list control group” is inappropriate. For example, a recent trial on the treatment of tinnitus claimed that a wait list control had merit because “While this comparison condition does not control for all potential placebo effects (e.g., positive expectation, therapeutic contact, the desire to please therapists), the wait-list control does account for the natural passing of time and spontaneous remission.” In fact, it is impossible to control for common factors when using a wait list control, and it is unlikely that patients are actually “just waiting” when you randomize them to the “wait list control” condition; therefore, Hesser et al.’s defense of the wait list control is optimistic, although their effort to establish how much change you get in this condition is worthwhile.

We had experience with a “wait list” comparison condition in a recent trial (Rvachew & Brosseau-Lapré, 2015). Most of the children were randomly assigned to one of four different treatment conditions, matched on all factors except the specific active ingredients of interest. However, we also had a nonexperimental wait list comparison group* to estimate change for children outside of the trial. We found that parents were savvy about maximizing the treatment that their children could receive in any given year. Our trial lasted six weeks, the public health system entitled them to six weeks of treatment, and their private insurance entitled them to six to twelve weeks of therapy depending on the plan. Parents would agree to enroll their child in the trial, with randomization to a treatment arm, if their child was waiting for the public service, OR they would agree to be assessed in the “wait list” arm if their child was currently enrolled in the public service. They would use their private insurance when all other options had been exhausted. Therefore, the children in the “wait list” arm were actually being treated. Interestingly, we found that the parents expected their children to obtain better results from the public service, because it was provided by a “real” SLP rather than the student SLPs who provided our experimental treatments, even though the public service was considerably less intense! (As an aside, we were not surprised to find that the reverse was true.) Similarly, as I have mentioned in previous blogs, Yoder et al. (2005) found that the children in their “no treatment” control accessed more treatment from other sources than did the children in their treatment arm. And parents randomized to the “watchful waiting” arm of the Glogowska et al. (2000) trial sometimes dropped out, because parents will do what they must to meet their child’s needs.

In closing, a randomized control trial is simply a study in which participants are randomly assigned to an experimental treatment and a control condition (even in a cross-over design, in which all participants experience all conditions, as in Rvachew et al., in press). The nature of the control should be determined after careful thought about the factors that you are attempting to control, which can be many: placebo, Hawthorne, fatigue, practice, history, maturation, and so on. These will vary from trial to trial, obviously. Placebo control does not mean “no treatment” but rather a treatment that excludes everything except the “active ingredient” that is the subject of your trial. As an SLP, when you are reading about studies that test the efficacy of a treatment, you need to pay attention to what happens to the control group as well as the treatment group. The trick is to think in every case: What is the active ingredient that explains the effect seen in the treatment group? What else might account for the effects seen in the treatment arm of this study? If I implement this treatment in my own practice, how likely am I to get a better result compared to the treatment that my caseload is currently receiving?
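
As an aside, “randomly assigned” is itself a concrete procedure. Below is a minimal sketch of one common scheme, permuted-block randomization, which keeps the two arms balanced as participants are recruited; this is my own illustration, not the procedure used in any of the trials discussed here.

```python
import random

def permuted_block_assignment(n_participants, block_size=4, seed=42):
    """Randomly assign participants to two arms in permuted blocks,
    keeping the arms balanced throughout recruitment."""
    assert block_size % 2 == 0, "block size must be even for 1:1 allocation"
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = (["experimental"] * (block_size // 2) +
                 ["control"] * (block_size // 2))
        rng.shuffle(block)  # random order within each block
        assignments.extend(block)
    return assignments[:n_participants]

print(permuted_block_assignment(10))
```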

* A colleague sent me a paper (Mercer et al., 2007) in which a large number of researchers, advocating for the acceptance of a broader array of research designs in order to focus more attention on external validity and translational research, got together to discuss the merits of various designs. During the symposium it emerged that there was disagreement about the use of the terms “control” and “comparison” group. I use the terms in accordance with a minority of their attendees, as follows: control group means that the participants were randomly assigned to a group that did not experience the “active ingredient” of the experimental treatment; comparison group means that the participants were not randomly assigned to the group that did not experience the experimental intervention, a group that may or may not have received a treatment. This definition was ultimately not adopted by the attendees; I don’t know why, but somehow they settled on a different definition that didn’t make any sense at all. I invite you to consult p. 141 and see if you can figure it out!

References

Glogowska, M., Roulstone, S., Enderby, P., & Peters, T. (2000). Randomised controlled trial of community based speech and language therapy in preschool children. British Medical Journal, 321, 923-928.

Hesser, H., Weise, C., Rief, W., & Andersson, G. (2011). The effect of waiting: A meta-analysis of wait-list control groups in trials for tinnitus distress. Journal of Psychosomatic Research, 70(4), 378-384. doi:10.1016/j.jpsychores.2010.12.006

Mercer, S. L., DeVinney, B. J., Fine, L. J., Green, L. W., & Dougherty, D. (2007). Study Designs for Effectiveness and Translation Research: Identifying Trade-offs. American Journal of Preventive Medicine, 33(2), 139-154.e132. doi:10.1016/j.amepre.2007.04.005

Rvachew, S. (1994). Speech perception training can facilitate sound production learning. Journal of Speech and Hearing Research, 37, 347-357.

Rvachew, S., & Brosseau-Lapré, F. (2015). A randomized trial of 12-week interventions for the treatment of developmental phonological disorder in francophone children. American Journal of Speech-Language Pathology, 24(4), 637-658. doi:10.1044/2015_AJSLP-14-0056

Rvachew, S., Rees, K., Carolan, E., & Nadig, A. (in press). Improving emergent literacy with school-based shared reading: Paper versus ebooks. International Journal of Child-Computer Interaction. doi:10.1016/j.ijcci.2017.01.002

Yoder, P. J., Camarata, S., & Gardner, E. (2005). Treatment effects on speech intelligibility and length of utterance in children with specific language and intelligibility impairments. Journal of Early Intervention, 28(1), 34-49.

Using effect sizes to choose a speech therapy approach

I am quite intrigued by the warning offered by Adrian Simpson in his paper “The misdirection of public policy: comparing and combining standardised effect sizes”.

The context for the paper is the tendency of public policy makers to rely on meta-analyses to make decisions such as, for example, should we improve teachers’ feedback skills or reduce class sizes as a means of raising student performance? Simpson shows that meta-analyses (and meta-analyses of the meta-analyses!) are a poor tool for making these apples-to-oranges comparisons and cannot be relied upon as a source of information when making public policy decisions such as this. He identifies three specific issues with research design that invalidate the combining and comparing of effect sizes. I think that these are good issues to keep in mind when considering effect sizes as a clue to treatment efficacy and a source of information when choosing a speech or language therapy approach.

Recall that an effect size is a standardized mean difference, whereby the difference between means (i.e., the mean outcome of the treatment condition versus the mean outcome of the control condition) is expressed in standard deviation units. The issue is that the standard deviation units, which are supposed to reflect the variation in outcome scores between participants in the intervention trial, actually reflect many different aspects of the research design. Therefore, if you compare the effect size of an intervention as obtained in one treatment trial with the effect size for another intervention as obtained in a different treatment trial, you cannot be sure that the difference is due to differences in the relative effectiveness of the two treatments. And yet SLPs are asking themselves these kinds of questions every day: Should I use a traditional articulation therapy approach or a phonological approach? Should I add nonspeech oral motor exercises to my traditional treatment protocol? Is it more efficient to focus on expressive language or receptive language goals? Should I use a parent training approach or direct therapy? And so on. Why is it unsafe to combine and compare effect sizes across studies to make these decisions?
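
For concreteness, here is a minimal sketch of the computation, using Cohen’s d with a pooled standard deviation; the outcome scores are invented for illustration.

```python
import numpy as np

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    t, c = np.asarray(treatment, float), np.asarray(control, float)
    pooled_var = (((len(t) - 1) * t.var(ddof=1) +
                   (len(c) - 1) * c.var(ddof=1)) /
                  (len(t) + len(c) - 2))
    return (t.mean() - c.mean()) / np.sqrt(pooled_var)

# Hypothetical outcome scores (e.g., percent consonants correct)
print(cohens_d([72, 65, 80, 74, 69], [61, 58, 66, 63, 60]))  # about 2.3
```

Everything in the denominator matters: who was sampled and how the outcome was measured both change d even when the raw difference between means stays the same, which is the root of the problems described below.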

The first issue that Simpson raises is that of comparison groups. Many, although not all, treatment trials compare an experimental intervention to either a ‘no treatment’ control group or a ‘usual care’ condition. The characteristics of the ‘no treatment’ and ‘usual care’ controls are inevitably poorly described, if described at all. And yet meta-analyses will combine effect sizes across many studies despite having a very poor sense of what the control condition was in the studies included in the final estimate of treatment effect. Control group and intervention descriptions can be so paltry that in some cases the experimental treatment of one study may be equivalent to the control condition of another study. The Law et al. (2003) review combined effect sizes for a number of RCTs evaluating phonological interventions. One trial compared a treatment provided in 22 twice-weekly half-hour sessions over a four-month period to a wait list control (Almost & Rosenbaum, 1998). Another involved monthly 45-minute sessions provided over 8 months, in comparison to a “watchful waiting” control in which many parents “dropped out” of the control condition (Glogowska et al., 2000). Inadequate information was provided about how much intervention the control group children accessed while they waited; almost anything is possible relative to the experimental condition in the Glogowska trial. For example, Yoder et al. (2005) observed that their control group actually accessed more treatment than the kids in their experimental treatment group, which may explain why they did not obtain a main effect of their intervention (or not, who knows?). The point is that it is hard to know whether a small effect size in comparison to a robust control is more or less impressive than a large effect size in comparison to no treatment at all. Certainly, the comparison is not fair.

The second issue raised concerns range restriction in the population of interest. I realize now that I failed to take this into account when I repeated (in Rvachew & Brosseau-Lapré, 2018) the conclusion that dialogic reading interventions are more effective for low-income children than for children with developmental language impairments (Mol et al., 2008). Effect sizes are inflated when the intervention is provided to only a restricted part of the population and the selection variables are associated with the study outcomes. However, the inflation is greatest for children near the middle of the distribution and least for children at the tails. This fact may explain why effect sizes for vocabulary size after dialogic reading intervention are highest for middle-class children (.58, Whitehurst et al., 1988), in the middle for lower-class but normally developing children (.33, Lonigan & Whitehurst, 1998), and lowest for children with language impairments (.13, Crain-Thoreson & Dale, 1999). There are other potential explanatory factors in these studies, but this issue of restricted range is of obvious importance in treatment trials directed at children with speech and language impairments. The low effect size for dialogic reading obtained by Crain-Thoreson and Dale should not by itself discourage use of dialogic reading with this population.
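
A small simulation makes the mechanism visible. The sketch below is mine, with invented numbers: it draws a large normal “population”, applies the same raw gain to everyone, and shows how the standardized version of that gain balloons when the sample is restricted to a narrow band of the distribution.

```python
import numpy as np

# Sampling a restricted range shrinks the SD, which inflates a
# standardized effect size for the very same raw gain.
rng = np.random.default_rng(1)
population = rng.normal(100, 15, 200_000)  # a standard-score-like scale
raw_gain = 6.0                             # identical absolute improvement

for label, sample in [
    ("full population", population),
    ("middle of the distribution (90-110)",
     population[(population > 90) & (population < 110)]),
    ("low tail (below 78)", population[population < 78]),
]:
    print(f"{label}: SD = {sample.std():.1f}, "
          f"standardized gain = {raw_gain / sample.std():.2f}")
```

In this toy example the same six-point gain looks like d of about 0.4 in the full population but roughly 1.0 in the restricted samples, with the middle band inflated slightly more than the tail.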

Finally, measurement validity plays a huge role, with longer, more valid tests yielding larger effect sizes than shorter, less valid tests. This might be important when comparing the relative effectiveness of therapy for different types of goals. Law et al. (2003) concluded that phonology therapy appeared to be more effective than therapy for syntax goals, for example. For some reason the outcome measures in these two groups of studies tend to be very different. Phonology outcomes are typically assessed with picture naming tasks that include 25 to 100 items, with the outcome often expressed as percent consonants correct; therefore, at the consonant level there are many items contributing to the test score. Sometimes the phonology outcome measure is created specifically to probe the child’s progress on the specific target of the phonology intervention. In both cases the outcome measure is likely to be a sensitive measure of the outcomes of the intervention. Surprisingly, in Law et al., the outcomes of the studies of syntax interventions were quite often omnibus measures of language functioning, such as the Preschool Language Scale or, worse, the Reynell Developmental Language Scale, neither test containing many items targeted specifically at the domain of the experimental intervention. When comparing effect sizes across studies, it is crucial to be sure that the outcome measures have equal reliability and validity as measures of the outcomes of interest.
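
One way to see the measurement point is through the classic attenuation relationship: if a test’s reliability is r, an observed standardized effect is pulled toward zero by roughly a factor of the square root of r. The sketch below uses a hypothetical true effect and hypothetical reliabilities of my own choosing.

```python
import math

# Attenuation of a standardized effect by measurement error:
# observed d is approximately true d * sqrt(test reliability).
true_d = 0.60  # hypothetical true standardized effect
for test, reliability in [("long, targeted phonology probe", 0.95),
                          ("short omnibus language scale", 0.60)]:
    observed_d = true_d * math.sqrt(reliability)
    print(f"{test}: observed d of about {observed_d:.2f}")
```

On this account, identical true effects could appear as roughly d = .58 on a highly reliable targeted probe but only d = .46 on a less reliable omnibus scale.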

My conclusion is that it is important not to make a fetish of meta-analyses and effect sizes. These kinds of studies provide just one kind of information that should be taken into account when making treatment decisions. Their value is only as good as the underlying research: overall, effect sizes are most trustworthy when they come from the same study or a series of studies involving the exact same independent and dependent variables and the same study population. Given that this is a rare occurrence in speech and language research, there is no real substitute for a deep knowledge of an entire literature on any given subject. Narrative reviews from “experts” (a much-maligned concept!) still have a role to play.

References

Almost, D., & Rosenbaum, P. (1998). Effectiveness of speech intervention for phonological disorders: a randomized controlled trial. Developmental Medicine and Child Neurology, 40, 319-325.

Crain-Thoreson, C., & Dale, P. S. (1999). Enhancing linguistic performance: Parents and teachers as book reading partners for children with language delays. Topics in Early Childhood Special Education, 19, 28-39.

Glogowska, M., Roulstone, S., Enderby, P., & Peters, T. (2000). Randomised controlled trial of community based speech and language therapy in preschool children. British Medical Journal, 321, 923-928.

Law, J., Garrett, Z., & Nye, C. (2003). Speech and language therapy interventions for children with primary speech and language delay or disorder (Cochrane Review). Cochrane Database of Systematic Reviews, Issue 3. Art. No.: CD004110. doi:10.1002/14651858.CD004110.

Lonigan, C. J., & Whitehurst, G. J. (1998). Relative efficacy of a parent teacher involvement in a shared-reading intervention for preschool children from low-income backgrounds. Early Childhood Research Quarterly, 13(2), 263-290.

Mol, S. E., Bus, A. G., de Jong, M. T., & Smeets, D. J. H. (2008). Added value of dialogic parent-child book readings: A meta-analysis. Early Education and Development, 19, 7-26.

Rvachew, S., & Brosseau-Lapré, F. (2018). Developmental Phonological Disorders: Foundations of Clinical Practice (Second Edition). San Diego, CA: Plural Publishing.

Simpson, A. (2017). The misdirection of public policy: comparing and combining standardised effect sizes. Journal of Education Policy, 1-17. doi:10.1080/02680939.2017.1280183

Whitehurst, G. J., Falco, F., Lonigan, C. J., Fischel, J. E., DeBaryshe, B. D., Valdez-Menchaca, M. C., & Caulfield, M. (1988). Accelerating language development through picture book reading. Developmental Psychology, 24, 552-558.

Yoder, P. J., Camarata, S., & Gardner, E. (2005). Treatment effects on speech intelligibility and length of utterance in children with specific language and intelligibility impairments. Journal of Early Intervention, 28(1), 34-49.

How to choose phonology goals?

I find out via Twitter (don’t you love Twitter!) that “teach complex sounds first” is making the rounds again (still!) and I am prompted to respond. Besides the fact that I have disproven the theoretical underpinnings of this idea, it bothers me that so many of the assumptions wrapped up in this assertion are unhelpful to a successful intervention. Specifically, we should not be treating “sounds”; there is no agreed upon and universal ordering of targets from simple to complex; and there is no reason to teach the potential targets one at a time in some particular order anyway. So what should we do? I will describe a useful procedure here with an example.

There is this curious rumour that I promote a “traditional developmental” approach to target selection, which I must lay to rest. In fact, I have made it clear that I promote a dynamic systems approach. An important concept is the notion of nonlinearity: if you induce gradual linear changes in several potential targets at once, a complex interaction will result, causing a nonlinear change across the system known as a phase shift. How do you choose the targets to work on at once? Francoise and I show how to use a “quick multilinear analysis” to identify potential targets at all levels of the phonological hierarchy, in other words phrases, trochaic or iambic feet, syllables, onsets, rimes or codas, clusters, features or individual phonemes. Many case studies and demonstrations are laid out in our book, which will shortly appear in a beautiful second edition. Then we show how to select three targets for simultaneous treatment using Grunwell’s scheme, designed to facilitate progressive change in the child’s phonological system. I will demonstrate both parts of this process here, using a very brief sample from a case study that is described in our book. The child’s speech is delayed for her age (two years), as can be established by comparing her word shape and phonetic repertoires to expectations established by Carol Stoel-Gammon.

[Image: case-study-6-3-sample-for-blog]

Potential treatment targets can be identified by considering strengths and weaknesses at the prosodic and segmental tiers of the phonological hierarchy (full instructions for this quick multilinear analysis are contained in our book). The table below describes units that are present and absent. Note that since her language system is at an early stage of development, her phonology is probably word-based rather than phoneme-based; therefore, ‘distinction’ refers to the presence of a phonetic distinction rather than a phonemic contrast.

[Image: case-study-6-3-quick-multilinear-analysis]

Now that we have a sense of potential targets from across the whole system, how do we select targets using Grunwell’s scheme? We want to ensure that we address both word shape and segmental goals. We also want to choose one goal to stabilize a variable structure in the system, another to extend something that is established to a new context, and a third to expand the system to include something new. Here are my choices (others are possible):

[Image: case-study-6-3-grunwell-goals]

There is a good chance that fricatives and codas will emerge spontaneously with this plan because we will have laid down the foundation for these structures. If they don’t, it should not be hard to achieve them during the next therapy block. The idea that you can only induce large change in the system by teaching the most complex targets first is clearly not true, as I have explained previously; in fact, complex sounds emerge more easily when the foundation is in place. Furthermore, in their study of selection effects in early phonological development, Schwartz and Leonard (1982) recommended teaching IN words to children with small vocabulary sizes; in other words, expand the vocabulary gradually by using word shapes and phonemes that are already in the inventory, combined in new ways.

It would be possible to use the stabilize-extend-expand scheme and choose different, more complex goals. For example, we could consider the nonreduplicated CVCV structure (cubby, bunny, bootie) to be the stabilize goal. Then we could introduce word-final labial stops as the extend goal, generalizing these phones from the onset, where they are well established, to a new word position (up, tub, nap). Finally, we could introduce a word-initial fricative as the expand goal (see, sock, soup). This plan with more complex targets might work, but you risk slower progress, given the empirical findings reported in Rvachew and Nowak (2001) and in Schwartz and Leonard (1982). Furthermore, you would be failing to recognize a major constraint on the structure of her syllables (the limitation to only 2 segments, VV or CV, with CVV and CVC currently proscribed). If you focus only on introducing “complex sounds” without attending to this major issue at the prosodic levels of her phonological system, you will be in for a rough ride.

I attach here another example, this one a demonstration from the second edition of our book, chapter-8-demonstration-8-2, to appear in December 2016. Francoise and I have taken great care to show students how to implement an evidence based approach to therapy. I invite readers to take a peek!

Reading List

Rvachew, S., & Brosseau-Lapré, F. (2018). Developmental Phonological Disorders: Foundations of Clinical Practice (Second Edition). San Diego, CA: Plural Publishing. (Ready for order in December 2016)

Grunwell, P. (1992). Processes of phonological change in developmental speech disorders. Clinical Linguistics & Phonetics, 6, 101-122.

Stoel-Gammon, C. (1987). Phonological skills of 2-year-olds. Language, Speech & Hearing Services in Schools, 18, 323-329.

Rvachew, S., & Bernhardt, B. (2010). Clinical implications of the dynamic systems approach to phonological development. American Journal of Speech-Language Pathology, 19, 34-50.

Rvachew, S. & Nowak, M. (2001). The effect of target selection strategy on sound production learning. Journal of Speech, Language, and Hearing Research, 44, 610-623.

Schwartz, R., & Leonard, L. (1982). Do children pick and choose? An examination of selection and avoidance in early lexical acquisition. Journal of Child Language, 9, 319-336.

CAMs & Speech Therapy

In this final post on the potential conflict between Evidence Based Practice (EBP) and Patient Centred Care (PCC) I consider those situations in which your client or the client’s family persists in a course of action that you may feel is not evidence based. This is a very common occurrence although you may not be aware of it. Increasing numbers of surveys reveal that the families of children with disabilities use Complementary and Alternative Medicines/Therapies (CAMs), usually without telling their doctor and other health care providers within the “standard” health care environment.

Despite a growing number of studies it is difficult to get an exact estimate of the prevalence of CAM use among such families (see reading list below). Some estimates are low because families are reluctant to admit to using CAMs. Other estimates are ridiculously high because CAM users are responding to insurance company surveys in order to promote funding for these services and products. However, the best estimates are perhaps as follows: about 12% of children in the general population are exposed to CAMs; the proportion probably doubles (to roughly 25%) for children with developmental disabilities in general and doubles again (to roughly 50%) for children with autism. The most commonly used CAMs are dietary supplements or special diets, followed by “mind and body practices” (sensory integration therapy, yoga, etc.); the use of dangerous practices such as chelation therapy is mercifully much less frequent. Predictors of CAM use are high levels of parental education and stress. The child’s symptoms are not reliably associated with CAM use. The hypothesized reasons for these correlations are that educated parents have the means to find out about the CAMs and the financial means to access them. Having had some personal experience with this, I think that educated parents are very used to feeling in control of their lives, and nothing shatters that sense of control as much as finding that your child has a developmental disability. I find it very interesting that the studies shown below counted CAM use after specifically excluding prayer! I may be wrong, but I expect that many well-educated parents, even those that pray, would look for a more active solution than putting their family exclusively in the hands of God. Educating yourself through internet searches and buying a miracle cure feels like taking back control of your life (although months later, when you realize you have thousands of dollars of worthless orange gunk in your basement, you are feeling out of control again AND stupid, but that is another story). Anyway, this is why I think (an untested hypothesis, I admit) that patient centred care is actually the key to preventing parents from buying into harmful or useless therapies.

When the parent asks (or demands, as used to happen when I had my private practice) that you use a therapy that is not evidence based, how do you respond in a way that balances evidence based practice with patient centred care?

The most important strategy is to maintain an open and respectful dialogue with the family at all times so that conversation about the use of CAMs can occur. Parents often do not reveal the use of these alternative therapies and sometimes there are dangerous interactions among the many therapies that the child is receiving. It is critical that the parent feels comfortable sharing with you and this will not occur if you are critical or dismissive of the parents’ goals and choices. A PCC approach to your own goal setting and intervention choices will facilitate that dialogue. It is actually a good thing if the parent asks you to participate in a change in treatment approach.

Find out what the parent’s motivations are. Possibly the parent’s concerns are not in your domain. For example dad might ask you to begin sessions with relaxation and breathing activities. You launch into a long lecture about how these exercises will not improve speech accuracy. It turns out that the exercises are meant to calm anxiety, a new issue that has arisen after a change in medication and some stresses at school. As an SLP, you are not actually in a position to be sure about the efficacy of the activity without some further checking and going along with the parent is not going to hurt in any case.

Consider whether your own intervention plan is still working and whether your own goals are still the most pertinent for the child. Sometimes we get so wrapped up in the implementation of a particular plan that we miss the fact that new challenges in the child’s life obligate a course correction. Mum feels like her child needs something else and looks around for an alternative. After some discussion you may find that switching your goal from morphosyntax to narrative skills might work just as well as introducing acupuncture!

Talk with the parent about where the idea to use the CAM came from and how the rest of the family is adapting to the change. It is possible that mum knows the diet is unlikely to work but dad and dad’s entire family have taken it on as a family project to help the child. In some ways the diet is secondary to the family’s sense of solidarity. On the other hand, mum may be isolating herself and the child from the rest of the family by committing to an intervention that everyone else thinks is bonkers! This will be difficult, but efforts to engage the family with counseling might be in order.

Explore ways to help the parent establish the efficacy of the CAM. With the family’s consent you might be able to find information about the alternative approach from sources that are more credible than Google. You might be able to help the parent set up a monitoring program to document changes in behavior or sleep habits or whatever it is that the parent is trying to modify. You may even be able to implement a single-subject randomized experiment to document the efficacy of the therapy for the child, as sketched below. Dad may enjoy helping to plot the data in a spreadsheet.
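
To show what such an experiment could look like, here is a minimal sketch of a randomization test, assuming an alternating-treatments design with one session of each condition per week for ten weeks and the weekly order randomized. The change scores and the conditions are entirely invented.

```python
import itertools
import numpy as np

# Invented within-session change scores for a 10-week design with one
# session of each condition per week, order randomized weekly.
with_cam = [2, 1, 3, 0, 2, 1, 2, 3, 1, 2]   # condition A (with the CAM)
usual =    [1, 0, 2, 1, 0, 1, 1, 2, 0, 1]   # condition B (usual therapy)

observed = np.mean(with_cam) - np.mean(usual)

# Under the null hypothesis the two labels within each week are
# exchangeable, so enumerate all 2**10 within-week label swaps and count
# how many yield a mean difference at least as large as the observed one.
count = 0
swaps = list(itertools.product([False, True], repeat=10))
for swap in swaps:
    a = [u if s else c for c, u, s in zip(with_cam, usual, swap)]
    b = [c if s else u for c, u, s in zip(with_cam, usual, swap)]
    if np.mean(a) - np.mean(b) >= observed:
        count += 1

print(f"one-tailed p = {count / len(swaps):.3f}")
```

With these made-up numbers the test returns p = .018; the point is simply that a family can obtain a defensible answer about their own child from data they helped to collect.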

Finally, and also crucially, model evidence based thinking in all your interactions with the family. When you are suggesting new goals or approaches to intervention, explain your decisions. Involve the family in those choices, describing the potential benefits and costs of the various options by referencing the scientific literature. Let the parent know that you are making evidence based hypotheses all the time and watching their child carefully to confirm whether your hypotheses were correct. Involve families in this process so that they become used to thinking in terms of educated guesses rather than phony certainties.

Reading list

Bowen, C. & Snow, P. C. (forthcoming, about January 2017). Making Sense of Interventions for Children’s Developmental Difficulties. Guildford: J&R Press. ISBN 978-1-907826-32-0 

Levy, S. E., & Hyman, S. L. (2015). Complementary and Alternative Medicine Treatments for Children with Autism Spectrum Disorders. Child and Adolescent Psychiatric Clinics of North America, 24(1), 117-143.

Owen-Smith, A. A., Bent, S., Lynch, F. L., Coleman, K. J., Yau, V. M., Pearson, K. A., . . . Croen, L. A. (2015). Prevalence and predictors of complementary and alternative medicine use in a large insured sample of children with Autism Spectrum Disorders. Research in Autism Spectrum Disorders, 17, 40-51.

Salomone, E., Charman, T., McConachie, H., Warreyn, P., Working Group 4, & COST Action “Enhancing the Scientific Study of Early Autism”. (2015). Prevalence and correlates of use of complementary and alternative medicine in children with autism spectrum disorder in Europe. European Journal of Pediatrics, 174, 1277-1285.

Valicenti-McDermott, M., Burrows, B., Bernstein, L., Hottinger, K., Lawson, K., Seijo, R., . . . Shinnar, S. (2014). Use of Complementary and Alternative Medicine in Children With Autism and Other Developmental Disabilities: Associations With Ethnicity, Child Comorbid Symptoms, and Parental Stress. Journal of Child Neurology, 29(3), 360-367.