Using Apps for Speech Therapy

It seems like only a few days ago I promised to write a blog post on the best uses of apps for speech therapy, when I wrote about the Werfel study in my last blog post. But it turns out that I made that promise 3 months ago! Time flies when you are School Director, it turns out. But also, my thinking about why you might want to substitute an app for picture cards reminded me of a particularly traumatic event in my past and maybe I just didn’t want to revisit that memory. But here goes… When I was sent out on my first summer practicum as an undergraduate student sometime in the nineteen-seventies I was assigned to a health unit in rural Alberta. The placement involved driving a great big Ford around to schools on country roads, which was scary enough because, although I had a driver’s licence, I had never really driven on account of not owning a car. Anyway, on the very first day my supervisor asked me to carry all our materials out to the car so she piled my arms up with stuff: many files filled with papers, some board games, those plastic boxes full of articulation cards, and on top of that…her lunch! Of course, I dropped the load in the parking lot. You can imagine the scene; I am not going to describe the process of picking it all back up and trying to reorder everything before getting it in the back seat. To make it all worse, she then hands me the keys and tells me to drive because she is going to eat her lunch on the way. Her lunch included a can of grape pop. Now you can imagine how my glasses became painted with purple goop. All I can say is that it is lucky I did not drive the car off the road.

This story is actually relevant to the topic at hand because I want to talk about iPad apps relative to all the things I was carrying in my hands, excluding the lunch. Recall that Werfel implemented a therapy program in which the children named pictures on the screen and then swiped them off, one after the other, for 25 sessions over 8 weeks. Is this how we want to use apps? Why would we use apps? What are the advantages of apps over the boxes of picture cards? Let’s go through the advantages one at a time.

  1. Storage

The first obvious advantage is that all the information and functionality carried in the files, the boxes of picture cards and even the board games can be stored on an iPad — a relatively small object that would have fit in the lunch bag or my purse. Not only that, the information can be password protected so it is an efficient and relatively secure way of carrying things around. At the same time the screen is large enough for two people to view and small hands to manipulate. I read that SLPs use a lot of apps built for phones because their employers do not provide them with iPads but everyone has their own iPhone. That is a real shame because the functionality of an iPad or other tablet is hard to beat.

  2. Multimedia

The second advantage of a digital app is the possibility of presenting information to children with multimedia correlation across different sensory modalities. Apps can present therapy stimuli with an integration of colourful and realistic visual representations, integrated text, sound effects and movement. Susan Neuman’s theory of synergy predicts that children learn and store more robust mental representations when they experience new information this way. Some experimental support for this idea was presented by Strouse & Ganea who randomly assigned 102 toddler-mother pairs to a print-book or ebook shared reading condition. The results were striking:

“Toddlers who were read the electronic books paid more attention, made themselves more available for reading, displayed more positive affect, participated in more page turns, and produced more content-related comments during reading than those who were read the print versions of the books. Toddlers also correctly identified a novel animal labeled in the book more often when they had read the electronic than the traditional print books.”

In this study the animation provided by the ebooks was very simple: when the toddlers patted the page, the sound associated with the illustrated animal was presented. Therefore, we have multimedia stimulation and an interactive component contributing to engagement and learning.

  3. Interactive Features

The variety of interactive features that are built into apps is boundless. In ebooks “hotspots” within the text or illustrations launch a variety of effects that may advance the story and support learning. Alternatively, the animations, sound effects and games that occur when the hotspots are activated may be entertaining while not relevant to the story at all. These same kinds of features can be used to create learning activities in the context of educational games meant to teach letter sounds or vocabulary or reading or a wide range of other skills. Many games are simply digital versions of conventional board games. Other games are meant to be fun and creative, involving free style drawing and opportunities to create characters and settings and stories in an open-ended fashion. Apps that encourage creativity are recommended for their “minds-on” properties. Hirsh-Pasek et al presented a framework for evaluating and choosing apps that rests on four pillars of learning: (1) the app encourages active learning; (2) the child is deeply engaged by the learning task; (3) the learning experience is meaningful in that it promotes connections between new knowledge and existing knowledge; and (4) the learning activity permits high quality social interaction or social contingency. These authors also review the science of learning and conclude that when the app is explicitly educational the learning program should be structured to provide “scaffolded exploration toward a learning goal.” Therefore, rote learning games in which the child, for example, simply names pictures and receives a tangible reward such as points in a token-economy game would not meet these criteria. A completely open-ended game with no learning goal would also not meet these criteria.

  4. Personalization

Perhaps the most exciting opportunities offered by tablets and the associated apps are the possibilities for personalization. It is possible for children to create their own stimuli and stories using the camera, drawing, and writing tools. In this way all the practice materials for speech and language therapy can be especially meaningful and relevant to the child’s daily life and special interests.

Using Apps in Speech Therapy

The first advantage to using apps in speech therapy is that it is possible to “think outside the articulation card box” and use other tools to practice speech accuracy in authentic communicative contexts. Let us imagine that you are working on velar stops with a child who typically fronts these consonants. You want an opportunity to produce the sounds in relatively complex words while providing meaningful feedback using focused stimulation that is adapted for the speech therapy context as described by Rvachew & Brosseau-Lapré (2012). There are some electronic books that lend themselves to conversation and are useful for this purpose. Consider the Nosy Crow book “Don’t Wake Up Tiger!” First, there are several opportunities to produce velar sounds in conversation: tiger (contrasted with turtle), frog, cake, candle, pelican, fox. There is an active learning component in that the child must perform specific actions to help the different animals get around the tiger without waking him up in order to set up their surprise birthday party. There are matching games and five “spot the difference” games, the last one involving the birthday party scene, providing the opportunity for distancing prompts. The idea here is that articulation drill is not the best way to improve speech accuracy for the majority of children with speech delay or disorders in any case. You will want to choose different stories or games for older children but definitely choose apps that permit authentic conversation and minds-on learning.

It is also possible to create your own games for speech therapy drill very simply using presentation tools along with photos, clip art, or drawing tools. If you were practicing words that contain sibilants, for example, the child could bring a photo of his house. Paste the photo into a series of slides, on top of cartoon characters, animate it to disappear upon clicking or swiping, and you have a very simple game. In this case, the child asks the question “Whose house?” and after swiping the house, a simple animation reveals the answer: “It’s mouse’s house” (or sheep’s, zebra’s, seal’s, etc.). Many common software tools permit simple animations that are useful, turning a simple swipe into a game that connects meaning to the drill practice.

Of course, there are many commercial apps for drill therapy or minimal pairs games. I will not make the mistake of endorsing or criticizing any particular product. However, you will want to look for common problems when you download free games or purchase more sophisticated therapy tools. One common issue is putting text on the minimal pair cards so that the children are using letter cues rather than listening to the sound of speech and referring to their own underlying representations for words when playing the game that is involved. Another issue is poor choices of words from the point of view of phonological theory (e.g., “ball” and “bottle” are not both /l/-coda words). The old articulation card boxes had the same problem but it was often easier to shuffle through and exclude the words that did not fit the pattern you were working on. The commercial apps may or may not be that flexible.

In any case, I am sure that most of you are more familiar with these apps than I am and have lots of creative ideas for using them. The main point I wanted to make is that we should not let the tail wag the dog. It is really important to choose the most creative minds-on apps and not let the software coax us way back to the “drill and kill” days of the sixties. We have known for some time now that phonological therapy is all about meaning. The fun part of digital tools is the opportunity that multimedia and interactivity offer for helping children make connections between new learning and their prior experience.

Would you do speech therapy like this?

I was interested to read a paper about the relative efficacy of using traditional flash cards versus tablet presentation of pictures for articulation drill therapy because I have developed iPad apps myself (e.g., see and have an interest in the potential of digital tools to enhance the speech therapy experience. The paper was recently published in the Online First section of Communication Disorders Quarterly by Krystel Werfel, Marren Brooks, and Lisa Fitton.

The study used a single subject alternating treatment design with four subjects, each kindergarten aged, not clearly exhibiting signs of speech delay but nonetheless misarticulating two phonemes that could be practiced. Some statistical analyses (rather dubiously applied to single subject data) suggested that the children achieved mastery sooner in the flashcard condition but produced more correct responses in the tablet condition. To my eye, the data did not suggest a clear advantage to either condition. All the children did in fact master the treated phonemes (which were /z,s/, /pl,ɡl/, and /θ,ð/, this last pair for two children).

The authors make clear that the study is meant to be informative on the modality of stimulus presentation and not a test of the treatment protocol itself but I found myself alarmed at the possibility that readers might think that the treatment protocol would be reasonable in regular clinical practice and therefore I would like to address the way that the intervention was implemented. Often researchers implement a speech therapy intervention in a way that they would not in a regular clinical environment in an effort to exert more experimental control over all the variables than is typically necessary or desirable in an authentic clinical context. I can only hope that this explains some of the clinical choices that were made in this case. I am going to address several in turn as follows: (1) treatment approach; (2) treatment procedure; (3) reinforcement procedures; (4) cumulative intervention intensity; and (5) discharge criteria.

First, the authors state that they chose a traditional approach to therapy because there is empirical evidence that it works and clinicians prefer it. There is evidence of efficacy but in fact for most preschool aged children who qualify for speech services a phonological approach may be more efficacious as Francoise and I discuss in our text. Furthermore, the surveys indicating a preference for a traditional approach indicate that this is true in the United States but not elsewhere. Finally, there seems to be some confusion about what a “traditional” approach is. In some cases, traditional refers to a strict behaviorist intervention that focuses solely on speech production with a gradual increase in the complexity of speech units; in other cases it involves a sensory-motor approach with careful attention to variable speech practice and multiple targets; in other cases a traditional approach means Charles Van Riper’s approach, which was properly sensory motor, including ear training, graduated speech practice and some principles of motor learning. The implementation in this paper was highly restricted, involving only practice of single words and sometimes isolated sounds if necessary. If the speech therapist chooses a traditional rather than phonological approach it is best that the full sensory motor protocol be implemented.

Second, the drill based approach that was employed was selected again on empirical grounds. The study cited to support this approach was sound especially when treating children who have good speech perception abilities, most likely the case for the children in this study who did not have clear evidence of a speech disorder. Other approaches can be effective if procedures targeting phonological processing are incorporated into the intervention as shown by Hesketh and colleagues in the U.K. and also by me and Francoise with French-speaking children.

The strangest part of the whole intervention is that the children experienced over 25 treatment sessions each and throughout every session identical practice trials occurred: a stimulus prompt was presented, the child attempted to name the picture, the clinician provided feedback or extra support and then if the child’s response was correct he or she was permitted to mail the flash card or swipe the picture off the tablet. That was it. For eight weeks. I’m speechless. Enough said.

Regarding cumulative intervention intensity, I indicated in previous blogs that children should receive a minimum of 50 practice trials and ideally 100 practice trials per session. Furthermore, other single subject research using a minimal pairs procedure indicates that generalization goals are not usually met with fewer than 180 practice trials (when treating children with moderate or severe phonological delays). In Werfel’s study the children received treatment for two sounds in 20 minutes, so ten minutes per sound and 15 practice trials per sound or 10-minute block, therefore 30 practice trials per 20-minute treatment session. Reportedly, mastery was achieved after 203 trials in the flashcard condition and 270 trials in the tablet condition (equivalent to 135 and 180 minutes of therapy respectively). However, increasing the number of practice trials to 50 during that 20-minute session could reduce the number of sessions or weeks in the intervention program by almost half. One way to do that would be to reduce the amount of feedback that was provided. The intervention was designed so that the clinician provided explicit feedback to the child after every practice attempt whereas the principles of motor learning suggest that less feedback is often better for speech motor learning. For example, a child can name five pictures in a row and be told that four of the five productions were correct. Another strategy is to practice at the challenge point at all times as described in detail by Francoise and me in Developmental Phonological Disorders: Foundations of Clinical Practice but also in our new undergraduate text Introduction to Speech Sound Disorders.
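For readers who like to check the arithmetic, here is a minimal sketch of the intensity calculation in Python. It uses only the figures quoted above (15 trials per sound per 10-minute block, 203 and 270 trials to mastery) and assumes that a 50-trial session would be split evenly into 25 trials per sound. The function names are illustrative and do not come from the study.

```python
import math

def minutes_of_therapy(trials_needed, trials_per_block, minutes_per_block=10):
    """Therapy time for one sound, given the trial rate per 10-minute block."""
    return trials_needed / trials_per_block * minutes_per_block

def sessions_to_mastery(trials_needed, trials_per_block):
    """Whole sessions needed when each session holds one block per sound."""
    return math.ceil(trials_needed / trials_per_block)

# Figures quoted in the post: 15 trials per sound per session.
flashcard_minutes = minutes_of_therapy(203, 15)   # about 135 minutes
tablet_minutes = minutes_of_therapy(270, 15)      # 180 minutes

# Assumption: 50 trials per session, split evenly over two sounds.
sessions_at_15 = sessions_to_mastery(203, 15)
sessions_at_25 = sessions_to_mastery(203, 25)

# Proportional saving in sessions when moving from 15 to 25 trials per block.
saving = 1 - 15 / 25

print(round(flashcard_minutes), round(tablet_minutes))
print(sessions_at_15, sessions_at_25, f"{saving:.0%}")
```

Under these assumptions the saving works out to about 40% of the sessions, which is the “almost half” referred to above.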

Finally, the discharge or stopping criteria in the study were set at 100% correct performance on the generalization probe over 3 consecutive sessions. The probe contained 5 treated words and 5 untreated words. This criterion meant that children practiced their targets for a long time past the point at which the practice material should have been made more difficult or the child should have been discharged to see if spontaneous generalization to natural speaking situations would occur. As Francoise and I review in Chapter 8 of our book, several studies have shown that children can be discharged after achieving between 40 and 80% correct responding on generalization probes. Most children will continue to make gains in production accuracy after this point. The four children in the Werfel et al study received an average of 5 unnecessary treatment sessions according to these criteria.

When conducting treatment studies, it is helpful to provide models of treatment procedures that are best practice in the clinical setting. Often interventions that are better than no intervention will prove to be effective in a research setting while not necessarily being best practice. These studies are confusing for a clinical audience I think. Furthermore, when asking clinical questions about new technologies it is interesting to ask, why would we want to bring it into our clinical practice? What benefit might it bring? How can we adapt these technologies so that the best of human interactions are retained and the most benefit of the technology is added? In my next blog I will address the Werfel study again, but this time imagining the questions we might ask about tablet-based implementations of articulation therapy.

Jingle and Jangle Fallacies in Levels of Representation

I have had several opportunities over the past few years to object when the investigators who conducted the Sound Start trial associated the framework outlined in our book with the theoretical foundation for their work. We have not had an opportunity to fully explore my objection and Twitter is a bad medium for a discussion requiring this much complexity and nuance and therefore I am going to provide a rationale in some detail in my blog. McLeod et al. justify their approach by reference to certain psycholinguistic models with particular allegiance to Stackhouse and Wells’ (1993) important work. In our book, Francoise and I also pay homage to their model and note the historical lineage although our framework is drawn directly from work by Munson, Edwards, and Beckman (2005). Therefore there is no argument about the use of the terms input, output, and phonological processes or representations; as we note, this basic tripartite division of speech processes is more or less universal. The difficulty is that McLeod and colleagues divide up assessment and treatment tasks according to these category labels (input, output and phonological processes) differently from us and then cite our framework. It is jarring because the error appears to reflect both jingle and jangle fallacies. For those who have not encountered these amusing terms before, a jingle fallacy is the assumption that two concepts with the same name are the same when they are actually different; a jangle fallacy is the assumption that two things with different names are different concepts when there is in fact only one concept. We can all agree that McLeod’s team and my team (and Munson’s team and so on) can all use a tripartite framework (input-output-phonological) and we can all agree to disagree about which tasks go into which division (psycholinguists have been doing that for a long time now and will continue to do so). However, when I am cited I would prefer to not have confusion about what I mean when I talk about input vs output processes.

I will begin with the point of agreement, the tripartite division of psycholinguistic processes, citing McLeod et al. (2017) directly: “Stackhouse and Wells (1997) … proposed three core elements: Input processes (i.e., detecting and perceiving speech…), cognitive-linguistic processes (i.e., creating, storing, and accessing lexical representations of words …), and output processes (i.e., producing speech…).”

In our text, Francoise and I borrow heavily from Munson, Edwards, and Beckman to describe three types of phonological knowledge: Perceptual knowledge encoded in the form of acoustic-phonetic representations for speech sounds, abstracted from stored acoustic memories of words; articulatory knowledge encoded in the form of motor plans for syllables; and phonological knowledge, encoded as underspecified phonological units at all levels of the phonological hierarchy, and acquired as an emergent property of the lexicon itself. A variety of processes are proposed for acquiring and using these types of knowledge when perceiving, understanding, and producing speech.

The difficulty comes when we begin to assign different assessment or treatment tasks to these levels of processing or representation. In our book we describe input processes and input approaches to treatment as those that target specifically children’s acoustic-phonetic representations. Strong acoustic-phonetic representations provide support for speech perception and implicit phonological awareness. Assessment and intervention tasks will involve the provision of varied speech inputs, focusing on words but with systematic variation in acoustic cues and involving implicit learning strategies. Tasks that tap these processes may involve only listening to speech input or they may involve listening and talking; it depends upon the design of the task and the way that the children’s responses are analyzed. For example, one of my favourite studies that reveals the importance of “input processes” was conducted by Munson, Baylis, Krause, and Yim (2006). In their study children first listened passively to nonwords. After a distractor task they repeated nonwords, some of which they had previously heard during the passive listening task. Children with typical speech showed a benefit of the previous exposure in their repetition accuracy whereas children with a speech sound disorder did not show this benefit. You can see that this task, although largely dependent upon spoken responses, is a measure of input processing! Speech perception tasks fall into this category most clearly when they reveal something about the nature of the acoustic cues that the child is using to make decisions about which acoustic-phonetic objects form a particular word or phonetic category. Some phonological awareness tasks are also input oriented when the child indicates that, for example, “hat” and “bat” sound similar by matching pictures even if they do not have high level metacognitive knowledge about what the similarity is.

Phonological knowledge is a more abstract form of knowledge that emerges from the organization of the lexicon and from explicit teaching, especially phonics and reading education in schools. It includes metacognitive knowledge of sublexical and subsyllabic units. Assessment and intervention tasks in this domain often involve high level expressions of this knowledge such as verbally identifying the common sound in the coda of the words “hat” and “boat” or indicating that [b] is at the beginning of the word “boat” or differentiating 3-syllable from 2-syllable words.

In some children there are discontinuities across these levels of knowledge even when the same unit is involved. For example, a child may be able to indicate that [bæθ] and [bæt] and [bæs] correspond to different pictures (i.e., have different meanings) but have an unclear sense of the acoustic cues that differentiate the phonetic categories that differentiate these words. Another child might have excellent acoustic-phonetic representations for these words and the phonetic categories that differentiate them but have immature metaphonological knowledge, being unaware that each word is composed of three phonemes and unable to tell you that they share the same head [bæ]. In our book, Francoise and I detail the kinds of tasks that can be used clinically to assess and remediate children’s knowledge at different levels of representation.

The disagreement we are having with McLeod et al is about the classification of all the tasks in the Phoneme Factory computer intervention as being “input oriented” tasks. According to our framework, even though most tasks require the child to listen and then respond by selecting pictures or letters on a computer screen, these tasks all involve accessing phonological levels of representation and do not serve to strengthen the child’s acoustic-phonetic representations. Even the most basic level task involves associating sounds produced in isolation (e.g., [s], [d]) with a standard pictograph (e.g., [s] → “snake”). The authors mistakenly identify this task with the lowest level “input” process in Stackhouse and Wells’ model, that is, speech discrimination, but it is not a discrimination task and the stimuli do not reveal anything about the children’s knowledge of the acoustic-phonetic cues that differentiate one category of speech sounds from another. All the tasks in the program are metaphonological tasks that therefore tap phonological knowledge even though real words and word meanings are not always engaged.

At the recent NZSPA2019 Conference in Brisbane Jane McCormack divided up the phonological awareness assessment tasks that comprise the CTOPP into input and output tasks purely on the basis of whether a spoken response was required by the child. However, I would not agree that any of these phonological awareness tasks reveal the child’s acoustic-phonetic knowledge of speech sound categories and therefore there are no “input tasks” per se. All the tasks are tapping meta-phonological knowledge.

If this is still confusing, think of that child who says [s̪it], [s̪nek], [fes̪], and [buts̪], and who confidently identifies [mauθ] as the picture with teeth, but both [maus] and [maus̪] as the picture of the rodent. This same child is able to blend the sounds [m] – [au] – [θ] to recreate the word /mauθ/. If you ask her to say [maus] without [s] she answers [mau]. Here we have a child whose acoustic-phonetic and articulatory-phonetic knowledge of the /s/ phoneme is poor, explaining the consistent distortion in her speech; at the same time the child’s phonological knowledge of the /s/ – /θ/ contrast is good and her meta-phonological skills are good as well. Therefore, when treating this child, we would want to focus at the phonetic level. The Phoneme Factory intervention might be good for her future literacy skills but it would not be the best prescription for her speech articulation problem. We really want to have a clear understanding of the difference between these three levels of representation.

As a more general point it is really important when citing anyone to match up terms with concepts in a way that is consistent with the cited authors’ original intent. This is hard because the use of terms undergoes so much historical and theoretical change. The changes are good I think: Munson et al. help us to understand that many children with developmental phonological disorders have difficulties in the phonetic domains (acoustic-phonetic and articulatory-phonetic representations) whereas many children with language impairments have deficits in phonological knowledge, in fact a by-product of smaller lexicons. Knowing how to assess and remediate children’s knowledge in these three domains will help us to target our interventions more effectively.


Baker, E., Croot, K., McLeod, S., & Paul, R. (2001). Psycholinguistic models of speech development and their application to clinical practice. Journal of Speech, Language, and Hearing Research, 44, 685-702.

McLeod, S., Baker, E., McCormack, J., Wren, Y., Roulstone, S., Crowe, K., . . . Howland, C. (2017). Cluster-Randomized Controlled Trial Evaluating the Effectiveness of Computer-Assisted Intervention Delivered by Educators for Children With Speech Sound Disorders. Journal of Speech, Language, and Hearing Research, 60(7), 1891-1910. doi:10.1044/2017_JSLHR-S-16-0385

Munson, B., Baylis, A., Krause, M., & Yim, D.-S. (2006). Representation and access in phonological impairment. Paper presented at the 10th Conference on Laboratory Phonology, Paris, France, June 30-July 2.

Munson, B., Edwards, J., & Beckman, M. E. (2005). Phonological knowledge in typical and atypical speech-sound development. Topics in Language Disorders, 25(3), 190-206.

Rvachew, S., & Brosseau-Lapré, F. (2018). Developmental Phonological Disorders: Foundations of Clinical Practice (Second ed.). San Diego, CA: Plural Publishing, Inc.

Stackhouse, J., & Wells, B. (1993). Psycholinguistic assessment of developmental speech disorders. European Journal of Disorders of Communication, 28, 331-348.

Boys and Spelling

I rather like this new paper by Treiman et al (2019) in Scientific Studies of Reading on “The unique role of spelling in the prediction of later literacy performance” or in actual fact word reading performance because that is the only outcome measure in this study, albeit measured longitudinally between kindergarten and ninth grade and in 970 children. The upshot is that early spelling predicts unique variance in ongoing word reading skills after taking into account early phonological awareness, vocabulary and letter knowledge skills. Presumably spelling captures other important aspects of literacy knowledge such as orthographic knowledge and also I imagine morphological skills.

I have been interested in spelling for a while now because it is the aspect of literacy most likely to be impaired in children who have speech sound disorders. Furthermore, the Quebec government (that funded the research that I will describe here) had been concerned by falling literacy test scores across the province’s schools and the scores for orthography (a combination of spelling and morphology) had been particularly low. Specifically the percentage of children passing the province wide literacy test with respect to orthography fell from 87% in the year 2000 to 77% in 2005 whereas the proportion of children scoring in the unsatisfactory range increased from 5% to 11% over the same period.

Therefore, a group of us set out to develop a tool to predict spelling difficulties in French-speaking children in Quebec, the result being PHOPHLO (Prédiction des Habiletés Orthographiques Par des Habiletés Langage Oral). Specifically, we hypothesized that spelling difficulties at the end of the first and third grades could be predicted by examining oral language skills at the end of kindergarten/beginning of first grade using an iPad based screen of speech perception, speech production, rime awareness and morphology production skills (more about the test at The test was found to accord well with teacher predictions of spelling difficulties and objective measures of spelling at the end of first grade:

Kolne, K., Gonnerman, L., Marquis, A., Royle, P., & Rvachew, S. (2016). Teacher predictions of children’s spelling ability: What are they based on and how good are they? Language and Literacy, 18(1), 71-98. [open access]

In a larger study we documented specificity and sensitivity of 93% and 71% respectively for the prediction of spelling at the end of second grade:

Rvachew, S., Royle, P., Gonnerman, L., Stanké, B., Marquis, A., & Herbay, A. (2017). Development of a Tool to Screen Risk of Literacy Delays in French-Speaking Children: PHOPHLO. Canadian Journal of Speech-Language Pathology and Audiology, 41(3), 321-340. [open access]
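For readers less used to screening statistics, the two figures can be unpacked with a small sketch. Sensitivity is the proportion of children with later spelling difficulties whom the screener flagged, and specificity is the proportion of children without difficulties whom the screener passed. The counts below are invented purely for illustration, chosen so that they land near 71% and 93%; they are not the data from the paper.

```python
def sensitivity(true_positives, false_negatives):
    """Share of children with spelling difficulties who failed the screener."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Share of children without spelling difficulties who passed the screener."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts, NOT taken from the study:
# 14 children with later spelling difficulties, 10 of them flagged;
# 86 children without difficulties, 80 of them passed.
sens = sensitivity(true_positives=10, false_negatives=4)
spec = specificity(true_negatives=80, false_positives=6)

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

In other words, a screener with this profile rarely raises a false alarm but misses a little over a quarter of the children who go on to have difficulty.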

We are especially proud of this latter paper because it won the editor’s paper from CJSLPA. And I am especially proud of Alexandre Herbay because he created such beautiful software with only 6 months of funding from MITACS.

The reason for this blog is that it was only after publishing these papers that it occurred to me to look for gender effects in the data! I don’t know why because the province wide literacy test results had been flagging issues with gender differences in literacy performance all along. There has been a significant gap favouring the girls in literacy performance across all scoring criteria since 2000: even after improving the success rate considerably since 2005, the gender gap persists. For example, in 2010 88.7% of children passed orthography but the rate for girls was 90.1% versus the rate for boys at 81.3%. With this concern about the performance of boys looming large at the provincial level, it finally occurred to me to wonder if our PHOPHLO screener would be sensitive to gender differences.

The answer to my question is interesting on two counts. First there turns out to be a big gender effect in spelling outcomes, as follows: girls who passed the PHOPHLO screener obtained a second grade spelling test score of 51 which compares to 40 for the girls who failed the PHOPHLO screener; boys who passed the PHOPHLO screener achieved a second grade spelling test score of 47 which compares to a spelling test score of 31 for boys who failed the PHOPHLO. This means that PHOPHLO predicted spelling performance for both boys and girls (main effect of PHOPHLO, F(1,74) = 26.71, p < .0001) but boys obtained lower scores than girls regardless of their PHOPHLO performance (main effect of gender, F(1,74) = 6.61, p = .012) with no significant interaction.

The second interesting finding however was that there was no gender difference in PHOPHLO scores: as measured by this screener the children had equivalent language skills at school entry. There are three possible explanations. The screener is only a screener and therefore it is quite likely that there are differences in language performance between the boys and girls at school entry that are not uncovered by the PHOPHLO screener, given that boys and girls do have a different trajectory for early language development, although typically only for language production and it is often reported that they have caught up by school age. Another possibility is that these early language differences cause a difference in executive functions or temperament for boys that impacts their ability to learn literacy skills in school. The third possibility is that boys are treated differently in school due to gendered social expectations for behavior, interests and social identity that discourage literacy related activities for boys. In any case, this finding raises questions about what happens to boys at school between kindergarten and first grade. Our research is currently concerned with this question and I will share those results during my keynote address at the upcoming 2019 joint conference of Speech Pathology Australia and the New Zealand Speech Therapists Association in Brisbane.

How to score iPad SAILS

As the evidence accrues for the effectiveness of SAILS as a tool for assessing and treating children’s (in)ability to perceive certain phoneme contrasts (see blog post on the evidence here), the popularity of the new iPad SAILS app is growing. Now I am getting questions about how to score the new SAILS app on the iPad so I provide a brief tutorial here. The norms are not built into the app since most of the modules are not normed. However, four of the modules are associated with normative data and can be used to give a sense of whether children’s performance is within the expected range according to age/grade level. Those normative data have been published in our text “Developmental Phonological Disorders: Foundations of Clinical Practice” (derived from the sample described in Rvachew, 2007) but I reproduce the table here and show how to use it.

When you administer the modules lake, cat, rat and Sue you will be provided with an overall Level score for all the Levels in each module as well as item by item scores on the Results page. As an example, I show the results page below after administering the rat module.

SAILS results screenshot rat

The screen shot shows the item-by-item performance on the right hand side for Level 2 of the rat module. On the left hand side we can see that the total score for Level 2 was 7/10 correct responses and the total score for Level 1 was 9/10 correct responses (we ignore responding to the Practice Level). To determine if the child’s perception of “r” is within normal limits, average performance across Levels 1 and 2: [(9+7)/20]*100 = 80% correct responses. This score can be compared to the normative data provided in Table 5-7 of the second edition of the DPD text, as reproduced below:

SAILS Norms RBL 2018

Specifically a z-score should be calculated: (80-85.70)/12.61 = -.45. In other words, if the child is in first grade, the z score is calculated by taking the obtained score of 80% minus the expected score of 85.70% and dividing the result by the standard deviation of 12.61 which gives a z score that is less than one standard deviation below the mean. Therefore, we are not concerned about this child’s perceptual abilities for the “r” sound. When calculating these scores, observe that some modules have one test level, some have two and some have three. Therefore the average score is sometimes based on 10 total responses, sometimes on 20 total responses as shown here, and sometimes on 30 total responses.
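For readers who prefer to script the arithmetic, the calculation above can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example and are not part of the SAILS app, each level is assumed to contain 10 scored items, and the mean (85.70) and standard deviation (12.61) are the first grade values for the rat module used in the worked example.

```python
def percent_correct(level_scores, items_per_level=10):
    """Percent correct across the test levels of one module (practice level excluded)."""
    total_items = items_per_level * len(level_scores)
    return 100.0 * sum(level_scores) / total_items


def z_score(observed_percent, norm_mean, norm_sd):
    """Distance of the observed score from the normative mean, in standard deviation units."""
    return (observed_percent - norm_mean) / norm_sd


# Worked example from the text: rat module, Level 1 = 9/10 and Level 2 = 7/10
rat_percent = percent_correct([9, 7])

# First grade norms for the rat module, as quoted in the text
rat_z = z_score(rat_percent, norm_mean=85.70, norm_sd=12.61)

print(f"{rat_percent:.0f}% correct, z = {rat_z:.2f}")  # prints "80% correct, z = -0.45"
```

For a module with one or three test levels, pass a list of one or three level scores so that the percentage is based on 10 or 30 responses.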

The child’s total score across the four modules lake, cat, rat and Sue can be averaged (ignoring all the practice levels) and compared against the means in the row labeled “all four”. Typically you want to know about the child’s performance on a particular phoneme however because generally children’s perceptual difficulties are linked to those phonemes that they misarticulate.

Normative data has not been obtained for any of the other modules. Typically however, a score of 7/10 or less than 7/10 is not a good score – a score this low suggests guessing or not much better than guessing given that this is a two alternative forced choice task.
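The point about guessing can be checked with a short binomial calculation. The sketch below assumes that a guessing child has a 50% chance of being correct on each item and that the 10 items are independent, which is a simplification (real children may show a response bias); the function name is made up for this example.

```python
from math import comb


def prob_at_least(k, n=10, p=0.5):
    """Probability of k or more correct responses out of n when each response is a guess."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


for k in (7, 8, 9):
    print(f"P(at least {k}/10 by guessing) = {prob_at_least(k):.3f}")
```

Under these assumptions a child who is purely guessing still scores 7/10 or better about 17% of the time, compared with about 5% for 8/10 or better and about 1% for 9/10 or better.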

Previously we have found that children’s performance on this test is useful for treatment planning in that children with these speech perception problems will achieve speech accuracy faster when the underlying speech perception problem is treated. Furthermore, poor overall speech perception performance  in children with speech delay is associated with slower development of phonological awareness and early reading skills.

I hope that you and your clients enjoy the SAILS task which can be found on the App Store, with new modules uploaded from time to time.


Feedback Errors in Speech Therapy

I have been spending hours reviewing video of student SLPs (SSLPs) conducting speech therapy sessions, looking for snippets to take to my upcoming talks at ASHA2018. The students are impressively skilled with a very difficult CAS population but after this many hours of watching, repeated examples of certain categories of errors pile up in the provision of feedback to children about their attempts to produce the target words, phrases and sentences. I am going to provide some examples here with commentary. In no way am I meaning any disrespect to the students because it is my experience that the average person becomes an idiot when a camera is pointed at them. I recall hearing about studies on the “audience effect” as an undergraduate – the idea is that when your skills are shaky you get worse when someone is watching but when your skills are excellent an audience actually enhances them. My social psychology prof said this even works for cockroaches! I can’t vouch for that but it certainly works for speech pathologists. I remember one time video-taping a session that was required for a course – I thought it went really well so I gave a copy to the parents and the course instructor. Later when watching it I could see clearly that for the whole half hour the child was trying desperately and without success to tell me that I was calling him by the wrong name (I had mixed him up with his twin brother whom I was also treating). I was oblivious to this during the live session but it was clear on the video. Anyway, these examples are not reflections on the students’ skill levels overall but they are examples of common feedback errors that I see in novice and experienced SLPs. Interestingly the clinical educators (CEs) who were supervising these sessions rarely mentioned this aspect of the students’ practice. Readers may find this blog useful as a template for reviewing student practice.

Category 1: No feedback

Child: [repeats 5 different sentences containing the target /s/ cluster words]

SSLP: [Turns to CE.] “What did you get?” [This is followed by 1 minute and 40 seconds of conversation about the child’s level of accuracy and strategies to improve it on the next block of trials.]

SSLP: [Turns back to child.] “You need to sit up. You got 2 out of 5 correct. Now we’re going to count them on my fingers…”

Child: “Do we have to say these?”

Comment on vignette: In this case the SSLP did finally give feedback but too late for it to be meaningful to the child and after telling the child off for slouching in her chair! Other variants on this are taking notes about the child’s performance or turning to converse with the child’s parent or getting caught up in the reinforcement game and forgetting to provide feedback. In CAS interventions it is common to provide feedback on a random schedule or to provide summative feedback after a block of trials. However, the child should be able to predict the block size and have information about whether their performance is generally improving or not. Even if the child does not have a count of number or percent trials correct, the child should know that practice stimuli are getting more difficult, reflecting performance gains. Sometimes, we deliberately plan to not provide feedback because we want the child to evaluate his or her own productions, but in these cases the child is told beforehand and the child is given a means of explicitly making that judgment (e.g., putting a token in a jar). Furthermore, the SSLP would be expected to praise the child for making accurate self-judgments or self-corrections. When the child does not get feedback or cannot track their own progress they will lose interest in the activity. It is common for SSLPs to change the game thinking that it is not motivating enough but there is nothing more motivating than a clear sense of success!

Possible solutions: Video record sessions and ask students to watch for and count the frequency of events in which the child has not received expected feedback. Provide child with visual guides to track progress indexed either as correct trials or difficulty of practice materials.

Category 2: Ambiguous feedback

SSLP: “Say [ska].”

Child: “[skak]”

SSLP: “OK, take the fish out.”

Comment on the vignette: In this case it is not clear if the SSLP is accepting the inexact repetition of her model. In our CAS interventions we expect the child to produce the model exactly because metathesis and other planning errors are common and therefore I would consider this production to be incorrect. Other ambiguous feedback phrases that I observed frequently were “Good try” and “Nice try” and similar variants. In these cases the child has not received a clear signal that the “try” was incorrect. Another version of ambiguous feedback is to comment on the child’s behavior rather than the child’s speech accuracy (e.g., “You did it by yourself!” in which case the “it” is ambiguous to the child and not clearly related to the accuracy of the child’s speech attempts).

Possible solutions: SSLPs really do not like telling children that they have said something incorrectly. Ask students to role play firm and informative feedback. Have the students plan a small number of clear phrases that are acceptable to them as indicators of correct and incorrect responses (e.g., “I didn’t hear your snake sound” may be more acceptable than “No, that’s wrong”). Post written copies of the phrases somewhere in the therapy room so that the SLP can see them. Track the use of vague phrases such as “nice try” and impose a mutually agreed but fun penalty for exceeding a threshold number (buy the next coffee round for example). This works well if students are peer coaching.

Category 3: Mixed signals

SSLP: “Say [ska].”

Child: “[s:ka]”

SSLP: “Good job! Take the fish out.” [Frown on face].

Comment on the vignette: I am rather prone to this one myself due to strong concentration on next moves! But it is really unhelpful for children with speech and language delays who find the nonverbal message much easier to interpret than the verbal message.

Possible solutions: It would be better if SLP therapy rooms looked like physiotherapy rooms. It annoys the heck out of me when we can’t get them outfitted with beautiful floor to ceiling mirrors. The child and SLP should sit or stand in front of the mirror when working on speech. Many games can be played using ticky tack or reusable stickers or dry erase pens. The SLP will be more aware of the congruence or incongruence between facial expressions, body language and verbal signals during the session.

Category 4: Feedback that reinforces the error

SSLP: “Repeat after me, Spatnuck” [this is the name of a rocket ship in nonsense word therapy].

Child: “fatnuck”

SSLP: “I think you said fatnuck with a [f:] instead of a [s:].”

Comment on the vignette: Some SSLPs provide this kind of feedback so frequently that the child hears as many models of the incorrect form as the correct form. This is not helpful! This kind of feedback after the error is not easy for young children to process. To help the child succeed, it would be better to change the difficulty level of the task itself and provide more effective support before the next trial. After attempts, recasting incorrect tries and imitating correct tries can help the child monitor their own attempts at the target.

Possible solutions: Try similar strategies as suggested for ambiguous feedback. Plan appropriate feedback in advance. Plan to say this when the incorrect response is heard: “I didn’t hear the snake sound. Let’s try just the beginning of the word, watch me: sss-pat.” And when “spat” is achieved, plan to say “Good, I heard spat, you get a Spatnuck to put in space.”

Category 5: Confused feedback

SSLP: “Oh! Remember to curl your tongue when you say shadow.”

SSLP: “Oh! You found another pair.”

Child: “It’s shell [sʷɛo].”

SSLP: “Oh! I like the way you rounded your lips. Where is your tongue? Remember to hide your tongue.”

SSLP: “Oh! You remembered where it was. You found another pair.”

Child: “Shoes [sʷuz].”

SSLP: “Oh! I like the way you rounded your tongue.”

Comment on vignette: In this vignette the SSLP is providing feedback about three aspects of the child’s performance: finding pairs when playing memory, rounding lips when attempting “sh” sounds, and in some cases anterior tongue placement when attempting the “sh” sound as well. One aspect of her feedback that is confusing when watching the video is the use of the exclamation “Oh!” Initially it appeared to signal an upcoming correction but it became so constant that it was not a predictable signal of any kind of feedback and was confusing. The exclamation had a negative valence to it but it might precede a correction or positive feedback. The SSLP confused her feedback about lips and tongue and it was not clear whether she was expecting the child to achieve the correct lip gesture, the correct tongue gesture or both at the same time.

Possible solutions: This can happen when there is too much happening in a session. The CE could help the SSLP restructure the session so that she can focus her attention on one aspect of the child’s behavior at a time, like this: “I want you to name these five pictures. Each time I am going to watch your lips. When you are done you can put the pictures on the table and mix them up for our game later.” If the child rounds the lips each time, switch to focusing on the tongue. When the ten cards are on the table play memory, modeling the picture names. In this way the three behaviors (rounding lips, retracting tongue, finding pairs) are separated in time and the SSLP can focus attention on each one with care, providing appropriate feedback repeatedly during the appropriate intervals.

Category 6: Confused use of reinforcement materials

SSLP: “Repeat after me, [ska].”

Child: “[θak]”

SSLP: [ska]

Child: “[θak]”

SSLP: “OK, take the fish out.”

SSLP: “Repeat after me, [ska].”

Child: [ska]

SSLP: “There you got it, take the fish out.”

SSLP: “Repeat after me, [ska].”

Child: [ska]

SSLP: “Good, and the last one, [ska].”

Child: [ska]

SSLP: “That’s good, take the fish out.”

Comment on vignette: In this vignette the child cannot tell if he gets a fish for correct answers or wrong answers or any answer. It is even worse if the child has been told that he will get a fish for each correct answer. Sometimes a student will say “Everything was going fine, we were having fun and then he just lost it!” When you look at the video you see exchanges such as the one reproduced here leading up to a tantrum by the child. The SSLP has broken a promise to the child. They don’t forgive that.

Possible solutions: This one is hard because it is a classic rookie mistake. Experience is the best cure. Reducing the number of tasks that the SSLP must do simultaneously may help. Therefore, in the early sessions the CE might keep track of the child’s correct and incorrect responses for the SSLP and allow her to focus on managing the materials and the child’s behavior. SSLPs would never think of this but it is possible to let the child manage the reinforcement materials themselves in some cases. One of our favorite vignettes, reprinted on page 463 of DPD2e (Case Study 9-4) involved an error detection activity in which the child could put toy animals in the barn but only when the SSLP said the names of the animals correctly. The child had the toys in his hands throughout the activity. He would not put them in the barn unless the clinician said the words correctly and would get annoyed if she said them wrong, telling her “you have to say cow [kau]!” SSLPs can learn that it is not necessary to control everything.

I put these here for students and clinical educators and speech-language pathologists and hope that you will have fun finding these feedback mishaps in your own sessions. If you come up with better strategies to avoid them than I have suggested here please share them in the comments.

L2 learning of new phonetic contrasts: How hard is that?

Given my career long interest in the impact of perceptual knowledge on speech production learning it was gratifying to read the meta-analysis by Sakai and Moorman (2018) that concluded “Ultimately, the present meta-analysis was able to show that perception-only training can lead to production gains. This finding is encouraging to L2 instructors and learners.” They found that, when teaching adult L2 learners to perceive a foreign language phonetic contrast, an average effect size of .92 was obtained for gains in perception while an average effect size of .54 was obtained for gains in production accuracy for the same phonetic contrast. They reviewed studies in which no production practice or training was provided and therefore these changes in production accuracy were a direct effect of the perceptual training procedure.

Francoise and I were puzzled by one part of their paper however. They excluded 12 studies because they claimed that they were unable to obtain the data required for the calculation of effect sizes from the authors. Our prior study on training English speakers to perceive French vowels was excluded on this basis despite the fact that I am right here with the data all neat and tidy in spreadsheets (I guess these things happen although it is the second time that this has happened to me now so it is getting to be annoying). Nonetheless, we have all the data required to calculate those effect sizes so I provide them here for each group. There were seven conditions in our study, manipulating variability in the talkers (multiple versus single talker) and the position of the training vowels on the continuum (far from the category boundary, prototypical location in category space, and close to the category boundary). Given a control condition in which listeners categorized grammatical items rather than vowel tokens, we have seven conditions: control (CON), single voice prototype (SVP), multiple voice prototype (MVP), single voice far (SVF), multiple voice far (MVF), single voice close (SVC), multiple voice close (MVC). I provide the effect sizes below calculated as described by Sakai and Moorman. The figures in our paper reflect our statistical analysis, which indicates a reliable effect of the training on perception in the MVF and MVC conditions but to our disappointment no reliable effect on production. All groups improved production (including the control group) when acoustic measures were considered, but these acoustic changes were not perceptible to native French listeners, as in there was no significant effect of time (pre to post training) or condition and no interaction when we submitted listener ratings of the participants’ production effects to a repeated measures ANOVA. 
Nonetheless, some moderate effect sizes are seen below in the SVC and MVC conditions, relative to the CON condition. Two effect sizes are reported for the perception outcomes and the production outcomes: ES(PP) which reflects pre- to post-training changes and ES(PPC) which reflects the difference between the change observed in the experimental group versus the control group.
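To make the two kinds of effect size concrete, here is a minimal sketch in Python. It uses one common convention (standardizing by the pre-test standard deviation) and is not necessarily the exact formula applied by Sakai and Moorman; the function names and all of the numbers are hypothetical and are not data from our study.

```python
from math import sqrt


def es_pre_post(mean_pre, mean_post, sd_pre):
    """ES(PP): pre- to post-training change in one group, in pre-test SD units."""
    return (mean_post - mean_pre) / sd_pre


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two groups (here, their pre-test scores)."""
    return sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))


def es_pre_post_control(exp_pre, exp_post, con_pre, con_post, sd_pre_pooled):
    """ES(PPC): change in the experimental group minus change in the control group."""
    return ((exp_post - exp_pre) - (con_post - con_pre)) / sd_pre_pooled


# Hypothetical percent-correct scores, for illustration only
sd_pooled = pooled_sd(sd_a=10.0, n_a=12, sd_b=10.0, n_b=12)
pp = es_pre_post(mean_pre=60.0, mean_post=68.0, sd_pre=10.0)
ppc = es_pre_post_control(60.0, 68.0, 61.0, 64.0, sd_pooled)
print(f"ES(PP) = {pp:.2f}, ES(PPC) = {ppc:.2f}")
```

With these made-up numbers the control group also improves a little, so the ES(PPC) comes out smaller than the ES(PP) for the trained group; the relationship between the two depends entirely on what happens in the control group.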

Brosseau Lapre et al Applied Psycholinguistics perception ES

This table reports similar findings to those described by Sakai and Moorman in that the ES(PPC) is considerably larger than the ES(PP) although this latter ES is smaller than the mean ES that they found. However, we had not expected all of our conditions to be equally effective. Overall, we concluded that training with multiple talkers was most effective when listeners were presented with a range of vowel stimuli that were far from the category boundary; however, training with a single voice was most effective when listeners were presented with vowel stimuli that were close to the category boundary.

Regarding production outcomes however, the figures with error bars in our paper really tell the story most clearly, showing no reliable effect of the perception training on production outcomes from the listener’s perspective despite large changes in acoustic parameters for all groups, including the control group, suggesting that the perception testing alone (as conducted pre and post training) has at least a short term effect on production.

Brosseau Lapre et al Applied Psycholinguistics production ES

With respect to changes in production, these effect size data might suggest an effect of perception training on production in the single voice close and multiple voice close conditions but overall there were no statistically significant findings for production accuracy and despite small improvements in the SVC and MVC conditions, the average ratings are not good. This is one issue I always have with meta-analyses: they are concerned with the size of “effects” as measured by d but the d values do not tell you whether the “effects” in any of the studies so aggregated were actual effects as in statistically or even functionally significant. Now, theoretically, if you aggregate a lot of moderate effect sizes from a lot of underpowered studies, they could add up to something but in this case I think we have a picture of an effect that is very idiosyncratic. We really don’t know why some L2 participants learn these contrasts and some don’t in the perception or production domains. Sakai and Moorman do us a service by exploring some potential sources of heterogeneity in outcomes. It is possible that many training sessions targeting about 3 contrasts over a three hour total training period and completed at home may be optimal. Furthermore, in terms of participant characteristics, beginners make more progress than intermediate level learners. Overall however, the characteristics of successful versus unsuccessful learners are not clear despite a growing number of studies that examine underlying perceptual and cognitive skills as predictors. Personally, I find it a bit discouraging to read the Sakai and Moorman paper. The authors were quite excited to find a reliable moderate effect size across a growing number of studies. 
But I know that those effect sizes are associated with a lot of people who cannot produce a foreign language speech sound that sounds even half-ways “native-like.” After forty years of work involving quite sophisticated methods for designing stimuli and training regimens I thought we would be further along. Definitely more work to do on this problem.



Words are where it’s at

It is probable that you have seen at least references to the Sperry, Sperry & Miller (2018) paper in Child Development because it made a big splash in the media–the claim of “challenging” Hart & Risley’s (1992/1995) finding of a “30 million word gap” in language input to children with “poor” versus “professional” parents caused a lot of excitement. You may not have seen the follow-up commentary by Golinkoff, Hoff et al and then the reply by Sperry et al. I want to say something about vocabulary and phonological processing with these papers as a jumping off point which means I have to summarize their debate as efficiently as I can–no easy task because there is a lot going on in those papers but here is the short version:

  1. Sperry, Sperry & Miller (paradoxically) present a very good review of the literature showing that cross-culturally, language input provided directly TO children (i.e., the words that the child hears) predicts language outcomes. This is true even though there is a lot of variation in how much language is directed to children (versus spoken in the vicinity of children) across different cultures and SES strata: regardless of those differences, it is the child-directed speech that matters to rate of language development. Nonetheless, they make the point that overheard speech is understudied and present data indicating that children in poor families across a variety of different ethnic communities might hear and overhear as much speech as middle-class children. Their study is not at all like the Hart & Risley study and therefore the media claims of a “failure to replicate” are inappropriate and highly misleading.
  2. Golinkoff, Hoff et al reiterate the argument that they have been making for decades: children do well in school when they have good language skills; language skills are driven by the quantity and quality of child-directed inputs provided. The focus on the 30 million word gap has led to the development of effective parenting practices and interventions and a de-emphasis on Hart & Risley’s findings would be harmful to children in lower SES families (note that no one is arguing that all poor children receive inadequate inputs or that all racialized children are poor either, these are straw-man arguments).
  3. Sperry et al reply to this comment by saying “Based on the considerable research already cited here and in our study, we assert that it is a mistake to claim that any group has poor language skills simply because their skills are different. Furthermore, we believe that as long as the focus remains on isolated language skills (such as vocabulary) defined by mainstream norms, testing practices, and curricula, nonmainstream children will continue to fail. We believe that low-income, working class, and minority children would be more successful in school if pedagogical practices were more strongly rooted in a strengths-based approach…”

We can all get behind a call for culturally sensitive and fair tests I am sure. As speech-language pathologists we are very motivated to take a strengths-based approach to assessment as well. It is also important to understand that when mothers are talking with their children, they are not transmitting words alone, but also culture. Richman et al. (1992) describe how middle-class mothers in Boston engaged their infants in “emotionally arousing conversational interactions” whereas Gusii mothers “see themselves as protecting their infants” and focused on soothing interactions that moderated emotional excitement; in this same paper, increased maternal schooling was observed to be associated with increased verbal responsiveness to infants by Mexican mothers when compared to mothers from the same community with less education. Therefore, encouraging a “western” style of mother-infant vocal interaction may well conflict with the maternal role of enculturating her infant to valid social norms that differ from western or mainstream values. The call to respect those cultural norms, reflected in Sperry et al’s reply obviously deserves more serious consideration than shown in Golinkoff et al’s urgent plea to maximize vocabulary size.

Nonetheless, Sperry et al are engaging in some wishful thinking when they claim that “young children in societies where they are seldom spoken to nonetheless attain linguistic milestones at comparable rates.” In fact, the only evidence they point to in support of this claim pertains to pointing as a form of nonverbal communication. While it is evidently true that culture is a strong determinant of mother-infant interactional style, it makes no sense to argue that differences in the style of interaction and the amount of linguistic input make no difference to language learning. Teaching interactions vary with culture but learning mechanisms do not (unless you are arguing that there are substantial genetic variations in neurolinguistic mechanisms across ethnic groups and I am absolutely not arguing that, the complete opposite). When Linda Polka and I were studying selective attention in infant speech perception development we talked about speech intake as opposed to speech input. Certainly there may be different ways to engage the infant’s attention but ultimately the amount and quality of linguistic input that the child actively receives will impact the time course of language development. Immigrant parents may not be aiming for western middle class outcomes for their children but when they are, tools to increase vocabulary size in the majority language will be essential.

The other part of Sperry et al’s argument is that children who are not middle class speakers of English in North America might have strengths in other aspects of language (story telling, for example) that must be valued. Vocabulary is deemed to be an isolated skill. This is the part of their argument that I find to be most problematic. Vocabulary is central to all aspects of language learning: phonology and phonological processing, morphology, and syntax, in the oral and written domains. Words are the heart and soul of language and language learning. It is difficult to understand how the child could achieve excellence as a story teller without a good vocabulary. Furthermore, vocabulary is not learned in isolation from all those other aspects of language including the social, pragmatic and cultural. For those children receiving speech-language pathology services, a large vocabulary is protective: if the child for whatever reason has phonological or language processing deficits that make it difficult to learn phonological awareness or decoding skills or morphology or syntax, a large vocabulary can help compensate for those weaknesses. For a speech-language pathologist, a strengths-based perspective may well mean engaging all the people in the child’s environment to build on the child’s vocabularies in the home and school languages as a means of compensating for difficulties in these other areas of language. More typically, what I see is a narrow focus on phonological awareness or morphology or syntax because these skills are weaker and presumably more “important.” But vocabulary is one area where nonprofessionals, paraprofessionals and other professionals can make a huge difference and what a difference it makes!

Further to this topic I am adding below an excerpt from our book Rvachew & Brosseau-Lapré (2018) along with the associated Figure. I also recommend papers by Noble and colleagues on the neurocognitive correlates of reading (an effect that I am sure is also mediated by vocabulary size).

“Vocabulary skills may be an area of relative strength for children with DPD and therefore it may seem unnecessary to teach their parents to use dialogic reading techniques to facilitate their child’s vocabulary acquisition. If the child’s speech is completely unintelligible, low average vocabulary skills are not likely to be the SLP’s highest priority and with good reason! However, good vocabulary skills may be a protective factor for children with DPD with respect to literacy outcomes. Rvachew and Grawburg (2006) conducted a cluster analysis based on the speech perception, receptive vocabulary, and phonological awareness test scores of children with DPD. The results are shown graphically in Figure 9–2. In this figure, receptive vocabulary (PPVT–III) standard scores are plotted against speech perception scores (SAILS; /k/, /s/, /l/, and /ɹ/ modules), with different markers for individual children in each cluster. The figure legend shows the mean phonological awareness (PA) test score for each cluster. The normal limits for PPVT performance are between 85 and 115. The lower limit of normal performance on the SAILS test is a score of approximately 70% correct. Clusters 3 and 4 achieved a mean PA test score within normal limits (i.e., a score higher than 15), whereas Clusters 1 and 2 scored below normal limits on average. The figure illustrates that the children who achieved the highest PA test scores had either exceptionally high vocabulary test scores or very good speech perception scores. The cluster with the lowest PPVT–III scores demonstrated the poorest speech perception and phonological awareness performance. These children can be predicted to have future literacy deficits on the basis of poor language skills alone (Peterson, Pennington, Shriberg, & Boada, 2009). The contrast between Clusters 2 and 3 shows that good speech perception performance is the best predictor of PA for children whose vocabulary scores are within the average range. 
All children with exceptionally high vocabulary skills achieved good PA scores, however, even those who scored below normal limits on the speech perception test. The mechanism for this outcome is revealed by studies that show an association between vocabulary size and language processing efficiency in 2-year-old children that in turn predicts language outcomes in multiple domains over the subsequent 6 years (Marchman & Fernald, 2008). Individual differences in processing efficiency may reflect in part endogenous variations in the functioning of underlying neural mechanisms; however, research with bilingual children shows that the primary influence is the amount of environmental language input. Greater exposure to language input in a given language “deepens language specific, as well as language-general, features of existing representations [leading to a] synergistic interaction between processing skills and vocabulary learning” (Marchman & Fernald, 2008, p. 835). More specifically, a larger vocabulary size provides access to sublexical segmental phonological structure, permitting faster word recognition, word learning, and metalinguistic understanding (Law and Edwards, 2015). From a public health perspective, teaching all parents to maximize their children’s language development is part of the role of the SLP. For children with DPD it is especially important that parents not be so focused on “speech homework” that daily shared reading is set aside. SLPs can help the parents of children with DPD use shared reading as an opportunity to strengthen their child’s language and literacy skills and provide opportunities for speech practice (p. 469).”


Conversations with SLPs: Nonword Practice Stimuli

I often answer queries from speech-language pathologists about their patients or more abstract matters of theory or clinical practice and sometimes the conversations are general enough to turn into blog topics. On this occasion I was asked my opinion about a specific paper with the question being generally about the credibility of the results and applicability of the findings to clinical practice:

Gierut, J., Morrisette, M. L., & Ziemer, S. M. (2010). Nonwords and generalization in children with phonological disorders. American Journal of Speech-Language Pathology, 19, 167-177.

In this paper the authors conduct a retrospective review of post treatment results obtained from 60 children with a moderate-to-severe phonological delay who had been treated in the context of research projects gathered under the umbrella of the “learnability project”. Half of these children had been taught nonwords and the remainder real words, representing phonemes for which the children demonstrated no productive phonological knowledge. The words (both the nonword targets and the real word targets) were taught in association with pictured referents, first in imitation and then in spontaneous production tasks. Generalization to real word targets was probed post-treatment. Note that the phonemes probed included those that were treated and any others that the child did not produce accurately at baseline. The results show an advantage to treated over untreated phonemes that is maintained over a 55 day follow-up interval. Greater generalization was observed for children who received treatment for nonwords compared to those children who received treatment for real words, but only for treated phonemes and only immediately post treatment because over time the children who received treatment for real words caught up to the other group.

OK, so what do I think about this paper? Overall, I think that it provides evidence that it is not harmful to use nonwords in treatment, which is a really nice result for researchers. As Gierut et al explain, nonwords are handy because “they have been incorporated into research as a way of ensuring experimental control within and across children and studies.” They can be designed to target the specific phonological strengths and needs of each child, and it is very unlikely that the family or school personnel will practice them outside of the clinic, and therefore it is possible to conclude that change is due to the experimental manipulation. Gierut et al go one step further however and conclude that nonword stimuli might offer an advantage for generalization learning because “the newness of the treated items might reduce interference from known words.” Here I think that the evidence is weaker simply because this is a nonexperimental study. The study was retrospective, and the children were not randomly assigned, with blinding and within a single cohort, to be taught with one set of stimuli versus the other while other aspects of the design were held constant; this limits the conclusions that one can draw. For example, the authors point out that the children who were treated with nonwords received more treatment sessions than those treated with real words. Therefore, in terms of clinical implications, the study does not offer much guidance to the SLP beyond suggesting that there may be no harm in using nonword stimuli if the SLP has specific reasons for doing so.

We can offer experimental prospective evidence on this topic from my lab however. It is also limited in that it involves only two children but they were both treated with a single subject randomization design that provides excellent internal validity. This study was conducted by my former student Dr. Tanya Matthews with support from Marla Folden, M.Sc., S-LP(C). The interventions were provided by McGill students in speech-language pathology who were completing their final internship. The two children presented with very different profiles: TASC02 had childhood apraxia of speech with an accompanying cognitive delay and ADHD. TASC33 presented with a mild articulation delay and verbal and nonverbal IQ within normal limits.

Both children were treated according to the same protocol: they received 18 treatment sessions, provided 3 per week for 6 weeks. Each week they experienced three different treatment conditions, each randomly assigned to one of the 3 sessions and paired with a unique target, as shown in the table below for the two children. Each session consisted of a prepractice portion and a practice portion. The prepractice was either Mixed Procedures (auditory bombardment, error detection tasks, phonetic placement, segmentation and chaining of segments with the words) or Control (no prepractice). In all three conditions practice was high intensity practice employing principles of motor learning.

[Table: real word vs nonword conditions]

Random assignment of condition/target pairs to sessions within weeks permits the use of resampling tests to determine if there are statistically significant differences in outcomes as a function of treatment condition. Outcomes were assessed via imitation probes that were administered at the end of each treatment session to measure generalization to untreated items (same day probes) and probes that were administered approximately 2 days later (at the beginning of the next treatment session) to measure maintenance of those learning gains (next day probes). The next table shows the mean probe scores by condition and child, the test statistic (squared mean differences across conditions) and the associated p value for the treatment effect for each child.

[Table: real word vs nonword outcomes]

The data shown in this table reveal no significant results for either child for same day or next day probe scores. In other words there was no advantage to the prepractice versus no prepractice condition and there was no advantage to nonword practice over real word practice.
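For readers who have not met resampling tests before, here is a minimal sketch of the kind of randomization test described above. It is not the analysis code from our study: the probe scores are made up, and I am assuming that the test statistic is the sum of squared pairwise differences between the condition means and that each week is one randomization block, so scores are only shuffled among the three sessions within the same week.

```python
import random
from itertools import combinations


def squared_mean_differences(scores_by_week):
    """Sum of squared pairwise differences between condition means.

    scores_by_week is a list of weeks; each week is a list of probe
    scores, one per treatment condition, always in the same order.
    (This exact form of the statistic is an assumption for illustration.)
    """
    n_weeks = len(scores_by_week)
    n_conditions = len(scores_by_week[0])
    means = [
        sum(week[c] for week in scores_by_week) / n_weeks
        for c in range(n_conditions)
    ]
    return sum((a - b) ** 2 for a, b in combinations(means, 2))


def randomization_test(scores_by_week, n_resamples=10000, seed=0):
    """Approximate p value from shuffling condition labels within weeks."""
    rng = random.Random(seed)
    observed = squared_mean_differences(scores_by_week)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        shuffled = []
        for week in scores_by_week:
            permuted = list(week)
            rng.shuffle(permuted)  # reassign scores to conditions within the week
            shuffled.append(permuted)
        # small tolerance guards against floating point ties
        if squared_mean_differences(shuffled) >= observed - 1e-12:
            at_least_as_extreme += 1
    # add-one correction so the estimated p value is never exactly zero
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical probe scores: 6 weeks x 3 conditions (invented numbers).
hypothetical_scores = [
    [4, 5, 3],
    [6, 5, 5],
    [5, 7, 6],
    [7, 6, 8],
    [8, 8, 7],
    [9, 7, 8],
]
statistic, p_value = randomization_test(hypothetical_scores)
print(f"test statistic = {statistic:.3f}, p = {p_value:.3f}")
```

Shuffling within weeks rather than across all 18 sessions mirrors the way the conditions were assigned, and it keeps week-to-week improvement from masquerading as a treatment effect.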

We hope to publish some data soon that suggest that the specific type of prepractice might make a difference for certain children. But overall the most important driver of outcomes for children with speech sound disorders seems to be practice and lots of it.

Reproducibility: Which Levers?

I was reading about health behavior change today and I was reminded that there is a difference between a complicated system and a complex system (D.T. Finegold and colleagues) and it crystallized for me why the confident pronouncements of the reproducibility folks strike me as earnest but often misguided. If you think about it, most laboratory experiments are complicated systems that are meant to be roughly linear: There may be a lot of variables and many people involved in the manipulation or measurement of those variables but ultimately those manipulations and measurements should lead to observed changes in the dependent variable and then there is a conclusion; by linear system I mean that these different levels of the experiment are not supposed to contaminate each other. There are strict rules and procedures, context-specific of course, for carrying out the experiment and all the people involved need to be well trained in those procedures and they must follow the rules for the experiment to have integrity. Science itself is another matter altogether. It is a messy nonlinear dynamic complex system from which many good and some astounding results emerge, not because all the parts are perfect, but in spite of all the imperfection and possibly because of it. Shiffrin, Börner and Stigler (2018) have produced a beautiful long read that describes this process of “progress despite irreproducibility.” I will leave it to them to explain it since they do it so well.

I am certain that the funders and the proponents of all the proposals to improve science are completely sincere but we all know that the road to hell is paved with good intentions. The reason that the best intentions are not going to work well in this case is that the reproducibility folks are trying to “fix” a complex system by treating it as if it is a complicated problem. Chris Chambers tells a relatively simple tale in which a journal rejects a paper (according to his account) because a negative result was reported honestly, which suggests that a focus on positive results rewards cheating to get those results, and voilà: the solution is to encourage publication without the results. This idea is fleshed out by Nosek et al (2018) in a grand vision of a “preregistration revolution” which cannot possibly be implemented as imagined or result in the conceived outcomes. All possible objections have been declared to be false (bold print by Chris Chambers) and thus they have no need of my opinion. I am old enough to be starting my last cohort of students so I have just enough time to watch them get tangled up in it. I am a patient person. I can wait to see what happens (although curiously no objective markers of the success of this revolution have been definitively put forward).

But here’s the thing. When you are predicting the future you can only look to the past. So here are the other things that I read today that lead me to be quite confident that although science will keep improving itself as it always has done, at least some of this current revolution will end up in the dust. First, on the topic of cheating, there is quite a big literature on academic cheating by undergraduate students which is directly relevant to the reproducibility movement. You will not be surprised to learn that (perceived) cheating is contagious. It is hard to know the causal direction – it is probably reciprocal. If a student believes that everyone is cheating the likelihood that the student will cheat is increased. Students who cheat believe that everyone else is cheating regardless of the actual rate of cheating. Students and athletes who are intrinsically versus extrinsically motivated are also less likely to cheat so it is not a good idea to undermine intrinsic motivation with excessive extrinsic reward systems, especially those that reduce perceived autonomy. Cheating is reduced by “creating a deeply embedded culture of integrity:” Culture is the important word here because most research and most interventions target individuals but it is culture and systems that need to be changed. Accomplishing a culture of integrity includes (perhaps you will think paradoxically) creating a trusting and supportive atmosphere with reduced competitive pressures while ensuring harsh and predictable consequences for cheating. The reproducibility movement has taken the path of deliberately inflating the statistics on the prevalence of questionable research practices with the goal of manufacturing a crisis, under the mistaken belief that the crisis narrative is necessary to motivate change when it is more likely that this narrative will actually increase cynicism and mistrust, having exactly the opposite effect.

The second article I read that was serendipitously relevant was about political polarization. Interestingly, it turns out that perceived polarization reduces trust in government whereas actual polarization between groups is not predictive of trust, political participation and so on. It is very clear to me that the proponents of this movement are deliberately polarizing and have been since the beginning, setting hard scientists against soft, men against women and especially the young against the old (I would point to parts of my twitter feed as proof of this but I don’t need to contaminate your day with that much negativity; suffice to say it is not a trusting and supportive atmosphere). The Pew Center shows that despite decades of a “war against science” we remain one of the most trusted groups in society. It is madness to destroy ourselves from within.

A really super interesting event that happened in my twitter feed today was the release of the report detailing the complete failure of the Gates Foundation $600M effort to improve education by waving sticks and carrots over teachers with the assumption that getting rid of bad teachers was a primary “lever” that when pulled would spit better educated minority students out the other end (seriously, they use the word levers, it cracks me up; talk about mistaking a complex system for a complicated one). Anyway, it didn’t work. The report properly points out that the disappointing results may have occurred because their “theory of action” was wrong. There just wasn’t enough variability in teacher quality even at the outset for all that focus on teacher quality to make that much difference, especially since the comparison schools were engaged in continuous improvement in teacher quality as well. But of course the response on twitter today has been focused on teacher quality: many observers figure that the bad teachers foiled the attempt through resistance, of course! The thing is that education is one of those systems in our society that actually works really well, kind of like science. If you start with the assumption that the scientists are the problem and if you could just get someone to force them to shape up (see daydream in this blog by Lakens in which he shows that he knows nothing about professional associations despite his excellence as a statistician)…well, I think we have another case of people with money pulling on levers with no clue what is behind them.

And finally, let’s end with the Toronto Star, an excellent newspaper, that has a really long read (sorry, it’s long but really worth your time) describing a dramatic but successful change in a nursing home for people with dementia. It starts out as a terrible home for people with dementia and becomes a place you would (sadly but confidently) place your family member. This story is interesting because you start with the sense that everyone (care-givers, families, funders, government) must have the worst motives in order for this place to be this bad, and end up realizing that everyone had absolutely the best intentions and cared deeply for the welfare of the patients. The problem was an attempt to manage the risk of error and place that goal above all others. You will see that the result of efforts to control error from the top down created the hell that the road paved with good intentions must inevitably create.

So this is it, I may be wrong and if I am it will not be the first time. But I do not think that scientists have been wasting their time for the last 30 years as one young person declared so dramatically in my twitter feed. I don’t think that they will waste the next 30 years either because they will mostly keep their eye on whatever it is that motivated them to get into this crazy business. Best we support and help each other and let each other know when we have improved something but at the same time not get too caught up in trying to control what everyone else is doing. Unless of course you are so disheartened with science you would rather give it up and join the folks in the expense account department.

Post-script on July 7, 2018: Another paper to add to this grab-bag:

Kaufman, J. C., & Glǎveanu, V. P. (2018). The Road to Uncreative Science Is Paved With Good Intentions: Ideas, Implementations, and Uneasy Balances. Perspectives on Psychological Science, 13(4), 457-465. doi:10.1177/1745691617753947

I liked this perspective on science:

“The propulsion model is concerned with how a creative work affects the field. Some types of contributions stay within the existing paradigm. Replications, at the most basic level, aim to reproduce or recreate a past successful creation, whereas redefinitions take a new perspective on existing work. Forward or advance forward incrementations push the field ahead slightly or a great deal, respectively. Forward incrementations anticipate where the field is heading and are often quite successful, whereas advance forward incrementations may be ahead of their time and may be recognized only retrospectively. These categories stay within the existing paradigm; others push the boundaries. Redirections, for example, try to change the way a field is moving and take it in a new direction. Integrations aim to merge two fields, whereas reinitiation contributions seek to entirely reinvent what constitutes the field.”