Boys and Spelling

I rather like this new paper by Treiman et al. (2019) in Scientific Studies of Reading on “The unique role of spelling in the prediction of later literacy performance”, or in actual fact word reading performance, because that is the only outcome measure in this study, albeit measured longitudinally between kindergarten and ninth grade in 970 children. The upshot is that early spelling predicts unique variance in later word reading skills after taking into account early phonological awareness, vocabulary and letter knowledge skills. Presumably spelling captures other important aspects of literacy knowledge, such as orthographic knowledge and, I imagine, morphological skills.

I have been interested in spelling for a while now because it is the aspect of literacy most likely to be impaired in children who have speech sound disorders. Furthermore, the Quebec government (which funded the research that I will describe here) had been concerned by falling literacy test scores across the province’s schools, and the scores for orthography (a combination of spelling and morphology) had been particularly low. Specifically, the percentage of children passing the province-wide literacy test with respect to orthography fell from 87% in the year 2000 to 77% in 2005, whereas the proportion of children scoring in the unsatisfactory range increased from 5% to 11% over the same period.

Therefore, a group of us set out to develop a tool to predict spelling difficulties in French-speaking children in Quebec, the result being PHOPHLO (Prédiction des Habiletés Orthographiques Par des Habiletés Langage Oral). Specifically, we hypothesized that spelling difficulties at the end of the first and third grades could be predicted by examining oral language skills at the end of kindergarten/beginning of first grade using an iPad-based screen of speech perception, speech production, rime awareness and morphology production skills. The test was found to accord well with teacher predictions of spelling difficulties and objective measures of spelling at the end of first grade:

Kolne, K., Gonnerman, L., Marquis, A., Royle, P., & Rvachew, S. (2016). Teacher predictions of children’s spelling ability: What are they based on and how good are they? Language and Literacy, 18(1), 71-98. [open access]

In a larger study we documented specificity and sensitivity of 93% and 71% respectively for the prediction of spelling at the end of second grade:

Rvachew, S., Royle, P., Gonnerman, L., Stanké, B., Marquis, A., & Herbay, A. (2017). Development of a Tool to Screen Risk of Literacy Delays in French-Speaking Children: PHOPHLO. Canadian Journal of Speech-Language Pathology and Audiology, 41(3), 321-340. [open access]

We are especially proud of this latter paper because it won the editor’s paper award from CJSLPA. And I am especially proud of Alexandre Herbay because he created such beautiful software with only 6 months of funding from MITACS.

The reason for this blog post is that it was only after publishing these papers that it occurred to me to look for gender effects in the data! I don’t know why it took so long, because the province-wide literacy test results had been flagging gender differences in literacy performance all along. There has been a significant gap favouring the girls in literacy performance across all scoring criteria since 2000: even though the overall success rate has improved considerably since 2005, the gender gap persists. For example, in 2010 88.7% of children passed orthography but the rate for girls was 90.1% versus the rate for boys at 81.3%. With this concern about the performance of boys looming large at the provincial level, it finally occurred to me to wonder if our PHOPHLO screener would be sensitive to gender differences.

The answer to my question is interesting on two counts. First, there turns out to be a big gender effect in spelling outcomes, as follows: girls who passed the PHOPHLO screener obtained a second grade spelling test score of 51, compared to 40 for the girls who failed the PHOPHLO screener; boys who passed the PHOPHLO screener achieved a second grade spelling test score of 47, compared to 31 for boys who failed the PHOPHLO screener. This means that PHOPHLO predicted spelling performance for both boys and girls (main effect of PHOPHLO, F(1,74) = 26.71, p < .0001) but boys obtained lower scores than girls regardless of their PHOPHLO performance (main effect of gender, F(1,74) = 6.61, p = .012), with no significant interaction.
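To make the pattern in those four cell means easier to see, here is a minimal Python sketch that works only from the means quoted above. The group sizes are not given in this post, so the marginal means below are unweighted (each cell counts equally), which is an assumption and not how the ANOVA itself weights the data. The variable and function names are mine, chosen for illustration.

```python
# Mean second grade spelling scores by gender and PHOPHLO result,
# copied from the paragraph above.
cell_means = {
    ("girls", "pass"): 51,
    ("girls", "fail"): 40,
    ("boys", "pass"): 47,
    ("boys", "fail"): 31,
}


def unweighted_mean(values):
    """Average a collection of cell means, giving each cell equal weight."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: cell sizes are unknown here, so every cell is weighted equally.
screen_means = {
    status: unweighted_mean(
        score for (_, s), score in cell_means.items() if s == status
    )
    for status in ("pass", "fail")
}
gender_means = {
    gender: unweighted_mean(
        score for (g, _), score in cell_means.items() if g == gender
    )
    for gender in ("girls", "boys")
}

# Pass-minus-fail gap within each gender, and the difference between the gaps.
gap_girls = cell_means[("girls", "pass")] - cell_means[("girls", "fail")]
gap_boys = cell_means[("boys", "pass")] - cell_means[("boys", "fail")]
interaction_contrast = gap_girls - gap_boys

print("Screener marginal means:", screen_means)
print("Gender marginal means:", gender_means)
print("Pass-fail gap (girls, boys):", gap_girls, gap_boys)
print("Interaction contrast:", interaction_contrast)
```

The pass-fail gap is in the same direction for girls and boys, and boys score lower than girls in both the pass and fail groups, which is the pattern that the two main effects describe. Whether the difference between the two gaps is reliable is a question for the ANOVA, not for this arithmetic.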

The second interesting finding, however, was that there was no gender difference in PHOPHLO scores: as measured by this screener, the children had equivalent language skills at school entry. There are three possible explanations. First, the screener is only a screener, and therefore it is quite likely that there are differences in language performance between the boys and girls at school entry that are not uncovered by the PHOPHLO screener, given that boys and girls do have different trajectories for early language development (although typically only for language production, and it is often reported that they have caught up by school age). Another possibility is that these early language differences cause a difference in executive functions or temperament for boys that impacts their ability to learn literacy skills in school. The third possibility is that boys are treated differently in school due to gendered social expectations for behavior, interests and social identity that discourage literacy-related activities for boys. In any case, this finding raises questions about what happens to boys at school between kindergarten and first grade. Our research is currently concerned with this question and I will share those results during my keynote address at the upcoming 2019 joint conference of Speech Pathology Australia and the New Zealand Speech Therapists Association in Brisbane.


I am getting questions about our PHOPHLO project now that a part of it has inexplicably made Science Daily, so I will provide a summary of the project here with a few of the outcomes. This project is a collaboration between me and Laura Gonnerman at McGill University and Phaedra Royle at the University of Montréal. The testing of the children was coordinated by Phaedra’s post-doctoral student Alexandra Marquis under my supervision. The purpose of the project was to develop a screening test that could be administered to francophone children at the beginning of first grade to predict difficulties in the achievement of written language skills at the end of second grade. The project proceeded in three phases.

Pilot Phase. We began by administering a battery of tests of oral language skills to children in kindergarten and first grade classrooms in two schools. The tests included two tools developed by Françoise Brosseau-Lapré and me for the ECRIP trial, specifically a version of SAILS assessing the children’s perception of the word “gris” (French word for ‘grey’) and the Test de Conscience Phonologique Préscolaire, a measure of rime and onset matching that does not require any spoken responses. The third was a test of articulation accuracy developed by a former master’s student, Marianne Paul, and subsequently validated by me and Françoise, the Test de Dépistage Francophone de Phonologie. The fourth test was a measure of spoken morphological skills, Jeu de Verbes, assessing the ability to produce appropriate past tense morphemes (a bit like a French ‘wug’ test), developed by Phaedra Royle at the University of Montréal. We also collected ratings of the children’s abilities from the teachers and conducted item analyses on all these data to develop a screening test that was a much shorter version of these four measures (44 items in total, about 20 minutes administration time compared to 80 minutes). We have published two papers describing the children’s performance during Phase I, with comparisons across the unilingual French children and the multilingual French children (about 40% of the Phase I children spoke a language other than French at home; the language at school is 100% French). Rvachew et al. (2013), published in Clinical Linguistics & Phonetics, reported that (1) there were no statistically significant differences in articulation accuracy across these two groups; (2) there was a slight tendency toward more errors involving the features [+voice] and [-anterior] nonetheless; and (3) all groups produced more errors in unstressed syllables. Marquis et al. (2012), published in Travaux interdisciplinaires sur la parole et le langage, reported that there were significantly fewer errors for the verb group [é] than the other verb groups tested.

Phase I. During this phase we administered the new screening test to 91 first grade children, with approximately half being unilingual speakers of French and the remainder being multilingual. We also asked for teacher ratings of the child’s risk for future difficulties in the acquisition of writing skills and we piloted a measure of spelling abilities with a subset of these children in the latter half of the first grade year. Kendall Kolne, a doctoral student co-supervised by Laura and me, has just submitted a manuscript to Language and Literacy describing the relationship between the children’s performance on the Phase I tests, some other demographic measures, and the teacher ratings. First, we found that language background, the education levels of parents, home literacy practices, and oral language skills (as measured by our screening test) all differentiated whether or not the children were identified by their teachers as at-risk for future writing difficulty. With respect to predicting first grade spelling skills, teacher ratings accounted for more variance than our screening test, but the screening test and the teacher rating combined predicted the most variance in spelling ability (52% overall), and there was some unique information provided by the screening test. I leave it to the reader to decide whether it is good news or bad news that our 20 minute screen is as good as, or is only as good as, the teacher’s opinion! We did find a few interesting exceptions where the teachers relied overly much on family background information, leading to over- and underestimation of risk for a few children. We have not yet done an analysis to find out how well the teacher ratings hold up over the longer term.

Phase II. In the final phase of the project we administered three tests of written language skills to the Phase I children when they reached the end of second grade (we were able to test 78 of the original sample at this time). The primary outcome measure was a standardized test of spelling that included nonwords, real words and sentences (BELO; Georges, F., & Pech-Georgel, C. (2006). BELO – Batterie d’évaluation de lecture et d’orthographe: Éditions Solal). We found that the screening test administered in first grade was a reasonably good predictor of BELO scores: the sensitivity of the PHOPHLO (i.e., proportion of true positives identified) was 66% while the specificity (i.e., proportion of true negatives identified) was 92%. I expect to submit a manuscript describing these results next year. We also administered two additional measures of written morphology skills developed by Phaedra Royle and Laura Gonnerman. Alex Marquis presented some of these results at the Romance Turn IV Congress in September 2014 and it is this presentation that made the big splash. They reported on the relationship between oral morphosyntax skills measured in first grade and written morphosyntax skills measured at the end of second grade. The written task involved reading sentences and choosing the correct one of three alternatives, e.g., Paul a/as/à une amie. (in this case the first alternative is the correct one). The interesting finding was that the multilingual children performed much more poorly than the unilingual children on the oral language task but there were no differences between these two groups on the written language tasks including the morphosyntax task or the spelling task. More interpretation is found in the Science Daily piece.
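For readers who want to see how sensitivity and specificity figures like those above are derived, here is a minimal Python sketch. The counts in it are hypothetical and chosen only to make the arithmetic easy to follow; they are not the PHOPHLO data. I am also assuming the usual convention that a "positive" is a child flagged by the screener and a "true positive" is a flagged child who later shows a spelling difficulty.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of children with a later difficulty whom the screener flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of children without a later difficulty whom the screener passed."""
    return true_neg / (true_neg + false_pos)


# HYPOTHETICAL counts for illustration only (not taken from the study):
# 20 children with a later spelling difficulty, 50 without.
true_pos = 15   # flagged by the screener, later had a spelling difficulty
false_neg = 5   # passed the screener, later had a spelling difficulty
true_neg = 45   # passed the screener, no later spelling difficulty
false_pos = 5   # flagged by the screener, no later spelling difficulty

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

print(f"sensitivity = {sens:.0%}")
print(f"specificity = {spec:.0%}")
```

A screener with high specificity but more modest sensitivity, as reported above, rarely raises false alarms but misses some of the children who go on to have difficulty.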

Post-study phases. Laura Gonnerman is following these and other children into the later grades and confirms a mixed pattern of strengths and weaknesses in the multilingual group relative to the unilingual French group. She is also looking at the efficacy of certain teaching strategies. I am working with a company (iLanguageLabs) to create a software version of the PHOPHLO. It is nearing completion and I am hopeful that it will be available on the Chrome store in 2016 for use by teachers and orthophonistes (SLPs). As they say, the proof of the pudding is in the eating, and in this case the proof will be in cross-validation with a new and larger sample of children. I will seek funding and move forward with this phase when the app is fully operational. For myself, I am prone to be a bit cautious with my conclusions until the “proof” is in, which could be years yet.