L2 learning of new phonetic contrasts: How hard is that?

Given my career-long interest in the impact of perceptual knowledge on speech production learning, it was gratifying to read the meta-analysis by Sakai and Moorman (2018), which concluded: “Ultimately, the present meta-analysis was able to show that perception-only training can lead to production gains. This finding is encouraging to L2 instructors and learners.” They found that, when adult L2 learners were taught to perceive a foreign language phonetic contrast, the average effect size was .92 for gains in perception and .54 for gains in production accuracy on the same phonetic contrast. They reviewed studies in which no production practice or training was provided, and therefore these changes in production accuracy were a direct effect of the perceptual training procedure.

Francoise and I were puzzled by one part of their paper, however. They excluded 12 studies because, they claimed, they were unable to obtain from the authors the data required to calculate effect sizes. Our prior study on training English speakers to perceive French vowels was excluded on this basis, despite the fact that I am right here with the data all neat and tidy in spreadsheets (I guess these things happen, although this is the second time it has happened to me, so it is getting to be annoying). Nonetheless, we have all the data required to calculate those effect sizes, so I provide them here for each group.

Our study manipulated variability in the talkers (multiple versus single talker) and the position of the training vowels on the continuum (far from the category boundary, prototypical location in category space, and close to the category boundary). Adding a control condition in which listeners categorized grammatical items rather than vowel tokens, we have seven conditions: control (CON), single voice prototype (SVP), multiple voice prototype (MVP), single voice far (SVF), multiple voice far (MVF), single voice close (SVC), and multiple voice close (MVC). I provide the effect sizes below, calculated as described by Sakai and Moorman.

The figures in our paper reflect our statistical analysis, which indicated a reliable effect of the training on perception in the MVF and MVC conditions but, to our disappointment, no reliable effect on production. All groups (including the control group) improved production when acoustic measures were considered, but these acoustic changes were not perceptible to native French listeners: when we submitted listener ratings of the participants’ productions to a repeated measures ANOVA, there was no significant effect of time (pre- to post-training) or condition and no interaction. Nonetheless, some moderate effect sizes are seen below in the SVC and MVC conditions, relative to the CON condition. Two effect sizes are reported for the perception outcomes and the production outcomes: ES(PP), which reflects pre- to post-training changes, and ES(PPC), which reflects the difference between the change observed in the experimental group and the change observed in the control group.
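For readers who want to see how two effect sizes of this kind are typically computed, here is a minimal sketch. It assumes one common formulation (pre-to-post change standardized by the pretest SD, and a pretest-posttest-control difference standardized by the pooled pretest SD, without a small-sample correction); this may not match the exact formulas used by Sakai and Moorman. The function names and all numbers are hypothetical and are not data from our study.

```python
from math import sqrt


def es_pp(mean_pre, mean_post, sd_pre):
    """Pre-to-post standardized mean change for one group.

    Assumption: the change is standardized by the pretest SD.
    """
    return (mean_post - mean_pre) / sd_pre


def es_ppc(t_pre, t_post, c_pre, c_post, sd_pre_t, sd_pre_c, n_t, n_c):
    """Pretest-posttest-control effect size.

    Assumption: the difference between the training group's change and the
    control group's change is standardized by the pooled pretest SD.
    No small-sample bias correction is applied in this sketch.
    """
    pooled_sd = sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2)
        / (n_t + n_c - 2)
    )
    return ((t_post - t_pre) - (c_post - c_pre)) / pooled_sd


# Hypothetical percent-correct scores, made up for illustration only.
train_pp = es_pp(mean_pre=60.0, mean_post=72.0, sd_pre=12.0)
train_ppc = es_ppc(
    t_pre=60.0, t_post=72.0, c_pre=61.0, c_post=65.0,
    sd_pre_t=12.0, sd_pre_c=12.0, n_t=10, n_c=10,
)
print(f"ES(PP) = {train_pp:.2f}, ES(PPC) = {train_ppc:.2f}")
```

The only design point worth noting is that ES(PPC) subtracts whatever change the control group shows, so any change that comes from the testing itself is removed from the training group's gain.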

[Table: perception effect sizes, Brosseau Lapre et al., Applied Psycholinguistics]

This table reports findings similar to those described by Sakai and Moorman in that the ES(PPC) is considerably larger than the ES(PP), although this latter ES is smaller than the mean ES that they found. However, we had not expected all of our conditions to be equally effective. Overall, we concluded that training with multiple talkers was most effective when listeners were presented with a range of vowel stimuli that were far from the category boundary, whereas training with a single voice was most effective when listeners were presented with vowel stimuli that were close to the category boundary.

Regarding production outcomes, however, the figures with error bars in our paper tell the story most clearly. They show no reliable effect of the perception training on production outcomes from the listeners’ perspective, despite large changes in acoustic parameters for all groups, including the control group. This suggests that the perception testing alone (as conducted pre- and post-training) has at least a short-term effect on production.

[Table: production effect sizes, Brosseau Lapre et al., Applied Psycholinguistics]

With respect to changes in production, these effect size data might suggest an effect of perception training on production in the single voice close and multiple voice close conditions, but overall there were no statistically significant findings for production accuracy and, despite small improvements in the SVC and MVC conditions, the average ratings are not good. This is one issue I always have with meta-analyses: they are concerned with the size of “effects” as measured by d, but the d values do not tell you whether the “effects” in any of the studies so aggregated were actual effects, as in statistically or even functionally significant. Now, theoretically, if you aggregate a lot of moderate effect sizes from a lot of underpowered studies, they could add up to something, but in this case I think we have a picture of an effect that is very idiosyncratic. We really don’t know why some L2 participants learn these contrasts and some don’t, in either the perception or the production domain.

Sakai and Moorman do us a service by exploring some potential sources of heterogeneity in outcomes. It is possible that many training sessions targeting about 3 contrasts over a three-hour total training period and completed at home may be optimal. Furthermore, in terms of participant characteristics, beginners make more progress than intermediate-level learners. Overall, however, the characteristics of successful versus unsuccessful learners are not clear, despite a growing number of studies that examine underlying perceptual and cognitive skills as predictors.

Personally, I find it a bit discouraging to read the Sakai and Moorman paper. The authors were quite excited to find a reliable moderate effect size across a growing number of studies, but I know that those effect sizes are associated with a lot of people who cannot produce a foreign language speech sound that sounds even halfway “native-like.” After forty years of work involving quite sophisticated methods for designing stimuli and training regimens, I thought we would be further along. Definitely more work to do on this problem.
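As an aside on the earlier point that moderate effect sizes from underpowered studies could "add up to something" when aggregated, here is a rough sketch of how that happens under a simple fixed-effect, inverse-variance pooling model. This is an assumption for illustration and not necessarily the model Sakai and Moorman used. The ten studies and their values are entirely made up.

```python
from math import sqrt


def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return pooled, se


def ci95(estimate, se):
    """Approximate 95% confidence interval using the normal critical value."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical: ten small studies, each with d = 0.5 and sampling variance 0.09.
effects = [0.5] * 10
variances = [0.09] * 10

single_low, single_high = ci95(effects[0], sqrt(variances[0]))
pooled, pooled_se = pooled_fixed_effect(effects, variances)
pooled_low, pooled_high = ci95(pooled, pooled_se)

# Each study alone has an interval that includes zero; the pooled one does not.
print(f"single study: d = 0.50, CI [{single_low:.2f}, {single_high:.2f}]")
print(f"pooled:       d = {pooled:.2f}, CI [{pooled_low:.2f}, {pooled_high:.2f}]")
```

In this made-up case, no single study is reliable on its own, yet the pooled estimate is. The pooled d still describes an average across groups, so it says nothing about whether any individual learner ends up sounding native-like.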