Reproducibility: On the Nature of Scientific Consensus

The idea that scientists who question whether (ir)reproducibility is a crisis are like the “merchants of doubt” is argued via analogy with, for example, climate change deniers. It’s a multistep analogy. First, there is an iron-clad consensus among scientists that humans are causing a change in the climate that will have catastrophic consequences. Because the solutions to the problem threaten corporate interests, those big-money interests astroturf groups like “Friends of Science” to sow doubt about the scientific consensus and so derail the implementation of positive policy options. For the analogy on Bishop’s Blog to work, there must first be a consensus among scientists that the publication of irreproducible research is a crisis, a catastrophe even. I am going to talk about this issue of consensus today, although it would be more fun to follow the analogy along and try to figure out whether corporate interests are threatened by more or less scientific credibility, and how the analogy works when it is corporate money that is funding the consensus and not the dissenters! But anyway, on the topic of consensus…

The promoters of the reproducibility crisis have taken to simply stating that there is a consensus, most frequently citing a highly unscientific Nature poll. I know how to create scientific questionnaires (it was part of my job in another life before academia), and it is clear that the question “Is there a reproducibility crisis?”, with the options “crisis,” “slight crisis” (an oxymoron) and “no crisis,” is a push poll. The survey was designed to make it possible for people to claim that “90% of respondents to a recent survey in Nature agreed that there is a reproducibility crisis,” which is how you sell toothpaste, not how you determine whether there is a crisis. On Twitter I have been informed, with no embarrassment, that unscientific polls are justified because they are used to “raise awareness.” The problem comes when polls that are used to create a consensus are also used as proof of that consensus. How does scientific consensus usually come about?

In many areas of science it is not typical for groups of scientists to formally declare a consensus about a scientific question, but when there are public or health policy implications, working groups will create consensus documents. They always start with a rigorous procedure for identifying the working group, the literature or empirical evidence that will be considered, the standards by which that evidence will be judged, and the process by which the consensus will emerge. Ideally it is a dynamic and broad-based exercise. The Intergovernmental Panel on Climate Change is a model in this regard, and it is the rigor of this process that allows us to place our trust in the consensus conclusion even when we are not experts in climate science. A less complex and, for us, more comprehensible example is the recent process employed by the CATALISE consortium to propose that Specific Language Impairment be reconceptualised as Developmental Language Disorder. This process meets all the requirements of a rigorous process, with the online Delphi technique an intriguing part of the series of events that led to a set of consensus statements about the identification and classification of developmental language disorders. Ultimately each statement is supported by a rationale from the consortium members, including scientific evidence when available. The consortium itself was broad-based, and the process permitted a full exposition of points of agreement and disagreement and of needs for further research. Importantly for me, the process followed a logical sequence of events and statements: the assertion that the new term be used was the end of the process, not the beginning of it. The field of speech-language pathology as a whole has responded enthusiastically, even though there are financial disincentives to adopting all of the recommendations in some jurisdictions. Certainly the process of raising awareness of the consensus documents has had no need of push polls or bullying.

One reason that the process was so well received, beyond respect for the actors and the process, is that the empirical support for some of the key ideas seems unassailable. Not everyone agrees on every point, and we are all uncomfortable with the scourge of low-powered studies in speech and language disorders (an inevitable side effect of funder neglect); however, the scientific foundation for the assertion that language impairments are not specific has reached a critical mass, and therefore no-one needs to go about beating up any “merchants of doubt” on this one. We trust that in those cases where the new approach is not adopted, this is generally due to factors outside the control of the individual clinician.

The CATALISE process remains extraordinary, however. More typically, a consensus emerges in our field almost imperceptibly and without a clear rationale. When I was a student in 1975, I was taught that children with “articulation disorders” did not have underlying speech perception deficits and therefore it would be a waste of time to implement any speech perception training procedures (full stop!). When I began to practice, I had reason to question this conclusion (some things you really can see with your own eyes), so I drove to the university library (I was working far away in a rural area) and started to look stuff up. Imagine my surprise when I found that the one study cited to support this assertion involved four children who did not receive a single assessment of their speech perception skills (weird but true). Furthermore, there was a long history of studies showing that children with speech sound disorders (SSD) had difficulties with speech discrimination. I show just a few of these in the chart below (I heard via Twitter that, at the SPA conference just this month in Australia, Lise Baker and her students reported that 83% of all studies that have looked at this question found that children with a speech sound disorder have difficulties with speech perception). So why was there a period, from approximately 1975 through about 1995, when it was common knowledge that these kids had no difficulty with speech perception? In fact, some textbooks still say this. Where did this mistaken consensus come from?

When I first found out that this consensus was contrary to the published evidence, I was quite frankly incandescent with rage! I was young and naïve, and I couldn’t believe I had been taught wrong stuff. But interestingly, the changes in what people believed to be true were driven by changes in the underlying theory, which is changing all the time. In the chart below I have put the theories and the studies alongside each other in time. Notice that the McReynolds, Kohn, and Williams (1975) paper, which found poorer speech perception among the SSD kids, actually concluded that they didn’t have poorer speech perception, contrary to their own data but consistent with the prevailing theory at the time!

History of Speech Perception Research

What we see is that in the fifties and sixties, when it was commonly assumed that higher-level language problems were caused by impairments in lower-level functions, many studies were conducted to prove this theory, and in fact they found evidence to support it, with some exceptions. In the later sixties and seventies, a number of theories were in play that placed strong emphasis on innate mechanisms. There were few if any studies conducted to examine the perceptual abilities of children with speech sound disorders, because everyone just assumed their perception had to be normal on the basis of the burgeoning field of infant perceptual research showing that neonates could perceive anything (not exactly true, but close enough for people to get a little overenthusiastic). More recently, emergentist approaches have taken hold, and more sophisticated techniques for testing speech perception have allowed us to determine how children perceive speech and when they will have difficulty perceiving it. The old theories have been proved wrong. Not everyone will agree on this, because the ideas about lower-level sensory or motor deficits are zombies; the innate feature detector idea, however, is completely dead. For the most part, the evidence is overwhelming and we have moved on to theories that are considerably more complex and interesting, so much so that I refer you to my book rather than trying to explain them here.

The question, on the topic of reproducibility, is whether it would have been, or would now be, worthwhile for anyone to try to reproduce, let’s say, Kronvall and Diehl (1952) just for kicks. No! That would be a serious waste of time, as my master’s thesis supervisor explained to me in the eighties when he dragged me, more or less kicking and screaming, into a room with a house-sized VAX computer to learn how to synthesize speech (I believe I was the first person to synthesize words with fricatives; it took me over a year). It is hard to assess the clinical impact of all that fuzzy thinking through the period 1975–1995, but somehow, in the long run, we have ended up in a better place. My point is that scientific consensus arises from an odd and sometimes unpredictable mixture of theory and evidence, and it is not always clear what is right and what is wrong until you can look back from a distance. And despite all the fuzziness and error in the process, progress marches on.
