should replications be held to a higher standard than original research?
i have seen some very bright and influential people argue that they should, mainly because of the potential damage that a failed replication could do to the original author's reputation. according to this argument, shoddy original research may be 'irresponsible' but shoddy replication is a 'sin'.*
i have several objections to this.
1. original research is often treated as precedent. my impression is that people see the original finding as very likely to be true, and require a lot of new evidence to be convinced otherwise. this is problematic for many reasons, but if it's true, it's all the more reason to hold original research to a very high standard. giving original research a pass is dangerous given how hard it is to overturn a finding once it is in the literature.
2. i admit that there are probably exceptions to point #1. i certainly have seen some shoddy replications get more attention than they deserve. however, i don't think they do much reputational damage, because the scientific community is rather quick to point out the flaws. there are many people who have a stake in not letting shoddy replications have the last word - not just those who are inclined to believe the original finding, but also those who would like to promote replications. the worst publicity for reform is shoddy research in the name of reform. (also: people who are mean in the name of reform. being nice to people is a really good idea. especially if you are trying to convince them to make difficult changes.)
3. if we adopt meta-analytic thinking, all evidence is grist for the mill. there is no scientific basis for treating evidence differently based on chronology. quality, yes. chronology, no. (a concrete sketch of what i mean follows this list.) [i know very little about meta-analysis, but luckily that does not stop me from using phrases like 'meta-analytic thinking.' it is one of my favorite phrases. (i also like 'standard normal deviant.' as in, some of my best friends are standard normal deviants.)]
4. there is also the very real danger of scaring the daylights out of would-be replicators. calling shoddy replications a sin might be enough to deter the more timid among us. maybe that is the point - encourage people not to take replication lightly - but there is a fine line between encouraging good behavior and threatening well-meaning people. funder has written and spoken about the plight of the replicator. it is already pretty damn scary for a less established researcher to attempt to replicate a well-established finding. i'm not sure we should be trying to make it more scary.
5. we also should not give in to the narrative that a failed replication, even when it is so rigorous that it is basically definitive, should harm the original author's reputation. there are cases where it clearly should, but it is good to remember that people sometimes get unlucky (or too lucky, in the p < .05 sense), and a false positive does not make someone a bad person, or even a bad researcher. it all depends on how rigorous the original study (and analyses) were. and it also depends on how the original researcher reacts to the strong evidence that their finding was a false positive. there are several prominent examples of people whose work failed to replicate in a convincing way, but whose reputations were not harmed, and maybe were even improved, because they reacted in an extremely scientifically-minded, non-defensive way. i hope to dedicate some blog posts to those outstanding examples in the future. (please let me know of any cases you know of.)
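to make the chronology point from #3 concrete, here is a minimal sketch of fixed-effect (inverse-variance) pooling - just one standard way of combining estimates, not anything proposed in this post, and the effect sizes and standard errors are made up for illustration. the point is that the pooled estimate weights each study by its precision (a stand-in for quality), and the order of the studies - original first or replication first - changes nothing.

```python
# a minimal sketch of fixed-effect (inverse-variance) pooling.
# the effect sizes and standard errors below are hypothetical, for illustration only.

def pooled_estimate(studies):
    """studies: list of (effect_size, standard_error) tuples."""
    weights = [1 / se ** 2 for _, se in studies]   # precision weights: quality matters
    total_weight = sum(weights)
    estimate = sum(w * d for w, (d, _) in zip(weights, studies)) / total_weight
    pooled_se = total_weight ** -0.5               # standard error of the pooled estimate
    return estimate, pooled_se

original    = (0.45, 0.20)   # hypothetical original study: d = .45, se = .20
replication = (0.05, 0.08)   # hypothetical larger replication: d = .05, se = .08

# chronology doesn't enter the formula: both orders give the same pooled estimate.
print(pooled_estimate([original, replication]))
print(pooled_estimate([replication, original]))
```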
*someone you've heard of used those words.
happy persian new year!
I agree that replications should not be held to a different standard -- evidence is evidence. I think what *is* important is that people widely understand that replications have to be faithfully executed. Running roughly the same study and failing to find the same results may not mean that there's no effect -- it could just be that you suck at running studies, or worse, you were looking for a null effect and (hopefully unknowingly) nudged things in that direction.
Of course there's a big responsibility on the original researchers -- often not lived up to -- to make it as easy as possible to run a faithful replication. And I would guess that most replications are run faithfully. So while I don't think shoddy replications are a sin in some special way, I think that there should be a high standard for faithfulness of replications. After all, if your only job is to run the same study -- meaning you didn't need to come up with the hypotheses, methods, designs, etc. -- then you should do that one job very very well.
Posted by: Dave Nussbaum | 21 March 2014 at 01:12 AM
Your point #3 reminded me of an all-time favorite Onion headline:
http://www.theonion.com/articles/standard-deviation-not-enough-for-perverted-statis,8892/
Posted by: Stuart Buck | 21 March 2014 at 02:57 AM
I think that #2 is inconsistent with the argument. It sounds like "other people will hold this research to higher standards, so the job will get done without my worrying about it." This may be true, but it's not consistent with the rest of your argument.
Posted by: Chris C. | 21 March 2014 at 03:53 AM
Great article! I cannot agree more with the argument that replication studies should *not* be held to higher standards, but I would go further. I'd argue that shoddy original studies have much more potential to cause damage overall when you consider the far-reaching negative consequences they can have in terms of wasted time, resources, and opportunity costs when large numbers of labs try to extend false positive findings! E.g., the execution of thousands of studies on social priming phenomena may turn out to have been completely for naught. :-(
Posted by: Etienne LeBel | 23 March 2014 at 09:06 AM
thanks for your comments!
dave - i agree that one of the criteria for evaluating the quality of replications is the similarity to the original study, and of course there will always be some differences, so that judgment can be difficult to make. it's good to remember that there are multiple - and sometimes different - criteria to keep in mind when evaluating original research and replications.
chris - i agree that my point #2 somewhat undermines my argument. i should probably have labeled it point #1', as a caveat to point #1.
etienne - i agree that there are some pretty compelling reasons to hold original research to a very high standard. the current incentive structure often rewards authors of original research for boldness and counter-intuitiveness, so extra scrutiny seems not unreasonable.
Posted by: simine vazire | 23 March 2014 at 09:17 AM
Nice thoughts Simine! I certainly don't believe that replications **should** be held to higher standards than original research; however, they probably **are** held to higher standards. I will slightly qualify that statement, though, because I think it applies mostly to one specific type of replication.
Set aside your support for the new statistics for a moment and go back to dichotomous thinking. If the original study either finds evidence for an effect or does not, and the replication either finds evidence for an effect or does not, we can visualize a 2 by 2 original-study-by-replication-study table. You can probably intuit which combinations (unfairly) will be held to higher standards than others.
Here are my intuitions. Original studies that find no evidence for an effect will probably be easily accepted but will never get published. These are the whole "I had a hunch, collected a small sample, found nothing, and never bothered to write it up for publication" type of studies. These studies begin with the **belief** that an effect **may** exist but there is no prior evidence. Ironically, replications of these effects probably proceed as if they are not replications at all. If the researcher fails to find an effect, they file the study away (not knowing that there are now 2 studies in file drawers); if the researcher finds an effect, they can publish it as if they are the first to observe this effect (not knowing there is another study in a file drawer somewhere). It is then up to others to replicate this seemingly original effect. Therefore, we may not knowingly be doing replications until somebody (un)luckily happens upon a sample of data that shows an effect. The replicability vote-counting begins after, but not before, the first p < .05 appears.
It gets interesting when an original study finds an effect, however. Replications that confirm the original study's observed effects probably have a relatively low threshold for acceptance (acceptance in the social sense, not in the publication sense). However, studies are probably held to the highest standard when there is evidence for an original effect but a replication fails to reproduce the effect in question. I believe that this "successful original, failed replication" pattern is what your post is mostly discussing. These replications probably are held to a higher standard because the contribution is in weeding out bad information (i.e., addition by subtraction). The high standard probably exists because there was once evidence to believe an effect existed, but now people need evidence to disregard that belief and to believe in something else. This leads to all of the problems you discuss.
It would be interesting for somebody to look into this empirically. Do researchers treat original positive evidence as an anchor and insufficiently use failed replications in their adjustments?
Posted by: Randy McCarthy | 24 March 2014 at 12:25 AM