

Comments

Dave Nussbaum

I agree that replications should not be held to a different standard -- evidence is evidence. I think what *is* important is that people widely understand that replications have to be faithfully executed. Running roughly the same study and failing to find the same results doesn't necessarily mean there's no effect -- it could just be that you suck at running studies, or worse, you were looking for a null effect and (hopefully unknowingly) nudged things in that direction.

Of course there's a big responsibility on the original researchers -- often not lived up to -- to make it as easy as possible to run a faithful replication. And I would guess that most replications are run faithfully. So while I don't think shoddy replications are a sin in some special way, I think that there should be a high standard for faithfulness of replications. After all, if your only job is to run the same study -- meaning you didn't need to come up with the hypotheses, methods, designs, etc. -- then you should do that one job very very well.

Stuart Buck

Your point #3 reminded me of an all-time favorite Onion headline:
http://www.theonion.com/articles/standard-deviation-not-enough-for-perverted-statis,8892/

Chris C.

I think that #2 is inconsistent with the argument. It sounds like "other people will hold this research to higher standards, so the job will get done without my worrying about it." This may be true, but it's not consistent with the rest of your argument.

Etienne LeBel

Great article! I cannot agree more with the argument that replication studies should *not* be held to higher standards, but I would go further. I'd argue that shoddy original studies have much more potential to cause damage overall when you consider the far-reaching negative consequences they can have in terms of wasted time, resources, and opportunity costs when large numbers of labs try to extend false positive findings! E.g., the execution of thousands of studies on social priming phenomena may turn out to have been completely for naught. :-(

simine vazire

thanks for your comments!

dave - i agree that one of the criteria for evaluating the quality of replications is the similarity to the original study, and of course there will always be some differences so that judgment can be difficult to make. it's good to remember that there are multiple - and sometimes different - criteria to keep in mind when evaluating original research and replications.

chris - i agree that my point #2 somewhat undermines my argument. i should probably have labeled it point #1', as a caveat to point #1.

etienne - i agree that there are some pretty compelling reasons to hold original research to a very high standard. the current incentive structure often rewards authors of original research for boldness and counter-intuitiveness, so extra scrutiny seems not unreasonable.

Randy McCarthy

Nice thoughts Simine! I certainly don't believe that replications **should** be held to higher standards than original research; however, they probably **are** held to higher standards. I will slightly qualify that statement, though, because I think it applies mostly to one specific type of replication.

Set aside your support for the new statistics for a moment and go back to dichotomous thinking. If the original study either finds evidence for an effect or doesn't, and the replication either finds evidence for an effect or doesn't, we can visualize an original study/replication study 2 x 2 table. You can probably intuit which combinations (unfairly) will be held to higher standards than others.

Here are my intuitions. Original studies that find no evidence for an effect will probably be easily accepted but will never get published. These are the whole "I had a hunch, collected a small sample, found nothing, and never bothered to write it up for publication" type of studies. These studies begin with the **belief** that an effect **may** exist but there is no prior evidence. Ironically, replications of these effects probably proceed as if they are not replications at all. If the researcher fails to find an effect, they file the study away (not knowing that there are now 2 studies in file drawers); if the researcher finds an effect, they can publish it as if they are the first to observe this effect (not knowing there is another study in a file drawer somewhere). It is then up to others to replicate this seemingly original effect. Therefore, we may not knowingly be doing replications until somebody (un)luckily happens upon a sample of data that shows an effect. The replicability vote-counting begins after, but not before, the first p < .05 appears.

It gets interesting, however, when an original study finds an effect. Replications that confirm the original study's observed effects probably face a relatively low threshold for acceptance (acceptance in the social sense, not in the publication sense). However, studies are probably held to the highest standard when there is evidence for an original effect but a replication fails to reproduce the effect in question. I believe that this "successful original, failed replication" pattern is what your post is mostly discussing. These replications probably are held to a higher standard because their contribution is in weeding out bad information (i.e., addition by subtraction). The high standard probably arises because there was once evidence to believe an effect existed, and now people need evidence to set that belief aside and believe something else. This leads to all of the problems you discuss.
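To put the four cells of that 2 x 2 table in one place, here is a minimal sketch (in Python) of the combinations and the level of scrutiny each one seems to attract; the labels and descriptions are an informal paraphrase of the two paragraphs above, not data or anything from the original post:

```python
# Illustrative only: an informal summary of the perceived scrutiny applied to
# each original-study / replication-study outcome combination described above.
scrutiny = {
    ("original: no effect", "replication: no effect"):
        "low -- both studies likely sit in file drawers",
    ("original: no effect", "replication: effect"):
        "low -- the 'replication' gets published as if it were the original finding",
    ("original: effect", "replication: effect"):
        "low -- confirmations are accepted with little resistance",
    ("original: effect", "replication: no effect"):
        "high -- the failed replication is held to the highest standard",
}

for (original, replication), level in scrutiny.items():
    print(f"{original:24s} | {replication:26s} | {level}")
```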

It would be interesting for somebody to look into this empirically. Do researchers treat original positive evidence as an anchor and then insufficiently adjust in light of failed replications?
