


"i think many of the replication studies in the RP:P, like many of the original studies, were underpowered. for this and other reasons, i agree the RP:P was flawed, and the replicability rate was suppressed by these flaws."

I am trying to understand this, but couldn't one always claim that a replication study was underpowered after a "failed" replication attempt (assuming the true effect is never exactly 0)?

Perhaps researchers need to come up with some new rules for designing replication studies (especially concerning power), and for judging them/deciding if a replication succeeded or failed.

I would love to hear more about exactly how to do this.

Simine Vazire

if you stick with the NHST framework, yes, you can't draw conclusions from null effects, including null replications. but in effect estimation or bayesian frameworks, you can (if you have enough precision/evidence). for an example of an effect estimation approach, i really like uri simonsohn's "small telescopes" approach: http://datacolada.org/wp-content/uploads/2016/03/26-Pscyh-Science-Small-Telescopes-Evaluating-replication-results.pdf
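To make the "small telescopes" idea concrete, here is a minimal sketch, assuming a two-group design with equal cell sizes and using a normal approximation rather than the exact noncentral-t calculation in the paper. The function names (`d33`, `small_telescopes_p`) and the example numbers are made up for illustration. The logic: find the effect size the original study would have had only 33% power to detect, then ask whether the replication estimate is significantly smaller than that.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def d33(n_orig_per_group, alpha=0.05, power=1 / 3):
    """Cohen's d that the ORIGINAL study had `power` to detect.

    Normal approximation to a two-sided, two-sample test with equal
    group sizes (an assumption of this sketch).
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return (z_crit + z_power) / sqrt(n_orig_per_group / 2)


def small_telescopes_p(d_rep, n_rep_per_group, d_small):
    """One-sided p-value for H0: true d >= d_small.

    A small p-value means the replication estimate is significantly
    smaller than d_small, i.e. too small for the original study to
    have had a reasonable chance of detecting it.
    """
    # Approximate standard error of Cohen's d for equal group sizes.
    se = sqrt(2 / n_rep_per_group + d_rep ** 2 / (4 * n_rep_per_group))
    return Z.cdf((d_rep - d_small) / se)


# Hypothetical numbers: original study with 20 per cell,
# replication with 100 per cell that observed d = 0.05.
threshold = d33(20)
p = small_telescopes_p(d_rep=0.05, n_rep_per_group=100, d_small=threshold)
print(f"d33 = {threshold:.2f}, one-sided p = {p:.4f}")
```

The point of the design is that a null replication can be informative: with enough precision, it can rule out effects large enough for the original study to have plausibly detected, which a plain non-significant NHST result cannot do.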

Maude Lachaine

That's a good point. I get the impression that there's a lot of moving the goalposts these days and it's not clear what type of evidence it would take for some academics to change their position. Though I understand that it is a charged topic.

Speaking as an outsider to psychology, I think the replication problem doesn't make the field look bad, quite the opposite. I suspect a lot of other fields have the same problem, but aren't really taking steps to fix it. Coming from a statistics perspective, I find this movement to improve statistical methods, experimental design and replicability in psychology to be quite refreshing and exciting. No need to be negative.

Sam Schwarzkopf

Great post, Simine. I think you're completely spot on when you ask about the falsifiability of the hypotheses. We should always do that and it certainly applies to this. What evidence could convince anyone that there is/isn't a problem? If the answer is 'none' it is pointless to continue discussing any results.

Whether or not psychology is in crisis is entirely subjective. But there are objective ways to quantify the validity of research and all people in this discussion should really get together and decide what level of replicability they think is needed.

As far as the RPP is concerned, a considerable proportion of the findings (both the originals and the replications) are completely inconclusive and only a small proportion yield compelling support for the existence of these effects. I don't know about anyone else but to me those stats aren't evidence of a healthy field. A large number of inconclusive findings implies that the research is done with insufficient power and sensitivity. The fact that even the replications suffer from this means that estimates of power are based on invalid assumptions, presumably at least in part because the original effect sizes are substantial overestimates.
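A rough sketch of why inflated original effect sizes undermine power estimates, assuming a two-group design and a normal approximation to the t-test. The function names and the effect sizes (0.5 published, 0.25 true) are hypothetical and chosen only for illustration: a replication sized for 80% power at the published effect ends up with far less power if the true effect is half as large.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(d, power=0.80, alpha=0.05):
    """Per-group sample size for a two-sided, two-sample test (normal approx.)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return ceil(2 * ((z_crit + Z.inv_cdf(power)) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power to detect true effect d with n per group.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d * sqrt(n / 2) - z_crit)


# Hypothetical scenario: the published effect is d = 0.5,
# but the true effect is only d = 0.25.
n = n_per_group(0.5)                # sample size planned from the published effect
nominal = approx_power(0.5, n)      # the power the replicators believe they have
actual = approx_power(0.25, n)      # the power they actually have
print(f"n per group = {n}, nominal power = {nominal:.2f}, actual power = {actual:.2f}")
```

Under these assumptions the "80% powered" replication has well under a coin flip's chance of detecting the true effect, which is one way many replications can end up inconclusive.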

The issue of methodological differences that could have been avoided is certainly another reason for some concern but again I must ask what a proponent of the original effects would accept as compelling evidence for the null hypothesis. As I said in my post (that single-handedly pissed off all of social psychology research apparently), if you can't make some a priori decisions on what a hypothesis implies then you end up chasing ghosts.

My main worry, and the reason why I was so sceptical of preregistration etc. for so long, is that if we only concentrate on strong effects that are robust to methodological differences and come out strongly even in preregistered designs, then we may bias science towards only strong effects and miss the more nuanced but potentially important ones. So we need to be wary of this. If an effect is a total snowflake, e.g. "Professor prime makes you smarter, Einstein makes you dumber, Einstein with the tongue out makes you smarter again, and generally it only works when the temperature is below 30C..." (I made some of these up), then I think you need to eventually accept that you are chasing a ghost. But if you think that there are modulating factors at play, which is a perfectly justified assumption, then you should test that and show it's robust.

The field should reflect this. We should stop reporting single findings that are likely to not be robust to lots of confounding factors as general mechanisms.
