

Sam Schwarzkopf

I have high hopes for today. This is the third very sensible post I have read today :P Your suggestions seem very reasonable to me, and I appreciate that you are trying to break the polarisation between "replicators" and "status-quoers". There is a middle ground here, and that is what we should be after.

One thing I'd disagree on is the claim that hidden moderators/mediators/confounds are a stronger argument than comments about the competence of the research (be it the original or the replication). Incompetence and/or low-quality experiments can very well affect the outcome, and I don't think judging that is as simple as saying "there is a large N and the original author approved the methods." If, as in a proper adversarial collaboration, the original authors were directly involved in the experiments and the outcome is nonetheless a failure to replicate, then it is hard to argue that poor quality is to blame. But in many other replication attempts that judgement call is a lot harder. As you rightly point out, though, the same argument applies to the original result too. So the take-home message is that you shouldn't put too much faith in any set of two studies (original + replication), regardless of their power etc., unless you have very high confidence that one of them was of sufficient quality.

In my view, the hidden moderator defense is a much weaker argument. There will always be unknown factors. You need experiments to test under which conditions a finding generalises (if it replicates at all, that is). Unless you can formulate a moderating factor and test it, the argument is meaningless.

Ulrich Schimmack

Dear Laura Scherer,

Your recommendations for the future are good, but I disagree with your analysis of the current situation.

You write: "Choosing to attribute recent replication failures to one or the other of these explanations is unhelpful, because both explanations could be correct."

This is simply not correct, because it is possible to use the OSF Reproducibility Project data to test these alternative hypotheses. The original studies show clear signs of bias that inflated the reported effect sizes by 50% or more. The unbiased replication studies revealed this bias. It is possible that moderators had an effect on some individual findings, but the large discrepancy between 97% significant results in the original studies and 36% significant results in the replication studies is due to questionable research practices and publication bias in psychology.
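The inflation mechanism described here can be illustrated with a minimal simulation sketch. All numbers below are illustrative assumptions, not the OSF data: a small true effect, an underpowered per-group sample size, and "publication" of only the significant positive results.

```python
# Sketch (illustrative, not the OSF analysis): simulate many underpowered
# two-group studies with a small true effect, "publish" only those reaching
# significance, and compare the mean published effect size with the truth.
import math
import random
import statistics

random.seed(1)
TRUE_D = 0.2       # assumed true standardized effect (small)
N = 20             # per-group sample size -- underpowered for d = 0.2
N_STUDIES = 5000

all_d, published_d = [], []
for _ in range(N_STUDIES):
    control = [random.gauss(0.0, 1.0) for _ in range(N)]
    treat = [random.gauss(TRUE_D, 1.0) for _ in range(N)]
    diff = statistics.fmean(treat) - statistics.fmean(control)
    # z-test with known unit variance, to keep the sketch simple
    z = diff / math.sqrt(2.0 / N)
    all_d.append(diff)          # with sd = 1, the mean difference is Cohen's d
    if z > 1.96:                # selection: only significant positive results
        published_d.append(diff)

print(f"true d: {TRUE_D}")
print(f"mean d over all studies:      {statistics.fmean(all_d):.2f}")
print(f"mean d over 'published' ones: {statistics.fmean(published_d):.2f}")
```

The full set of studies averages out to the true effect, while the "published" subset is inflated severalfold, because only studies with sampling error in the lucky direction cross the significance threshold.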

Also, recommendations for improvement have been made for several years, yet the power of original published studies in psychology in 2015 is no higher than it was in the 50 years before.


We can all hope that things will get better, but an honest assessment of the current situation is often needed before things improve. The moderator argument is not helpful because it hides the real cause of the problem: original studies are not credible and provide no protection against false-positive results, because published results are based on a selection of significant results from a set of underpowered studies.

Sincerely, Uli Schimmack



Fred

Hi Laura,
I cannot disagree with your recommendations for replication studies; they're great. However, the way you build the narrative, it now appears you are suggesting that RP:P does not comply with these high standards. I hope you agree that this would be very wrong to suggest: one can refer to the procedure of RP:P and see that virtually all of your recommendations were implemented in it.

Also, I was wondering: can you give examples of replication studies that do not meet the high standards and 'further muddy up the literature with a bunch of false negatives'? I can't think of many studies that would qualify as such in the recent literature; there aren't that many replication studies around to choose from.

What I *can* disagree with are the ideas about the role of hidden moderators in the application of the scientific method to produce scientific knowledge.

Assume that we're talking about confirmatory studies, and that research questions are always based on some deductive chain of propositions that leads to a prediction of measurement outcomes in terms of an observational constraint between at least one dependent and one independent variable.

Then it is absolutely valid to claim, after a failed direct replication study, that an uncontrolled confounding variable in the replication was responsible for the failure to replicate the original effect.

However, the consequence of claiming this was the case for the replication is that the original deductive chain/theory/claim was invalid as well!

It means that:
- the Ceteris Paribus clause is violated. There apparently were sensible/foreseeable moderators that systematically vary with the effect, which the original study did not explicitly attempt to control for.

- to claim that a failed direct replication was due to a hidden moderator while the original study observed a true effect implies that randomisation failed in the original study and that the hidden moderator was controlled for 'by accident', or by selection bias. Random assignment of subjects to groups or conditions controls for non-systematic variability within/between subjects. A moderator variable implies systematic variation, something that randomisation can't resolve.

So either way, the interpretation of the results of the original study is problematic if one wishes to explain a failed replication as being due to hidden moderators.
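The point that randomisation balances subject-level noise but cannot balance a moderator that differs *between* samples can be sketched in a few lines. The setup is hypothetical: an effect that exists only when a moderator M is present, an original sample that happens to have M = 1, and a replication sample with M = 0.

```python
# Sketch (hypothetical numbers): within-study random assignment balances
# subject-level variability, but a moderator that varies between the original
# and replication samples shifts the effect systematically.
import random
import statistics

random.seed(2)
N = 500  # per-group sample size in each study

def run_study(moderator_on: bool) -> float:
    """Return the observed treatment-minus-control mean difference."""
    effect = 0.5 if moderator_on else 0.0   # effect exists only under M = 1
    subjects = [random.gauss(0.0, 1.0) for _ in range(2 * N)]
    random.shuffle(subjects)                # random assignment to conditions
    control = subjects[:N]
    treat = [s + effect for s in subjects[N:]]
    return statistics.fmean(treat) - statistics.fmean(control)

original = run_study(moderator_on=True)      # M present 'by accident'
replication = run_study(moderator_on=False)  # direct replication, M absent
print(f"original effect:    {original:.2f}")
print(f"replication effect: {replication:.2f}")
```

Randomisation is working perfectly inside each study here; the discrepancy comes entirely from the moderator differing between samples, which is exactly the systematic variation the original design never controlled for.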

Luckily, philosophers of science (Lakatos) have analysed these defensive strategies in great detail, and here's the deal:

1. Progressive research programme: Acknowledge the hidden moderator and accept that there is a problem with the original study and the theory/deductive chain that predicted the effect. Amend the original claim, or start from scratch.

2. Degenerative research programme: Point to a hidden moderator in order to protect the perceived veracity of the original result and theory. Do not amend or reject the original claim.

All the best,


Thanks for the comments, these are great.

Sam, it's possible that I agree with everything you said. The point about questioning replicators' competence was simply that, so long as published replications are high quality, it shouldn't be acceptable to question a replicator's competence. Further, I may have been too generous in giving the moderator argument such credence without being clear that we need to consider moderators a priori (I said that somewhere, in passing, I think…). I think you said it best: we shouldn't put much faith in any set of studies unless we can be reasonably certain that at least one of them was high quality.

Uli, I think you and I agree that we currently have a problem with false positives in the literature and that it may take years to sort out. I see this as a long-term effort. In the near term, weak findings that shouldn't have made it into the literature will be rejected as a result of failed replications that are high quality and that test reasonable moderators (proposed a priori, I hope). All of us need to be prepared to accept that some original findings were false positives; if we can't, then published findings will be like the walking dead (they refuse to die). That said, I'm going to stand by the point that moderators are an important thing to consider when conducting replications. I don't see these as mutually exclusive, and that was part of the point of the post.

Fred, with regard to the first part of your comment, I want to be clear that I have extremely high regard for the RP:P and agree that they met the criteria on the list. Some people that I've talked to disagree with this sentiment, although it seems that the real issue is that 100 studies is just a lot to digest (making it hard to go through and scrutinize each one). With regard to the rest of your comment, when I was writing this piece I just knew that someone would comment with something far more sophisticated about philosophy of science :) So in response, I say we all decide to go with #1 from your comment and not #2, and agree that we should consider *reasonable*, *a priori* moderators and be prepared to toss out findings that we can't replicate after considering those moderators. It seems that we would all agree with this.

To me, these comments represent a huge step in the right direction relative to other degenerative arguments that I have witnessed recently. Thanks for the respectful tone and the thoughtful points. Very much appreciated.
