
Comments

Dr. R


Here is an idea for what a researcher should do with self-ratings and informant ratings that deals with the problem of aggregation: DO NOT AGGREGATE!

The reason is that once you aggregate, error variance in self-ratings is mixed up with error variance in informant ratings, and it becomes impossible to say which variance components drive a correlation.

The best way to analyze these data is to use structural equation modeling. Now you can correct for measurement error in personality and show the true strength of the relationship, and you can show that error variance in self-ratings is correlated with the criterion (shared method variance).

See Kim, Schimmack, & Oishi (JPSP, 2012) for an example.

Structural equation modeling was invented in the 1950s. Maybe 60 years later, personality psychologists can start using this useful statistical tool that avoids the problem of QRPs in aggregation.
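
To make the variance-mixing point concrete, here is a minimal toy simulation (plain numpy; the trait, method-bias, and noise parameters are invented for illustration and are not taken from Kim, Schimmack, & Oishi):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    trait = rng.normal(size=n)   # true personality trait
    method = rng.normal(size=n)  # self-report method bias (e.g., socially desirable responding)

    self_rating = trait + method + rng.normal(scale=0.5, size=n)
    informant_rating = trait + rng.normal(scale=0.5, size=n)

    # criterion driven partly by the trait and partly by the *same* method bias
    criterion = 0.4 * trait + 0.4 * method + rng.normal(size=n)

    aggregate = (self_rating + informant_rating) / 2

    print(np.corrcoef(aggregate, criterion)[0, 1])         # trait and method variance mixed together
    print(np.corrcoef(self_rating, criterion)[0, 1])       # inflated by shared method variance
    print(np.corrcoef(informant_rating, criterion)[0, 1])  # closest to the pure trait-criterion link

The aggregate correlation lands between the other two, and nothing in that single number tells you how much of it is trait and how much is shared method variance; a latent-variable model can separate the two.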

The Devil's Neuroscientist

Okay now I'm confused. Perhaps it's because I'm not a personality researcher or because of my cursory reading over my alter ego's morning coffee (it's not always easy to concentrate when you share someone else's mind). But what does this story have to do with p-hacking?

Your protagonists were faced with a choice of which analysis protocol to follow. The option most commonly used in the literature would have given them the more striking results (you say "bigger effects", so in the context of p-hacking I assume the p-values were lower for this comparison?). The option most typically used in the protagonists' own research would have given them weaker effects. This isn't so much about p-hacking as it is about deciding what the most appropriate analysis should be.

The Crusaders (as I call them) will tell you that the protagonists should simply have preregistered their analysis pipeline. This example serves as another reminder of why this doesn't actually work in practice: if the experiment had been preregistered, the preregistered analysis would not have included the peer reports. So prereg probably wouldn't have saved the protagonists from this dilemma.

Of course, the best solution to this situation would have been to simply report *both* analyses alongside a discussion of this problem. Or (and?) to choose a more appropriate analysis method that gets around aggregation, as the previous commenter suggested.

Tom

I thought I'd respond to The Devil's Neuroscientist as she prompted me on Twitter =)

// Is it p-hacking?
The story is about researchers potentially (but not actually) exploiting their researcher degrees of freedom in a manner that could artificially inflate the strength of their findings.

It certainly sounds like p-hacking in essence to me, even if the term is not strictly accurate in a technical sense* when there was no p < .05 threshold at issue (it is unclear whether there was).

// Would pre-registration have made a difference?
I think a registered-reports version of pre-reg would have. If we assume the pre-reg document was reviewed by the same reviewers mentioned above, then they would have flagged the missing peer reports before the study had even been run. So when the researchers came to write up their report, that would have been one less degree of freedom available for potential exploitation.

Also imagine how much time and how many resources would have been saved if the researchers hadn't even been intending to collect peer reports at all and the reviewer had pointed this out early on...

// (im)perfect science
I think another important issue raised by this piece (in addition to the fact that I dropped my tea laughing at one point) is the inherent tension between the drive for aesthetically pleasing science (cf. the desire to tame unwieldy tables, a reviewer encouraging HARKing) and the rough-looking version of science that is less easy to communicate but much more transparent. Are we getting the balance right?

* I drew my p-hacking definition from urban dictionary as I think we are talking about a colloquial term here rather than one with a precise definition: http://www.urbandictionary.com/define.php?term=p-hacking

The Devil's Neuroscientist

Thanks for this comment. With even more coffee in my system, I can now somewhat see why you would regard this as a form of p-hacking, even if it isn't strictly about getting p < .05.

I think what threw me off at first is that the dilemma was really about the choice between typical research practice and the approach the authors would normally have taken. The reason I don't really see this as p-hacking is that it doesn't "artificially inflate the strength of their findings", because we don't actually know which results are the more appropriate. Perhaps option 2 is deflating the results? It's more a question of whether the authors stick to their own theories or conform to the status quo. But yes, I can see now why you view it this way.

Again, rather than pre-registration, I think the best course of action would have been to simply present both types of analysis (plus perhaps others). In situations where the right course of action is unclear, I feel it's better to give the reader the available evidence and let them make up their own minds.

You're of course right that a peer-reviewed registered report could have settled this from the outset. I have previously argued that this is the only way preregistration could realistically work. Since then, though, discussions I have witnessed between various proponents of prereg have led me to reevaluate this idea. In fact, Tywin Lannis... sorry, David Shanks recently argued quite cogently (to my mind) that registered reports are unlikely to ever catch on.

Finally, my goody two-shoes twin brother, Sam, would like to say that he agrees that this blog is very funny.

simine

thanks for your comments everyone!

Dr. R - yup, i totally agree. SEM is very useful and we should use it more. it doesn't eliminate the problem of researcher degrees of freedom, but of course no statistical technique can do that. in the end we will always need to rely on judgment and reason.

devil's neuroscientist - part of the point i wanted to make is that often, when researcher degrees of freedom are involved, both/all options are justifiable. it's not a matter of simply avoiding the 'wrong' approach. but if we systematically choose the approach that gives us bigger effects/smaller p-values, this contributes to the bias in the published literature (and we are capitalizing on chance). p-hacking is relevant anytime our decision about which analysis to use is influenced by how strong/significant the results are with each option. (reporting all results is great, and we do that in the supplemental materials, but most people won't read the supplemental materials, and i don't always want to see every possible analysis in the main text.)
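
to make the "capitalizing on chance" point concrete, here's a toy simulation (the two analysis options - a plain test vs. the same test with a made-up outlier rule - are just stand-ins for any pair of justifiable pipelines):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_sims, n = 5_000, 40
    hits_fixed = hits_picked = 0

    for _ in range(n_sims):
        group = rng.integers(0, 2, size=n)  # two groups, no true effect anywhere
        y = rng.normal(size=n)

        # option a: plain t-test
        p_a = stats.ttest_ind(y[group == 0], y[group == 1]).pvalue
        # option b: same test after excluding |y| > 2 "outliers"
        keep = np.abs(y) < 2
        p_b = stats.ttest_ind(y[keep & (group == 0)], y[keep & (group == 1)]).pvalue

        hits_fixed += p_a < .05             # commit to one analysis in advance
        hits_picked += min(p_a, p_b) < .05  # report whichever option looks better

    print(hits_fixed / n_sims)   # ~ .05, as advertised
    print(hits_picked / n_sims)  # creeps above .05: two shots at the same null

neither option is 'wrong' on its own; the bias comes entirely from letting the results pick the option.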

re: pre-registration. i think it's great. i'm a big fan in principle. but i can't seem to do it. i just love exploring data. we are trying to be more systematic about using one dataset for exploration, and then another strictly for confirmation (this is our solution to preregistration when dealing with datasets that took us several years to collect). so i'm trying to move closer to the ideal. in my view, the ideal is some combination of exploration and confirmation, with total transparency about which is happening when. (i also think that it's almost impossible to anticipate all possible researcher degrees of freedom ahead of time, so pre-registration can help a lot, but can't completely eliminate the temptation to p-hack.)
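
for what that split looks like in practice, here's a minimal sketch (the dataset, the 50/50 split, and the pick-the-best-predictor rule are all placeholders for whatever you'd actually do):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, n_vars = 200, 20
    X = rng.normal(size=(n, n_vars))  # 20 candidate predictors
    y = rng.normal(size=n)            # criterion (null here, for illustration)

    half = n // 2
    X_explore, X_confirm = X[:half], X[half:]
    y_explore, y_confirm = y[:half], y[half:]

    # exploration half: look at everything, chase whatever seems most promising
    rs = [np.corrcoef(X_explore[:, j], y_explore)[0, 1] for j in range(n_vars)]
    best = int(np.argmax(np.abs(rs)))

    # confirmation half: a single pre-selected test, untouched by the exploration
    r, p = stats.pearsonr(X_confirm[:, best], y_confirm)
    print(best, round(r, 3), round(p, 3))

the exploration half can be dredged guilt-free; the confirmation p-value only ever sees one test.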

thank you all for your interest and feedback!

The Devil's Neuroscientist

Thanks for the clarifications. I also appreciate your candor about exploring data. Sam is certainly the same; data are exciting, so why wouldn't you explore them? I agree that it would be great to have a better sense of what is exploration and what is hypothesis-driven research. However, I think that requires real cultural change. The current culture puts undue value on hypothesis-driven research, which forces people to pretend that their exploration is hypothesis-driven. Thus I believe pre-registration treats the symptom rather than the cause of the problem.
