i am teaching a seminar called 'oh you like that finding do you? well it's probably FALSE.'
the students are a little bit shell-shocked.
i am having the time of my life.
the hardest part is explaining how perfectly smart, truth-seeking scientists can repeatedly do idiotic things. happily, i have plenty of examples from my own life.
just this week, i almost p-hacked. the details are a little dry, but i think it's worth telling this tale of mundane p-hackery, because this is what p-hacking looks like in the wild. it is not sinister and dark like a good episode of the americans. it is super boring like the stories you read to your children at night.* in fact, if you run out of bedtime material, read them this.
once upon a time, there were some personality researchers who wanted to measure personality.
they collected self-reports and six peer reports from each of their participants.
they wrote a paper examining how the big five personality traits correlate with friendship satisfaction.
because the paper was due right when the data came in, and organizing the peer reports takes time, they only used self-reports of personality (bad personality researchers!).
the reviewers and editor said 'dude, we know you have peer reports. and you shouldn't just correlate self-reports (of personality) with self-reports (of friendship satisfaction).'**
so they compiled the data and ran the analyses with the peer reports of personality. not surprisingly, all the correlations with (self-reported) friendship satisfaction were weaker. still significant (yay large samples), and with a pattern similar to the self-reports, but weaker.
they started off by reporting all of their results, with the self-reports and the peer reports side by side.
then their tables got unwieldy, and they decided that they should just aggregate the self- and peer-reports into a composite measure for each personality trait. because obviously that is the best measure of personality. and it would make their tables easier to read.***
so now, gentle reader, we have a self-report and (up to) six peer reports for each participant. we want to aggregate them. what are we to do?
option a is to first aggregate the six peer reports, and then average that composite with the self-report, in which case the self-report is weighted as much as ALL the peer reports put together.
option b is to just average all seven reports (self-report and six peer reports) all at once, in which case the self-report is weighted only as much as any single peer report.****
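if you'd rather see the formulas than read them, the two options can be sketched in a few lines of python (toy numbers, obviously not our actual data or code):

```python
# toy sketch of the two aggregation options: one self-report plus
# six peer reports of a single personality trait (made-up ratings).
self_report = 4.0
peer_reports = [3.0, 3.5, 2.5, 3.0, 3.5, 3.0]

# option a: average the peers first, then average that composite
# with the self-report. the self-report ends up weighted as much
# as all six peer reports combined (weight 1/2 vs 1/12 each).
peer_mean = sum(peer_reports) / len(peer_reports)
option_a = (self_report + peer_mean) / 2

# option b: average all seven reports at once. the self-report
# gets the same weight as any single peer report (1/7 each).
option_b = (self_report + sum(peer_reports)) / 7

# whenever the self-report sits above the peer mean (as here),
# option a pulls the composite toward the self-report.
```

(which option gives "bigger" correlations in any real dataset depends on how the self- and peer-reports each correlate with the outcome; the toy numbers just show the weighting.)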
pop quiz hotshot: which one will give us "better" results?
option a would give us bigger effects. and many many researchers in this area***** would say option a is the better option because self-reports are special and should be weighted more than each individual peer report. so our protagonists would have cover if they went with option a.
option b, however, is what our protagonists have typically done in the past. partly because they are convention-busting mavericks and partly because they believe that the self is not that special - each of your close friends knows about as much about your personality as you do yourself.
so the noble researchers sat there and, for several minutes, contemplated what to do. 'contemplated' is the wrong word. it was immediately obvious to them what they should do. they even said 'this is p-hacking, what we're contemplating doing right now.' out loud. and still, they sat there for another minute. squirming.
then they did the right thing because that's what protagonists do and would i really be telling you this story if we p-hacked? come on.
the moral of the story is: not p-hacking is HARD. side effects include redness, swelling, and deep, deep frustration. and self-righteousness.
what's that? your kids are still awake? here is a fun little postscript:
another thing the reviewers said was 'look, i know your analyses were exploratory, but anyone with half a brain would have predicted finding ABC. stop dicking around and, in the introduction, tell your readers about how all existing personality theory and research would predict ABC.'
what are our protagonists to do? they did not predict ABC, but mostly because they were too damn lazy to make specific predictions (bad bad researchers!). they did not want to lie in the introduction. but they also agreed that only a dumbass would not predict ABC.
in a brilliant stroke of genius, they decided to write: 'previous theory and research on A and B and C would clearly suggest that ABC. although this would be a reasonable prediction, we did not actually make this prediction a priori' and they lived happily ever after.******
* i am extrapolating from the stories my dog makes me read at night. i hear parents love it when you compare their kids to dogs.
** this is called getting a taste of your own medicine. your kids will appreciate this lesson someday.
*** of course we are putting the data and disaggregated results on OSF. we're not total idiots.
**** the younger kids might get confused here. just write out the formulas for them.
***** almost all seven of them.
****** everything in this story is true (you know, without the dicking around and the dumbass bits), but in our defense, our paper (and methods) were a little more sophisticated than described here.