this blogging thing is pretty rad. (also: twitter. wow.)
today's topic: having it all. i'm not talking about the work/life problem. i'm talking about the sample size/methodological rigor problem.
let's start with sample size. by now, everyone knows that bigger is better. you can't have too large a sample. there is no 'double-edged sword'. there are no downsides to a large sample. more evidence is always better, and larger samples = more evidence.
this seems very obvious, but i've seen at least three different editors criticize manuscripts for having samples that are too big. so i want to be very clear: there is no such thing as a sample that is too big. calling a sample too big is like calling a grizzly bear too cute. it makes no sense.
many people have written very compelling explanations about why we should want larger samples (more power). i will trust that you have read those.
what i want to talk about is the downside of large samples.
ok, i know i just said there are no downsides, but that's true once the data have already been collected. if you are an editor or a reviewer, and you are reading a manuscript with a sample size of, say, 884,328, you should consider this a major asset. it's not complicated.
but let's say you're a researcher, designing a study. you've heard large sample sizes are awesome. you have $1,000 to spend. what should you do?
one of my biggest fears in life is that people will accept the large-samples mantra unthinkingly, and decide to run all their studies on mturk, where they can get 10,000 participants for their $1,000. my other biggest fear is spiders.
over the last thirty years, my field (personality psychology) has finally grown out of its over-reliance on self-reports. it is now practically impossible to publish a cross-sectional study of college students that relies exclusively on self-reports. and that's probably as it should be. we cannot build our knowledge of human behavior on self-reports, reaction times, and vignette studies alone (experimental philosophers, take note).
however, now that people are catching on to the fact that most of our studies have samples that are too small, people are turning to mturk to increase their sample sizes. this is a nice sentiment, and there are some great things about mturk (and some not so great...). but there is a problem: it is exceedingly difficult to collect actual behavioral measures, informant reports, or physiological measures from mturk samples. and we need those measures. how can we study morality, love, stress, leadership, or any other interpersonal phenomenon if we only have the self's perspective?
we need good methods. that means we need multiple methods, and each one needs to be implemented rigorously. this is very expensive and time-consuming. multiply that by many, many participants, and it feels like we have to make a sacrifice. we can't have it all.
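(since we're talking dollars and sample sizes, here is a rough back-of-the-envelope sketch in python of the trade-off. the $1,000 budget and the ten-cents-per-mturker rate come from above; the $25-per-participant cost of an intensive multi-method lab session and the d = .2 effect size are made-up numbers for illustration, not anything from real data.)

```python
# rough sketch of the budget trade-off: cheap self-report participants vs.
# expensive multi-method participants, and what each buys you in power.
# assumptions (not from the post): $25 per multi-method lab participant,
# a true effect of d = 0.2, a simple two-group comparison, alpha = .05.
from math import sqrt
from scipy.stats import norm

BUDGET = 1000.0      # dollars to spend (from the post)
COST_MTURK = 0.10    # implied by 10,000 participants for $1,000
COST_LAB = 25.0      # hypothetical cost of one multi-method lab participant
EFFECT_SIZE = 0.2    # assumed standardized mean difference (cohen's d)
ALPHA = 0.05         # two-tailed significance level

def power_two_group(d, n_per_group, alpha=ALPHA):
    """approximate power of a two-group comparison (normal approximation)."""
    z_crit = norm.ppf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    return 1 - norm.cdf(z_crit - noncentrality)

for label, cost in [("mturk self-report", COST_MTURK), ("multi-method lab", COST_LAB)]:
    n_total = round(BUDGET / cost)   # how many participants the budget buys
    n_per_group = n_total // 2       # split evenly into two groups
    pw = power_two_group(EFFECT_SIZE, n_per_group)
    print(f"{label:>18}: n = {n_total:>5}, approx. power for d = {EFFECT_SIZE}: {pw:.2f}")
```

(the numbers come out roughly the way you'd expect: the cheap sample has power near 1.00, the expensive one is down around .09. of course, the sketch ignores the whole point, which is that the forty lab participants give you measurements the ten thousand mturkers can't. that is exactly why it feels like a sacrifice.)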
but we can. there is one ridiculously simple solution: slow down.
it's true that we cannot have both large samples and diverse, intensive methods if we want to continue running billions of studies per year. but there is another model. run fewer, better studies.
despite my skeptical exterior, i have a wildly idealistic streak. i don't like to compromise. so i believe that we can have large, beautiful studies with multiple, reliable, valid methods. we just have to value them. journals need to bend over backwards to publish those studies, and all of us need to treat those studies as much more definitive than small, mono-method studies. they should be the holy grail of psychological research.

i understand the pressure to do things quickly. i know that calling on people to collect larger samples necessarily puts pressure on them to use cheaper, quicker methods. but we should not easily give in to the notion that we must choose one or the other. we can have it all. we just have to be patient.

the end.
photo credit: erik pettersson.
Nice article. I think it's symptomatic of something wrong in our field that we worry about whether a study is going to "come out" in our favor instead of whether its methods are strong enough to support the knowledge about the world we get from it. I'm pretty happy whenever I design a study that I know will have something of interest to say regardless of its results.
Posted by: Roger Giner-Sorolla | 07 March 2014 at 04:40 AM
Great post.
The best way to evaluate science (and scientists) is for knowledgeable experts to make informed judgments.
The problem is that expertise and time are rare commodities. So people end up making heuristic judgments based on metrics that they can easily see and count. Which just incentivizes people to inflate the metrics. We've all seen the problems generated by the "more publications is better" heuristic. I totally agree that if "more subjects is better" becomes too strongly incentivized, it will create distortions of its own. It is tempting to try to fix it by incentivizing other things, but "more methods is better" or "more dollars spent per subject is better" will create problems too.
So it comes back to expert judgment. And decision-makers who do not have the time or expertise need to trust those who do. If you're a dean and find yourself counting lines on the vitas of tenure candidates in fields you don't know, ask yourself why you aren't trusting the recommendations of the people in the department. Maybe you should. And if you have a good reason not to trust them -- well, you're the freakin' dean, fixing it is your job. They didn't hire you to count things. (I hope.)
Posted by: Sanjay Srivastava | 07 March 2014 at 05:01 AM
Great set of posts, Simine! I appreciate the nuance that you and others like Sanjay, Laura, David, and Brent are bringing to the “replication/methodology/ethics crisis”. We have years (and years!) of training in methodology, assessment, and statistics, and having to think a little more deeply about how we plan, run, and analyze our findings is not such a bad thing. I also echo your sentiment that running large-N studies using only self-reported questionnaires is not as rigorous or ecologically valid as taking the time to obtain peer reports, directly observed behavior, behavioral residue, and/or "life data" (e.g., verifying GPA via transcripts, # of facebook friends, polling confirmation of whether someone actually voted). I find it maddening to review manuscripts that use 40 mturkers, pay them $.20 each, and take all of 5 minutes to complete (questionably validated) questionnaires, and I would not jump for joy to read a replication of 400 or 4,000 mturkers using the same methodology. Instead of publishing less, let’s just think more about our study design to make sure we are utilizing the various methodological tools that are available to us and that make sense for the phenomena we are studying. None of these suggestions (larger N, multiple methods, replication) are anything new to our field, but I'm cautiously optimistic that the renewed attention to these concerns will bring about (thoughtful, methodical) change in our field.
Posted by: PsychChrisNave | 07 March 2014 at 12:54 PM
thanks for your comments!
roger - i agree. i've always been too chicken to run a study that would only be useful if the results came out as i expected. i just don't have very much trust in my ability to come up with correct hypotheses. (incidentally, when i hear people worry about the ethics of wasting participants' time with large samples, i think that's legitimate but in my opinion the participant time that is wasted on studies that 'didn't work' and are relegated to the file drawer is a much bigger problem. at least with large samples, the participants' data are more likely to eventually contribute to the knowledge base, even if each participant's contribution is small.)
sanjay - you're right, we shouldn't use the quantity heuristic, and we should rely on experts to judge quality. i think this is hard for several reasons, one of which is the huge burden that is already placed on experts (as reviewers of journal articles, tenure cases, etc.). (more on that in a future post). another problem is that when we don't use an objective heuristic like quantity, we are left with subjective impressions, which feel more susceptible to implicit (and explicit) bias. i know that the quantity measure is also subject to bias, but somehow the potential for bias is more palpable when relying on people's opinions of quality.
chris - thanks for your comment! i'm optimistic, too. let's hope we're right!
Posted by: simine vazire | 07 March 2014 at 07:02 PM