[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.]
recently i wrote a thing where i was the eight millionth person to say that we should evaluate research based on its scientific qualities (in this case, i was arguing that we should not evaluate research based on the status or prestige of the person or institution behind it). i had to keep the column short, so i ended with the snappy line "Let's focus less on eminence and more on its less glamorous cousin, rigor."
the question of how to evaluate quality came up again on twitter,* and Tal Yarkoni expressed skepticism about whether scientists can agree on what makes for a high quality paper.
there's good reason to be skeptical that scientists - even scientists working on the same topic - would agree on the quality of a specific paper. indeed, the empirical evidence regarding consensus among reviewers during peer review suggests there is ample disagreement (see this systematic review, cited in this editorial that is absolutely worth a read).
so, my goal here is to try to outline what i mean by "rigor" - to operationally define this construct at least in the little corner of the world that is my mind. i have no illusions that others will share my definition, nor do i think that they should - there is some value to different people having different approaches to evaluating work, because then each may catch something the others miss (this is one of the reasons i'm not too concerned about low agreement among peer reviewers - we often select them on purpose to focus on different things or have different biases). nevertheless, i think we should try to build some consensus as a field about what some important criteria are for evaluating the quality of research, and maybe if we start trying to articulate these more explicitly, we can see where people's views overlap and where they don't. i think the exchange between Simmons and Finkel & Eastwick (and Simmons's response) was a nice example of that.
what is rigor? to me, this is what teaching research methods is all about: trying to identify specific features that make for a good scientific study. so it won't surprise you that many of the things on my list of what makes a study rigorous will be super basic things that you learned in your first year of college. but it's amazing to me how often these issues are overlooked or waved away in studies by seasoned researchers. thus, even though many of these things will seem too obvious to mention, i think they need to be mentioned. still, this list isn't even close to exhaustive, and it's very heavily oriented towards social and personality psych studies. i would love to see others' lists, and to work together to come up with a slightly more comprehensive list.**
so, what do i ask myself when i'm reading and evaluating a scientific paper in my field (e.g., when i'm evaluating job candidates, trying to decide whether to build on a finding in my own research, reviewing a manuscript for a journal, etc.)?*** i ask myself two broad questions. there are many more specific questions you could list under each one, but these broad questions are how i orient my thinking.
1. is the narrow claim true: did the authors find the effect they say they found?

questions about the research design:
- is the sample size adequate and is it clear how it was determined?
- is the population appropriate for the questions the authors are trying to answer?
- are the measures and manipulations valid (and validated)?
- are there confounds? selection effects? selective dropout? (i.e., were the groups equal to begin with, and treated equally other than the manipulated variable?)
- could there be demand characteristics?
- did the authors design the study such that null results could be informative? (big enough sample, validity checks, etc.)
- do the authors report all conditions and all measures collected?
- do the authors provide links to their materials?

questions about the analyses:
- are the analyses appropriate? do they test the key research question(s)? (this might seem like it's super straightforward, but it's often pretty complicated to identify the key statistical analysis that directly tests your prediction (something you become painfully aware of when you try to do a p-curve).)
- are the test statistics correct? (e.g., run statcheck)
- do the authors report all equally reasonable analyses they could have done to test their question (e.g., different decisions about data exclusions, transformations, covariates, etc.)?
- if there were multiple analyses or multiple ways to conduct each analysis, do the authors address the increased chance of false positives (because of multiple comparisons)?
- do the authors provide their data and analysis code? if so, are the results reproducible?

2. is the broader claim true: does the finding mean what the authors claim it means?
- are there alternate explanations?
- if the authors are making causal claims, are those justified?
- is the strength of the authors' conclusion calibrated to the strength of their evidence?
(there are various ways to think about what "strong evidence" looks like: precise estimate, small p-value, large Bayes Factor, etc.)
- if results are not robust/consistent, are the conclusions appropriately calibrated?
- do the authors focus on some results and ignore others when interpreting what they found?
- do the authors extrapolate too far beyond the evidence they have? are they careful about any generalizations beyond the population/setting/measures they examined? do they include a statement of constraints on generalizability?
- if the research is not pre-registered, is it presented as exploratory and is the finding presented as a preliminary one that needs to be followed up on?
- do the authors seem open to being wrong? do they take a skeptical approach to their results? do they have any conflict of interest that may limit their openness to disconfirming evidence?

then what?

if a paper passes both of these hurdles, there are still questions left to ask. what those questions are depends on your goal. you might ask whether the study is important or interesting enough to spend more time on (e.g., build on with your own research). you might ask if the research topic is a good fit with your department. you might ask if the study is novel or groundbreaking or definitive enough for a particular journal. you might ask what implications it has for theory, or how its findings could be applied in the real world.

to be honest though, if a paper passes both of these hurdles, i'm pretty impressed.**** maybe my standards will go up as the norms in our field change, but right now, a paper that passes these hurdles stands out to me.

if there were three people like me, who defined rigor according to the criteria outlined above, then would we agree about which papers are better than others? which candidate we should hire? i don't know. i'd like to think agreement would be slightly better than without a shared definition.
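(a brief aside on one of the checklist items above: the "are the test statistics correct?" question is one of the few that can be checked mechanically, which is what statcheck automates. here's a rough sketch of that kind of consistency check, in python with scipy rather than the actual statcheck R package; the function name and the rounding tolerance are my own illustration, not anything standard.)

```python
# sketch of a statcheck-style consistency check (not the real statcheck
# R package): recompute the two-tailed p-value implied by a reported
# t statistic and its degrees of freedom, and compare it to the
# p-value the authors reported.
from scipy import stats

def check_t_test(t, df, reported_p, tol=0.005):
    """return True if the reported p is consistent with t and df.

    tol allows for rounding: reported p-values are typically rounded
    to two or three decimal places.
    """
    recomputed_p = 2 * stats.t.sf(abs(t), df)
    return abs(recomputed_p - reported_p) <= tol

# e.g., "t(98) = 2.02, p = .046" is internally consistent,
# but the same statistic reported with "p = .02" is not.
```

a check like this is dumb and mechanical, but that's sort of the point: the more of these criteria we can make explicit, the better the odds that two evaluators reach the same judgment.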
if so, that could be an argument for editorial teams, search committees, grant panels, and other evaluating bodies to try to come up with at least a loose set of criteria by which they define rigor. perhaps that is a step towards this land of unicorns and roses where we don't just rely on metrics, where scientific work is evaluated on its own merit, and not based on status, fads, or the evaluator's hunger level.*****

* you can literally debate anything on twitter. even powdered cheese.

** if you like this idea, you might like the SIPS workshop on "how to promote transparency and replicability as a reviewer" (osf page here. SIPS 2017 program here.)

*** i don't actually ask myself each of these sub-questions every time. as with many operationalizations, this is a flawed attempt at articulating some of the things i think i do implicitly, some of the time. mmmmm operationalization. funtimes.

**** i don't mean passes every single question - i'm not crazy like reviewer 2. i just mean if i can answer yes to "is the narrow claim true?" and to "is the broader claim true?"

***** no, that's probably not a real thing.