the question came up a few weeks ago: if two measures are correlated .60, should they be aggregated into a single measure or kept separate? this might seem like a narrow question, but it raises some deep and complicated issues. at the heart of the matter is: when are two measures measuring the same construct?
thinking about this reminded me of the jingle and jangle fallacies. i have not read the original sources (the jingle fallacy apparently dates back to edward thorndike, 1904, and the jangle fallacy to truman kelley, 1927; i have not tracked down either book/paper. sorry). the jingle fallacy is calling two things by the same name when they are actually different constructs. the jangle fallacy is using two different names for things that are actually the same construct.
let's get specific. let's take the constructs 'neuroticism' and 'negative affect'. are these two the same thing? (maybe, maybe not).
one obvious difference is that neuroticism is usually thought of as a stable personality trait, whereas negative affect is typically thought of as a momentary state. but either can be conceptualized either way. in our own studies, we* measure trait neuroticism and trait negative affect, and we also get people to report their state neuroticism and state negative affect several times a day for several weeks (using experience sampling methods). how do we decide whether neuroticism and negative affect are really a single construct (either at the trait or the state level)?
i won't get into the specific results (because then we would not be able to publish them in a peer reviewed journal and get lots of fame and glory), but instead i will reflect on the factors that go into making this decision. there are two broad considerations relevant to deciding whether two measures are actually measuring the same construct: the empirical and the theoretical.
1. the empirical part: how strongly are they correlated with each other?
the strength of the correlation between the two measures is an important clue to whether they are measuring the same construct. but it is tough to know how strong a correlation has to be before we decide the two measures are not distinct.
the question i opened with - if two measures correlate .60 with each other, should they be aggregated? - is not a straightforward one. in fact, .60 is just about at the threshold of my own intuition about where the line is drawn between convergent and discriminant validity (or maybe it's more like .55, or .568). but how to interpret the .60 correlation depends on a few other things. perhaps most importantly, what is the reliability of each measure? if each measure has a reliability of .60, then the fact that they correlate .60 with each other suggests that they are highly similar, because a measure's reliability is, roughly speaking, the theoretical upper limit to how strongly it can correlate with other measures. this is because a measure's reliability is, in essence, its correlation with itself, so it should not correlate with another measure more strongly than it correlates with itself. (but this is in theory only. in reality, measures regularly exhibit correlations with other measures that are stronger than their own reliability, partly because reliability estimates can be wrong, and partly because, as charcot said, 'theory is nice, but it doesn't prevent things from existing.') so if the correlation between two measures is high, and close to the reliabilities of the two measures, then that is pretty strong evidence that they are measuring the same construct.
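to make the arithmetic concrete, here is a minimal sketch. it assumes classical test theory and uses spearman's correction for attenuation, which i have not named above, so treat it as one standard way to formalize the point rather than the only one. under that assumption the ceiling on an observed correlation is usually written as the square root of the product of the two reliabilities, which is also .60 when both reliabilities are .60. the function names are made up for illustration, and the .90 reliabilities in the last line are hypothetical.

```python
import math


def max_observed_r(rel_x, rel_y):
    """Ceiling on the observed correlation between two measures,
    assuming classical test theory: sqrt(rel_x * rel_y)."""
    return math.sqrt(rel_x * rel_y)


def disattenuated_r(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: the estimated correlation
    between the two underlying true scores."""
    return r_xy / math.sqrt(rel_x * rel_y)


# The example from the post: observed r = .60, both reliabilities = .60.
print(round(max_observed_r(0.60, 0.60), 2))
print(round(disattenuated_r(0.60, 0.60, 0.60), 2))

# Same observed r, but hypothetical reliabilities of .90 for both measures.
print(round(disattenuated_r(0.60, 0.90, 0.90), 2))
```

the same observed .60 reads very differently in the two cases: with reliabilities of .60 the corrected estimate hits the ceiling, while with reliabilities of .90 it stays well short of it. the corrected value is only as trustworthy as the reliability estimates that go into it.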
there is more to the empirical part. for example, do the two measures show similar patterns of association with other measures? do they covary not just across people, but within-person over time? when combined, do they load onto a single factor? but this is a blog so i am going to ignore those issues.
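for the curious, the across-people versus within-person question can be sketched in a few lines. this is a rough illustration, not our analysis: the numbers are made-up toy values (not data from our studies), the function names are hypothetical, and person-mean centering with pooled deviations is just one simple way to separate the two levels (a multilevel model would be the more usual choice).

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def between_and_within(data):
    """data maps person id -> list of (state_neuroticism, state_negative_affect)
    reports. Returns (between-person r, pooled within-person r)."""
    means_x, means_y, dev_x, dev_y = [], [], [], []
    for reports in data.values():
        mx = mean(x for x, _ in reports)
        my = mean(y for _, y in reports)
        means_x.append(mx)
        means_y.append(my)
        # Person-mean centering: keep only each report's deviation
        # from that person's own average.
        dev_x.extend(x - mx for x, _ in reports)
        dev_y.extend(y - my for _, y in reports)
    return pearson(means_x, means_y), pearson(dev_x, dev_y)


# Made-up toy reports for three hypothetical people.
toy = {
    "a": [(1, 3), (2, 2), (3, 1)],
    "b": [(4, 6), (5, 5), (6, 4)],
    "c": [(7, 9), (8, 8), (9, 7)],
}
between_r, within_r = between_and_within(toy)
print(between_r, within_r)
```

the toy numbers are rigged so that people who are higher on one measure on average are also higher on the other, while within any one person the two move in opposite directions. real data will rarely be that extreme, but it shows why the two levels have to be checked separately.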
2. the theoretical part: are there important conceptual distinctions between the two constructs?
it is possible that, even if all empirical results point to the conclusion that your two measures are indeed getting at the same construct, you are persuaded by strong theoretical reasons to keep them separate. you are probably wrong, but you can try to make the case.
i am a dust bowl empiricist at heart. i am inclined to believe the data over anyone's theory. so it will be difficult to convince me that two measures that don't pass the empirical test of distinctiveness are actually measuring distinct constructs.
however, if you have a strong theoretical argument why the two things you are attempting to measure are distinct constructs, i will entertain the possibility that your theory is right and your measures are crap. this is perhaps the best use i can think of for theory - sometimes it helps us figure out that we are using really bad measures.
what are some examples of good theoretical reasons to maintain that two things are distinct constructs, even in the face of empirical evidence to the contrary? one is that you believe they arise from different processes. these could be developmental processes, cognitive processes, biological processes, etc. but you still need empirical evidence to back up the process difference, so this is not really a theoretical argument.
another is that you can describe scenarios where the two constructs come apart (this is related to the 'different processes' reason - if they can come apart, there must be some difference in the processes that produce them). this is one purpose of thought experiments. for example, if you can paint a compelling picture of a time when a person would experience neuroticism but not negative affect, then you might be able to convince me that they are different constructs, even if, practically speaking, they almost never come apart. in my view, the scenario you depict has to be relatively plausible (philosophers and others may disagree). i don't care if neuroticism and negative affect could come apart on twin earth, i want to know if they could come apart in a situation human beings are liable to encounter here on this actual planet.
but in the end, i think you have to back up your theoretical argument with empirical evidence. if you really believe your two things are different constructs, you need to be able to observe them coming apart in actual data. that might mean developing new, better measures, or tracking down rare cases or rare situations, or doing other difficult things, but until you do those things, you are not going to convince me based on theory alone.
those are my half baked thoughts. many people have said much smarter things about construct validation. paul meehl, jane loevinger, campbell and fiske, cronbach... and also some people who are currently alive. i'm not sure why you are reading my blog instead of reading them. come to think of it, i'm not sure why i'm writing my blog instead of reading them.
*by 'we' i mean my awesome graduate students and our fantastic research assistants.
I'm currently in the midst of writing a paper that argues among other things that two variables showing a .98 correlation should be regarded as measuring somewhat different things.
I think that correlational evidence for whether two things are the same is way, way, way overvalued. Perhaps we can consider it 'necessary but not sufficient'. If one state or action leads to (that is, causes) another state almost perfectly reliably, then the two could easily show a correlation near 1 under normal circumstances. Case in point: pushing a gas pedal will result in a car accelerating in normal circumstances. Level of pedal-push could be correlated with level of car acceleration at r > .99. But a car's rate of acceleration and the level of pressure put on the gas pedal are not the same thing. They have a near-perfect correlation in normal circumstances, but we can imagine some easy experiments that would sever that link.
It is to the field's detriment that researchers reflexively assume that two things correlated at a level of .80 or whatever are "the same thing". My sense is that it generally doesn't show enough appreciation for process.
Posted by: Dustin Wood | 29 April 2014 at 07:52 AM
Perhaps we have a false dichotomy on our hands here. Surely the options are not only "same" and "different", but also there can be "closely related". The eyelets on my shoes are separate physical entities, but they also move in near perfect synchrony with each other, right up until you start dismembering the shoe (or maybe, when it's untied, if you measure their movements on a scale of millimeters you can get them waggling separately). That's just on an empirical level. On a theoretical level you can parse them as separate entities, that are physically separated in space, or you can regard them as part of the larger superordinate entity "shoe", and then they're all one thing.
Which one you would use would depend entirely on your purposes. To someone trying to track my running, measuring one is equivalent to measuring any of the others (and the conclusion "laughably bad" is inevitable either way). To someone interested in shoe surgery, it makes sense to think of them as separate.
To take your original example, if negative affectivity and neuroticism correlate highly (and they do), then for most people's practical purposes, it's probably most parsimonious to see them as "the same thing". But maybe you have a context where it matters to make the distinction, and you can show that in this context they really do something different. In that case, power to you... but even you would probably go on treating them as the same in most other contexts*.
* I know nothing of your data, maybe you blow this reasoning apart. Could be.
Posted by: Alex Gunz | 05 June 2014 at 05:18 AM
The examples in the website Spurious Correlations (http://www.tylervigen.com/) provide evidence supporting Dustin's assertion that "correlational evidence for whether two things are the same is way, way, way overvalued."
Posted by: Martha Smith | 08 July 2014 at 01:26 PM