the question came up a few weeks ago: if two measures are correlated .60, should they be aggregated into a single measure or kept separate? this might seem like a narrow question, but it raises some deep and complicated issues. at the heart of the matter is: when are two measures measuring the same construct?
thinking about this reminded me of the jingle and jangle fallacies. i have not read the original sources (the jingle fallacy apparently dates back to edward thorndike, 1904, and the jangle fallacy to truman kelley, 1927; i have not tracked down either book/paper. sorry). the jingle fallacy is calling two things by the same name when they are actually different constructs. the jangle fallacy is using two different names for things that are actually the same construct.
let's get specific. let's take the constructs 'neuroticism' and 'negative affect'. are these two the same thing? (maybe, maybe not).
one obvious difference is that neuroticism is usually thought of as a stable personality trait, whereas negative affect is typically thought of as a momentary state. but either can be conceptualized either way. in our own studies, we* measure trait neuroticism and trait negative affect, and we also get people to report their state neuroticism and state negative affect several times a day for several weeks (using experience sampling methods). how do we decide whether neuroticism and negative affect are really a single construct (either at the trait or the state level)?
i won't get into the specific results (because then we would not be able to publish them in a peer reviewed journal and get lots of fame and glory), but instead i will reflect on the factors that go into making this decision. there are two broad considerations relevant to deciding whether two measures are actually measuring the same construct: the empirical and the theoretical.
1. the empirical part: how strongly are they correlated with each other?
the strength of the correlation between the two measures is an important clue to whether they are measuring the same construct. but it is tough to know how strong a correlation has to be before we decide the two measures are not distinct.
the question i opened with - if two measures correlate .60 with each other, should they be aggregated? - is not a straightforward one. in fact, .60 is just about at the threshold of my own intuition about where the line is drawn between convergent and discriminant validity (or maybe it's more like .55, or .568). but how to interpret the .60 correlation depends on a few other things. perhaps most importantly, what is the reliability of each measure? if each measure has a reliability of .60, then the fact that they correlate .60 with each other suggests that they are highly similar, because a measure's reliability is, roughly speaking, the theoretical upper limit to how strongly it can correlate with other measures. this is because a measure's reliability is, in essence, its correlation with itself, so it should not correlate with another measure more strongly than it correlates with itself. (but this is in theory only. in reality, measures regularly exhibit correlations with other measures that are stronger than their own reliability, partly because reliability estimates can be wrong, and partly because, as charcot said, 'theory is nice, but it doesn't prevent things from existing.') so if the correlation between two measures is high, and close to the reliabilities of the two measures, then that is pretty strong evidence that they are measuring the same construct.
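one standard way to make the reliability point concrete is spearman's correction for attenuation from classical test theory, which is my addition here and not something the post itself uses. it divides the observed correlation by the square root of the product of the two reliabilities. under the same assumptions, that square root is also the ceiling on the observed correlation, and when both measures have the same reliability the ceiling is just the reliability itself. the sketch below is a minimal illustration, not a recommended analysis: the function names are made up, and the reliabilities of .80 and .95 are hypothetical values chosen only for contrast with the .60 case.

```python
import math


def max_observed_correlation(rel_x: float, rel_y: float) -> float:
    """Ceiling on the observed correlation under classical test theory."""
    return math.sqrt(rel_x * rel_y)


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    Estimates the correlation between the underlying true scores from the
    observed correlation and each measure's reliability. Values above 1 are
    possible when reliability estimates are off.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_xy / max_observed_correlation(rel_x, rel_y)


if __name__ == "__main__":
    observed = 0.60  # the correlation from the opening question
    # .60 is the reliability used in the text; .80 and .95 are hypothetical.
    for reliability in (0.60, 0.80, 0.95):
        corrected = disattenuated_correlation(observed, reliability, reliability)
        print(f"reliability {reliability:.2f} -> corrected r = {corrected:.2f}")
```

with both reliabilities at .60, the corrected correlation comes out at about 1.00, which is the 'highly similar' case described above. with reliabilities of .80 it drops to .75, and with .95 to about .63, so the same observed .60 looks much more like two related but distinct constructs.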
there is more to the empirical part. for example, do the two measures show similar patterns of association with other measures? do they covary not just across people, but within-person over time? when combined, do they load onto a single factor? but this is a blog so i am going to ignore those issues.
2. the theoretical part: are there important conceptual distinctions between the two constructs?
it is possible that, even if all empirical results point to the conclusion that your two measures are indeed getting at the same construct, you are persuaded by strong theoretical reasons to keep them separate. you are probably wrong, but you can try to make the case.
i am a dust bowl empiricist at heart. i am inclined to believe the data over anyone's theory. so it will be difficult to convince me that two measures that don't pass the empirical test of distinctiveness are actually measuring distinct constructs.
however, if you have a strong theoretical argument why the two things you are attempting to measure are distinct constructs, i will entertain the possibility that your theory is right and your measures are crap. this is perhaps the best use i can think of for theory - sometimes it helps us figure out that we are using really bad measures.
what are some examples of good theoretical reasons to maintain that two things are distinct constructs, even in the face of empirical evidence to the contrary? one is that you believe they arise from different processes. these could be developmental processes, cognitive processes, biological processes, etc. but you still need empirical evidence to back up the process difference, so this is not really a theoretical argument.
another is that you can describe scenarios where the two constructs come apart (this is related to the 'different processes' reason - if they can come apart, there must be some difference in the processes that produce them). this is one purpose of thought experiments. for example, if you can paint a compelling picture of a time when a person would experience neuroticism but not negative affect, then you might be able to convince me that they are different constructs, even if, practically speaking, they almost never come apart. in my view, the scenario you depict has to be relatively plausible (philosophers and others may disagree). i don't care if neuroticism and negative affect could come apart on twin earth, i want to know if they could come apart in a situation human beings are liable to encounter here on this actual planet.
but in the end, i think you have to back up your theoretical argument with empirical evidence. if you really believe your two things are different constructs, you need to be able to observe them coming apart in actual data. that might mean developing new, better measures, or tracking down rare cases or rare situations, or doing other difficult things, but until you do those things, you are not going to convince me based on theory alone.
those are my half baked thoughts. many people have said much smarter things about construct validation. paul meehl, jane loevinger, campbell and fiske, cronbach... and also some people who are currently alive. i'm not sure why you are reading my blog instead of reading them. come to think of it, i'm not sure why i'm writing my blog instead of reading them.
*by 'we' i mean my awesome graduate students and our fantastic research assistants.