Dan Drezner, subbing for Andrew Sullivan, discusses problems with forecasting models and the media members who latch onto them. One notable oversight in forecasting: virtually all of the existing models predict the nationwide popular vote, rather than the state-by-state outcomes that actually decide the electoral college, a particular problem in close elections like 2000. The few models that do make state-level predictions are rather dated.
More to the point, as Matt Yglesias points out, aggregate-level models are often inherently problematic. The problem that Yglesias calls “specification searching”—or what I’d call atheoretical modelling, with a healthy dose of stepwise regression to boot—is endemic to the whole class of forecasting models, because fundamentally they are inductive exercises, focused on finding the best combination of variables to predict the observed outcome. Most good social science (or science in general, for that matter), by contrast, is deductive: establish a truly explanatory theory, develop specific hypotheses, and operationalize and test them.
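To see why the inductive approach is so seductive, a quick simulation may help. This is purely a hypothetical sketch, not anyone’s actual forecasting model: the sixteen “elections,” the lone genuine predictor, and the ten noise variables are all invented, and the three-variable search merely stands in for the kind of specification hunting Yglesias describes.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_elections = 16  # roughly the number of postwar presidential elections

# One genuine predictor (say, real economic growth) and the vote it drives...
true_x = rng.normal(size=n_elections)
vote = 50 + 2 * true_x + rng.normal(scale=3, size=n_elections)

# ...plus ten irrelevant candidates: pure noise masquerading as "fundamentals".
noise = rng.normal(size=(n_elections, 10))
candidates = np.column_stack([true_x, noise])

def r_squared(X, y):
    """In-sample R^2 from an OLS fit of y on X (with an intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

# The "specification search": try every three-variable combination and keep
# whichever one happens to fit these sixteen observations best.
best = max(itertools.combinations(range(candidates.shape[1]), 3),
           key=lambda cols: r_squared(candidates[:, list(cols)], vote))

print("honest model (true predictor only): R^2 =",
      round(r_squared(candidates[:, [0]], vote), 2))
print("best combination found:", best,
      "R^2 =", round(r_squared(candidates[:, list(best)], vote), 2))
```

With only sixteen observations and 165 candidate specifications to choose from, the “winning” combination will almost always report a markedly better in-sample fit than the honest one-variable model, which is precisely the overfitting problem that an inductive exercise invites.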
That isn’t to say, however, that unemployment doesn’t belong in the model at all; it may, for example, be the best available indicator of a theoretical construct like “voters’ perceptions of the national economy.” But as someone whose research interests center on individual-level explanations of behavior rather than on aggregate outcomes, I sometimes wonder whether aggregate-level models trade too much scientific value for their parsimony.
See also James Joyner, who points out that small sample sizes aren’t necessarily problematic when the universe is also small. However, in a small sample the good social scientist will be particularly attentive to outliers: atypical observations that can pull the analysis toward conclusions the data as a whole don’t justify.
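A toy illustration of the outlier problem, with numbers that are entirely made up (ten well-behaved hypothetical elections plus one contrarian case), shows just how much leverage a single atypical observation has when N is this small:

```python
import numpy as np

# Ten hypothetical elections where incumbent vote share tracks economic growth
# almost perfectly (all figures invented for illustration).
growth = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
vote = 48 + 1.5 * growth + np.array([0.3, -0.5, 0.2, 0.4, -0.3,
                                     0.1, -0.2, 0.5, -0.4, 0.2])

def ols_slope(x, y):
    """Slope from a simple bivariate OLS regression of y on x."""
    return np.polyfit(x, y, 1)[0]

print("slope without the outlier:", round(ols_slope(growth, vote), 2))

# Now add one atypical case: booming growth, but the incumbent loses badly.
growth_o = np.append(growth, 5.5)
vote_o = np.append(vote, 44.0)
print("slope with one outlier:   ", round(ols_slope(growth_o, vote_o), 2))
```

A single high-leverage case knocks the estimated slope from about 1.5 down to about 0.3. With a universe of ten or fifteen elections, re-fitting the model with each case dropped in turn (or checking a diagnostic like Cook’s distance) is the least one should do before trusting the results.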