Tuesday, 30 December 2003

Explanation, not prediction

Dan Drezner, subbing for Andrew Sullivan, discusses problems with forecasting models and the media members who latch onto them. One notable oversight in forecasting: virtually all of the existing models predict the nationwide vote rather than the state-by-state outcomes that determine the electoral college, which is a particular problem in close elections like the one in 2000. The few models that do make state-level predictions are rather dated.

More to the point, as Matt Yglesias points out, aggregate-level models are often inherently problematic. The problem that Yglesias calls “specification searching”—or what I’d call atheoretical modelling, with a healthy dose of stepwise regression to boot—is endemic to the whole class of forecasting models, because fundamentally they are inductive exercises, focused on finding the best combination of variables to predict the observed outcome. Most good social science (or science in general, for that matter), by contrast, is deductive: establish a truly explanatory theory, develop specific hypotheses, and operationalize and test them.

That isn’t to say, however, that unemployment doesn’t belong in the model at all; it may, for example, be the best available indicator of a theoretical construct like “voters’ perceptions of the national economy.” But as someone whose research interests are more centered on individual-level explanations of behavior, rather than attempting to explain aggregate outcomes, I sometimes wonder if aggregate-level models trade too much scientific value for their parsimony.
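To make the specification-searching worry concrete, here is a minimal sketch of what happens when you hunt through many candidate variables with only a handful of elections to fit. Everything in it is simulated: the number of elections, the number of candidate predictors, and the data themselves are assumptions for illustration, not figures from any real forecasting model.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(2004)  # fixed seed so the run is repeatable

N_ELECTIONS = 14    # hypothetical: a small universe of past elections
N_CANDIDATES = 200  # hypothetical: predictors tried during the search

# Simulated "vote share": pure noise around 50, with no real cause behind it.
vote_share = [random.gauss(50, 5) for _ in range(N_ELECTIONS)]

# Simulated candidate predictors: also pure noise, unrelated to vote_share.
candidates = [
    [random.gauss(0, 1) for _ in range(N_ELECTIONS)]
    for _ in range(N_CANDIDATES)
]

abs_correlations = [abs(pearson_r(c, vote_share)) for c in candidates]
best_abs_r = max(abs_correlations)
typical_abs_r = sum(abs_correlations) / len(abs_correlations)

print(f"typical |r| across candidates: {typical_abs_r:.2f}")
print(f"best |r| found by searching:   {best_abs_r:.2f}")
```

None of the candidates has any real relationship to the outcome, yet the best one found by searching will look far more impressive than the typical one. A purely inductive search has no way to tell that winner apart from a variable that a theory told you to include in advance.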

See also James Joyner, who points out that small sample sizes aren’t necessarily problematic when the universe is also small. However, in a small sample the good social scientist will be particularly attentive to the potential issue of outliers: atypical observations that can lead one to draw conclusions that the data as a whole do not justify.
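A quick sketch of how much a single outlier can matter when the sample is small. The eight data points below are invented for illustration (think of x as some economic indicator and y as a vote share); the slope is an ordinary least-squares fit computed by hand.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical data: seven observations with no real trend, plus one outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [50, 51, 49, 50, 51, 49, 50, 70]  # the last observation is the outlier

slope_all = ols_slope(x, y)
slope_without_outlier = ols_slope(x[:-1], y[:-1])

print(f"slope with all eight points: {slope_all:.2f}")
print(f"slope without the outlier:   {slope_without_outlier:.2f}")
```

With all eight points the fit suggests a clear positive relationship; drop the one atypical observation and the slope is essentially zero. That does not mean the outlier should be thrown away, only that a conclusion resting on it deserves a second look.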

Friday, 30 April 2004


David Adesnik has an odd standard for courage among political scientists:

It takes guts for a political scientist to actually predict something. That’s because all that political scientists really have are their reputations, and they can’t afford to put those on the line. So here’s a shout out to Larry Sabato, who isn’t afraid to put his money where his mouth is.

Other than referring David to my post on explanation and prediction, I’d only warn readers that what really takes guts is to get between Larry Sabato and a camera.

Thursday, 26 August 2004

Explanation, prediction, and the Fair model

There’s been some discussion of late of Ray Fair’s model, and particularly its prediction that George Bush will walk away with 57.5% of the two-party vote in November. Bill Hobbs and Don Sensing find this to be interesting—and, at some level, I suppose it is. But I have to mention a couple of caveats:

  1. I seriously doubt either major-party candidate will get 57.5% of the two-party vote. A few numbers for comparison: Ronald Reagan’s landslide in 1984 against Walter Mondale netted him 59.2% of the two-party vote, while Bill Clinton’s pounding of Bob Dole got him 54.7% of the two-party vote. I’d frankly be surprised if Fair’s forecast is even correct within his stated margin of error (±2.4 percentage points). To be gracious to Fair on this point, he does candidly acknowledge that there could be specification issues that would inflate the forecast.
  2. I think forecasting models do a poor job of explaining the causal mechanisms that take place. The national economy doesn’t vote—rather, about a hundred million Americans do, and the effects of the national economy on individuals are for the most part weak (but, admittedly, can be quite strong for voters in particular industries and regions).
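For a sense of scale on the first caveat, here is the arithmetic as a small sketch. It uses only the figures quoted above and assumes the margin of error can be read as a simple symmetric band around the point forecast, which is my simplification rather than anything Fair specifies.

```python
def forecast_interval(point, margin):
    """Symmetric band around a point forecast (an assumed reading of the margin)."""
    return (point - margin, point + margin)


def within(share, interval):
    """True if a vote share falls inside the band, endpoints included."""
    low, high = interval
    return low <= share <= high


# Figures quoted in the post: forecast of 57.5, margin of 2.4 points.
fair_interval = forecast_interval(57.5, 2.4)

# Two-party vote shares quoted in the post for comparison.
historical = {"Reagan 1984": 59.2, "Clinton 1996": 54.7}

low, high = fair_interval
print(f"forecast band: {low:.1f} to {high:.1f}")
for label, share in historical.items():
    verdict = "inside" if within(share, fair_interval) else "outside"
    print(f"{label}: {share:.1f} is {verdict} the band")
```

Even the low end of the band sits above Clinton's 1996 share, so for the forecast to be right within its margin, the result would have to be more lopsided than the Dole race.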

Of course, a third caveat is that forecasting the national vote-share is (in my opinion) a misspecification of the institutional conditions under which the election takes place; there are 51 elections (in the 50 states and District of Columbia) that allocate representation in the electoral college, and I generally think that understanding those 51 elections is much more important than forecasting the headline figure, which only has a tenuous relationship with the substantively meaningful outcome (who wins the election).
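A toy illustration of why the headline figure and the outcome can come apart. The three "states," their elector counts, and their vote totals are all invented, and the sketch assumes every state awards its electors winner-take-all, which is a simplification of the real rules.

```python
# Hypothetical three-state country; all names and numbers are invented.
states = [
    {"name": "State A", "electors": 10, "votes": {"X": 900, "Y": 100}},
    {"name": "State B", "electors": 8, "votes": {"X": 450, "Y": 550}},
    {"name": "State C", "electors": 8, "votes": {"X": 450, "Y": 550}},
]


def national_share(states, candidate):
    """Candidate's percentage of all votes cast across every state."""
    total = sum(sum(s["votes"].values()) for s in states)
    won = sum(s["votes"][candidate] for s in states)
    return 100 * won / total


def electoral_totals(states):
    """Electors per candidate, assuming winner-take-all in each state."""
    totals = {}
    for s in states:
        winner = max(s["votes"], key=s["votes"].get)
        totals[winner] = totals.get(winner, 0) + s["electors"]
    return totals


share_x = national_share(states, "X")
electors = electoral_totals(states)

print(f"X's national two-party share: {share_x:.1f}%")
print(f"electors won: {electors}")
```

Candidate X piles up a landslide-sized national share by running up the score in one state, and still loses the electoral count. A model that forecasts only the national share, however accurately, says nothing about which of those two results you are looking at.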

Also (potentially) of interest: back in my slightly-more-prolific days, I posted a brief exposition of my distaste for (and lack of interest in) election forecasting models.