For the polling geeks of the world, predicting a Barack Obama victory in the days before Tuesday’s election was a no-brainer. But Emory political science professor Drew Linzer pegged Tuesday night’s results last summer: 332 electoral votes for Obama; 206 for his opponent, Mitt Romney.
“The prediction the model spit out in June turned out to be fairly accurate,” Linzer said Tuesday afternoon. In fact, if Obama holds his lead in Florida, it was exactly right.
Linzer is a member of a small but growing tribe of data scientists who are applying sophisticated statistical techniques to wring more information from the thousands of poll results publicly available during a presidential election. By mixing in information about the economy and candidate popularity, they produce election forecasts that are much more reliable than any single poll.
Statistician and former professional poker player Nate Silver was the first to popularize the approach. During the 2008 elections, he decided that statistical models he used to predict baseball performance could be adapted to predict election outcomes. That year in his FiveThirtyEight blog (there are 538 votes in the electoral college), he correctly predicted election results in all but one state.
It was also in 2008 that Linzer earned his PhD and became an assistant professor in the Emory University political science department after having worked three years in the public opinion polling industry.
He started playing around with his own forecasting model, applying it to 2008 election data to test its predictive power. Last summer, he decided to share the results of his model applied to the 2012 election through a web site, Votamatic.org.
Soon he was being mentioned in news stories alongside Silver and Sam Wang of the Princeton Election Consortium, and his site was getting 70,000 to 80,000 hits a day.
The problem with any individual poll, Linzer said, is a natural limit to its accuracy. With unlimited time and money, a pollster could talk to all 100 million or so voters and know how they are thinking about casting their ballots.
Most polls, though, survey only a few hundred to a few thousand people, so they can never give the exact number. Instead they rely on probability theory, which says that results from random samples cluster around the real number, so that there is a 95 percent chance that a poll's result falls within a set range of the real number, known as the margin of error.
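To put rough numbers on that range, here is a minimal sketch of the standard textbook margin-of-error formula for a simple random sample. The function name `margin_of_error` and the sample sizes are illustrative assumptions, not anything taken from Linzer's model, and real polls often report somewhat wider margins once weighting is taken into account.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample.

    share: assumed true proportion (0.5 gives the widest margin).
    z: 1.96 is the normal-distribution multiplier for 95% confidence.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)

# Hypothetical sample sizes, chosen only for illustration.
for n in (400, 1000, 2500):
    print(f"{n} respondents: +/- {100 * margin_of_error(n):.1f} points")
```

Under these assumptions a poll of 1,000 people carries a margin of about 3 percentage points, and quadrupling the sample only halves it.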
If there are results from many polls, however, then the sampling errors — the differences between the samples and the real number — should cancel out, and the average of all the polls should get you much closer to the real number.
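The cancelling-out effect can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: every poll is a clean random sample of the same size, the "real number" (51 percent here) is invented for the example, and the helper names are hypothetical. Averaging reduces random sampling error only; it does nothing about a bias shared by all the polls.

```python
import random

def simulate_poll(true_share, sample_size, rng):
    """Return the share of simulated respondents backing the candidate."""
    hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return hits / sample_size

def average_of_polls(true_share, sample_size, n_polls, rng):
    """Average the results of several independent simulated polls."""
    results = [simulate_poll(true_share, sample_size, rng) for _ in range(n_polls)]
    return sum(results) / n_polls

rng = random.Random(2012)  # fixed seed so the run is repeatable
single = simulate_poll(0.51, 1000, rng)
pooled = average_of_polls(0.51, 1000, 50, rng)
print(f"one poll: {single:.3f}, average of 50 polls: {pooled:.3f}")
```

Run repeatedly with different seeds, the single poll wanders by a few points while the 50-poll average stays within a fraction of a point of the assumed true value.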
Linzer’s model, as well as Silver’s, also throws in other information that, history shows, impacts election outcomes: economic growth rates, incumbency and candidate approval ratings. Early in the election cycle, Linzer’s model places more weight on these factors. As more polling data becomes available and as more people pay attention to the election and opinion solidifies, Linzer gives poll results more weight.
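One simple way to picture that shifting weight is a precision-weighted average, in which whichever source is less uncertain counts for more. The sketch below is not Linzer's actual model, whose details the article does not give; the function `blend_forecast` and every number in it are hypothetical.

```python
def blend_forecast(prior_mean, prior_sd, poll_mean, poll_sd):
    """Combine a fundamentals-based forecast with a poll average.

    Each source is weighted by its precision (1 / variance), so the
    less uncertain source dominates. Returns (forecast, poll_weight).
    """
    prior_precision = 1.0 / prior_sd ** 2
    poll_precision = 1.0 / poll_sd ** 2
    poll_weight = poll_precision / (prior_precision + poll_precision)
    forecast = (1.0 - poll_weight) * prior_mean + poll_weight * poll_mean
    return forecast, poll_weight

# Hypothetical numbers: fundamentals suggest 52 percent, polls show 50 percent.
# Early on the polls are treated as noisy (sd 4); late in the race as sharp (sd 1).
early, early_weight = blend_forecast(52.0, 3.0, 50.0, 4.0)
late, late_weight = blend_forecast(52.0, 3.0, 50.0, 1.0)
print(f"early: {early:.2f} (poll weight {early_weight:.2f})")
print(f"late:  {late:.2f} (poll weight {late_weight:.2f})")
```

With these made-up inputs the polls get about a third of the weight early in the race and about nine-tenths by the end, so the forecast drifts from the fundamentals toward the polling average.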
The result, Linzer hopes, is more accurate long-term and short-term election forecasts. But as a political scientist, he’s even more interested in the more accurate daily picture that can give insight into how events impact campaigns and how public opinion can change.
For the 2012 presidential race, Linzer said his model showed four phases. Obama started in the lead, but his support gradually shrank through the summer. The conventions reversed that, starting a gradual revival in Obama’s support up to the first presidential debate.
That debate produced an immediate drop in Obama’s support. But the damage was limited to just 2 percentage points, the model suggested, not the 6 to 8 points some polls showed. After the final debate, Obama’s support grew again until, by election night, it was back to where it had started last summer.
In the end, Linzer said, support for both candidates proved to be very stable throughout the election. Both were well known to voters from the beginning, and there were far fewer undecided voters than in previous elections.
The billions spent on advertising by the two campaigns and their supporting PACs and super PACs, Linzer said, mostly canceled each other out.
“It was an arms race. But you can’t expect unilateral disarmament,” he said.
And those daily jumps and dives in poll results obsessed over by pundits: mostly sampling error, Linzer said.
Linzer said he plans to refine his Votamatic web site for the 2016 election.
“The science of this is still new, in my opinion,” Linzer said. “I wouldn’t say that we’ve solved this yet.”