Direction and error in data and elections

The big news story this week was the US presidential election, as well as the elections for the House and the Senate. The result isn’t yet fully known, although at the time of writing it looks fairly likely that Joe Biden will reach 270 votes in the electoral college and become president.


As in the last election, pollsters have been castigated for getting it wrong and underestimating Trump’s share of the vote. Many reasons have been suggested for this, such as the “shy Trump voter” effect, or simply Trump voters being less likely to respond to pollsters’ surveys (could response rates also differ depending on whether people have been working from home during the pandemic?). The final RealClearPolitics polling average showed Biden on 51.2% and Trump on 44.0%, i.e. +7.2% in favour of Biden. At the time of writing, Biden is on 50.5% and Trump on 47.7%, i.e. +2.8% in favour of Biden. In other words, the national polls overestimated the gap by 4.4 percentage points.
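The arithmetic is simple enough to check in a few lines. Below is a minimal Python sketch using the RealClearPolitics average and the vote shares at the time of writing quoted above. The helper `margin` is a hypothetical name introduced for illustration, and the error is measured in percentage points on the Biden-minus-Trump margin.

```python
# Vote shares quoted in the text (percent): final RealClearPolitics
# polling average vs. the count at the time of writing.
poll = {"Biden": 51.2, "Trump": 44.0}
result = {"Biden": 50.5, "Trump": 47.7}


def margin(shares):
    """Biden-minus-Trump margin in percentage points (hypothetical helper)."""
    return shares["Biden"] - shares["Trump"]


poll_margin = margin(poll)        # polled margin
result_margin = margin(result)    # margin at the time of writing
error = poll_margin - result_margin

print(f"Polled margin: {poll_margin:+.1f} pts")
print(f"Actual margin: {result_margin:+.1f} pts")
print(f"Overstatement: {error:.1f} pts")
```

Both margins are positive, so the polls overstated the size of Biden’s lead while still pointing in the right direction.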


What appears to be lost in all this criticism is that the polls (likely) predicted the winner of the 2020 US presidential election, unlike in 2016. As @NathanTankus has noted, polls can be informative but still have a non-random error, which in this case seems to have underpredicted Trump’s support. Indeed, if we look at state-by-state polling data vs. returns for 2016, there does appear to be a consistent pattern (tweet by @stefanjwojcik) of underestimating Republican votes.


A few weeks ago, we looked at the national polls for US presidential elections since 1936, and I’ve put the table below again. In most cases, national polls have been a relatively good predictor, even with all the caveats about the electoral college. Furthermore, it isn’t unusual for the polls to be out but still predict the winner. One notable example was 2008, when the polls predicted a +11% advantage for Obama, but the final result was +7.2% in favour of Obama. As @MarcosCarreira has pointed out, it’s also important to have an idea of the size of the error you could expect, as this could well have a bearing on how confident you are about the direction. Several weeks ago, we also talked about why it’s important to understand the distribution of outcomes for any event, rather than purely a point forecast.
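One way to connect the size of the expected error with confidence in the direction is a simple sketch that assumes, hypothetically, that the actual margin is normally distributed around the polled margin. The function name `prob_direction_correct`, the `bias` parameter (a systematic overstatement of the leader, like the non-random error discussed above) and the `sigma` values are all illustrative assumptions, not estimates from the historical table. The sketch also only covers the direction of the national popular vote, not the electoral college.

```python
import math


def prob_direction_correct(lead, sigma, bias=0.0):
    """P(actual margin > 0) given a polled margin of `lead` points.

    Hypothetical model: actual margin ~ Normal(lead - bias, sigma),
    i.e. polls overstate the leader by `bias` points on average, with
    random error of standard deviation `sigma` points on top.
    """
    z = (lead - bias) / sigma
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A +7.2 point polled lead under different (assumed) error sizes.
for sigma in (2.0, 4.0, 6.0):
    p = prob_direction_correct(7.2, sigma)
    print(f"sigma={sigma:.0f} pts: P(leader wins popular vote) = {p:.3f}")

# The same lead, if polls systematically overstate the leader by 4.4 points.
print(f"with bias: {prob_direction_correct(7.2, 4.0, bias=4.4):.3f}")
```

The wider the assumed error, or the larger the systematic bias, the less confident we should be that the polled leader actually wins, even when the point forecast is unchanged.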


The question we need to ask is which is more important: getting the direction of a forecast right, or reducing the standard error? From a market’s perspective, I would say that a smaller standard error with the wrong direction is not as valuable as getting the direction right. Suppose a poll had instead shown Trump narrowly ahead, say Trump on 48.7% and Biden on 47.7%. Its error on the gap would have been 3.8 percentage points, smaller than the 4.4 points by which the actual polls missed. But the direction of the surprise would have been wrong, and the poll would have predicted the election incorrectly.
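To make the trade-off concrete, here is a minimal Python sketch that scores two forecasts of the Biden-minus-Trump margin against the result at the time of writing, on both error size and direction. The first forecast is the RealClearPolitics average quoted earlier; the second is a hypothetical poll with Trump on 48.7% and Biden on 47.7%, an illustrative assumption rather than a real poll.

```python
# Margin (Biden minus Trump, percentage points) at the time of writing.
actual = 50.5 - 47.7

# Two margin forecasts: the final RCP average quoted in the text, and a
# hypothetical poll (assumed figures) showing Trump narrowly ahead.
forecasts = {
    "RCP average": 51.2 - 44.0,
    "hypothetical": 47.7 - 48.7,
}

results = {}
for name, f in forecasts.items():
    abs_error = abs(f - actual)                    # size of the miss
    right_direction = (f > 0) == (actual > 0)      # same predicted winner?
    results[name] = (abs_error, right_direction)
    print(f"{name:>12}: error {abs_error:.1f} pts, "
          f"direction {'right' if right_direction else 'wrong'}")
```

The hypothetical poll has the smaller error, yet it is the only one of the two that calls the wrong winner.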


This idea of direction vs. magnitude of error is, of course, not unique to elections. It’s also something we observe with economic data. Here too, getting the direction of an economic data surprise right is more valuable than a forecast with a smaller error that points the wrong way.
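A simple way to see the distinction with economic data is to score forecasts of data surprises on two separate metrics: root-mean-square error for magnitude, and hit rate for direction. This is a minimal Python sketch in which every number is made up for illustration. Surprises are measured as actual minus consensus, and the two forecasters are hypothetical.

```python
import math

# Hypothetical data: realised surprises (actual minus consensus) for four
# releases, and the surprises two hypothetical forecasters expected.
actual_surprise = [0.5, -0.3, 0.4, -0.2]
forecasts = {
    "A (small errors)": [-0.1, 0.1, 0.2, 0.1],
    "B (right direction)": [1.2, -1.0, 1.1, -0.9],
}


def rmse(pred, actual):
    """Root-mean-square error between predicted and actual surprises."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def hit_rate(pred, actual):
    """Share of releases where the predicted surprise had the right sign.

    Assumes no surprise is exactly zero.
    """
    hits = sum((p > 0) == (a > 0) for p, a in zip(pred, actual))
    return hits / len(actual)


for name, pred in forecasts.items():
    print(f"{name}: RMSE {rmse(pred, actual_surprise):.2f}, "
          f"hit rate {hit_rate(pred, actual_surprise):.0%}")
```

Forecaster A wins on RMSE, but only B would have positioned correctly ahead of every release, which is the property a market participant is more likely to care about.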


So yes, the polls do seem to have underestimated the strength of one candidate, and there need to be ways of adjusting for such a bias. But they did get the winner right, and that is something we shouldn’t lose sight of.