Equations

Sunday, June 5, 2016

FiveThirtyEight has an establishment bias



If you are like me, you had a lot of respect for FiveThirtyEight. They represented a new breed of political journalists who could explain the world with hard data. FiveThirtyEight aggregates polls, weights them by historical accuracy, and produces more accurate forecasts than any individual pollster could. Nate Silver shot to fame by forecasting the 2008 Presidential and Senate races with great accuracy.

This primary season, however, FiveThirtyEight has gone from punditry to hackery (not my words). What follows is an analysis of how every one of their forecasts compared with the actual outcome.

Note: Here is Nate Silver explaining how "Polls Plus" averages are calculated. They apparently include "endorsements, state and national fundraising totals, favorability ratings, ideology ratings and national polls".

See the bottom of this post for the raw data on the probability distributions and confidence intervals for each Democratic primary race. I transcribed this data by hand from 538's website. I've checked it several times for errors, but if you do find one, let me know.

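| State | Candidate | "Polls Plus" Average | Actual Result | Forecast - Actual Result |
|---|---|---|---|---|
| Alabama | Sanders | 23.2 | 19.2 | 4 |
| Alabama | Clinton | 73.9 | 77.8 | -3.9 |
| Arkansas | Sanders | 32.2 | 29.7 | 2.5 |
| Arkansas | Clinton | 64.3 | 66.3 | -2 |
| Connecticut | Sanders | 46.5 | 46.4 | 0.1 |
| Connecticut | Clinton | 51.1 | 51.8 | -0.7 |
| Wisconsin | Sanders | 50.2 | 56.6 | -6.4 |
| Wisconsin | Clinton | 47.2 | 43.1 | 4.1 |
| West Virginia | Sanders | 49.6 | 51.4 | -1.8 |
| West Virginia | Clinton | 46.6 | 35.8 | 10.8 |
| Virginia | Sanders | 34.6 | 35.2 | -0.6 |
| Virginia | Clinton | 62.3 | 64.3 | -2 |
| Vermont | Sanders | 86.7 | 86.1 | 0.6 |
| Vermont | Clinton | 11 | 13.6 | -2.6 |
| Texas | Sanders | 32.4 | 33.2 | -0.8 |
| Texas | Clinton | 64.6 | 65.2 | -0.6 |
| Tennessee | Sanders | 33.9 | 32.4 | 1.5 |
| Tennessee | Clinton | 62.7 | 66.1 | -3.4 |
| South Carolina | Sanders | 28.7 | 26 | 2.7 |
| South Carolina | Clinton | 67 | 73.5 | -6.5 |
| Rhode Island | Sanders | 49.3 | 54.6 | -5.3 |
| Rhode Island | Clinton | 47.9 | 43.6 | 4.3 |
| Pennsylvania | Sanders | 39.7 | 43.6 | -3.9 |
| Pennsylvania | Clinton | 57.8 | 55.6 | 2.2 |
| Oklahoma | Sanders | 47.3 | 51.9 | -4.6 |
| Oklahoma | Clinton | 47.4 | 41.5 | 5.9 |
| Ohio | Sanders | 43.1 | 42.7 | 0.4 |
| Ohio | Clinton | 54.1 | 56.5 | -2.4 |
| Nevada | Sanders | 46 | 47.3 | -1.3 |
| Nevada | Clinton | 52.4 | 52.6 | -0.2 |
| New York | Sanders | 41.3 | 42 | -0.7 |
| New York | Clinton | 56.3 | 58 | -1.7 |
| New Hampshire | Sanders | 57.2 | 60.4 | -3.2 |
| New Hampshire | Clinton | 39.9 | 38 | 1.9 |
| North Carolina | Sanders | 36.2 | 40.8 | -4.6 |
| North Carolina | Clinton | 60.8 | 54.6 | 6.2 |
| Missouri | Sanders | 48 | 49.4 | -1.4 |
| Missouri | Clinton | 48.8 | 49.6 | -0.8 |
| Mississippi | Sanders | 14.5 | 16.5 | -2 |
| Mississippi | Clinton | 79.2 | 82.6 | -3.4 |
| Michigan | Sanders | 37.5 | 49.8 | -12.3 |
| Michigan | Clinton | 60 | 48.3 | 11.7 |
| Massachusetts | Sanders | 44.8 | 48.7 | -3.9 |
| Massachusetts | Clinton | 52.3 | 50.1 | 2.2 |
| Maryland | Sanders | 39.8 | 33.2 | 6.6 |
| Maryland | Clinton | 57.5 | 63 | -5.5 |
| Louisiana | Sanders | 17.9 | 23.2 | -5.3 |
| Louisiana | Clinton | 75 | 71.1 | 3.9 |
| Iowa | Sanders | 45.1 | 49.6 | -4.5 |
| Iowa | Clinton | 48.3 | 49.9 | -1.6 |
| Indiana | Sanders | 43.3 | 52.5 | -9.2 |
| Indiana | Clinton | 54.2 | 47.5 | 6.7 |
| Illinois | Sanders | 44.1 | 48.7 | -4.6 |
| Illinois | Clinton | 51.8 | 50.5 | 1.3 |
| Georgia | Sanders | 28.9 | 28.2 | 0.7 |
| Georgia | Clinton | 68 | 71.3 | -3.3 |
| Florida | Sanders | 32.1 | 33.3 | -1.2 |
| Florida | Clinton | 64.9 | 64.4 | 0.5 |
| Alabama | Trump | 41.4 | 43.42 | -2.02 |
| Alabama | Cruz | 16.8 | 21.09 | -4.29 |
| Alabama | Rubio | 22.7 | 18.66 | 4.04 |
| Alabama | Kasich | 5.6 | 4.43 | 1.17 |
| Arkansas | Trump | 32.4 | 32.79 | -0.39 |
| Arkansas | Cruz | 27.9 | 30.5 | -2.6 |
| Arkansas | Rubio | 24.9 | 24.8 | 0.1 |
| Arkansas | Kasich | 4.6 | 3.72 | 0.88 |
| Arizona | Trump | 45.2 | 45.95 | -0.75 |
| Arizona | Cruz | 33.9 | 27.61 | 6.29 |
| Arizona | Rubio | 0 | 0 | 0 |
| Arizona | Kasich | 18.4 | 10.57 | 7.83 |
| Connecticut | Trump | 56.1 | 57.87 | -1.77 |
| Connecticut | Cruz | 12.2 | 11.71 | 0.49 |
| Connecticut | Rubio | 0 | 0 | 0 |
| Connecticut | Kasich | 29.3 | 28.36 | 0.94 |
| Florida | Trump | 42 | 45.72 | -3.72 |
| Florida | Cruz | 18.8 | 17.14 | 1.66 |
| Florida | Rubio | 27.7 | 27.04 | 0.66 |
| Florida | Kasich | 9.7 | 6.77 | 2.93 |
| Georgia | Trump | 36.8 | 38.81 | -2.01 |
| Georgia | Cruz | 20.9 | 23.6 | -2.7 |
| Georgia | Rubio | 23.8 | 24.45 | -0.65 |
| Georgia | Kasich | 8 | 5.59 | 2.41 |
| Illinois | Trump | 32.7 | 38.8 | -6.1 |
| Illinois | Cruz | 28.4 | 30.23 | -1.83 |
| Illinois | Rubio | 16.7 | 8.74 | 7.96 |
| Illinois | Kasich | 20.2 | 19.74 | 0.46 |
| Indiana | Trump | 43.6 | 53.25 | -9.65 |
| Indiana | Cruz | 37.6 | 36.64 | 0.96 |
| Indiana | Rubio | 0 | 0 | 0 |
| Indiana | Kasich | 16.2 | 7.57 | 8.63 |
| Iowa | Trump | 25.6 | 24.3 | 1.3 |
| Iowa | Cruz | 24.3 | 27.64 | -3.34 |
| Iowa | Rubio | 18.1 | 23.12 | -5.02 |
| Iowa | Kasich | 2.8 | 1.86 | 0.94 |
| Kansas | Trump | 33.1 | 23.35 | 9.75 |
| Kansas | Cruz | 31.8 | 47.5 | -15.7 |
| Kansas | Rubio | 19.5 | 16.83 | 2.67 |
| Kansas | Kasich | 14.4 | 11.07 | 3.33 |
| Louisiana | Trump | 44.1 | 41.45 | 2.65 |
| Louisiana | Cruz | 30 | 37.83 | -7.83 |
| Louisiana | Rubio | 17 | 11.22 | 5.78 |
| Louisiana | Kasich | 6.3 | 6.43 | -0.13 |
| Maryland | Trump | 50.3 | 54.1 | -3.8 |
| Maryland | Cruz | 23.8 | 18.97 | 4.83 |
| Maryland | Rubio | 0 | 0 | 0 |
| Maryland | Kasich | 23.9 | 23.22 | 0.68 |
| Massachusetts | Trump | 48.9 | 48.99 | -0.09 |
| Massachusetts | Cruz | 9.2 | 9.5 | -0.3 |
| Massachusetts | Rubio | 20.5 | 17.75 | 2.75 |
| Massachusetts | Kasich | 16.9 | 17.94 | -1.04 |
| Michigan | Trump | 36.8 | 36.55 | 0.25 |
| Michigan | Cruz | 23 | 24.68 | -1.68 |
| Michigan | Rubio | 14.1 | 9.34 | 4.76 |
| Michigan | Kasich | 23.7 | 24.26 | -0.56 |
| North Carolina | Trump | 42.7 | 40.23 | 2.47 |
| North Carolina | Cruz | 32.6 | 36.76 | -4.16 |
| North Carolina | Rubio | 9.9 | 7.73 | 2.17 |
| North Carolina | Kasich | 12.9 | 12.67 | 0.23 |
| New Hampshire | Trump | 26.8 | 35.23 | -8.43 |
| New Hampshire | Cruz | 12 | 11.63 | 0.37 |
| New Hampshire | Rubio | 15.7 | 10.52 | 5.18 |
| New Hampshire | Kasich | 15.2 | 15.72 | -0.52 |
| New York | Trump | 53.8 | 59.21 | -5.41 |
| New York | Cruz | 19.1 | 14.53 | 4.57 |
| New York | Rubio | 0 | 0 | 0 |
| New York | Kasich | 24.9 | 24.68 | 0.22 |
| Nevada | Trump | 37.1 | 45.75 | -8.65 |
| Nevada | Cruz | 21 | 21.3 | -0.3 |
| Nevada | Rubio | 27.1 | 23.77 | 3.33 |
| Nevada | Kasich | 7.6 | 3.59 | 4.01 |
| Ohio | Trump | 34.4 | 35.87 | -1.47 |
| Ohio | Cruz | 18.1 | 13.31 | 4.79 |
| Ohio | Rubio | 3.9 | 2.34 | 1.56 |
| Ohio | Kasich | 41.8 | 46.95 | -5.15 |
| Oklahoma | Trump | 33.3 | 28.32 | 4.98 |
| Oklahoma | Cruz | 21.9 | 34.37 | -12.47 |
| Oklahoma | Rubio | 25.2 | 26.01 | -0.81 |
| Oklahoma | Kasich | 7.9 | 3.59 | 4.31 |
| Pennsylvania | Trump | 47.7 | 56.61 | -8.91 |
| Pennsylvania | Cruz | 27.1 | 21.67 | 5.43 |
| Pennsylvania | Rubio | 0 | 0 | 0 |
| Pennsylvania | Kasich | 23.4 | 19.44 | 3.96 |
| Rhode Island | Trump | 59.7 | 62.92 | -3.22 |
| Rhode Island | Cruz | 12.5 | 10.29 | 2.21 |
| Rhode Island | Rubio | 0 | 0 | 0 |
| Rhode Island | Kasich | 25.2 | 24.01 | 1.19 |
| South Carolina | Trump | 30.5 | 32.51 | -2.01 |
| South Carolina | Cruz | 19.5 | 22.33 | -2.83 |
| South Carolina | Rubio | 19.8 | 22.48 | -2.68 |
| South Carolina | Kasich | 9.3 | 7.61 | 1.69 |
| Tennessee | Trump | 42.2 | 38.94 | 3.26 |
| Tennessee | Cruz | 18.6 | 24.71 | -6.11 |
| Tennessee | Rubio | 22.4 | 21.18 | 1.22 |
| Tennessee | Kasich | 6.2 | 5.29 | 0.91 |
| Texas | Trump | 26.6 | 26.75 | -0.15 |
| Texas | Cruz | 37.9 | 43.76 | -5.86 |
| Texas | Rubio | 20.2 | 17.74 | 2.46 |
| Texas | Kasich | 7.1 | 4.25 | 2.85 |
| Utah | Trump | 9.9 | 14.03 | -4.13 |
| Utah | Cruz | 58 | 69.17 | -11.17 |
| Utah | Rubio | 0 | 0 | 0 |
| Utah | Kasich | 30.3 | 16.81 | 13.49 |
| Virginia | Trump | 36.7 | 34.8 | 1.9 |
| Virginia | Cruz | 16.1 | 16.69 | -0.59 |
| Virginia | Rubio | 30.1 | 31.98 | -1.88 |
| Virginia | Kasich | 7.5 | 9.54 | -2.04 |
| Wisconsin | Trump | 34.1 | 35.02 | -0.92 |
| Wisconsin | Cruz | 41.8 | 48.2 | -6.4 |
| Wisconsin | Rubio | 0 | 0 | 0 |
| Wisconsin | Kasich | 21.8 | 14.1 | 7.7 |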
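
For anyone who wants to check my arithmetic: the last column is just the "Polls Plus" average minus the actual result. Here is a minimal Python sketch of that calculation using the two Michigan rows from the table. The function name `forecast_error` is my own label for illustration, not anything FiveThirtyEight publishes.

```python
def forecast_error(forecast, actual):
    """Forecast minus actual; a positive value means the forecast was too high."""
    return round(forecast - actual, 2)

# (state, candidate, "Polls Plus" average, actual result), copied from the table
rows = [
    ("Michigan", "Sanders", 37.5, 49.8),
    ("Michigan", "Clinton", 60.0, 48.3),
]

for state, candidate, forecast, actual in rows:
    print(state, candidate, forecast_error(forecast, actual))
```

The same function applied to every row reproduces the "Forecast - Actual Result" column.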


What did we learn? FiveThirtyEight seems to overestimate the establishment candidate's vote share (Clinton) and underestimate the outsiders' (Sanders, Cruz, Trump).


Note: While they have also been kind to Rubio and Kasich, it must be noted that one of them (Rubio) had too few observations and the other (Kasich) had low vote shares throughout the race, which makes forecasting difficult.


Their "error", i.e. the forecast minus the actual result, is approximately normally distributed around a mean of 0. (Here a "Forecast Error" > 0 means FiveThirtyEight overestimated the candidate's vote share. In other words, the candidate did worse than the forecast.)
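
If you want to look at the shape of the errors without a chart, here is a rough Python sketch. To keep it short it uses only the Trump rows from the table above, so it is a check on one candidate, not on the whole distribution. The 4-point bin width and the helper name `text_histogram` are arbitrary choices of mine.

```python
import statistics

# Forecast errors for Trump, copied from the last column of the table
# (table order, Alabama through Wisconsin)
trump_errors = [
    -2.02, -0.39, -0.75, -1.77, -3.72, -2.01, -6.1, -9.65, 1.3, 9.75,
    2.65, -3.8, -0.09, 0.25, 2.47, -8.43, -5.41, -8.65, -1.47, 4.98,
    -8.91, -3.22, -2.01, 3.26, -0.15, -4.13, 1.9, -0.92,
]

mean_error = statistics.mean(trump_errors)
median_error = statistics.median(trump_errors)
spread = statistics.stdev(trump_errors)  # sample standard deviation

def text_histogram(errors, width=4):
    """Count errors per bin of `width` points; each key is a bin's lower edge."""
    bins = {}
    for e in errors:
        lower = int(e // width) * width
        bins[lower] = bins.get(lower, 0) + 1
    return dict(sorted(bins.items()))

width = 4
print(f"mean={mean_error:.2f} median={median_error:.2f} stdev={spread:.2f}")
for lower, count in text_histogram(trump_errors, width).items():
    print(f"{lower:>4} to {lower + width:>4}: {'#' * count}")
```

A mean near zero with a roughly symmetric histogram would support the "centered on 0" reading; a mean well away from zero would point to a systematic lean for that candidate.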




But some candidates are more equal than others. Kasich and Rubio have benefitted from overestimation. Sanders and Trump have largely been underestimated, while Clinton has benefitted from some wild overestimates in some key states (notice the long tail - owing to states like Michigan and Indiana).
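
To compare candidates directly, you can average the signed error per candidate and count how often each was over- or underestimated. The sketch below does that for the Democratic rows of the table. The `summarize` helper and its field names are my own, and it treats every state equally, which is an assumption rather than anything FiveThirtyEight does.

```python
from statistics import mean

# Forecast errors (forecast minus actual) from the Democratic rows of the table,
# in table order from Alabama to Florida
errors = {
    "Clinton": [
        -3.9, -2, -0.7, 4.1, 10.8, -2, -2.6, -0.6, -3.4, -6.5,
        4.3, 2.2, 5.9, -2.4, -0.2, -1.7, 1.9, 6.2, -0.8, -3.4,
        11.7, 2.2, -5.5, 3.9, -1.6, 6.7, 1.3, -3.3, 0.5,
    ],
    "Sanders": [
        4, 2.5, 0.1, -6.4, -1.8, -0.6, 0.6, -0.8, 1.5, 2.7,
        -5.3, -3.9, -4.6, 0.4, -1.3, -0.7, -3.2, -4.6, -1.4, -2,
        -12.3, -3.9, 6.6, -5.3, -4.5, -9.2, -4.6, 0.7, -1.2,
    ],
}

def summarize(values):
    """Mean signed error plus counts of over- and underestimates."""
    return {
        "mean_error": round(mean(values), 2),
        "overestimated": sum(1 for v in values if v > 0),
        "underestimated": sum(1 for v in values if v < 0),
    }

summary = {name: summarize(values) for name, values in errors.items()}
for name, stats in summary.items():
    print(name, stats)
```

The mean captures the size of the lean, while the two counts show whether it comes from many small misses or a few large ones.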




Of all the candidates, they love Clinton the most and Sanders the least. Nate Silver and his lieutenant Harry "whiz kid" Enten don't try to hide it. Taken together, the things they have said make you wonder whether they are data journalists or pro-Clinton hacks. Here are some of the memorable ones (paraphrasing):












No, the system isn't rigged against Sanders (which sort of contradicts #7)





But let's take a closer look at their Clinton vs. Sanders forecasts. What follows is an overlay of FiveThirtyEight's forecasts and the actual outcomes.

Each forecast has a probability distribution and a polling average. According to FiveThirtyEight, there is an 80% chance that the actual outcome will fall within the blue-shaded area. A blue dot marks the polling average (also supplied by FiveThirtyEight).

A red line represents the actual result from that primary. A red shaded area indicates that the candidate did better than the forecast; a black shaded area indicates that the candidate underperformed the forecast.

As you can see, FiveThirtyEight has wildly overestimated Clinton's vote share and underestimated Sanders' in some key races.
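
One way to put a number on this is a coverage check: if the 80% intervals are well calibrated, roughly 80% of actual results should land inside them. The Python sketch below shows the calculation only. The intervals in `example_rows` are made-up placeholders, not FiveThirtyEight's published numbers, and the `coverage` function is my own naming. Swap in the raw interval data to run the real check.

```python
def coverage(rows):
    """Fraction of rows whose actual result lies inside [low, high]."""
    inside = sum(1 for low, high, actual in rows if low <= actual <= high)
    return inside / len(rows)

# (interval low, interval high, actual result)
# PLACEHOLDER values for illustration only, not real forecast intervals
example_rows = [
    (40.0, 50.0, 45.0),  # inside
    (55.0, 65.0, 52.0),  # below the interval
    (30.0, 40.0, 39.0),  # inside
    (45.0, 55.0, 58.0),  # above the interval
    (60.0, 70.0, 61.0),  # inside
]

observed = coverage(example_rows)
print(f"{observed:.0%} of results fell inside the interval (nominal: 80%)")
```

Coverage well below 80% across many races would mean the intervals are too narrow or off-center. With only a few dozen primaries, some gap from 80% is expected by chance alone.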

What does all this mean? FiveThirtyEight is not being objective. They are letting their personal biases affect their work. Worse, they are using their platform to go after Clinton's opponents. That is not journalism, that is just hackery.

Note: I've also uploaded the raw data for these probability distributions here.