The folks over at Vox asked FHQ to put together a delegate projection model of the Republican race similar to the one we created with The New Yorker in 2012. Below are the details:
The model
With more than 20 contests in the books in the Republican presidential nomination process in 2016, there is enough data to begin looking at patterns from the results. While there are a number of demographic factors that highly correlate with Donald Trump’s share of the vote in those contests -- percentage with a college education, percentage of non-white population, religiosity (in various forms), among others -- the catch is that those same variables do not all adequately explain the variation in the other three candidates’ shares of the vote across states. 
Ted Cruz’s vote share across the contests to this point, for instance, increases more in caucus states and those in the interior West. The Texas senator is helped more in those states than his opponents -- Trump most especially -- are hurt. Kasich does best in highly educated states with lower religiosity and Rubio is all over the map. The vote share for the junior Florida senator is most highly (and negatively) correlated with Kasich’s share of the vote. As Kasich’s vote share has increased at the state level, Rubio’s has decreased.
Given that variety, a kitchen sink approach to explaining all four candidates’ vote percentages across states and projecting across the remaining states was not fruitful. The best performing model was one in which the candidates’ shares of the vote in the contests that have been completed were regressed on the percentage of non-white population, an interior West dummy variable and Cook’s PVI. Data for that parsimonious model were available not only at the state level but at the congressional district level as well.
Those regressions, in turn, allow for a projection of the candidates’ vote shares in the upcoming contests both statewide and, where necessary, at the congressional district level. A similar process was used for a projection model the New Yorker published later in the 2012 Republican race. Once predicted, those vote shares in upcoming states allow for a projection of the delegate count through the conclusion of the contests on June 7.
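To make the regress-then-project step concrete, here is a minimal sketch. It assumes ordinary least squares, which the post does not actually specify, fitted separately for each candidate. The function names and every number in the example are hypothetical placeholders, not real results or FHQ data.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.

    No guard against a singular matrix; this is an illustration only.
    """
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(rows, shares):
    """Least-squares fit of one candidate's vote share on the predictors.

    rows:   one (pct_nonwhite, interior_west_dummy, cook_pvi) tuple per contest
    shares: that candidate's vote share in each completed contest
    Returns [intercept, b_nonwhite, b_west, b_pvi] from the normal equations.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, shares)) for i in range(k)]
    return solve_linear(XtX, Xty)


def predict(coef, row):
    """Projected vote share for an upcoming state or congressional district."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Made-up inputs for illustration only (NOT real contest data).
completed = [
    (10.0, 0, 5.0), (35.0, 0, -3.0), (20.0, 1, 12.0),
    (45.0, 0, 8.0), (15.0, 1, 20.0), (30.0, 0, 0.0),
]
true_coef = [30.0, 0.2, -8.0, 0.1]  # hypothetical coefficients
observed = [predict(true_coef, row) for row in completed]  # noise-free shares

coef = fit_ols(completed, observed)
projected = predict(coef, (25.0, 1, 10.0))  # a hypothetical upcoming contest
```

Because each candidate gets a separate fit in this sketch, the projected shares in a given state need not sum to 100 percent. A real pipeline would have to decide whether and how to renormalize them.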
The projection
The resulting projected delegate allocation is revealing. Building on the FHQ Delegate Count as a baseline, Trump crests above the 1237 delegates necessary to clinch the Republican nomination, but would not do so until June 7. California would push him over the top. The model also predicts Trump victories in both Florida and Ohio on March 15. Those 165 delegates are a significant chunk of Trump’s projected total.
However, it should be noted that the model does not include polling or any measure of how well candidates are or are perceived to be doing in various states, including their home states (especially in Kasich’s case). But even without Ohio’s 66 delegates, Trump is above 1200 delegates and would still have up to 250 other delegates to woo in order to close the small gap.
Cruz does well in the interior West as the discount placed on Trump and Cruz’s overperformance in that area through March 8 pushes him past Trump in those states. That is handy in a number of truly winner-take-all states like Montana, Nebraska and South Dakota, but those are small delegation states that are more than overtaken by Trump wins in winner-take-all Arizona, Delaware and New Jersey (on top of Florida and Ohio). Cruz does well, but only musters about half of Trump’s delegate total in this projection. What the model demonstrates is that Trump would use March 15 to stretch his current delegate lead and then break away in the late April contests in the mid-Atlantic and northeast.
The delegate allocation rules really hamper Rubio and Kasich if they continue. Third place (or lower) was no place to be on Super Tuesday on March 1 due to the allocation rules and that is even more true after March 15. The combination of winner-take-all, winner-take-most and proportional allocations with thresholds significantly decreases the odds that Rubio and Kasich would be able to acquire many more delegates.
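A rough sketch shows why third place and lower pays so poorly under these rules. The two functions below are simplified stand-ins for winner-take-all and thresholded proportional allocation. Actual state rules vary in rounding, backdoor winner-take-all triggers and district-level splits, so the largest-remainder rounding, the fallback when no one qualifies, the candidate labels and all the numbers are assumptions for illustration.

```python
def winner_take_all(shares, delegates):
    """Give every delegate to the plurality winner."""
    winner = max(shares, key=shares.get)
    return {name: (delegates if name == winner else 0) for name in shares}


def proportional_with_threshold(shares, delegates, threshold):
    """Split delegates among candidates at or above the qualifying threshold.

    Uses largest-remainder rounding (an assumption; state rules differ).
    """
    qualified = {n: s for n, s in shares.items() if s >= threshold}
    if not qualified:
        # Assumption: if nobody clears the threshold, everyone qualifies.
        qualified = dict(shares)
    total = sum(qualified.values())
    quotas = {n: delegates * s / total for n, s in qualified.items()}
    alloc = {n: int(q) for n, q in quotas.items()}
    leftover = delegates - sum(alloc.values())
    # Hand remaining delegates to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return {name: alloc.get(name, 0) for name in shares}


# Hypothetical four-way result in a 50-delegate state with a 20% threshold.
example = {"A": 40.0, "B": 30.0, "C": 18.0, "D": 12.0}
wta = winner_take_all(example, 50)
prop = proportional_with_threshold(example, 50, 20.0)
```

In this made-up example the two candidates below the threshold receive nothing under either rule, even though together they hold 30 percent of the vote.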
Given what is known from the results so far, the models project a delegate count that looks like this:
Trump: 1279
Cruz: 601
Rubio: 235
Kasich: 78
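The margins implied by these totals are easy to check. The short calculation below uses only figures stated in this post (the projected counts, the 1237 needed to clinch and Ohio’s 66 delegates); the variable names are just for illustration.

```python
NEEDED_TO_CLINCH = 1237
OHIO_DELEGATES = 66

projection = {"Trump": 1279, "Cruz": 601, "Rubio": 235, "Kasich": 78}

# How far above the clinching number the projection puts Trump.
margin = projection["Trump"] - NEEDED_TO_CLINCH

# Trump's total if Ohio's winner-take-all delegates go elsewhere.
trump_without_ohio = projection["Trump"] - OHIO_DELEGATES

# The gap he would then need to close among other delegates.
gap_without_ohio = NEEDED_TO_CLINCH - trump_without_ohio

# Cruz's projected total as a fraction of Trump's.
cruz_to_trump = projection["Cruz"] / projection["Trump"]
```

That works out to a 42-delegate cushion with Ohio, a 24-delegate shortfall without it, and a Cruz total a little under half of Trump’s.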
The weaknesses
What is not accounted for in this model are contingencies triggered by any of the candidates currently in the race dropping out. That ideally would require more data, but could also be overcome with a redistribution of the current vote shares. However, the latter course is potentially prone to errors. When in doubt, gather or wait for more data.
This series of models could also be vulnerable to criticism concerning how robustly the included variables explain the variation in the candidates’ vote shares in the completed contests. With only three independent variables, the door is opened to omitted variable bias.
Finally, close observation will reveal that in most cases, statewide winners are also sweeping all of the congressional districts in the winner-take-most/winner-take-all by congressional district states. This is a function of the small standard deviation in the percentage of non-white population variable being used to predict future vote shares. The needle moves, but not much on the congressional district level. Yet, the interesting thing is that both Trump and Cruz do better in states and districts with a higher percentage of non-white population. Whether it increases or decreases, Trump and Cruz are affected in the same direction. When Trump loses share as the non-white population drops, Rubio is gaining. Cruz is most consistently in second place in those districts behind Trump, but Rubio overtakes Cruz the lower the non-white population is. That may or may not have some bearing on future delegate allocations if Rubio drops out. But the bottom line here is that a multi-candidate race seems to be hurting Cruz or the candidate in second place. It would be more troubling if there were an inverse relationship between Trump and Cruz with respect to the percentage of non-white population variable.