We Built A Polling Model … For The Baseball Hall Of Fame


We analyze elections here at FiveThirtyEight, and we’re not about to miss one — even if it takes place in the world of baseball. On Tuesday, the Hall of Fame’s full Class of 2019 will be announced, and Mariano Rivera, Edgar Martinez and more than 30 other former players will find out if they will join a list of immortals in baseball’s most exclusive club.

Hopefuls earn a spot in the Hall if they’re named on at least 75 percent of the ballots cast by voting members of the Baseball Writers’ Association of America, who typically number around 400. (If not, they roll over to next year’s ballot, unless they get fewer than 5 percent of the vote or have been on the ballot for 10 years.) We already have some idea of what percentage to expect for each player, thanks to the efforts of Ryan Thibodaux. Thibodaux is the exit pollster of the baseball world: He and his team of “interns” scour the internet for voters who have made their Hall of Fame ballots public before the full results are announced. Thibodaux tallies them all up on his online Baseball Hall of Fame Vote Tracker, like the first few precincts to report on election night, to give us partial election results weeks in advance.
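The ballot rules in the paragraph above can be written down as a small decision function. This is only a sketch: the function name, the outcome labels and the order in which the checks run are assumptions made for illustration, not anything published by the Hall or the BBWAA.

```python
def ballot_outcome(vote_share_pct: float, years_on_ballot: int) -> str:
    """Classify a candidate's fate under the rules described in the article.

    vote_share_pct  -- share of all ballots naming the player, in percent
    years_on_ballot -- how many times the player has appeared, including this year
    """
    if vote_share_pct >= 75.0:
        return "elected"
    if vote_share_pct < 5.0:
        return "dropped (under 5 percent)"
    if years_on_ballot >= 10:
        return "dropped (10 years on ballot)"
    return "returns next year"


# Edgar Martinez is in his 10th and final year; 81.3 is his projected share.
print(ballot_outcome(81.3, 10))
# A 73.0 percent finish in a sixth year on the ballot falls just short.
print(ballot_outcome(73.0, 6))
```

Checking the 75 percent line first matters for a 10th-year candidate: clearing the threshold means election, and only a shortfall ends his eligibility.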

The problem is, as experienced election-watchers know, the early returns may not be representative of the whole electorate. The public vote turns out to be consistently higher than the final vote for some players and lower for others. In pollster lingo, this is sampling bias: Voters who publish their ballots in advance are more likely to be on Twitter (the main medium for sharing ballots) and more likely to vote a certain way. As the table below demonstrates, players who are favored by the sabermetric crowd (for example, Tim Raines and Mike Mussina) or tied to performance-enhancing drugs (Barry Bonds, Roger Clemens) historically are overrepresented on public ballots, while players perceived as “clean” (Fred McGriff) or whose cases rely on traditional statistics like saves or Gold Glove Awards (Omar Vizquel, Lee Smith) do better on private ballots.

Public ballots overestimate some Hall of Fame candidates

The difference between a player’s vote share on private ballots and his vote share on public ballots at the time of the Hall of Fame announcement, for select candidates

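| Player | 2016 | 2017 | 2018 | Average |
|---|---|---|---|---|
| Lee Smith | +10.9 | +8.3 | | +9.6 |
| Omar Vizquel | | | +8.1 | +8.1 |
| Fred McGriff | +3.8 | +13.0 | +7.2 | +8.0 |
| Trevor Hoffman* | +7.2 | +3.0 | +3.2 | +4.5 |
| Vladimir Guerrero* | | -1.3 | -4.5 | -2.9 |
| Larry Walker | +2.4 | -3.1 | -10.5 | -3.7 |
| Jeff Bagwell* | -11.8 | -3.1 | | -7.4 |
| Tim Raines* | -10.7 | -6.4 | | -8.5 |
| Edgar Martinez | -7.7 | -16.6 | -16.8 | -13.7 |
| Mike Mussina | -14.0 | -16.5 | -15.8 | -15.4 |
| Roger Clemens | -10.5 | -20.6 | -16.9 | -16.0 |
| Curt Schilling | -15.2 | -13.7 | -22.0 | -17.0 |
| Barry Bonds | -13.2 | -23.8 | -19.2 | -18.8 |

All values are in percentage points. Blank cells are years in which the player was not on the ballot.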

Positive numbers mean the player did better on private ballots than public ballots; negative numbers mean the player did better on public ballots than private ballots.

*Eventually elected to the Hall of Fame by the Baseball Writers’ Association of America.

Source: Ryan Thibodaux’s Baseball Hall of Fame Vote Tracker
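To make the table's definition concrete, here is one way a public-private differential could be backed out once the final results are known. It is a sketch under assumptions: the function names are invented, and the ballot counts in the example are hypothetical round numbers, not figures from Thibodaux's tracker.

```python
def private_share_pct(final_votes: int, total_ballots: int,
                      public_votes: int, public_ballots: int) -> float:
    """Vote share among ballots that were NOT public at announcement time."""
    private_ballots = total_ballots - public_ballots
    if private_ballots <= 0:
        raise ValueError("no private ballots to measure")
    # Votes not accounted for by public ballots must have come from private ones.
    return 100.0 * (final_votes - public_votes) / private_ballots


def differential(final_votes: int, total_ballots: int,
                 public_votes: int, public_ballots: int) -> float:
    """Private share minus public share, in percentage points."""
    public_pct = 100.0 * public_votes / public_ballots
    return private_share_pct(final_votes, total_ballots,
                             public_votes, public_ballots) - public_pct


# Hypothetical candidate: 150 of 250 public ballots (60 percent),
# 210 of 400 ballots overall, so 60 of 150 private ballots (40 percent).
print(differential(210, 400, 150, 250))  # -20.0
```

A negative result means the player ran ahead on public ballots, matching the sign convention in the table's note.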

For the past several years, I’ve used these public-private differentials to project the final results of the Hall balloting. For every returning player, I calculate an “adjustment factor” based on whether he has historically over- or underperformed his “polls.” Then I add the adjustment factor (positive or negative) to the player’s public vote share in Thibodaux’s tracker to get the player’s estimated vote share among private voters. Finally, I combine the two vote shares proportionally, weighting each by its share of the estimated total vote (public ballots revealed so far versus ballots still private), to arrive at the player’s final projection. (There’s a different adjustment for first-time candidates; more on that later.)
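The arithmetic described above is a weighted average, and a short sketch makes the weights explicit. The function name is invented for illustration, and clamping the private estimate to the 0-to-100 range is an assumption added here, not a step the article spells out. The inputs are Mike Mussina's and Roger Clemens's figures from the projections as of Jan. 16 (182 public ballots out of an estimated 412).

```python
def project_final_pct(public_pct: float, adjustment: float,
                      public_ballots: int, total_ballots: int) -> float:
    """Blend the public share with an adjusted estimate for private ballots."""
    # Estimated share among private voters; clamped because a vote share
    # cannot leave the 0-100 range (an assumption of this sketch).
    private_pct = min(100.0, max(0.0, public_pct + adjustment))
    public_weight = public_ballots / total_ballots
    return public_weight * public_pct + (1.0 - public_weight) * private_pct


# Mussina: 81.9 percent public, adjustment of -16.0 points.
print(round(project_final_pct(81.9, -16.0, 182, 412), 1))  # 73.0
# Clemens: 73.6 percent public, adjustment of -17.8 points.
print(round(project_final_pct(73.6, -17.8, 182, 412), 1))  # 63.7
```

Because the public weight grows as more ballots are revealed, the adjustment factor moves the projection less and less as announcement day approaches.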

Below are this year’s projections as of Jan. 16, with 182 ballots revealed out of an estimated 412 expected to be cast. (Dozens more are likely to be made public before the announcement, so for real-time updates to the forecast, follow me on Twitter.)

Rivera, Halladay and Martinez will likely make the Hall

Forecasted results for the 2019 Baseball Hall of Fame election, combining 182 already public ballots and an estimated 230 remaining ballots whose projected vote shares reflect the historical voting bias of private ballots

| Player | % of public ballots | Adjustment factor* (points) | Estimated % of private ballots | Projected final vote (%) |
|---|---|---|---|---|
| **Players likely to make the Hall** | | | | |
| Mariano Rivera | 100.0 | 0.0 | 100.0 | 100.0 |
| Roy Halladay | 94.0 | -2.4 | 91.6 | 92.6 |
| Edgar Martinez | 90.7 | -16.7 | 73.9 | 81.3 |
| **Players projected to fall short** | | | | |
| Mike Mussina | 81.9 | -16.0 | 65.9 | 73.0 |
| Roger Clemens | 73.6 | -17.8 | 55.8 | 63.7 |
| Curt Schilling | 74.2 | -20.0 | 54.2 | 63.0 |
| Larry Walker | 67.0 | -8.6 | 58.4 | 62.2 |
| Barry Bonds | 73.1 | -20.4 | 52.7 | 61.7 |
| Omar Vizquel | 36.8 | +8.1 | 44.9 | 41.3 |
| Fred McGriff | 36.3 | +8.6 | 44.9 | 41.1 |
| Manny Ramirez | 26.9 | -1.1 | 25.9 | 26.3 |
| Todd Helton | 19.2 | +6.0 | 25.3 | 22.6 |
| Scott Rolen | 20.9 | -3.7 | 17.1 | 18.8 |
| Jeff Kent | 15.4 | +2.1 | 17.5 | 16.6 |
| Billy Wagner | 15.9 | +0.7 | 16.7 | 16.3 |
| Gary Sheffield | 13.2 | +1.3 | 14.5 | 13.9 |
| Andruw Jones | 8.2 | +5.0 | 13.3 | 11.0 |
| Sammy Sosa | 13.2 | -5.3 | 7.9 | 10.2 |
| Andy Pettitte | 6.6 | +2.0 | 8.6 | 7.7 |
| **Players projected to be eliminated from future ballots** | | | | |
| Michael Young | 1.6 | +0.1 | 1.7 | 1.7 |
| Lance Berkman | 1.1 | +0.3 | 1.4 | 1.3 |
| Roy Oswalt | 1.1 | +0.2 | 1.3 | 1.2 |
| Miguel Tejada | 1.1 | 0.0 | 1.1 | 1.1 |

Excludes candidates who have received zero votes on public ballots.

*The adjustment factor is derived from historical differences between public and private ballots.

Source: Ryan Thibodaux’s Baseball Hall of Fame Vote Tracker

I’m projecting that three players will surpass the 75 percent threshold required for election. Let’s start with the player who has waited the longest to get here: Seattle Mariners hitting machine Edgar Martinez, who is eligible for election for the 10th and final time. Martinez is currently pulling 90.7 percent of the public vote, but (perhaps because of anti-designated-hitter bias) he is one of the candidates who always see precipitous drops from the exit polls to final results. Still, I’m expecting him to win 73.9 percent of the votes that have yet to be revealed, giving him an overall total of 81.3 percent.

With a projected 92.6 percent of the vote, fearsome starting pitcher Roy Halladay also looks assured of election. This is the first time Halladay — who tragically died in a plane crash in 2017 — has appeared on the ballot. That means he has no vote history to go off, so my model treats him (and other first-time candidates) a little differently. First, I calculate which returning candidate’s support is most highly correlated with the new candidate’s support. For Halladay, this was Mike Mussina, which makes sense — both are starting pitchers who fall short of the traditional Hall of Fame standard (300 wins) but have strong peripheral stats. Then I estimate the new candidate’s support among private ballots assuming that correlation will hold steady. For example, so far in public ballots, Halladay has won 96.6 percent of voters who also voted for Mussina but only 81.8 percent of voters who didn’t vote for Mussina. I applied those same percentages to the projected number of private ballots with and without Mussina.
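The second step of the first-time-candidate method, applying the public "with Mussina" and "without Mussina" rates to the projected private ballots, can be sketched as follows. The function name is made up for this example, and the first step (picking the most highly correlated returning candidate) is taken as given rather than computed. The 65.9 percent figure is Mussina's estimated share of private ballots from the projections table.

```python
def first_timer_private_pct(pct_among_yes: float, pct_among_no: float,
                            comparison_private_pct: float) -> float:
    """Estimate a new candidate's private share from a comparison candidate.

    pct_among_yes          -- new candidate's share on public ballots that
                              included the comparison candidate
    pct_among_no           -- his share on public ballots that did not
    comparison_private_pct -- comparison candidate's projected private share
    """
    yes_fraction = comparison_private_pct / 100.0
    return pct_among_yes * yes_fraction + pct_among_no * (1.0 - yes_fraction)


# Halladay: 96.6 percent among Mussina voters, 81.8 percent among the rest.
halladay_private = first_timer_private_pct(96.6, 81.8, 65.9)
print(round(halladay_private, 1))          # 91.6
# Implied adjustment relative to his 94.0 percent public share.
print(round(halladay_private - 94.0, 1))   # -2.4
```

Both results line up with the 91.6 and -2.4 shown for Halladay in the projections table, which suggests the sketch captures the gist of the calculation.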

Mariano Rivera, indisputably the best closer of all time, rounds out this group. Currently, I’m projecting him to get a round 100 percent of the vote, which would make him the first unanimous selection in Hall of Fame history. However, I freely admit that candidacies like Rivera’s (he is another first-time candidate) are a blind spot for the model. With no public “no” votes on Rivera, he correlates equally poorly with every returning candidate. Therefore, our best option is to look at how other closers, like Trevor Hoffman and Lee Smith, have fared. And if there is one ironclad rule for the public-private differentials, it’s that closers do even better with private voters than they do with public ones. You can’t do better than 100 percent, obviously, so Rivera’s adjustment factor is simply zero.

The most interesting finding belongs to a candidate who isn’t currently projected to make the Hall: Mussina. Although his name has been checked on 81.9 percent of ballots revealed so far, he historically has performed about 16 points worse among private voters. At the current ratio of public-to-private ballots, that computes to a final vote share of 73.0 percent — just 2 points shy of election. But if there’s one thing you should have learned from reading FiveThirtyEight, it’s that polling errors of 2 points happen all the time, so Mussina could definitely still pull this out. Even if Mussina doesn’t make it this year, though, he’s a cinch to get in eventually, as he has four years of eligibility remaining.
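One way to see how close Mussina is: solve the same weighted average for the private-ballot share he would need to land exactly on 75 percent. This is a back-of-the-envelope sketch with an invented function name. It treats vote shares as continuous rather than whole ballots and assumes the split stays at 182 public ballots out of 412.

```python
def required_private_pct(public_pct: float, public_ballots: int,
                         total_ballots: int,
                         threshold_pct: float = 75.0) -> float:
    """Private-ballot share needed for the overall share to hit the threshold."""
    private_ballots = total_ballots - public_ballots
    if private_ballots <= 0:
        raise ValueError("no private ballots remain")
    votes_needed = threshold_pct / 100.0 * total_ballots
    votes_banked = public_pct / 100.0 * public_ballots
    return 100.0 * (votes_needed - votes_banked) / private_ballots


needed = required_private_pct(81.9, 182, 412)
print(round(needed, 1))         # 69.5
# Gap versus his estimated 65.9 percent of private ballots.
print(round(needed - 65.9, 1))  # 3.6
```

In other words, a shortfall of 2 points overall corresponds to Mussina beating his private-ballot estimate by roughly 3.6 points, because private ballots make up only part of the total.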

No other candidates are particularly close to being elected, according to my projections, although that doesn’t mean their forecasted vote shares aren’t interesting. If you just take Thibodaux’s tracker as gospel, you might think Barry Bonds (73.1 percent of public ballots) and Roger Clemens (73.6 percent) are knocking on the door. But the two superstars, both rumored to have been heavy steroid users, are virtually guaranteed to bomb with private voters. Their projected percentages in the low 60s are only incremental improvements from their 2018 vote shares, creating genuine suspense over whether they will be elected before their window of opportunity expires in 2022. On the other hand, Larry Walker is forecasted to finish at 62.2 percent, which would be a 28.1-point improvement over last year — a record. Amazingly, that would mean the ex-Rockies outfielder, whose candidacy seemed stuck in purgatory just last year, would have a real shot at election in 2020, his final year on the ballot.

Finally, the exit polls imply that a few candidates — including Andruw Jones and Andy Pettitte — are in danger of falling off the ballot entirely. But my model thinks they’ll live to be voted upon another day. Last year, Jones won private ballots at twice the rate that he won public ballots, and first-time candidate Pettitte’s support is negatively correlated with Martinez’s support, implying that he’ll gain on private ballots as well. The only candidates I’m expecting to drop off are those whose support in Thibodaux’s tracker is so limited that it would take a miracle among private ballots to save them. Sorry, Michael Young, Lance Berkman, Roy Oswalt and Miguel Tejada.