FiveThirtyEight Picks the N.C.A.A. Bracket

The long march of the Republican nomination hasn’t been very friendly to those political writers who were hoping to enjoy March Madness. But I did dust off the FiveThirtyEight N.C.A.A. forecasting model, which is back with a fresh set of projections for this year’s men’s field. You can find those forecasts here.

The FiveThirtyEight tournament forecasts are probabilistic: they arrive at an objective estimate of the odds of each team advancing to each stage of the tournament.
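One way to turn game-by-game win probabilities into the odds of advancing to each stage is a round-by-round calculation over the bracket. The sketch below is an illustration, not the FiveThirtyEight program itself. It assumes that a game's margin is normally distributed around the difference in power ratings, with a hypothetical standard deviation of 11 points. The names win_prob, advancement_odds and MARGIN_SD are invented for this example, and travel, injuries and the First Four are ignored.

```python
from statistics import NormalDist

# Hypothetical assumption: a game's final margin is normally distributed around
# the difference in the two teams' power ratings (in points), with SD 11 points.
MARGIN_SD = 11.0


def win_prob(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B, i.e. P(margin > 0)."""
    return NormalDist().cdf((rating_a - rating_b) / MARGIN_SD)


def advancement_odds(ratings: list[float]) -> list[list[float]]:
    """odds[r][i] is the probability that team i wins its first r + 1 games.

    `ratings` lists the teams in bracket order; its length must be a power of two.
    """
    n = len(ratings)
    if n < 2 or n & (n - 1):
        raise ValueError("bracket size must be a power of two")
    alive = [1.0] * n  # probability each team has survived every round so far
    odds = []
    block = 1  # size of the sub-bracket each surviving team has come out of
    while block < n:
        nxt = [0.0] * n
        for i in range(n):
            # The opponent comes out of the neighboring sub-bracket of equal size.
            start = ((i // block) ^ 1) * block
            beat = sum(alive[j] * win_prob(ratings[i], ratings[j])
                       for j in range(start, start + block))
            nxt[i] = alive[i] * beat
        alive = nxt
        odds.append(alive[:])
        block *= 2
    return odds


ratings = [20.0, 5.0, 12.0, 10.0, 3.0, 8.0, 15.0, 1.0]  # made-up power ratings
for r, round_odds in enumerate(advancement_odds(ratings), start=1):
    print(f"win {r} game(s):", [round(p, 3) for p in round_odds])
```

Because the two halves of any sub-bracket are independent, a team's chance of winning a round is its chance of surviving so far times the sum, over possible opponents, of that opponent's survival chance and the head-to-head win probability.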

It is up to you to figure out how to use that information to fill out your bracket. You should not necessarily always go with the FiveThirtyEight favorite. Many bracket competitions give bonus points for picking upsets or lower seeds; by having some sense of what the odds are in each game, you can assess whether the reward is worth the risk.
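To make the risk-versus-reward tradeoff concrete, here is a sketch for a single game under a hypothetical scoring rule: a correct pick earns base points, and a correct upset pick earns a flat bonus on top. Real pools score in many different ways, and this sketch ignores the pool-level considerations in the next paragraph. The function names are invented for illustration.

```python
def expected_points(win_prob: float, base: float, bonus: float = 0.0) -> float:
    """Expected score from a pick that earns base + bonus points if it wins."""
    return win_prob * (base + bonus)


def better_pick(p_underdog: float, base: float, upset_bonus: float) -> str:
    """Return the pick with the higher expected score in a single game."""
    underdog = expected_points(p_underdog, base, upset_bonus)
    favorite = expected_points(1.0 - p_underdog, base)
    return "underdog" if underdog > favorite else "favorite"


print(better_pick(0.45, base=10, upset_bonus=3))  # underdog: 5.85 vs. 5.50
print(better_pick(0.45, base=10, upset_bonus=0))  # favorite: 4.50 vs. 5.50
```

With no bonus, the favorite always has the higher expected score; a bonus flips the decision once p × (base + bonus) exceeds (1 − p) × base.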

Even in bracket competitions that do not reward upset picks outright, the game theory gives you some incentive to pick teams that the other people in your pool won’t pick. It may be strategically optimal to pick a team that has a 45 percent chance of pulling off an upset, for instance, particularly if it is not a trendy upset pick.

And of course, no system has supernatural abilities to predict tournament games. So you shouldn’t hesitate to trust your gut, or to look at what other objective systems are saying.

But the FiveThirtyEight forecasts should strike a nice balance, considering a number of objective factors without going overboard. The methodology for the picks is exactly the same as last year, with one very minor exception that I’ll explain in a moment.

You can read in detail about how the forecasts are made here; I’ll stick to a relatively brief summary below.

The main component of the forecast is a composite of six rating systems, which are converted to the same scale and averaged together. These include four sets of computer power rankings and two that involve human judgment. In some ways, the concept is similar to the one we use for our polling averages, which are likewise aggregates of different polls.
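The article does not say how the six systems are put on the same scale, so the sketch below swaps in one common choice: convert each system to z-scores (mean 0, standard deviation 1) across the teams it rates, then take an equal-weight average. Equal weighting, the negation of rank-based lists, and every input number are assumptions made for illustration; treating a rank as if it were a point rating is a simplification.

```python
from statistics import mean, pstdev


def standardize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one system's scores to mean 0 and standard deviation 1."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {team: (s - mu) / sd for team, s in scores.items()}


def composite(systems: list[dict[str, float]]) -> dict[str, float]:
    """Equal-weight average of standardized scores, over teams every system rates."""
    common = set.intersection(*(set(s) for s in systems))
    z = [standardize({t: s[t] for t in common}) for s in systems]
    return {t: mean(zs[t] for zs in z) for t in common}


# Made-up inputs: two point-style computer ratings, plus a rank-style human
# list negated so that a higher number is better on every scale.
computer_1 = {"A": 92.1, "B": 88.4, "C": 85.0}
computer_2 = {"A": 0.97, "B": 0.95, "C": 0.90}
committee_rank = {"A": -1, "B": -3, "C": -2}
print(composite([computer_1, computer_2, committee_rank]))
```

Standardizing first keeps a system that happens to use a wide numeric range (like points) from swamping one that uses a narrow range (like winning percentages).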

The computer systems that the forecasts use are as follows: (i) the “predictor” ratings from Jeff Sagarin; (ii) Ken Pomeroy’s Pythagorean ratings; (iii) Joel Sokol’s LRMC ratings; and (iv) Sonny Moore’s power ratings. These computer ratings are generally highly correlated with one another, but they do have some slight differences, and all have performed pretty well in the past.

The two sets of human ratings are the Associated Press and USA Today preseason polls, and the 68-team “S-curve” as designated by the N.C.A.A. tournament committee. Both of these require a little bit more explanation.

Why use preseason ratings in March? Because they serve as a proxy for a team’s true talent level. After 30 or 35 games, we have a pretty good idea of how strong a team really is, but not a perfect one. Teams that have exceeded preseason expectations in the regular season have tended to underachieve in the tournament, and vice versa.

We’ve also found that, although the N.C.A.A. has some flaws in the way it selects and seeds teams, the seed lines nevertheless have some predictive power — even when other factors like a team’s computer rankings are accounted for.

The new wrinkle this year is that, for the first time, the N.C.A.A. publicly disclosed its full “S-curve” — exactly how it rated the teams from 1 through 68. The program now uses these rankings instead of the seed lines. Thus, for example, it doesn’t see much difference between Georgetown (the lowest-ranked No. 3 seed) and Michigan (the highest-ranked No. 4). Occasionally, in fact, a team may be bumped up or down from its natural seed line because of bracketing principles — if, for instance, it would be forced to play a team from the same conference unless it were moved. In these cases, the FiveThirtyEight model still uses the team’s “S-curve” ranking, since this is intended to represent the N.C.A.A.’s estimate of their true strength.

Although the four computer rankings and two human rankings represent the core of the system, there are three other adjustments that the system makes.

First, the system considers the geographical location of each game. Historically, the farther a team has had to travel to play, the worse it has done relative to its power rating. In extreme cases — say, one team playing 50 miles from its campus, the other coming from across the country — the local team is playing the equivalent of a home game.
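The article does not spell out the shape of the travel penalty, so the sketch below simply assumes one: great-circle (haversine) distance from campus to venue, fed into a hypothetical penalty that grows with the logarithm of distance. PENALTY_SCALE_PTS and PENALTY_DISTANCE_MILES are invented constants chosen only for illustration, and the coordinates are approximate. The example uses the Gonzaga vs. West Virginia game in Pittsburgh that comes up later in this piece.

```python
import math

EARTH_RADIUS_MILES = 3958.8

# Hypothetical penalty curve: the cost of travel grows with distance but
# flattens out, so the first few hundred miles matter most.
PENALTY_SCALE_PTS = 1.0
PENALTY_DISTANCE_MILES = 50.0


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def travel_penalty(miles):
    """Hypothetical points subtracted from a team playing `miles` from campus."""
    return PENALTY_SCALE_PTS * math.log1p(miles / PENALTY_DISTANCE_MILES)


def travel_edge(campus_a, campus_b, venue):
    """Extra points of margin team A gets from geography (positive favors A)."""
    return (travel_penalty(miles_between(*campus_b, *venue))
            - travel_penalty(miles_between(*campus_a, *venue)))


# Approximate coordinates (latitude, longitude).
MORGANTOWN = (39.63, -79.96)
SPOKANE = (47.66, -117.43)
PITTSBURGH = (40.44, -80.00)
print(round(travel_edge(MORGANTOWN, SPOKANE, PITTSBURGH), 2))
```

A logarithm makes the difference between 50 and 500 miles count for more than the difference between 1,500 and 2,000 miles; whether the real model behaves this way is an assumption.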

Second, the system considers the effect of injuries and suspensions. This is done based on current injury reports, with the value of each player extrapolated out by using the win shares statistic at sports-reference.com.

Usually, the injury adjustment hurts a team. For instance, Michigan State lost freshman star Branden Dawson to an ACL tear earlier this month. The estimate based on win shares is that this will make Michigan State about 2 points worse than its power rating going forward, since the record they accumulated came almost entirely from games in which he played. This, it turns out, is the most consequential injury so far this year as far as the program is concerned. An injury that occurred very early in the season, like the one to Notre Dame’s Tim Abromaitis, will not have much effect, since Notre Dame’s power rating already reflects what the team was able to achieve without him on the court.

Teams can also receive credit for having a player return from injury if he had missed significant time early in the season but is now ready to go. One example is Vanderbilt center Festus Ezeli, who missed 10 games early in the year during which Vanderbilt played disappointingly. He has been basically healthy since then, however, so those early games may underestimate Vanderbilt’s talent and artificially lower their power rating. The injury adjustment is able to correct for this.
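A minimal sketch of the logic in the last three paragraphs, under clearly stated assumptions: each player has a hypothetical value in points per game (the article derives player value from win shares, but that conversion is not shown here), and the adjustment is that value times the gap between how available he is going forward and how much of the current power rating was earned with him on the court. The function name and all numbers are made up for illustration and are not the program’s actual inputs.

```python
def injury_adjustment(value_pts: float, rated_games: int,
                      games_with_player: int, available_now: bool) -> float:
    """Points to add to a team's power rating for one player's status.

    value_pts: hypothetical points-per-game value of the player (assumed to be
        derived from win shares; that conversion is not shown).
    rated_games / games_with_player: how much of the current power rating
        was earned with the player on the court.
    """
    share_in_rating = games_with_player / rated_games
    share_going_forward = 1.0 if available_now else 0.0
    return value_pts * (share_going_forward - share_in_rating)


# Illustrative numbers only.
print(injury_adjustment(2.0, 30, 29, available_now=False))  # late-season injury
print(injury_adjustment(2.0, 32, 2, available_now=False))   # early-season injury
print(injury_adjustment(1.5, 32, 22, available_now=True))   # player back from injury
```

The same formula produces all three cases: a large penalty when the rating was built almost entirely with the player, a small one when it was built mostly without him, and a bonus when a player who missed early games is back.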

The final adjustment occurs only once the tournament is already underway: the program adjusts its power ratings based on the results of games played so far. (In fact, it is fairly aggressive about doing so; by the end of last year’s tournament, it would have had you betting on Virginia Commonwealth quite often against the point spread, even though it originally did not like them very much.)
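The article does not describe the update rule, so the sketch below substitutes a generic Elo-style rule: after each game, both ratings move by a fraction k of the gap between the actual and expected margin. The function name and the learning rate k = 0.25 are hypothetical; a larger k corresponds to the kind of aggressive updating described above.

```python
def update_ratings(rating_a: float, rating_b: float, margin_a: float,
                   k: float = 0.25) -> tuple[float, float]:
    """Move both ratings toward an observed tournament result.

    margin_a: team A's points minus team B's points.
    k: hypothetical learning rate; a larger k reacts more aggressively.
    """
    surprise = margin_a - (rating_a - rating_b)  # actual minus expected margin
    return rating_a + k * surprise, rating_b - k * surprise


# An underdog rated 5 points worse wins by 10: a 15-point surprise.
print(update_ratings(0.0, 5.0, 10.0))  # (3.75, 1.25)
```

A team that keeps beating expectations, as Virginia Commonwealth did, climbs quickly under this kind of rule, which is why an aggressively updated model can end up favoring a team it initially disliked.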

So is it worth going through all this trouble? How did the system perform last year?

You can find a complete rundown of that here.

The system did very well through the first full weekend of the tournament, calling 39 of 52 games correctly, or 75 percent. But then it hit the skids, getting just 5 of 15 right from the Sweet Sixteen onward.

Of course, the final few rounds of last year’s tournament were upset city — the team favored by the Las Vegas betting line also won just 5 of the last 15 games.

It was relative to Vegas, in fact, that the system’s performance was more impressive. It is possible to infer the point spread from the odds the program sets. In games where the FiveThirtyEight point spread differed from the Vegas line by more than 1 point, it went 26-19 against the spread, getting 58 percent of its “bets” right. And in cases where the FiveThirtyEight line and the Vegas line differed by more than 3 points, it did even better, going 10-2 against the spread.
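Inferring a point spread from a win probability requires a model of how margins are distributed. The sketch below assumes, hypothetically, that margins are normally distributed with a standard deviation of 11 points. It then grades “bets” the way the paragraph above describes: only games where the two lines differ by more than a threshold count. The function names and the tuple layout are invented for this example.

```python
from statistics import NormalDist

MARGIN_SD = 11.0  # hypothetical standard deviation of game margins, in points


def implied_spread(p_win: float) -> float:
    """Expected margin (points) for a team with win probability p_win."""
    return MARGIN_SD * NormalDist().inv_cdf(p_win)


def ats_record(games, threshold=1.0):
    """Grade 'bets' where the model line and the Vegas line differ by more than threshold.

    Each game is (model_margin, vegas_margin, actual_margin), all from the same
    team's perspective (positive = that team favored or winning).
    Pushes (actual margin equal to the Vegas margin) are skipped.
    """
    wins = losses = 0
    for model, vegas, actual in games:
        edge = model - vegas
        if abs(edge) <= threshold or actual == vegas:
            continue
        # Bet on the team if the model likes it more than Vegas does, else against it.
        covered = actual > vegas
        if (edge > 0) == covered:
            wins += 1
        else:
            losses += 1
    return wins, losses


print(round(implied_spread(0.75), 1))  # 7.4
print(f"{26 / (26 + 19):.0%}")         # 58%
```

The last line checks the arithmetic in the text: a 26-19 record is 26 correct out of 45, or about 58 percent.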

That was almost certainly a lucky performance, at least in part. Normally, the forecasts the system makes are very similar to the Vegas line. This year, for instance, there is just one Round of 64 game that the FiveThirtyEight forecasts and the Vegas line call differently — we have Xavier as slight favorites to beat Notre Dame, while Vegas has just the opposite. (We also have Iona rather than B.Y.U. favored in their First Four game in Dayton.)

The real value in the system, however, probably comes in the way it looks at the tournament as a whole. Let me conclude with some specific observations about the bracket this year.

With the exception of Kentucky and Duke in the South region, the No. 1 seeds and the No. 2 seeds aren’t very well differentiated this year. In fact, the forecasts see No. 2-seeded Missouri as the slight favorite to emerge from the West region, and No. 2 seed Ohio State as a favorite over Syracuse in the East. The Midwest region, furthermore, is essentially a toss-up between North Carolina and Kansas.

Speaking of those No. 1 seeds, there is a higher-than-normal chance this year of one of them being beaten by a No. 16 seed for the first time in tournament history. Specifically, there is about a 15 percent chance that at least one of them will lose. The reason has less to do with the No. 1 seeds and more to do with the No. 16 seeds, who are pretty decent this year. The expansion of the play-in round last year means that the very weakest No. 16 seeds are weeded out before they play one of the big names, and a lot of the smaller conferences this year had at least one pretty good team.
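The chance that at least one No. 1 seed loses is one minus the chance that all four win. The article gives only the combined figure of about 15 percent, so the sketch below assumes the four games are independent and, purely for illustration, that each No. 1 seed carries the same upset risk. The function name is invented for this example.

```python
import math


def prob_at_least_one_upset(upset_probs):
    """P(at least one upset), assuming the games are independent."""
    return 1.0 - math.prod(1.0 - p for p in upset_probs)


# If all four No. 1 seeds carried the same upset risk p, a 15 percent
# combined chance would imply p = 1 - 0.85 ** (1/4).
p_each = 1 - 0.85 ** 0.25
print(round(p_each, 3))  # 0.04
```

Under that equal-risk assumption, each No. 1 seed would have roughly a 4 percent chance of losing; the real per-game risks need not be equal.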

The worst decision that the tournament committee made? Probably seeding Memphis, whom power ratings have as about the 10th-best team in the country, as a No. 8 seed. This is one reason that the program likes Missouri rather than Michigan State to emerge from the West region; Michigan State would be due to draw Memphis in the Round of 32. Another under-seeded team is Belmont, who should have been about a No. 9 seed based on their power rating but got a No. 14 seed instead. They could give Georgetown, which the system thinks was slightly over-seeded as a No. 3, a good run when they play on Friday.

Memphis is one of just two teams that the FiveThirtyEight forecasts give a better chance of winning the entire tournament than Las Vegas does (those futures bets are typically weighted heavily against the player). The other is Ohio State. One reason for this is that the teams seeded from about No. 3 through No. 6 this year are somewhat weaker than normal. Some of the No. 7 and No. 8 seeds are more interesting, but they almost always have tough draws, since they have to play No. 1 or No. 2 seeds early on.
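Comparing the model’s title odds with a sportsbook’s futures line means converting the quoted odds into an implied probability. The sketch below assumes American-style odds and uses made-up numbers, not the actual lines on Memphis or Ohio State. It does not remove the sportsbook’s margin, which is part of why futures bets are weighted against the player.

```python
def implied_prob(american_odds: int) -> float:
    """Break-even win probability implied by American-style odds.

    The sportsbook's margin is not removed, so across a whole field these
    probabilities sum to more than 1.
    """
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


def odds_against(prob: float) -> float:
    """Convert a probability into 'X to 1 against' odds."""
    return (1 - prob) / prob


# Hypothetical numbers: the model gives a team a 5 percent title chance,
# while a sportsbook lists it at +2500.
model_p = 0.05
book_p = implied_prob(2500)
print(round(odds_against(model_p), 1), "to 1 (model)")  # 19.0 to 1 (model)
print(round(odds_against(book_p), 1), "to 1 (book)")    # 25.0 to 1 (book)
print("model offers shorter odds" if model_p > book_p else "book offers shorter odds")
```

When the model’s odds are shorter than the book’s, as in this made-up example, the model thinks the team is more likely to win than the price implies.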

The team benefiting the most from the geographic adjustment is probably Kentucky. Their tournament would take them through Louisville, Atlanta and then New Orleans, all a comfortable travel distance from Lexington. Kansas also benefits, playing their games in Omaha and St. Louis; one reason the system sees the Midwest region as a toss-up is that Lawrence is closer to St. Louis than Chapel Hill is, which matters if Kansas and North Carolina meet in the regional final.

Conversely, the team suffering the most from geography might be Gonzaga. Although they are the higher seed, they have to travel across the country to play West Virginia in Pittsburgh on Thursday in what is practically the Mountaineers’ backyard.

The longest of long shots is Mississippi Valley State. They have a 1 in 32,299,841 chance of winning the tournament.

And Harvard? They have a pretty tough draw, playing Vanderbilt, a No. 5 seed that the program likes a lot, in their first game, and probably No. 4 Wisconsin in their second (the Badgers are a perpetual favorite of computer ranking systems). Thus, they have only a 6.7 percent chance of making the Sweet Sixteen. If they get there, however, there will be a big reward: their regional will be played in Boston.

Although the First Four games in Dayton are not listed on the probabilities page, the program does make picks in them. It has Western Kentucky, Iona, Lamar and California as very slight favorites.

Happy bracketing. We will be updating the tournament odds periodically as games are played.