How FiveThirtyEight Is Forecasting The 2016 NCAA Tournament

Mar 14, 2016

UPDATE (6:30 p.m. March 18): We’ve updated this post to add information about the excitement index.

Welcome to FiveThirtyEight’s forecasts of the men’s and women’s NCAA basketball tournaments. We’ve been issuing probabilistic March Madness forecasts in some form since 2011, when FiveThirtyEight was just a couple of us writing for The New York Times. While the basics of the system remain the same, we unveil a couple of new wrinkles each year.

Last season, we issued forecasts of the women’s tournament for the first time. Our big change for this year is that we’ll update our forecasts not just at the end of each game but also in real time. If a No. 2 seed is losing to a No. 15 seed, you’ll be able to see how that could affect the rest of the bracket, even before the game is over.

Live win probabilities

Our interactive graphic will include a dashboard that shows the score and time remaining in every game as it’s played, as well as the chance that each team will win that game. These probabilities are derived using logistic regression analysis, which lets us plug the current state of a game into a model to produce the probability that either team wins the game. Specifically, we used play-by-play data from the past five seasons of Division I NCAA basketball to fit a model that incorporates:

Time remaining in the game
Score difference
Pre-game win probabilities
Which team has possession, with a special adjustment if the team is shooting free throws.
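
To make the four inputs above concrete, here is a minimal sketch of a logistic-style win probability. It is not the fitted model: the function name, the way the lead is scaled by time remaining and every coefficient are assumptions chosen for illustration, and the free-throw adjustment is left out. A real model would estimate its coefficients from play-by-play data.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def win_probability(seconds_left, score_diff, pregame_prob, has_possession,
                    total_seconds=2400.0):
    """Illustrative in-game win probability for one team.

    score_diff is that team's lead (negative if trailing).
    All coefficients below are invented for illustration only.
    """
    # Clamp the pregame probability so its log-odds stay finite.
    p = min(max(pregame_prob, 1e-6), 1.0 - 1e-6)
    pregame_logit = math.log(p / (1.0 - p))

    frac_left = max(seconds_left, 0.0) / total_seconds

    # Assumption: a lead matters more as time runs out, so scale it
    # by one over the square root of the fraction of the game remaining.
    lead_term = score_diff / math.sqrt(frac_left + 0.01)

    z = (0.25 * lead_term                          # hypothetical weight on the scaled lead
         + frac_left * pregame_logit               # pregame edge fades as the game goes on
         + 0.1 * (1 if has_possession else -1))    # hypothetical possession edge
    return _sigmoid(z)
```

Because every term flips sign when the inputs are viewed from the other team’s side, the two teams’ probabilities in this sketch always sum to 1.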

These in-game win probabilities won’t account for everything. If a key player has fouled out of a game, for example, his or her team’s win probability is probably a bit lower than we’ve listed. There are also a few places where the model experiences momentary uncertainty: In the handful of seconds between the moment when a player is fouled and the free throws that follow, we use the team’s average free-throw percentage. Still, these probabilities ought to do a reasonably good job of showing which games are competitive and which are in the bag.

We built a separate in-game probability model for the women’s tournament that works in exactly the same way but uses historical women’s data. Thus, we’ll be updating our forecasts live for both the men’s and women’s tournament.

Excitement index

Our March Madness “excitement index” (loosely based on Brian Burke’s NFL work) is a measure of how much each team’s chances of winning changed over the course of the game and is a good reference for picking the best games to flip to.

The calculation is simple: It’s the average change in win probability per basket scored, weighted by the amount of time remaining in the game. This means that a late-game basket has more influence on a game’s rating than a basket near the beginning of the game. We give additional weight to changes in win probability in overtime. Ratings range from 0 to 10, except in extreme cases where they can exceed 10.
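
A minimal sketch of that calculation might look like the following. The exact weights and scaling are not given here, so the linear time weight, the `overtime_weight` value and the `scale` factor are all assumptions, as is the input format of one tuple per basket.

```python
def excitement_index(baskets, regulation_seconds=2400.0,
                     overtime_weight=3.0, scale=100.0):
    """Weighted average swing in win probability per basket.

    baskets: iterable of (seconds_elapsed, prob_before, prob_after),
    where the probabilities are one team's chance of winning just
    before and just after the basket.
    The weights and the scale factor are hypothetical choices.
    """
    weighted_swing = 0.0
    total_weight = 0.0
    for seconds_elapsed, prob_before, prob_after in baskets:
        if seconds_elapsed > regulation_seconds:
            # Overtime baskets get extra weight.
            weight = overtime_weight
        else:
            # Weight grows from 1 at tipoff to 2 at the end of regulation,
            # so late baskets count for more.
            weight = 1.0 + seconds_elapsed / regulation_seconds
        weighted_swing += weight * abs(prob_after - prob_before)
        total_weight += weight
    if total_weight == 0.0:
        return 0.0
    # No upper clamp: very wild games are allowed to exceed 10.
    return scale * weighted_swing / total_weight
```

With these made-up settings, a game whose big swings come late scores higher than one with the same swings early, which is the behavior described above.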

The index isn’t perfect — this year’s play-in game between Holy Cross and Southern was good, but perhaps not deserving of its 9.4 rating. But even if it doesn’t quite capture the difference between a closely contested slog and a Dunk City run to the Sweet 16, it does a nice job of quantifying how tight a game was and how many big shots were hit.

Elo ratings

Apart from the live updates, the methodology for our men’s forecasts is largely the same as last year. The one addition is our own computer rating system, Elo, which we include along with the five computer rankings and two human rankings we used previously.

If you’ve followed FiveThirtyEight, you’ll know that we’re big fans of Elo ratings, which we’ve introduced for the NBA, the NFL and other sports. We’ve now applied them to men’s college basketball teams dating back to the 1950s, using game data from ESPN, Sports-Reference.com and other sources.

Our methodology for calculating these Elo ratings is very similar to the one we use for the NBA. They rely on relatively simple information: specifically, the final score, home-court advantage and the location of each game. (College basketball teams perform significantly worse when they travel a long distance to play a game.) They also account for a team’s conference (at the beginning of each season, a team’s Elo rating is regressed toward the mean of other schools in its conference) and for whether the game was an NCAA Tournament game. We’ve found that historically, there are actually fewer upsets in the NCAA Tournament than you’d expect from the difference in teams’ Elo ratings, perhaps because the games are played under better and fairer conditions in the tournament than in the regular season. Our Elo ratings account for this and also weight tournament games slightly more heavily than regular-season ones.
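
For readers unfamiliar with Elo, the sketch below shows the textbook win-expectancy formula, a basic post-game update and a preseason regression toward a conference mean. It leaves out the margin-of-victory and travel-distance adjustments, and the K-factor, `game_weight`, `home_advantage` and `carryover` values are hypothetical placeholders, not the parameters used for these ratings.

```python
def expected_score(rating_a, rating_b, home_advantage=0.0):
    """Textbook Elo win expectancy for team A.

    home_advantage is added to A's rating (use 0 for a neutral court).
    """
    diff = rating_a + home_advantage - rating_b
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def update_ratings(rating_a, rating_b, a_won, k=20.0,
                   home_advantage=0.0, game_weight=1.0):
    """Return both teams' ratings after one game.

    k and game_weight are hypothetical; game_weight > 1 could stand in
    for weighting tournament games more heavily.
    """
    exp_a = expected_score(rating_a, rating_b, home_advantage)
    delta = k * game_weight * ((1.0 if a_won else 0.0) - exp_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def preseason_rating(last_rating, conference_mean, carryover=0.7):
    """Regress last season's rating toward the conference mean.

    carryover is a hypothetical fraction of the old rating to keep.
    """
    return carryover * last_rating + (1.0 - carryover) * conference_mean
```

Under this plain formula, a 2097-rated team facing a 1477-rated team on a neutral court has a win expectancy of roughly 0.97, before any tournament-specific adjustment of the kind described above.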

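Elo ratings for the 68 teams that qualified for the men’s tournament follow, alongside each team’s composite rating and its chances of reaching the Final Four and winning the championship.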

2016 NCAA Tournament team ratings
TEAM	REGION	SEED	ELO RATING	COMPOSITE RATING	FINAL FOUR PROBABILITY (%)	CHAMPIONSHIP PROBABILITY (%)
Kansas	South	1	2097	94.5	45.1	19.1
North Carolina	East	1	2075	93.9	43.6	15.0
Virginia	Midwest	1	2052	92.5	30.4	9.8
Michigan State	Midwest	2	2078	91.8	33.9	8.9
Oklahoma	West	2	1972	90.0	32.0	6.8
Villanova	South	2	2045	91.3	22.4	6.4
Kentucky	East	4	2014	90.7	15.9	4.4
West Virginia	East	3	1956	89.3	16.2	3.4
Purdue	Midwest	5	1938	88.7	13.0	2.7
Oregon	West	1	2033	88.0	22.6	2.6
Texas A&M	West	3	1915	86.8	12.4	2.4
Xavier	East	2	1973	87.7	9.9	1.8
Arizona	South	6	1953	89.0	6.0	1.8
Duke	West	4	1910	87.3	12.1	1.7
Maryland	South	5	1876	87.4	6.3	1.3
Indiana	East	5	1938	87.4	5.8	1.1
Miami (FL)	South	3	1933	87.1	4.9	1.0
Iowa State	Midwest	4	1867	86.5	6.4	1.0
Baylor	West	5	1837	85.5	6.0	1.0
Texas	West	6	1788	84.7	5.9	0.9
Utah	Midwest	3	1887	86.6	5.3	0.8
Wichita State	South	11	1893	86.6	2.7	0.7
California	South	4	1871	86.5	4.0	0.7
Iowa	South	7	1904	85.9	3.2	0.6
Vanderbilt	South	11	1846	85.6	2.4	0.5
Gonzaga	Midwest	11	1916	86.0	3.2	0.5
Wisconsin	East	7	1896	84.8	2.9	0.4
Notre Dame	East	6	1832	84.4	2.6	0.3
Connecticut	South	9	1872	85.3	2.1	0.3
Cincinnati	West	9	1794	83.7	3.2	0.3
Butler	Midwest	9	1815	84.2	2.5	0.3
Seton Hall	Midwest	6	1914	84.5	1.8	0.2
Virginia Commonwealth	West	10	1798	83.1	2.2	0.2
Dayton	Midwest	7	1788	82.4	1.6	0.1
Syracuse	Midwest	10	1772	82.7	1.3	0.1
Pittsburgh	East	10	1787	82.3	1.2	0.1
Saint Joseph’s	West	8	1814	81.6	1.1	0.1
Providence	East	9	1824	82.5	0.8	0.1
Northern Iowa	West	11	1751	80.2	0.8	<0.1
Stephen F. Austin	East	14	1824	81.0	0.4	<0.1
Colorado	South	8	1756	81.5	0.4	<0.1
Yale	West	12	1792	80.2	1.0	<0.1
Texas Tech	Midwest	8	1777	81.3	0.4	<0.1
Tulsa	East	11	1690	79.9	0.2	<0.1
Michigan	East	11	1768	79.6	0.3	<0.1
Southern California	East	8	1733	81.4	0.2	<0.1
Arkansas-Little Rock	Midwest	12	1734	78.9	0.2	<0.1
South Dakota State	South	12	1735	78.6	0.2	<0.1
Temple	South	10	1730	78.5	0.2	<0.1
North Carolina-Wilmington	West	13	1722	77.7	0.2	<0.1
Oregon State	West	7	1740	77.6	0.2	<0.1
Iona	Midwest	13	1759	78.2	0.1	<0.1
Green Bay	West	14	1667	76.2	0.1	<0.1
Stony Brook	East	13	1663	77.1	0.1	<0.1
Chattanooga	East	12	1610	76.6	<0.1	<0.1
Hawaii	South	13	1737	78.0	<0.1	<0.1
Fresno State	Midwest	14	1708	76.6	<0.1	<0.1
Buffalo	South	14	1613	75.7	<0.1	<0.1
Cal State Bakersfield	West	15	1635	75.0	0.1	<0.1
Middle Tennessee	Midwest	15	1638	75.0	<0.1	<0.1
North Carolina-Asheville	South	15	1553	74.2	<0.1	<0.1
Weber State	East	15	1623	73.3	<0.1	<0.1
Florida Gulf Coast	East	16	1544	71.4	<0.1	<0.1
Southern	West	16	1392	68.0	<0.1	<0.1
Austin Peay	South	16	1477	68.8	<0.1	<0.1
Hampton	Midwest	16	1488	68.6	<0.1	<0.1
Holy Cross	West	16	1420	66.9	<0.1	<0.1
Fairleigh Dickinson	East	16	1417	66.7	<0.1	<0.1

Note, however, that Elo is still just one of six computer rankings that we use for the men’s tournament. The other five are ESPN’s BPI, Jeff Sagarin’s “predictor” ratings, Ken Pomeroy’s ratings, Joel Sokol’s LRMC ratings, and Sonny Moore’s computer power ratings. In addition, we use two human-generated rating systems: the selection committee’s 68-team “S-Curve”, and a composite of preseason ratings from coaches and media polls. The eight systems — six computer-generated and two human-generated — are weighted equally in coming up with a team’s overall rating.
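
One way to picture equal weighting across systems that use different scales is sketched below. How the systems are actually put on a common scale is not described here, so the z-score standardization, the function names and the input layout are assumptions. The sketch also assumes a higher number is better in every system, so a rank-style input such as the S-Curve would need to be flipped first. Its output is in standard-deviation units, not on the scale of the composite column in the table.

```python
from statistics import mean, pstdev


def standardize(ratings_by_team):
    """Convert one system's ratings to z-scores across teams."""
    values = list(ratings_by_team.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {team: 0.0 for team in ratings_by_team}
    return {team: (value - mu) / sigma
            for team, value in ratings_by_team.items()}


def composite(systems):
    """Equal-weight average of standardized ratings.

    systems: dict mapping system name -> {team: rating}.
    Only teams rated by every system are included.
    """
    if not systems:
        raise ValueError("need at least one rating system")
    standardized = [standardize(ratings) for ratings in systems.values()]
    teams = set.intersection(*(set(s) for s in standardized))
    # Every system contributes the same weight to the average.
    return {team: mean(s[team] for s in standardized) for team in teams}
```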

We’ve calculated Elo ratings for men’s teams only. For women’s ratings, we rely on the same composite of ratings systems that we used last year. You can find more about the methodology for our women’s forecasts here.

As has been the case previously, our ratings are also adjusted for travel distance and (for men’s teams only) player injuries. Our injury adjustment has been slightly improved to account for the higher or lower caliber of replacement players on different teams: Stony Brook, for example, won’t be able to replace a star player as easily as Kentucky can.

As a final reminder, these forecasts are probabilistic, which is especially important to keep in mind for the men’s tournament this year, when there’s about as much parity among teams as we’ve ever seen. In some sense, every team but the UConn women should be thought of as an underdog to win the tournament this year.

Check out FiveThirtyEight’s 2016 March Madness Predictions.
