How We Made Our N.C.A.A. Picks


I participated in my first N.C.A.A. tournament pool in 1992 when, as a 14-year-old, I correctly predicted sixth-seeded Michigan to reach the Final Four.

I don’t particularly remember what went into the prediction, other than that the pool offered some ridiculous bonus for picking lower seeds — and also, since I grew up in East Lansing, Mich., I could be pretty sure that nobody else would be willing to go all-in on the team from Ann Arbor. I’ve never been asked to return the 75 “units” I took in for finishing first, despite the fact that Michigan’s Fab Five had to forfeit its wins for recruiting violations.

This year, we’ve decided to do something a bit more scientific, analyzing the results for all tournament games since 2003 (a total of 512 games) and evaluating which factors best predicted success. Our forecast is here.

The goal is to have a system that makes good statistical sense and also makes decent basketball sense, as opposed to identifying a bunch of spurious correlations. There’s no Da Vinci code for winning the tournament; it’s just a matter of playing good basketball.

Let me give you an overview of the system; there’s more elaboration down below for those interested in the gory details.

The Simple Version

– First, we create a power rating for each team. The power rating is an aggregation of four computer ratings and two human ratings, all of which have performed well at predicting tournament games in the past:

Objective (computer) ratings:

a) Jeff Sagarin “predictor” ratings;
b) Ken Pomeroy Pythagorean ratings;
c) Joel Sokol LRMC rankings;
d) Sonny Moore power ratings.

Human ratings:

e) Tournament seeds;
f) The Associated Press preseason poll. (The rationale for accounting for preseason expectations is explained here.)

– Next, this power rating is adjusted based on three factors: injuries (and other types of player absences like suspensions), the geographic location of the game, and (after the first round) a team’s performance in tournament games so far.

– Finally, by comparing two teams’ power ratings, we can estimate the likelihood that any team in the 68-team field will beat any other in any given game. This allows us to play out the rest of the tournament and estimate the probability that any team reaches any subsequent round, or wins the national championship.

We’re not promising 100 percent success; there are a lot of things that no system can account for. (There are also a lot of things that people think are important, but probably aren’t.) But these projections might help you get an extra game or two right, especially in the later rounds of the tournament, when small advantages compound and the stakes are highest.

Most of you can stop reading here and start filling out your brackets.

The Detailed Version

The basic approach behind the model is to create a power rating for each team that can be used to approximate a point spread. (This is a common approach and one also used, for instance, by Sagarin’s power ratings for USA Today.) For example, if Ohio State has a power rating of 95 and Gonzaga has a power rating of 85, that means that Ohio State is a 10-point favorite.

There is uncertainty in this estimate, however, so the model then translates the point spread into a probability of winning: for example, a 10-point favorite should win about 83 percent of the time.
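
The article doesn’t publish the exact translation, but a standard approach is to treat the final margin as normally distributed around the point spread. Here is a minimal Python sketch under that assumption, with the standard deviation (about 10.5 points) reverse-engineered from the 83 percent figure above rather than taken from the model itself:

```python
from statistics import NormalDist

# Assumption: margins of victory are roughly normal around the implied
# point spread, with a standard deviation of about 10.5 points, a figure
# chosen to reproduce the "10-point favorite wins ~83 percent of the
# time" example above, not a published model parameter.
MARGIN_SD = 10.5

def win_probability(rating_a: float, rating_b: float) -> float:
    """Chance that team A beats team B, given their power ratings."""
    spread = rating_a - rating_b  # the rating gap doubles as a point spread
    return NormalDist(mu=0.0, sigma=MARGIN_SD).cdf(spread)

# Ohio State (95) vs. Gonzaga (85): a 10-point favorite.
print(round(win_probability(95, 85), 2))  # -> 0.83
```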

Since the model “knows” how likely any one team is to beat any other, it can then play out all possible permutations for the balance of the tournament, at which point it naturally considers factors like the strength of the opponents a team might have to face in subsequent rounds.
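
Whether the bracket is enumerated exactly or simulated isn’t spelled out here; a Monte Carlo simulation is the simplest way to sketch the idea. This toy version reuses the hypothetical win_probability helper from the sketch above:

```python
import random
from collections import Counter

def simulate_bracket(ratings: list[float], trials: int = 100_000) -> Counter:
    """Play out a single-elimination bracket many times and count champions.

    `ratings` lists power ratings in bracket order; the length must be
    a power of two (a toy stand-in for the real 68-team field).
    """
    champs: Counter = Counter()
    for _ in range(trials):
        alive = list(range(len(ratings)))
        while len(alive) > 1:
            # Pair off adjacent survivors and advance a winner from each game.
            alive = [a if random.random() < win_probability(ratings[a], ratings[b])
                     else b
                     for a, b in zip(alive[::2], alive[1::2])]
        champs[alive[0]] += 1
    return champs

# Four hypothetical teams, bracketed 1 vs. 4 and 2 vs. 3.
counts = simulate_bracket([95, 85, 90, 88], trials=50_000)
for team, wins in counts.most_common():
    print(f"team {team}: {wins / 50_000:.1%} of simulated titles")
```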

The key question is what goes into the calculation of the power rating; there are several factors that determine it:

Computer Ratings

There are dozens of college basketball power ratings; Kenneth Massey’s website identifies almost 50 systems that are being published. Another 40 or so systems have been published at some point in recent years but have since been retired.

I went about analyzing how well these different formulas had predicted the results of tournament games in the past. I performed several different tests on each of the systems — like how many winners they identified correctly, how well they had predicted margins of victory, and how much value they added in a regression analysis as compared with other approaches.

A problem quickly became apparent, however: there really isn’t that much difference between the systems. They all rely on the same basic information — the scores of games during the regular season, along with the locations where the games were played — and there are only so many sensible ways to blend that information together.

Power ratings that account for margin of victory or defeat in past games are almost certainly better for predictive purposes than systems like the Ratings Percentage Index, which don’t. Otherwise, however, the relative performance of two systems in any given tournament will often have as much to do with luck as anything else.

There is a bit of judgment required, therefore, in determining which systems to include. In the end, I included four computer systems, each of which performed strongly across a number of tests. Just as important, these systems each take somewhat different approaches, rather than being derivatives of one another. The four computer systems are as follows:

– Sagarin’s ratings for USA Today. We use the “predictor” version of Sagarin’s ratings that accounts for margin of victory in past games. Although Sagarin has never fully disclosed his methodology — it is most likely based on an iterative process — these are among the oldest power ratings and have a long track record of success.

– Ken Pomeroy’s ratings. Unlike Sagarin’s, Pomeroy’s ratings account for offense and defense separately. They probably have the soundest theoretical basis of any of the systems, and they have performed well.

– Third are the LRMC ratings — LRMC is an acronym for “logistic regression, Markov Chain” — designed by Joel Sokol of Georgia Tech. LRMC is a relatively new approach, but it has performed well in recent years. (Sokol also provided me with retroactive versions of his ratings going back to 2003.) It is probably the most empirically driven of the four systems, using the results of home-and-home series during the regular season.

– And finally, the Sonny Moore power ratings, which have been published since 1974. Although Moore’s ratings are superficially similar to Sagarin’s, they have some novel features, such as more heavily weighting the results of recent games.

These four systems each operate on somewhat different scales — for instance, LRMC only ranks the teams ordinally (e.g. Syracuse is ranked No. 13) rather than assigning them a particular rating (e.g. Syracuse has a rating of 86.42). So they are normalized to have the same mean and standard deviation and to be comparable to one another.

Individually, each of the four computer ratings accounts for one-sixth of the composite power rating, so collectively they account for two-thirds (four-sixths) of it.
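
As a concrete sketch of the blending step: the target scale below is an arbitrary choice for illustration, since the article says only that the six inputs are put on a common scale and weighted equally.

```python
from statistics import mean, stdev

def standardize(values: list[float], target_mean: float = 90.0,
                target_sd: float = 5.0) -> list[float]:
    """Rescale one system's ratings to a common mean and standard deviation.

    The target scale (mean 90, SD 5) is illustrative; the model's actual
    scale isn't published.
    """
    m, s = mean(values), stdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]

# Per the article: each computer system gets 1/6, as do the seed-based
# and preseason-based ratings (two-thirds computer, one-third human).
WEIGHTS = {"sagarin": 1/6, "pomeroy": 1/6, "lrmc": 1/6,
           "moore": 1/6, "seed": 1/6, "preseason": 1/6}

def composite_rating(standardized: dict[str, float]) -> float:
    """Weighted blend of one team's six standardized ratings."""
    return sum(WEIGHTS[name] * value for name, value in standardized.items())
```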

Seeds

Another one-sixth of the power rating derives from a team’s seeding in the bracket. Warts and all — for instance, the tournament committee permits itself to place a team either one seed higher or one seed lower than its natural position in order to meet other objectives like regional balance — the seeds do seem to have some informational value.

A team’s seed is translated into a power rating based on the average power rating for teams with the same seed in past tournaments. For instance, a No. 3 seed equates to a power rating of 88.8, while a No. 4 seed equates to one of 87.1.

We take advantage of all the information that the seeding committee provides publicly. Since 2004, for instance, it has identified the top four overall seeds in order, so the team with the No. 1 overall seed will receive a slightly higher rating than the No. 4 overall seed, even though both teams are No. 1 seeds in their given regions. In addition, teams that are assigned to a play-in game are given a penalty equivalent to half a seed.
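
Here is a sketch of the seed translation. The only values below that come from the article are the No. 3 and No. 4 lines, and the half-seed play-in penalty is read as half the gap down to the next seed line, which is one plausible interpretation:

```python
# Historical average power rating by seed line. Only the No. 3 and
# No. 4 values are quoted in the article; a full table would be built
# the same way from past tournaments.
SEED_RATING = {3: 88.8, 4: 87.1}

def seed_based_rating(seed: int, play_in: bool = False) -> float:
    """Translate a seed into a power rating.

    Play-in teams are docked half a seed, interpreted here as half the
    gap down to the next seed line. (A real table would cover all 16
    seed lines so that `seed + 1` always exists.)
    """
    rating = SEED_RATING[seed]
    if play_in:
        rating -= (SEED_RATING[seed] - SEED_RATING[seed + 1]) / 2
    return rating

print(seed_based_rating(3))                # 88.8
print(seed_based_rating(3, play_in=True))  # 87.95
```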

Preseason Ratings

Another one-sixth of the power rating is based on a team’s ranking in the Associated Press preseason poll.

The rationale for assigning a small weight to preseason ratings is explained at length here and probably has to do with their being an estimate of a team’s talent level (regardless of the results it has achieved over the course of the regular season, which may to some extent be determined by luck or other short-term factors). Preseason rankings, in fact, do roughly as well as any other system in forecasting the outcome of tournament games.

All teams that receive at least 5 points in the preseason poll are considered to be ranked (not just the Top 25). As with the tournament seeds and the other systems, a team’s preseason ranking is translated into a rating that is comparable to the other metrics.

*-*

These factors — the set of four computer ratings, a power rating based on a team’s seed, and a power rating based on its preseason ranking — constitute the bulk of the information that the model uses to make its predictions. But there are three other factors that the system considers on a game-by-game basis:

Geography

The location of the game matters; teams playing closer to home tend to overperform their seeds and in extreme cases can have something tantamount to home-court advantage. The geographic adjustment is explained in detail here.

Injuries and Player Absences

A team receives a penalty to its power rating if one of its good players will not be participating in the tournament because of injury, suspension, or for other reasons.

The specific magnitude of the injury adjustment is based on a statistic called win shares that measures the overall amount of value the player has contributed to his team over the course of the regular season. By using some algebraic gymnastics, we translate the player’s win shares into an estimate of how much the team’s power rating would be impacted if the player had been absent for the entire season.

This is roughly equivalent to measuring an injury’s impact in terms of the point spread. For instance, the absence of Brigham Young’s Brandon Davies — who was suspended from the team this month for violating the school’s strict honor code — hurts Brigham Young by about 1.7 points per game according to our math. While that might not sound like a lot, its impact is compounded because Brigham Young — like any team in the tournament — would need to win multiple games against strong opponents in order to advance to the late rounds. We estimate that, by suspending Davies, Brigham Young cut its chances of winning the tournament to about half of what they would be otherwise.
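
The compounding is just the multiplication of per-game win probabilities. A back-of-the-envelope sketch, reusing the hypothetical win_probability helper from the earlier sketch with made-up opponent ratings, shows how a 1.7-point penalty can roughly halve six-game title odds:

```python
def title_odds(rating: float, opponent_ratings: list[float]) -> float:
    """Chance of winning all six games on a path to the championship."""
    odds = 1.0
    for opp in opponent_ratings:
        odds *= win_probability(rating, opp)
    return odds

# Hypothetical: a 92-rated team facing progressively tougher opponents.
path = [80, 85, 88, 90, 91, 92]
healthy = title_odds(92.0, path)
injured = title_odds(92.0 - 1.7, path)  # a per-game, Davies-sized penalty
print(f"{healthy:.1%} -> {injured:.1%}")  # roughly 6.6% -> 3.6%
```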

The injury adjustment assumes that a player’s minutes will be replaced by an average Division I basketball player — in other words, that the bench player replacing him will be decent. This is arguably a conservative assumption, but keep in mind that most tournament teams are quite strong; even their reserves might be starters by the standards of another team. Still, the absence of a player like Davies or St. John’s D.J. Kennedy has a large enough impact to be worth accounting for.

A few other ground rules: a team cannot benefit from “addition by subtraction.” If a mediocre player is hurt, the model instead assumes the injury has zero overall impact.

The way that the adjustment is calculated, an injury sustained late in the year will hurt a team more than one early on, since in the latter case, the impact of the injury will already be manifest in the quality of a team’s play over the course of the season and therefore its power ratings.

Injury information will be updated as necessary as the tournament progresses.

Quality of Play During the Tournament

Our projections will be updated after each day of tournament play (and sometimes more often than that). In addition to accounting for the winners and losers, our model will also adjust its estimate of each team’s strength based on how well it has performed so far. Our research suggests that the quality of play during one tournament game is worth the equivalent of two or three regular-season games in terms of predicting a team’s play during the rest of the tournament.

Specifically, this adjustment works by comparing a team’s actual margin of victory to that projected by the model before the game. If, for example, the model expected Old Dominion to lose to Butler by 5 points, but instead Old Dominion beats Butler by 10 points, the program will assume that it had underrated Old Dominion before and will increase its rating for subsequent rounds.

Although normally any team that wins a tournament game will be helped by this adjustment, its rating for future games could go down if it won by a smaller margin than expected. For example, if Brigham Young was expected to defeat Wofford by 12 points, and it instead won on a Jimmer Fredette 3-pointer in double overtime, Brigham Young’s rating would be revised downward. In this sense, the model is very Bayesian, constantly revisiting its assumptions as new information becomes available.
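
The exact update rule isn’t published, but a minimal sketch, assuming the rating is a weighted average in which one tournament game counts like about two and a half games on top of a roughly 30-game regular-season sample, might look like this:

```python
REG_SEASON_GAMES = 30.0  # assumed size of the pre-tournament sample
TOURNEY_WEIGHT = 2.5     # one tournament game ~ 2-3 regular-season games

def update_rating(rating: float, expected_margin: float,
                  actual_margin: float) -> float:
    """Nudge a team's rating toward what its latest game implied."""
    weight = TOURNEY_WEIGHT / (REG_SEASON_GAMES + TOURNEY_WEIGHT)
    surprise = actual_margin - expected_margin  # points better than expected
    return rating + weight * surprise

# Old Dominion, projected to lose to Butler by 5, instead wins by 10:
print(update_rating(85.0, expected_margin=-5.0, actual_margin=10.0))
# A 15-point surprise moves the (hypothetical) rating up about 1.2 points.
```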

There is another, more technical implication to this for the serious probability geeks out there.

Suppose that Duke is facing St. Peter’s. What is the likelihood that St. Peter’s upsets Duke?

Actually, it depends on what round they’re playing in. Our model thinks if St. Peter’s played Duke in the first round, it would have only about a 3 percent chance of winning the game, before accounting for any geographic advantage.

But suppose, instead, that Duke and St. Peter’s were on opposite sides of the bracket, and somehow meet in the national championship game. What odds would St. Peter’s have then?

From the standpoint of Bayesian probability, they’d be quite a lot better (about 8 percent rather than 3 percent, our model thinks). This is because Duke and St. Peter’s meeting in the national championship is conditional on something else happening — namely, St. Peter’s winning five consecutive tournament games to get to the championship game, each as a heavy underdog. If you looked up the power rating for St. Peter’s after these five games had been played, it would be quite a bit higher than it is now. Duke’s rating might also be a little bit higher. But because Duke would be the favorite in most of its games, the effect would not be so profound.

Our model accounts for these conditional probabilities and, as a result, has a somewhat more optimistic view of lower-seeded teams in later rounds. To take an extreme case, it thinks that Princeton is “only” about a 40,000-to-1 underdog to win the tournament, whereas without this adjustment it would have considered them a 300,000-to-1 underdog. Less dramatically, it thinks that a No. 9 seed, Illinois, has a 0.7 percent chance to win the tournament, rather than 0.4 percent without this adjustment.
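
In simulation terms, the conditional effect falls out naturally if ratings are updated inside each simulated tournament, so that a simulated Cinderella carries its simulated hot streak into later rounds. A sketch of one round, reusing the hypothetical update_rating helper above:

```python
import random

MARGIN_SD = 10.5  # same assumed margin spread as in the earlier sketch

def play_round_with_updates(teams):
    """One simulated round; `teams` is a list of (team_id, rating) pairs.

    Each winner advances with a rating nudged toward its simulated
    margin, so a team that keeps springing upsets is rated more and
    more strongly in later simulated rounds.
    """
    survivors = []
    for (a, rating_a), (b, rating_b) in zip(teams[::2], teams[1::2]):
        expected = rating_a - rating_b              # A's expected margin
        actual = random.gauss(expected, MARGIN_SD)  # A's simulated margin
        if actual > 0:  # A wins; its surprise is how far it beat the spread
            survivors.append((a, update_rating(rating_a, expected, actual)))
        else:           # B wins; flip signs to get B's margins
            survivors.append((b, update_rating(rating_b, -expected, -actual)))
    return survivors
```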

*-*

Are there other factors that our model doesn’t account for? Of course.

If you really, really know your hoops, you shouldn’t shy away from considering how two particular teams match up against one another, in terms of their styles of play, which players will guard one another, and so on. These things are hard to capture statistically, and our model doesn’t attempt to; instead, it assumes that a team with a power rating of 90 has the same chance of beating each of two teams with a power rating of 85. In practice, that may not be true in all cases.

But there are also a lot of factors that we did evaluate that turned out not to matter much. Everything from free-throw shooting to the experience of the coach to the tempo that a team plays at to how well it plays in road games as opposed to home games — all of these had at most a marginal impact.

Rather, I suspect it’s too easy to get carried away with looking at the little stuff and that such efforts will lead you astray unless you’re in the 99th percentile of basketball dorks, at which point you might consider moving to Macao.

Instead, if you’re really serious about winning your tournament pool, there’s probably a lot more to be gained by thinking strategically. Because most tournament pools give prizes only to the top couple of places, high-risk strategies tend to be rewarded; in particular, it pays to pick teams that other people aren’t picking.

If, for instance, our forecast says that a team is a 40-60 or 45-55 underdog, it may nevertheless be the correct pick provided that other people aren’t making it (as opposed to a “trendy” upset pick, which can be a trap).
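
As a toy illustration of that logic (this is not part of our model): the value of a pick is roughly its chance of being right divided by the share of the pool that would split the credit with you.

```python
def pick_value(win_prob: float, pick_share: float) -> float:
    """Toy expected payoff of a pick: the chance it hits, divided by
    the fraction of the pool you'd split the glory with if it does."""
    return win_prob / pick_share

# Hypothetical shares: a 45-55 underdog almost nobody is picking can be
# a better play than a favorite everyone is picking.
print(pick_value(win_prob=0.45, pick_share=0.05))  # 9.0
print(pick_value(win_prob=0.55, pick_share=0.60))  # ~0.9
```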

Don’t pick Pittsburgh to win the tournament if you live in Beaver Falls, Pa., or San Diego State if you live in Escondido, Calif.

Do consider the effects of any bonuses or eccentric scoring rules; they can have profound effects on strategy. In some cases — if your pool allows it — it may even be mathematically correct to fill out a logically inconsistent bracket; for instance, picking a No. 15 seed to upset a No. 2 seed and then “resurrecting” the No. 2 seed in the next round.

But most of all, don’t buy into the hype — the hype about a “hot” team, or one that “knows how to win,” or one that is playing basketball “the right way,” or most of the other stuff that people talk about on television. It’s not necessarily that this stuff is entirely wrong — there might be some granules of truth amid the mountains of platitudes. But it’s exactly the stuff that everyone else in your pool is listening to, and winning your pool requires differentiating yourself from the pack.

Connecticut, for instance, is a very fashionable pick right now, and I wouldn’t necessarily bet my life on the proposition that they only have a 1-in-142 chance of winning the tournament, as our model seems to conclude (in part because they have a very difficult draw). But I would emphatically recommend against picking them, just because everyone else in your pool is liable to.

Unless everyone else in your pool reads FiveThirtyEight, in which case they’re a sweet pick.