We’re Predicting The Career Of Every NBA Player. Here’s How.

Oct 09, 2015

Congratulations! You, along with the other 3.2 billion people on the Internet, are now the proud owner of CARMELO, an algorithm that forecasts the future performance of NBA basketball players.

The basic premise of CARMELO is simple. For each current NBA player, CARMELO identifies similar players throughout modern NBA history³ and uses their careers to forecast the current player’s future.

According to CARMELO, for example, Washington Wizards point guard John Wall, through this point in his career, is similar to former NBA players Isiah Thomas, Jason Kidd, Steve Francis and Kenny Anderson. Kidd continued to improve as a player through his mid-to-late 20s, while Thomas had a long peak and led the Detroit Pistons to two championships. So both are favorable comps for Wall. Francis and Anderson are less favorable. So while Wall has the potential to develop into a superstar, he’s not a sure thing.

CARMELO originated out of work I did for a 2014 article about the New York Knicks’ Carmelo Anthony. Hence the name, CARMELO, which my colleague Neil Paine (our senior sports writer) and I later developed into a silly backronym (Career-Arc Regression Model Estimator with Local Optimization). But the real inspiration for CARMELO is PECOTA, a system I built for Baseball Prospectus in 2003 to forecast the careers of baseball players. I’ve been thinking about developing a “PECOTA for basketball” for more than a decade and, thanks to help from Neil, Allison McCann (one of our visual journalists) and the rest of my FiveThirtyEight colleagues, we’ve finally gotten around to it.

INTERACTIVE: Check out CARMELO projections for every player in the NBA.

CARMELO is considerably simpler than PECOTA, however. It has fewer bells and whistles. It projects each player’s playing time and overall value on offense and defense, but not his component statistics.⁴ The simplicity is partly by design. We think CARMELO gets the basics right and will be a fun and revealing way to explore the NBA. But we’d like to see how it does before complicating the model further.

Let’s take a quick tour of how CARMELO makes its projections, using Wall as our guinea pig. One warning: The descriptions in the next few sections explain how CARMELO works for veteran players who have completed at least one NBA season; projections for rookies are similar in spirit, but there are a few differences that I’ll explain later on.

Step 1. Define the player’s skills

Before CARMELO can identify comparable players, it needs to define each player’s skills and attributes statistically.

It starts with some basic biographical information for each player. The most important attribute of all, in terms of determining a player’s future career trajectory, is his age. NBA players, like MLB players, improve on average through about age 27 and then begin to decline. The age listed on a player’s CARMELO card reflects his age as of Feb. 1, 2016, the rough midpoint of the upcoming NBA season.

Next, we list a player’s vitals: his height, weight and draft position. It’s almost always better for a player to be taller and bigger, other things held equal. Players chosen with an earlier draft pick tend to have a higher ceiling, meanwhile, even once we control for other variables.⁵

Below a player’s vitals, you’ll see a number of statistics listed. Note that these categories are not projected statistics; instead, they reflect the weighted average of a player’s performance over his past three NBA seasons, with the most recent season weighted more heavily.⁶
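To make the weighting concrete, here is a minimal sketch of a three-season weighted average. The 3/2/1 weights are purely an assumption for illustration (the actual weights live in the footnote), and the function name and sample numbers are hypothetical.

```python
def weighted_recent_average(values, weights=(3.0, 2.0, 1.0)):
    """Weighted average of per-season values, most recent season first.

    The default weights are an illustrative assumption, not CARMELO's
    actual weights. Seasons a player missed can be passed as None and
    are skipped.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    if not pairs:
        raise ValueError("need at least one season of data")
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical usage rates for the past three seasons, most recent first.
usage_by_season = [28.0, 27.0, 24.0]
print(round(weighted_recent_average(usage_by_season), 2))
```

Because the most recent season carries the largest weight, the result sits closer to 28 than a plain average of the three seasons would.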

We start with a few statistics related to his scoring and shooting ability. (For more precise definitions of these, see Basketball-Reference.com’s glossary page.) Usage rate reflects what percentage of a team’s possessions were “used” by a player in the form of a shot, turnover or trip to the free-throw line. Because there are five NBA players in a team’s lineup at one time, the average usage rate is 20 percent.

True shooting percentage is an “enhanced” version of shooting percentage that reflects the value of 3-pointers and made free throws, in addition to 2-point shots. Players like LeBron James and James Harden, who rank highly in both usage rate and true shooting percentage, are the best scorers in the game, providing both volume and efficiency. We also list a player’s free-throw percentage. Although less important to his overall value, it provides a purer gauge of shooting ability than true shooting percentage, which reflects both shooting ability and shot selection. In fact, it’s best to look at these categories in tandem with one another. The Clippers’ DeAndre Jordan has one of the best true shooting percentages in the league despite being an incompetent free-throw shooter because most of his shots are high-percentage dunks and layups near the rim.
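For readers who want to see the arithmetic, the sketch below uses the standard true shooting formula from the Basketball-Reference glossary: points divided by twice the sum of field-goal attempts and 0.44 times free-throw attempts. The stat line is hypothetical, chosen only to show how a poor free-throw shooter can still post a high true shooting percentage.

```python
def true_shooting_pct(points, fga, fta):
    """True shooting percentage: points / (2 * (FGA + 0.44 * FTA))."""
    attempts = fga + 0.44 * fta
    if attempts == 0:
        raise ValueError("player has no shooting attempts")
    return points / (2 * attempts)


# Hypothetical dunk-heavy center: 350-for-500 on 2-pointers (70 percent)
# but only 160-for-400 at the free-throw line (40 percent).
points = 350 * 2 + 160
print(round(true_shooting_pct(points, fga=500, fta=400), 3))
print(round(160 / 400, 3))  # free-throw percentage, for contrast
```

The two numbers tell different stories about the same player, which is why it helps to read them together.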

The next two categories, 3-point frequency and free-throw frequency, reflect what shots a player is taking rather than how often he’s making them. (Three-point frequency is the percentage of a player’s field goal attempts that are 3-pointers; free-throw frequency is his ratio of free-throw attempts to field-goal attempts.) It’s usually desirable to rank highly in both departments. Free throws — unless you’re DeAndre Jordan — are generally the most efficient shots in the NBA and are a reward for a player’s ability to work effectively in the paint. Three-pointers, meanwhile, remain more efficient than 2-pointers, on average. Furthermore, ranking highly in one or (especially) both categories can reflect a player’s ability to stretch the court and provide for better floor spacing, which may have a favorable effect on his teammates.
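Both frequency stats are simple ratios, as this small sketch shows. The function names and the sample attempt totals are hypothetical.

```python
def three_point_frequency(three_point_attempts, field_goal_attempts):
    """Share of a player's field-goal attempts that are 3-pointers."""
    return three_point_attempts / field_goal_attempts


def free_throw_frequency(free_throw_attempts, field_goal_attempts):
    """Ratio of free-throw attempts to field-goal attempts."""
    return free_throw_attempts / field_goal_attempts


# Hypothetical season totals for one player.
fga, three_pa, fta = 1200, 300, 480
print(three_point_frequency(three_pa, fga))
print(free_throw_frequency(fta, fga))
```

Note that free-throw frequency is a ratio rather than a share, so nothing stops it from exceeding 1.0 for a player who lives at the line.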

Next are two familiar attributes related to a player’s ball-handling: his assist rate (what percentage of his teammates’ field goals are assisted by him, while he’s on the court) and turnover rate (the share of team possessions that result in a turnover by the player). For CARMELO purposes, a high turnover rate is considered bad, just as it is in the NBA. Wall’s high turnover rate is one of the few major negatives with his game, for example.

Finally, there is a set of categories related to a player’s rebounding and defense. His rebound rate is the share of rebounds he grabs while on the floor (10 percent is average). His block rate is the share of opponents’ 2-point field-goal attempts that result in his blocking a shot, and his steal rate is the share of opponents’ possessions that end in his stealing the ball. Last is a player’s defensive plus-minus rating. CARMELO’s plus-minus ratings reflect a 50-50 blend of Box Plus/Minus (BPM) and Real Plus-Minus (RPM). I’ll have more to say about plus-minus ratings in the “Fine Print” section below; the important thing to know for now is that a rating of zero reflects an average defender, rather than a poor one.

Step 2. Identify comparable players

These statistics can sometimes tell a reasonably complete story about each player. In Wall’s case, they describe a high-volume, medium-efficiency scorer who distributes the ball really well. He’s also a good athlete who plays good defense, especially for a point guard. On the downside, Wall commits a lot of turnovers. And he neither shoots all that many threes nor draws all that many fouls, which can make his game flat at times.

These categories, along with a few others related to durability and playing time, form the basis for selecting CARMELO comparables. The basic idea is this: Because Wall is 25 years old this season, CARMELO runs a profile for past NBA players⁷ heading into their age-25 season.⁸ Then it identifies the most similar ones. Historical players start with a perfect similarity score of 100, and points are subtracted for every difference. Because Wall has a high assist rate, for example, a player with a low assist rate will lose a lot of points and is unlikely to be among Wall’s top comparables. CARMELO applies this process for 19 statistical categories, some of which are weighted more heavily than others.⁹
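Here is a rough sketch of the "start at 100 and subtract" idea. It is not CARMELO's actual formula: the penalty form (weighted gaps measured in standard deviations), the weights, the spreads and the `points_per_sd` scale are all assumptions made up for illustration, and only two of the 19 categories are shown.

```python
def similarity_score(player, comp, weights, spreads, points_per_sd=10.0):
    """Start at 100 and subtract points for each weighted difference.

    Everything here is an illustrative assumption: the penalty form,
    the category weights, the spreads and points_per_sd are not
    CARMELO's actual values.
    """
    score = 100.0
    for category, weight in weights.items():
        gap = abs(player[category] - comp[category]) / spreads[category]
        score -= points_per_sd * weight * gap
    return score


# Hypothetical profiles using two categories.
weights = {"assist_rate": 2.0, "usage_rate": 1.0}
spreads = {"assist_rate": 10.0, "usage_rate": 5.0}
passer = {"assist_rate": 45.0, "usage_rate": 27.0}
similar_guard = {"assist_rate": 43.0, "usage_rate": 26.0}
low_assist_big = {"assist_rate": 10.0, "usage_rate": 20.0}

print(similarity_score(passer, similar_guard, weights, spreads))
print(similarity_score(passer, low_assist_big, weights, spreads))
```

The low-assist player loses most of his points on the heavily weighted assist category, which is the behavior described above. With enough large gaps, a score computed this way can drop below zero.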

The process sounds complicated, but the comparisons are sometimes intuitively satisfying. As a Pistons fan growing up, for example, I can see the similarities between Wall and his No. 1 historical comp, Isiah Thomas. Compare their stats on Basketball-Reference.com and you’ll see where CARMELO is coming from: They are eerily alike in some respects. Even so, the comparison is not perfect. Thomas drew more contact around the basket, resulting in more free-throw attempts. But he was undersized, whereas Wall isn’t, exactly.

Like snowflakes, in other words, no two NBA players are exactly alike. While a theoretically perfect similarity score is 100, Thomas registers at a 57 instead. By CARMELO standards, that’s high: Many NBA players don’t have any comparables with a similarity score above 50. And similarity scores above 60 are even rarer.

This is partly because of the way CARMELO defines similarity scores. A score of 0 is average, not bad. Dominique Wilkins has a similarity score of about 0 relative to Wall, for instance; they’re not much alike, but they aren’t totally off one another’s radar. Many players will have negative similarity scores instead; Manute Bol’s similarity score to Wall is -113.¹⁰ Here’s a rough guide for interpreting similarity scores:

SIMILARITY SCORE	DESCRIPTION
100	Perfect score; identical
60-99	Separated at birth
50-59	Extremely similar
40-49	Highly similar
30-39	Mostly similar
20-29	Partly similar
1-19	Somewhat similar
0	As similar as dissimilar
<0	More dissimilar than similar

You can find a more technical description of CARMELO similarity scores, which are calculated using a version of a nearest neighbor algorithm, in the footnotes.¹¹
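The rough guide in the table above translates directly into a lookup. The sketch below rounds scores to the nearest whole number before matching a band, which is an assumption on my part since the table lists integers only.

```python
# (lowest score in band, description), taken from the table above.
BANDS = [
    (100, "Perfect score; identical"),
    (60, "Separated at birth"),
    (50, "Extremely similar"),
    (40, "Highly similar"),
    (30, "Mostly similar"),
    (20, "Partly similar"),
    (1, "Somewhat similar"),
    (0, "As similar as dissimilar"),
]


def describe_similarity(score):
    """Map a similarity score to its description in the rough guide."""
    rounded = round(score)  # assumption: bands apply to rounded scores
    for floor, label in BANDS:
        if rounded >= floor:
            return label
    return "More dissimilar than similar"


# Scores relative to John Wall, as quoted in the text.
for name, score in [("Isiah Thomas", 57), ("Dominique Wilkins", 0), ("Manute Bol", -113)]:
    print(name, "->", describe_similarity(score))
```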

Step 3. Make a projection

Each player’s top 10 comparables are listed on his CARMELO card. Each comp has a mini-graph (sparkline) depicting how that player’s career progressed over the next seven seasons, where applicable,¹² based on wins above replacement (WAR).

So a player’s CARMELO projection is formed just by averaging the career tracks for his top 10 comparables? That’s pointing in the right direction … but doesn’t quite tell the whole story.

For one thing, though only the top 10 comparables are listed on a player’s CARMELO card, the system uses all historical players with a positive similarity score to make its forecasts.¹³ Usually this means that dozens and oftentimes hundreds of players are used in generating a forecast; 179 historical players have a positive similarity score to Wall, for instance. Each player’s contribution to the forecast is weighted by his similarity score: A player with a similarity score of 50 will have twice as much influence on the forecast as one with a score of 25, for example.
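The weighting scheme is easy to sketch: drop every comparable with a similarity score of zero or below, then average the rest using the similarity score as the weight. The comparables and WAR figures below are hypothetical, and the sketch leaves out any other adjustments the full system makes.

```python
def weighted_forecast(comps):
    """Similarity-weighted average of comparables' future WAR.

    comps is a list of (similarity_score, future_war) pairs. Only
    comparables with a positive similarity score are used.
    """
    used = [(score, war) for score, war in comps if score > 0]
    if not used:
        raise ValueError("no comparables with a positive similarity score")
    total_score = sum(score for score, _ in used)
    return sum(score * war for score, war in used) / total_score


# Hypothetical comparables: (similarity score, WAR in the following season).
comps = [(50, 10.0), (25, 4.0), (-113, 0.0)]
print(weighted_forecast(comps))
```

The comp with a score of 50 pulls the forecast twice as hard as the one with a score of 25, and the negative-score comp is ignored entirely.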

The second issue is more technical. Take a look at Stephen Curry’s CARMELO card, for instance.

Although Curry has a few extremely flattering comparables — Michael Jordan! — most of the others are not as good as him. Terrell Brandon, Terry Porter and Chris Mullin, for example, are listed among Curry’s top 10 comps. They were good players, perhaps slightly underrated players, but none achieved the heights of excellence that Curry has already realized. They were a poor man’s version of Steph Curry — in mostly the same style as Curry, but inferior across the board. CARMELO is aware of this problem and has a solution to it called a baseline projection, which I describe in the footnotes.¹⁴

Think probabilistically

A more important theme is that CARMELO’s forecasts are probabilistic. Wall is projected to finish with 8.7 WAR next season, for example. But there’s uncertainty around that estimate. Each player’s chart shows a range spanning the middle 80 percent of likely outcomes for the player.

These forecast ranges are often quite wide. Basketball is possibly the most predictable of the four major U.S. professional sports, but it still contains a lot of uncertainty. Wall’s range spans from 4.7 wins above replacement (not much better than a league-average player) to 12.9 wins (a possible All-NBA candidate), for example. And to reiterate, this range covers only 80 percent of his outcomes. If CARMELO is well-calibrated, then Wall has a 10 percent chance of exceeding the high end of his range (in which case he could be an MVP candidate) and a 10 percent chance of falling below the low end of his range (in which case, he’ll extend D.C.’s sports misery). Some players have wider ranges than others, especially young players such as Andrew Wiggins or players coming off injury such as Paul George.
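One plausible way to build a range like this is to take the 10th and 90th weighted percentiles of the comparables' outcomes. The text does not say that CARMELO computes its ranges this way, so treat the sketch below as an assumption, with hypothetical outcomes and weights.

```python
def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative share of the weight reaches q."""
    ordered = sorted(zip(values, weights))
    total = sum(weights)
    running = 0.0
    for value, weight in ordered:
        running += weight
        if running / total >= q:
            return value
    return ordered[-1][0]


def middle_80_range(values, weights):
    """Low and high ends of the middle 80 percent of weighted outcomes."""
    return (
        weighted_percentile(values, weights, 0.10),
        weighted_percentile(values, weights, 0.90),
    )


# Hypothetical WAR outcomes for four comparables and their similarity scores.
outcomes = [2.0, 5.0, 9.0, 13.0]
scores = [10, 30, 30, 10]
print(middle_80_range(outcomes, scores))
```

If a range built like this is well calibrated, about one outcome in 10 should land above the high end and about one in 10 below the low end.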

The fine print

So far, we’ve mostly been discussing a player’s wins above replacement projection. But WAR is the endpoint in a CARMELO forecast and not the starting point. If you scroll down to the bottom of each player’s CARMELO card, you’ll see a section called “The Fine Print,” which provides further insight into how the WAR sausage is made.

In particular, WAR reflects a combination of a player’s projected playing time and his projected productivity while on the court.¹⁵ Productivity is measured by the statistic plus-minus, which requires some explaining.

Mathematically, plus-minus is not that hard to define: It reflects how many points a player contributes to his team’s scoring margin per 100 possessions, relative to an average player.¹⁶ Wall, for instance, had a plus-minus rating of +3.9 for the Wizards last year. That means with Wall on the court, along with four average players, the Wizards were outscoring their opponents by 3.9 points per 100 possessions. Plus-minus can be broken down into offensive and defensive components. Wall had an offensive plus-minus of +2.5 last season, which is how many points he added to the Wizards’ scoring per 100 possessions. And he had a defensive plus-minus of +1.4, which is how many points he subtracted from his opponents’ scoring with his defense.¹⁷

However, there are many versions of plus-minus, ranging from simple to complex. The version we use for CARMELO reflects a 50-50 blend of Daniel Myers’s Box Plus/Minus (BPM), a relatively simple statistic that can be calculated by using conventional “box score” statistics, and Jeremias Engelmann’s Real Plus-Minus (RPM), a more complex statistic derived from play-by-play data.¹⁸
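The arithmetic behind these ratings is simple enough to sketch. Total plus-minus is the sum of the offensive and defensive components, and the CARMELO version is an equal-weight average of BPM and RPM. Wall's component ratings come from the text; the BPM and RPM inputs are hypothetical, as are the function names.

```python
def total_plus_minus(offense, defense):
    """Overall plus-minus is the sum of its two components."""
    return offense + defense


def blend_plus_minus(bpm, rpm, bpm_share=0.5):
    """Blend BPM and RPM; the default is the 50-50 split described above."""
    return bpm_share * bpm + (1.0 - bpm_share) * rpm


# Wall's offensive and defensive ratings as quoted in the text.
print(round(total_plus_minus(2.5, 1.4), 1))

# Hypothetical BPM and RPM values for a single player.
print(blend_plus_minus(bpm=3.0, rpm=5.0))
```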

Neil Paine, our senior sports writer, and I had a lot of debates (which echoed long-running arguments within the broader basketball stat-geek community) about which advanced statistics to use for CARMELO before deciding on this BPM/RPM blend. What settled the debate was that the BPM/RPM blend did better than alternatives like PER and Win Shares in a variety of out-of-sample testing.

However, no all-in-one advanced stat is magic, and this is a source of systematic uncertainty in any NBA projection system. If it seems as though CARMELO “loves” or “hates” a certain player, it may be because of how BPM and RPM rate the player. For instance, both BPM and RPM rate the Raptors’ Jonas Valanciunas poorly compared with statistics like PER. So if Valanciunas’s forecast seems pessimistic to you, it’s not because CARMELO expects his performance to decline (in fact, CARMELO has him getting a little better). It’s because BPM and RPM didn’t evaluate Valanciunas as being all that good to begin with.

CARMELO also projects each player’s minutes in upcoming seasons. These forecasts may seem pessimistic. Among the 29 players who played at least 2,500 minutes last year, for example, CARMELO forecasts 26 to play fewer minutes this year. But this reflects the reality of NBA history. Even players who had been entirely healthy up to a certain point in their careers, such as the Pacers’ Paul George, have sometimes suffered catastrophic injuries. Others have lost time to circumstances ranging from illness to suspension to an unexpected retirement. In fact, CARMELO’s playing time projections are designed to be slightly optimistic, on average.¹⁹

Which players are included in CARMELO?

Whew. We’ve gotten through WAR wars. Now for a few odds and ends. For instance, are you curious about a certain player but don’t see a CARMELO card for him? Or are you wondering why you are seeing a CARMELO card for a player who’s retired or hurt?

Our interactive includes every player who played at least 100 NBA minutes in the 2014-15 season, or 250 minutes in 2013-14. This includes players such as Shane Battier who we know are retired; we figure there’s no harm in showing their projections in case they decide to return to action this season.
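Stated as a rule, the cutoff looks like the sketch below. The function name and the sample minute totals are hypothetical; the thresholds are the ones given above.

```python
def is_included(minutes_2014_15, minutes_2013_14):
    """True if a player clears either minutes threshold for inclusion."""
    return minutes_2014_15 >= 100 or minutes_2013_14 >= 250


# Hypothetical minute totals.
print(is_included(minutes_2014_15=0, minutes_2013_14=1400))  # qualifies on 2013-14 alone
print(is_included(minutes_2014_15=60, minutes_2013_14=200))  # misses both cutoffs
```

The first case is how a player who sat out all of last season can still show up with a card.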

We’re also showing projections for players who we know have suffered a serious, season-threatening injury, such as the Hornets’ Michael Kidd-Gilchrist. The reason for this is transparency; we think it’s cheating to omit a player based on news we’ve subsequently learned about him when that knowledge wasn’t available to CARMELO. However, we do account for injuries when formulating team depth charts, a process I’ll describe in a moment.

Rookie projections

We’ve also run projections for about 80 rookies with college experience; here’s D’Angelo Russell’s sweet-looking projection, for instance. These projections are derived from a database provided to us by ESPN Stats & Info, which includes strength-of-schedule-adjusted college statistics for prospects in the 2001 NBA draft class onward who subsequently played at least one NBA game.

Technically, these rookie projections are produced by a different program from CARMELO, one we sometimes call FABMELO after the Syracuse star (and seeming NBA flop) Fab Melo. However, the principles behind rookie and veteran projections are the same, and the differences boil down to a few relatively minor details:

Rookie projections omit a couple of statistics²⁰ that were not included in the Stats & Info database. They also use Effective Field Goal Percentage (eFG%) in place of true shooting percentage.
The weights assigned to identify comparable players are somewhat different. Draft position is weighted much more heavily in college projections, for example.
Whereas veteran projections treat age as an absolute — a 31-year-old will be compared against only other 31-year-olds — rookie projections are slightly more flexible. A 21-year-old draft pick might be compared against a 20-year-old draft pick if they’re otherwise extremely similar, for instance.
Whereas for veterans, CARMELO formulates a baseline projection based on a player’s age and playing time and plus-minus rating in his past three NBA seasons, rookie projections use a player’s age, draft position and height.²¹

The tl;dr version: Rookie projections rely heavily on a player’s age and draft position. A No. 1 overall pick is almost always going to get a reasonably favorable projection, while a late second-round pick almost always won’t. Still, now and then the system will find a player it really likes (such as Russell) or dislikes (such as Frank Kaminsky) relative to his draft position; we’ll see in a few years how those forecasts turn out.

Finally, there are a few oddball cases. We’ve run rookie projections for a couple of players such as Josh Huestis who were chosen in the 2014 NBA draft but received little or no NBA playing time last year. This includes the Lakers’ Julius Randle, who played in just one game last year before getting hurt. CARMELO is fairly punitive toward players with a “gap year” between their draft year and their first prolonged NBA action, however.

What about draft picks from Europe (or other continents) who didn’t play U.S. college ball? They don’t get full-fledged CARMELO projections. (Sorry, Kristaps Porzingis.) However, we do run simple, baseline projections for them based on their age, height and draft position, so you may see them included in team depth charts.

Team projections and depth charts

In addition to running player forecasts, we’re also releasing team-by-team projections that include projected win-loss totals for each club.²² Here are the Oklahoma City Thunder, for example.

Unlike the player forecasts, these team projections involve some human intervention. In consultation with ESPN’s NBA beat writers, we’ve developed a depth chart for each team, which accounts for current injury information along with other news about a team’s roster construction. However, we aren’t taking too many liberties. If the playing time we assign to a player significantly exceeds the playing time recommended by CARMELO, the system responds by lowering his plus-minus rating; Manu Ginobili would not be very effective if asked to play 36 minutes a game, for example. This has the effect of rewarding deep teams (like Ginobili’s San Antonio Spurs) and punishing those that are stretching to fill out the roster.

So … should I bet on these things?

Hmm. Umm. Probably not? FiveThirtyEight’s relatively simple, RPM-based projections performed quite well last year, edging out Vegas along with most other projection systems. In theory, based on our back-testing, CARMELO should be slightly more accurate still, improving on the simple RPM projections by about 10 percent. But back-testing is not the same thing as seeing how predictions perform in the real world against truly unknown data. Rookie forecasting models can be buggy, moreover. I’d probably hold off until the system has at least a year or two of experience under its belt.

Does Carmelo Anthony get a good CARMELO projection?

No, not really. In fact, CARMELO sort of hates the Knicks; it doesn’t play favorites.

Tags: Basketball CARMELO John Wall NBA