From a statistical standpoint, we know very little about football. Forty-six players on each team suit up per game, and most of them materially affect outcomes — yet, for most of them, only a handful of stats are recorded. The stats we do have are hopelessly entangled, as each player’s performance affects virtually every other’s. And just when the tiniest bit of clarity starts to emerge, the season ends.
That makes it hard for stats to tell us anything, much less everything we’d like to know. Imagine we consulted all the leading sports analytics experts and tasked them with designing and implementing an algorithmic decision engine that reflected the most advanced statistical analysis presently available for each sport. The engine would have to make decisions of all kinds: roster management, game-planning, strategy, in-game tactics.
That engine would have the most trouble with football, by far. For baseball, it’s not clear that the algorithm would be at much of a disadvantage — player value is reasonably well-defined, so the algorithm would make smart win-maximizing roster moves, and there’s a wealth of analysis of in-game situations that sabermetrics has generated, so it would probably be tactically sound as well. In the NBA, the algorithm would likely make decent roster choices (particularly with new player-evaluation schemes that incorporate optical tracking), and it might be better than average at easy problems like substitutions. But it would struggle to determine how different types of players would perform together, and it wouldn’t be good at things like drawing up plays.
In the NFL, however, the algorithm would be a total bust. It might try to execute a Belichick-like win-maximizing roster strategy, but it would struggle to do it with any specificity since no advanced analytics can tell us what any football player is worth with much accuracy.
Even assuming foot-bot wouldn’t get bogged down in the quagmire of player statistics, it wouldn’t have many better insights. Beyond telling its team to go for it on fourth down more often, the algorithm would be incapable of designing good offensive or defensive game-plans. If the team were lucky, it might have a QB capable of executing the offense on his own, but if he wasn’t able, the team would have a hard time scoring points. Its defense, meanwhile, would likely get carved up by veteran offensive scheme-makers in short order.
And by the next season the algorithm would really be fried. The league completely changes every year or so.
That football is complicated and intractable doesn’t make its analysis any less valid; it just demands a different paradigm. If baseball is like checkers, and basketball is like chess, football is like multi-dimensional Go. We’re not going to solve it, ever. The best we can hope to do is to figure a few things out. And if we’re lucky, those things might be useful.
In that vein, football is a virtual playground for Bayesians, who approach empirical analysis by updating prior beliefs when new information emerges. The exciting side of Bayesianism is that it’s relentlessly progressive. Facing the right evidence, one should be willing to abandon long-held and hard-earned beliefs at a moment’s notice. But the equally important flip side to this is that new information must be treated skeptically, and must always be considered in the framework of what you already believe.
So as new data emerges on the field and in spreadsheets, which should win: a strongly-held intuition, with nebulous form, unknown implications and murky origins, or a new but seemingly clear piece of contradictory analysis based on data?
The answer is neither. The good Bayesian sees that either-or as its own controversy, which should be analyzed dispassionately, just as any other problem would be.
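As a rough sketch of what that kind of updating looks like, the snippet below applies Bayes' rule to the belief that a team is good after each result. The function name and every probability in it (the 80 percent prior, the 70 and 30 percent win rates) are invented for illustration; none of them are estimates from real data.

```python
def update_belief(prior_good, won, p_win_if_good=0.7, p_win_if_bad=0.3):
    """Return P(team is good) after one game, via Bayes' rule.

    All default probabilities are illustrative assumptions.
    """
    # Likelihood of the observed result under each hypothesis.
    like_good = p_win_if_good if won else 1 - p_win_if_good
    like_bad = p_win_if_bad if won else 1 - p_win_if_bad
    evidence = prior_good * like_good + (1 - prior_good) * like_bad
    return prior_good * like_good / evidence


belief = 0.8  # hypothetical strong prior that the team is good
history = [belief]
for won in (False, False):  # two straight losses
    belief = update_belief(belief, won)
    history.append(belief)
    print(f"after a loss: P(good) = {belief:.2f}")
```

Under these made-up numbers, one loss dents the prior without overturning it (about 0.63), while a second loss pushes the belief below even odds (about 0.42). The old belief and the new evidence both get a say.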
In this column, every week, I’m going to try to use that approach to make sense of what we’ve seen on the field in the NFL. What old understandings should we abandon for the new? What new understandings aren’t as compelling as they seem at first blush? What do we know, what don’t we know, and what can’t we know?
In other words, I’m going to write weekly missives about football, and we’ll see where it goes.
Charts of the week
This one is pretty basic: it’s the percentage of teams with a given record (at any point in the season) that made the playoffs since the league started using a 12-team playoff format in 1990. (The chart only covers teams with 11 wins or fewer and 11 losses or fewer, because every team that has won more than 11 games has made the playoffs, and every team that has lost more than 11 has missed them.)
For example, teams that have started the season 3-1 have made the playoffs 63 percent of the time.
Using this, we can see which games in a season tell us the most about a team’s fate: If that 3-1 team wins its next game, it will be 4-1, and 4-1 teams made the playoffs 77 percent of the time. But if it drops to 3-2, such teams have made the playoffs only 50 percent of the time. The leverage of a 3-1 team’s fifth game, then, is 27 percentage points (the difference between 77 percent and 50 percent). The chart below shows this leverage number for each record:
Surprise! Early season games are extremely high leverage, despite the playoffs being practically a whole season away. Teams starting the season 3-0 have made the playoffs 75 percent of the time, while teams that start 0-3 have made the playoffs only 2 percent of the time. Early games are so rich with information value that this is basically the most informative period of the entire season. Nearly every game has big implications.
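The leverage calculation described above can be sketched in a few lines. The playoff rates are the ones quoted in the text; the dictionary layout and the `leverage` function name are just one possible way to organize them.

```python
# Playoff rates (percent) quoted in the text for teams reaching each record.
PLAYOFF_RATE = {"3-1": 63, "4-1": 77, "3-2": 50, "3-0": 75, "0-3": 2}


def leverage(record, rates=PLAYOFF_RATE):
    """Percentage-point gap between winning and losing the next game."""
    wins, losses = (int(n) for n in record.split("-"))
    after_win = f"{wins + 1}-{losses}"
    after_loss = f"{wins}-{losses + 1}"
    return rates[after_win] - rates[after_loss]


print(leverage("3-1"))  # 77 - 50
```

The function raises a `KeyError` for any record whose two follow-up records are missing from the table, so a full version would need the rate for every record on the chart.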
Gunslinger of the week
A team gets the ball with minutes to go, down by less than a touchdown. If it scores, it wins — but it’s probably the last chance. It picks up a couple of first downs and makes it into enemy territory; the fans are going crazy. Just get the ball in the end zone and the QB is a hero. But then, boom! He throws an interception. Now everyone hates him.
But not Skeptical Football! We salute you, anonymous quarterback.
In a game measured in wins and losses, there is no difference between high-risk and low-risk strategies: Either a decision increases a team’s chances of winning or it doesn’t. Even the craziest, riskiest-looking strategy is either the best call or it isn’t. If something “risky” — like, say, a pass with time running out that could be an interception — gives a team its best hope for victory, not attempting that thing is just flushing wins down the toilet.
To balance out the constant harping on “mistake-prone” or “non-clutch” QBs, every week I’ll pick a QB who took good, smart risks in their relentless pursuit of victory, whether or not those risks succeeded.
While an interception always leads to a bad result — a QB would always prefer not to have thrown it — some are much better than others. Here’s a basic rubric by which to evaluate interceptions:
1. Was the situation appropriate for aggressive play?
2. Was the ball thrown downfield? Would it have at least yielded a first down if the pass were completed?
3. Did the pass have a fighting chance of being completed?
So, for example: Perpetual fourth-quarter goat Tony Romo had three interceptions in Week 1. All were with his team trailing substantially — a good time to throw picks, so he seems to do well on No. 1. But one of his picks was thrown on first and goal at the five. High-risk strategies in that spot are both costly and unnecessary, so that fails Nos. 1 and 2. Also, his first pick was right into the hands of his opponent, failing No. 3. So Tony Romo is not our Gunslinger of the Week.
This week’s winner is — drumroll please — Andrew Luck!
With 10 minutes to go in the Indianapolis Colts’ game against the Denver Broncos on Sunday, down 21 points, Luck threw his first touchdown of the day. He got the ball back still down 14 points with 7:46 to go and drove his team to the Denver 32. Then, with a good chance to close the gap to one score, he threw an interception (with the ball traveling to the 12). Groan, right? But there’s no point in settling for field goals when you’re down 14 points with only 5:30 to go.
Following a heartbreaker like that, lesser quarterbacks might roll over and accept the inevitable, but Luck continued his aggressive play. He got the ball back with 4:15 left and scored again, setting up a potential game-tying drive that stalled on the Denver 39. Taking good risks means improving a team’s chances of winning.
It’s fitting that Luck should be the inaugural winner of this award, since he had one of the greatest gunslinger-y games of all time in last year’s playoffs against Kansas City. After KC scored to push its lead to 31-10 late in the first half, I tweeted:
Luck proceeded to throw three interceptions, and at one point — down 38-10 in the third — the odds of a Colts victory were as low as 0.1 percent. But he also threw three more touchdowns and memorably took a Donald Brown fumble to the house. That play drew them within 3 points, and then it seemed inevitable:
Indianapolis got the win, and Luck added a playoff gem to his collection of fourth-quarter comebacks in the last two years.
The Hacker Gods read FiveThirtyEight!
Somewhere in a higher-order universe, the Hacker Gods who programmed this reality (and like to mess with us from time to time) are reading FiveThirtyEight, and it seems that this week, they’ve decided to make my analysis of Philip Rivers look good. In my AFC West preview, I wrote about Rivers’s tendency to lose close games — including a breakdown of when he tends to throw most of his interceptions and touchdowns. Conclusions: He tends to throw his interceptions early in the game and when it’s close, and he throws more touchdowns when substantially ahead or behind.
And lo, Rivers and the Chargers lost another close game — this time to the Arizona Cardinals 18-17, with Rivers throwing a characteristic early interception.
And he failed to gamble when he needed to most: On the last drive, starting with 2:25 left in the game and the Chargers down a point, Rivers threw passes that traveled 2, 5, 5, 5, and 3 yards downfield. Ugh! Not scoring on that one drive was going to lead to exactly the same number of losses as a whole game full of sacks and interceptions. Chuck it, Philip!
Rookie QB watch
There aren’t many rookie QBs to watch this early in the season, so let’s start with how to watch them best. Ignore completion percentage, ignore EPA and WPA, ignore interceptions, even ignore a team’s number of wins with the rookie behind center. What you shouldn’t ignore: the number of games a rookie starts, the number of touchdowns he scores, and the number of yards he throws for. Those are the best indicators of future success for a rookie, and that’s about it. There are a lot of ways to run regressions on rookie stats to see how likely these players are to have good careers, but here’s a basic version with quick and dirty t-values for each stat:
To see which stats are predictive, we’re looking to see which have t-values higher than 2 or lower than -2, so interception percentage doesn’t make the cut (I included it just to demonstrate that fact).
Meanwhile, completion percentage is like the RBI of football — any predictive value it could have is completely covered by other stats.
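For readers who want to see where a t-value comes from, here is a minimal sketch of the rule of thumb above, simplified to a regression with a single predictor (the analysis in this column uses several stats at once). The function names are hypothetical and the numbers are synthetic, made up only to exercise the code; they are not rookie data.

```python
import math


def slope_t_value(x, y):
    """t-value of the slope in a one-predictor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def is_predictive(t_value, threshold=2.0):
    """Rule of thumb from the text: keep stats with |t| above 2."""
    return abs(t_value) > threshold


# Synthetic numbers, made up purely to exercise the functions.
rookie_stat = [1, 2, 3, 4, 5, 6]
tracks_outcome = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
ignores_outcome = [3, 1, 4, 1, 5, 2]

print(is_predictive(slope_t_value(rookie_stat, tracks_outcome)))
print(is_predictive(slope_t_value(rookie_stat, ignores_outcome)))
```

The first synthetic outcome rises almost in lockstep with the stat and clears the threshold easily; the second bounces around with no clear trend and does not. A multi-stat regression applies the same test to each coefficient, which is how a stat like completion percentage can lose its apparent value once other stats are in the model.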
So what does that mean for this year’s crop of rookies? Every week that Teddy Bridgewater and Johnny Manziel sit is more bad news for their career prospects. From a purely predictive perspective, it’s better for them to play and have a bad game than to sit.
It’s better news for Oakland quarterback Derek Carr, who started Sunday’s game against the Jets (good), had two touchdowns (good) and no interceptions (doesn’t matter) in a losing effort (doesn’t matter), but threw for only 151 yards (bad). With Carr continuing to start while others continue to sit, his stock continues to rise.
Tweet of the week
The best thing about Fantasy Football is that it encourages a lot of people to pay closer attention to football. The worst thing about Fantasy Football is having to hear people talk about it all the time. It’s like hearing “bad beat” stories at the poker table, except less interesting and with less at stake. So my tweet of the week goes to:
Note: From time to time I’ll also respond to tweets directed at me (whether in question form or not) in this space, so please send your football-related inquiries and/or venom to me @skepticalsports.
Most empirically significant game of Week 2
Last weekend the New England Patriots got smacked down by mighty Ryan Tannehill and the Miami Dolphins. Tom Brady was pretty horrible, getting just 249 yards on 56(!) attempts. Only 21 percent of his passes went for first downs, which was the lowest of any quarterback this week. (Note: He also threw no interceptions, even after the Patriots fell further and further behind. No Gunslinger Award for you, Tom!)
This sets up a must-win game for the Patriots next week. That’s right, MUST WIN. In the Chart of the Week, there’s a 29 percentage point difference in playoff qualification rates between teams that start 0-2 and teams that start 1-1. But, you say, surely this doesn’t apply to New England: They were 12-4 last year.
You’d partly be right. Being 12-4 last year improves the Patriots’ chances (in a vacuum) of making the playoffs over a typical team. Here’s the equivalent of the chart above just for teams that won 12 or more games the year before:
At 0-1, if we knew nothing about the Patriots other than their record last year, we would expect them to make the playoffs only 37 percent of the time (which is still higher than the 25 percent rate for all 0-1 teams). If they win their next game, they’ll join the 1-1 group that has made the playoffs 57 percent of the time. But if they lose, they’ll join the 0-2 group that has only made the playoffs 20 percent of the time. That means this coming game has 37 percentage points of leverage, which is actually higher than it would be for a normal 0-1 team. Here’s what the leverage chart above looks like for 12+ win teams only:
A cold start makes it a lot more likely that a team is bad, and every loss dramatically decreases its chances of making the post-season. This is true no matter how good the team was the year before.
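To make the Patriots arithmetic concrete, the sketch below splits the swing into its two halves and compares it with the all-teams figure. Every number comes from the text above; the variable names are just illustrative.

```python
# Playoff rates (percent) quoted in the text for teams that won 12+ games
# the previous season.
PRIOR_12_WIN_RATE = {"0-1": 37, "1-1": 57, "0-2": 20}
# Gap between 1-1 and 0-2 teams across the whole league, as quoted in the text.
ALL_TEAMS_LEVERAGE = 29

current = PRIOR_12_WIN_RATE["0-1"]
gain_if_win = PRIOR_12_WIN_RATE["1-1"] - current
drop_if_loss = current - PRIOR_12_WIN_RATE["0-2"]
leverage = gain_if_win + drop_if_loss

print(f"win: +{gain_if_win}, loss: -{drop_if_loss}, leverage: {leverage}")
print(f"extra leverage vs. all teams: {leverage - ALL_TEAMS_LEVERAGE}")
```

Splitting the swing this way shows that a win helps a little more (20 points) than a loss hurts (17 points), and that the game carries 8 more points of leverage than the same game does for 0-1 teams in general.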
New England’s magical post-salary-cap dominance is one of the most impressive organizational feats in sports history, and its persistence has major implications for how we assess the maturity of the NFL as a whole. Therefore, next Sunday’s Patriots-at-Vikings matchup is my most empirically significant game: Another poor performance would have great implications for the Patriots, and the fate of New England has great implications for football analysis.
UPDATE (Sept. 15, 11:56 a.m.): The playoff probability charts in this story have been updated to include values for teams’ end-of-season records. Originally, because of an oversight, they only covered the first 15 games.
Charts by Reuben Fischer-Baum.