Your Guide To Deflate-gate/Ballghazi-Related Statistical Analyses


The controversy surrounding “Ballghazi” or “Deflate-gate” (depending on your politics, I guess) has only intensified. Press conferences were given. Locker room attendants’ bathroom visits were analyzed. The Columbia University physics department was called in. And, in a series of posts that made the media rounds late last week, the Patriots’ ability to avoid fumbling was declared “nearly impossible” (without, say, the systematic deflation of footballs).

The gist of the posts, written by a tout named Warren Sharp, is that the Patriots are a phenomenal statistical outlier when it comes to hanging onto the ball. Sharp presents a chart showing how far New England has stood above its peers in “offensive plays per (lost) fumble” over the last five seasons, giving the odds against such a performance happening by chance as 1 in about 16,234. He also notes that, over the same span of years, the Patriots have fumbled (whether the ball was lost or recovered) far less often than their peers, after excluding dome-based teams from the comparison. And finally, he notes that individual members of the Patriots appear to fumble far less with New England than when they play for other franchises.

The data science community responded with a number of rebuttals (I put together a roundup of my favorite ones below). Collectively, these posts did a great job of breaking down the Statistics 101 problems with Sharp’s original analyses. But even if Sharp had been less sloppy, it would have been right to take issue with the larger implication of his work — that any major outlier, if shown to be statistically significant, should be seen as evidence of rule-breaking.

Barry Bonds and Lance Armstrong were outliers. But so is Lionel Messi. And Phil Jackson. And the San Antonio Spurs. It would be irresponsible — and depressing — to assume every incredible performance equals cheating. Celebrating outliers is one of the best parts about being a sports fan.

I’d be remiss if I didn’t note that, in cases such as these, our traditional methods of determining statistical significance can severely underestimate the odds of something happening due to chance. That’s because of the so-called Wyatt Earp Effect, named after the frontier lawman known for taking part in lots of gunfights without getting hurt. Earp’s feat seems improbable in hindsight, but given the sheer number of shootouts in the Old West, it was actually pretty likely that somebody would make it through that many fights unscathed.

Likewise, it’s difficult to estimate the true odds against a team preventing fumbles to the extent Sharp originally suggested New England did. Knowing particulars about the Patriots after the fact can trick us into computing the odds that a specific team would post a specific fumble record over a specific span of years. But the real question regarding New England’s outlier-ness should surround the odds that any team would post any outlier statistic over any span of seasons. And the probability of that happening, as you may imagine, is a lot higher than the odds of any one very specific set of circumstances coming to pass.
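The gap between those two questions is easy to see with a toy Monte Carlo sketch (made-up numbers, not Sharp’s data): treat every team’s five-year performance as a draw from the same chance process, and compare the probability that one team named in advance clears an extreme threshold against the probability that any of 32 teams does.

```python
import random

random.seed(42)

N_TEAMS = 32        # NFL teams
N_TRIALS = 100_000  # simulated five-year "leagues"
THRESHOLD = 3.0     # z-score we'll call an "outlier" (~1-in-741 one-tailed)

specific_hits = 0   # the one team we picked in advance is an outlier
any_hits = 0        # at least one of the 32 teams is an outlier

for _ in range(N_TRIALS):
    zs = [random.gauss(0, 1) for _ in range(N_TEAMS)]
    if zs[0] > THRESHOLD:    # "the Patriots," chosen before looking
        specific_hits += 1
    if max(zs) > THRESHOLD:  # anybody at all
        any_hits += 1

print(f"P(specific team is an outlier): {specific_hits / N_TRIALS:.5f}")
print(f"P(any team is an outlier):      {any_hits / N_TRIALS:.5f}")
```

The second probability comes out roughly 30 times larger than the first, simply because 32 teams each get a shot at being Wyatt Earp. Picking the timeframe after the fact (as Fustin suggests Sharp did) multiplies the chances further still.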

Here are some of the responses to Sharp’s posts:

  • At Deadspin, statistics professors Gregory J. Matthews and (friend of FiveThirtyEight) Michael Lopez wrote a great, FireJoeMorgan-style, line-by-line takedown of Sharp’s most popular post. They refuted the 1-in-16,234 number (by Sharp’s own methodology it should be more like 1-in-297) and pointed out a massive data error in Sharp’s analysis of individual players (he mixed together some data that included special teams plays and some that excluded them). Matthews and Lopez also broke down team fumble rates by position, after which New England’s running backs and receivers don’t really look like major fumble-preventing outliers at all.
  • SoSH Football Central’s Daryl Sng broke down Sharp’s aforementioned data errors in even greater detail. After excluding kick and punt returns (which make no sense to include because kicking plays use league-controlled “K balls” that teams never get to prepare) and correcting for Sharp’s original mishmash of regular-season and playoff data, the players in Sharp’s sample fumbled only about 23 percent more as non-Patriots, not 88 percent as was originally stated.
  • Political scholar Bill Herman also zeroed in on Sharp’s analysis of individual players’ fumble rates with the Patriots and other teams, identifying its aforementioned methodological errors. In addition, he looked at the six players featured in Michael Salfino’s Wall Street Journal article based on Sharp’s work, finding that their difference in fumbling wasn’t statistically significant. Of those six players (Danny Amendola, BenJarvus Green-Ellis, Danny Woodhead, Wes Welker, Brandon LaFell and LeGarrette Blount), four were common to Sng’s dataset, but both analyses found a 23 percent increase in fumbling while playing for teams other than New England.
  • Like Matthews and Lopez, data analyst Tom Hayden repudiated Sharp’s assumption that his “plays per fumble” metric was normally distributed across NFL teams (a necessary condition for the 1-in-16,234 claim).
  • The harshest counterargument belonged to data scientist Drew Fustin. Fustin challenged Sharp’s choice to exclude dome teams (Sharp’s own post says outdoor teams barely fumble more often than those based in domes), instead looking at fumble rates across all teams in outdoor games only — whereupon the Patriots don’t even rank first in the NFL at fumble avoidance over the 2010-2014 period. He also questions whether Sharp’s decision to use that 2010-14 period was a case of cherry-picking the timeframe that would make the Patriots look most like an extreme outlier.