Statkeepers Call the Shots, But They Can’t Agree on Them


In the third minute of extra time in Tuesday’s Belgium-U.S. World Cup match, Belgium’s Kevin De Bruyne took a pass in the box, dribbled to his right and hooked the ball into the left side of the net. Finally, after 31 shots, the Belgians had broken through. Or … wait. Was it 32 shots?

It depends on which Twitter account you follow. ESPN’s Stats & Information Group tweeted that Belgium had scored on its 31st shot of the day. OptaJoe, the U.K. Twitter account of the soccer stats company Opta, said it was the 32nd.

At the World Cup, shots are in the eye of the beholder. At least three major soccer stats companies are logging every match, and they have yet to all agree on each team’s number of shots and shots on goal. For every one of the 58 games so far, the companies can’t quite get their stories straight. Sometimes their counts have differed by as much as two or three.

Even small discrepancies like these have repercussions beyond mere trivia. Advanced analyses of the sport, such as my colleague Benjamin Morris’s magnum opus on Lionel Messi this week, rely on match loggers for shot counts and characteristics. Some teams base tactics and personnel decisions partly on stats. And the disputes are proxy battles for soccer’s more philosophical debates: If a shot is deflected in a forest of defenders, was it on target?

According to World Cup organizer FIFA, it was; but according to Opta and Prozone, two of the companies that employ analysts to log every match of the tournament and provide data for media coverage, it wasn’t. That disagreement is responsible for the bulk of the numbers mismatch. Through the round of 16, FIFA’s official match stats, which are being collected by the Italian company Deltatre, included 68 percent more shots on target than Prozone’s, and 74 percent more than Opta’s. Remove blocked shots, though, and the discrepancies drop to 4 percent and 8 percent, respectively.

And what about a ball crossed in the box near the goalie — does it count as a shot or a cross? In the 120th minute of the Belgium-U.S. match, DeAndre Yedlin kicked the ball well wide of goal as the U.S. hunted desperately for an equalizer. Was he trying to score, or just to cross the ball? FIFA thinks the latter, but Opta thinks the former. Short of interviewing every player immediately after every subjective touch, the statkeepers are left to guess at the intent, divining purpose in actions that may have been performed instinctively, rather than with premeditation.

With 58 of the tournament’s 64 matches in the books through Friday, there have been 116 opportunities to compare the three data providers on a team’s shooting profile in a match. There have been just 14 times, or fewer than one out of eight, that all three organizations counted the same number of shots and shots on goal for a team in a match — and none for both teams in the same match.
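
The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration of the counting described above, not the analysis actually used for this article: the `all_agree` helper and the per-provider numbers in `example` are hypothetical, invented to show what a disagreement looks like.

```python
from typing import Dict, Tuple

# One provider's (shots, shots on goal) for one team in one match.
Counts = Tuple[int, int]


def all_agree(by_provider: Dict[str, Counts]) -> bool:
    """Return True when every provider logged identical counts."""
    return len(set(by_provider.values())) == 1


# Figures stated in the article.
matches_played = 58
team_matches = matches_played * 2   # two teams per match
full_agreement = 14                 # team-matches where all three matched
agreement_rate = full_agreement / team_matches

# Hypothetical counts for one team in one match (NOT real data).
example = {
    "Deltatre": (31, 12),
    "Opta": (32, 11),
    "Prozone": (31, 11),
}

print(team_matches)                 # number of comparison opportunities
print(round(agreement_rate, 3))     # share with full agreement
print(all_agree(example))           # False: the providers differ
```

A single differing count from any one provider is enough to break agreement, which is part of why full agreement is so rare.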

The counts appear to reflect genuine disagreement over tricky cases — touches that look like passes to some but shots to others, say. Or, a shot that hits the post or crossbar and goes out. Typically these don’t qualify as shots on target, but they can if they are deflected onto the woodwork by the goalkeeper, who then gets credit for a save. If they are blocked onto the woodwork by a player other than the goalkeeper, that’s a block. The stats, then, pivot on an arbitrary criterion: Was the player who deflected the ball a goalkeeper or did he happen to play another position?

My analysis showed that, overall, the companies weren’t consistently stingy or generous in their statkeeping. No provider consistently tallied many more shots or shots on goal than another. The major philosophical divide was over (unblocked) shots on goal: Deltatre sees more than Prozone, which sees more than Opta. But that amounted to only about one additional shot on target counted in every three matches.

The disputes have touched every team, to similar degrees, but teams with less active offenses tend to have higher differences among statkeepers because one uncounted shot matters more in their overall percentages. These include the U.S., England and Cameroon. Analysts attempting to study whether Cameroon threw its matches, as Der Spiegel has reported, might get subtly different results depending on which set of stats they consult. So might England manager Roy Hodgson and U.S. manager Jurgen Klinsmann as they assess how to improve their teams.
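
To see why one disputed shot weighs more heavily on a low-shooting team, here is a small sketch. The `relative_gap` function is a hypothetical helper, and the 8-versus-9 figures are made up for illustration; the 31-versus-32 pair echoes the Belgium example from the top of this article.

```python
def relative_gap(shots_a: int, shots_b: int) -> float:
    """Gap between two providers' counts, as a fraction of the lower count."""
    low, high = sorted((shots_a, shots_b))
    if low <= 0:
        raise ValueError("the lower count must be positive")
    return (high - low) / low


# Hypothetical low-volume team: one provider logs 8 shots, another 9.
low_volume = relative_gap(8, 9)

# High-volume case from the article: 31 shots versus 32.
high_volume = relative_gap(31, 32)

print(f"{low_volume:.1%}")    # one shot out of eight
print(f"{high_volume:.1%}")   # one shot out of 31
```

The same one-shot dispute is several times larger, in relative terms, for the team that shoots less.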

Discrepancies between data providers don’t stop at shot counts. Most soccer events are subjective. Someone must decide: Was that a tackle? Was that shot weak? Was that attack a dangerous one? Possession stats also differ by provider, as Slate noted last week.

Shooting stats have particular relevance for one form of analysis that tries to divine a team’s true skill by gauging whether or not they’re getting lucky. It’s a technique that’s based on the theory that generating chances is the part that teams can control — converting them is based more on luck (unless you’re named Lionel Messi). Teams that convert and save a high percentage of their chances are due for a regression in their results. Change the underlying data, and any conclusions about which teams are good and which are just lucky could shift.
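
One simple way such a luck indicator is sometimes built is to add a team’s conversion rate to its save rate, an approach similar in spirit to the measure analysts call PDO. The sketch below assumes that formula, which this article does not specify, and uses invented numbers for two hypothetical providers scoring the same team. The `luck_index` name is also an assumption.

```python
def luck_index(goals_for: int, on_target_for: int,
               goals_against: int, on_target_against: int) -> float:
    """Conversion rate plus save rate; higher values suggest more luck."""
    if on_target_for <= 0 or on_target_against <= 0:
        raise ValueError("shots on target must be positive")
    conversion = goals_for / on_target_for
    save_rate = 1 - goals_against / on_target_against
    return conversion + save_rate


# Same hypothetical team: 4 goals scored, 2 conceded (NOT real data).
# Provider A logs 10 shots on target for and 10 against.
index_a = luck_index(4, 10, 2, 10)
# Provider B logs 12 shots on target for and 9 against.
index_b = luck_index(4, 12, 2, 9)

print(round(index_a, 3))
print(round(index_b, 3))
```

The goals never change, only the shot counts do, yet the two providers’ data would paint the team as lucky to different degrees.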

Garth Lagerwey, general manager of Major League Soccer’s Real Salt Lake, said in a telephone interview last week that data discrepancies are a prevalent problem in soccer stats at all levels, not just a World Cup anomaly. When I contacted the companies, they declined to comment or didn’t respond to a question about why their numbers differ. In other contexts, they tout the training they provide to match analysts; the consistent guidelines they enforce across analysts, competitions and time; and the oversight of experienced checkers. Some shots just might not look like shots to everyone.

Other sports’ stats also require subjective judgment: errors in baseball; assists in basketball. But in baseball and basketball, the official scorer’s decision is what goes into the record book and, generally, what fuels advanced statistical analysis. In soccer, with different leagues and competitions worldwide at varying levels of stats sophistication, third parties with standardized methods report alternative numbers to the official ones. Opta and Prozone are scoring every match alongside the official scorers and releasing their numbers in real time to media organizations, hence the potential for conflicting tweets like those about Tuesday’s Belgium-U.S. match.

“Shots should not be that subjective, let alone shots on goal,” Lagerwey said. On the other hand, “A lot of companies use human beings to code this stuff. It’s easy to understand how you’re going to have an error rate.”