What Analytics Can Teach Us About the Beautiful Game

Jun 12, 2014

Sports analytics, no matter the field’s renegade posturing, has now been around long enough to have its own pieces of conventional wisdom. Baseball’s cognoscenti know all about the primacy of on-base percentage over batting average, and they’ve also come to realize once-treasured strategies like bunting and stealing bases are best used sparingly. In basketball, the mid-range jump shot is slowly being phased out as an inefficient relic of antiquity. Spreadsheets are shaming football coaches into rolling the dice more often on fourth downs.

But for many American fans tuning in to the World Cup this month and next, soccer’s nuggets of analytic insight remain as foreign as the game itself. There are set pieces to orchestrate, attacking strategies to plan and areas of the defense to exploit, and it isn’t always apparent which tactics are best. Analytics does, however, offer clear advice on how to do some things right.

Soccer analytics is very much viewed as a discipline in its infancy. And the sport itself is often described as especially resistant to the pull of number-crunching, whether due to its fluid nature, its sportocratic establishment culture, or a fear that the unsentimentality of data will rob the Beautiful Game of its celebrated elegance.

There’s not much truth to that. Off and on, people have been tracking relatively detailed soccer data in some form for more than six decades, up to and including the modern companies that exhaustively log every event on the pitch.

That said, WAR isn’t coming to soccer anytime soon. Most attempts to create an all-in-one statistical index for soccer players (like we have for basketball and baseball) have suffered from a distinct lack of transparency⁹ and a noticeable bias toward strikers and other scorers, whose output is most readily quantifiable. There are a number of interesting metrics at fans’ disposal, but no magic algorithm that accounts for a player’s role on his club, the system he plays in, the quality of his teammates and countless other factors. By necessity, even the individual plus/minus ratings ESPN uses for the talent portion of our Soccer Power Index fall prey to this phenomenon — we simply have to be more conservative when assessing the impact of a fullback than of a prolific goal-scorer. That makes it hard to distinguish between the value of, say, Manchester United teammates Wayne Rooney and Nemanja Vidić.

At the team level, though, the numbers offer more hope. They have the potential to provide soccer with broad strategic conventions comparable to the sabermetric-minded rules of thumb in other sports. None of these is a hard-and-fast decree, but they offer guidelines generated by actual data instead of blind hunches.

In “The Numbers Game” by Chris Anderson and David Sally — probably the definitive volume on statistical analysis in soccer — the authors tell the story of Charles Reep, a former Royal Air Force Wing Commander who was tracking play-by-play data for matches and serving as a quantitative consultant for Football League teams as early as the 1950s.

Reep’s research was quite groundbreaking for its time, even if it was fatally flawed. The Wing Commander gathered data on how often a given number of successful passes were strung together, and how frequently goals resulted from those sequences, broken down by length. Reep determined that a team’s probability of retaining possession dropped precipitously with each consecutive pass attempt, and that most goals were scored on possessions of fewer than three passes — often originating from quick counterattacks.

In Reep’s mind, this meant teams should abandon trying to control possession and maneuvering through the defense with endless passing. Instead, they should focus on getting the ball downfield in as few movements as possible on offense, and applying pressure on defense to generate opportunistic counter-rushes. The numbers seemed to suggest that the long game was the most efficient tactic for soccer success.

But subsequent analysis has discredited this way of thinking. Reep’s mistake was to fixate on the percentage of goals generated by passing sequences of various lengths. Instead, he should have flipped things around, focusing on the probability that a given sequence would produce a goal. Yes, a large proportion of goals are generated on short possessions, but soccer is also fundamentally a game of short possessions and frequent turnovers. Once you account for how often each sequence length occurs during the flow of play, it is no surprise that more goals come from shorter sequences; after all, they’re easily the most common type of sequence. But that doesn’t mean a short sequence has a higher probability of leading to a goal.
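To make the flip concrete, here is a minimal Python sketch. The counts are invented purely for illustration and are not Reep’s data; the bucket labels and function names are likewise hypothetical. It contrasts the share of all goals that come from each sequence length (the figure Reep focused on) with the probability that a sequence of that length ends in a goal.

```python
# Hypothetical counts, invented for illustration only (NOT Reep's data).
# Each entry maps a pass-sequence bucket to (number of sequences, goals scored).
sequences = {
    "0-3 passes": (8000, 80),
    "4-6 passes": (1500, 24),
    "7+ passes": (500, 12),
}


def share_of_goals(data):
    """Fraction of all goals that came from each bucket (Reep's view)."""
    total_goals = sum(goals for _, goals in data.values())
    return {bucket: goals / total_goals for bucket, (_, goals) in data.items()}


def goal_probability(data):
    """Probability that a single sequence in each bucket produces a goal."""
    return {bucket: goals / count for bucket, (count, goals) in data.items()}


shares = share_of_goals(sequences)
probs = goal_probability(sequences)

for bucket in sequences:
    print(f"{bucket}: {shares[bucket]:.1%} of goals, "
          f"{probs[bucket]:.2%} chance of a goal per sequence")
```

With these made-up numbers, the shortest sequences account for most of the goals simply because there are so many of them, while the longest sequences are the most likely to end in a goal on a per-sequence basis.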

To the contrary, a team’s probability of scoring goes up as it strings together more successful passes. The implication of this statistical about-face is that maintaining possession is important in soccer.¹⁰ There’s a good relationship¹¹ between a team’s time spent in control of the ball and its ability to generate shots on target, which in turn is hugely predictive of a team’s scoring rate and, consequently, its placement in the league table. While there’s less rhyme or reason to the rate at which teams convert those scoring chances into goals, modern analysis has ascertained that possession plays a big role in creating offensive opportunities, and that effective short passing — fueled largely by having pass targets move to soft spots in the defense before ever receiving the ball — is strongly associated with building and maintaining possession.

As for the long ball, it’s proven futile in today’s game. During the 2013-14 English Premier League season, the percentage of a team’s passes classified as “long” by WhoScored.com’s data was strongly negatively correlated¹² with how many goals it scored.¹³
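For readers who want to see how a correlation like this is computed, the sketch below calculates a Pearson correlation coefficient by hand. The six teams and their figures are hypothetical, made up for illustration; they are not the WhoScored.com numbers, and the variable names are assumptions of this example.

```python
import math

# Hypothetical team-level figures, invented for illustration only.
long_pass_pct = [9.0, 11.5, 13.0, 15.5, 17.0, 19.5]  # % of passes that are long
goals_scored = [95, 70, 68, 50, 52, 35]               # goals over a season


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(long_pass_pct, goals_scored)
print(f"Correlation between long-pass share and goals: {r:.2f}")
```

A coefficient near -1 means teams with a larger share of long passes tended to score fewer goals in this toy sample. As with the real data, a correlation on its own does not show that long balls cause low scoring.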

The same goes for trying to spearhead an offense from the wings instead of attacking up the middle. In their book, Anderson and Sally write about a seminal piece of quantitative analysis on the 1986 World Cup from researcher Mike Hughes: “Successful teams played a passing game through the middle in their own half and approached the other end of the pitch predominantly in the central areas of the field, while the unsuccessful teams played significantly more to the wings.” The numbers from the 2013-14 season in Europe’s “Big Four” leagues¹⁴ bear this out as well. The percentage of a team’s attacks made up the middle had a moderately positive¹⁵ relationship to its scoring rate relative to the league average, while the relationship between wing attacks and scoring was of the same magnitude but in the negative direction.

This, coupled with the fact that corner kicks are surprisingly ineffective at generating goals, is probably related to the negative correlation between a team’s propensity for winning aerial duels¹⁶ and its overall goal-scoring rate. By the numbers, it’s a losing bet to count on goals in the air via set pieces — or even off crosses in open play — as a steady way to generate offense, just as it is to rely on the long ball to consistently produce chances. Instead, the statistics seem to support an approach more in line with the artful tiki-taka style exemplified most notably by FC Barcelona and the Spanish national team. In soccer, data and aesthetics are not mutually exclusive, just as they aren’t in any other sport.

That’s the one bit of analytics wisdom that could stand to become more conventional. For now, though, we have a reasonably good idea of which metrics correlate with a team’s success more than others. Keep those in mind as you gorge on soccer over the next month.
