For his day job, James Curley, 36, is an assistant professor of psychology at Columbia University, working on the neuroendocrinological basis of social behavior. But in his downtime he tries to solve different kinds of mysteries: What was the first soccer game he ever went to? (He remembered the vague details, but not the specifics.) How often have his two favorite teams played each other over the last century? And is soccer really as dull as some people say?
The answers to those questions were surprisingly difficult to find. But Curley used the same approach he uses in his academic career: data, lots and lots of data. By cobbling together game results from several different sources, he has compiled what is almost certainly the world’s biggest compendium of English football scores. Sitting on his GitHub page, devoid of any fanfare whatsoever, are the scores of nearly 200,000 English soccer games played in the top four leagues since 1888, the days of Jack the Ripper and Queen Victoria. These 14 megabytes can tell remarkable stories, dating back more than 125 years to the founding of the English football league.
Take the most common final score, for example. In 188,060 league games, the final tally was most often 1-0, proof, for Curley, that soccer was as low-scoring as he suspected. This result has occurred in more than 30,000 games — 16 percent of the total. Other common scores: 2-1 (about 27,000 games), 2-0 (about 22,000) and 1-1 (about 22,000).
In 85,694 games — dangerously close to half the total — at least one of the teams forgot to score at all. That led Curley to an answer for one of his questions: “Soccer is a bit dull,” he told me.
Here is the distribution of home and away teams’ goal-scoring throughout history:
Scores are likely to be low. In more than 85 percent of all games, neither team scored more than three goals.
Those low scores help lead to thousands of draws — 47,412 since the foundation of the league system, to be exact. That’s more than a quarter of all games. And 7 percent of games overall have ended with no one scoring, and no one winning — there have been 13,475 nil-nil draws.
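For readers who want to run this kind of tally themselves, here is a minimal sketch. It assumes each game is a pair of home and away goals, and it assumes a scoreline is counted winner’s goals first, so a 0-1 away win and a 1-0 home win both count as “1-0.” The function name and the sample games are hypothetical and are not taken from Curley’s files.

```python
from collections import Counter


def summarize(results):
    """Summarize a list of (home_goals, away_goals) pairs."""
    n = len(results)
    # Assumption: scorelines ignore venue, so the higher tally comes first.
    scorelines = Counter((max(h, a), min(h, a)) for h, a in results)
    return {
        "most_common": scorelines.most_common(1)[0][0],
        # Share of games that ended level.
        "draw_share": sum(h == a for h, a in results) / n,
        # Share of games in which nobody scored.
        "nil_nil_share": sum(h == 0 and a == 0 for h, a in results) / n,
        # Share of games in which at least one team failed to score.
        "shutout_share": sum(h == 0 or a == 0 for h, a in results) / n,
    }


# Hypothetical sample for illustration only, not real results.
sample = [(1, 0), (2, 1), (1, 1), (0, 0), (0, 1), (3, 2), (2, 0), (1, 0)]
summary = summarize(sample)
print(summary)
```

Run against the full data set instead of the toy sample, the same four numbers would correspond to the figures quoted above.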
In another testament to the sport’s “dullness,” draws have become more common over football’s long history. (Last season, 27 percent of games ended without a winner.) This chart shows the prevalence of drawn games:
In 1890, just 12 percent of games were drawn, and in 1977, 626 games out of 2,028, or 31 percent, were draws. While this number is down slightly today, we’re near the historical high.
That’s partly because of a decrease in scoring generally. As English soccer has wound its way through the decades, its scoring has withered. Here are the historical averages of goals scored per game, by league level:
The average number of goals per game has at times fluctuated wildly, particularly with the sudden spikes and subsequent declines in the two postwar eras. In 1925, FIFA amended the offside rule. Prior to the change, three players had to be between an attacker and the goal when the ball was passed to him. The new rule changed this to two players (typically a defender and the goalkeeper), giving more leeway to attackers, and led to a dramatic, instant increase in scoring.
The reasons for the other big shifts are less clear. Rule changes — the kinds of things that would usually explain variation in goals — are quite rare in soccer’s history.
In 1958, substitutions were allowed for the first time, but only for an injured player. This roughly corresponds with the beginning of a steep decline in scoring in the 1960s. This could make for a plausible causal explanation: Perhaps playing with an injured player left teams extremely vulnerable on defense, leading to many goals. The addition of the substitute may have mitigated these effects.
Other rule changes (the introduction of red and yellow cards in 1970, another tweak to the offside rule in 1990, the ban on goalkeepers handling back-passes in 1992) don’t seem to correlate with any major changes in scoring. In particular, the decline after 1930 and the rise after 1950 aren’t well explained. Some of these changes may be due to the evolution of football tactics, something that is laid out, for example, in Jonathan Wilson’s “Inverting the Pyramid.” In the early days, soccer featured a large number of forwards, but tactical changes led to a larger number of defensive and midfield players. The shifts in the game, and in the game theory of its tactics, may well have led to shifts in overall scoring.
In 1981 there was a rule change of another type: To calculate standings, teams were given three points for a win and one point for a draw. Before 1981, only two points were awarded for a win. This change gave teams less incentive, generally, to settle for a draw. This could have led to more aggressive play, and more goals. However, the effect may not operate in just one direction. Once a team does score, it has all the more incentive to shut the game down and hold on to win with just that one goal. The 1981 season did indeed see a small jump in goals, and goal-scoring was elevated for a few years after.
However, the change was not large and has not persisted. Goal-scoring seems to have reached something of an equilibrium in the past 30 years or so, corresponding with some of the lowest levels of scoring of the past 125 years.
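The incentive behind the three-point rule can be made concrete with a little arithmetic. The sketch below is an illustration only: the function name and the probabilities are hypothetical and are not drawn from Curley’s data. It compares a team that settles for a draw (a sure one point) with a team that pushes forward and accepts some risk of losing.

```python
def expected_points(p_win, p_loss, win_points):
    """Expected points from one game; a draw is worth 1 and a loss 0."""
    p_draw = 1.0 - p_win - p_loss
    return win_points * p_win + 1.0 * p_draw


# Hypothetical odds for a team that attacks late in a level game:
# 25% chance of winning, 35% of losing, 40% of still drawing.
settle = 1.0                                 # sit back and take the draw
attack_old = expected_points(0.25, 0.35, 2)  # two points for a win: about 0.90
attack_new = expected_points(0.25, 0.35, 3)  # three points for a win: about 1.15

print(settle, attack_old, attack_new)
```

In this simple model, with two points for a win the gamble pays only when winning is more likely than losing. With three points, it pays as long as winning is more than half as likely as losing, which is why the change could encourage more attacking play.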
Curley’s reticent about how long his mammoth database took to put together. “I’m not sure I want to tell you, actually,” he joked, “because then my wife would find out.”
Curley is also quick to add that the data did exist elsewhere — although it’s typically scattered, proprietary, or hard to access. He assembled it from the webpages of the Rec.Sports.Soccer Statistics Foundation, from other compilers and GitHub users, from ESPN’s own database, and elsewhere, and made it freely available.
“Because I believe in open access to data — I’m a strong advocate of that in science — I just generally have a view that if data is out there, and as long as it’s not owned by someone, then it’s good to have it out in the public,” he said. “I knew there were people who would enjoy it, so I thought, ‘Well, why not give it to them?’”
Curley’s academic work and soccer work overlap. Much of his academic work, for example, is concerned with pairwise contest models (contests where two entities compete at a time) and social hierarchies. These issues are often tackled with rating formulae like the Elo system, which was devised for chess and is also used to rank soccer teams. The parallel to his soccer hobby is obvious. Soccer games, after all, are pairwise contests.
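To show what a pairwise rating formula looks like in practice, here is a minimal sketch of the standard Elo update. The K-factor of 20 and the function names are assumptions chosen for illustration; soccer-specific variants typically also adjust for home advantage and margin of victory, which this sketch leaves out. It is not a description of Curley’s own models.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation for A against B (win = 1, draw = 0.5)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=20.0):
    """Return both new ratings after one game.

    score_a is 1.0 if A won, 0.5 for a draw and 0.0 if A lost.
    k (assumed to be 20 here) controls how far one result moves a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Two evenly rated teams: a win moves each rating by half of k.
print(update(1500, 1500, 1.0))
# A draw between evenly rated teams changes nothing.
print(update(1500, 1500, 0.5))
```

Because a draw counts as half a win, the many drawn games in English football move ratings only when one side was the clear favorite.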
Other psychological concepts infused our discussion of soccer. Unlike most fans of English football, Curley roots for two teams. Aston Villa is nature (Curley’s father, and his father’s father, back 100 years, were season ticket-holders); York City is nurture. Conveniently, using the data set of his own creation, he can chart his two teams’ shared history, answering one of his questions. (“Fortunately, they’ve barely ever even crossed paths. So I’ve never had to choose.”)
And like the academic he is, he’s performed a sort of peer review of other sources’ soccer data. Case in point: On Nov. 26, 1983, Doncaster Rovers and Chester played to a 0-0 draw in a fourth-tier match. This game is unknown even to ESPN’s database. But not to James Curley’s.
“An appropriately completely dull game,” he said. And just one of 188,060.
CORRECTION (Oct. 4, 5:30 p.m.): A footnote in an earlier version of this story misstated the number of teams in the top four tiers of the English football league system; there are 92 such teams, not 94.