How Baseball’s New Data Is Changing Sabermetrics


Every March since 2012, sabermetricians have gathered in Phoenix for their own version of spring training: the SABR Analytics meeting, which serves as a showcase for some of the latest developments in baseball analysis.

I made the pilgrimage to the desert for this year’s edition, expecting the modern baseball research conference’s usual emphasis: how to communicate sabermetric insights to coaches, players and executives — a worthy (if not groundbreaking) endeavor. However, the conference brought sabermetrics back to its roots in the data instead. And the game’s newest methods of collecting information, including such diverse offerings as the radar tracking system Statcast and neurological monitoring, have the potential to upend a number of sabermetric truths we once thought settled.

Judging from the long shadow it cast at the conference, Statcast might be the chief disruptor of Sabermetrics 1.0. Daren Willman of MLB Advanced Media and Mike Petriello of MLB.com demonstrated the power of the system to monitor factors as varied as the spin on Jake Arrieta’s curveball and Kevin Kiermaier’s defensive range. Sabermetric evaluations of defense, in particular, may benefit greatly from Statcast, as analysts will be able to more precisely measure all aspects of defensive play — from a fielder’s first step to his maximum range and the velocity of his throws.

For example, during the first wave of sabermetric defensive measurements, shifts especially confounded our ability to home in on a player’s true fielding skill. Statcast addresses that flaw by measuring the positioning of every fielder before each pitch, and when that information eventually becomes public it will undoubtedly reshape our defensive metrics.

Baseball Info Solutions analyst Scott Spratt offered one potential transformation at SABR, with a new model for integrating shifts into fielding stats such as Defensive Runs Saved. In situations where several fielders could make a play on the ball because of an extreme shift — which places several players on one side of the field, making it difficult for current metrics to apportion individual credit for a defensive run saved — the model gives some of the credit to the whole team. According to Spratt, the Tampa Bay Rays (not surprisingly, one of the most sabermetrically savvy teams in baseball) led the league in these separate, team-based runs saved on shifts last year.
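The general idea of moving some credit from the individual to the team can be sketched in a few lines of Python. This is not Spratt’s model, whose details were not given here. The `Play` record, the `extreme_shift` flag and the 50 percent `team_share` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Play:
    fielder: str         # player who actually made the play
    runs_saved: float    # run value assigned to the play
    extreme_shift: bool  # True if several fielders could have made it


def apportion_credit(plays, team_share=0.5):
    """Split runs saved between individual fielders and a team bucket.

    team_share is an assumed fraction of credit moved to the team on
    extreme-shift plays; it is not a published parameter.
    """
    if not 0.0 <= team_share <= 1.0:
        raise ValueError("team_share must be between 0 and 1")
    individual = {}
    team = 0.0
    for play in plays:
        # Only extreme-shift plays send any credit to the team bucket.
        share = team_share if play.extreme_shift else 0.0
        team += play.runs_saved * share
        individual[play.fielder] = (
            individual.get(play.fielder, 0.0) + play.runs_saved * (1.0 - share)
        )
    return individual, team


# Made-up plays, purely for illustration.
plays = [
    Play("SS", 0.8, extreme_shift=False),
    Play("2B", 0.6, extreme_shift=True),
    Play("SS", 0.4, extreme_shift=True),
]
individual, team = apportion_credit(plays, team_share=0.5)
```

Whatever split is chosen, the total runs saved stay the same. Only the bookkeeping changes, which is why a team that shifts often can pile up a large team-level total without inflating any one fielder’s number.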

Other talks at the conference dug deeper into Statcast’s exit velocity information. Most notably, Brian Cartwright, creator of the Oliver projections at FanGraphs, discussed how exit velocity alters our view of defense-independent pitching statistics. DIPS theory, the idea that pitchers bear no responsibility for the results of balls in play, is one of sabermetrics’ most treasured counterintuitive insights, but Cartwright showed that a ball’s velocity off the bat is partly attributable to the pitcher (even if the batter deserves more of the credit). He also broke down exit velocity by angle and explained that even the fly balls allowed by ground-ball pitchers travel at a lower angle, making them more difficult for the defense to field. For instance, Andrew McCutchen and his fellow Pirate outfielders were notably harmed by their ground-ball pitching staff’s tendency to allow these low screamers.

Most analyses of exit velocity so far have concluded that, contrary to DIPS, pitchers do vary some in their ability to prevent hits. So even if, generally speaking, a pitcher’s fielding-independent metrics are more predictive than his ERA, Cartwright’s results suggest that pitchers still deserve some credit in a given year for the batting average they allow on balls in play. As we come to better understand the granular data from Statcast, it’s possible that popular DIPS metrics such as fielding-independent pitching (FIP) will become outmoded.
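For readers who have not seen the two statistics at issue, here is a minimal Python sketch of the commonly published formulas for fielding-independent pitching and batting average on balls in play. The constant of 3.10 is an assumed, typical value (the real one is recalculated for each league and season), and the stat line is invented for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding-independent pitching, using the commonly published weights.

    ip is innings pitched as a true decimal (200 1/3 innings is 200.333...,
    not the box-score shorthand 200.1). constant is an assumed value that
    puts FIP on the ERA scale.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Invented season line, purely for illustration.
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
example_babip = babip(h=170, hr=20, ab=750, k=200, sf=5)
```

Notice that FIP uses only home runs, walks, hit batsmen and strikeouts. Nothing that happens to a ball in play enters the calculation, and that is precisely the omission the exit velocity research calls into question.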

New data could also revamp our understanding of player health. Injuries are one of the last remaining unknown areas in sabermetrics, partially because they are not tracked in the box score or other sources of data. (Although the disabled list gives some injury information, it’s nowhere near complete, as many major leaguers play through pain and discomfort.) But Baseball Info Solutions began tracking injury data in 2015, with stringers manually rating every incident in which a player limped on the bases or was struck by a foul ball.

Unsurprisingly, Joe Rosales of Baseball Info Solutions reported at SABR that catchers suffer by far the largest injury burden, thanks primarily to foul balls and backswings from the batter. Rosales also showed that catchers coming off games with multiple injuries to the head see a reduction in offensive performance for the next few days. That injuries have an impact on performance isn’t shocking, but gathering the data to prove it is a big step forward.

Finally, there’s another source of data even more exotic than exit velocity and defensive positioning. A company called deCervo specializes in monitoring the brain activity of athletes as they perform tasks such as pitch recognition. DeCervo’s software simulates the flight of a pitch and asks users to decide whether it will be in or outside of the strike zone. In a separate game, users can practice their pitch recognition by identifying the pitch type based on its motion. Using a combination of techniques, deCervo CEO Jason Sherwin showed that certain areas of the brain light up as athletes monitor the flight of the “pitch” and make the split-second decision to hit a button to react.

Sherwin had preliminary results that showed correlations between neurological readouts and performance (for example, on-base percentage), so deCervo’s technology could be promising for identifying athletes with major league potential. And even without any neural monitoring, it allows athletes to “gamify” their training by attempting to distinguish the motion path of different pitches at varying speeds and arm angles based on real PITCHf/x data. Sherwin said he believed this kind of software would offer a new way for athletes to sharpen their pitch recognition skills.

What many of these new data sources have in common is an emphasis on process. Outcomes — strikes, walks, home runs and so forth — are already well-tracked and have been scrutinized by sabermetricians for decades. But the new generation of data will allow analysts to understand how those outcomes are generated, perhaps even down to the level of a player’s brain activity. Some of this process-oriented data challenges cherished analytics theories like DIPS; some of it confirms the utility of sabermetric dogma like shifting. And some of it will probably advance our understanding of baseball in ways we can’t yet predict.

Disclosure: The author works as a statistical consultant for the Toronto Blue Jays.