Parameters Undefined: Crowdsourcing the Best MLB Foul Ball Seats


 

There are 9540 posts tagged #foulball on Instagram.

Make that 9541.

Crowdsourcing fan engagement is rapidly on the rise in major league sports. And the data collection possibilities are nearly infinite.

But for every big data solution created, there are just as many big data hurdles.

Or at least this is what a small group of baseball stats geeks quickly learned as they attempted to gather information on the only statistic not tracked in MLB – foul ball locations.

What is crowdsourcing exactly?

Crowdsourcing, open source, gamification, social media, big data. We hear these terms with regularity and use them even more regularly.

But to understand how sports technology can make use of crowdsourcing to collect data, it is important to know where it all started. To Wikipedia we go…

Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.

Or in other words, getting fans to do work for you.

Jeff Howe and Mark Robinson, editors at Wired Magazine, coined the term “crowdsourcing” in 2005. What was happening was like “outsourcing to the crowd,” which quickly led to the portmanteau “crowdsourcing.”   

This is not to be confused with open source, which refers to a strategy that allows the crowd to access code or data for developing a product or predicting an outcome. This difference is the topic of a very insightful article by Simon Phipps.

Crowdsourcing + Fans = Foul Ball Data

Three years ago, a small group of baseball fans with statistics backgrounds, and minimal technology skills, watched as a fan caught a foul ball at Safeco Field.

Soaked by the sun during a rare rainless day and fueled by the Pacific Northwest’s finest microbrews, the group wondered, “Can we use statistics to find the seat with the highest probability of sitting in the sun, drinking a cold beverage, and catching a foul ball?”

Also known as the “Ferris Bueller.”

The group, in search of the “ideal seat,” reached out to MLB and was informed that no database of foul ball locations existed. Anywhere. In a sport where every statistic is tracked, analyzed, cross-analyzed, and over-analyzed, foul balls are a statistical anomaly.

After some initial research, the IdealSeat team estimated that slightly over 30 foul balls a game entered the stands, for a total of 100,000 foul balls in any given year. With an increasing number of fans using mobile devices at the ballpark, crowdsourcing through gamification emerged as the leading option.

Many fans have asked, “Why can’t one simply use basic knowledge of batted ball patterns to find the best foul ball location?” Thousands of ball “hawks” chase fouls each year without the benefit of data. One ball hawk, Zack Hample, has caught 7,000 balls using only knowledge gained from watching the game.

It is true that there are generally four main foul ball areas in any given game. The first two are straight back, at a slight angle that differs for lefty and righty batters. The other two are down the lines, just outside first base and third base. Lefty batters generally foul balls to the third base side and righties to the opposite side of the diamond.

But what one cannot conclude from the above is where specific pitcher-batter match-ups might send a foul ball. IdealSeat Advisor Sam Fuld has a theory that certain hitters can foul balls off on key counts to extend the at-bat. Think Ichiro and Derek Jeter. And the locations of their foul balls are likely predictable as well.

With the goal of collecting a full season of data at Safeco Field, the group set to work. But they wondered, “Would the quality of the information rank high enough for MLB standards?” And, “How much data would a crowdsourced system generate?”

As it turned out, the answers were less favorable than anticipated.

Modern crowdsource applications

Crowdsourcing, it turns out, is nothing new. The first recorded instance of collecting information through the crowd was with the Oxford English Dictionary, which solicited definitions from its readers in the mid-19th century.

Wikipedia followed the tradition of crowdsourcing information in 2001 with the launch of its now ubiquitous website, which hosts 30 million user-generated pages in 287 languages.

Crowdfunding smartly combined the concepts of crowdsourcing and finance starting in 2006, with Kickstarter and Indiegogo the biggest standouts. The industry is estimated to be worth $5.1 billion as of 2013. Not too shabby.

But Instagram and Twitter take the crowdsource title.

Instagram, founded in 2010, has 150 million users. Twitter now has over 500 million users who post 340 million tweets a day. Within both platforms lies a powerful tool for crowdsourcing data collection and mining for random or not-so-random information.

Interestingly, sports initiatives that use dedicated crowdsourcing techniques are less common than one might think. Outside of using social media to tag and mine data, few applications stand out.

But a few efforts have taken the crowdsource approach to innovate and engage.

MLB has done some great work engaging the crowd to collect information and create products. During the 2013 Bases Coded competition, MLB offered “participating teams the opportunity to hack at the convergence of sports and technology while utilizing MLBAM’s private data API.” Through this process, developers gained insight into the sports-tech machine that is BAM, while the tech giant collected a range of innovative gaming concepts.

The NFL is not far behind in the crowdsource game. During Super Bowl XLVIII, the league crowdsourced voting between Broncos and Seahawks fans to determine the color of the Empire State Building. The aptly named “crowdvoting” taps into public opinion much like a traditional vote, and then uses the information to create or design something unique. New stadiums in San Francisco and Atlanta promise to have technology and crowdsourcing incorporated at every step of the experience.

On the court, the NBA works social media to its advantage, with one of the most dynamic presences on Twitter. Basketball’s most impressive crowdsource campaign to date is Dallas Mavericks owner Mark Cuban’s competition to have fans design a new uniform for the team, which yielded some very innovative results.

In alternative sports, ASP Surfing has led a quiet revolution in video, data, and crowdsourced fan engagement. During webcasts of contests from around the globe, fans are asked to participate and interact to provide information about their personal experience. Surfing is a very “social” sport and the ASP has clearly captured the crowd.

Even groups like the Surfrider Foundation, a global organization representing surfers and the coast, have tapped into the crowd. Like the leaders of the major sports leagues, Surfrider CEO Jim Moriarty says the key to crowdsourcing is to “track engagement levels across this network and direct our resources towards helping people engage better, faster and easier. The internet is a distributed network, that architecture matches distributed organizations like Surfrider’s very well.”

Better sports data and a better planet. Awesome. And there are some pretty geeky data points that can be collected.

Parameters well defined

The initial challenge of tracking foul ball locations appeared easy enough. Unfortunately, the quantity and quality of data did not generate themselves. Instead, the IdealSeat team asked a new question – “What amount of data is needed to solve this statistical challenge?”

“And how can we guarantee the quality of the information?”

The answer to both came in one form. Research teams.

Tracking foul balls at Safeco Field

Over the course of two years, the group developed dedicated research teams at six MLB stadiums. Each member attends a moderate number of games, generally 5-10, but collects a large amount of data per game. Approximately 100 balls per game, both fair and foul, are tracked.

A “modified crowdsource” model significantly increases the number of plays recorded and the quality of information collected. This strategy leads to more precise data due to higher user frequency and a known number of games per year, providing enough confidence in the data that the system can accurately advise fans on ticket purchases.
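
To make the idea concrete, here is a minimal Python sketch of how tracked balls might be turned into a per-section foul ball rate. The record layout, the function name, and the sample section numbers are all invented for illustration; this is not IdealSeat's actual schema or method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BattedBall:
    """One ball logged by a research-team member (hypothetical record layout)."""
    game_id: str
    section: str   # seating section where the ball landed, "" if it stayed in play
    is_foul: bool


def fouls_per_game_by_section(observations):
    """Return {section: average foul balls per tracked game}, highest rate first."""
    games = {ob.game_id for ob in observations}
    if not games:
        return {}
    # Count only foul balls that actually reached a seating section.
    counts = Counter(ob.section for ob in observations if ob.is_foul and ob.section)
    rates = {section: n / len(games) for section, n in counts.items()}
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


# Made-up sample data: two tracked games, two seating sections.
sample = [
    BattedBall("g1", "135", True),
    BattedBall("g1", "135", True),
    BattedBall("g1", "", False),
    BattedBall("g2", "135", True),
    BattedBall("g2", "118", True),
]
rates = fouls_per_game_by_section(sample)
```

Dividing by the number of tracked games is what the known game count makes possible: a section's total only means something once it is compared against how many games were actually observed.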

As of mid-season 2014, the group had collected nearly 7,500 data points using the modified crowdsource model, with a goal of reaching 10,000 total at 10 stadiums by year-end. Beyond this season, the scalability is nearly infinite throughout baseball and beyond, with the potential to reach 100,000 data points per year on batted balls, weather conditions, concession locations, and fan experience feedback.

Initial results have shown definite trends in foul ball locations beyond the classic four zones that ball hawks have relied upon for years. Trends at the park level are emerging and it is also clear that fans need not purchase a premium ticket to have a high probability of bringing home a souvenir.

Once the parameters are well defined, crowdsourcing data collection is a powerful tool for sports technology. Understanding the scale of the problem, knowing the amount of data needed to reach confidence, and following a focused collection strategy make all the difference.
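
One rough way to put a number on “the amount of data needed to reach confidence” is the textbook sample-size formula for a proportion. The sketch below assumes each recorded foul ball is an independent observation and that the goal is to estimate the share of fouls landing in one section to within a chosen margin. The function name and the example figures are assumptions for illustration, not the team's own calculation.

```python
import math
from statistics import NormalDist


def samples_needed(expected_share, margin, confidence=0.95):
    """Observations needed to estimate a proportion to within +/- margin.

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / margin^2.
    """
    if not 0 < expected_share < 1:
        raise ValueError("expected_share must be between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * expected_share * (1 - expected_share) / margin ** 2)


# Hypothetical case: a section expected to receive about 5% of all fouls,
# estimated to within one percentage point at 95% confidence.
n = samples_needed(0.05, 0.01)
```

Under these assumptions the hypothetical case calls for a bit over 1,800 recorded foul balls, and halving the margin roughly quadruples the requirement, which is why a focused collection strategy matters more than raw enthusiasm.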

Even for finding the ideal seat to sit in the sun, drink a cold beverage, and catch a foul ball.