How these 4 sports are using Data Science

Richard Benjamins    15 February, 2017
Thousands of companies around the world have started their journey to become data-driven and harness the full potential of Big Data. The world of professional sports, however, is only just starting to apply Data Science to gain a competitive advantage.

Until now, sports coaches have been able to rely on their experience and gut feelings when making decisions, and have therefore been somewhat resistant to the world of Big Data. We all saw this perfectly illustrated in Moneyball, where Brad Pitt’s character embodies the tension between human experience and data-driven decision-making.

However, things are changing. Slowly but surely, we’re starting to see more research into the role of data in sport, as well as an increasing number of jobs that involve working directly with professional sports teams to enhance their performance. But which sports are leading the way? We took a look:

1. Formula 1

As our CDO, Chema Alonso, mentioned the other day in a talk with the Movistar cycling team, Formula 1 teams are pioneers when it comes to data-driven decisions. With every race generating huge amounts of data on the track, vehicles, conditions and drivers, Williams saw a unique opportunity.

They optimized their pit stops by taking biometric measurements from the technical team, allowing them to understand when each team member functions optimally. Eventually, they reduced their pit stop time to 1.92 seconds, the fastest ever recorded.

Figure 1: Formula 1 team

2. Football

Some years ago, we obtained data from the Spanish football league for the 2012-2013 season, allowing our Data Scientists to carry out an in-depth analysis. The data was generated by cameras that take up to 10 photos per second; the images are post-processed so that individual players can be identified. In the figures below you can see heatmaps of Barcelona vs Atletico Madrid. Each area represents the field, and each team’s goal is located at the pointed end with the darkest colours. The darker the colour, the longer the players spent at that location. It becomes immediately clear that Barcelona were more of an attacking team throughout that season, unlike Atletico, who tended to take a more defensive approach.

Figure 2: Barcelona’s pitch activity (left) vs Atletico Madrid’s pitch activity (right).
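
A heatmap like the ones in Figure 2 can be built by counting how many tracking samples fall into each cell of a grid laid over the pitch. The following is a minimal sketch under stated assumptions, not the pipeline used for this analysis: the function name `occupancy_grid`, the 105 m x 68 m pitch dimensions and the grid size are hypothetical choices for illustration. Only the rate of 10 samples per second comes from the description above.

```python
def occupancy_grid(positions, pitch_length=105.0, pitch_width=68.0,
                   cols=6, rows=4, sample_rate_hz=10.0):
    """Return a rows x cols grid of seconds spent in each cell of the pitch.

    positions: iterable of (x, y) coordinates in metres, one per camera sample.
    Pitch size and grid resolution are assumed values, not taken from the data.
    """
    counts = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Ignore samples that fall outside the pitch (e.g. tracking noise).
        if not (0.0 <= x <= pitch_length and 0.0 <= y <= pitch_width):
            continue
        # Map the coordinate to a cell index; clamp the far edge into the last cell.
        col = min(int(x / pitch_length * cols), cols - 1)
        row = min(int(y / pitch_width * rows), rows - 1)
        counts[row][col] += 1
    # Each sample represents 1 / sample_rate_hz seconds of presence.
    return [[n / sample_rate_hz for n in row] for row in counts]


# Hypothetical samples: a player lingering near one corner, then the opposite one.
samples = [(10.0, 10.0)] * 20 + [(100.0, 60.0)] * 10
for row in occupancy_grid(samples, cols=2, rows=2):
    print(row)
```

The darker cells of a heatmap then correspond to the larger values in the grid. Summing the grids of all players in a team would give a team-level picture of the kind the figure shows.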

It was also possible to follow individual players, and in the images below we can see the paths of two players throughout a match. Green points show where the player ran at approximately 5 m/s (the equivalent of running 100 m in 20 seconds) and red points where he ran at approximately 7 m/s. It is clear that the first player runs much more than the second, but what does that mean? That the first player is better than the second? That they have different roles? Looking only at this data, if you were the coach, which player would you prefer to buy?

Figure 3: The “work rate” of Xavi Hernandez (left) vs Leo Messi (right).

Well, the first player is midfielder Xavi Hernandez, and the second player is Leo Messi, who doesn’t need any further introduction.  
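
The speed bands used in Figure 3 can be derived from the tracking data by measuring how far a player moves between consecutive samples. The sketch below assumes positions in metres sampled 10 times per second, as described earlier. The function names and the exact cut-offs (5 m/s for green, 7 m/s for red) are illustrative assumptions, since the article only gives approximate speeds.

```python
import math


def speeds(track, sample_rate_hz=10.0):
    """Speed in m/s between consecutive (x, y) positions given in metres."""
    return [
        math.hypot(x1 - x0, y1 - y0) * sample_rate_hz
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]


def speed_band(speed_mps, run_threshold=5.0, sprint_threshold=7.0):
    """Classify a speed into the colour bands of the work-rate plot.

    The thresholds are assumed cut-offs, not values confirmed by the source.
    """
    if speed_mps >= sprint_threshold:
        return "red"
    if speed_mps >= run_threshold:
        return "green"
    return "none"


# Hypothetical track: four samples taken 0.1 s apart along a straight line.
track = [(0.0, 0.0), (0.5, 0.0), (1.25, 0.0), (1.5, 0.0)]
for s in speeds(track):
    print(s, speed_band(s))
```

Counting the green and red samples per player gives a simple "work rate" measure. As the comparison of Xavi and Messi suggests, such a number describes a player's role rather than his value.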

3. Cycling

More recently, we had the opportunity to analyze data from the 2016 “Vuelta a España”, looking at Movistar Team’s performance. We had access to the data of 8 cyclists from the team throughout the 21 stages, from start to finish. Every second, 7 types of data are captured for each cyclist, resulting in more than 2 million data points. The variables captured include location, altitude, force, speed, heart rate and pedal rate.

Figure 4: Movistar Team

With this data, apart from analyzing individual cyclists, it becomes possible to analyze how the team works together and to understand and compare different stages. The data makes it very evident that professional cycling is a team sport with differentiated roles for the different team members: today it is impossible to win one of the main competitions “flying solo”. What we have learned is that it is important to:

• Understand when team members peak in terms of performance, so that training can be planned for those peaks to coincide with competitions.
• Determine which context variables (altitude, weather), training variables and personal cyclist variables have the greatest impact on the cyclist’s performance and subjective experience.
• Combine the roles that cyclists play in the different stages with performance and fatigue variables to plan the recovery of the cyclists and the next stages during the competition.

4. Cricket

Cricket, the most popular sport in India and the second most popular sport in the world, is also embracing the growing value of Big Data. IBM launched its #ScorewithData campaign during the Cricket World Cup; it included a Social Sentiment Index that correctly predicted who would win certain phases of the tournament.

The England cricket team have also been pioneers, and their former coach, Peter Moores, even said: “we use advanced data analytics as the sole basis for some of our decisions – even affecting who we select for the team.” Nathan Leamon, who was hired by the new head coach for his expertise in maths and statistics, also used Hawk-Eye data to build spreadsheets and run match simulations that ended up being accurate to within 5%. His analysis broke the field up into different segments for players to target when batting.

Figure 5: Big Data in the world of cricket.

As you can see, Big Data and Data Science aren’t limited to the world of big business; they are in fact affecting every part of our lives. In the context of sport, teams that want to fill their trophy cabinets any time soon will need to embrace data on and off the field.

