Major League Soccer’s (MLS) Philadelphia Union Ingests API Data Into Delta Lake for Real-Time Sports Analytics

Photo Courtesy of Philadelphia Union


Advances in technology now make it possible to gather precise, real-time data points during any sporting event. The result is better insight into hundreds, if not thousands, of variables that can give a team a strategic advantage, both on the field and in the back office, in areas such as player performance, match strategy, and building a winning season.


Since 2018, the Philadelphia Union has accumulated the most points in MLS over six seasons, including its Supporters’ Shield-winning campaign in 2020. The Union recognizes the power of data and the insights that can be derived from it, and it keeps finding new ways to feed its data appetite. The club brings in data from three sources, all accessible through APIs: Sportec Solutions (STS), Stats Perform Provision, and Catapult, which it uses to collect, analyze, and generate insights.


The challenge for the Philadelphia Union was that, with so much data arriving from three disparate sources, each serving its own purpose, the club had to find a way to import, clean, and transform the data, move it across departments, and then generate analytics using different programming languages.


Each of the three data sources, all available via API, provides different data for different purposes.

AWS is the platform of choice for the league’s governing body, and the Philadelphia Union followed suit during its integration, using Amazon S3.

Sportec Solutions (STS) – A joint venture between Deltatre and DFL Group headquartered in Munich, STS develops next-generation solutions in the fields of match data and sports technology. Founded in 2016 as a collaboration between Deltatre, the global leader in fan-first experiences, and DFL Deutsche Fußball Liga, it was conceived as an innovation-focused internal lab tasked with improving data deployment in German football.

PROVISION – The Stats Perform data platform draws on a global database covering 730,000 players across 250 competitions. It provides professional teams with a rich dataset and platforms that can reduce risk, help identify talent, and inform decision-making throughout the recruitment process.

Catapult API – The Catapult API provides access to data from Catapult’s wearable sensor products. The Union uses this data in player training to improve the health, wellbeing, and fitness of its players. Sensor data can capture parameters such as a player’s average velocity and biometrics, including heart rate, along with many other data points that help teams optimize player performance and mitigate the risk of injury. Using the Catapult API, Lovelytics partnered with the Philadelphia Union’s sports scientists to automate data ingestion for each of the organization’s four teams: Union, Union II, U17, and U15. The data is automatically populated into a Power BI dashboard, which lets performance analysts easily access a team’s training and game data for specific days, saving roughly 40 hours of work per week across the organization.
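To illustrate the kind of per-team, per-day rollup a dashboard like this might read, here is a minimal sketch. It assumes a hypothetical session schema (`SessionRecord` with `distance_m`, `duration_s`, and `avg_heart_rate`) that is not Catapult’s actual API format, and it computes average velocity simply as distance divided by time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class SessionRecord:
    """One player's session summary (hypothetical schema, not Catapult's real fields)."""
    team: str             # e.g. "Union", "Union II", "U17", "U15"
    day: date
    player_id: str
    distance_m: float     # total distance covered, in metres
    duration_s: float     # session length, in seconds
    avg_heart_rate: float  # beats per minute


def average_speed(rec: SessionRecord) -> float:
    """Average speed in m/s: total distance divided by total time."""
    if rec.duration_s <= 0:
        raise ValueError("duration must be positive")
    return rec.distance_m / rec.duration_s


def summarize_by_team_day(records):
    """Group sessions by (team, day) and compute simple averages for a dashboard."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.team, rec.day)].append(rec)
    summary = {}
    for key, recs in groups.items():
        summary[key] = {
            "players": len({r.player_id for r in recs}),
            "mean_speed_mps": mean(average_speed(r) for r in recs),
            "mean_heart_rate": mean(r.avg_heart_rate for r in recs),
        }
    return summary


# Usage with made-up sample sessions.
records = [
    SessionRecord("Union", date(2023, 5, 1), "p1", 6000.0, 3600.0, 150.0),
    SessionRecord("Union", date(2023, 5, 1), "p2", 7200.0, 3600.0, 160.0),
    SessionRecord("U17", date(2023, 5, 1), "p3", 5400.0, 3600.0, 155.0),
]
summary = summarize_by_team_day(records)
print(summary[("Union", date(2023, 5, 1))])
```

Keying the summary on `(team, day)` mirrors the "specific team, specific day" lookups described above. A real dashboard would more likely query a pre-aggregated table than recompute these values on each request.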


To eliminate manual processes, Lovelytics built an automated data ingestion pipeline that loads data into the Philadelphia Union’s Delta Lake on Databricks, saving the club 60 hours of manual work per week. Delta Lake serves as a single source of truth for various departments and provides a repository of reliable, governed data.

The pipeline fetches the required data from each API source and automatically updates the existing Delta tables, so no one has to refresh the tables manually to get the latest data. We also built a set of predefined data quality control and cleaning methods so that the Philadelphia Union can generate accurate insights automatically.
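The fetch, clean, and update cycle described above can be sketched as follows. This is a minimal illustration, not the pipeline Lovelytics built. The API response is stubbed as a JSON string, the field names and quality rules (`match_id`, `player_id`, `minutes_played` in the range 0 to 130) are hypothetical, and an in-memory dict stands in for a Delta table. On Databricks, the update step would typically be a Delta Lake MERGE, for example `DeltaTable.merge(...).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()` in the delta-spark Python API.

```python
import json
from typing import Any

# Hypothetical quality rules for an illustrative match-stats feed.
REQUIRED_FIELDS = ("match_id", "player_id", "minutes_played")
MAX_MINUTES = 130  # assumed ceiling: 120 minutes plus stoppage time


def clean_records(raw: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[str]]:
    """Apply simple quality checks; return (clean rows, rejection reasons)."""
    clean, rejected = [], []
    seen = set()
    for row in raw:
        missing = [f for f in REQUIRED_FIELDS if row.get(f) is None]
        if missing:
            rejected.append(f"missing fields {missing}")
            continue
        minutes = row["minutes_played"]
        if not isinstance(minutes, (int, float)) or not 0 <= minutes <= MAX_MINUTES:
            rejected.append(f"minutes_played out of range: {minutes}")
            continue
        # Normalize IDs before de-duplicating so " p1 " and "p1" are treated as the same key.
        row = {**row, "player_id": str(row["player_id"]).strip()}
        key = (row["match_id"], row["player_id"])
        if key in seen:
            rejected.append(f"duplicate key {key}")
            continue
        seen.add(key)
        clean.append(row)
    return clean, rejected


def upsert(table: dict[tuple, dict[str, Any]], rows: list[dict[str, Any]]) -> tuple[int, int]:
    """MERGE-style upsert: update rows whose key exists, insert the rest."""
    inserted = updated = 0
    for row in rows:
        key = (row["match_id"], row["player_id"])
        if key in table:
            updated += 1
        else:
            inserted += 1
        table[key] = row
    return inserted, updated


def run_pipeline(payload: str, table: dict[tuple, dict[str, Any]]) -> dict[str, int]:
    """One pipeline run: parse the (stubbed) API payload, clean it, upsert it."""
    rows, rejected = clean_records(json.loads(payload))
    inserted, updated = upsert(table, rows)
    return {"inserted": inserted, "updated": updated, "rejected": len(rejected)}


# Usage: two runs against the same table show an insert followed by an update.
table: dict[tuple, dict[str, Any]] = {}
first = run_pipeline('[{"match_id": 1, "player_id": "p1", "minutes_played": 90}]', table)
second = run_pipeline(
    '[{"match_id": 1, "player_id": " p1 ", "minutes_played": 85},'
    ' {"match_id": 1, "player_id": "p2", "minutes_played": 200},'
    ' {"match_id": 2, "player_id": "p1"}]',
    table,
)
print(first, second)
```

Because the second run updates the existing row rather than appending a new one, rerunning the pipeline keeps the table current without anyone refreshing it by hand. Rejected rows are counted rather than silently dropped, so quality problems remain visible.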

The pipeline runs in less than one minute per API source, saving significant staff time and making updated insights available around the clock.

Beyond building and implementing the pipeline, we provided the Philadelphia Union data and analytics team with a framework and training that enable them to ingest additional API data sources in the future without bringing us back in.
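One common way to structure such a framework is a registry. Each API source is described once by a name, a fetch function, and its key columns, and a single loop ingests every registered source. The sketch below assumes this design. `ApiSource`, `register`, and `ingest_all` are hypothetical names, the fetch functions are stubs rather than real API calls, and a dict of dicts stands in for the Delta tables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Row = dict[str, Any]


@dataclass
class ApiSource:
    """Configuration for one API source (hypothetical structure)."""
    name: str
    fetch: Callable[[], list[Row]]  # returns raw records from the API
    key_fields: tuple[str, ...]     # columns that uniquely identify a row
    transform: Optional[Callable[[Row], Row]] = None  # optional per-row cleanup


SOURCES: dict[str, ApiSource] = {}


def register(source: ApiSource) -> None:
    """Add a new source; adding a feed means registering one more config."""
    if source.name in SOURCES:
        raise ValueError(f"source {source.name!r} already registered")
    SOURCES[source.name] = source


def ingest_all(tables: dict[str, dict[tuple, Row]]) -> dict[str, int]:
    """Fetch every registered source and upsert its rows into its own table."""
    counts = {}
    for name, src in SOURCES.items():
        table = tables.setdefault(name, {})
        rows = [src.transform(r) if src.transform else r for r in src.fetch()]
        for row in rows:
            table[tuple(row[k] for k in src.key_fields)] = row
        counts[name] = len(rows)
    return counts


# Usage: two stubbed sources, one of which applies a transform.
register(ApiSource(
    "catapult",
    lambda: [{"player_id": "p1", "day": "2023-05-01", "distance_m": 6000}],
    ("player_id", "day"),
))
register(ApiSource(
    "new_vendor",
    lambda: [{"id": 7, "value": " x "}],
    ("id",),
    transform=lambda r: {**r, "value": r["value"].strip()},
))
tables: dict[str, dict[tuple, Row]] = {}
counts = ingest_all(tables)
print(counts)
```

With this design, onboarding a new feed means writing one fetch function and one `register` call, while the shared ingestion loop stays unchanged.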

Lovelytics also trained the Union’s developers in Databricks and AWS, big data concepts, data pipeline creation and best practices, and how to integrate Tableau visual analytics with the pipeline.

The AWS platform lets Lovelytics help customers govern and secure the data flowing through existing pipelines while giving them the flexibility to scale with new ones.


“The Lovelytics’ team was hands-on through the entire process. They made sure they understood our use cases to best build out our data pipelines, while training our team to manage and update our new data architecture as we build our analytics at scale.” – Addison Hunsicker, Philadelphia Union


This automated data ingestion approach can be applied in any industry that needs to bring together data from multiple sources. Want help building scalable, reliable data pipelines that generate real-time insights with business intelligence tools such as Tableau and with machine learning models?

Contact [email protected] to get started or visit us at to learn more.