Major League Soccer’s (MLS) Philadelphia Union Ingests API Data Into Delta Lake for Real-Time Sports Analytics

Photo Courtesy of Philadelphia Union

Innovations In Analysis

Advances in technology make it possible to gather precise, real-time data points during any sporting event. The result is better insight into hundreds, if not thousands, of variables that might give a team a strategic advantage, both on the field and in the back office, in player performance, match strategy, and building a winning season.

Philadelphia Union Is Hungry For More Data To Drive Insights

Since 2019, the Philadelphia Union have accumulated the most points in MLS over three seasons, including their Supporters’ Shield-winning campaign in 2020. The Union recognize the power of data and the insights that can be derived from it, and are finding ever more innovative ways to feed their data appetite. The club collects, analyzes, and generates insights from data delivered through the APIs of three providers: Second Spectrum, Stats Perform ProVision, and Catapult.

A Goal For Data

The challenge for the Philadelphia Union was that the data came from three disparate sources, each with its own purpose. The team had to import, clean, and transform the data, move it across departments, and then use different programming languages to generate analytics. Performed manually, this process took the data and analytics team around 6-9 person-hours to deliver a final dashboard.

Different Data Sources, Different Data Purposes

Each of the three data sources, all available via API, provides different data for a different purpose.

Second Spectrum is the official tracking provider for both the Premier League and MLS, and provides a detailed breakdown of every game. The Second Spectrum repository offers rich data on game stats, player tracking, and details such as the number of shots on goal by a player and the player’s position when taking each shot, all of which can be used to create a match report.

ProVision, the Stats Perform data platform, draws on a global database covering 730,000 players across 250 competitions. It gives professional teams a rich dataset and tools that can reduce risk, help identify talent, and inform decision-making throughout the recruitment process.

The Catapult API provides access to data derived from Catapult’s wearable sensor products. The data is used in player training to improve players’ health, wellbeing, and fitness. Sensor data covers parameters such as a player’s average velocity and biometrics, including heart rate, along with many other data points that can help teams optimize player performance and mitigate the risk of injury.

Lovelytics Gets The Assist 

To eliminate the manual process, Lovelytics built an automated data ingestion pipeline that loads data into the Philadelphia Union’s Delta Lake on Databricks. Delta Lake serves as a single source of truth for the various departments and provides a repository of reliable, governed data.

The pipeline fetches the required data from the API sources and updates the existing Delta tables automatically, removing the need to refresh the tables by hand to get the latest data. In addition, we built a set of predefined data quality control and cleaning methods so that the Philadelphia Union’s insights are generated automatically from accurate, validated data.
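
The fetch, clean, and update cycle described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the pipeline Lovelytics built: the `fetch` callable, the `match_id` and `player_id` key fields, and the quality rules are all hypothetical, and a plain dictionary stands in for a Delta table. On Databricks, the upsert step would more likely be expressed as a Delta Lake `MERGE`.

```python
from __future__ import annotations

from typing import Callable, Iterable

Record = dict
Key = tuple

# Hypothetical key columns; the real sources define their own identifiers.
KEY_FIELDS = ("match_id", "player_id")


def clean(record: Record) -> Record:
    """Example cleaning rule: strip surrounding whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def is_valid(record: Record) -> bool:
    """Example quality rule: every key field must be present and non-empty."""
    return all(record.get(field) not in (None, "") for field in KEY_FIELDS)


def upsert(table: dict[Key, Record], records: Iterable[Record]) -> dict[str, int]:
    """Insert new rows and overwrite existing ones, keyed on KEY_FIELDS."""
    counts = {"inserted": 0, "updated": 0, "rejected": 0}
    for raw in records:
        record = clean(raw)
        if not is_valid(record):
            counts["rejected"] += 1
            continue
        key = tuple(record[field] for field in KEY_FIELDS)
        counts["updated" if key in table else "inserted"] += 1
        table[key] = record
    return counts


def run_ingestion(fetch: Callable[[], Iterable[Record]],
                  table: dict[Key, Record]) -> dict[str, int]:
    """One pipeline run: pull the latest records, then upsert them."""
    return upsert(table, fetch())


def fake_fetch() -> list[Record]:
    # Stand-in for an API call; a real pipeline would issue an HTTP request here.
    return [
        {"match_id": "m1", "player_id": "p7", "shots_on_goal": 3},
        {"match_id": "m1", "player_id": " p9 ", "shots_on_goal": 1},
        {"match_id": "m1", "player_id": None, "shots_on_goal": 2},
    ]


# The table already holds an older row for player p7.
table: dict[Key, Record] = {
    ("m1", "p7"): {"match_id": "m1", "player_id": "p7", "shots_on_goal": 2},
}
summary = run_ingestion(fake_fetch, table)
print(summary)
```

Because rows are keyed on their identifiers, running the same ingestion twice leaves the table unchanged, which is what makes an unattended, scheduled refresh safe.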

The automated pipeline runs in less than 1 minute for each API source, saving significant person-hours and making updated insights available around the clock.

In addition to building and implementing the automated data ingestion pipeline, we provided the Philadelphia Union data and analytics team with a framework and training that enable them to ingest additional API data sources in the future without needing to bring us back in.
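
One way such a framework could be structured is as a registry, so that adding a source means registering one function rather than writing a new pipeline. The sketch below is an assumption about the general shape, not the framework delivered to the Union; the `register_source` decorator, the table names, and the sample payloads are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

FetchFn = Callable[[], list]

# Maps a target table name to the function that fetches rows for it.
SOURCES: dict[str, FetchFn] = {}


def register_source(table_name: str):
    """Decorator that adds a fetch function to the registry (hypothetical API)."""
    def decorator(fetch: FetchFn) -> FetchFn:
        if table_name in SOURCES:
            raise ValueError(f"source already registered for {table_name!r}")
        SOURCES[table_name] = fetch
        return fetch
    return decorator


@register_source("tracking_events")
def fetch_tracking() -> list:
    # Placeholder payload; a real fetcher would call the provider's API.
    return [{"event_id": 1, "type": "shot"}]


@register_source("wearable_sessions")
def fetch_wearables() -> list:
    # Placeholder payload with made-up field names and values.
    return [
        {"session_id": 10, "avg_velocity_mps": 4.2},
        {"session_id": 11, "avg_velocity_mps": 3.9},
    ]


def run_all() -> dict[str, int]:
    """Run every registered source and report how many rows each returned."""
    return {name: len(fetch()) for name, fetch in SOURCES.items()}


row_counts = run_all()
print(row_counts)
```

With this layout, onboarding a fourth provider would amount to writing one decorated fetch function; the shared run loop, and any shared cleaning or loading steps, would not need to change.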

Lovelytics also trained the developers, upskilling them in Databricks, big data concepts, data pipeline creation, best practices, and how to integrate Tableau visual analytics with the data pipeline.


“The Lovelytics’ team was hands-on through the entire process.  They made sure they understood our use cases to best build out our data pipelines, while training our team to manage and update our new data architecture as we build our analytics at scale” – Addison Hunsicker – Philadelphia Union

Not Just For MLS Teams

The automated data ingestion pipeline solution can be applied in any industry that needs to bring together data from multiple sources. Want help building scalable, reliable data pipelines that feed real-time insights into business intelligence tools such as Tableau and into machine learning models?

Contact [email protected] to get started or visit us at to learn more.
