
Gaining Insights Into Your HL7 Data With Smolder and Databricks (#1 of 3)

This is Blog #1 in a 3-part blog series on HL7 and healthcare data interoperability.

What is HL7?

If you’ve worked with healthcare data, chances are you’ve heard of HL7v2. For those who aren’t familiar, HL7v2 is a messaging standard used to electronically exchange clinical data between the systems of a healthcare organization. It has been adopted by healthcare institutions across the world to transmit large volumes of healthcare data in real time. While this standard is great for exchanging information, it was not designed with analytics in mind. In its raw form, HL7v2 data is difficult to analyze and requires a significant amount of pre-processing. This poses a problem for healthcare organizations that want to query large volumes of data to quickly gain insights.

Smolder Solves the Problem

One way to ingest and perform ETL on large volumes of HL7v2 data is to use Smolder within Databricks.

Smolder is an open-source library that makes it very easy to parse HL7v2 data in Spark. It enables you to load HL7 messages directly into Spark DataFrames and integrates seamlessly with Delta Lake.

There are a few prerequisite steps that need to be taken in order to get Smolder set up on your cluster so you can start parsing HL7v2 data. We’re going to make this process as simple as possible so you can start working with your data sooner. 

Smolder Library Jar File for Spark

First, the library needs to be compiled as a JAR file. We’ve gone ahead and done this step for you. All you need to do is download the JAR file from here.

Databricks Workspace

Start a cluster in Databricks as per your requirements. Once the cluster is running, navigate to the “Libraries” tab and select the “Install new” button. Click on “Drop JAR here” and select the JAR file to install it on the cluster, or simply drag and drop the JAR file.

Now that Smolder is installed on the cluster, we can start using it to parse our .hl7 files. For instance, we have an HL7 file consisting of 400 ADT_A01 (Patient Admit/Visit) messages that was uploaded to FileStore via File → Upload data. The HL7 file was obtained from Simulated Hospital and can be downloaded here. 

We can read in our HL7 file via the Spark DataFrameReader API, specifying a line separator so that Spark can distinguish between messages. However, looking at the schema, each message is still a single string value, which we need to separate into two columns: Message and Segments.
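To make the role of the line separator concrete, here is a minimal plain-Python sketch, independent of Spark and Smolder, of splitting a file's contents into individual messages. It assumes that segments within a message end with a carriage return (the HL7v2 segment terminator) and that messages are separated by a newline; the separator in your own file may differ. The sample content and the `split_messages` helper are hypothetical and only for illustration.

```python
from typing import List

# Hypothetical sample: two ADT_A01 messages. Segments end with "\r" (the HL7v2
# segment terminator); the "\n" between messages is an assumption about the file.
RAW_FILE = (
    "MSH|^~\\&|SIMHOSP|SFAC|RAPP|RFAC|20200501140643||ADT^A01|1|T|2.3\r"
    "PID|1||1001^^^SIMULATOR MRN^MRN||Doe^Jane\r"
    "\n"
    "MSH|^~\\&|SIMHOSP|SFAC|RAPP|RFAC|20200501150000||ADT^A01|2|T|2.3\r"
    "PID|1||1002^^^SIMULATOR MRN^MRN||Roe^John\r"
)


def split_messages(raw: str, message_sep: str = "\n") -> List[str]:
    """Split file contents into one string per HL7 message, dropping empty chunks."""
    return [chunk for chunk in raw.split(message_sep) if chunk.strip()]


messages = split_messages(RAW_FILE)
print(len(messages))           # number of messages found
print(messages[0].split("\r")[0])  # header (MSH) segment of the first message
```

Each resulting string is still one undifferentiated blob of text per message, which is the same situation the schema shows after the initial read.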

Smolder makes this very easy with the “parse_hl7_message” function. Once it is applied, our HL7 data is parsed and the raw data is separated into messages and segments.
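As a rough illustration of what "messages and segments" means, the plain-Python sketch below splits a single HL7v2 message into its header line and a list of segments, each with an identifier and its fields. This is not Smolder's implementation; the `parse_message` helper, the `id` and `fields` key names, and the sample message are assumptions made for this example.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[str, List[str]]]


def parse_message(raw: str) -> Dict[str, Union[str, List[Segment]]]:
    """Split one HL7v2 message into its MSH header line and remaining segments."""
    # Segments are terminated by a carriage return in HL7v2.
    lines = [line for line in raw.split("\r") if line]
    header, rest = lines[0], lines[1:]
    # The field separator is the character right after "MSH" (usually "|").
    field_sep = header[3]
    segments: List[Segment] = []
    for line in rest:
        seg_id, *fields = line.split(field_sep)
        segments.append({"id": seg_id, "fields": fields})
    return {"message": header, "segments": segments}


# Hypothetical ADT_A01 message, for illustration only.
sample = (
    "MSH|^~\\&|SIMHOSP|SFAC|RAPP|RFAC|20200501140643||ADT^A01|1|T|2.3\r"
    "PID|1||1001||Doe^Jane\r"
    "PV1|1|I\r"
)

parsed = parse_message(sample)
print(parsed["message"])
for segment in parsed["segments"]:
    print(segment["id"], segment["fields"])
```

Keeping each segment as an identifier plus a list of fields makes it straightforward to filter by segment type (for example, PID for patient identification) in later processing.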

Now that we’ve parsed our data, we can begin processing it to build out our ETL pipelines, which we outline in the next blog. Click here to read Blog #2 in the 3-part series.

Lovelytics is a preferred partner of Databricks and has helped many clients install and configure their Databricks instances. To learn more please visit us at www.lovelytics.com/partners/databricks or connect with us via email at [email protected]

Healthcare interoperability is a focus for Lovelytics and Databricks. Please join us at the Data + AI Summit on June 27-30, 2022 in San Francisco to see healthcare interoperability in action. Click here to register to attend.
