Snowflake has recently rebranded (or at least repackaged) their managed Polaris offering as Snowflake Open Catalog. Let’s dig in and see exactly what this means for people looking to utilize the solution and for the Iceberg community at large.
Overview
Although Snowflake has seemingly scrubbed the references to their open source Apache Polaris project from the documentation, Snowflake Open Catalog appears to be a hosted version of that tool. What that means is that consumers using the service will get access to a catalog that implements Iceberg’s REST API, making it compatible with a number of platforms including Apache Spark, StarRocks, Trino, and Snowflake itself.
Now, when I say “catalog” here, it refers to a catalog in more of a technical sense. Open Catalog will help you track metadata like tables, columns, and data types as well as storage locations across different compute platforms, but it isn’t exactly going to be a competitor to an Alation or an Atlan. Those tools provide data consumers with everything they need to understand their data, including table and column descriptions, data quality information, lineage, and so on. For now, at least, these features don’t appear to be part of Snowflake’s solution. Some of them are available as part of Snowflake’s Horizon Catalog offering, but that’s a different beast at the moment (more on that later).
Setup
If you already have a Snowflake account (or once you sign up for one), setting up a new catalog is simple enough – just open Snowsight (Snowflake’s UI) and walk through a quick setup process.
One thing to note about the implementation of Open Catalog is that it functions as a separate account within Snowflake. So even if you’re already using Snowflake, your Open Catalog will have its own set of users, permissions, connections, etc. that have to be managed. It also appears to be missing any sort of single sign-on integration at the moment, which may be a large barrier to entry for many enterprise customers.
Integration with Snowflake
With that catalog account being separate from any existing Snowflake accounts, there are extra steps required to either read Open Catalog tables in Snowflake or sync Snowflake tables back to the catalog. These steps are fairly straightforward, but they feel like unnecessary overhead for two services hosted by the same provider. The process also requires managing an extra set of credentials in Snowflake, which I would prefer not to have to do within a single service.
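To make that overhead a bit more concrete, here’s a rough sketch of what pointing a Snowflake account at an Open Catalog endpoint might look like using the Python connector. All of the account, credential, and catalog values are placeholders, and the CREATE CATALOG INTEGRATION parameters are my approximation of Snowflake’s documented syntax, so treat this as illustrative rather than copy-paste ready.

```python
import snowflake.connector

# Connect to the regular Snowflake account (all values are placeholders).
conn = snowflake.connector.connect(
    account="<org>-<account>",
    user="<user>",
    password="<password>",
)

# Rough sketch of a catalog integration pointing Snowflake at Open Catalog.
# The parameter names approximate Snowflake's CREATE CATALOG INTEGRATION
# syntax; verify them against the current documentation before relying on them.
conn.cursor().execute("""
    CREATE CATALOG INTEGRATION open_catalog_int
      CATALOG_SOURCE = POLARIS
      TABLE_FORMAT = ICEBERG
      CATALOG_NAMESPACE = 'default'
      REST_CONFIG = (
        CATALOG_URI = 'https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog'
        WAREHOUSE = '<open_catalog_name>'
      )
      REST_AUTHENTICATION = (
        TYPE = OAUTH
        OAUTH_CLIENT_ID = '<client_id>'
        OAUTH_CLIENT_SECRET = '<client_secret>'
        OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
      )
      ENABLED = TRUE
""")
```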
Unfortunately, Iceberg tables in Open Catalog also fall prey to some of the challenges I mentioned in a previous post about Snowflake’s Iceberg support. In this case, any Snowflake-managed Iceberg tables that you sync over to Open Catalog will be read-only in Open Catalog, and any Open Catalog-managed tables will be read-only in Snowflake. This obviously creates a challenge for anyone trying to build an open lakehouse where multiple engines can write to the same tables.
Another major challenge of having the accounts be separated is the split approach to data governance. Snowflake already has their aforementioned Horizon Catalog solution that provides the ability to define and enforce security policies on assets that are internal to Snowflake. Policies defined in Horizon aren’t currently synced over to Open Catalog (nor are Open Catalog policies synced back to Horizon), meaning that separate sets of policies need to be created and kept up to date for both systems. To further complicate the matter, the privileges that can be granted are different between the two catalogs. While Horizon has more traditional data warehousing-style privileges (e.g. SELECT, INSERT, UPDATE, etc.), Open Catalog has a separate set (e.g. TABLE_READ_PROPERTIES, TABLE_READ_DATA, TABLE_WRITE_DATA, etc.). This makes keeping the two systems in sync a difficult proposition.
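As a rough illustration of why that sync is painful, here’s the kind of translation table a team would have to invent and maintain themselves to map Horizon-style grants onto Open Catalog privileges. The correspondences below are my own assumption for illustration only; neither product provides or enforces such a mapping.

```python
# Hypothetical mapping from Horizon-style privileges to Open Catalog
# privileges. This correspondence is an assumption for illustration only;
# nothing in either product provides or enforces such a translation.
HORIZON_TO_OPEN_CATALOG = {
    "SELECT": ["TABLE_READ_PROPERTIES", "TABLE_READ_DATA"],
    "INSERT": ["TABLE_WRITE_DATA"],
    "UPDATE": ["TABLE_WRITE_DATA"],
}

def translate_grant(horizon_privilege: str) -> list[str]:
    """Return the Open Catalog privileges that roughly cover a Horizon grant."""
    return HORIZON_TO_OPEN_CATALOG.get(horizon_privilege, [])

print(translate_grant("SELECT"))  # ['TABLE_READ_PROPERTIES', 'TABLE_READ_DATA']
```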
External Integrations
One of the draws of using an open data catalog is the ability to connect a number of different tools and services to a single version of the data and metadata. This prevents the need to move data around between systems and instead allows a number of different platforms to interact with the data where it already resides.
Thanks to its implementation of the Iceberg REST API, this is where Open Catalog hits its stride. Connecting to the catalog from a compute engine like Apache Spark is as easy as creating credentials in Open Catalog (called “service connections”) and adding those credentials to your Spark configuration. Any platform that runs Spark can use this approach to interact with the catalog, providing interoperability across vendors and even clouds.
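For example, a minimal PySpark session wired up to Open Catalog’s REST endpoint might look like the sketch below. The catalog name, URI, and credential values are placeholders for what the service connection provides, and the Iceberg runtime version is just one plausible choice.

```python
from pyspark.sql import SparkSession

# Minimal sketch of a Spark session pointed at an Iceberg REST catalog.
# The uri, credential, and warehouse values come from the Open Catalog
# service connection and are placeholders here.
spark = (
    SparkSession.builder.appName("open-catalog-example")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.open_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.open_catalog.type", "rest")
    .config("spark.sql.catalog.open_catalog.uri",
            "https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog")
    .config("spark.sql.catalog.open_catalog.credential", "<client_id>:<client_secret>")
    .config("spark.sql.catalog.open_catalog.warehouse", "<catalog_name>")
    .config("spark.sql.catalog.open_catalog.scope", "PRINCIPAL_ROLE:ALL")
    .getOrCreate()
)

# Once connected, the catalog behaves like any other Spark catalog.
spark.sql("SHOW NAMESPACES IN open_catalog").show()
```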
Other tools like Apache Flink and Trino have a similar configuration process, granting access to the catalog with relative ease. These tools can then all seamlessly interact with the data in place, enabling a wide range of possibilities across your data stack.
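The same pattern extends beyond JVM engines. As one more hedged example, a lightweight Python client like PyIceberg can talk to the same REST endpoint with a handful of properties (again, all connection values below are placeholders for the service connection details):

```python
from pyiceberg.catalog import load_catalog

# Sketch of connecting PyIceberg to the same REST endpoint used above;
# all connection values are placeholders for an Open Catalog service connection.
catalog = load_catalog(
    "open_catalog",
    **{
        "type": "rest",
        "uri": "https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog",
        "credential": "<client_id>:<client_secret>",
        "warehouse": "<catalog_name>",
        "scope": "PRINCIPAL_ROLE:ALL",
    },
)

print(catalog.list_namespaces())
```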
Pricing
Snowflake’s documentation currently states that Open Catalog will be free for six months after its general availability date and will then be billed per request starting in mid-2025. Once that billing period starts, their consumption table has Open Catalog billed at a rate of 0.5 credits per million requests. Credits typically start around $2 for standard service tiers but can go all the way up to almost $10 for higher service tiers. For most implementations these costs should be reasonable, but with a number of engines connected to the catalog and constantly sending requests, there is the potential for larger costs.
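As a rough back-of-the-envelope check using those published numbers, the sketch below estimates a monthly bill for a given request volume; the 100 million requests figure is purely hypothetical.

```python
# Back-of-the-envelope cost estimate using the published rate of
# 0.5 credits per million requests. The request volume is hypothetical.
CREDITS_PER_MILLION_REQUESTS = 0.5

def estimated_monthly_cost(requests: int, dollars_per_credit: float) -> float:
    credits = (requests / 1_000_000) * CREDITS_PER_MILLION_REQUESTS
    return credits * dollars_per_credit

# e.g. 100 million requests a month across several connected engines:
print(estimated_monthly_cost(100_000_000, 2.0))   # 100.0 at ~$2/credit
print(estimated_monthly_cost(100_000_000, 10.0))  # 500.0 at ~$10/credit
```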
Wrap Up
If you’re looking for a hosted technical catalog for Iceberg tables that can be easily hooked into different platforms, Snowflake Open Catalog delivers on that promise. If you’re looking for a more robust cataloging solution to help make sense of data assets across your data platform, though, it seems Snowflake still has a ways to go. I think Snowflake is on the right track with trying to build a more open data platform, but so far their implementation leaves a little to be desired.
Interested in learning more about Nousot’s approach to creating a modern data platform or how to implement a holistic data catalog across your organization? Reach out to us at [email protected].
* This content was originally published on Nousot.com. Nousot and Lovelytics merged in April 2025.
