
The Problem with “Snowberg”

Using an open lakehouse architecture – one that stores data in a user’s own cloud storage accounts using an open source storage format (like Apache Iceberg) and an open data catalog – can help organizations keep pace with the velocity of change in an ever-evolving analytics ecosystem. Iceberg has continued to grow in popularity thanks to its flexibility and functionality, and it’s backed and supported by a long list of heavy hitters in the industry, including Netflix, Apple, and Snowflake.

Unfortunately, while Apache Iceberg is all about openness and cross-engine/platform support, Snowflake’s Iceberg implementation (“Snowberg”, if you will) seems to be the opposite. With a couple of added layers of complexity and some technical quirks, it forces users into hard choices that don’t align with the full breadth of Iceberg functionality.

One major choice is whether to use managed or unmanaged Iceberg tables in Snowflake. Here’s a quick rundown of the tradeoffs:

Managed Iceberg tables

  • Snowflake compute can read/write, but other tools can’t write.
  • If you just use Snowflake’s proprietary Iceberg catalog, you’re locked into a closed ecosystem where only Spark can read your data; none of the other major platforms can read Snowberg tables.
  • If you want interoperability with other Iceberg clients, you have to also use Polaris Catalog, which introduces an additional catalog to provision, govern, and pay for.
  • Snowflake’s managed Iceberg tables don’t support table partitioning, which makes it virtually impossible to create large tables and achieve good performance. Snowflake substitutes partitioning with its proprietary clustering, which makes reads fast on Snowflake, but they’ll still be slow on external engines. And to take advantage of this clustering, you’re locked into using Snowflake compute.
  • The small file problem. Iceberg data files have to be 16 MB to match Snowflake’s internal proprietary data format for good query performance on Snowflake. So a 1 TB table will have roughly 65K files, which doesn’t sound like a lot, but a 100 TB table will have about 6.5M files. In contrast, Iceberg’s default target file size is 512 MB, 32x bigger than Snowflake managed Iceberg tables (see the quick arithmetic sketch after this list).
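To make the small file math concrete, here’s a quick back-of-the-envelope sketch in Python. The 16 MB and 512 MB targets come from the point above; exact file counts in a real table will vary with compression and layout.

```python
# Rough file-count estimate for an Iceberg table at different target data file sizes.
# Illustrative only; real counts depend on compression, partitioning, and row layout.

TB = 1024 ** 4  # bytes per terabyte
MB = 1024 ** 2  # bytes per megabyte

def estimated_file_count(table_bytes: int, target_file_bytes: int) -> int:
    """Approximate number of data files needed to hold the table."""
    return -(-table_bytes // target_file_bytes)  # ceiling division

for table_tb in (1, 100):
    snowflake_files = estimated_file_count(table_tb * TB, 16 * MB)   # Snowflake-managed target
    iceberg_files = estimated_file_count(table_tb * TB, 512 * MB)    # typical Iceberg target
    print(f"{table_tb:>3} TB table: ~{snowflake_files:,} files at 16 MB "
          f"vs ~{iceberg_files:,} files at 512 MB")
```

Running this reproduces the figures above: roughly 65K and 6.5M files at 16 MB, versus about 2K and 205K files at 512 MB. The 32x gap is simply 512 / 16.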

Unmanaged Iceberg tables

  • Other tools can read/write, but now Snowflake can’t write.
  • Requires an external Iceberg catalog (based on the Iceberg REST Catalog spec), e.g., AWS Glue, Apache Polaris, or object storage acting as the catalog.
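To show what “other tools can read/write” looks like in practice, here’s a minimal PyIceberg sketch that connects to an Iceberg REST catalog (a Polaris endpoint in this example) and reads a table. The URI, credential, warehouse, and table name are hypothetical placeholders, not values from any real deployment.

```python
# Minimal read path against an Iceberg REST catalog using PyIceberg.
# All connection details below are hypothetical placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakehouse",
    **{
        "type": "rest",
        "uri": "https://polaris.example.com/api/catalog",  # REST catalog endpoint
        "credential": "CLIENT_ID:CLIENT_SECRET",           # OAuth client credentials
        "warehouse": "analytics",                          # catalog / warehouse name
    },
)

# Load a table registered in the catalog and scan it into an Arrow table.
table = catalog.load_table("sales.orders")
arrow_table = table.scan().to_arrow()
print(arrow_table.num_rows)
```

The same table is equally readable from Spark, Trino, Flink, or any other Iceberg client pointed at the same catalog; that’s exactly the interoperability that Snowflake’s managed tables give up.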

You might be asking yourself, “shouldn’t Snowflake’s managed Polaris catalog make all this simpler?” Well… not really. Polaris is just a catalog, and you still need to configure engines to write to those tables. However, Snowflake itself doesn’t write to Polaris, making it difficult to actually use Snowflake and another tool at the same time. Snowflake’s managed Iceberg tables can only be registered in Polaris as external Iceberg tables, and even then there’s a catch… you have to use Snowflake’s proprietary catalog integration between Snowflake and Polaris in order to sync your managed Iceberg tables to Polaris. All of this complexity… and you still have to decide whether Snowflake or some other tool gets to write to those tables; you can’t have both.
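For a sense of what “configure engines to write” means in practice, here’s a rough PySpark sketch that registers a Polaris (Iceberg REST) catalog with Spark so the engine can read and write tables in it. The endpoint, credentials, catalog name, and table are assumptions for illustration, and the Iceberg Spark runtime JAR is assumed to be on the classpath.

```python
# Sketch: pointing Spark at an Iceberg REST catalog (e.g. Apache Polaris).
# Endpoint, credentials, and names are placeholders for illustration only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("polaris-example")
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "https://polaris.example.com/api/catalog")
    .config("spark.sql.catalog.polaris.credential", "CLIENT_ID:CLIENT_SECRET")
    .config("spark.sql.catalog.polaris.warehouse", "analytics")
    .getOrCreate()
)

# Writes like this only succeed if the table is writable from external engines,
# which rules out Snowflake-managed Iceberg tables synced into Polaris.
spark.sql("INSERT INTO polaris.sales.orders VALUES (1, DATE'2025-01-01', 42.00)")
```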

At Nousot, we frequently help our clients design, build, and operate architectures that take advantage of interoperability. These architectures help us provide innovative analytics solutions, maximizing the value we create for our clients. As the analytics and AI landscape continues to evolve, these open architectures will be critical in allowing organizations to quickly adapt and remain competitive. We hope Snowflake evolves into a true open lakehouse by simplifying its Iceberg support into a single Iceberg table type that allows reads/writes from any engine, supports all Apache Iceberg features, and adopts a single, open catalog that adheres to the Iceberg REST Catalog specification.


* This content was originally published on Nousot.com. Nousot and Lovelytics merged in April 2025.
