Databricks Summit: Three Things That Tested Our Roadmap

My team runs an Iceberg lakehouse on Dremio and a Kafka platform. I also built an AI data steward and put the guardrails on it myself. None of it is Databricks. I went to Databricks Data + AI Summit anyway, to pressure-test our roadmap against where the biggest vendor in the room is heading. Three talks did the testing, and they pointed the same direction: the engine you query with is becoming the swappable part, and the open, governed layer underneath it is the thing that lasts. I went in ready to change my mind. Most of what I saw told me to keep the roadmap. One thing showed me where it's thin.

"AI without context is wishful thinking," on an expo wall at Databricks Data + AI Summit 2026.

1. One catalog, many engines is real, in production

The clearest proof wasn't a keynote. It was Magnite, an adtech shop running a petabyte of Apache Iceberg, querying one copy of it from Snowflake, Spark, Databricks, DuckDB and PyIceberg at once. Their head of data engineering, Kayvon Raphael, presenting with Brian Likosar, said interoperability was the top reason they chose Iceberg, "and it's not even close," and that moving onto open formats improved performance instead of costing it: 50 to 70% lower p95 latency, around 70% lower warehouse cost, and no changes to the queries underneath. He was honest about the gap, too. Delta and Iceberg aren't merged yet, probably a year out, and a spec is not an implementation.

One Iceberg copy, many engines: Magnite's consumer-access list. via Magnite, Databricks Summit 2026.

That's the bet my team made on Dremio with an open Polaris catalog, the same architectural call, validated by someone running it at a far larger scale. A vendor session later in the day gave the idea a cleaner name, query federation versus catalog federation: the difference between reaching across sources at query time and reading one open copy off shared storage. Same shape either way. The catalog in the middle is the asset, and the engine on top is a choice you can change.

2. Managed ingestion, versus building the streaming layer ourselves

The next talk came at the same problem, onboarding a source without hand-writing a pipeline, from the opposite direction. Corteva, a seed and agriscience company, cut new data-source onboarding from 30 to 45 days down to 4 to 7. Their platform engineer Mehul Bhuva walked through how: ingestion became metadata-driven. You add a row to a config table (source, schema, destination, SCD type, an active flag) and the pipeline generates itself. No new code per source. That kills the boring part of the job: the permissions, the CI/CD and the per-source pipeline you rewrite every time are all handled once by that config row. It's the same goal we chase with one federated engine. They move the data in and let metadata build the pipeline; we leave the data where it lives and let one engine query across it. Same destination, opposite road.
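To make the config-row idea concrete, here is a minimal Python sketch. It is not Corteva's implementation. The field names mirror the columns mentioned in the talk, but the `SourceConfig` class, the `build_pipeline` and `plan` helpers, the example sources, and the reading of "schema" as a target schema name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    """One row of a hypothetical ingestion config table."""
    source: str        # where the data comes from
    schema: str        # target schema name (assumed meaning)
    destination: str   # target table name
    scd_type: int      # 1 = overwrite in place, 2 = keep history
    active: bool       # inactive rows generate no pipeline


def build_pipeline(cfg: SourceConfig) -> list[str]:
    """Turn one config row into an ordered list of pipeline steps."""
    if cfg.scd_type not in (1, 2):
        raise ValueError(f"unsupported SCD type: {cfg.scd_type}")
    target = f"{cfg.schema}.{cfg.destination}"
    if cfg.scd_type == 1:
        merge = f"upsert into {target}"
    else:
        merge = f"close changed rows and append new versions to {target}"
    return [f"read from {cfg.source}", f"validate against schema {cfg.schema}", merge]


def plan(rows: list[SourceConfig]) -> dict[str, list[str]]:
    """Generate a pipeline for every active row; no per-source code."""
    return {f"{r.schema}.{r.destination}": build_pipeline(r) for r in rows if r.active}


# Hypothetical config table: onboarding a source means adding a row here.
CONFIG_TABLE = [
    SourceConfig("s3://example-bucket/fields/", "agronomy", "fields", 2, True),
    SourceConfig("jdbc:example://erp/orders", "sales", "orders", 1, True),
    SourceConfig("s3://example-bucket/legacy/", "archive", "legacy", 1, False),
]

pipelines = plan(CONFIG_TABLE)
for table, steps in pipelines.items():
    print(table, "->", steps)
```

The point of the shape is that the only thing that changes per source is data. The generator is written once, which is also where permissions and CI/CD would be handled once in a real platform.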

The part I came to pressure-test was streaming. For real-time plant telemetry they used Zerobus, a managed Databricks service, and Harshit Rai, the Databricks engineer presenting with Bhuva, was blunt about why: it "eliminates the need to maintain a separate Kafka cluster." We run Confluent Kafka on Kubernetes (KRaft, OAuth, mTLS, schema registry, custom Connect images), so the build-versus-buy fork was sitting right there on stage. My read: a managed bus is the right call when the data has one destination and operating a broker buys you nothing. The moment you have many consumers, your own sinks, and a schema contract to hold across teams, the platform you run yourself earns its weight. Same shape as the catalog question, one layer down: how much of it you want to own. The only way I learned to tell the cases apart was running the hard one.

Corteva's streaming path: OPC UA to Zerobus, picked to avoid running a Kafka cluster. via Corteva, Databricks Summit 2026.

3. Trust under the AI, versus my steward's guardrails

If Corteva was about what to build, Warner Music Group was about what to put underneath it. Michael Jones, a senior engineering director there, kept coming back to one line: quality problems are often really explainability problems. They have 600-plus pipelines and billions of rows a day, and the question that broke trust was mundane: why do three systems report three different numbers for how many times an artist has been streamed since release? The answer turns on what you count as the release date, which in music is not even one date: Jones described it as closer to a vector, with tracks that can surface before the official date. So before any of this served a chatbot, they built a trust layer underneath it: golden tests that reconcile against the legacy sources to prove a number with worked examples, and a standing forum to settle definitions like what counts as a video stream versus a short.
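A golden test in this sense can be sketched in a few lines of Python. This is an illustration under assumptions, not WMG's test suite: the events, the two release-date definitions, the legacy system names and the golden numbers are all made up. What it shows is the mechanism: each legacy number is reproduced by naming the definition that explains it.

```python
from datetime import date

# Hypothetical stream events for one track: (day, stream count).
EVENTS = [
    (date(2026, 1, 8), 120),   # surfaced early, before the official date
    (date(2026, 1, 10), 300),  # official release day
    (date(2026, 1, 11), 450),
]

# "Release date" has more than one candidate, so each gets a name.
RELEASE_DATES = {
    "official": date(2026, 1, 10),
    "first_available": date(2026, 1, 8),
}


def streams_since_release(events, definition: str) -> int:
    """Sum streams on or after the release date under one definition."""
    start = RELEASE_DATES[definition]
    return sum(count for day, count in events if day >= start)


# Golden values: what each (hypothetical) legacy system reported,
# tagged with the definition that is supposed to explain the number.
GOLDEN = [
    ("legacy_royalty_report", "official", 750),
    ("legacy_dashboard", "first_available", 870),
]


def reconcile(events, golden):
    """Return every case where the new computation disagrees with legacy."""
    mismatches = []
    for system, definition, expected in golden:
        actual = streams_since_release(events, definition)
        if actual != expected:
            mismatches.append((system, definition, expected, actual))
    return mismatches


print(reconcile(EVENTS, GOLDEN))
```

An empty list means both legacy numbers are explained: they differ because they start counting on different dates, not because one pipeline is broken. A non-empty list is a worked example to bring to the definitions forum.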

The question that broke trust, on a slide. via Warner Music Group, Databricks Summit 2026.

That is what I built into my AI data steward. I put a hard production guardrail and a two-phase check on it before it was ever cleared to run against real data, because an agent that writes SQL is worth nothing if no one trusts the number it hands back. WMG even stores the model's generated SQL and execution logs in Postgres as explainability artifacts, the same instinct: write down what the model did so a person can check it. Nobody on stage was bolting governance on after the model shipped. They were putting it underneath, which is the same call as the open catalog and the self-run broker, one layer down: put the durable thing underneath, let the swappable thing ride on top.
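Here is a rough Python sketch of what a two-phase check plus an audit trail can look like. Treat every detail as an assumption: the two phases shown (a crude static read-only check, then an engine-side dry run) are one plausible design rather than a description of my steward or of WMG's system, the `sql_audit` table and function names are hypothetical, and SQLite from the standard library stands in for Postgres so the example runs anywhere.

```python
import sqlite3
from datetime import datetime, timezone


def phase_one_static(sql: str) -> bool:
    """Phase 1 (assumed): accept only a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def phase_two_dry_run(conn: sqlite3.Connection, sql: str) -> bool:
    """Phase 2 (assumed): have the engine plan the query without running it."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql.strip().rstrip(';')}")
        return True
    except sqlite3.Error:
        return False


def run_guarded(conn: sqlite3.Connection, sql: str):
    """Run model-generated SQL only if both phases pass; always log it."""
    ok_static = phase_one_static(sql)
    ok_plan = ok_static and phase_two_dry_run(conn, sql)
    rows = conn.execute(sql).fetchall() if ok_plan else None
    # Explainability artifact: what the model wrote and what happened to it.
    conn.execute(
        "INSERT INTO sql_audit (ts, generated_sql, passed_static, passed_plan, row_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            sql,
            int(ok_static),
            int(ok_plan),
            None if rows is None else len(rows),
        ),
    )
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streams (artist TEXT, plays INTEGER)")
conn.execute("INSERT INTO streams VALUES ('a', 10), ('b', 20)")
conn.execute(
    "CREATE TABLE sql_audit (ts TEXT, generated_sql TEXT, "
    "passed_static INTEGER, passed_plan INTEGER, row_count INTEGER)"
)

good = run_guarded(conn, "SELECT artist, plays FROM streams ORDER BY artist")
blocked = run_guarded(conn, "DROP TABLE streams")
print(good, blocked)
```

The prefix check is deliberately crude (it rejects CTEs, for instance); a real guardrail would lean on a SQL parser and on a read-only database role. The part worth keeping is the ordering: the rejected statement still lands in the audit table, so a person can see what the model tried to do.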

One thing did move. WMG also showed the piece we don't have yet: a low-latency serving tier on top of the analytical lakehouse, a managed Postgres called Lakebase, which took their serving from thirty seconds to 130 milliseconds. Our serving is Dremio reflections and a proxy API in front of them. That's the one place I left with a gap instead of a confirmation.

What I'm taking back

Three companies, three stacks, one shape. The engine stopped being the moat. Magnite, Corteva and WMG all put an open, governed layer in the middle, Iceberg or Unity Catalog, and treated the engine on top as swappable. WMG said the sharp version out loud: managed Iceberg-compatible tables are, in their words, "better than federated queries." We run federated queries on Dremio. WMG is right for what they build. We're right for what we run.

WMG's table-format call, and the line we run the other way on: "better than federated queries." via Warner Music Group, Databricks Summit 2026.

Federation wins when you can't move or own the data: forty sources, customer-facing, no mandate to migrate anyone. Their managed path wins when you build the products end to end. There's no single right answer, only a spot on the spectrum you can defend, and the teams that can say why they sit where they sit are the ones who stay fine as the tools keep moving. I came in on open Iceberg with my own guardrails. I left more sure that's the right end of the spectrum for us.
