BI platform
Self-serve BI on Apache Superset
The BI layer over the lakehouse, plus one-click dashboards generated straight from the data catalog.
Problem
People wanted to explore lakehouse data and build dashboards without filing a ticket or learning the query tools underneath.
What I built
I run Apache Superset in staging and production on Kubernetes, built from three pieces: a forked Helm chart; a custom image with the Dremio driver, SSO, and our dashboard assets baked in; and a Helmfile pipeline that ships both environments. Superset connects straight to the lakehouse for exploratory analysis. On top of it, a one-click generator builds a starter dashboard for any dataset and drops the link into that dataset's catalog page, so finding a dataset and seeing it charted are one hop apart.
Scope
The BI layer over the lakehouse, plus one-click dashboards generated from the catalog. Staging and production.
My role
I run Superset as a versioned, deployed platform and built the one-click dashboard generator.
Architecture
- A forked Helm chart tracking upstream Superset with our own values.
- A custom image with the Dremio driver, SSO, and dashboard assets baked in, so the container comes up ready to talk to the lakehouse.
- A Helmfile pipeline that pins versions and ships staging and production through CI.
- A direct connection from Superset to the lakehouse for exploratory analysis, with no extract or copy.
- A generator profiles a dataset's columns, picks sensible charts, assembles a starter dashboard, and links it from the dataset's catalog page.
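The generator's first two steps (profile the columns, then pick charts) can be sketched roughly as below. This is an illustration under stated assumptions, not the production code: the names `ColumnProfile`, `profile_columns`, and `pick_charts`, the three column kinds, the `max_categories` cutoff, and the chart labels are all hypothetical, and the chart labels are not Superset chart type identifiers. It profiles a small in-memory sample of rows rather than querying the lakehouse, and it stops before assembling a dashboard or linking it from the catalog.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any


@dataclass
class ColumnProfile:
    name: str
    kind: str      # "temporal", "numeric", or "categorical" (hypothetical labels)
    distinct: int  # number of distinct non-null values in the sample


def profile_columns(rows: list[dict[str, Any]]) -> list[ColumnProfile]:
    """Classify each column from a sample of rows (column names come from the first row)."""
    names = list(rows[0].keys()) if rows else []
    profiles = []
    for name in names:
        values = [r.get(name) for r in rows if r.get(name) is not None]
        if values and all(isinstance(v, (date, datetime)) for v in values):
            kind = "temporal"
        elif values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            kind = "numeric"
        else:
            # Strings, booleans, mixed types, and all-null columns land here.
            kind = "categorical"
        profiles.append(ColumnProfile(name, kind, len(set(values))))
    return profiles


def pick_charts(
    profiles: list[ColumnProfile], max_categories: int = 20
) -> list[dict[str, str]]:
    """Map column profiles to starter chart specs using simple rules."""
    temporal = [p for p in profiles if p.kind == "temporal"]
    numeric = [p for p in profiles if p.kind == "numeric"]
    categorical = [p for p in profiles if p.kind == "categorical"]

    charts: list[dict[str, str]] = []
    # One trend line: first numeric column over the first temporal column.
    if temporal and numeric:
        charts.append({"chart": "line", "x": temporal[0].name, "y": numeric[0].name})
    # Row counts per category, skipping empty or high-cardinality columns.
    for p in categorical:
        if 0 < p.distinct <= max_categories:
            charts.append({"chart": "bar", "x": p.name, "y": "count"})
    # A distribution for every numeric column.
    for p in numeric:
        charts.append({"chart": "histogram", "x": p.name})
    return charts
```

Calling `pick_charts(profile_columns(sample_rows))` on a sample with a date column, a low-cardinality text column, and a numeric column yields one line chart, one bar chart, and one histogram. Keeping the rules this simple keeps the starter dashboard predictable; a user can refine it in Superset afterwards.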
Outcomes
- Self-serve BI over the lakehouse in two environments.
- Discovery and visualization are one hop apart: find a dataset, see it charted.
- Upgrades stay routine because Superset is treated like a real, versioned platform.