BI platform

Self-serve BI on Apache Superset

The BI layer over the lakehouse, plus one-click dashboards generated straight from the data catalog.

Problem

People wanted to explore lakehouse data and build dashboards without filing a ticket or learning the query tools underneath.

What I built

I run Apache Superset in staging and production on Kubernetes: a forked Helm chart, a custom image with the Dremio driver, SSO, and our dashboard assets baked in, and a Helmfile pipeline that ships both environments. It connects straight to the lakehouse for exploratory analysis. On top of it, a one-click generator builds a starter dashboard for any dataset and drops the link into that dataset's catalog page, so finding a dataset and seeing it charted are one hop apart.

Scope

The BI layer over the lakehouse, plus one-click dashboards generated from the catalog. Staging and production.

My role

I run Superset as a versioned, deployed platform and built the one-click dashboard generator.

Architecture

  • A forked Helm chart tracking upstream Superset with our own values.
  • A custom image with the Dremio driver, SSO, and dashboard assets baked in, so the container comes up ready to talk to the lakehouse.
  • A Helmfile pipeline that pins versions and ships staging and production through CI.
  • A direct connection from Superset to the lakehouse for exploratory analysis, with no extract or copy.
  • A generator profiles a dataset's columns, picks sensible charts, assembles a starter dashboard, and links it from the dataset's catalog page.
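The generator's profile-then-pick step can be sketched roughly as follows. This is a minimal illustration and not the production code: the `ColumnProfile` type, the `profile_columns` and `pick_charts` functions, the `max_categories` threshold, and the chart labels ("table", "line", "bar") are all hypothetical names chosen for the example, and the sketch works on in-memory sample rows instead of calling Superset or the catalog.

```python
from dataclasses import dataclass
from datetime import date, datetime
from numbers import Number


@dataclass
class ColumnProfile:
    name: str
    kind: str      # "temporal", "numeric", or "categorical" (hypothetical labels)
    distinct: int  # number of distinct non-null values seen in the sample


def profile_columns(rows):
    """Infer a coarse kind and a distinct count per column from sample rows."""
    profiles = []
    for name in rows[0].keys():
        values = [r.get(name) for r in rows if r.get(name) is not None]
        if values and all(isinstance(v, (date, datetime)) for v in values):
            kind = "temporal"
        elif values and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        ):
            kind = "numeric"
        else:
            kind = "categorical"
        profiles.append(ColumnProfile(name, kind, len(set(values))))
    return profiles


def pick_charts(profiles, max_categories=20):
    """Map column profiles to a list of starter chart specs (plain dicts)."""
    temporal = [p for p in profiles if p.kind == "temporal"]
    numeric = [p for p in profiles if p.kind == "numeric"]
    # Skip high-cardinality dimensions; a bar chart with hundreds of bars is noise.
    dims = [
        p for p in profiles
        if p.kind == "categorical" and p.distinct <= max_categories
    ]

    # Always start with a raw table so the dashboard is never empty.
    charts = [{"chart": "table", "columns": [p.name for p in profiles]}]
    for metric in numeric:
        if temporal:
            charts.append(
                {"chart": "line", "x": temporal[0].name, "metric": metric.name}
            )
        for dim in dims:
            charts.append({"chart": "bar", "x": dim.name, "metric": metric.name})
    return charts
```

Given sample rows with a date column, a low-cardinality text column, and a numeric column, `pick_charts(profile_columns(rows))` yields a table, one line chart over time, and one bar chart by category. Turning those specs into a real dashboard and writing the link back to the catalog page are separate steps not shown here.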

Outcomes

  • Self-serve BI over the lakehouse in two environments.
  • Discovery and visualization are one hop apart: find a dataset, see it charted.
  • Upgrades stay routine because Superset is run as a real, versioned platform: pinned versions, shipped through CI.

Stack

Apache Superset, Dremio, Kubernetes, Helm, Helmfile