Writing

Blog

Write-ups from building data infrastructure. Mostly the bugs that cost me an afternoon, and what I changed so they wouldn't cost the next person one.

— May 27, 2026 · 3 min read

I Built an AI Data Steward. The Hard Part Wasn't the AI.

A pipeline that documents a data catalog with an LLM sounds like a prompt-engineering problem. Almost none of my time went to prompts. It went to making the boring parts trustworthy.

— May 12, 2026 · 3 min read

When Your Identity Provider Lies About Group Claims

We wired group-based admin access through OIDC, granted the right group, and nobody got in. The token was the problem, and no amount of server config could fix it.

— Apr 28, 2026 · 3 min read

Credential Vending Only Works If Your Storage Speaks STS

An Iceberg REST catalog can hand short-lived storage credentials to query engines. That feature quietly assumes your object store implements one specific AWS STS call. Ours didn't.

— Jun 12, 2025 · 2 min read

Self-Serve BI, and Dashboards You Don't Have to Build

A lakehouse is only useful if people can see the data. I run Apache Superset over ours in staging and prod, and I wired up a one-click generator that builds a starter dashboard for any dataset and links it from the catalog.

— Oct 18, 2024 · 2 min read

Building the Data Platform's Front Door

A platform nobody can find doesn't get used. I built the team's internal portal so people could discover data products, read the docs, and try a query, all behind SSO and shipped on Kubernetes.

— Mar 12, 2024 · 3 min read

One REST Call Instead of a Three-Step Auth Dance

Our internal services spoke an RPC dialect and made you sign in three times just to make one call. I designed the data mesh API layer and built MuleSoft proxies that turn all of it into a single authenticated REST request.

— Jun 20, 2023 · 3 min read

Crawling a Data Catalog Into Collibra Every Night

One of the first things I built. The company's datasets lived across dozens of scheduled jobs with no current inventory, so I wrote a crawler that finds them all and syncs them into the governance catalog on its own, every night.