Data Platform Engineer · SF Bay Area
Jordan Lewis
I take data platforms from proof of concept to production. Right now that's a data lakehouse on Kubernetes: one query engine over 40 data sources, backing a customer-facing product and about a dozen internal teams. There's a Kafka streaming platform in staging and AI tooling on top, plus the on-call and disaster recovery that keep it running.

The platform I run, by the numbers
Selected work
Things I've built
- 01
Platform
Production data lakehouse on Dremio + Kubernetes
A production-critical service at 99.9% uptime. It powers a customer-facing SaaS product and is queried directly by a dozen internal teams, data scientists, and engineers.
Dremio · Apache Iceberg · Kubernetes · Helm · HDFS · Snowflake · Grafana
- 02
Streaming
Kafka platform on Confluent for Kubernetes
Self-hosted, secured real-time data streaming, running in staging. The streamed data lands straight in the lakehouse, ready to query.
Apache Kafka · Confluent · KRaft · OAuth · Debezium · Apache Iceberg
- 03
AI / Platform
AI data steward + MCP server
Writes documentation for a data catalog every day, and answers questions about it from an engineer's code editor.
Python · LLM · MCP · Apache Iceberg · Superset · Prometheus
- 04
BI platform
Self-serve BI on Apache Superset
The charts-and-dashboards layer over the lakehouse, plus one-click dashboards built straight from the data catalog.
Apache Superset · Dremio · Kubernetes · Helm · Helmfile
Latest thinking
Data Engineering Is Going Full-Stack
In a Data Engineering Open Forum keynote, Airbnb's Jerry Wang argued that AI is collapsing the data stack into one role: the full-stack data engineer who owns the whole lifecycle. Looking at my own work, I can see that shift has already happened to me.
Let's talk.
Happy to talk data platforms, lakehouses, or where AI actually earns its keep in infrastructure. LinkedIn is the fastest way to reach me.