About & resume

A bit more about me

I'm a data platform engineer on a data mesh team at a marketing technology company. My work sits where data engineering, platform engineering, and distributed systems overlap. Right now that means a Dremio lakehouse, a Kafka streaming platform on Kubernetes, and an Iceberg catalog pulling on-prem and cloud data into one place. The lakehouse backs a customer-facing product and the internal teams that build on it.

Most of the job is figuring out how the pieces fit and why they break. I own whole systems: the upgrades, the on-call, and the parts that decide whether anyone trusts them. Outside work I'm usually on a mountain bike somewhere around the Bay Area.

View resume (PDF)

Trajectory

How I got here

2019
B.S. and a first data job
Finished a B.S. in Computer Science at Arizona State and started at USAA as a data engineer in Plano, Texas.
2019 – 2022
USAA: data engineering
Built ETL pipelines and runtime data-integrity controls in Python, IBM DataStage, and Airflow across Snowflake, Oracle, and Netezza. Led the first team in the org to move business-built models onto IT-supported infrastructure: 60% less code, 95% less monthly effort, about $84K saved per model a year. Tech-led a team of five toward the end.
2020 – 2024
M.S. at night, full time by day
Started an M.S. in Computer Science at Georgia Tech in 2020 and finished in 2024, taking classes at night across both jobs while working full time.
2022
Joined the data mesh team
Joined the Data Mesh team to build a new internal data platform from scratch. The team was part of Valassis Digital under Vericast at the time; RR Donnelley acquired the Valassis business from Vericast in 2024, and it now operates under RRD's Iridio marketing platform.
2022 – 2024
Cataloging, APIs, and the internal portal
Built the cataloging and lineage backbone (crawlers that load thousands of assets into the governance catalog), a data-product API layer, and the team's internal web portal.
2024
Lakehouse, from scratch
Stood up the first Dremio lakehouse on Kubernetes with an Iceberg catalog and object storage. Wired in Kerberized HDFS and federated sources, and built the first Prometheus and Grafana observability for it.
2025 – now
Production platform, streaming & AI
Hardened the lakehouse into an automated multi-environment platform in production. Added a Kafka streaming platform in staging, took on-call and HA/DR, and built AI tooling on top: an LLM data steward and an MCP server.

Work Experience

RR Donnelley

San Francisco Bay Area · Remote · Aug 2022 – Present

Senior Data Engineer · Data Platform
May 2024 – Present
Our Dremio data platform: a production service at three nines that backs a customer-facing product and about a dozen internal teams. One query engine over 30+ data sources, plus the streaming and AI tooling around it.
- Took the Dremio lakehouse from a single-VM proof of concept to a production-critical service at three-nines availability: dev, staging, and prod on Kubernetes from a maintained Helm chart, deployed through GitLab CI/CD, behind F5 ingress, with RBAC and vault-synced secrets.
- Federated 30+ data sources across six backend types (HDFS, Hive, S3, Snowflake, Postgres, Mongo) behind one SQL engine with an Apache Iceberg catalog and semantic layer, joinable in a single query.
- Cut multi-minute analytical queries to seconds by isolating heavy workloads onto dedicated reserved engines and tuning query parallelism on partition-heavy datasets.
- Own on-call, incident response, and the HA/DR posture (RTO/RPO targets, coordinator HA, cross-region backups). Resolved a production storage-auth outage in about 90 minutes, then used the window to remove the cloud-credential dependency that caused it rather than just patch it.
- Drove the build-vs-buy decision for a streaming platform through a day-2-operations bake-off, then built it on Confluent for Kubernetes (KRaft, OAuth/RBAC, TLS, schema registry, Connect) with Debezium CDC landing topics in the lakehouse as Iceberg tables.
- Built an AI data steward: a two-phase LLM pipeline that documents the catalog under a cost guardrail, an MCP server engineers query from their editor, and bronze/gold observability over its runs.
- Own SSO and OIDC for Dremio and Superset, including the migration of the platform identity provider, working with the identity and security teams on access and group mapping.
- Run Apache Superset as the BI layer over the lakehouse, plus a one-click generator that turns any dataset into a starter dashboard.
- Primary technical liaison to DevOps, networking, and cloud-security teams; integrated the first Prometheus/Grafana/JMX metrics and query/audit logging for the platform.
Data Engineer · Data Platform
Aug 2022 – May 2024
Worked on the company's Data Mesh platform supporting governed data access, analytics enablement, and internal data products.
- Stood up the team's first Dremio lakehouse on Kubernetes (an Iceberg catalog, Kerberized HDFS in HA, and federated on-prem and cloud sources): the foundation the production platform grew from.
- Built the platform's first data catalog: a nightly crawler that loaded ~4,000 data assets with schemas, lineage, and owners. Then made it boring and fast: import errors dropped from ~700 to ~30 per run and runtime from ~34h to ~13h.
- Designed the data-mesh API layer and built MuleSoft proxy APIs over internal RPC services, collapsing a three-step auth flow into one authenticated REST call across dev, QA, and prod.
- Built and launched the team's internal Next.js portal (SSO, Kubernetes, CI/CD): a data-product catalog, docs, and an in-browser query page, so people could find and use the platform.
- Configured ETL pipelines for internal consumers and data products published to the Snowflake Marketplace.
- Wrote internal documentation and engineering updates through Data Mesh Guild posts.

USAA

Plano, TX · Onsite · Sep 2019 – Jul 2022

Data Engineer
Aug 2021 – Jul 2022
Internal title: Software Engineer II
Worked on enterprise data pipelines and analytics infrastructure supporting financial forecasting and modeling workloads.
- Served as technical lead for a team of five engineers from May to July 2022, leading design discussions, code reviews, and implementation planning.
- Built batch data-processing pipelines using Domino, R, Python, Git CI/CD, and Airflow.
- Led the first team in the organization to migrate business-developed models to IT-supported infrastructure, cutting code by 60% and monthly operational effort by 95% (roughly $84K in annual savings per model).
- Developed ETL pipelines using Python, shell scripts, and IBM DataStage.
- Implemented runtime validation and monitoring to maintain data integrity across multiple pipeline stages.
- Orchestrated data workflows across Snowflake, Oracle, and Netezza databases.
- Built visualizations for financial-forecast data using React, D3, Plotly, and Tableau.
Associate Data Engineer
Sep 2019 – Aug 2021
Internal title: Software Engineer III
- Built data pipelines that moved application data into enterprise data warehouse platforms.
- Implemented validation controls to maintain data integrity across multiple stages of the pipeline.
- Developed ETL jobs using Python, shell scripts, and IBM DataStage.
- Taught quarterly internal training sessions on DataStage and ETL development practices.

Education

2024
M.S. Computer Science
Georgia Institute of Technology
2019
B.S. Computer Science
Arizona State University

Toolbox

Tools I reach for

Platforms: Dremio, Confluent, Snowflake, Superset, Collibra, MuleSoft
APIs: REST, GraphQL, OAuth / OIDC, Apache Thrift
Streaming & data: Apache Kafka, Kafka Connect, Schema Registry, Debezium, Apache Iceberg, Spark
Infra: Kubernetes, Helm, Helmfile, Docker, GitLab CI/CD, F5 BIG-IP
Observability: Prometheus, Thanos, Grafana
Languages: Python, SQL, TypeScript, Bash

Community

Around the data community

I show up where data engineers do. A couple from the field.

Jordan Lewis with Zach Wilson (DataExpert.io) at a local data engineering meetup.

Jordan Lewis with Alex Merced (Dremio) at the Data Engineering Open Forum in San Francisco.

Off the clock

Outside the day job

Jukeblox

A rhythm game I built in 48 hours for the GMTK Game Jam 2023, then kept polishing. Play it in the browser.

Mountain bike edits

My main hobby and a creative outlet. I edit and share helmet-cam recordings of rides around the Bay Area.
