Data Engineering Is Going Full-Stack

At the Data Engineering Open Forum in San Francisco earlier this year, a keynote by Jerry Wang, who runs data development infrastructure at Airbnb, stuck with me. He'd spent fifteen years across Airbnb, Apple, Netflix, and Meta, and his call to action was that the data engineer's job is no longer managing pipelines. It's owning the data's entire lifecycle. The specialist roles that used to sit at each layer of the stack are collapsing into one person. The shorthand a lot of people are using for it is the full-stack data engineer.

The model on the slide was a single role spanning the whole lifecycle: data modeling and ETL, the platform and tooling, the semantic layer, data products and APIs, and the analytics on top. One engineer, the whole stack.

I sat there realizing that's the shape my career already took. Not on purpose. Every job just kept pulling me one layer up.

How I ended up across the stack

I started in modeling and ETL. Batch pipelines in Python and shell, jobs in an enterprise ETL tool, and runtime checks that caught bad data before it reached a warehouse. The work was moving data between systems and making sure it stayed correct.

Then the platform. I took a Dremio lakehouse from a proof of concept to a production service on Kubernetes: three environments, automated deploys, SSO, and the on-call that keeps it up. One SQL engine over 40 sources, backing a customer-facing product. That's where I spend most of my time, and it's still the center of gravity.

Then the layers on top of it. A semantic layer, so a query reads from how data is actually used instead of where it happens to sit. Proxy APIs that fold a multi-step internal call into one request. A Kafka platform, in staging, landing topics in the lakehouse. Superset for the BI.

Then the AI. An LLM data steward that documents a development catalog, with a freshness gate and the guardrails that make an automated writer safe to trust.

None of it was a plan. Each piece was just the next thing that was broken or missing.

Why AI is the thing pushing this


The reason each layer used to need its own specialist was friction. Documenting a catalog by hand is a job nobody wants, so it doesn't happen. Wiring up a new source took a ticket and a week. A new dashboard meant waiting on someone.

AI tooling cuts that friction. The steward I built writes catalog docs against a development catalog, work that used to be nobody's job. Codegen can turn a new connector or a Superset dashboard into a short task instead of a sprint. When the cost of touching a layer drops, one engineer can reasonably own more of them. That is the whole argument.

What full-stack does not mean


It does not mean you are an expert at everything. I'm not. It means you own the seams between the layers, not just the boxes.

And the seams aren't only technical. The layers are owned by different people: ingress is DevOps, SSO is the identity team, secrets are security. Owning a seam means owning the handoff to a team you don't sit on. On a fully remote team, that means earning trust with people you've never met, over calls and docs, before any of the stack above the seam is yours to build on. The hardest part of going wide wasn't learning the next layer. It was the people between them.

The depth still matters. A platform that can't hold three nines of availability is not saved by its engineer also knowing BI. Full-stack is range on top of one thing you are genuinely deep in. For me that is the platform and the reliability around it.

What I'd take from this

If you are a data engineer wondering where to put your energy, here are three things I would act on.

Go deep before you go wide. Pick one layer and earn real expertise in it before you spread out. Mine is the platform and the reliability around it, and everything else is reach I added on top of that core. Width without a deep core is just shallow everywhere, and people can tell.

Treat the seams as the real work. Owning more layers is rarely blocked by the next tool. It is blocked by the handoffs to the teams that own the pieces you depend on. Getting good at earning trust across teams, especially remote, is what actually lets you go wide, and it is the part the tooling will not do for you.

Use AI as the lever, not the headline. The reason one engineer can reasonably own more now is that codegen and LLM tooling drop the cost of touching a new layer, so wiring up a new connector or documenting a catalog becomes a short task instead of a sprint. Learn to wield it and your reach compounds.

The walls between the specialist roles are coming down. That is the bet I am making, mostly because it is the job I already have.

