Side project

SteamBangers

A data product that scores every game on Steam for bang-for-your-buck. One number, 0 to 100, the Bang Score, for how much real game you get per dollar. The pipeline was the part I worried about least. The hard part was that the obvious way to measure value optimizes for the opposite of value.

Live on steambangers.com · ~1,500 games scored · Open lakehouse · AI-free pipeline · Public API + MCP

Open steambangers.com

SteamBangers architecture: ingest to an Iceberg lakehouse to the edge

What it is

I grew up doing this math by hand. The PC was a hand-me-down and twenty dollars was a real decision, so before I bought a game I'd estimate the hours, the quality, whether it was worth it. SteamBangers is that math for the whole Steam catalog: ~1,500 games scored so far, the rest on the way.

It's the first data product I've ever shipped end to end. At work my team runs the platform other people build their products on. This is me owning the whole arc (pick the question, design the metric, ship the thing people click) on a question I've cared about since I was that kid.

Nobody owns the question I actually ask before I buy: is this worth the money. HowLongToBeat owns playtime, Metacritic owns reviews, SteamDB owns the raw data. The Bang Score is one opinionated number for value, on the one question with no incumbent. I show the score and every input behind it. I never show the recipe. A published formula gets gamed in a week; a secret score people argue about is the thing that brings them back.

It doubles as a public data-platform showcase. It ingests a handful of flaky third-party sources, matches messy entities, computes a versioned metric, and serves thousands of fresh pages. That is the same work the day job wants, except this one is clickable.

How it's built

Bronze. Raw crawl payloads on an Apache Iceberg table in R2. Schema, snapshots, time-travel, queryable straight from DuckDB or Spark.
Silver + gold. A nightly job re-scores everything and writes typed, computed tables to Neon Postgres. Deterministic, no AI in the pipeline.
Edge. Gold gets published to a Cloudflare D1 SQLite replica the site reads on every request, so a page load never waits on Postgres.
Rebuild. The whole pipeline runs nightly on Modal. An output-size guard blocks a green-but-wrong build from overwriting the live catalog.
Consume. A versioned, key-gated public API at /api/v1, an MCP server, and llms.txt, all sourced only from the public gold tables.
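
To make the Rebuild step concrete, here is a minimal sketch of what an output-size guard could look like. It is an illustration under assumptions, not the actual SteamBangers code: the function name `check_output_size`, the `GuardResult` type, and the 10% drop tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardResult:
    ok: bool
    reason: str


def check_output_size(new_rows: int, live_rows: int, max_drop: float = 0.10) -> GuardResult:
    """Decide whether a freshly built catalog is safe to publish.

    Hypothetical rule: reject an empty build, and reject a build whose row
    count fell by more than `max_drop` relative to the live catalog.
    """
    if new_rows <= 0:
        return GuardResult(False, "new build is empty")
    # live_rows == 0 means there is nothing live yet, so any non-empty build passes.
    if live_rows > 0 and new_rows < live_rows * (1 - max_drop):
        return GuardResult(False, f"row count fell from {live_rows} to {new_rows}")
    return GuardResult(True, "size within tolerance")


if __name__ == "__main__":
    # A job that exited cleanly but scored only 900 of 1,500 games is blocked.
    print(check_output_size(new_rows=900, live_rows=1500))
```

The point of a check like this is that it runs after the job has already exited successfully: a crawl that silently returned half its payloads still produces a green build, and only a comparison against the live catalog catches it before publish.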

The product itself runs zero AI: the Bang Score is a pure function of the data, so it stays reproducible and auditable, the same number from the same inputs every time.
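
A small sketch of what "pure function of the data" buys in practice. The formula below is a deliberately naive placeholder and is not the Bang Score recipe, which stays unpublished; `GameInputs`, `placeholder_score`, `scored_record`, and `SCORE_VERSION` are hypothetical names chosen for illustration. What matters is the shape: no clock, no randomness, no network, and a version tag plus an input fingerprint stored next to each result.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

SCORE_VERSION = "v0-placeholder"  # hypothetical version tag


@dataclass(frozen=True)
class GameInputs:
    price_usd: float
    median_hours: float
    review_ratio: float  # share of positive reviews, 0.0 to 1.0


def placeholder_score(g: GameInputs) -> int:
    """Stand-in metric, NOT the real recipe: capped hours-per-dollar times review ratio."""
    hours_per_dollar = g.median_hours / max(g.price_usd, 1.0)
    raw = min(hours_per_dollar / 5.0, 1.0) * g.review_ratio
    return round(raw * 100)


def scored_record(g: GameInputs) -> dict:
    """Bundle the score with the metric version and a fingerprint of its inputs."""
    payload = json.dumps(asdict(g), sort_keys=True)
    return {
        "version": SCORE_VERSION,
        "inputs_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "score": placeholder_score(g),
    }


if __name__ == "__main__":
    game = GameInputs(price_usd=20.0, median_hours=40.0, review_ratio=0.9)
    # Same inputs, same record, on every run.
    assert scored_record(game) == scored_record(game)
    print(scored_record(game))
```

With the fingerprint and version stored, any published number can be audited later: if the inputs hash and the version match, the score must match too.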

The write-up

I wrote up the two rules this project cost me a rewrite to learn: a metric that feels right can optimize for the opposite of what you want, and a green build can be the worst bug you ship. Read it on the blog →