Everyone's Burning Millions on AI Coding. I Tried to Fix the Bill — and Built the Wrong Thing.
I spent days building a proxy to cut my AI coding bill. The benchmark deleted it — and showed me the one setup change that actually cuts cost by an order of magnitude. It wasn't the model.
You've seen the headlines — the industry torching hundreds of millions of dollars in tokens on AI coding agents. My reaction was probably the same as a lot of engineers': there's got to be waste in there, and waste is a product.
So I built one. Hannah was a proxy you'd point your coding agent at. It would trim and compress the context being re-sent every turn, and route the easy work to a cheap local model while the hard work went to a frontier model. Classic middleman-takes-a-cut economics. On paper, obvious.
Then I benchmarked it properly, and the data quietly took the idea out back.
The setup
I built the same two real apps — a Next.js PWA and a mobile game — four ways: on Codex (gpt-5.5), on Claude (opus-4.8), on a local model, and through my router. Same tasks, real build-and-test verification on every run, actual token and dollar accounting.
Three things fell out, and none of them were what I expected.
1. "Subscription" coding agents are not free — they just report $0
The harness shows you nothing because it isn't pricing that path, but the API-equivalent bill is real: $0.48 to build the PWA on Codex, $0.50/M). A terser model beats a verbose one on the same task, even when it reads more context. The thing I built Hannah to optimise barely mattered.$0.95 on Claude. And the bill is dominated by output tokens ($25–30/M), not the input you're tempted to compress — cached input reads are nearly free (
2. The 10× lever isn't the model — it's the harness
This is the one that actually matters. Same model (gpt-5.5), same kind of build, two harnesses:
| Harness | Input tokens / turn | Build cost |
|---|---|---|
| Native Codex CLI | ~50–70k | ~$5–11 |
| Lean harness (Pi) | ~4k | $0.22 |
That's ~15× more context shipped per turn — purely from the harness, before a single optimisation. The native CLI re-ships a big system prompt, full tool schemas, and accumulated history every turn; the lean harness ships a tiny prompt and four tools.
And no — caching doesn't save you. The native CLI cached well here (most of its input was cache-reads), but at enough volume even near-free reads add up. The harness sends more; caching only softens the blow.
3. Routing barely pays on real work
On a from-scratch build there's no big "easy slice" to offload — the planner sent almost everything to the frontier anyway, and the local model got one or two buggy steps. The routed run ended up the most expensive and the slowest: you pay a planner tax on top of frontier executors. Routing only wins when there's a large, separable, low-risk chunk — which complex builds don't have.
The conclusion: it's a setup choice, not a product
Across both tasks, one setup was consistently the cheapest, the fastest, and the only clean first pass:
Pi + Codex. $0.22–0.48 a build, 2–4 minutes, correct on the first try. Roughly an order of magnitude cheaper than the same build in a heavyweight native harness.
And it needs no proxy, no routing, no local model — the exact machinery I'd spent days building. The whole savings story collapses to two decisions you make once, for free: use a lean harness, pair it with a terse frontier model.
The proxy was a clever answer to a question a good default already solves. Two things did survive the cull, and I think they're the real lesson:
- Cost has to be visible — per-task, per-model, in real dollars. "It says $0" is how the bill gets to nine figures.
- You cannot trust an agent's own "done." A wall-clipping game with a dead control pad passed every "no console errors" check I had. A functional check — actually run it, screenshot it — caught the bug in seconds.
I'll publish the full benchmark and the Hannah code separately for anyone who wants to pull the numbers apart. But the one-line takeaway is cheap and worth more than the proxy ever was: before you optimise the middle, look at the edges.