Benchmark evidence
Hannah already has a concrete benchmark in the repo: a browser Pac-Man build run through Codex with default optimization versus plain passthrough. The point is not abstract token theory; it is measurable savings on a real coding-agent task while still passing the quality gate.