Lawrence Huibuilds AI · writes in public
01Home02Projects03Writings04About05Resume07Email
01Developer infraOpen source

Hannah

Cache-aware proxy for Codex and Claude Code that cuts token waste, improves cache efficiency, and makes session costs measurable.

Hannah sits between coding agents and the model API and rewrites only the expensive parts, cutting wasted tokens while leaving the cheap passthrough alone.

P
Measured proof

Benchmark evidence

Hannah already has a concrete benchmark in the repo: a browser Pac-Man build run through Codex with default optimization versus plain passthrough. The point is not abstract token theory; it is measurable savings on a real coding-agent task while still passing the quality gate.

Input tokens per turn
52,22933,951
Prompt-cache hit rate
~10%59%
Tool schema per turn
54 KB7 KB
Wall-clock
11m 41s10m 28s
01
Hannah

Core idea

Hannah is a practical response to the fact that coding-agent sessions waste tokens differently depending on the harness. Codex and Claude Code do not benefit from the same optimization strategy, so a useful proxy should adapt per harness rather than applying one global trick.

02
Hannah

What it proves

The benchmarked Codex path cut re-sent input significantly while raising cache-hit rate and keeping task quality intact. The broader point is that operational AI systems need measurement and harness-aware control, not just blind prompt optimization.

  • /Cross-harness token and cache reporting
  • /Tool-output compression for Codex
  • /Lossless observation compression for Claude Code
  • /Per-session cost visibility