01 · Developer infra · Open source

Hannah

Cache-aware proxy for Codex and Claude Code that cuts token waste, improves cache efficiency, and makes session costs measurable.

Hannah sits between coding agents and the model API. It rewrites only the expensive parts of each request, cutting wasted tokens, and passes the cheap parts through unchanged.

Measured proof

Benchmark evidence

Hannah already has a concrete benchmark in the repo: a browser Pac-Man build run through Codex with default optimization versus plain passthrough. The point is not abstract token theory; it is measurable savings on a real coding-agent task while still passing the quality gate.

Input tokens per turn: 52,229 → 33,951

Prompt-cache hit rate: ~10% → 59%

Tool schema per turn: 54 KB → 7 KB

Wall-clock: 11m 41s → 10m 28s
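To put the figures above on a common scale, the relative changes can be worked out directly. The short Python sketch below uses only the numbers quoted in the benchmark; the helper name `relative_change` is illustrative and not part of Hannah. The cache hit rate is left out because the starting value is only given approximately.

```python
def relative_change(before: float, after: float) -> float:
    """Return the signed fractional change from `before` to `after`."""
    return (after - before) / before


# Figures copied from the benchmark summary above.
input_tokens = relative_change(52_229, 33_951)              # tokens per turn
tool_schema = relative_change(54, 7)                        # KB per turn
wall_clock = relative_change(11 * 60 + 41, 10 * 60 + 28)    # seconds

print(f"input tokens per turn: {input_tokens:+.1%}")   # -35.0%
print(f"tool schema per turn:  {tool_schema:+.1%}")    # -87.0%
print(f"wall-clock:            {wall_clock:+.1%}")     # -10.4%
```

In other words, roughly a third fewer input tokens per turn and a tool schema about one eighth of its original size, with a modest improvement in wall-clock time.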

Core idea

Hannah is a practical response to the fact that coding-agent sessions waste tokens differently depending on the harness. Codex and Claude Code do not benefit from the same optimization strategy, so a useful proxy should adapt per harness rather than applying one global trick.
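One way to picture per-harness adaptation is a small dispatch table that maps each harness to its own rewrite strategy and falls back to plain passthrough for anything unrecognized. The Python sketch below is an assumption-laden illustration, not Hannah's implementation: the harness keys, the message shape, and both strategy functions are hypothetical stand-ins for the Codex tool-output compression and Claude Code observation compression listed on this page.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]                      # hypothetical shape: {"role": ..., "content": ...}
Strategy = Callable[[List[Message]], List[Message]]


def passthrough(messages: List[Message]) -> List[Message]:
    """Leave the request untouched."""
    return messages


def trim_tool_output(messages: List[Message], limit: int = 200) -> List[Message]:
    """Placeholder for tool-output compression: keep only the head of long tool results."""
    out = []
    for m in messages:
        content = m["content"]
        if m["role"] == "tool" and len(content) > limit:
            omitted = len(content) - limit
            content = content[:limit] + f"\n[... {omitted} characters omitted]"
        out.append({**m, "content": content})
    return out


def dedupe_observations(messages: List[Message]) -> List[Message]:
    """Placeholder for lossless compression: replace exact repeats with a back-reference."""
    first_seen: Dict[str, int] = {}
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "tool" and m["content"] in first_seen:
            ref = first_seen[m["content"]]
            out.append({**m, "content": f"[same as message {ref}]"})
        else:
            if m["role"] == "tool":
                first_seen[m["content"]] = i
            out.append(m)
    return out


# Hypothetical harness identifiers; a real proxy would detect these from the request.
STRATEGIES: Dict[str, Strategy] = {
    "codex": trim_tool_output,
    "claude-code": dedupe_observations,
}


def optimize(harness: str, messages: List[Message]) -> List[Message]:
    """Apply the strategy registered for `harness`, or pass through unchanged."""
    return STRATEGIES.get(harness, passthrough)(messages)
```

The design choice worth noting is the default: an unknown harness gets passthrough rather than a guessed optimization, so the proxy never rewrites traffic it does not understand.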

What it proves

The benchmarked Codex path cut input tokens per turn from 52,229 to 33,951 (about 35%) while raising the prompt-cache hit rate from roughly 10% to 59%, and the build still passed the quality gate. The broader point is that operational AI systems need measurement and harness-aware control, not just blind prompt optimization.

/Cross-harness token and cache reporting
/Tool-output compression for Codex
/Lossless observation compression for Claude Code
/Per-session cost visibility
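As a rough illustration of what per-session reporting could compute, the Python sketch below aggregates per-turn usage into a cache hit rate and an estimated cost. Everything here is assumed for the example: the `TurnUsage` and `Prices` types, the definition of hit rate as cached input tokens divided by total input tokens, and the per-million-token prices, which are placeholders rather than any provider's real rates.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TurnUsage:
    input_tokens: int    # all prompt tokens sent this turn
    cached_tokens: int   # portion of input_tokens served from the prompt cache
    output_tokens: int


@dataclass(frozen=True)
class Prices:
    """Placeholder prices in currency units per million tokens."""
    input_per_mtok: float
    cached_per_mtok: float
    output_per_mtok: float


def session_report(turns: Iterable[TurnUsage], prices: Prices) -> Dict[str, float]:
    """Summarize token totals, cache hit rate, and estimated cost for one session."""
    turns = list(turns)
    total_in = sum(t.input_tokens for t in turns)
    cached = sum(t.cached_tokens for t in turns)
    total_out = sum(t.output_tokens for t in turns)
    uncached = total_in - cached
    cost = (
        uncached * prices.input_per_mtok
        + cached * prices.cached_per_mtok
        + total_out * prices.output_per_mtok
    ) / 1_000_000
    return {
        "input_tokens": total_in,
        "output_tokens": total_out,
        "cache_hit_rate": cached / total_in if total_in else 0.0,
        "estimated_cost": cost,
    }


report = session_report(
    [TurnUsage(1000, 600, 100), TurnUsage(1000, 400, 100)],
    Prices(input_per_mtok=1.0, cached_per_mtok=0.1, output_per_mtok=2.0),
)
print(report["cache_hit_rate"])  # 0.5
```

Running the same report for an optimized session and a passthrough session gives a like-for-like comparison of the kind shown in the benchmark figures.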