01 · Voice AI · Active client work

Voice AI Receptionist

Self-hosted AI voice agent for live operational workflows, turning phone conversations into structured actions and dispatch data.

A phone-first intake system for non-emergency patient transport, combining telephony-aware ASR, explicit eligibility logic, structured capture, and safe handoff to human operators.

System proof

Operational proof

The repo already shows a real system boundary, not just a prompt. The system exposes a FastAPI `/ws/call` voice service, coordinates ASR and TTS around an explicit state machine, persists answers and transcripts in a session layer, and returns structured results for downstream operators. It also carries the telephony-specific safeguards that real calls need: playback-complete waits, echo-tail suppression, silent-input hallucination filtering, and retry-aware confirmations.
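
Two of those safeguards, playback-complete waits and echo-tail suppression, can be sketched as a small gate that decides whether inbound audio should reach ASR at all. This is a minimal illustration, not the repo's code: the `PlaybackGate` name, its methods, and the 0.4 second tail are assumptions, and the clock is passed in explicitly so the behaviour is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaybackGate:
    """Hypothetical helper: blocks caller audio while the bot is speaking
    and for a short echo tail after playback ends."""

    echo_tail_s: float = 0.4  # assumed value; tune against real call audio
    playing: bool = False
    playback_ended_at: Optional[float] = None

    def playback_started(self) -> None:
        # Called when TTS audio starts streaming to the caller.
        self.playing = True

    def playback_finished(self, now: float) -> None:
        # Called when the client reports that playback is complete.
        self.playing = False
        self.playback_ended_at = now

    def accepts_audio(self, now: float) -> bool:
        # Never listen while the bot's own voice is on the line.
        if self.playing:
            return False
        # Nothing has been played yet, so there is no echo to suppress.
        if self.playback_ended_at is None:
            return True
        # Drop frames that arrive inside the echo tail window.
        return (now - self.playback_ended_at) >= self.echo_tail_s
```

In a WebSocket handler, a gate like this would sit in front of the VAD buffer, so frames rejected here never become ASR input.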

Voice loop

WebSocket audio, VAD buffering, latency fillers, playback waits, and echo-tail suppression

Inference modes

MLX on Apple Silicon, or Linux CPU inference with faster-whisper and llama.cpp

Decision layer

Eligibility and transport state machines with explicit confirmations, retry limits, and outcome codes

Persistence

Supabase sessions, answers, transcript metadata, perf traces, and JWT-gated results

Flow resilience

Published flows can load from Supabase, but live calls fall back to in-memory definitions if that path fails
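
The fallback described under flow resilience might look roughly like the sketch below. The flow schema, the `IN_MEMORY_FLOWS` table, and the `fetch_published` callable are all hypothetical stand-ins; the actual Supabase query is deliberately left out and injected as a function.

```python
import logging
from typing import Callable

logger = logging.getLogger("flows")

# Hypothetical in-memory definition; the real flow schema is not shown here.
IN_MEMORY_FLOWS = {
    "eligibility": {"version": "in-memory", "steps": ["identity", "mobility"]},
}


def load_flow(name: str, fetch_published: Callable[[str], dict]) -> dict:
    """Return the published flow if it loads cleanly, else the in-memory copy.

    `fetch_published` stands in for whatever reads the published flow from
    the database; it may raise or return an unusable payload.
    """
    try:
        flow = fetch_published(name)
        if not flow or "steps" not in flow:
            raise ValueError("published flow is empty or malformed")
        return flow
    except Exception as exc:
        # Broad on purpose: a live call should not fail because the
        # remote flow store is down or returned bad data.
        logger.warning("published flow %r unavailable (%s); using in-memory copy", name, exc)
        return IN_MEMORY_FLOWS[name]
```

The design choice is that a caller on the line always gets a working flow, at the cost of possibly running a definition older than the latest published one.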

What it is

This is the clearest example in my current work of deploying frontier AI into a real operational environment rather than a sandbox. The system answers a live call, captures identity and mobility details, checks transport eligibility, and produces a structured outcome that a human team can actually use.

The hard part is not just transcribing speech or generating plausible replies. It is closing the full operational loop under real latency, audio quality, and auditability constraints.

Why it is harder than a demo

Telephone audio is narrowband, messy, and unforgiving. The code already reflects that reality: domain-specific ASR prompting, silent-input hallucination filtering, retry budgets, confirmation loops, filler speech while inference runs, and playback-complete waits so the bot does not talk over itself.

That makes the project much closer to operational voice infrastructure than to a browser speech demo with a large model behind it.
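
One simple way to approach silent-input hallucination filtering is an energy gate: if the buffered audio is close to silence, any transcript the ASR model returns for it is discarded. The sketch below assumes 16-bit little-endian mono PCM and an arbitrary RMS threshold; the function names and the threshold are illustrative, and the project's actual filter may work differently.

```python
import math
import struct

# Assumed value for illustration only; a real threshold needs tuning on call audio.
SILENCE_RMS_THRESHOLD = 200.0


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM."""
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def accept_transcript(pcm: bytes, text: str) -> bool:
    """Return True only if the transcript is non-empty and the audio
    it came from carried enough energy to plausibly contain speech."""
    if not text.strip():
        return False
    # Near-silent audio: treat whatever the model produced as a hallucination.
    if rms_int16(pcm) < SILENCE_RMS_THRESHOLD:
        return False
    return True
```

A rejected transcript would typically count against the retry budget for the current question rather than being stored as an answer.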

- Real-time WebSocket audio ingestion with VAD-driven buffering
- Structured questionnaire capture with explicit confirms and retry limits
- Safe outcome coding for eligibility, escort, and transport requirements
- Session persistence and JWT-gated results retrieval for downstream teams
- Performance traces and transcript metadata for review and debugging
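
The "explicit confirms and retry limits" item can be made concrete with a single-question state machine. This is a sketch under stated assumptions: the `Phase` and `QuestionStep` names, the yes/no word lists, and the spoken prompts are invented for illustration and are not taken from the project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    ASKING = "asking"
    CONFIRMING = "confirming"
    DONE = "done"
    HANDOFF = "handoff"  # retry budget spent; route to a human operator


YES = {"yes", "yeah", "correct"}
NO = {"no", "nope", "wrong"}


@dataclass
class QuestionStep:
    prompt: str
    max_retries: int = 2
    phase: Phase = Phase.ASKING
    retries: int = 0
    answer: Optional[str] = None

    def _budget_exhausted(self) -> bool:
        # Every failed attempt spends one retry; past the limit we hand off.
        self.retries += 1
        if self.retries > self.max_retries:
            self.phase = Phase.HANDOFF
            return True
        return False

    def handle(self, utterance: str) -> str:
        """Advance the step with one caller utterance and return the reply."""
        text = utterance.strip().lower()
        if self.phase is Phase.ASKING:
            if not text:
                if self._budget_exhausted():
                    return "Let me connect you to a colleague."
                return f"Sorry, I did not catch that. {self.prompt}"
            self.answer = utterance.strip()
            self.phase = Phase.CONFIRMING
            return f"I heard '{self.answer}'. Is that correct?"
        if self.phase is Phase.CONFIRMING:
            if text in YES:
                self.phase = Phase.DONE
                return "Thank you."
            if self._budget_exhausted():
                return "Let me connect you to a colleague."
            if text in NO:
                # Caller rejected the answer: discard it and ask again.
                self.answer = None
                self.phase = Phase.ASKING
                return f"Let's try again. {self.prompt}"
            return f"Please say yes or no. Is '{self.answer}' correct?"
        raise ValueError("step is already finished")
```

Keeping the retry counter on the step makes the handoff decision explicit and inspectable, which is what allows an outcome to be recorded as a code rather than inferred from the transcript.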

What the system already proves

The application boundary is explicit rather than magical. A FastAPI WebSocket service coordinates ASR, TTS, state transitions, session storage, and results retrieval, while a Supabase-backed session layer persists answers, transcripts, and outcome codes.

It can run inference with MLX on Apple Silicon or on a Linux CPU stack using faster-whisper and llama.cpp. It can also load published eligibility flows from Supabase, and it falls back to in-memory flows if the editor path is unavailable.