Architecture
System design, request lifecycle, and data flow of the Breeze Buddy platform.
Request lifecycle
Every voice interaction starts with a lead push and flows through a deterministic pipeline.
High-level design
Breeze Buddy is a multi-process, async-first Python application built on FastAPI.
One call, one pod. At answer-time, an external router (Agent’s Cloud) allocates a dedicated Kubernetes pod for the call and anchors the media stream to it. The pod is released when the call ends. Each call owns its runtime — a stuck LLM or a crash on one call never touches another. Horizontal scale is adding more pods; latency stays predictable under load.
What we run for you
Breeze Buddy is AI + Infra. Alongside the LLM flow, the platform runs pools, scheduling, rate limiting, webhook delivery, recording, and observability. You push leads; the platform handles the rest.
Component map
| Layer | Component | Description |
|---|---|---|
| API | app/api/routers/ | REST endpoints — templates, leads, webhooks, Daily connect |
| Agent | app/ai/voice/agents/breeze_buddy/agent/ | Core bot — pipeline assembly, transport setup, flow execution |
| Template Engine | app/ai/voice/agents/breeze_buddy/template/ | Template loading, context building, variable resolution, node transitions |
| Services | app/ai/voice/agents/breeze_buddy/services/ | Telephony providers, Daily rooms, call limiter, agent router |
| Handlers | app/ai/voice/agents/breeze_buddy/handlers/ | Internal (warm transfer, end call, outcome updater) and HTTP handlers |
| Database | app/database/ | Async queries (SQL builders), accessors (business logic), decoders (row → model) |
| Core | app/core/ | Static/dynamic config, logging, security, background tasks |
Execution modes
The execution_mode field on each lead determines how the call is placed. The Call Execution Configuration controls scheduling, retries, and provider selection per template.
| Mode | Transport | Use Case |
|---|---|---|
| DAILY | Daily.co WebRTC rooms | Production web/mobile voice sessions |
| DAILY_TEST | Daily.co WebRTC rooms | Playground / development testing |
| TELEPHONY | Twilio / Plivo / Exotel | Production outbound & inbound calls |
| TELEPHONY_TEST | Twilio / Plivo / Exotel | Test calls (excluded from analytics) |
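The mode table above can be sketched as an enum that derives the transport and the test flag from the mode name. This is a minimal illustration, not the platform's actual class; the `is_test` and `transport` helpers are hypothetical names.

```python
from enum import Enum


class ExecutionMode(str, Enum):
    DAILY = "DAILY"
    DAILY_TEST = "DAILY_TEST"
    TELEPHONY = "TELEPHONY"
    TELEPHONY_TEST = "TELEPHONY_TEST"

    @property
    def is_test(self) -> bool:
        # *_TEST modes are excluded from analytics.
        return self.name.endswith("_TEST")

    @property
    def transport(self) -> str:
        # DAILY* modes use Daily.co WebRTC rooms;
        # TELEPHONY* modes go through a PSTN provider (Twilio / Plivo / Exotel).
        return "webrtc" if self.name.startswith("DAILY") else "telephony"
```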
Configuration hierarchy
Configuration follows a cascading precedence model: each layer later in the list overrides the ones before it.
- Environment variables — ~198 static settings loaded at startup
- Redis / DevCycle — Dynamic feature flags, checked per-request
- Template-level config — STT, TTS, VAD, LLM settings
- Node-level overrides — Per-node VAD, input collection settings
- Playground overrides — Runtime config override via `is_playground=true`
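The cascade can be sketched as a merge where later layers win. This is an illustrative sketch, not the platform's resolver; the layer contents shown are made-up examples.

```python
def resolve_config(*layers: dict) -> dict:
    """Merge config layers in precedence order; later layers override earlier ones."""
    resolved: dict = {}
    for layer in layers:
        # Skip unset (None) values so a layer only overrides keys it actually defines.
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved


# Hypothetical example values for each layer:
env = {"stt": "deepgram", "vad_timeout": 0.8}   # environment variables (base)
template = {"stt": "soniox", "tts": "elevenlabs"}  # template-level config
node = {"vad_timeout": 1.5}                     # node-level overrides
playground = {}                                  # only populated when is_playground=true

config = resolve_config(env, template, node, playground)
# stt comes from the template layer; vad_timeout from the node layer
```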
Reset-then-Apply Pattern
On every node transition, VAD and input collection configs are reset to template defaults first, then the node-level overrides are applied. This prevents configuration bleed between nodes.
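The pattern can be sketched as follows; the `VADConfig` fields and `on_node_transition` function are illustrative stand-ins, not the real API. The key point is that each transition starts from template defaults rather than from the previous node's effective config.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VADConfig:
    stop_secs: float = 0.8   # hypothetical template default
    min_volume: float = 0.6  # hypothetical template default


TEMPLATE_DEFAULTS = VADConfig()


def on_node_transition(node_overrides: dict) -> VADConfig:
    # Reset: always start from template defaults, never from the
    # previous node's config, so overrides cannot bleed between nodes.
    config = TEMPLATE_DEFAULTS
    # Apply: layer this node's overrides on top.
    return replace(config, **node_overrides)
```

A node that overrides only `stop_secs` still gets the template default for `min_volume`, and the next node that overrides nothing gets a clean `VADConfig()` again.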
Platform stack
| Layer | Technology | Customizable |
|---|---|---|
| API | FastAPI + Uvicorn | — |
| Voice Engine | Pipecat-AI | — |
| WebRTC | Daily.co | Room config, recording |
| STT | Soniox, Deepgram, Sarvam, Google, OpenAI | Per-template |
| LLM | Azure OpenAI (GPT-4o) | Model, temp, tokens |
| TTS | ElevenLabs, Cartesia, Sarvam, Google | Voice, speed, style |
| Telephony | Twilio, Plivo, Exotel | Per-merchant |
| Database | PostgreSQL (asyncpg) | — |
| Cache | Redis Cluster | — |
| Observability | Langfuse, OpenTelemetry | Traces, metrics |
Database pattern
The database layer follows a strict three-tier pattern with no ORM:
| Layer | File Pattern | Responsibility |
|---|---|---|
| Queries | database/queries/*.py | Pure SQL builders — return query strings and params |
| Accessors | database/accessor/*.py | Business logic — call queries, handle transactions, apply rules |
| Decoders | database/decoder/*.py | Type conversion — asyncpg.Record → Pydantic models |
Process model
Each voice call runs in its own subprocess from a pre-warmed process pool. This design avoids the 5–6 second cold-start penalty of Python subprocess creation.
- Daily room pool — Pre-created rooms ready for immediate use via the Connect API
- Voice agent process pool — Pre-warmed Python processes awaiting pipeline assignment
- Pod isolation — 1-pod-1-call architecture via Smart Router for production scale
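The pre-warmed pool idea can be sketched with the standard library: workers start (and pay the interpreter/import cost) before any call arrives, then block until a pipeline is assigned. This is a minimal sketch of the concept, not the platform's pool implementation.

```python
import multiprocessing as mp


def agent_worker(task_queue: mp.Queue, result_queue: mp.Queue) -> None:
    # The worker is already running when a call is assigned,
    # so assignment costs a queue hop instead of a process cold start.
    while True:
        call_id = task_queue.get()
        if call_id is None:  # shutdown sentinel
            break
        result_queue.put(f"pipeline-started:{call_id}")


def warm_pool(size: int):
    tasks: mp.Queue = mp.Queue()
    results: mp.Queue = mp.Queue()
    workers = [
        mp.Process(target=agent_worker, args=(tasks, results))
        for _ in range(size)
    ]
    for w in workers:
        w.start()  # cold start happens here, ahead of any call
    return tasks, results, workers
```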
Observability
Built-in monitoring and tracing across the entire call lifecycle. See Observability for full details.
- Langfuse — Full LLM tracing, auto-evaluation, cost tracking, latency metrics
- OpenTelemetry — Trace context propagation across async boundaries
- Contextual Logging — `contextvars`-based structured logging with correlation IDs
- Slack Alerts — Automated alerting on evaluation failures
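The `contextvars` approach can be sketched with the standard `logging` module: a filter injects the current call's correlation ID into every record, and because `ContextVar` values are inherited by async tasks, the ID follows the call across `await` boundaries. The names below are illustrative, not the platform's.

```python
import contextvars
import logging

correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    # Inject the current call's correlation ID into every log record.
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


logger = logging.getLogger("breeze_buddy_demo")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Each call sets its own ID once; every log line on that call carries it.
correlation_id.set("call-abc123")
logger.info("pipeline started")
```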
Next steps
- Build your first voice agent in 5 minutes
- Authentication — JWT tokens, S2S tokens, and RBAC
- Template System — How conversation flows are defined
- Voice Pipeline — Deep dive into the STT → LLM → TTS pipeline