Architecture

System design, request lifecycle, and data flow of the Breeze Buddy platform.

Request lifecycle

Every voice interaction starts with a lead push and flows through a deterministic pipeline. Here is the full lifecycle:

  1. Push Lead — API POST /leads with template & payload
  2. Validate & Pre-Check — schema match, blacklist, pre-checks, template existence
  3. Route by Execution Mode — see Execution Modes below
  4. Voice Pipeline — STT → LLM → TTS real-time processing
  5. Conversation — the LLM navigates template nodes via function calling
  6. Webhook Callback — reporting webhook with transcription, data, outcome & analytics
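As an illustration, step 1 is a single JSON POST. A minimal sketch of building that request body — every field name here is hypothetical, so consult the actual /leads schema:

```python
import json

# Hypothetical lead payload -- the field names are illustrative,
# not the platform's actual /leads schema.
lead = {
    "template_id": "tmpl_order_confirm",   # which conversation template to run
    "execution_mode": "TELEPHONY",         # see Execution Modes below
    "phone_number": "+15550100",
    "payload": {                           # variables resolved by the template engine
        "customer_name": "Asha",
        "order_id": "ORD-1042",
    },
}

body = json.dumps(lead)  # this JSON string would be POSTed to /leads
```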

High-level design

Breeze Buddy is a multi-process, async-first Python application built on FastAPI.

One call, one pod. At answer-time, an external router (Agent’s Cloud) allocates a dedicated Kubernetes pod for the call and anchors the media stream to it. The pod is released when the call ends. Each call owns its runtime — a stuck LLM or a crash on one call never touches another. Horizontal scale is adding more pods; latency stays predictable under load.

Client / Phone → FastAPI Gateway → Process Pool → Pipecat Pipeline → Providers (STT/LLM/TTS) → Transport (Daily/WebSocket)

What we run for you

Breeze Buddy is AI + Infra. Alongside the LLM flow, the platform runs pools, scheduling, rate limiting, webhook delivery, recording, and observability. You push leads; the platform handles the rest.

Component map

| Layer | Component | Description |
| --- | --- | --- |
| API | app/api/routers/ | REST endpoints — templates, leads, webhooks, Daily connect |
| Agent | app/ai/voice/agents/breeze_buddy/agent/ | Core bot — pipeline assembly, transport setup, flow execution |
| Template Engine | app/ai/voice/agents/breeze_buddy/template/ | Template loading, context building, variable resolution, node transitions |
| Services | app/ai/voice/agents/breeze_buddy/services/ | Telephony providers, Daily rooms, call limiter, agent router |
| Handlers | app/ai/voice/agents/breeze_buddy/handlers/ | Internal (warm transfer, end call, outcome updater) and HTTP handlers |
| Database | app/database/ | Async queries (SQL builders), accessors (business logic), decoders (row → model) |
| Core | app/core/ | Static/dynamic config, logging, security, background tasks |

Execution modes

The execution_mode field on each lead determines how the call is placed. The Call Execution Configuration controls scheduling, retries, and provider selection per template.

| Mode | Transport | Use Case |
| --- | --- | --- |
| DAILY | Daily.co WebRTC rooms | Production web/mobile voice sessions |
| DAILY_TEST | Daily.co WebRTC rooms | Playground / development testing |
| TELEPHONY | Twilio / Plivo / Exotel | Production outbound & inbound calls |
| TELEPHONY_TEST | Twilio / Plivo / Exotel | Test calls (excluded from analytics) |
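A sketch of how routing on execution_mode might look. Only the four mode values come from the table above; the helper names and return values are assumptions:

```python
from enum import Enum

class ExecutionMode(str, Enum):
    DAILY = "DAILY"
    DAILY_TEST = "DAILY_TEST"
    TELEPHONY = "TELEPHONY"
    TELEPHONY_TEST = "TELEPHONY_TEST"

def transport_for(mode: ExecutionMode) -> str:
    # DAILY modes use WebRTC rooms; TELEPHONY modes dial out via a PSTN provider.
    if mode in (ExecutionMode.DAILY, ExecutionMode.DAILY_TEST):
        return "daily_webrtc"
    return "telephony"

def included_in_analytics(mode: ExecutionMode) -> bool:
    # Per the table above, *_TEST modes are excluded from analytics.
    return not mode.value.endswith("_TEST")
```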

Configuration hierarchy

Configuration follows a cascading precedence model: each layer below overrides the layers above it, so later, more specific settings win:

  1. Environment variables — ~198 static settings loaded at startup
  2. Redis / DevCycle — Dynamic feature flags, checked per-request
  3. Template-level config — STT, TTS, VAD, LLM settings
  4. Node-level overrides — Per-node VAD, input collection settings
  5. Playground overrides — Runtime config override via is_playground=true
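The precedence chain can be pictured as a left-to-right dictionary merge in which later layers win. A sketch under that assumption, not the platform's actual resolver:

```python
def resolve_config(*layers: dict) -> dict:
    # Merge layers in precedence order: later (more specific) layers
    # override earlier ones; None means "not set at this layer".
    resolved: dict = {}
    for layer in layers:
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved

# Illustrative setting names, not real platform keys.
env      = {"vad_threshold": 0.5, "tts_speed": 1.0, "llm_model": "gpt-4o"}
template = {"vad_threshold": 0.6}   # template-level config
node     = {"tts_speed": 1.2}       # node-level override

cfg = resolve_config(env, template, node)
# cfg == {'vad_threshold': 0.6, 'tts_speed': 1.2, 'llm_model': 'gpt-4o'}
```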

Reset-then-apply pattern

On every node transition, VAD and input collection configs are reset to template defaults first, then the node-level overrides are applied. This prevents configuration bleed between nodes.
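A minimal sketch of the pattern (the setting names are made up):

```python
TEMPLATE_DEFAULTS = {"vad_stop_secs": 0.8, "input_timeout_secs": 10}

def enter_node(node_overrides: dict) -> dict:
    # Reset: every transition starts from a fresh copy of the template
    # defaults, so a previous node's overrides cannot leak forward.
    cfg = dict(TEMPLATE_DEFAULTS)
    # Apply: layer this node's overrides on top.
    cfg.update(node_overrides)
    return cfg

cfg_a = enter_node({"vad_stop_secs": 1.5})  # node A waits longer for speech
cfg_b = enter_node({})                      # node B is back on pure defaults
```

Without the reset step, node B would inherit node A's 1.5-second VAD setting — exactly the bleed this pattern prevents.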

Platform stack

| Layer | Technology | Customizable |
| --- | --- | --- |
| API | FastAPI + Uvicorn | |
| Voice Engine | Pipecat-AI | |
| WebRTC | Daily.co | Room config, recording |
| STT | Soniox, Deepgram, Sarvam, Google, OpenAI | Per-template |
| LLM | Azure OpenAI (GPT-4o) | Model, temp, tokens |
| TTS | ElevenLabs, Cartesia, Sarvam, Google | Voice, speed, style |
| Telephony | Twilio, Plivo, Exotel | Per-merchant |
| Database | PostgreSQL (asyncpg) | |
| Cache | Redis Cluster | |
| Observability | Langfuse, OpenTelemetry | Traces, metrics |

Database pattern

The database layer follows a strict three-tier pattern with no ORM:

| Layer | File Pattern | Responsibility |
| --- | --- | --- |
| Queries | database/queries/*.py | Pure SQL builders — return query strings and params |
| Accessors | database/accessor/*.py | Business logic — call queries, handle transactions, apply rules |
| Decoders | database/decoder/*.py | Type conversion — asyncpg.Record → Pydantic models |
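Sketched with made-up names, the three tiers compose like this (a plain dataclass stands in for Pydantic, and the connection object is assumed to be an asyncpg-style client):

```python
from dataclasses import dataclass

# queries/ tier: pure SQL builder -- returns the query string and its params,
# never touches a connection. ($1 is asyncpg's positional placeholder style.)
def lead_by_id_query(lead_id: int) -> tuple:
    return "SELECT id, phone FROM leads WHERE id = $1", (lead_id,)

# decoder/ tier: row -> model.
@dataclass
class Lead:
    id: int
    phone: str

def decode_lead(row: dict) -> Lead:
    return Lead(id=row["id"], phone=row["phone"])

# accessor/ tier: business logic -- runs the query, decodes the row.
async def fetch_lead(conn, lead_id: int) -> Lead:
    sql, params = lead_by_id_query(lead_id)
    row = await conn.fetchrow(sql, *params)  # asyncpg connection assumed
    return decode_lead(dict(row))
```

Keeping SQL, business rules, and type conversion in separate files means each tier can be tested without the others — the query builder and decoder above need no database at all.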

Process model

Each voice call runs in its own subprocess, drawn from a pre-warmed process pool. This design avoids paying the 5–6 second cold-start penalty of spawning a fresh Python process on every call.

  • Daily room pool — Pre-created rooms ready for immediate use via the Connect API
  • Voice agent process pool — Pre-warmed Python processes awaiting pipeline assignment
  • Pod isolation — 1-pod-1-call architecture via Smart Router for production scale
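The pre-warmed pool idea, reduced to a stdlib sketch — the real pool manages Pipecat pipelines, not bare queues, and all names here are illustrative:

```python
import multiprocessing as mp

def warm_worker(tasks) -> None:
    # Heavy imports and client setup would run here ONCE, at warm-up time,
    # so answer-time only pays the cost of a queue hand-off.
    while True:
        call_id = tasks.get()
        if call_id is None:   # poison pill: shut the worker down
            break
        # ... assemble and run the voice pipeline for call_id ...

def start_pool(size: int):
    tasks = mp.Queue()
    workers = [mp.Process(target=warm_worker, args=(tasks,)) for _ in range(size)]
    for w in workers:
        w.start()             # pay the multi-second startup cost up front
    return tasks, workers
```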

Observability

Built-in monitoring and tracing across the entire call lifecycle. See Observability for full details.

  • Langfuse — Full LLM tracing, auto-evaluation, cost tracking, latency metrics
  • OpenTelemetry — Trace context propagation across async boundaries
  • Contextual Logging — contextvars-based structured logging with correlation IDs
  • Slack Alerts — Automated alerting on evaluation failures
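The contextvars approach can be sketched with the stdlib logging module — a minimal illustration, not the platform's actual logger:

```python
import contextvars
import logging

# One ContextVar per piece of call context; each async task sees its own value,
# so concurrent calls never mix up their correlation IDs.
correlation_id: contextvars.ContextVar = contextvars.ContextVar(
    "correlation_id", default="-"
)

class CorrelationFilter(logging.Filter):
    # Stamp every record with the current task's correlation ID.
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))
handler.addFilter(CorrelationFilter())

logger = logging.getLogger("breeze_buddy_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

correlation_id.set("call-42")
logger.info("pipeline started")  # emitted with the call-42 prefix
```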
