Architecture

System design, request lifecycle, and data flow of the Breeze Buddy platform.

Request lifecycle

Every voice interaction starts with a lead push and flows through a deterministic pipeline. Here is the full lifecycle:

  1. Push Lead — API POST /leads with template & payload
  2. Validate & Pre-Check — schema match, blacklist, pre-checks, template existence
  3. Route by Execution Mode — see Execution Modes below
  4. Voice Pipeline — STT → LLM → TTS real-time processing
  5. Conversation — the LLM navigates template nodes via function calling
  6. Webhook Callback — reporting webhook with transcription, data, outcome & analytics
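As an illustration, step 1 is a single JSON POST. A minimal sketch of building that request body — every field name here is hypothetical, so consult the actual /leads schema:

```python
import json

# Hypothetical lead payload -- the field names are illustrative,
# not the platform's actual /leads schema.
lead = {
    "template_id": "tmpl_order_confirm",   # which conversation template to run
    "execution_mode": "TELEPHONY",         # see Execution Modes below
    "phone_number": "+15550100",
    "payload": {                           # variables resolved by the template engine
        "customer_name": "Asha",
        "order_id": "ORD-1042",
    },
}

body = json.dumps(lead)  # this JSON string would be POSTed to /leads
```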

High-level design

Breeze Buddy is a multi-process, async-first Python application built on FastAPI.

One call, one pod. At answer-time, an external router (Agent’s Cloud) allocates a dedicated Kubernetes pod for the call and anchors the media stream to it. The pod is released when the call ends. Each call owns its runtime — a stuck LLM or a crash on one call never touches another. Horizontal scale is adding more pods; latency stays predictable under load.

Client / Phone → FastAPI Gateway → Process Pool → Pipecat Pipeline → Providers (STT/LLM/TTS) → Transport (Daily/WebSocket)

What we run for you

Breeze Buddy is AI + Infra. Alongside the LLM flow, the platform runs pools, scheduling, rate limiting, webhook delivery, recording, and observability. You push leads; the platform handles the rest.

Component map

| Layer | Component | Description |
| --- | --- | --- |
| API | app/api/routers/ | REST endpoints — templates, leads, webhooks, Daily connect |
| Agent | app/ai/voice/agents/breeze_buddy/agent/ | Core bot — pipeline assembly, transport setup, flow execution |
| Template Engine | app/ai/voice/agents/breeze_buddy/template/ | Template loading, context building, variable resolution, node transitions |
| Services | app/ai/voice/agents/breeze_buddy/services/ | Telephony providers, Daily rooms, call limiter, agent router |
| Handlers | app/ai/voice/agents/breeze_buddy/handlers/ | Internal (warm transfer, end call, outcome updater) and HTTP handlers |
| Database | app/database/ | Async queries (SQL builders), accessors (business logic), decoders (row → model) |
| Core | app/core/ | Static/dynamic config, logging, security, background tasks |

Execution modes

The execution_mode field on each lead determines how the call is placed. The Call Execution Configuration controls scheduling, retries, and provider selection per template.

| Mode | Transport | Use Case |
| --- | --- | --- |
| DAILY | Daily.co WebRTC rooms | Production web/mobile voice sessions |
| DAILY_TEST | Daily.co WebRTC rooms | Playground / development testing |
| TELEPHONY | Twilio / Plivo / Exotel | Production outbound & inbound calls |
| TELEPHONY_TEST | Twilio / Plivo / Exotel | Test calls (excluded from analytics) |
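A sketch of how routing on execution_mode might look. Only the four mode values come from the table above; the helper names and return values are assumptions:

```python
from enum import Enum

class ExecutionMode(str, Enum):
    DAILY = "DAILY"
    DAILY_TEST = "DAILY_TEST"
    TELEPHONY = "TELEPHONY"
    TELEPHONY_TEST = "TELEPHONY_TEST"

def transport_for(mode: ExecutionMode) -> str:
    # DAILY modes use WebRTC rooms; TELEPHONY modes dial out via a PSTN provider.
    if mode in (ExecutionMode.DAILY, ExecutionMode.DAILY_TEST):
        return "daily_webrtc"
    return "telephony"

def included_in_analytics(mode: ExecutionMode) -> bool:
    # Per the table above, *_TEST modes are excluded from analytics.
    return not mode.value.endswith("_TEST")
```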

Configuration hierarchy

Configuration follows a cascading precedence model: each layer below overrides the layers above it, so later, more specific settings win:

  1. Environment variables — ~198 static settings loaded at startup
  2. Redis / DevCycle — Dynamic feature flags, checked per-request
  3. Template-level config — STT, TTS, VAD, LLM settings
  4. Node-level overrides — Per-node VAD, input collection settings
  5. Playground overrides — Runtime config override via is_playground=true
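The precedence chain can be pictured as a left-to-right dictionary merge in which later layers win. A sketch under that assumption, not the platform's actual resolver:

```python
def resolve_config(*layers: dict) -> dict:
    # Merge layers in precedence order: later (more specific) layers
    # override earlier ones; None means "not set at this layer".
    resolved: dict = {}
    for layer in layers:
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved

# Illustrative setting names, not real platform keys.
env      = {"vad_threshold": 0.5, "tts_speed": 1.0, "llm_model": "gpt-4o"}
template = {"vad_threshold": 0.6}   # template-level config
node     = {"tts_speed": 1.2}       # node-level override

cfg = resolve_config(env, template, node)
# cfg == {'vad_threshold': 0.6, 'tts_speed': 1.2, 'llm_model': 'gpt-4o'}
```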

Reset-then-apply pattern

On every node transition, VAD and input collection configs are reset to template defaults first, then the node-level overrides are applied. This prevents configuration bleed between nodes.
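A minimal sketch of the pattern (the setting names are made up):

```python
TEMPLATE_DEFAULTS = {"vad_stop_secs": 0.8, "input_timeout_secs": 10}

def enter_node(node_overrides: dict) -> dict:
    # Reset: every transition starts from a fresh copy of the template
    # defaults, so a previous node's overrides cannot leak forward.
    cfg = dict(TEMPLATE_DEFAULTS)
    # Apply: layer this node's overrides on top.
    cfg.update(node_overrides)
    return cfg

cfg_a = enter_node({"vad_stop_secs": 1.5})  # node A waits longer for speech
cfg_b = enter_node({})                      # node B is back on pure defaults
```

Without the reset step, node B would inherit node A's 1.5-second VAD setting — exactly the bleed this pattern prevents.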

Platform stack

| Layer | Technology | Customizable |
| --- | --- | --- |
| API | FastAPI + Uvicorn | |
| Voice Engine | Pipecat-AI | |
| WebRTC | Daily.co | Room config, recording |
| STT | Soniox, Deepgram, Sarvam, Google, OpenAI | Per-template |
| LLM | Azure OpenAI (GPT-4o) | Model, temp, tokens |
| TTS | ElevenLabs, Cartesia, Sarvam, Google | Voice, speed, style |
| Telephony | Twilio, Plivo, Exotel | Per-merchant |
| Database | PostgreSQL (asyncpg) | |
| Cache | Redis Cluster | |
| Observability | Langfuse, OpenTelemetry | Traces, metrics |

Database pattern

The database layer follows a strict three-tier pattern with no ORM:

| Layer | File Pattern | Responsibility |
| --- | --- | --- |
| Queries | database/queries/*.py | Pure SQL builders — return query strings and params |
| Accessors | database/accessor/*.py | Business logic — call queries, handle transactions, apply rules |
| Decoders | database/decoder/*.py | Type conversion — asyncpg.Record → Pydantic models |
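Sketched with made-up names, the three tiers compose like this (a plain dataclass stands in for Pydantic, and the connection object is assumed to be an asyncpg-style client):

```python
from dataclasses import dataclass

# queries/ tier: pure SQL builder -- returns the query string and its params,
# never touches a connection. ($1 is asyncpg's positional placeholder style.)
def lead_by_id_query(lead_id: int) -> tuple:
    return "SELECT id, phone FROM leads WHERE id = $1", (lead_id,)

# decoder/ tier: row -> model.
@dataclass
class Lead:
    id: int
    phone: str

def decode_lead(row: dict) -> Lead:
    return Lead(id=row["id"], phone=row["phone"])

# accessor/ tier: business logic -- runs the query, decodes the row.
async def fetch_lead(conn, lead_id: int) -> Lead:
    sql, params = lead_by_id_query(lead_id)
    row = await conn.fetchrow(sql, *params)  # asyncpg connection assumed
    return decode_lead(dict(row))
```

Keeping SQL, business rules, and type conversion in separate files means each tier can be tested without the others — the query builder and decoder above need no database at all.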

Process model

Each voice call runs in its own subprocess, drawn from a pre-warmed process pool. This design avoids paying the 5–6 second cold-start penalty of spawning a fresh Python process on every call.

  • Daily room pool — Pre-created rooms ready for immediate use via the Connect API
  • Voice agent process pool — Pre-warmed Python processes awaiting pipeline assignment
  • Pod isolation — 1-pod-1-call architecture via Smart Router for production scale
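The pre-warmed pool idea, reduced to a stdlib sketch — the real pool manages Pipecat pipelines, not bare queues, and all names here are illustrative:

```python
import multiprocessing as mp

def warm_worker(tasks) -> None:
    # Heavy imports and client setup would run here ONCE, at warm-up time,
    # so answer-time only pays the cost of a queue hand-off.
    while True:
        call_id = tasks.get()
        if call_id is None:   # poison pill: shut the worker down
            break
        # ... assemble and run the voice pipeline for call_id ...

def start_pool(size: int):
    tasks = mp.Queue()
    workers = [mp.Process(target=warm_worker, args=(tasks,)) for _ in range(size)]
    for w in workers:
        w.start()             # pay the multi-second startup cost up front
    return tasks, workers
```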

Observability

Built-in monitoring and tracing across the entire call lifecycle. See Observability for full details.

  • Langfuse — Full LLM tracing, auto-evaluation, cost tracking, latency metrics
  • OpenTelemetry — Trace context propagation across async boundaries
  • Contextual Logging — contextvars-based structured logging with correlation IDs
  • Slack Alerts — Automated alerting on evaluation failures
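The contextvars approach can be sketched with the stdlib logging module — a minimal illustration, not the platform's actual logger:

```python
import contextvars
import logging

# One ContextVar per piece of call context; each async task sees its own value,
# so concurrent calls never mix up their correlation IDs.
correlation_id: contextvars.ContextVar = contextvars.ContextVar(
    "correlation_id", default="-"
)

class CorrelationFilter(logging.Filter):
    # Stamp every record with the current task's correlation ID.
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))
handler.addFilter(CorrelationFilter())

logger = logging.getLogger("breeze_buddy_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

correlation_id.set("call-42")
logger.info("pipeline started")  # emitted with the call-42 prefix
```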
