Build with Daily — Breeze Buddy Docs

Overview

Browser and mobile voice sessions via Daily.co WebRTC. SDK setup, connect API, recording, and RTVI events.

Prerequisites

You need a Daily.co API key configured, a deployed Breeze Buddy backend with a valid API token, and leads pushed with execution_mode set to DAILY, DAILY_TEST, or DAILY_STREAM via the Leads API.

Connection flow

Daily.co provides the WebRTC transport layer that streams audio directly between the user’s browser and Breeze Buddy’s voice pipeline — no phone network required. For PSTN-based calls, see Telephony.

Push Lead (DAILY) → POST /connect → Create Room → Generate Tokens → Spawn Bot → WebRTC Session

Execution modes

Daily sessions run under one of three execution modes. All three use the same Daily connect endpoint and the same RTVI events — they differ in what the backend runs between STT and TTS.

Mode	LLM?	Template flow?	Use when
`DAILY`	✓	✓	Production web voice sessions. The LLM drives the conversation through your template.
`DAILY_TEST`	✓	✓	Same as `DAILY`, but sessions are excluded from analytics and playground overrides are allowed. Use from the template playground.
`DAILY_STREAM`	—	—	Stream mode. Backend runs STT + TTS + transcription capture only; your client decides what the bot says via `tts-speak`. No LLM, no template flow, no function calls.
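
The table above can be encoded as a small lookup so client code branches on mode in one place. This is a minimal sketch, not part of the Breeze Buddy SDK: the `ModeCapabilities` shape, the `MODE_CAPABILITIES` map, and the `expectsLlmEvents` helper are hypothetical names introduced here for illustration.

```typescript
// The three execution modes listed in the table above.
type ExecutionMode = "DAILY" | "DAILY_TEST" | "DAILY_STREAM";

// Hypothetical shape: one flag per column of the table.
interface ModeCapabilities {
  usesLlm: boolean;            // "LLM?" column
  usesTemplateFlow: boolean;   // "Template flow?" column
  clientDrivesSpeech: boolean; // true when the client decides what the bot says
}

const MODE_CAPABILITIES: Record<ExecutionMode, ModeCapabilities> = {
  DAILY: { usesLlm: true, usesTemplateFlow: true, clientDrivesSpeech: false },
  DAILY_TEST: { usesLlm: true, usesTemplateFlow: true, clientDrivesSpeech: false },
  DAILY_STREAM: { usesLlm: false, usesTemplateFlow: false, clientDrivesSpeech: true },
};

// Whether a client should bother subscribing to LLM-related events.
function expectsLlmEvents(mode: ExecutionMode): boolean {
  return MODE_CAPABILITIES[mode].usesLlm;
}
```

A client could call `expectsLlmEvents(mode)` before registering LLM or function-call handlers, so stream-mode sessions do not wait on events the backend will not produce.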

When to pick stream mode

Use DAILY_STREAM when your app already owns the conversation logic and just needs Breeze Buddy to handle the audio pipe:

Custom agents where you drive the LLM yourself (e.g. a specialised orchestrator elsewhere in your stack).
Scripted demos or onboarding flows where every utterance is predetermined.
Human-in-the-loop consoles where an operator types what the bot should say.

The client pushes the lead with execution_mode: "DAILY_STREAM", calls POST /agent/voice/breeze-buddy/connect to join the room, then calls client.sendClientMessage('tts-speak', { text: '...' }) whenever it wants the bot to speak. User speech is still transcribed and captured to the lead’s transcript — you just lose the LLM, function calls, and template flow.

Stream mode caveats

Template configurations for STT, TTS, VAD, turn detection, interruption mode, keyword filter, and noise filter are all honored — tune them the same way you would in agent mode.
LLM, function call, and user-idle events do not fire in stream mode. All other RTVI events still work.
tts-speak utterances queue FIFO — they play after the current bot utterance, not interrupting it. For barge-in, let the user speak (VAD handles it).
Max 2000 characters per tts-speak call; longer text is silently truncated.
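
Because over-length text is truncated without an error, it may be worth splitting long text on the client before sending it. The sketch below is one way to do that, under stated assumptions: `chunkForTtsSpeak`, `speak`, and the `TtsSpeakClient` interface are hypothetical helpers, and the 2000-character limit is assumed to be comparable to JavaScript's `string.length` (UTF-16 code units), which the docs do not specify. Only the `sendClientMessage('tts-speak', { text })` call shape comes from this page.

```typescript
const TTS_SPEAK_MAX_CHARS = 2000;

// Split text into pieces no longer than maxChars, preferring to break at a space.
function chunkForTtsSpeak(text: string, maxChars: number = TTS_SPEAK_MAX_CHARS): string[] {
  if (maxChars < 1) throw new Error("maxChars must be at least 1");
  const chunks: string[] = [];
  let remaining = text.trim();
  while (remaining.length > maxChars) {
    // Look for the last space at or before the limit.
    let cut = remaining.slice(0, maxChars + 1).lastIndexOf(" ");
    // No space found: fall back to a hard split (may cut a word in half).
    if (cut <= 0) cut = maxChars;
    chunks.push(remaining.slice(0, cut).trimEnd());
    remaining = remaining.slice(cut).trimStart();
  }
  if (remaining.length > 0) chunks.push(remaining);
  return chunks;
}

// Minimal view of the client: only the method this sketch needs.
interface TtsSpeakClient {
  sendClientMessage(msgType: string, data: { text: string }): void;
}

// Send each chunk as its own tts-speak message; returns how many were sent.
function speak(client: TtsSpeakClient, text: string): number {
  const chunks = chunkForTtsSpeak(text);
  for (const chunk of chunks) {
    client.sendClientMessage("tts-speak", { text: chunk });
  }
  return chunks.length;
}
```

Since `tts-speak` utterances queue FIFO, chunks sent in order should also play in order. A hard split can land in the middle of a word, so text with natural spaces or sentence boundaries will sound better.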

The connection flow in detail:

Push a lead via the Leads API with execution_mode set to "DAILY", "DAILY_TEST", or "DAILY_STREAM"
Call the connect endpoint with the lead’s ID
Room creation — a Daily room is created with recording enabled
Token generation — separate user and bot tokens are generated
Bot process spawned — the Pipecat pipeline starts in the process pool
Client joins — your frontend uses the returned room_url and token

Connect endpoint

POST /agent/voice/breeze-buddy/connect

Creates a Daily room, spawns the bot, and returns connection credentials.

Request

{
  "lead_id": "uuid-of-the-lead"
}

Response

{
  "room_url": "https://your-domain.daily.co/room-abc123",
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "session_id": "sess_abc123",
  "lead_id": "uuid-of-the-lead"
}
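
The request and response above could be wrapped in a small typed helper. This is a sketch under assumptions, not the official client: the `Authorization: Bearer` header is a guess at how the API token is passed (this page does not say), and `buildConnectRequest`, `parseConnectResponse`, `connect`, and `FetchLike` are hypothetical names. The endpoint path and the JSON field names are taken from this page.

```typescript
// Response fields as shown in the sample response above.
interface ConnectResponse {
  room_url: string;
  token: string;
  session_id: string;
  lead_id: string;
}

interface ConnectRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the POST request for the connect endpoint.
function buildConnectRequest(baseUrl: string, apiToken: string, leadId: string): ConnectRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/agent/voice/breeze-buddy/connect`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumption: bearer-token auth; check how your deployment expects the token.
        Authorization: `Bearer ${apiToken}`,
      },
      body: JSON.stringify({ lead_id: leadId }),
    },
  };
}

// Check that the payload has the four string fields before trusting it.
function parseConnectResponse(payload: unknown): ConnectResponse {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("connect response is not an object");
  }
  const record = payload as Record<string, unknown>;
  const read = (field: string): string => {
    const value = record[field];
    if (typeof value !== "string") {
      throw new Error(`connect response is missing string field "${field}"`);
    }
    return value;
  };
  return {
    room_url: read("room_url"),
    token: read("token"),
    session_id: read("session_id"),
    lead_id: read("lead_id"),
  };
}

// Minimal fetch-compatible signature so the helper can be exercised without a network.
type FetchLike = (
  url: string,
  init: ConnectRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function connect(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiToken: string,
  leadId: string,
): Promise<ConnectResponse> {
  const { url, init } = buildConnectRequest(baseUrl, apiToken, leadId);
  const response = await fetchImpl(url, init);
  if (!response.ok) throw new Error(`connect failed with HTTP ${response.status}`);
  return parseConnectResponse(await response.json());
}
```

In a browser, passing the global `fetch` as `fetchImpl` should work. The returned `room_url` and `token` are what the frontend then uses to join the room.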

Transport configuration

Parameter	Daily (WebRTC)	Telephony (WebSocket)
Sample Rate	`16000 Hz`	`8000 Hz`
Audio Codec	WebRTC (Opus)	μ-law / PCM
Real-Time Events	RTVI protocol	WebSocket messages

Next steps

Web SDK Setup — install the Pipecat client SDK
RTVI Events — real-time transcription and bot events
Recording — cloud recording configuration
