Overview
Browser and mobile voice sessions via Daily.co WebRTC. SDK setup, connect API, recording, and RTVI events.
Prerequisites
Connection flow
Daily.co provides the WebRTC transport layer that streams audio directly between the user’s browser and Breeze Buddy’s voice pipeline — no phone network required. For PSTN-based calls, see Telephony.
Execution modes
Daily sessions run under one of three execution modes. All three use the same Daily connect endpoint and the same RTVI events — they differ in what the backend runs between STT and TTS.
| Mode | LLM? | Template flow? | Use when |
|---|---|---|---|
| DAILY | ✓ | ✓ | Production web voice sessions. The LLM drives the conversation through your template. |
| DAILY_TEST | ✓ | ✓ | Same as DAILY, but excluded from analytics, with playground overrides allowed. Use from the template playground. |
| DAILY_STREAM | — | — | Stream mode. Backend runs STT + TTS + transcription capture only; your client decides what the bot says via tts-speak. No LLM, no template flow, no function calls. |
When to pick stream mode
Use DAILY_STREAM when your app already owns the conversation logic and just needs Breeze Buddy to handle the audio pipe:
- Custom agents where you drive the LLM yourself (e.g. a specialised orchestrator elsewhere in your stack).
- Scripted demos or onboarding flows where every utterance is predetermined.
- Human-in-the-loop consoles where an operator types what the bot should say.
The client pushes the lead with execution_mode: "DAILY_STREAM", calls POST /agent/voice/breeze-buddy/connect to join the room, then calls client.sendClientMessage('tts-speak', { text: '...' }) whenever it wants the bot to speak. User speech is still transcribed and captured to the lead’s transcript — you just lose the LLM, function calls, and template flow.
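As a minimal sketch of the last step, the wrapper below sends a tts-speak message through any object that exposes sendClientMessage, the call named above. The TtsSpeakClient interface and the speak helper are illustrative names, not part of the SDK; in a real app the client would be the connected Pipecat client instance, and the lead push and connect call would already have happened.

```typescript
// Minimal shape of the client used in this sketch (hypothetical interface).
// The real object would be the client SDK instance after it joined the room.
interface TtsSpeakClient {
  sendClientMessage(msgType: string, data?: unknown): void;
}

// Ask the bot to say `text` in stream mode.
// Returns false and sends nothing when the text is blank.
function speak(client: TtsSpeakClient, text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    return false;
  }
  // Message type and payload shape are the ones given in the text above.
  client.sendClientMessage("tts-speak", { text: trimmed });
  return true;
}
```

A human-in-the-loop console could call speak(client, inputValue) from its submit handler; a scripted demo could call it once per predetermined line.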
Stream mode caveats
- Template configurations for STT, TTS, VAD, turn detection, interruption mode, keyword filter, and noise filter are all honored — tune them the same way you would in agent mode.
- LLM, function call, and user-idle events do not fire in stream mode. All other RTVI events still work.
- tts-speak utterances queue FIFO — they play after the current bot utterance, not interrupting it. For barge-in, let the user speak (VAD handles it).
- Max 2000 characters per tts-speak call; longer text is silently truncated.
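Because over-long text is truncated silently, a client may want to split it before sending. The helper below is one possible approach, not something the platform provides: it breaks text into chunks of at most 2000 characters, preferring whitespace boundaries. The names splitForTtsSpeak and TTS_SPEAK_MAX_CHARS are hypothetical, and the sketch assumes the limit is counted the same way as JavaScript string length (UTF-16 code units), which the source does not state.

```typescript
// Limit taken from the caveat above; how the backend counts characters
// is an assumption (JavaScript string length is used here).
const TTS_SPEAK_MAX_CHARS = 2000;

// Split `text` into chunks no longer than `maxChars`, breaking at the last
// whitespace inside the limit when there is one, otherwise cutting hard.
function splitForTtsSpeak(text: string, maxChars: number = TTS_SPEAK_MAX_CHARS): string[] {
  if (maxChars < 1) {
    throw new RangeError("maxChars must be at least 1");
  }
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    // Take one extra character so a space right after the limit still counts.
    const head = rest.slice(0, maxChars + 1);
    // Index of the last whitespace character in `head`, or -1 if none.
    const cut = head.search(/\s(?=\S*$)/);
    if (cut <= 0) {
      // No usable whitespace: hard cut (may split a surrogate pair).
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars).trimStart();
    } else {
      chunks.push(rest.slice(0, cut).trimEnd());
      rest = rest.slice(cut).trimStart();
    }
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}
```

Since utterances queue FIFO, sending the chunks in order with one tts-speak call each should play them back in sequence.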
- Push a lead via the Leads API with execution_mode: "DAILY" or "DAILY_TEST"
- Call the connect endpoint with the lead’s ID
- Room creation — a Daily room is created with recording enabled
- Token generation — separate user and bot tokens are generated
- Bot process spawned — the Pipecat pipeline starts in the process pool
- Client joins — your frontend uses the returned room_url and token
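From the client's side, the steps above reduce to one HTTP call followed by joining the room. The sketch below covers only the HTTP call. It assumes a JSON POST with no extra headers; the source does not describe authentication, so any auth headers your deployment needs are omitted. The names connectToDailySession, FetchLike, and the base URL are illustrative. The HTTP function is passed in so the sketch runs without a network; in a browser you would likely pass fetch.

```typescript
// Fields returned by the connect endpoint, per the response example in this page.
interface ConnectResponse {
  room_url: string;
  token: string;
  session_id: string;
  lead_id: string;
}

// Minimal fetch-compatible signature (hypothetical type for this sketch).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// POST the lead ID to the connect endpoint and return the room details.
async function connectToDailySession(
  baseUrl: string,
  leadId: string,
  httpFetch: FetchLike,
): Promise<ConnectResponse> {
  const url = `${baseUrl.replace(/\/+$/, "")}/agent/voice/breeze-buddy/connect`;
  const res = await httpFetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ lead_id: leadId }),
  });
  if (!res.ok) {
    throw new Error(`connect failed with HTTP ${res.status}`);
  }
  const data = (await res.json()) as Partial<ConnectResponse>;
  // The client cannot join without these two fields, so check them early.
  if (typeof data.room_url !== "string" || typeof data.token !== "string") {
    throw new Error("connect response is missing room_url or token");
  }
  return data as ConnectResponse;
}
```

The returned room_url and token are then handed to the client SDK to join the room; that part depends on the SDK version and is left to the Web SDK Setup page.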
Connect endpoint
POST /agent/voice/breeze-buddy/connect
Request
{
"lead_id": "uuid-of-the-lead"
}
Response
{
"room_url": "https://your-domain.daily.co/room-abc123",
"token": "eyJhbGciOiJIUzI1NiIs...",
"session_id": "sess_abc123",
"lead_id": "uuid-of-the-lead"
}
Transport configuration
| Parameter | Daily (WebRTC) | Telephony (WebSocket) |
|---|---|---|
| Sample Rate | 16000 Hz | 8000 Hz |
| Audio Codec | WebRTC (Opus) | μ-law / PCM |
| Real-Time Events | RTVI protocol | WebSocket messages |
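To make the sample-rate difference in the table concrete, the sketch below computes how much audio one frame holds on each transport. The 20 ms frame length and the sample widths (2 bytes for 16-bit PCM, 1 byte for μ-law) are assumptions chosen for illustration; the source gives only the sample rates. Opus is compressed on the wire, so the 16 kHz figure describes decoded PCM, not network payload size.

```typescript
// Sample rates from the transport table above.
const SAMPLE_RATE_HZ = { daily: 16000, telephony: 8000 } as const;

// Number of samples in one frame of `frameMs` milliseconds.
function samplesPerFrame(sampleRateHz: number, frameMs: number): number {
  return (sampleRateHz * frameMs) / 1000;
}

// Size in bytes of one uncompressed mono frame.
function bytesPerFrame(sampleRateHz: number, frameMs: number, bytesPerSample: number): number {
  return samplesPerFrame(sampleRateHz, frameMs) * bytesPerSample;
}

// 20 ms frames (assumed): decoded 16-bit PCM on Daily, 8-bit mu-law on telephony.
const dailyFrameBytes = bytesPerFrame(SAMPLE_RATE_HZ.daily, 20, 2); // 640
const telephonyFrameBytes = bytesPerFrame(SAMPLE_RATE_HZ.telephony, 20, 1); // 160
```

The factor of two in sample rate is why audio captured on one transport has to be resampled before it can be compared or mixed with audio from the other.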
Next steps
- Web SDK Setup — install the Pipecat client SDK
- RTVI Events — real-time transcription and bot events
- Recording — cloud recording configuration