Overview

Browser and mobile voice sessions via Daily.co WebRTC. SDK setup, connect API, recording, and RTVI events.

Prerequisites

You need a Daily.co API key configured, a deployed Breeze Buddy backend with a valid API token, and leads pushed with execution_mode set to DAILY, DAILY_TEST, or DAILY_STREAM via the Leads API.

Connection flow

Daily.co provides the WebRTC transport layer that streams audio directly between the user’s browser and Breeze Buddy’s voice pipeline — no phone network required. For PSTN-based calls, see Telephony.

Flow: Push Lead (DAILY) → POST /connect → Create Room → Generate Tokens → Spawn Bot → WebRTC Session

Execution modes

Daily sessions run under one of three execution modes. All three use the same Daily connect endpoint and the same RTVI events — they differ in what the backend runs between STT and TTS.

Mode | LLM? | Template flow? | Use when
DAILY | Yes | Yes | Production web voice sessions. The LLM drives the conversation through your template.
DAILY_TEST | Yes | Yes | Same as DAILY, but sessions are excluded from analytics and playground overrides are allowed. Use from the template playground.
DAILY_STREAM | No | No | Stream mode. The backend runs STT, TTS, and transcript capture only; your client decides what the bot says via tts-speak. No LLM, no template flow, no function calls.
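
The differences between the three modes can be captured in a small client-side lookup. This is a minimal sketch: the `ModeCapabilities` shape and the `clientDrivesSpeech` helper are hypothetical names for illustration and are not part of the Breeze Buddy API. Only the three mode strings come from this page.

```typescript
// The three execution modes named on this page.
type ExecutionMode = "DAILY" | "DAILY_TEST" | "DAILY_STREAM";

// Hypothetical capability record; field names are illustrative only.
interface ModeCapabilities {
  llm: boolean;          // does the backend run an LLM between STT and TTS?
  templateFlow: boolean; // does the template drive the conversation?
}

const MODE_CAPABILITIES: Record<ExecutionMode, ModeCapabilities> = {
  DAILY: { llm: true, templateFlow: true },
  DAILY_TEST: { llm: true, templateFlow: true },
  DAILY_STREAM: { llm: false, templateFlow: false },
};

// True when the client must supply every bot utterance itself (via tts-speak).
function clientDrivesSpeech(mode: ExecutionMode): boolean {
  return !MODE_CAPABILITIES[mode].llm;
}
```

A frontend could use a check like this to decide whether to show an operator text box or simply let the bot talk.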

When to pick stream mode

Use DAILY_STREAM when your app already owns the conversation logic and just needs Breeze Buddy to handle the audio pipe:

  • Custom agents where you drive the LLM yourself (e.g. a specialised orchestrator elsewhere in your stack).
  • Scripted demos or onboarding flows where every utterance is predetermined.
  • Human-in-the-loop consoles where an operator types what the bot should say.

The client pushes the lead with execution_mode: "DAILY_STREAM", calls POST /agent/voice/breeze-buddy/connect to join the room, then calls client.sendClientMessage('tts-speak', { text: '...' }) whenever it wants the bot to speak. User speech is still transcribed and captured to the lead’s transcript — you just lose the LLM, function calls, and template flow.
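
The tts-speak call described above can be wrapped in a small helper. This is a sketch under stated assumptions: `MessageClient` is a hypothetical minimal interface that mirrors only the `sendClientMessage` call shown in the text, and `speak` is an illustrative name. The real client object from your SDK may expose more methods or return a different type.

```typescript
// Assumed minimal shape of the connected client; only the call used in
// the text above is modelled here.
interface MessageClient {
  sendClientMessage(type: string, data: unknown): void;
}

// Ask the bot to say `text`. Blank input is skipped so that no empty
// tts-speak message is sent. Returns true when a message was sent.
function speak(client: MessageClient, text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    return false;
  }
  client.sendClientMessage("tts-speak", { text: trimmed });
  return true;
}
```

In a human-in-the-loop console, for example, the operator's text box could call `speak(client, inputValue)` on submit.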

Stream mode caveats

  • Template configurations for STT, TTS, VAD, turn detection, interruption mode, keyword filter, and noise filter are all honored — tune them the same way you would in agent mode.
  • LLM, function call, and user-idle events do not fire in stream mode. All other RTVI events still work.
  • tts-speak utterances queue FIFO — they play after the current bot utterance, not interrupting it. For barge-in, let the user speak (VAD handles it).
  • Max 2000 characters per tts-speak call; longer text is silently truncated.
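
Because text beyond 2000 characters is silently truncated, one option is to split long text on the client and send it as several tts-speak calls, relying on the FIFO queueing noted above to keep them in order. The sketch below assumes the limit is measured in JavaScript string length, which this page does not confirm. `splitForTts`, `speakLong`, and the `MessageClient` interface are hypothetical names for illustration.

```typescript
const MAX_TTS_CHARS = 2000; // per-call limit stated in the caveats above

// Assumed minimal shape of the connected client.
interface MessageClient {
  sendClientMessage(type: string, data: unknown): void;
}

// Split text into pieces no longer than `limit`, breaking at a space
// where possible and hard-splitting only when a run has no spaces.
function splitForTts(text: string, limit: number = MAX_TTS_CHARS): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    // Last space at or before the limit; -1 when there is none.
    let cut = rest.lastIndexOf(" ", limit);
    if (cut <= 0) {
      cut = limit; // no usable space: hard split at the limit
    }
    chunks.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}

// Send each chunk as its own tts-speak call; returns the number of calls made.
function speakLong(client: MessageClient, text: string): number {
  const chunks = splitForTts(text);
  for (const chunk of chunks) {
    client.sendClientMessage("tts-speak", { text: chunk });
  }
  return chunks.length;
}
```

Splitting at spaces keeps words intact. Splitting at sentence boundaries would likely sound more natural, at the cost of a slightly more involved helper.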

End to end, a Daily session is set up in six steps:

  1. Push a lead via the Leads API with execution_mode: "DAILY", "DAILY_TEST", or "DAILY_STREAM"
  2. Call the connect endpoint with the lead’s ID
  3. Room creation: a Daily room is created with recording enabled
  4. Token generation: separate user and bot tokens are generated
  5. Bot process spawned: the Pipecat pipeline starts in the process pool
  6. Client joins: your frontend uses the returned room_url and token

Connect endpoint

POST /agent/voice/breeze-buddy/connect

Request

{
  "lead_id": "uuid-of-the-lead"
}

Response

{
  "room_url": "https://your-domain.daily.co/room-abc123",
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "session_id": "sess_abc123",
  "lead_id": "uuid-of-the-lead"
}
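
The request and response above could be handled from a TypeScript client roughly as follows. This is a sketch, not the official client: the base URL is a placeholder, the `Authorization: Bearer` header is an assumption about how the API token is passed (this page does not specify it), and `buildConnectRequest` and `parseConnectResponse` are hypothetical helper names.

```typescript
// Response fields as shown in the example response above.
interface ConnectResponse {
  room_url: string;
  token: string;
  session_id: string;
  lead_id: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the POST request for the connect endpoint.
function buildConnectRequest(baseUrl: string, apiToken: string, leadId: string): HttpRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/agent/voice/breeze-buddy/connect`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: bearer-token auth; check your deployment's auth scheme.
      Authorization: `Bearer ${apiToken}`,
    },
    body: JSON.stringify({ lead_id: leadId }),
  };
}

// Validate that a parsed JSON payload has the four expected string fields.
function parseConnectResponse(payload: unknown): ConnectResponse {
  const record = payload as Record<string, unknown> | null;
  for (const key of ["room_url", "token", "session_id", "lead_id"]) {
    if (!record || typeof record[key] !== "string") {
      throw new Error(`connect response is missing "${key}"`);
    }
  }
  return record as unknown as ConnectResponse;
}
```

In a browser, the request object can be passed to `fetch` (its `url` as the first argument and `method`, `headers`, and `body` as the options), and the parsed `room_url` and `token` are then handed to your client SDK when joining the room.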

Transport configuration

Parameter | Daily (WebRTC) | Telephony (WebSocket)
Sample Rate | 16000 Hz | 8000 Hz
Audio Codec | WebRTC (Opus) | μ-law / PCM
Real-Time Events | RTVI protocol | WebSocket messages
