Web SDK setup

Step-by-step guide to integrating the Pipecat client SDK with Daily transport in your frontend application.

Install dependencies

Install the Pipecat client SDK and Daily transport packages:

terminal
bash
# npm
npm install @pipecat-ai/client-js @pipecat-ai/daily-transport @daily-co/daily-js
# pnpm
pnpm add @pipecat-ai/client-js @pipecat-ai/daily-transport @daily-co/daily-js
# yarn
yarn add @pipecat-ai/client-js @pipecat-ai/daily-transport @daily-co/daily-js

TypeScript support

All packages ship with built-in TypeScript declarations. No additional @types/* packages are needed.

Initialize the client

Create a PipecatClient with DailyTransport and register your event callbacks:

voice-client.ts
typescript
import { PipecatClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
//
const client = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: false,
  enableCam: false,
  callbacks: {
    onConnected: () => client.enableMic(true),
    onUserTranscript: (data) => {
      if (data.final) appendToTranscript('user', data.text);
    },
    onBotLlmText: (data) => appendToStream('bot', data.text),
    onBotStoppedSpeaking: () => finaliseStream('bot'),
    onDisconnected: () => showStatus('ended')
  }
});
Option | Type | Description
--- | --- | ---
transport | Transport | Transport instance — use DailyTransport for WebRTC
enableMic | boolean | Start with the mic off; enable it on onConnected so the bot’s greeting plays cleanly
enableCam | boolean | Always false for voice-only sessions
callbacks | PipecatClientCallbacks | Event handlers (full list in RTVI events)
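
Because the client is constructed with enableMic: false, a push-to-talk or mute button only needs to flip enableMic. A minimal sketch — makeMicToggle and the MicClient interface are illustrative helpers, not SDK API:

```typescript
// Minimal slice of the client surface this sketch touches.
interface MicClient {
  enableMic(enable: boolean): void;
}

// Returns a toggle that tracks mute state locally and reports the new
// state so the UI can update its button label.
function makeMicToggle(client: MicClient): () => boolean {
  let micOn = false; // matches enableMic: false at construction
  return () => {
    micOn = !micOn;
    client.enableMic(micOn);
    return micOn;
  };
}
```

Wire it to a button, e.g. `muteBtn.onclick = () => { muteBtn.textContent = toggle() ? 'Mute' : 'Unmute'; }`.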

Connect to a session

Push a lead via the Leads API, then pass the returned lead_id to startBotAndConnect. The Breeze Buddy backend provisions the Daily room, spawns the bot, and responds with credentials that the SDK uses to join the room.

session.ts
typescript
// Request mic permission once before the first connect.
await client.initDevices();
//
const endpoint = 'https://your-api.example.com/agent/voice/breeze-buddy/connect';
const headers = new Headers({
  Authorization: `Bearer ${apiToken}`
});
//
await client.startBotAndConnect({
  endpoint,
  headers,
  requestData: { lead_id: 'your-lead-id' }
});
//
// …later, when the conversation is complete:
await client.disconnect();
Field | Required | Description
--- | --- | ---
endpoint | Yes | Absolute URL of POST /agent/voice/breeze-buddy/connect on your Breeze Buddy backend.
headers | Yes | Pass your API token here (Authorization: Bearer …). See Authentication.
requestData.lead_id | Yes | The lead_call_tracker_id returned by the Leads API.

Execution mode lives on the lead, not the client

The client always calls the same endpoint. The bot’s behaviour is decided by the lead’s execution_mode (DAILY, DAILY_TEST, or DAILY_STREAM) — see Build with Daily — Execution modes.

Server-proxied connect

The example above puts a Breeze Buddy API token in the browser. If you’d rather keep your S2S token on your server — and have your server own lead creation too — point startBotAndConnect at your backend instead. Your server calls Breeze Buddy with its S2S token and relays the Daily credentials back.

1. Browser: startBotAndConnect(endpoint: /api/voice/start)
2. Your server: POST /leads (S2S token)
3. Your server: POST /connect (S2S token)
4. Your server: relays { room_url, token } back to the browser
5. Browser joins the Daily room

api/voice/start.server.ts
typescript
// Server endpoint the browser hits. Uses your S2S token; the browser never sees it.
export async function POST({ request }) {
  const { agent_type } = await request.json();
  const agent = getAgent(agent_type); // your lookup: { template, payload }
  //
  const lead = await fetch(`${BACKEND_URL}/agent/voice/breeze-buddy/leads`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.BREEZE_S2S_TOKEN}`
    },
    body: JSON.stringify({
      reseller_id: 'your-reseller',
      merchant_id: 'your-merchant',
      template: agent.template,
      payload: agent.payload,
      execution_mode: 'DAILY_TEST'
    })
  }).then((r) => r.json());
  //
  const connect = await fetch(`${BACKEND_URL}/agent/voice/breeze-buddy/connect`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.BREEZE_S2S_TOKEN}`
    },
    body: JSON.stringify({ lead_id: lead.lead_call_tracker_id })
  }).then((r) => r.json());
  //
  // The browser's PipecatClient expects { room_url, token } in the body.
  return Response.json(connect);
}
client.ts
typescript
// Point the SDK at your server; no Authorization header from the browser.
await client.initDevices();
await client.startBotAndConnect({
  endpoint: '/api/voice/start',
  requestData: { agent_type: 'appointment-reminder' }
});

When to pick this pattern

Use server-proxied connect when any of these apply:

  • You can’t expose an API token in the browser.
  • The lead payload is sensitive or derived server-side.
  • You want a single integration seam (your own /api/voice/start) instead of two Breeze Buddy endpoints.

Direct connect is simpler when the browser already has a short-lived scoped token.

Event handling

Callbacks are registered in the constructor (above). The full catalog — user/bot speech, LLM tokens, function calls, metrics, errors, custom server messages — lives in RTVI events. A couple of common ones:

events.ts
typescript
// Inside the PipecatClient(...) callbacks object:
onUserTranscript: (data) => {
  if (data.final) appendToTranscript('user', data.text);
  else renderInterim(data.text);
},
onBotLlmText: (data) => appendToStream('bot', data.text),
onBotStoppedSpeaking: () => finaliseStream('bot'),
onLLMFunctionCall: (data) => {
  if (data.function_name === 'appointment_confirmed') {
    showConfirmation(data.arguments);
  }
},
onTransportStateChanged: (state) => {
  if (state === 'error') showError('Connection lost');
}
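
The appendToStream and finaliseStream helpers above are your own UI code, not SDK exports. One possible shape is a per-speaker buffer that accumulates streamed chunks and commits a full transcript line when the bot stops speaking — a sketch under those assumptions:

```typescript
// Finalised lines land in `transcript`; in-flight streamed text lives in `streams`.
const transcript: string[] = [];
const streams = new Map<string, string>();

// Accumulate a streamed chunk (e.g. LLM tokens) for a speaker.
function appendToStream(speaker: string, chunk: string): void {
  streams.set(speaker, (streams.get(speaker) ?? '') + chunk);
}

// Commit the buffered text as one transcript line and reset the buffer.
function finaliseStream(speaker: string): void {
  const text = streams.get(speaker);
  if (text) transcript.push(`${speaker}: ${text}`);
  streams.delete(speaker);
}
```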

Sending messages to the bot

Client → server messages go through sendClientMessage(type, data). Today Breeze Buddy handles a single message type end-to-end: tts-speak, available in DAILY_STREAM mode.

client-messages.ts
typescript
// Tell the bot to speak a specific utterance (stream mode only).
await client.sendClientMessage('tts-speak', {
  text: 'Hello! Let me confirm your appointment details.'
});
//
// Chain multiple — they queue FIFO in the pipeline.
await client.sendClientMessage('tts-speak', { text: 'I see you booked April 15.' });
await client.sendClientMessage('tts-speak', { text: 'Shall I go ahead and confirm?' });

Rules of the road

  • Only works when the lead’s execution_mode is DAILY_STREAM. In agent modes (DAILY, DAILY_TEST) the LLM owns what the bot says — the message is ignored.
  • Queues after the current utterance (no interruption). For barge-in, speak into the mic — VAD will handle it.
  • Max 2000 characters per call; longer strings are silently truncated server-side.
  • Each tts-speak fires the normal onBotTts*, onBotStartedSpeaking, and onBotStoppedSpeaking events — reflect the spoken text in your transcript UI from those listeners.
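
Given the 2000-character truncation, one defensive pattern is to split long text client-side and queue one tts-speak per chunk — since sends queue FIFO, the utterances play in order. chunkForTts below is an illustrative helper, not part of the SDK:

```typescript
// The server truncates past 2000 characters, so break long text into
// chunks at word boundaries and send each separately.
const TTS_MAX = 2000;

function chunkForTts(text: string, max: number = TTS_MAX): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > max) {
    // Prefer to break at the last space inside the limit.
    let cut = rest.lastIndexOf(' ', max);
    if (cut <= 0) cut = max;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).trimStart();
  }
  if (rest) chunks.push(rest);
  return chunks;
}

// Usage sketch (stream mode only):
// for (const part of chunkForTts(longText)) {
//   await client.sendClientMessage('tts-speak', { text: part });
// }
```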

See the full event reference for details.

Audio visualization

The remote bot audio track arrives on onTrackStarted — attach an AudioContext analyser to it for waveform or volume visuals:

visualizer.ts
typescript
// Inside the PipecatClient callbacks object:
onTrackStarted: (track, participant) => {
  if (track.kind !== 'audio' || participant?.local) return;
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(new MediaStream([track]));
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256;
  source.connect(analyser);
  const bins = new Uint8Array(analyser.frequencyBinCount);
  const draw = () => {
    analyser.getByteFrequencyData(bins);
    renderBars(bins);
    requestAnimationFrame(draw);
  };
  draw();
}
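
If you want a single pulsing volume meter rather than bars, the same bins reduce to one 0–1 level per frame. volumeLevel is an illustrative helper you would call inside draw() in place of renderBars:

```typescript
// Reduce byte frequency bins (0-255 each) to one volume level in [0, 1].
function volumeLevel(bins: Uint8Array): number {
  if (bins.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < bins.length; i++) sum += bins[i];
  return sum / bins.length / 255;
}
```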

Error handling

Handle network drops, permission denials, and pipeline errors via the transport-state and custom-message channels:

error-handling.ts
typescript
// Inside the PipecatClient callbacks object:
onTransportStateChanged: (state) => {
  if (state === 'error') showError('Unable to connect. Check your network.');
},
onDisconnected: () => {
  if (!intentionalDisconnect) {
    showError('Connection lost. Please refresh to reconnect.');
  }
},
onServerMessage: (msg) => {
  if (msg.type === 'pipeline-error') {
    showError('The voice assistant encountered an error. Reconnecting...');
    setTimeout(() => reconnect(currentLeadId), 2000);
  }
}

Mic permission denied

Browser microphone denials surface as rejected promises. Wrap the initDevices() / startBotAndConnect() awaits in a try/catch and prompt the user to grant access.
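
A try/catch around the connect flow can then branch on the rejection's name. The message strings and the micErrorMessage helper below are illustrative, not SDK API:

```typescript
// Map getUserMedia-style rejection names to user-facing guidance.
function micErrorMessage(err: { name?: string }): string {
  switch (err.name) {
    case 'NotAllowedError':
      return 'Microphone access was denied. Enable it in your browser settings and retry.';
    case 'NotFoundError':
      return 'No microphone was found. Plug one in and retry.';
    default:
      return 'Could not start audio. Please check your devices and retry.';
  }
}

// Usage sketch around the SDK calls:
// try {
//   await client.initDevices();
//   await client.startBotAndConnect({ endpoint, headers, requestData });
// } catch (err) {
//   showError(micErrorMessage(err as { name?: string }));
// }
```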

React integration

Example React hook wrapping the client lifecycle:

useVoiceSession.tsx
tsx
import { useEffect, useRef, useState, useCallback } from 'react';
import { PipecatClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
//
type Status = 'idle' | 'connecting' | 'active' | 'ended' | 'error';
//
export function useVoiceSession(baseUrl: string, apiToken: string) {
  const clientRef = useRef<PipecatClient | null>(null);
  const [status, setStatus] = useState<Status>('idle');
  const [transcript, setTranscript] = useState<string[]>([]);
  //
  useEffect(() => {
    const client = new PipecatClient({
      transport: new DailyTransport(),
      enableMic: false,
      enableCam: false,
      callbacks: {
        onConnected: () => { client.enableMic(true); setStatus('active'); },
        onDisconnected: () => setStatus('ended'),
        onTransportStateChanged: (s) => { if (s === 'error') setStatus('error'); },
        onUserTranscript: (d) => {
          if (d.final) setTranscript((t) => [...t, `You: ${d.text}`]);
        },
        onBotLlmText: (d) => {
          // Naive: each streamed LLM chunk becomes its own transcript line;
          // buffer chunks and flush on onBotStoppedSpeaking in production.
          setTranscript((t) => [...t, `Bot: ${d.text}`]);
        }
      }
    });
    clientRef.current = client;
    return () => { client.disconnect(); };
  }, []); // the client is created once per mount; connect() reads baseUrl/apiToken
  //
  const connect = useCallback(async (leadId: string) => {
    setStatus('connecting');
    await clientRef.current?.initDevices();
    await clientRef.current?.startBotAndConnect({
      endpoint: `${baseUrl}/agent/voice/breeze-buddy/connect`,
      headers: new Headers({ Authorization: `Bearer ${apiToken}` }),
      requestData: { lead_id: leadId }
    });
  }, [baseUrl, apiToken]);
  //
  const disconnect = useCallback(async () => {
    await clientRef.current?.disconnect();
  }, []);
  //
  return { status, transcript, connect, disconnect };
}

Vanilla JavaScript

For non-framework projects, the same SDK works directly:

index.html
html
<script type="module">
  import { PipecatClient } from 'https://esm.sh/@pipecat-ai/client-js';
  import { DailyTransport } from 'https://esm.sh/@pipecat-ai/daily-transport';
  //
  const client = new PipecatClient({
    transport: new DailyTransport(),
    enableMic: false,
    enableCam: false,
    callbacks: {
      onConnected: () => client.enableMic(true),
      onUserTranscript: (d) => {
        if (!d.final) return;
        document.getElementById('transcript').innerHTML +=
          '<p><strong>User:</strong> ' + d.text + '</p>';
      },
      onBotLlmText: (d) => {
        document.getElementById('transcript').innerHTML +=
          '<p><strong>Bot:</strong> ' + d.text + '</p>';
      }
    }
  });
  //
  document.getElementById('connect-btn').onclick = async () => {
    await client.initDevices();
    await client.startBotAndConnect({
      endpoint: 'https://your-api.example.com/agent/voice/breeze-buddy/connect',
      headers: new Headers({ Authorization: 'Bearer ' + apiToken }),
      requestData: { lead_id: document.getElementById('lead-input').value }
    });
  };
  document.getElementById('disconnect-btn').onclick = () => client.disconnect();
</script>

Mobile considerations

iOS Safari requires a user gesture (tap) before AudioContext can start. Call client.initDevices() + client.startBotAndConnect() inside a click/tap handler, not on page load.

Android WebView — ensure WebSettings.setMediaPlaybackRequiresUserGesture(false) is set and microphone permissions are granted via onPermissionRequest.

Background audio — mobile browsers may suspend WebRTC connections when the app is backgrounded. Listen for visibilitychange events and notify users if the session may be interrupted.
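
A visibilitychange watcher stays testable if the document-like object is injected; in a page you would pass document and surface the warning through your own UI helper (sketch, names are illustrative):

```typescript
// Minimal document surface this sketch relies on.
type VisibilityDoc = {
  visibilityState: string;
  addEventListener(type: string, cb: () => void): void;
};

// Invoke onBackground whenever the page becomes hidden.
function watchBackground(doc: VisibilityDoc, onBackground: () => void): void {
  doc.addEventListener('visibilitychange', () => {
    if (doc.visibilityState === 'hidden') onBackground();
  });
}

// Usage sketch in the page (showStatus is your own UI helper):
// watchBackground(document, () =>
//   showStatus('Audio may pause while the app is in the background.'));
```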
