Transcripts
TranscriptEntry shape, streaming semantics, and how to consume transcripts from the session.
Transcripts stream in real time from both the user (via STT) and the assistant (via LLM / TTS). The SDK delivers them through one event — 'transcript' — and you branch on the entry’s role.
The TranscriptEntry type
A discriminated union on `role`:

```ts
type TranscriptEntry =
  | { id: string; role: 'user'; text: string; isComplete: boolean }
  | { id: string; role: 'assistant'; text: string; isComplete: boolean }
  | { id: string; role: 'tool_call'; functionName: string; isComplete: boolean };
```

| Field | Meaning |
|---|---|
| `id` | Stable UUID for this entry — same value across partial updates |
| `role` | Discriminator: `'user'`, `'assistant'`, or `'tool_call'` |
| `text` | Accumulated text so far (NOT a delta — the full current string) |
| `functionName` | Only on `'tool_call'` — the tool / function name being invoked |
| `isComplete` | `false` while still streaming, `true` on the finalized version |
Streaming semantics
The same id is emitted multiple times as partial text arrives. Use it as a stable key (React / DOM) so partials update in place instead of creating new bubbles.
Example trace — user says “Hey hi, how are you?”:

```ts
{ id: 'abc', role: 'user', text: 'hey', isComplete: false }
{ id: 'abc', role: 'user', text: 'hey hi', isComplete: false }
{ id: 'abc', role: 'user', text: 'hey hi how', isComplete: false }
{ id: 'abc', role: 'user', text: 'hey hi how are', isComplete: false }
{ id: 'abc', role: 'user', text: 'hey hi how are you', isComplete: true }
```

The same pattern applies to assistant transcripts (LLM tokens streamed in real time).
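Because each emission carries the full accumulated text (not a delta), consuming code can simply overwrite by `id`. A minimal sketch of that upsert as a pure function — the `TranscriptEntry` type is redeclared locally so the snippet stands alone, and `upsertEntry` is an illustrative helper, not part of the SDK:

```typescript
type TranscriptEntry =
  | { id: string; role: 'user'; text: string; isComplete: boolean }
  | { id: string; role: 'assistant'; text: string; isComplete: boolean }
  | { id: string; role: 'tool_call'; functionName: string; isComplete: boolean };

// Apply one streamed entry to an ordered list: replace the entry with the
// same id if present, otherwise append. Later partials overwrite earlier ones.
function upsertEntry(list: TranscriptEntry[], entry: TranscriptEntry): TranscriptEntry[] {
  const i = list.findIndex((e) => e.id === entry.id);
  if (i === -1) return [...list, entry]; // first partial for this id: append
  const next = list.slice();
  next[i] = entry; // subsequent partials: replace in place
  return next;
}
```

Feeding the trace above through this function leaves a single entry whose text is the latest accumulated string.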
Consuming transcripts — the main way
Subscribe to the 'transcript' event and branch on role. This is the recommended pattern for all transcript handling:
```ts
session.on('transcript', (entry) => {
  switch (entry.role) {
    case 'user':
      renderUserBubble(entry.id, entry.text, entry.isComplete);
      break;
    case 'assistant':
      renderAssistantBubble(entry.id, entry.text, entry.isComplete);
      break;
    case 'tool_call':
      renderToolBadge(entry.id, entry.functionName, entry.isComplete);
      break;
  }
});
```

TypeScript narrows `entry` inside each case — no casts needed, full type safety on `text` vs `functionName`.
Common patterns
Live chat UI
```ts
const bubbles = new Map<string, HTMLElement>();

session.on('transcript', (entry) => {
  if (entry.role === 'tool_call') return; // handle tool calls separately
  let bubble = bubbles.get(entry.id);
  if (!bubble) {
    bubble = createBubble(entry.role);
    bubbles.set(entry.id, bubble);
    container.append(bubble);
  }
  bubble.textContent = entry.text;
  bubble.classList.toggle('partial', !entry.isComplete);
});
```

Commit-on-final (command parsing)
```ts
session.on('transcript', (entry) => {
  if (entry.role !== 'user' || !entry.isComplete) return;
  const text = entry.text.toLowerCase();
  if (text.includes('transfer')) transferCall();
  if (text.includes('hang up')) session.close();
});
```

Tool-call indicator
```ts
session.on('transcript', (entry) => {
  if (entry.role !== 'tool_call') return;
  if (entry.isComplete) {
    toast(`Finished: ${entry.functionName}`);
  } else {
    toast(`Calling: ${entry.functionName}…`);
  }
});
```

Accessing full history
`getState().transcripts` returns a cloned snapshot of all entries so far. Useful for exporting transcripts, debugging, or re-rendering after hot-reload:

```ts
const { transcripts } = session.getState();
exportAsJson(transcripts);
```

The array is a clone — mutating it won’t affect session state.
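As a sketch of post-processing such a snapshot before export, here is a hypothetical helper (not part of the SDK) that keeps only finalized user and assistant text, dropping tool calls and in-flight partials:

```typescript
type TranscriptEntry =
  | { id: string; role: 'user'; text: string; isComplete: boolean }
  | { id: string; role: 'assistant'; text: string; isComplete: boolean }
  | { id: string; role: 'tool_call'; functionName: string; isComplete: boolean };

// The two variants that carry text (user and assistant).
type TextEntry = Extract<TranscriptEntry, { text: string }>;

// Reduce a transcripts snapshot to finalized text rows suitable for export.
function toExportRows(entries: TranscriptEntry[]): { role: TextEntry['role']; text: string }[] {
  return entries
    .filter((e): e is TextEntry => e.role !== 'tool_call' && e.isComplete)
    .map((e) => ({ role: e.role, text: e.text }));
}
```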
Transcript vs speaking: pick the right one
Two closely-related but distinct streams for “what the assistant said”:
| | onAssistantTranscript | onAssistantSpeaking |
|---|---|---|
| Source | LLM token stream | TTS pipeline |
| Fires when | The model is generating text | Audio is being synthesized |
| Stream mode (no LLM) | ❌ never fires | ✅ fires — the only option |
| Production / test mode | ✅ fires (earlier in the pipeline) | ✅ fires (after TTS begins) |
| Handler receives | Incremental AssistantTranscript | Discriminated { start \| chunk \| end } |
| Use for | Render the model’s response as text | Sync UI with actual audio (speaking indicator, karaoke) |
| Reflects post-processing? (PII redaction, profanity filter) | No — raw LLM output | Yes — what’s actually being heard |
Rule of thumb:
- Want what the model said → onAssistantTranscript
- Want what the user is hearing right now → onAssistantSpeaking
- In stream mode → onAssistantSpeaking is your primary text stream (transcript doesn’t fire without an LLM)
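In stream mode, where onAssistantSpeaking is the text source, its { start | chunk | end } events can be folded into the utterance currently being heard. The exact event shape below is an assumption for illustration (see the Speaking page for the real payload):

```typescript
// Assumed shape of speaking events, per the table above; the real
// discriminated union is documented on the Speaking page.
type SpeakingEvent =
  | { kind: 'start' }
  | { kind: 'chunk'; text: string }
  | { kind: 'end' };

// Fold one event into the text of the current utterance.
function reduceSpeaking(current: string, ev: SpeakingEvent): string {
  switch (ev.kind) {
    case 'start':
      return ''; // new utterance: reset the buffer
    case 'chunk':
      return current + ev.text; // append synthesized text as it plays
    case 'end':
      return current; // utterance finalized
  }
}
```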
The helper for onAssistantSpeaking lives in Speaking. The transcript helpers are below.
Single-role helpers
Handy when you only care about one role.
Each helper is equivalent to `session.on('transcript', ...)` with a role filter and a type-narrowed handler signature. Use whichever feels cleaner for your code — the main pattern above is a solid default, and these are a neat shortcut when you only need one role.

```ts
type Unsubscribe = () => void;

session.onUserTranscript(
  (entry: UserTranscript) => void
): Unsubscribe;

session.onAssistantTranscript(
  (entry: AssistantTranscript) => void
): Unsubscribe;

session.onToolCall(
  (entry: ToolCallTranscript) => void
): Unsubscribe;
```

Example:
```ts
const off = session.onUserTranscript((e) => {
  if (e.isComplete) handle(e.text);
});

// later
off();
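Since every helper returns an Unsubscribe, teardown (e.g. on component unmount) is just calling each one you collected. A small sketch — `combineUnsubscribes` is an illustrative utility, not part of the SDK:

```typescript
type Unsubscribe = () => void;

// Merge several unsubscribe functions into a single teardown call.
function combineUnsubscribes(...offs: Unsubscribe[]): Unsubscribe {
  return () => offs.forEach((off) => off());
}
```

Usage: pass it the return values of onUserTranscript, onAssistantTranscript, and onToolCall, then call the combined function once when the UI goes away.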