Template configuration
The configurations object on every template controls STT, TTS, VAD, LLM, audio, interruptions, IVR, and more.
Overview
Each template carries an optional configurations object (ConfigurationModel). All fields are optional, and defaults apply to any field you omit.
Config cascade
Node-level config replaces (not merges with) template-level config. If a node sets its own stt_configuration, it fully overrides the template setting.
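The cascade can be sketched as a per-field lookup. `resolve_field` is a hypothetical helper, not part of the platform API. The sketch also assumes that a field the node leaves unset falls back to the template value; the text above only covers the case where the node does set the field. It shows that a node's stt_configuration replaces the template's object wholesale, so nested keys such as the template's `deepgram` block are not carried over.

```python
from typing import Any, Optional


def resolve_field(template_cfg: dict, node_cfg: Optional[dict], field: str) -> Any:
    """Hypothetical helper: resolve one configuration field for a node."""
    node_value = (node_cfg or {}).get(field)
    if node_value is not None:
        # The node's value replaces the template's value as a whole; no key-by-key merge.
        return node_value
    # Assumption: a field the node leaves unset falls back to the template value.
    return template_cfg.get(field)


template_cfg = {
    "stt_configuration": {
        "provider": "deepgram",
        "language": "en",
        "deepgram": {"model": "nova-3-general", "endpointing_ms": 25},
    }
}
node_cfg = {"stt_configuration": {"provider": "deepgram", "language": "es"}}

stt = resolve_field(template_cfg, node_cfg, "stt_configuration")
print(stt)  # → {'provider': 'deepgram', 'language': 'es'}
```

A node that wants to keep the template's Deepgram model therefore has to repeat the `deepgram` block in its own stt_configuration.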
Configuration categories
| Category | Key fields | Details |
|---|---|---|
| Speech-to-Text | stt_configuration | STT Config |
| Text-to-Speech | tts_configuration, tts_configuration_overrides, tts_selection_config | TTS Config |
| Voice Activity | vad_config | VAD Config |
| LLM | llm_configurations | LLM Config |
| Audio | enable_background_sound, noise_filter, keyword_filter | Audio Config |
| Idle Handling | user_idle_configuration | User Idle |
| Telephony | enable_inbound, transfer_number, ivr_configuration | Telephony |
Fields reference
| Field | Type | Default | Description |
|---|---|---|---|
| stt_configuration | STTConfiguration | null | STT provider, language, and turn detection. |
| tts_configuration | TTSConfig | null | Default TTS provider and voice for the template. |
| tts_configuration_overrides | Dict[str, TTSConfig] | null | Per-provider override map. Each key is a provider name ("cartesia", "elevenlabs", etc.), and the provider field of each value is filled in from its key. |
| tts_selection_config | TTSSelectionConfig | null | LLM-based, per-utterance selection of the TTS provider. |
| enable_background_sound | bool | false | Mix ambient audio into the call. |
| background_sound_file | BackgroundSoundFile | null | Ambient audio file, e.g. office-ambience. |
| background_sound_volume | float | 2.0 | Gain multiplier for the background sound. |
| initial_greeting | str | null | First bot utterance. Supports {variable} placeholders. |
| vad_config | VadConfig | null | Silero VAD tuning: confidence, start_secs, stop_secs, min_volume. |
| interruption | InterruptionConfig | null | Interruption mode and min_words threshold. |
| input_collection | InputCollectionConfig | null | Multi-segment input collection (enabled, user_speech_timeout). |
| user_idle_configuration | UserIdleHandlingConfig | null | Idle timer: enabled, timeout, idle_message, max_retries. |
| llm_configurations | LLMConfiguration | null | LLM model and temperature. |
| enable_inbound | bool | false | Accept inbound calls on this template. |
| transfer_number | str | null | E.164 number for warm transfer. Required for connect_to_live_agent to succeed. |
| ivr_configuration | IvrConfig | null | This template's IVR menu option when multiple templates share an inbound number. See IVR configuration. |
| noise_filter | NoiseFilterConfig | null | AIC-based noise enhancement on incoming audio. |
| keyword_filter | KeywordFilterConfig | null | Drop matching user transcriptions while the bot is speaking. |
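Two of the fields above involve light preprocessing: provider auto-fill for tts_configuration_overrides and {variable} substitution in initial_greeting. The sketch below approximates both in plain Python. `fill_override_providers` and `render_greeting` are hypothetical helpers. Two behaviors are assumptions, not documented platform behavior: the dict key winning over an explicit provider value, and unknown placeholders being left intact.

```python
import copy
from typing import Dict, Mapping


def fill_override_providers(overrides: Dict[str, dict]) -> Dict[str, dict]:
    """Hypothetical helper: copy each override and set its "provider" from the dict key."""
    filled = {}
    for provider, cfg in overrides.items():
        entry = copy.deepcopy(cfg)  # leave the caller's dict untouched
        # Assumption: the key overrides any provider value already present.
        entry["provider"] = provider
        filled[provider] = entry
    return filled


class _KeepMissing(dict):
    """Return unknown {placeholders} unchanged instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_greeting(template: str, variables: Mapping[str, str]) -> str:
    """Hypothetical helper: substitute {variable} placeholders in a greeting."""
    # Assumption: placeholders with no value are left as-is.
    return template.format_map(_KeepMissing(variables))


overrides = {
    "cartesia": {
        "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
        "language": "en",
        "speed": 1.1,
    }
}
filled = fill_override_providers(overrides)
print(filled["cartesia"]["provider"])  # → cartesia

greeting = "Hello {customer_name}, calling from {company}."
print(render_greeting(greeting, {"customer_name": "Ada"}))
# → Hello Ada, calling from {company}.
```

`str.format_map` is used here because it looks up each key through the mapping, which lets the `__missing__` hook decide what happens to absent variables.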
Playground overrides
When a lead carries metaData.playground=true (equivalent to the push field is_playground: true), the template’s configurations are replaced at runtime by the lead’s configurations_override, which is stored in metaData.configurations. Use this to iterate on templates from the dashboard without modifying the stored config.
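The selection logic can be sketched as follows. `effective_configurations` is a hypothetical helper, and the lead shape (a `metaData` dict with a boolean `playground` flag) is an assumption based on the field names above. The sketch shows that the override replaces the template configurations wholesale instead of being merged into them. It also assumes that a playground lead with no stored override runs with an empty configurations object, so defaults apply.

```python
from typing import Any, Dict


def effective_configurations(
    template_configs: Dict[str, Any], lead: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: pick the configurations a call should run with."""
    meta = lead.get("metaData") or {}
    # Assumption: playground is stored as a JSON boolean.
    if meta.get("playground") is True:
        # Replace, don't merge: template fields absent from the override are dropped.
        # Assumption: a missing override is treated as an empty object.
        return meta.get("configurations") or {}
    return template_configs


template = {"initial_greeting": "Hello {customer_name}.", "enable_background_sound": True}
lead = {
    "metaData": {
        "playground": True,
        "configurations": {"initial_greeting": "Test greeting"},
    }
}
print(effective_configurations(template, lead))  # → {'initial_greeting': 'Test greeting'}
```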
IVR configuration
ivr_configuration holds this template’s menu option when multiple templates share an inbound number. The IVR agent concatenates each template’s greeting in priority order.
| Field | Type | Description |
|---|---|---|
| greeting | str | Menu line spoken for this option (e.g. “Press 1 or say Sales to speak with our sales team”). |
| goodbye | str | Message played when the caller provides no input. |
| priority | int (≥1) | Order within the menu. Lower numbers are spoken first. |
| tts_configuration | TTSConfig | Optional TTS override for IVR menu playback. |
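The menu assembly described above (each template's greeting, concatenated with the lowest priority number first) can be sketched as a sort plus a join. `build_ivr_menu` is a hypothetical helper. The space separator is an assumption, and so is keeping input order for equal priorities (Python's `sorted` is stable). The "Support" option is illustrative text.

```python
from typing import List


def build_ivr_menu(ivr_configs: List[dict], separator: str = " ") -> str:
    """Hypothetical helper: concatenate IVR greetings, lowest priority number first."""
    for cfg in ivr_configs:
        if cfg["priority"] < 1:
            raise ValueError("priority must be >= 1")
    # sorted() is stable, so equal priorities keep their input order (assumption).
    ordered = sorted(ivr_configs, key=lambda cfg: cfg["priority"])
    # Assumption: greetings are joined with a single space.
    return separator.join(cfg["greeting"] for cfg in ordered)


menu = build_ivr_menu([
    {"greeting": "Press 2 or say Support for help with an existing order.", "priority": 2},
    {"greeting": "Press 1 or say Sales to speak with our sales team.", "priority": 1},
])
print(menu)
# → Press 1 or say Sales to speak with our sales team. Press 2 or say Support for help with an existing order.
```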
The flat fields ivr_greeting, ivr_goodbye, and ivr_priority are deprecated shorthand — prefer ivr_configuration.
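A migration from the deprecated flat fields might look like the sketch below. `migrate_ivr_fields` is a hypothetical helper. It assumes the flat fields map one-to-one onto ivr_configuration keys (ivr_greeting to greeting, ivr_goodbye to goodbye, ivr_priority to priority). It also assumes that values already present in ivr_configuration take precedence over the flat shorthand.

```python
from typing import Any, Dict

# Assumed one-to-one mapping from deprecated flat fields to ivr_configuration keys.
_FLAT_TO_NESTED = {
    "ivr_greeting": "greeting",
    "ivr_goodbye": "goodbye",
    "ivr_priority": "priority",
}


def migrate_ivr_fields(configurations: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: move flat ivr_* fields into ivr_configuration (returns a copy)."""
    migrated = dict(configurations)
    nested = dict(migrated.get("ivr_configuration") or {})
    for flat_key, nested_key in _FLAT_TO_NESTED.items():
        if flat_key in migrated:
            value = migrated.pop(flat_key)
            # Assumption: an existing ivr_configuration value wins over the flat field.
            nested.setdefault(nested_key, value)
    if nested:
        migrated["ivr_configuration"] = nested
    return migrated


old = {
    "ivr_greeting": "Press 1 or say Sales to speak with our sales team.",
    "ivr_priority": 1,
    "enable_inbound": True,
}
new = migrate_ivr_fields(old)
print(new)
```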
Full JSON example
```json
{
  "configurations": {
    "stt_configuration": {
      "provider": "deepgram",
      "language": "en",
      "turn_detection": "smart_turn",
      "deepgram": { "model": "nova-3-general", "endpointing_ms": 25 },
      "smart_turn": { "stop_secs": 3.0, "pre_speech_ms": 500.0 }
    },
    "tts_configuration": {
      "provider": "elevenlabs",
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "model_id": "eleven_turbo_v2_5",
      "language": "en"
    },
    "tts_configuration_overrides": {
      "cartesia": {
        "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
        "language": "en",
        "speed": 1.1
      }
    },
    "vad_config": {
      "confidence": 0.5, "start_secs": 0.2,
      "stop_secs": 0.8, "min_volume": 0.6
    },
    "interruption": { "mode": "enabled", "min_words": 2 },
    "input_collection": { "enabled": false, "user_speech_timeout": 1.5 },
    "user_idle_configuration": {
      "enabled": true, "timeout": 5.0,
      "idle_message": "Are you still there?", "max_retries": 3
    },
    "llm_configurations": { "model": "gpt-4o", "temperature": 0.7 },
    "enable_background_sound": true,
    "background_sound_file": "office-ambience",
    "initial_greeting": "Hello {customer_name}, calling from {company}.",
    "transfer_number": "+14155551234"
  }
}
```

Start simple
Begin with initial_greeting and tts_configuration, then layer in STT, VAD, and interruption tuning as needed.
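As a starting point, the sketch below builds a minimal configurations payload in Python, then layers in one tuning block. All values are copied from the full JSON example above. Every other field is omitted so that defaults apply. Building the payload as a Python dict and serializing it with `json.dumps` is only an illustration; how the payload is submitted is not covered here.

```python
import json

# Minimal starting point: greeting plus default TTS only (values from the full example).
minimal = {
    "configurations": {
        "initial_greeting": "Hello {customer_name}, calling from {company}.",
        "tts_configuration": {
            "provider": "elevenlabs",
            "voice_id": "pFZP5JQG7iQjIQuC4Bku",
            "model_id": "eleven_turbo_v2_5",
            "language": "en",
        },
    }
}

# Later, layer in tuning one block at a time, e.g. interruption handling.
minimal["configurations"]["interruption"] = {"mode": "enabled", "min_words": 2}

print(json.dumps(minimal, indent=2))
```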