Template configuration

The configurations object on every template controls STT, TTS, VAD, LLM, audio, interruptions, IVR, and more.

Overview

Each template carries an optional configurations object (ConfigurationModel). All fields are optional — sensible defaults apply when omitted.

Precedence, lowest to highest: Global Defaults → Template Config → Node Override

Config cascade

Node-level config replaces (not merges with) template-level config. If a node sets its own stt_configuration, it fully overrides the template setting.
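
The replace-not-merge rule can be sketched in a few lines of Python. This is an illustration only, not the platform's resolver: the function name `resolve_config` and the sample values are hypothetical, and the sketch assumes that replacement happens per top-level field, that the same rule applies between global defaults and the template, and that a `null` value counts as "not set".

```python
from typing import Any, Dict, Optional

Config = Dict[str, Any]


def resolve_config(
    global_defaults: Config,
    template_config: Optional[Config],
    node_config: Optional[Config],
) -> Config:
    """Resolve the effective config (hypothetical helper).

    Each later layer replaces a field's whole value; nested objects
    are never deep-merged.
    """
    resolved = dict(global_defaults)
    for layer in (template_config or {}, node_config or {}):
        for field, value in layer.items():
            if value is not None:  # assumption: null means "not set"
                resolved[field] = value  # whole-value replacement
    return resolved


template = {
    "stt_configuration": {
        "provider": "deepgram",
        "language": "en",
        "turn_detection": "smart_turn",
    },
    "enable_background_sound": True,
}
# The node only wants a different language, but it must restate every
# STT field it needs, because nothing is inherited from the template.
node = {"stt_configuration": {"provider": "deepgram", "language": "es"}}

effective = resolve_config({"enable_background_sound": False}, template, node)
print(effective["stt_configuration"])
```

Under these assumptions the node's `stt_configuration` has no `turn_detection` key, because the template's value was replaced rather than merged. Fields the node does not set, such as `enable_background_sound`, still come from the template.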

Configuration categories

Category	Key fields	Details
Speech-to-Text	`stt_configuration`	STT Config
Text-to-Speech	`tts_configuration`, `tts_configuration_overrides`, `tts_selection_config`	TTS Config
Voice Activity	`vad_config`	VAD Config
LLM	`llm_configurations`	LLM Config
Audio	`enable_background_sound`, `noise_filter`, `keyword_filter`	Audio Config
Idle Handling	`user_idle_configuration`	User Idle
Telephony	`enable_inbound`, `transfer_number`, `ivr_configuration`	Telephony

Fields reference

Field	Type	Default	Description
`stt_configuration`	`STTConfiguration`	`null`	STT provider, language, turn detection.
`tts_configuration`	`TTSConfig`	`null`	Default TTS provider + voice for the template.
`tts_configuration_overrides`	`Dict[str, TTSConfig]`	`null`	Per-provider override map. The dict key is the provider (`"cartesia"`, `"elevenlabs"`, etc.); `provider` is auto-filled.
`tts_selection_config`	`TTSSelectionConfig`	`null`	LLM-based per-utterance TTS provider selection.
`enable_background_sound`	`bool`	`false`	Mix ambient audio into the call.
`background_sound_file`	`BackgroundSoundFile`	`null`	Ambient file — e.g. `office-ambience`.
`background_sound_volume`	`float`	`2.0`	Gain multiplier for background sound.
`initial_greeting`	`str`	`null`	First bot utterance. Supports `{variable}` placeholders.
`vad_config`	`VadConfig`	`null`	Silero VAD tuning: confidence, start/stop secs.
`interruption`	`InterruptionConfig`	`null`	Interruption mode and min_words threshold.
`input_collection`	`InputCollectionConfig`	`null`	Multi-segment input collection (`enabled`, `user_speech_timeout`).
`user_idle_configuration`	`UserIdleHandlingConfig`	`null`	Idle timer: timeout, message, max retries.
`llm_configurations`	`LLMConfiguration`	`null`	LLM model and temperature.
`enable_inbound`	`bool`	`false`	Accept inbound calls on this template.
`transfer_number`	`str`	`null`	E.164 number for warm transfer. Required for `connect_to_live_agent` to succeed.
`ivr_configuration`	`IvrConfig`	`null`	IVR menu options when multiple templates share an inbound number. See IVR configuration.
`noise_filter`	`NoiseFilterConfig`	`null`	AIC-based noise enhancement on incoming audio.
`keyword_filter`	`KeywordFilterConfig`	`null`	Drop matching user transcriptions while the bot is speaking.
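
The table says that in `tts_configuration_overrides` the dict key names the provider and `provider` is auto-filled. A minimal sketch of what that normalization might look like follows. The helper name `fill_override_providers` is hypothetical, and the sketch assumes that the dict key wins if an entry also carries its own `provider` value.

```python
from typing import Any, Dict


def fill_override_providers(
    overrides: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Copy each override and set its `provider` from the dict key.

    Hypothetical helper; assumes the key takes precedence over any
    `provider` already present in the entry.
    """
    return {
        provider: {**cfg, "provider": provider}
        for provider, cfg in overrides.items()
    }


overrides = {
    "cartesia": {
        "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
        "language": "en",
        "speed": 1.1,
    }
}
normalized = fill_override_providers(overrides)
print(normalized["cartesia"]["provider"])
```

The practical consequence is that an override entry can omit `provider` entirely, as the `cartesia` entry in the full JSON example does.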

Playground overrides

When a lead carries metaData.playground=true (equivalent to setting the push field is_playground: true), the template’s stored configurations are replaced at runtime by the lead’s configurations_override, which is stored in metaData.configurations. Use this to iterate on a template from the dashboard without modifying its stored config.
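
As a rough sketch of the selection logic described above: the function name `effective_configurations` and the lead and template dict shapes are assumptions made for illustration. Only the `metaData.playground` and `metaData.configurations` keys come from this page. The sketch also assumes the replacement is wholesale, so a playground lead with no stored override ends up with an empty config rather than falling back to the template.

```python
from typing import Any, Dict


def effective_configurations(
    template: Dict[str, Any], lead: Dict[str, Any]
) -> Dict[str, Any]:
    """Pick the configurations a call would run with (hypothetical helper)."""
    meta = lead.get("metaData") or {}
    if meta.get("playground") is True:
        # Playground leads use their own override wholesale; the
        # template's stored configurations are ignored (assumption:
        # no fallback to the template when the override is missing).
        return meta.get("configurations") or {}
    return template.get("configurations") or {}


template = {"configurations": {"initial_greeting": "Hello from the stored template."}}
playground_lead = {
    "metaData": {
        "playground": True,
        "configurations": {"initial_greeting": "Hello from the playground."},
    }
}
regular_lead = {"metaData": {}}

print(effective_configurations(template, playground_lead)["initial_greeting"])
print(effective_configurations(template, regular_lead)["initial_greeting"])
```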

IVR configuration

ivr_configuration holds this template’s menu option when multiple templates share an inbound number. The IVR agent concatenates each template’s greeting in priority order.

Field	Type	Description
`greeting`	`str`	Menu line spoken for this option (e.g. “Press 1 or say Sales to speak with our sales team”).
`goodbye`	`str`	Message played when the caller provides no input.
`priority`	`int` (≥1)	Ordering within the menu. Lower number is spoken first.
`tts_configuration`	`TTSConfig`	Optional TTS override for IVR menu playback.

The flat fields ivr_greeting, ivr_goodbye, and ivr_priority are deprecated shorthand — prefer ivr_configuration.
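
To illustrate how the shared menu could be assembled from several templates, here is a small sketch. The helper `build_ivr_menu`, the template dict shape, and the single-space separator are assumptions. The page only states that greetings are concatenated in priority order, with lower numbers spoken first. The second greeting is an invented sample.

```python
from typing import Any, Dict, List


def build_ivr_menu(templates: List[Dict[str, Any]]) -> str:
    """Concatenate each template's IVR greeting in priority order.

    Hypothetical helper; templates without `ivr_configuration` are skipped.
    """
    options = []
    for template in templates:
        ivr = (template.get("configurations") or {}).get("ivr_configuration")
        if ivr:
            options.append(ivr)
    # Lower priority number is spoken first; sort() is stable, so ties
    # keep their input order.
    options.sort(key=lambda option: option["priority"])
    return " ".join(option["greeting"] for option in options)


templates = [
    {
        "configurations": {
            "ivr_configuration": {
                "greeting": "Press 2 or say Support to reach customer support.",
                "priority": 2,
            }
        }
    },
    {
        "configurations": {
            "ivr_configuration": {
                "greeting": "Press 1 or say Sales to speak with our sales team.",
                "priority": 1,
            }
        }
    },
    {"configurations": {}},  # no IVR option configured
]
menu = build_ivr_menu(templates)
print(menu)
```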

Full JSON example

Complete configuration object (JSON):

{
  "configurations": {
    "stt_configuration": {
      "provider": "deepgram",
      "language": "en",
      "turn_detection": "smart_turn",
      "deepgram": { "model": "nova-3-general", "endpointing_ms": 25 },
      "smart_turn": { "stop_secs": 3.0, "pre_speech_ms": 500.0 }
    },
    "tts_configuration": {
      "provider": "elevenlabs",
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "model_id": "eleven_turbo_v2_5",
      "language": "en"
    },
    "tts_configuration_overrides": {
      "cartesia": {
        "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
        "language": "en",
        "speed": 1.1
      }
    },
    "vad_config": {
      "confidence": 0.5, "start_secs": 0.2,
      "stop_secs": 0.8, "min_volume": 0.6
    },
    "interruption": { "mode": "enabled", "min_words": 2 },
    "input_collection": { "enabled": false, "user_speech_timeout": 1.5 },
    "user_idle_configuration": {
      "enabled": true, "timeout": 5.0,
      "idle_message": "Are you still there?", "max_retries": 3
    },
    "llm_configurations": { "model": "gpt-4o", "temperature": 0.7 },
    "enable_background_sound": true,
    "background_sound_file": "office-ambience",
    "initial_greeting": "Hello {customer_name}, calling from {company}.",
    "transfer_number": "+14155551234"
  }
}

Start simple

Begin with initial_greeting and tts_configuration, then layer in STT, VAD, and interruption tuning as needed.
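
A minimal starting point, built in Python and serialized to JSON, might look like the sketch below. The TTS values are copied from the full example above. The `render_greeting` helper is hypothetical and only illustrates `{variable}` substitution; leaving unknown placeholders untouched is an assumption, not documented platform behavior.

```python
import json
from typing import Dict


class _KeepMissing(dict):
    """Mapping that leaves unknown placeholders as-is (assumed behavior)."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_greeting(text: str, variables: Dict[str, str]) -> str:
    """Fill `{variable}` placeholders in a greeting (hypothetical helper)."""
    return text.format_map(_KeepMissing(variables))


minimal = {
    "configurations": {
        "initial_greeting": "Hello {customer_name}, calling from {company}.",
        "tts_configuration": {
            "provider": "elevenlabs",
            "voice_id": "pFZP5JQG7iQjIQuC4Bku",
            "model_id": "eleven_turbo_v2_5",
            "language": "en",
        },
    }
}

payload = json.dumps(minimal, indent=2)  # JSON body for the template
greeting = render_greeting(
    minimal["configurations"]["initial_greeting"],
    {"customer_name": "Ada"},
)
print(payload)
print(greeting)
```

Every other field is left out, so the defaults from the fields reference apply until STT, VAD, or interruption settings are added.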
