VAD and turn detection
Silero VAD parameters, turn-detection strategies, and per-node overrides.
Overview
Voice Activity Detection (VAD) determines when the user is speaking. Turn detection decides when the user has finished speaking so the bot can respond. Breeze Buddy uses the Silero VAD model combined with one of three turn-detection strategies.
VAD parameters
| Field | Type | Range | Description |
|---|---|---|---|
| confidence | float | 0.0–1.0 | Minimum confidence to classify audio as speech. |
| start_secs | float | ≥ 0 | Consecutive speech seconds before marking onset. |
| stop_secs | float | ≥ 0 | Consecutive silence seconds before marking offset. |
| min_volume | float | ≥ 0 | Audio below this volume is treated as silence. |
{
  "vad_config": {
    "confidence": 0.5,
    "start_secs": 0.2,
    "stop_secs": 0.8,
    "min_volume": 0.6
  }
}
Tuning tips
Lower confidence catches softer speech but may false-trigger on noise. Higher stop_secs prevents premature cutoffs but adds latency. Start with defaults and tune from test calls.
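To make the trade-off concrete, the sketch below shows one plausible way the four parameters could interact on a stream of audio frames. It is not the Silero or Breeze Buddy implementation: the `Frame` type, the `detect_segments` helper, and the 100 ms frame length are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    speech_prob: float  # model's speech probability for this frame
    volume: float       # normalized loudness of this frame


def detect_segments(frames, frame_ms=100, confidence=0.5, start_secs=0.2,
                    stop_secs=0.8, min_volume=0.6):
    """Return (onset_frame, offset_frame) index pairs, one per detected utterance."""
    # Convert the second-based thresholds to whole frames (at least one frame).
    start_frames = max(1, math.ceil(round(start_secs * 1000) / frame_ms))
    stop_frames = max(1, math.ceil(round(stop_secs * 1000) / frame_ms))

    segments = []
    speaking = False
    run = 0       # consecutive frames that disagree with the current state
    onset = None
    for i, frame in enumerate(frames):
        # A frame counts as speech only if it clears both thresholds.
        is_speech = frame.speech_prob >= confidence and frame.volume >= min_volume
        run = run + 1 if is_speech != speaking else 0
        if not speaking and run >= start_frames:
            # Onset is marked once start_secs of consecutive speech has been seen.
            speaking, run, onset = True, 0, i
        elif speaking and run >= stop_frames:
            # Offset is marked once stop_secs of consecutive silence has been seen.
            speaking, run = False, 0
            segments.append((onset, i))
    return segments
```

With a 0.5 s pause in the middle of an utterance, `stop_secs=0.8` keeps the utterance as one segment, while `stop_secs=0.3` splits it in two: the premature cutoff described above, bought at the price of a later offset when the larger value is used.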
Turn detection strategies
Set via stt_configuration.turn_detection:
| Strategy | Mechanism | Latency | Best With | Extra Config |
|---|---|---|---|---|
| stt_native | Provider endpoint token | Lowest | Soniox | None |
| smart_turn | Whisper ONNX prosody analysis | Medium | Deepgram | smart_turn |
| timeout | Silent timer after last transcript | Configurable | Any | user_speech_timeout |
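The table can be encoded as a small lookup so that a template is checked for the extra config its strategy uses. This is only a sketch: `inspect_turn_detection` is a hypothetical helper, and it assumes the extra block sits next to `turn_detection` inside `stt_configuration`, which this page shows for `smart_turn` but not for `user_speech_timeout`.

```python
# Strategy -> (extra config key, recommended provider), copied from the table above.
STRATEGIES = {
    "stt_native": (None, "Soniox"),
    "smart_turn": ("smart_turn", "Deepgram"),
    "timeout": ("user_speech_timeout", "Any"),
}


def inspect_turn_detection(stt_configuration):
    """Summarize which extra config a strategy uses and whether it is present."""
    strategy = stt_configuration.get("turn_detection")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown turn_detection strategy: {strategy!r}")
    extra_key, best_with = STRATEGIES[strategy]
    return {
        "strategy": strategy,
        "best_with": best_with,
        "extra_config_key": extra_key,
        # Assumption: the extra block lives directly in stt_configuration.
        "extra_config_present": extra_key is not None and extra_key in stt_configuration,
    }
```

The helper only reports; whether a missing extra block is an error or simply means defaults are used is not stated here, so it deliberately does not decide that.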
Smart Turn Config
| Field | Default | Description |
|---|---|---|
| stop_secs | 3.0 | Max silence seconds before forcing turn stop. |
| pre_speech_ms | 500.0 | Audio context (ms) before speech onset for analysis. |
| max_duration_secs | 8.0 | Maximum turn duration. |
| cpu_count | 1 | CPU threads for ONNX inference. |
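As a rough illustration of how these defaults might be applied, the sketch below fills in any field omitted from a `smart_turn` block with the value from the table. The `resolve_smart_turn` helper is hypothetical, and the assumption that omitted fields fall back to the listed defaults is an inference from the Default column, not documented behavior.

```python
# Defaults copied from the Smart Turn Config table.
SMART_TURN_DEFAULTS = {
    "stop_secs": 3.0,
    "pre_speech_ms": 500.0,
    "max_duration_secs": 8.0,
    "cpu_count": 1,
}


def resolve_smart_turn(overrides=None):
    """Return a complete smart_turn block: table defaults overlaid with overrides."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(SMART_TURN_DEFAULTS))
    if unknown:
        # Reject misspelled keys early instead of silently ignoring them.
        raise KeyError(f"unknown smart_turn fields: {unknown}")
    return {**SMART_TURN_DEFAULTS, **overrides}
```

For example, `resolve_smart_turn({"stop_secs": 2.0})` shortens the forced turn stop while leaving the other three fields at their table defaults.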
Per-node overrides
VAD and turn-detection follow a reset-then-apply cascade:
- Template-level vad_config applies as baseline.
- When entering a node with its own config, template settings are reset.
- Node-level settings apply in full, with no merging.
No merging
Node-level vad_config replaces the template config entirely. Include all desired fields — missing ones revert to system defaults, not the template’s values.
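The reset-then-apply rule is easy to get wrong, so here is a minimal sketch of it. The `effective_vad_config` helper is hypothetical, the system defaults are passed in because this page does not list them, and the numbers in the usage lines are placeholders rather than real defaults. It also assumes that a node without its own config simply uses the template baseline.

```python
def effective_vad_config(system_defaults, template_vad=None, node_vad=None):
    """Compute the VAD settings in force for one node.

    A node-level block replaces the template block wholesale; fields missing
    from the chosen block come from system defaults, never from the other level.
    """
    if node_vad is not None:
        chosen = node_vad            # template settings are reset, not merged
    else:
        chosen = template_vad or {}  # assumption: no node config means template baseline
    return {**system_defaults, **chosen}


# Placeholder values for illustration only; the real system defaults may differ.
system = {"confidence": 0.7, "start_secs": 0.2, "stop_secs": 0.8, "min_volume": 0.6}
template = {"confidence": 0.5, "stop_secs": 1.2}
node = {"stop_secs": 0.4}

in_node = effective_vad_config(system, template, node)
```

Here `in_node["confidence"]` is the system value 0.7, not the template's 0.5, because the node block omitted it. To keep 0.5 inside the node, the node block has to repeat it.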
Full example
{
  "configurations": {
    "vad_config": {
      "confidence": 0.5,
      "start_secs": 0.2,
      "stop_secs": 0.8,
      "min_volume": 0.6
    },
    "stt_configuration": {
      "provider": "deepgram",
      "language": "en",
      "turn_detection": "smart_turn",
      "deepgram": {
        "model": "nova-3-general",
        "endpointing_ms": 25,
        "no_delay": true
      },
      "smart_turn": {
        "stop_secs": 3.0,
        "pre_speech_ms": 500.0,
        "max_duration_secs": 8.0,
        "cpu_count": 1
      }
    }
  }
}
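A template like the one above can be sanity-checked against the ranges in the VAD parameters table before it is used. The sketch below is one way to do that; `validate_vad_config` and `validate_template` are hypothetical helpers, not part of Breeze Buddy, and they check only the four documented VAD fields.

```python
import json

VAD_FIELDS = ("confidence", "start_secs", "stop_secs", "min_volume")


def validate_vad_config(vad_config):
    """Return a list of range violations; an empty list means no problems found."""
    errors = []
    for field in VAD_FIELDS:
        if field not in vad_config:
            continue  # omitted fields are left to the defaults
        value = vad_config[field]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{field}: expected a number, got {value!r}")
        elif field == "confidence" and not 0.0 <= value <= 1.0:
            errors.append(f"confidence: {value} is outside 0.0-1.0")
        elif value < 0:
            errors.append(f"{field}: {value} must be >= 0")
    return errors


def validate_template(template):
    """Validate the template-level vad_config under the configurations key."""
    vad = template.get("configurations", {}).get("vad_config", {})
    return validate_vad_config(vad)


# Usage: parse a template and check it.
template = json.loads('{"configurations": {"vad_config": {"confidence": 0.5, "stop_secs": 0.8}}}')
problems = validate_template(template)
```

An empty `problems` list means the VAD block is within the documented ranges; it says nothing about whether the values suit a particular call environment, which still needs test calls.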