
TTS (text-to-speech)

Pick a TTS provider, configure the voice, and override per-language or per-provider with tts_configuration_overrides.

Overview

Breeze Buddy synthesises bot speech with one of four providers: ElevenLabs (default), Cartesia, Sarvam, and Google. Pick one by setting tts_configuration. If you need per-language or per-provider variants on the same template, add tts_configuration_overrides.

Top-level shape

Inside configurations:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| tts_configuration | TTSConfig | Yes | The default TTS provider and voice for the template. |
| tts_configuration_overrides | Dict[str, TTSConfig] | No | Per-provider override map. The dict key (e.g. "cartesia") auto-fills the entry’s provider field, so each override can target a specific provider without repeating it. |
| tts_selection_config | TTSSelectionConfig | No | LLM-based per-utterance provider selection. Use only when genuinely multilingual. |

TTSConfig fields

| Field | Type | Provider | Description |
| --- | --- | --- | --- |
| provider | TTSProvider | all | elevenlabs, cartesia, sarvam, or google. |
| voice_id | str | all | Provider-specific voice identifier. |
| model_id | str | ElevenLabs | Model, e.g. eleven_turbo_v2_5, eleven_flash_v2_5. |
| speed | float | all | Speech rate. Ranges vary by provider: ElevenLabs 0.7–1.2, Cartesia 0.6–1.5. |
| volume | float | Cartesia | 0.5–2.0. |
| emotion | list[str] | Cartesia | Emotion tags, e.g. ["positivity:high", "curiosity"]. |
| language | str | all | BCP-47 language code. |
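
The ranges and provider-specific fields in the table can be checked before a template is deployed. The following is a minimal sketch, not part of Breeze Buddy: the function name check_tts_config and the constants are hypothetical, the ranges are the ones quoted in the table, and Sarvam and Google speed ranges are left unchecked because they are not documented here.

```python
# Hypothetical pre-flight check for a single TTSConfig-shaped dict.
# Ranges come from the field table; nothing here is a Breeze Buddy API.
PROVIDERS = {"elevenlabs", "cartesia", "sarvam", "google"}
SPEED_RANGES = {"elevenlabs": (0.7, 1.2), "cartesia": (0.6, 1.5)}
VOLUME_RANGE = (0.5, 2.0)  # listed for Cartesia only


def check_tts_config(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    provider = cfg.get("provider")
    if provider not in PROVIDERS:
        return [f"unknown provider: {provider!r}"]

    problems = []

    # Speed is only range-checked for providers whose range is documented.
    speed = cfg.get("speed")
    if speed is not None and provider in SPEED_RANGES:
        low, high = SPEED_RANGES[provider]
        if not low <= speed <= high:
            problems.append(f"speed {speed} outside {low}-{high} for {provider}")

    # volume and emotion are listed as Cartesia fields.
    if "volume" in cfg:
        low, high = VOLUME_RANGE
        if provider != "cartesia":
            problems.append("volume is only listed for cartesia")
        elif not low <= cfg["volume"] <= high:
            problems.append(f"volume {cfg['volume']} outside {low}-{high}")
    if "emotion" in cfg and provider != "cartesia":
        problems.append("emotion is only listed for cartesia")

    # model_id is listed as an ElevenLabs field.
    if "model_id" in cfg and provider != "elevenlabs":
        problems.append("model_id is only listed for elevenlabs")

    return problems


if __name__ == "__main__":
    print(check_tts_config({"provider": "elevenlabs", "speed": 1.4}))
```

Whether the platform rejects, clamps, or ignores out-of-range values is not stated here, so treat the output as a lint warning rather than a guarantee of runtime behaviour.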

Picking a provider

| Feature | ElevenLabs | Cartesia | Sarvam | Google |
| --- | --- | --- | --- | --- |
| Best for | Multilingual, natural prosody | Expressive English, low-latency | Indian languages | Broad language coverage |
| Emotion control | No | Yes (emotion tags) | No | Limited |
| Volume control | No | Yes (0.5–2.0) | No | Limited |
| Model selection | Yes (model_id) | No | No | No |

Example — default ElevenLabs

tts-elevenlabs.json
json
{
  "configurations": {
    "tts_configuration": {
      "provider": "elevenlabs",
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "model_id": "eleven_turbo_v2_5",
      "speed": 1.05,
      "language": "en"
    }
  }
}

Example — Cartesia with emotion

tts-cartesia.json
json
{
  "configurations": {
    "tts_configuration": {
      "provider": "cartesia",
      "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
      "speed": 1.1,
      "volume": 1.2,
      "emotion": ["positivity:high", "curiosity"],
      "language": "en"
    }
  }
}

Example — per-provider overrides

Use tts_configuration_overrides when you want the same template to produce different voices depending on which provider is chosen at runtime (e.g. via tts_selection_config). The dict key is the provider, so you don’t repeat it inside the config.

tts-overrides.json
json
{
  "configurations": {
    "tts_configuration": {
      "provider": "elevenlabs",
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "model_id": "eleven_turbo_v2_5",
      "language": "en"
    },
    "tts_configuration_overrides": {
      "cartesia": {
        "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
        "language": "en",
        "speed": 1.1
      },
      "sarvam": {
        "voice_id": "anushka",
        "language": "hi-IN"
      }
    }
  }
}
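
To make the key-to-provider behaviour concrete, here is a rough sketch of how an effective config could be resolved once a provider has been chosen. It illustrates the idea and is not the platform's implementation: resolve_tts_config is a hypothetical name, and two behaviours are assumptions not confirmed above, namely that an override entry is used on its own rather than merged with tts_configuration, and that a provider with no override falls back to the default config.

```python
# Hypothetical resolver; mirrors the "dict key auto-fills provider" rule.
def resolve_tts_config(configurations: dict, chosen_provider: str) -> dict:
    default = configurations["tts_configuration"]
    if chosen_provider == default["provider"]:
        return dict(default)

    overrides = configurations.get("tts_configuration_overrides", {})
    if chosen_provider not in overrides:
        # Assumption: no override for this provider means use the default.
        return dict(default)

    # The dict key supplies the provider field, so entries need not repeat it.
    return {**overrides[chosen_provider], "provider": chosen_provider}


# Same values as the overrides example above.
configurations = {
    "tts_configuration": {
        "provider": "elevenlabs",
        "voice_id": "pFZP5JQG7iQjIQuC4Bku",
        "model_id": "eleven_turbo_v2_5",
        "language": "en",
    },
    "tts_configuration_overrides": {
        "cartesia": {
            "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
            "language": "en",
            "speed": 1.1,
        },
        "sarvam": {"voice_id": "anushka", "language": "hi-IN"},
    },
}

if __name__ == "__main__":
    print(resolve_tts_config(configurations, "sarvam"))
```

With this input, resolving "sarvam" yields the Sarvam entry with provider set to "sarvam", while resolving "elevenlabs" returns the default config unchanged.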

LLM-based TTS selection

tts_selection_config lets a Gemini LLM pick the provider per utterance. This adds latency to every utterance, so use it only when you need multilingual switching.

| Field | Type | Description |
| --- | --- | --- |
| enabled | bool | Master switch. |
| prompt | str | Guidance for the LLM picking the provider. |
| providers | list[TTSProvider] | Candidate providers. |

Multilingual TTS Selection
json
{
  "configurations": {
    "tts_configuration": {
      "provider": "elevenlabs",
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "model_id": "eleven_turbo_v2_5",
      "language": "en"
    },
    "tts_configuration_overrides": {
      "cartesia": {
        "voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091",
        "language": "en"
      }
    },
    "tts_selection_config": {
      "enabled": true,
      "prompt": "Use cartesia for English, elevenlabs for anything else.",
      "providers": ["cartesia", "elevenlabs"]
    }
  }
}
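
In the example above, every candidate in providers has a voice configured, either as the default or as an override. A small lint for that property might look like the sketch below. The helper name uncovered_candidates is hypothetical, and the check rests on an assumption: that each candidate should have its own config. What the platform does when a candidate has none is not documented here.

```python
# Hypothetical lint: list selection candidates with no default or override config.
def uncovered_candidates(configurations: dict) -> list[str]:
    selection = configurations.get("tts_selection_config") or {}
    if not selection.get("enabled"):
        return []  # selection is off, so there is nothing to check

    # Providers that have a voice configured somewhere in the template.
    configured = {configurations["tts_configuration"]["provider"]}
    configured.update(configurations.get("tts_configuration_overrides", {}))

    return [p for p in selection.get("providers", []) if p not in configured]


# Same shape as the example above, with "sarvam" added as a candidate
# that has no matching override.
configurations = {
    "tts_configuration": {"provider": "elevenlabs", "voice_id": "pFZP5JQG7iQjIQuC4Bku"},
    "tts_configuration_overrides": {
        "cartesia": {"voice_id": "a0e99841-438c-4a64-b679-ae501e7d6091"}
    },
    "tts_selection_config": {
        "enabled": True,
        "prompt": "Use cartesia for English, elevenlabs for anything else.",
        "providers": ["cartesia", "elevenlabs", "sarvam"],
    },
}

if __name__ == "__main__":
    print(uncovered_candidates(configurations))
```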

Latency trade-off

LLM-based selection adds a small latency overhead per utterance. Use only when you need multilingual provider switching.

Best practices

  • Start with tts_configuration pointing at one provider. Add tts_configuration_overrides only if you genuinely need different voices per provider.
  • Fine-tune speed (and, on Cartesia, volume) from test calls; the defaults are neutral but rarely perfect.
  • Use Cartesia emotion tags for explicit expressiveness.
  • Enable tts_selection_config only for multilingual or multi-accent flows.
