LLM configuration

Configure the language model powering your voice agent's intelligence.

Overview

The llm_configurations field in ConfigurationModel controls which language model is used and its generation parameters. Breeze Buddy currently uses Azure OpenAI with GPT-4o as the default model.

LLMConfiguration fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | str | `gpt-4o` | Azure OpenAI model deployment name. |
| `temperature` | float | | Controls randomness. Lower values (0.0–0.3) are more deterministic; higher values (0.7–1.0) are more creative. |

Temperature guide

| Range | Behaviour | Use case |
| --- | --- | --- |
| 0.0–0.2 | Highly deterministic, consistent | Data collection, compliance scripts |
| 0.3–0.5 | Balanced | Customer support, appointment reminders |
| 0.6–0.8 | More varied phrasing | Sales, conversational flows |
| 0.9–1.0 | Highly creative | Brainstorming, casual chat |
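As a sketch of how an application might apply this guide, a small lookup can map a use-case category to a temperature. The helper and its category names are hypothetical illustrations, not part of Breeze Buddy:

```python
# Hypothetical helper: map a use case to a temperature from the guide above.
# The category names and chosen midpoints are assumptions, not Breeze Buddy API.
TEMPERATURE_GUIDE = {
    "compliance": 0.1,   # 0.0–0.2: highly deterministic, consistent
    "support": 0.4,      # 0.3–0.5: balanced
    "sales": 0.7,        # 0.6–0.8: more varied phrasing
    "casual": 0.95,      # 0.9–1.0: highly creative
}

def pick_temperature(use_case: str) -> float:
    """Return a temperature for a known use case, defaulting to balanced."""
    return TEMPERATURE_GUIDE.get(use_case, 0.4)
```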

JSON example

```json
{
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.4
    }
  }
}
```

Azure OpenAI

All LLM calls route through Azure OpenAI. The model field corresponds to the Azure deployment name (not the raw OpenAI model name). Your Azure deployment must be configured in the Breeze Buddy backend.
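To illustrate why the deployment-name distinction matters, the sketch below builds an Azure OpenAI chat-completions request by hand. The URL shape follows Azure's REST convention; the resource name and api-version are placeholders, and `build_request` is a hypothetical helper, not Breeze Buddy code:

```python
# Sketch: how llm_configurations translates into an Azure OpenAI request.
# The deployment name from the "model" field appears in the URL path.
# Resource name and api-version below are placeholder assumptions.
def build_request(config: dict, messages: list[dict]) -> tuple[str, dict]:
    """Return (url, body) for an Azure OpenAI chat-completions call."""
    deployment = config["model"]  # Azure deployment name, not raw model name
    url = (
        "https://<your-resource>.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version=2024-06-01"
    )
    body = {"messages": messages, "temperature": config["temperature"]}
    return url, body

url, body = build_request(
    {"model": "gpt-4o", "temperature": 0.4},
    [{"role": "user", "content": "Hello"}],
)
```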

Model availability

The available models depend on your Azure OpenAI deployment configuration. GPT-4o is the default and recommended model for voice agent use cases due to its balance of quality and latency.

Observability with Langfuse

All LLM calls are automatically traced via Langfuse. This gives you visibility into:

  • Cost tracking — token usage and cost per call.
  • Latency — time-to-first-token and total generation time.
  • Token usage — prompt tokens, completion tokens, total tokens per request.
  • Request/response logs — full prompt and completion text for debugging.
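As a concrete illustration of the cost-tracking point, a per-call cost figure can be derived from the token counts that tracing captures. The prices in this sketch are placeholder assumptions, not real GPT-4o rates, and the helper is not part of Langfuse or Breeze Buddy:

```python
# Sketch: deriving a per-call cost figure from traced token usage.
# Prices are illustrative placeholders, not real GPT-4o rates.
PRICE_PER_1K = {"prompt": 0.005, "completion": 0.015}  # USD per 1K tokens (assumed)

def call_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD for one LLM call given its token counts."""
    return (
        prompt_tokens / 1000 * PRICE_PER_1K["prompt"]
        + completion_tokens / 1000 * PRICE_PER_1K["completion"]
    )
```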

Debugging prompts

Use the Langfuse dashboard to inspect the exact prompts sent to the LLM. This is invaluable for debugging unexpected agent behavior — you can see exactly what system/task messages the model received.

Per-template configuration

Each template can specify its own LLM configuration. This allows you to use different models or temperatures for different use cases — for example, a lower temperature for compliance-heavy scripts and a higher one for sales conversations.

Compliance template — deterministic:

```json
{
  "name": "compliance-verification",
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.1
    }
  }
}
```

Sales template — conversational:

```json
{
  "name": "sales-outreach",
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.7
    }
  }
}
```
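One way to think about per-template configuration is as an overlay on global defaults. The `resolve_llm_config` helper and the default values below are assumptions for illustration (the defaults mirror this page's examples), not the actual Breeze Buddy implementation:

```python
# Sketch: resolving a template's LLM configuration against defaults.
# DEFAULTS and the merge strategy are assumptions for illustration only.
DEFAULTS = {"model": "gpt-4o", "temperature": 0.4}

def resolve_llm_config(template: dict) -> dict:
    """Overlay a template's llm_configurations on top of the defaults."""
    overrides = template.get("configurations", {}).get("llm_configurations", {})
    return {**DEFAULTS, **overrides}

compliance = {
    "name": "compliance-verification",
    "configurations": {"llm_configurations": {"temperature": 0.1}},
}
```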