LLM configuration

Configure the language model powering your voice agent's intelligence.

Overview

The llm_configurations field in ConfigurationModel controls which language model is used and its generation parameters. Breeze Buddy currently uses Azure OpenAI with GPT-4o as the default model.

LLMConfiguration fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | str | `gpt-4o` | Azure OpenAI model deployment name. |
| `temperature` | float | | Controls randomness. Lower values (0.0–0.3) are more deterministic; higher values (0.7–1.0) are more creative. |

Temperature guide

| Range | Behaviour | Use case |
| --- | --- | --- |
| 0.0–0.2 | Highly deterministic, consistent | Data collection, compliance scripts |
| 0.3–0.5 | Balanced | Customer support, appointment reminders |
| 0.6–0.8 | More varied phrasing | Sales, conversational flows |
| 0.9–1.0 | Highly creative | Brainstorming, casual chat |
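As a sketch of how an application might apply this guide, a small lookup can map a use-case category to a temperature. The helper and its category names are hypothetical illustrations, not part of Breeze Buddy:

```python
# Hypothetical helper: map a use case to a temperature from the guide above.
# The category names and chosen midpoints are assumptions, not Breeze Buddy API.
TEMPERATURE_GUIDE = {
    "compliance": 0.1,   # 0.0–0.2: highly deterministic, consistent
    "support": 0.4,      # 0.3–0.5: balanced
    "sales": 0.7,        # 0.6–0.8: more varied phrasing
    "casual": 0.95,      # 0.9–1.0: highly creative
}

def pick_temperature(use_case: str) -> float:
    """Return a temperature for a known use case, defaulting to balanced."""
    return TEMPERATURE_GUIDE.get(use_case, 0.4)
```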

JSON example

```json
{
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.4
    }
  }
}
```

Azure OpenAI

All LLM calls route through Azure OpenAI. The model field corresponds to the Azure deployment name (not the raw OpenAI model name). Your Azure deployment must be configured in the Breeze Buddy backend.
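To illustrate why the deployment-name distinction matters, the sketch below builds an Azure OpenAI chat-completions request by hand. The URL shape follows Azure's REST convention; the resource name and api-version are placeholders, and `build_request` is a hypothetical helper, not Breeze Buddy code:

```python
# Sketch: how llm_configurations translates into an Azure OpenAI request.
# The deployment name from the "model" field appears in the URL path.
# Resource name and api-version below are placeholder assumptions.
def build_request(config: dict, messages: list[dict]) -> tuple[str, dict]:
    """Return (url, body) for an Azure OpenAI chat-completions call."""
    deployment = config["model"]  # Azure deployment name, not raw model name
    url = (
        "https://<your-resource>.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version=2024-06-01"
    )
    body = {"messages": messages, "temperature": config["temperature"]}
    return url, body

url, body = build_request(
    {"model": "gpt-4o", "temperature": 0.4},
    [{"role": "user", "content": "Hello"}],
)
```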

Model availability

The available models depend on your Azure OpenAI deployment configuration. GPT-4o is the default and recommended model for voice agent use cases due to its balance of quality and latency.

Observability with Langfuse

All LLM calls are automatically traced via Langfuse. This gives you visibility into:

  • Cost tracking — token usage and cost per call.
  • Latency — time-to-first-token and total generation time.
  • Token usage — prompt tokens, completion tokens, total tokens per request.
  • Request/response logs — full prompt and completion text for debugging.
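As a concrete illustration of the cost-tracking point, a per-call cost figure can be derived from the token counts that tracing captures. The prices in this sketch are placeholder assumptions, not real GPT-4o rates, and the helper is not part of Langfuse or Breeze Buddy:

```python
# Sketch: deriving a per-call cost figure from traced token usage.
# Prices are illustrative placeholders, not real GPT-4o rates.
PRICE_PER_1K = {"prompt": 0.005, "completion": 0.015}  # USD per 1K tokens (assumed)

def call_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD for one LLM call given its token counts."""
    return (
        prompt_tokens / 1000 * PRICE_PER_1K["prompt"]
        + completion_tokens / 1000 * PRICE_PER_1K["completion"]
    )
```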

Debugging prompts

Use the Langfuse dashboard to inspect the exact prompts sent to the LLM. This is invaluable for debugging unexpected agent behavior — you can see exactly what system/task messages the model received.

Per-template configuration

Each template can specify its own LLM configuration. This allows you to use different models or temperatures for different use cases — for example, a lower temperature for compliance-heavy scripts and a higher one for sales conversations.

Compliance template — deterministic:

```json
{
  "name": "compliance-verification",
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.1
    }
  }
}
```

Sales template — conversational:

```json
{
  "name": "sales-outreach",
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.7
    }
  }
}
```
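One way to think about per-template configuration is as an overlay on global defaults. The `resolve_llm_config` helper and the default values below are assumptions for illustration (the defaults mirror this page's examples), not the actual Breeze Buddy implementation:

```python
# Sketch: resolving a template's LLM configuration against defaults.
# DEFAULTS and the merge strategy are assumptions for illustration only.
DEFAULTS = {"model": "gpt-4o", "temperature": 0.4}

def resolve_llm_config(template: dict) -> dict:
    """Overlay a template's llm_configurations on top of the defaults."""
    overrides = template.get("configurations", {}).get("llm_configurations", {})
    return {**DEFAULTS, **overrides}

compliance = {
    "name": "compliance-verification",
    "configurations": {"llm_configurations": {"temperature": 0.1}},
}
```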