LLM configuration
Configure the language model powering your voice agent's intelligence.
Overview
The llm_configurations field in ConfigurationModel controls which language model is used and its generation parameters. Breeze Buddy currently uses Azure OpenAI with GPT-4o as the default model.
LLMConfiguration fields
| Field | Type | Default | Description |
|---|---|---|---|
| model | str | gpt-4o | Azure OpenAI model deployment name. |
| temperature | float | — | Controls randomness. Lower values (0.0–0.3) are more deterministic; higher values (0.7–1.0) are more creative. |
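To make the shape concrete, here is a minimal sketch of these fields as Pydantic models. The field names model and temperature come from the table above; everything else (class layout, bounds, required-ness) is illustrative and may differ from the actual Breeze Buddy backend.

```python
from pydantic import BaseModel, Field


class LLMConfiguration(BaseModel):
    """Generation settings for the voice agent's language model."""

    # Azure OpenAI *deployment* name, not the raw OpenAI model name.
    model: str = "gpt-4o"
    # Sampling temperature; lower is more deterministic, higher more varied.
    # The 0.0-1.0 bounds mirror the temperature guide that follows.
    temperature: float = Field(..., ge=0.0, le=1.0)


class ConfigurationModel(BaseModel):
    """Top-level agent configuration (only the LLM portion shown here)."""

    llm_configurations: LLMConfiguration
```

With models like these, the JSON examples on this page validate directly, e.g. LLMConfiguration(model="gpt-4o", temperature=0.4).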
Temperature guide
| Range | Behavior | Use case |
|---|---|---|
| 0.0–0.2 | Highly deterministic, consistent | Data collection, compliance scripts |
| 0.3–0.5 | Balanced | Customer support, appointment reminders |
| 0.6–0.8 | More varied phrasing | Sales, conversational flows |
| 0.9–1.0 | Highly creative | Brainstorming, casual chat |
JSON example
```json
{
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.4
    }
  }
}
```

Azure OpenAI
All LLM calls route through Azure OpenAI. The model field corresponds to the Azure deployment name (not the raw OpenAI model name). Your Azure deployment must be configured in the Breeze Buddy backend.
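For reference, here is roughly what an Azure OpenAI call looks like with the official openai Python SDK; note that the model parameter receives the deployment name. The endpoint, key, and API version below are placeholders, and this is a sketch rather than Breeze Buddy's actual backend code.

```python
import os

from openai import AzureOpenAI

# Placeholder credentials; your deployment's endpoint and key will differ.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # illustrative API version
)

response = client.chat.completions.create(
    model="gpt-4o",  # the Azure *deployment* name, not the model name
    temperature=0.4,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```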
Model availability
The available models depend on your Azure OpenAI deployment configuration. GPT-4o is the default and recommended model for voice agent use cases due to its balance of quality and latency.
Observability with Langfuse
All LLM calls are automatically traced via Langfuse. This gives you visibility into:
- Cost tracking — token usage and cost per call.
- Latency — time-to-first-token and total generation time.
- Token usage — prompt tokens, completion tokens, total tokens per request.
- Request/response logs — full prompt and completion text for debugging.
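To reproduce this kind of tracing in your own scripts, one option is the Langfuse Python SDK's drop-in wrapper for the OpenAI client, sketched below. This illustrates the general pattern rather than Breeze Buddy's actual instrumentation, and it assumes Langfuse (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST) and Azure (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, OPENAI_API_VERSION) credentials are set in the environment.

```python
# Drop-in replacement for the OpenAI SDK: the API surface is unchanged,
# but every call is traced in Langfuse with tokens, cost, latency, and
# the full request/response text.
from langfuse.openai import AzureOpenAI

client = AzureOpenAI()  # Azure credentials read from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # Azure deployment name
    temperature=0.4,
    messages=[
        {"role": "system", "content": "You are a helpful voice agent."},
        {"role": "user", "content": "Hi!"},
    ],
)
# The call above now shows up as a trace in the Langfuse dashboard.
```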
Debugging prompts
Use the Langfuse dashboard to inspect the exact prompts sent to the LLM. This is invaluable for debugging unexpected agent behavior — you can see exactly what system/task messages the model received.
Per-template configuration
Each template can specify its own LLM configuration. This allows you to use different models or temperatures for different use cases — for example, a lower temperature for compliance-heavy scripts and a higher one for sales conversations.
Compliance template — deterministic:
```json
{
  "name": "compliance-verification",
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.1
    }
  }
}
```

Sales template — conversational:
```json
{
  "name": "sales-outreach",
  "configurations": {
    "llm_configurations": {
      "model": "gpt-4o",
      "temperature": 0.7
    }
  }
}
```