Langfuse auto-evaluation
Background loop that polls Langfuse for failing LLM evaluator scores and sends deduplicated Slack alerts.
Breeze Buddy runs LLM-as-a-judge evaluators against Langfuse traces (outcome correctness, transcript quality, etc.). When an evaluator scores zero, a background loop notices and posts a Slack alert with the trace, recording, and call metadata. The loop uses a Redis distributed lock so only one pod fires a given alert, and a deduplication TTL so a flaky call isn’t paged 10 times.
Prerequisites
- Langfuse credentials (LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_BASEURL).
- Redis reachable for distributed locking and deduplication.
- A Slack incoming webhook (SLACK_WEBHOOK_URL).
- Comma-separated evaluator names configured in LANGFUSE_EVALUATORS.
How it works
Every SCORE_CHECK_INTERVAL_SECONDS (default 600), one pod acquires a Redis lock and polls Langfuse for zero-score entries in the last 10 minutes. For each zero score:
- Extract call_sid from trace metadata.
- Check a Redis dedup key (call_sid:<evaluator>). If the key exists, skip: an alert already went out within the TTL.
- Otherwise, post a Slack alert with the trace URL, recording URL, merchant, outcome, and failure reason. Set the dedup key with a 3600-second TTL.
Overlapping windows (10-minute check interval, 10-minute lookback) guarantee no missed scores even if a pod crashes mid-cycle.
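The cycle described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's actual code: TTLStore is an in-memory stand-in for Redis (its set_if_absent method mimics the Redis command SET key value NX EX ttl), and the lock key name, the shape of the score dictionaries, and the send_alert callable are all hypothetical. The sketch also assumes the lock is never released explicitly and simply expires after the interval.

```python
import time
from typing import Callable, Dict, Iterable, List

DEDUP_TTL_SECONDS = 3600  # dedup TTL from the steps above
LOCK_TTL_SECONDS = 600    # default SCORE_CHECK_INTERVAL_SECONDS


class TTLStore:
    """In-memory stand-in for Redis: keys disappear after their TTL."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def exists(self, key: str) -> bool:
        expires_at = self._expiry.get(key)
        return expires_at is not None and expires_at > self._clock()

    def set_if_absent(self, key: str, ttl: int) -> bool:
        """Mimic `SET key value NX EX ttl`; return True if the key was set."""
        if self.exists(key):
            return False
        self._expiry[key] = self._clock() + ttl
        return True


def run_cycle(
    store: TTLStore,
    zero_scores: Iterable[Dict[str, str]],
    send_alert: Callable[[Dict[str, str]], None],
) -> List[str]:
    """Run one poll cycle and return the dedup keys that were alerted."""
    # Hypothetical lock key: only the pod that sets it runs this cycle.
    if not store.set_if_absent("score-monitor:lock", LOCK_TTL_SECONDS):
        return []
    alerted: List[str] = []
    for score in zero_scores:
        key = f"{score['call_sid']}:{score['evaluator']}"
        if store.exists(key):
            continue  # already alerted within the dedup TTL
        send_alert(score)
        store.set_if_absent(key, DEDUP_TTL_SECONDS)
        alerted.append(key)
    return alerted
```

Passing the clock in as a parameter keeps the sketch testable without waiting for real TTLs to elapse; a real deployment would rely on Redis expiry instead.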
Key settings
| Env var | Default | Controls |
|---|---|---|
| ENABLE_SCORE_MONITORING_LOOP | true | Master switch. Turn off when a Langfuse outage causes noisy alerts. |
| SCORE_CHECK_INTERVAL_SECONDS | 600 | Seconds between polls. Also the lock TTL. |
| LANGFUSE_EVALUATORS | (none) | Comma-separated, case-sensitive evaluator names. |
| SLACK_WEBHOOK_URL | (none) | Incoming webhook URL for alert posts. |
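As an illustration of how these four settings might be read at startup, here is a small Python sketch. The MonitorSettings class and load_settings function are hypothetical names. The sketch also assumes that "true" in any letter case enables the loop, and that whitespace around commas in LANGFUSE_EVALUATORS is ignored; neither behavior is confirmed by this page.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class MonitorSettings:
    enabled: bool
    interval_seconds: int
    evaluators: List[str]
    slack_webhook_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> MonitorSettings:
    """Read the settings from the table, applying the listed defaults."""
    raw_names = env.get("LANGFUSE_EVALUATORS", "")
    enabled_raw = env.get("ENABLE_SCORE_MONITORING_LOOP", "true")
    return MonitorSettings(
        # Assumption: "true" in any letter case enables the loop.
        enabled=enabled_raw.strip().lower() == "true",
        interval_seconds=int(env.get("SCORE_CHECK_INTERVAL_SECONDS", "600")),
        # Split on commas; names keep their case and any inner spaces.
        evaluators=[n.strip() for n in raw_names.split(",") if n.strip()],
        slack_webhook_url=env.get("SLACK_WEBHOOK_URL", ""),
    )
```

Because names are split only on commas, an evaluator name that contains spaces survives intact.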
Example config
ENABLE_SCORE_MONITORING_LOOP=true
SCORE_CHECK_INTERVAL_SECONDS=600
LANGFUSE_EVALUATORS="breeze buddy outcome correctness,transcript_quality"
LANGFUSE_SECRET_KEY=sk-lf-xxx
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_BASEURL=https://your-langfuse.example.com
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx
REDIS_HOST=localhost
REDIS_PORT=6379
Operational notes
- Redis unavailable: the loop skips that cycle (fail-safe). The overlapping window catches the missed score on the next run.
- Langfuse API timeout / 401: logs the error and retries next cycle. No alerts sent.
- Slack webhook failure: logs and retries. The dedup key is not set, so the alert goes out on the next cycle if the webhook recovers.
- Missing call_sid in trace metadata: alert still fires, but cannot deduplicate. You may see the same alert across multiple cycles. Fix the trace instrumentation upstream.
- Pod crash mid-check: the lock auto-expires after SCORE_CHECK_INTERVAL_SECONDS. The next cycle resumes cleanly.
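The Slack-failure and missing-call_sid notes both come down to when the dedup key gets written. The Python sketch below shows one ordering consistent with those notes. It is an illustration only: alert_once is a hypothetical helper, a plain set stands in for Redis keys (so TTL expiry is not modeled), and post is an assumed callable that returns True when the webhook accepts the message.

```python
from typing import Callable, Dict, Optional, Set


def alert_once(
    score: Dict[str, str],
    dedup_keys: Set[str],
    post: Callable[[Dict[str, str]], bool],
) -> str:
    """Handle one zero score and report which path was taken."""
    call_sid: Optional[str] = score.get("call_sid")
    key = f"{call_sid}:{score['evaluator']}" if call_sid else None
    if key is not None and key in dedup_keys:
        return "skipped"
    if not post(score):
        # Webhook failed: leave the key unset so the next cycle retries.
        return "retry_next_cycle"
    if key is None:
        # No call_sid: nothing to key on, so this alert may repeat.
        return "sent_without_dedup"
    dedup_keys.add(key)
    return "sent"
```

Writing the key only after a successful post is what lets a failed alert go out again on a later cycle.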
When evaluator names change
Evaluator names are case-sensitive and matched exactly against the names in Langfuse. Update LANGFUSE_EVALUATORS whenever you rename an evaluator in Langfuse or add a new one.
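A check along the following lines could flag a configured name that no longer matches. This is a hedged Python sketch: unmatched_evaluators is a hypothetical helper, and the list of names defined in Langfuse is passed in as an argument because fetching it is outside the scope of this page.

```python
from typing import Iterable, List


def unmatched_evaluators(
    configured: Iterable[str], langfuse_names: Iterable[str]
) -> List[str]:
    """Return configured names with no exact, case-sensitive match."""
    known = set(langfuse_names)
    return [name for name in configured if name not in known]
```

For example, a configured name of Transcript_Quality would be reported as unmatched if Langfuse only defines transcript_quality.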