Langfuse auto-evaluation

Background loop that polls Langfuse for failing LLM evaluator scores and sends deduplicated Slack alerts.

Breeze Buddy runs LLM-as-a-judge evaluators against Langfuse traces (outcome correctness, transcript quality, etc.). When an evaluator scores zero, a background loop notices and posts a Slack alert with the trace, recording, and call metadata. The loop uses a Redis distributed lock so only one pod fires a given alert, and a deduplication TTL so a single flaky call does not trigger the same alert ten times.

Prerequisites

  • Langfuse credentials (LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_BASEURL).
  • Redis reachable for distributed locking and deduplication.
  • A Slack incoming webhook (SLACK_WEBHOOK_URL).
  • Comma-separated evaluator names configured in LANGFUSE_EVALUATORS.

How it works

Every SCORE_CHECK_INTERVAL_SECONDS seconds (default 600), one pod acquires a Redis lock and polls Langfuse for zero-score entries from the last 10 minutes. For each zero score:

  1. Extract call_sid from trace metadata.
  2. Check a Redis dedup key (call_sid:<evaluator>). If the key exists, skip — already alerted within the TTL.
  3. Otherwise, post a Slack alert with the trace URL, recording URL, merchant, outcome, and failure reason. Set the dedup key with a 3600-second TTL.
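
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Breeze Buddy's actual code: the function name handle_zero_score, the shape of the score dict, the post_alert callback, and the InMemoryStore stand-in are all hypothetical. The dedup key is read here as the call_sid value joined to the evaluator name with a colon. The store mirrors the exists and set(..., ex=...) calls of redis-py, so a real redis.Redis client could be passed in its place.

```python
import time
from typing import Callable, Dict, Optional

DEDUP_TTL_SECONDS = 3600  # dedup TTL from step 3


class InMemoryStore:
    """Hypothetical stand-in for a Redis client (only exists and set with ex)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def exists(self, key: str) -> int:
        expires_at = self._expiry.get(key)
        if expires_at is None or expires_at <= self._clock():
            self._expiry.pop(key, None)  # drop expired keys lazily
            return 0
        return 1

    def set(self, key: str, value: str, ex: Optional[int] = None) -> bool:
        ttl = ex if ex is not None else float("inf")
        self._expiry[key] = self._clock() + ttl
        return True


def handle_zero_score(store, score: dict, post_alert: Callable[[dict], bool]) -> str:
    """Return 'skipped', 'alerted', or 'failed' for one zero-score entry."""
    # Step 1: extract call_sid from trace metadata (assumed dict layout).
    call_sid = score.get("metadata", {}).get("call_sid")
    key = f"{call_sid}:{score['evaluator']}" if call_sid else None

    # Step 2: skip if this call/evaluator pair was alerted within the TTL.
    if key and store.exists(key):
        return "skipped"

    # Step 3: post first and set the dedup key only after a successful post,
    # so a failed webhook call is retried on the next cycle.
    if not post_alert(score):
        return "failed"
    if key:
        store.set(key, "1", ex=DEDUP_TTL_SECONDS)
    return "alerted"
```

When call_sid is missing, no key is built, so the sketch alerts on every cycle that sees the score, which matches the behavior described under Operational notes.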

The lookback window (10 minutes) is as long as the default check interval (10 minutes), so consecutive polls leave no gap between windows. This is what keeps scores from being missed even if a pod crashes mid-cycle.
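
The relationship between the check interval and the lookback can be checked with simple arithmetic. The helper below is a hypothetical sketch: uncovered_seconds is not part of Breeze Buddy, and it assumes the lookback stays fixed at 600 seconds while the interval is configurable and polls run exactly on schedule.

```python
LOOKBACK_SECONDS = 600  # the "last 10 minutes" lookback described above


def uncovered_seconds(interval_seconds: int, lookback_seconds: int = LOOKBACK_SECONDS) -> int:
    """Seconds between consecutive polls that no lookback window covers."""
    return max(0, interval_seconds - lookback_seconds)


print(uncovered_seconds(600))  # 0: with the default interval, windows meet end to end
print(uncovered_seconds(900))  # 300: a longer interval would leave 300 seconds unchecked
```

Under these assumptions, raising SCORE_CHECK_INTERVAL_SECONDS above the lookback would leave part of each cycle unchecked, so it is worth confirming how the lookback is derived before changing the interval.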

Key settings

Env var                        Default   Controls
ENABLE_SCORE_MONITORING_LOOP   true      Master switch. Turn off when a Langfuse outage causes noisy alerts.
SCORE_CHECK_INTERVAL_SECONDS   600       Seconds between polls. Also the lock TTL.
LANGFUSE_EVALUATORS            -         Comma-separated, case-sensitive evaluator names.
SLACK_WEBHOOK_URL              -         Incoming webhook URL for alert posts.

Example config

.env
ENABLE_SCORE_MONITORING_LOOP=true
SCORE_CHECK_INTERVAL_SECONDS=600
LANGFUSE_EVALUATORS="breeze buddy outcome correctness,transcript_quality"
LANGFUSE_SECRET_KEY=sk-lf-xxx
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_BASEURL=https://your-langfuse.example.com
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx
REDIS_HOST=localhost
REDIS_PORT=6379
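
As an illustration of how these variables might be read at startup, here is a minimal sketch. MonitorSettings and load_settings are hypothetical names, and the boolean parsing rule (only the string "true", case-insensitive, enables the loop) is an assumption rather than documented behavior. The defaults come from the Key settings table.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class MonitorSettings:
    enabled: bool
    interval_seconds: int
    evaluators: List[str]
    slack_webhook_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> MonitorSettings:
    """Build settings from environment variables, falling back to the documented defaults."""
    raw_names = env.get("LANGFUSE_EVALUATORS", "")
    return MonitorSettings(
        # Assumed rule: anything other than "true" disables the loop.
        enabled=env.get("ENABLE_SCORE_MONITORING_LOOP", "true").strip().lower() == "true",
        interval_seconds=int(env.get("SCORE_CHECK_INTERVAL_SECONDS", "600")),
        # Split on commas only: names may contain spaces and keep their case.
        evaluators=[name.strip() for name in raw_names.split(",") if name.strip()],
        slack_webhook_url=env.get("SLACK_WEBHOOK_URL", ""),
    )
```

Splitting on commas alone is what lets a name such as "breeze buddy outcome correctness" survive with its internal spaces intact.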

Operational notes

  • Redis unavailable — the loop skips that cycle (fail-safe). The overlapping window catches the missed score on the next run.
  • Langfuse API timeout / 401 — logs the error and retries next cycle. No alerts sent.
  • Slack webhook failure — logs and retries. The dedup key is not set, so the alert goes out on the next cycle if the webhook recovers.
  • Missing call_sid in trace metadata — alert still fires, but cannot deduplicate. You may see the same alert across multiple cycles. Fix the trace instrumentation upstream.
  • Pod crash mid-check — the lock auto-expires after SCORE_CHECK_INTERVAL_SECONDS. The next cycle resumes cleanly.
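
The lock and fail-safe behavior in the notes above can be sketched with a single Redis SET using the NX and EX options. This is an assumption about the mechanism, not the confirmed implementation: run_cycle_if_leader, the lock key name, and the returned status strings are hypothetical. The client argument is expected to behave like redis.Redis, whose set method accepts nx and ex keyword arguments.

```python
from typing import Callable


def run_cycle_if_leader(
    client,
    check_scores: Callable[[], None],
    interval_seconds: int = 600,
    lock_key: str = "score_monitor:lock",  # hypothetical key name
) -> str:
    """Run one check if this pod wins the lock; return what happened."""
    try:
        # SET NX EX succeeds for only one pod, and the key expires on its own,
        # so a pod that crashes mid-check cannot hold the lock forever.
        acquired = client.set(lock_key, "1", nx=True, ex=interval_seconds)
    except Exception:
        # Redis unavailable: skip this cycle rather than risk duplicate alerts.
        # Real code would likely narrow this to redis.exceptions.RedisError.
        return "skipped_redis_unavailable"
    if not acquired:
        return "skipped_lock_held"
    check_scores()
    return "checked"
```

The sketch never deletes the lock and relies on the TTL alone, which is one way to get the auto-expiry described above; whether the real loop also releases the lock explicitly is not stated here.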

When evaluator names change

Evaluator names are case-sensitive and matched exactly against the names in Langfuse. Update LANGFUSE_EVALUATORS whenever you rename an evaluator in Langfuse or add a new one.
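
Because matching is exact, a renamed evaluator silently stops producing alerts. One way to catch this is to compare the configured names against the score names actually observed in Langfuse. The helper below is a hypothetical sketch; how the observed names are collected is left out and would depend on your Langfuse client.

```python
from typing import Iterable, List


def unmatched_evaluators(configured: Iterable[str], seen_in_langfuse: Iterable[str]) -> List[str]:
    """Configured names with no exact, case-sensitive match among observed score names."""
    seen = set(seen_in_langfuse)
    return [name for name in configured if name not in seen]


configured = ["breeze buddy outcome correctness", "transcript_quality"]
seen = ["breeze buddy outcome correctness", "Transcript_Quality"]
print(unmatched_evaluators(configured, seen))  # ['transcript_quality']
```

Here transcript_quality is reported because the observed name differs in case, which is the kind of mismatch that requires updating LANGFUSE_EVALUATORS.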
