
Langfuse auto-evaluation

Background loop that polls Langfuse for failing LLM evaluator scores and sends deduplicated Slack alerts.

Breeze Buddy runs LLM-as-a-judge evaluators against Langfuse traces (outcome correctness, transcript quality, etc.). When an evaluator scores a trace zero, a background loop picks up the failing score and posts a Slack alert with the trace, recording, and call metadata. The loop uses a Redis distributed lock so only one pod fires a given alert. It also uses a deduplication TTL so a single flaky call doesn't page the team 10 times.

Prerequisites

Langfuse credentials (LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_BASEURL).
Redis reachable for distributed locking and deduplication.
A Slack incoming webhook (SLACK_WEBHOOK_URL).
Comma-separated evaluator names configured in LANGFUSE_EVALUATORS.
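A fail-fast startup check for these prerequisites could look like the sketch below. It assumes the loop reads its settings from environment variables with the names listed above. `REQUIRED_VARS` and `missing_prerequisites` are hypothetical names. Redis is left out because the example config only shows `REDIS_HOST`/`REDIS_PORT` and does not say whether they are required.

```python
import os

# Hypothetical list built from the documented prerequisites.
REQUIRED_VARS = (
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_BASEURL",
    "SLACK_WEBHOOK_URL",
    "LANGFUSE_EVALUATORS",
)


def missing_prerequisites(env=None):
    """Return the required variables that are unset or blank, in declaration order."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Call it once at startup and refuse to start the loop when the list is non-empty. A missing credential then surfaces as one clear error instead of a failed poll every cycle.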

How it works

Every SCORE_CHECK_INTERVAL_SECONDS (default 600), one pod acquires a Redis lock and polls Langfuse for zero-score entries in the last 10 minutes. For each zero score:

Extract call_sid from the trace metadata.
Check a Redis dedup key (call_sid:<evaluator>). If the key exists, skip the score: an alert for this call and evaluator already went out within the TTL.
Otherwise, post a Slack alert with the trace URL, recording URL, merchant, outcome, and failure reason. Once the post succeeds, set the dedup key with a 3600-second (one-hour) TTL.
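The three steps can be sketched as follows. Several pieces are assumptions for illustration:

- the shape of the score dictionaries;
- the injected `post_to_slack` callable, which returns `True` on success;
- the key format `<call_sid>:<evaluator>`.

In the real loop, scores come from the Langfuse API and alerts go to `SLACK_WEBHOOK_URL`. Here the in-memory `DedupStore` stands in for the Redis `EXISTS` and `SET ... EX` calls.

```python
import time

DEDUP_TTL_SECONDS = 3600  # one hour, as documented


class DedupStore:
    """In-memory stand-in for the Redis EXISTS and SET ... EX calls."""

    def __init__(self, clock=time.monotonic):
        self._expires_at = {}
        self._clock = clock

    def exists(self, key):
        deadline = self._expires_at.get(key)
        return deadline is not None and self._clock() < deadline

    def set(self, key, ttl_seconds):
        self._expires_at[key] = self._clock() + ttl_seconds


def process_zero_scores(scores, store, post_to_slack):
    """Alert once per (call_sid, evaluator) per TTL; return the number of alerts sent."""
    sent = 0
    for score in scores:
        # Step 1: pull call_sid out of the trace metadata (it may be missing).
        call_sid = (score.get("metadata") or {}).get("call_sid")
        # Hypothetical key format: "<call_sid>:<evaluator>".
        key = f"{call_sid}:{score['evaluator']}" if call_sid else None

        # Step 2: skip pairs that were already alerted within the TTL.
        if key is not None and store.exists(key):
            continue

        # Step 3: post, and mark as alerted only after the post succeeds.
        text = (
            f"Evaluator '{score['evaluator']}' scored 0\n"
            f"Trace: {score.get('trace_url')}\n"
            f"Recording: {score.get('recording_url')}\n"
            f"Merchant: {score.get('merchant')} | Outcome: {score.get('outcome')}\n"
            f"Reason: {score.get('reason')}"
        )
        if not post_to_slack(text):
            continue  # key stays unset, so a later cycle can retry
        sent += 1
        if key is not None:  # no call_sid means nothing to deduplicate on
            store.set(key, DEDUP_TTL_SECONDS)
    return sent
```

With a missing `call_sid`, the sketch still posts but never writes a key. That reproduces the repeated-alert behavior described under Operational notes.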

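The lookback (10 minutes) is the same length as the check interval (10 minutes). Consecutive polls therefore cover back-to-back windows with no gap, and the dedup key suppresses repeat alerts for any score that two polls both see. Equal windows abut rather than overlap, though. For a skipped or interrupted cycle's scores to be caught on the next run, the lookback has to be longer than the interval.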
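A small model of the timing makes the interval/lookback relationship concrete. It assumes each poll at time `t` fetches scores timestamped in `[t - lookback, t]`. The helper names are hypothetical, and timestamps are plain seconds.

```python
def window_overlap_seconds(interval_s, lookback_s):
    """Seconds shared by two consecutive poll windows; a negative value is a gap."""
    return lookback_s - interval_s


def is_covered(score_ts, poll_times, lookback_s):
    """True if at least one poll at time t saw the score: t - lookback_s <= score_ts <= t."""
    return any(t - lookback_s <= score_ts <= t for t in poll_times)


if __name__ == "__main__":
    print(window_overlap_seconds(600, 600))  # 0: windows sit back to back
    # Polls every 600 s, but the poll at t=1200 was skipped (e.g. Redis was down).
    polls = [600, 1800]
    print(is_covered(900, polls, lookback_s=600))   # False
    print(is_covered(900, polls, lookback_s=1200))  # True
```

With the documented 600/600 settings, every score lands in a window as long as every poll runs. A lookback longer than the interval lets a later poll cover for a skipped one. The cost is more repeat sightings for the dedup key to absorb.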

Key settings

| Env var | Default | Controls |
|---|---|---|
| `ENABLE_SCORE_MONITORING_LOOP` | `true` | Master switch. Turn off when a Langfuse outage causes noisy alerts. |
| `SCORE_CHECK_INTERVAL_SECONDS` | `600` | Seconds between polls. Also the lock TTL. |
| `LANGFUSE_EVALUATORS` | none | Comma-separated, case-sensitive evaluator names. |
| `SLACK_WEBHOOK_URL` | none | Incoming webhook URL for alert posts. |
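A sketch of reading these settings with the documented defaults follows. This page does not say how the service parses booleans or splits the evaluator list. This version assumes `true`/`1`/`yes` (any case) mean on, and it trims whitespace around each evaluator name while keeping its case. `ScoreMonitorConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreMonitorConfig:
    enabled: bool = True
    interval_seconds: int = 600  # poll period; the page says it is also the lock TTL
    evaluators: tuple = ()       # exact, case-sensitive names
    slack_webhook_url: str = ""


def load_config(env=None):
    env = os.environ if env is None else env
    enabled_raw = env.get("ENABLE_SCORE_MONITORING_LOOP", "true")
    # Assumed parsing: split on commas, trim outer whitespace, keep case.
    evaluators = tuple(
        name.strip()
        for name in env.get("LANGFUSE_EVALUATORS", "").split(",")
        if name.strip()
    )
    return ScoreMonitorConfig(
        enabled=enabled_raw.strip().lower() in {"true", "1", "yes"},
        interval_seconds=int(env.get("SCORE_CHECK_INTERVAL_SECONDS", "600")),
        evaluators=evaluators,
        slack_webhook_url=env.get("SLACK_WEBHOOK_URL", ""),
    )
```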

Example config

```bash
# .env
ENABLE_SCORE_MONITORING_LOOP=true
SCORE_CHECK_INTERVAL_SECONDS=600
LANGFUSE_EVALUATORS="breeze buddy outcome correctness,transcript_quality"
LANGFUSE_SECRET_KEY=sk-lf-xxx
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_BASEURL=https://your-langfuse.example.com
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx
REDIS_HOST=localhost
REDIS_PORT=6379
```
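Posting an alert to the configured `SLACK_WEBHOOK_URL` takes a single JSON POST, since Slack incoming webhooks accept a body such as `{"text": "..."}`. The sketch below uses only the standard library. It returns `False` on failure rather than raising, which lets the caller leave the dedup key unset. The function name and the `opener` parameter are assumptions, not the service's code; `opener` is injected so the function can be tested without network access.

```python
import json
import logging
import urllib.error
import urllib.request

log = logging.getLogger("score_monitor")


def post_to_slack(webhook_url, text, timeout=10, opener=urllib.request.urlopen):
    """POST a plain-text message to a Slack incoming webhook; True on a 2xx reply."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        log.warning("Slack webhook failed: %s", exc)
        return False
```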

Operational notes

Redis unavailable: the loop skips that cycle (fail-safe) and tries again on the next one. A score from the skipped cycle is caught only if it still falls inside the next poll's lookback window.
Langfuse API timeout / 401: the loop logs the error, sends no alerts, and retries on the next cycle.
Slack webhook failure: the loop logs the error and does not set the dedup key. The alert goes out on a later cycle once the webhook recovers, provided the score is still inside that cycle's lookback window.
Missing call_sid in trace metadata: the alert still fires, but the loop has no key to deduplicate on, so you may see the same alert across multiple cycles. Fix the trace instrumentation upstream.
Pod crash mid-check: the lock auto-expires after SCORE_CHECK_INTERVAL_SECONDS, and the next cycle resumes cleanly.
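The lock behavior in these notes can be sketched as a single Redis `SET` with `NX` and `EX`. This is an assumption about the implementation. redis-py exposes that call as `Redis.set(name, value, ex=..., nx=True)`. The in-memory `FakeRedis` stands in for a server so the example runs anywhere, and the lock key name and helper are hypothetical. Because the key's TTL equals the interval, a pod that crashes while holding the lock blocks nothing past the current cycle.

```python
import time

LOCK_KEY = "score_monitor:lock"  # hypothetical key name


class FakeRedis:
    """In-memory stand-in for redis-py's set(name, value, ex=..., nx=...)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at or None)
        self._clock = clock

    def set(self, name, value, ex=None, nx=False):
        current = self._data.get(name)
        alive = current is not None and (current[1] is None or self._clock() < current[1])
        if nx and alive:
            return None  # redis-py returns None when NX blocks the write
        expires_at = self._clock() + ex if ex is not None else None
        self._data[name] = (value, expires_at)
        return True


def try_acquire_lock(redis, pod_id, ttl_seconds):
    """Return True if this pod won the lock for the current cycle.

    In this sketch the lock is never released explicitly; it expires after
    ttl_seconds, so a crashed holder cannot block later cycles.
    """
    return bool(redis.set(LOCK_KEY, pod_id, ex=ttl_seconds, nx=True))
```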

When evaluator names change

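Evaluator names are case-sensitive and matched exactly against the names in Langfuse. Update LANGFUSE_EVALUATORS whenever you rename an evaluator in Langfuse or add a new one. A configured name that doesn't match exactly won't pick up that evaluator's zero scores, so its failures never raise an alert.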
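Matching is exact, so a name that differs only in case will not match. A quick audit after a rename can catch this. The sketch assumes you already have the evaluator names that appear in recent Langfuse scores; fetching them is left out. `check_evaluator_names` is a hypothetical helper.

```python
def check_evaluator_names(configured, seen_in_langfuse):
    """Compare configured names with names seen in Langfuse scores.

    Returns (matched, unmatched, case_mismatches). case_mismatches maps an
    unmatched configured name to the Langfuse name it equals ignoring case.
    """
    seen = set(seen_in_langfuse)
    by_lower = {}
    for name in sorted(seen):  # sorted for a deterministic pick
        by_lower.setdefault(name.lower(), name)
    matched = [n for n in configured if n in seen]
    unmatched = [n for n in configured if n not in seen]
    case_mismatches = {n: by_lower[n.lower()] for n in unmatched if n.lower() in by_lower}
    return matched, unmatched, case_mismatches
```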

Next steps

Observability overview: Langfuse, OpenTelemetry, Slack alerts, logs.
Alert response runbook: what to do when an alert fires.
Rate limiting: related operator tooling.
