Activity · Failures

Repeat failures, deterministically keyed.

Failed evaluations are grouped by a stable failure_key. Three strikes on the same key trip an improvement trigger, the diagnosis sub-agent then reads its own failing spans, and a candidate prompt enters the experiment runner.
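As a rough illustration of the grouping described above, here is a minimal sketch in Python. The page does not say how failure_key is derived, so the field names (evaluator, failure_type, message), the hashing scheme, and the helper names failure_key() and keys_to_trigger() are all assumptions made for this example; only the three-strike threshold comes from the text.

```python
import hashlib
from collections import Counter

# "Three strikes" comes from the description above; everything else is assumed.
STRIKE_THRESHOLD = 3


def failure_key(evaluator: str, failure_type: str, message: str) -> str:
    """Hypothetical key derivation: hash of normalized fields.

    Normalizing case and whitespace means the same logical failure
    always maps to the same key, which is what makes it deterministic.
    """
    normalized = "|".join(
        part.strip().lower() for part in (evaluator, failure_type, message)
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def keys_to_trigger(failures, threshold=STRIKE_THRESHOLD):
    """Return the keys whose occurrence count has reached the threshold."""
    counts = Counter(failure_key(**f) for f in failures)
    return sorted(key for key, n in counts.items() if n >= threshold)


# Usage: three occurrences of one failure, one occurrence of another.
failures = [
    {"evaluator": "Tone", "failure_type": "rubric", "message": "Too terse "},
    {"evaluator": "tone", "failure_type": "rubric", "message": "too terse"},
    {"evaluator": "tone", "failure_type": "Rubric", "message": "too terse"},
    {"evaluator": "tone", "failure_type": "format", "message": "missing field"},
]
triggered = keys_to_trigger(failures)
```

In this sketch only the first three records share a key, so only that key would reach the trigger; what happens after the trigger (diagnosis, candidate prompt) is outside what the example models.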

Active aggregates
Above threshold (≥2)
Critical types
Total occurrences
failure_key · occurrences · trend