Activity · Failures

Repeat failures, deterministically keyed.

Failed evaluations group by a stable failure_key. Three strikes on the same key trips an improvement trigger, the diagnosis sub-agent reads its own failing spans, and a candidate prompt enters the experiment runner.

—

Active aggregates

—

Above threshold (≥2)

—

Critical types

—

Total occurrences

failure_keyoccurrences · trend

Loading failures…