Healing · Experiments

Baseline vs candidate. Code-evals only.

Each experiment scores the baseline and candidate prompts against the regression set with deterministic code_evals — no LLM judges in the hot path. The release gate decides whether the candidate ships.
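As a rough illustration of the flow described above, the following Python sketch scores two prompt runners against a small regression set with a deterministic check and then applies a gate. Every name here (`Case`, `exact_match`, `pass_rate`, `release_gate`, `min_gain`) is hypothetical, and the gate rule (the candidate ships only if its pass rate is at least the baseline's) is an assumption for the example, not the tool's documented policy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


# Hypothetical types for illustration; not the product's actual schema.
@dataclass(frozen=True)
class Case:
    input: str
    expected: str


# A code eval is a pure function: the same output and case always
# produce the same verdict, so no LLM judge is involved.
CodeEval = Callable[[str, Case], bool]


def exact_match(output: str, case: Case) -> bool:
    """Deterministic check: whitespace-trimmed string equality."""
    return output.strip() == case.expected.strip()


def pass_rate(
    run_prompt: Callable[[str], str],
    cases: Sequence[Case],
    code_eval: CodeEval,
) -> float:
    """Fraction of regression cases that the prompt runner passes."""
    if not cases:
        raise ValueError("regression set is empty")
    passed = sum(1 for c in cases if code_eval(run_prompt(c.input), c))
    return passed / len(cases)


def release_gate(
    baseline_rate: float, candidate_rate: float, min_gain: float = 0.0
) -> bool:
    """Assumed rule: ship only if the candidate is not worse than baseline."""
    return candidate_rate >= baseline_rate + min_gain


if __name__ == "__main__":
    cases = [Case("2+2", "4"), Case("3+3", "6")]

    # Stub runners standing in for real baseline and candidate prompt calls.
    baseline_outputs = {"2+2": "4", "3+3": "7"}
    candidate_outputs = {"2+2": "4", "3+3": "6"}

    base = pass_rate(baseline_outputs.__getitem__, cases, exact_match)
    cand = pass_rate(candidate_outputs.__getitem__, cases, exact_match)
    print(f"baseline={base:.2f} candidate={cand:.2f} "
          f"ship={release_gate(base, cand)}")
```

Because both the check and the gate are plain functions of their inputs, re-running an experiment on the same outputs yields the same decision. A stricter gate could set `min_gain` above zero to require a measurable improvement before shipping.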

Runs

Select an experiment to view results.