Settings

System status and configuration

Connection Status

Backend API
Disconnected

Checking…

Phoenix Cloud
Disconnected

Checking…

Gemini API
Disconnected

Checking…

Database
Disconnected

Checking…

Configuration

Loading config…

Evaluator Registry (14 evaluators)

Name              Type  Annotation Level
SchemaValidity    Code  span
ToolSequence      Code  session
RefundGuard       Code  span
PrivacyGuard      Code  span
EscalationGuard   Code  span
CitationPresence  Code  span
LatencyBudget     Code  span

Judge calibration

Cohen’s κ is computed between each LLM judge and the seeded canary labels. On the Landis & Koch scale, κ ≥ 0.6 indicates substantial agreement.

κ tones: brand ≥ 0.60 · ink 0.20–0.60 · fail < 0.20
aggregate computed
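
As a rough illustration of how a κ value and its tone could be derived, here is a minimal Python sketch. It is not PhoenixLoop's actual implementation: the function names `cohens_kappa` and `kappa_tone`, the label values, and the handling of the degenerate single-label case are all assumptions made for this example. Only the formula for Cohen's κ and the tone thresholds shown above are taken as given.

```python
from collections import Counter


def cohens_kappa(judge_labels, canary_labels):
    """Cohen's kappa between two equal-length sequences of categorical labels."""
    if len(judge_labels) != len(canary_labels) or not judge_labels:
        raise ValueError("label sequences must be non-empty and the same length")

    n = len(judge_labels)

    # Observed agreement: fraction of items where judge and canary match.
    observed = sum(j == c for j, c in zip(judge_labels, canary_labels)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over labels.
    judge_counts = Counter(judge_labels)
    canary_counts = Counter(canary_labels)
    expected = sum(
        judge_counts[label] * canary_counts[label] for label in judge_counts
    ) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined here. Returning 1.0 is an assumption of this sketch.
        return 1.0

    return (observed - expected) / (1.0 - expected)


def kappa_tone(kappa):
    """Map a kappa value to the tone names used in the legend (hypothetical helper)."""
    if kappa >= 0.60:
        return "brand"
    if kappa >= 0.20:
        return "ink"
    return "fail"


# Hypothetical example data: one judge's verdicts against seeded canary labels.
judge = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
canary = ["pass", "pass", "fail", "pass", "pass", "fail", "fail", "pass"]

kappa = cohens_kappa(judge, canary)
tone = kappa_tone(kappa)
```

In this made-up sample the judge agrees with the canaries on 6 of 8 items (0.75), chance agreement is 34/64, and κ works out to 7/15 (about 0.47), which falls in the "ink" band.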

Diagnostic Actions

Phoenix Cloud workspace

Cross-reference everything PhoenixLoop logs by jumping to the live Phoenix workspace.