name: eval-outcomes description: 'Grade agent or model output against Outcomes for holdout-safe evals and runtime comparisons. Fold target for scenario. Triggers: "eval-outcomes", "eval outcomes", "grade agent or model output".'
eval-outcomes — moved to Mount Olympus (2026-06-10)
Holdout scenario management (absorbed from scenario, ag-s43tg)
Author and manage holdout scenarios with the ao CLI: ao scenario add "<title>"
creates a scenario in .agents/holdout/ (ID s-YYYY-MM-DD-NNN, acceptance
vectors, 0.8 default satisfaction threshold); ao scenario validate checks the
holdout set's schema and link graph. Linked scenarios feed directive fitness via
ao goals scenarios (see the $goals skill and docs/adr/ADR-0003).
Absorbed skills (ag-s43tg)
- scenario — Manage holdout scenarios; author and manage holdout scenarios with measurable acceptance vectors and satisfaction scoring in
.agents/holdout/for behavioral validation.
This skill encodes independent-verdict machinery and now lives with the outer
gate product. Canonical: ~/dev/mt-olympus/.codex/skills/eval-outcomes/SKILL.md —
read and follow that file. This stub preserves fleet routing until the
using-agentops catalog closer updates the registry (skill-prune Lane A,
evidence/skill-prune-recon.md).
Folded-In Trigger Surface (scenario)
eval-outcomes is the fold target for the retired standalone scenario skill
(skill-prune phase 2). Fire this skill for its use-cases:
- Scenario — Manage holdout scenarios. Author and manage holdout scenarios
for behavioral validation: scenarios define what the system should do in
narrative form, with measurable acceptance vectors and satisfaction scoring.
They live in
.agents/holdout/*.jsonso implementing agents cannot see them during development. When asked to author, manage, or score holdout scenarios, fire this skill.