build-judgment-list - SKILL.md Agent Skill

name: build-judgment-list
description: "Build graded relevance judgments (explicit or click-derived) and an offline harness before tuning. Reach for this when there's no eval."

Tuning without a judgment list is guessing dressed as engineering (§3 #3).

Sample representative queries, weighted by real traffic (§3 #7).
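A minimal sketch of traffic-weighted sampling, assuming the query log has already been reduced to a mapping from normalized query string to count (a hypothetical input format). The helper `sample_queries` is hypothetical. It uses Efraimidis-Spirakis weighted sampling without replacement, so frequent queries are more likely to be picked but no query appears twice.

```python
import random


def sample_queries(query_counts: dict[str, int], k: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: pick k distinct queries, favoring high-traffic ones.

    Efraimidis-Spirakis sampling without replacement: each query gets the key
    u ** (1 / count) for u drawn uniformly from [0, 1), and the k largest keys win.
    Queries with a count of zero or less are never selected.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    keyed = []
    for query, count in query_counts.items():
        if count <= 0:
            continue
        keyed.append((rng.random() ** (1.0 / count), query))
    keyed.sort(reverse=True)
    return [query for _, query in keyed[:k]]
```

A `collections.Counter` built over raw log lines can be passed directly, since `Counter` is a `dict` subclass. Fixing the seed keeps the judged query set stable across reruns.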

Use explicit graded labels or click-derived judgments. Treat clicks with caution: position bias inflates clicks on top-ranked results regardless of relevance (§3 #3 #6).
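One way to turn clicks into graded labels while discounting position bias is inverse propensity weighting: each click is weighted by 1 / P(examined at that rank). The sketch below assumes impression events arrive as `(query, doc_id, rank, clicked)` tuples; the `PROPENSITY` values, the grade thresholds, and the `click_judgments` helper are all hypothetical placeholders, and real propensities would have to be estimated from your own traffic (for example, via a randomized-swap experiment). This is a corrected-CTR heuristic, not a full click model.

```python
from collections import defaultdict

# Hypothetical examination propensities by 1-indexed rank: the chance a user
# looks at position k at all. Placeholders only; estimate these from real data.
PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.45, 4: 0.35, 5: 0.3}


def click_judgments(events, min_impressions=20, thresholds=(0.05, 0.15, 0.35)):
    """Hypothetical helper: map (query, doc_id, rank, clicked) events to grades 0-3.

    Each click is weighted by 1 / propensity(rank) so documents shown lower on
    the page are not penalized for position alone. The grade is the number of
    thresholds the corrected CTR reaches. Pairs with fewer than min_impressions
    impressions are left unjudged rather than given a noisy label.
    """
    impressions = defaultdict(int)
    weighted_clicks = defaultdict(float)
    for query, doc_id, rank, clicked in events:
        key = (query, doc_id)
        impressions[key] += 1
        if clicked:
            # Ranks deeper than the table fall back to the lowest listed propensity.
            p = PROPENSITY.get(rank, min(PROPENSITY.values()))
            weighted_clicks[key] += 1.0 / p
    labels = {}
    for key, n in impressions.items():
        if n < min_impressions:
            continue
        corrected_ctr = weighted_clicks[key] / n
        labels[key] = sum(corrected_ctr >= t for t in thresholds)
    return labels
```

With these placeholder values, a document at rank 3 clicked on 10% of impressions gets a corrected CTR of about 0.22 and lands in grade 2 instead of grade 1.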

Build a reusable harness that computes NDCG, MRR, and precision@k over the judgment list (§3 #3).
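A minimal harness sketch, assuming judgments are stored as `{query: {doc_id: grade}}` and each ranking run as `{query: [doc_id, ...]}` (hypothetical formats). It uses the exponential-gain NDCG variant, (2^g − 1) / log2(i + 2), which is one common choice; unjudged documents count as grade 0, and "relevant" for MRR and precision@k means grade ≥ `rel_threshold`. The `evaluate` and `dcg` helpers are hypothetical.

```python
import math


def dcg(grades, k):
    """Discounted cumulative gain over the first k grades (exponential gain)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def evaluate(judgments, rankings, k=10, rel_threshold=1):
    """Hypothetical harness: mean NDCG@k, MRR, and precision@k over judged queries.

    Returns (means, per_query). Unjudged documents count as grade 0, a judged
    query with no ranking scores 0 everywhere, and MRR is not truncated at k.
    """
    totals = {"ndcg": 0.0, "mrr": 0.0, "precision": 0.0}
    per_query = {}
    for query, labels in judgments.items():
        grades = [labels.get(doc, 0) for doc in rankings.get(query, [])]
        ideal = dcg(sorted(labels.values(), reverse=True), k)
        ndcg = dcg(grades, k) / ideal if ideal > 0 else 0.0
        # Reciprocal rank of the first relevant document, 0 if none is retrieved.
        rr = next((1.0 / (i + 1) for i, g in enumerate(grades) if g >= rel_threshold), 0.0)
        precision = sum(g >= rel_threshold for g in grades[:k]) / k
        per_query[query] = {"ndcg": ndcg, "mrr": rr, "precision": precision}
        for name, value in per_query[query].items():
            totals[name] += value
    n = max(len(judgments), 1)
    return {name: total / n for name, total in totals.items()}, per_query
```

Keeping per-query scores alongside the means makes it possible to see which queries a change helped or hurt, not just the average.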

Record the current ranking's metrics as the baseline: the bar every change must beat (§3 #1).
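A sketch of recording a baseline and gating a candidate against it, assuming per-query scores for one metric (say NDCG@10) are available as `{query: score}` from whatever harness computes them. The helpers `save_baseline`, `load_baseline`, and `compare_to_baseline`, the JSON file layout, and the `min_gain` margin are hypothetical. No significance test is included.

```python
import json
from pathlib import Path


def save_baseline(per_query: dict[str, float], path: str) -> None:
    """Hypothetical helper: persist the current ranking's per-query scores."""
    Path(path).write_text(json.dumps(per_query, indent=2, sort_keys=True))


def load_baseline(path: str) -> dict[str, float]:
    """Hypothetical helper: read back scores written by save_baseline."""
    return json.loads(Path(path).read_text())


def compare_to_baseline(baseline, candidate, min_gain=0.0):
    """Hypothetical gate: compare a candidate run with the recorded baseline.

    Only queries scored in both runs are compared. The candidate beats the
    baseline when its mean exceeds the baseline mean by more than min_gain.
    Wins and losses are counted per query so regressions are not hidden by the mean.
    """
    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        raise ValueError("no queries in common with the baseline")
    base_mean = sum(baseline[q] for q in shared) / len(shared)
    cand_mean = sum(candidate[q] for q in shared) / len(shared)
    return {
        "baseline_mean": base_mean,
        "candidate_mean": cand_mean,
        "delta": cand_mean - base_mean,
        "wins": sum(candidate[q] > baseline[q] for q in shared),
        "losses": sum(candidate[q] < baseline[q] for q in shared),
        "beats_baseline": cand_mean - base_mean > min_gain,
    }
```

A change that raises some queries and drops others by the same amount shows a zero delta but nonzero wins and losses, which is a signal to inspect the losing queries before shipping.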

A graded judgment list and an offline harness with a recorded baseline.