name: karpathy-metric-pre
description: "Use this when: red-team my optimization metric, find ways to game my metric, metric pre-mortem, adversarial metric evaluation, gaming vectors for my KPI, what could an agent exploit in my metric, metric failure modes, proxy divergence risk, eval contamination, silent degradation from optimization, metric gaming pre-mortem, is my metric robust enough for auto-improvement, build secondary metrics, evaluation diversity plan, metric countermeasures, what does overfitting look like in my system, holdout scenario design, disappearance test, metric gaming attack surface, is my metric gameable, optimization agent failure modes, what could go wrong with my eval, unsupervised optimization risk, metric red team"
If the user has a program.md from a previous session, ask them to paste the relevant sections. If they're working from their own notes, gather the equivalent information conversationally. Do not proceed until you understand all four elements.
Wait for their response.
STEP 2 — GENERATE GAMING VECTORS
For the specific metric and system described, generate a comprehensive list of ways an optimization agent could inflate the metric without delivering the intended business value. Organize these into five categories:
a) Direct Gaming — Ways to hit the number by exploiting the measurement mechanism itself (e.g., formatting tricks that satisfy rubrics, edge cases that inflate scores, shortcuts that satisfy test cases but not real-world conditions)
b) Proxy Divergence — Ways the metric could improve while the actual business outcome it represents stays flat or degrades (e.g., optimizing response time while degrading response quality, reducing churn on paper while just making cancellation harder)
c) Eval Contamination — Ways the optimization loop could inadvertently influence the data or conditions it's being evaluated against (e.g., the agent's outputs during experiments changing the distribution of test inputs, training and evaluation data sharing leakage paths)
d) Silent Degradation — Side effects that the metric doesn't capture that could accumulate over many optimization cycles (e.g., increasing technical debt, eroding edge-case handling, drifting from compliance requirements, degrading user trust through subtle behavior changes)
e) Compounding Cascades — How a locally optimal change could create problems in connected systems (e.g., a pricing optimization that improves margin metrics but creates fulfillment bottlenecks, a support agent optimization that reduces handle time but increases repeat contacts)
For each gaming vector, provide:
- A specific, concrete scenario (not abstract — describe what the agent would actually do)
- Why it would register as an improvement on the primary metric
- What real-world damage it would cause
- How long it might persist before a human notices
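The four fields above can be kept consistent across vectors by recording each one in a small structure. The Python sketch below is illustrative only: `Category`, `GamingVector`, `triage`, the field names, and the two sample entries are all assumptions made for this example, not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five gaming-vector categories from Step 2."""
    DIRECT_GAMING = "direct gaming"
    PROXY_DIVERGENCE = "proxy divergence"
    EVAL_CONTAMINATION = "eval contamination"
    SILENT_DEGRADATION = "silent degradation"
    COMPOUNDING_CASCADES = "compounding cascades"


@dataclass(frozen=True)
class GamingVector:
    category: Category
    scenario: str             # what the agent would actually do
    why_metric_improves: str  # why the primary metric registers a gain
    real_world_damage: str    # harm the primary metric does not show
    est_days_unnoticed: int   # rough estimate of time before a human notices


def triage(vectors):
    """Order vectors so the ones likely to persist longest come first."""
    return sorted(vectors, key=lambda v: v.est_days_unnoticed, reverse=True)


# Hypothetical sample entries for a support-agent system.
vectors = [
    GamingVector(
        Category.DIRECT_GAMING,
        "Pads every answer with keywords the rubric grader looks for",
        "The rubric score counts keyword hits",
        "Answers get longer and less useful",
        14,
    ),
    GamingVector(
        Category.PROXY_DIVERGENCE,
        "Closes tickets early to cut handle time",
        "Average handle time drops",
        "Repeat contacts rise and customers lose trust",
        45,
    ),
]

for v in triage(vectors):
    print(f"[{v.category.value}] {v.scenario} (~{v.est_days_unnoticed} days unnoticed)")
```

Sorting by estimated time unnoticed is one possible priority order; severity of damage would be an equally reasonable key.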
STEP 3 — BUILD THE DEFENSE
For each gaming vector identified, propose specific countermeasures:
a) Secondary Metrics — Additional measurements that would catch this failure mode. Be specific: name the metric, how to compute it, and what threshold should trigger investigation.
b) Holdout Scenarios — Test cases the optimization agent should never see during its loop but that should be evaluated periodically by a human. Describe the specific scenarios and why they'd catch this gaming vector.
c) The Disappearance Test — For each potential optimization the agent might propose, define how to apply Gu's test: "If this exact task disappeared, would this still be a worthwhile improvement?" Translate this into a concrete check for the user's domain.
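As a rough illustration of countermeasures (a) and (b), the Python sketch below flags secondary metrics that fall past a threshold and computes the gap between the score the agent optimizes and a human-run holdout score. It assumes every secondary metric is higher-is-better with a nonzero baseline. The names `SecondaryMetric`, `flag_regressions`, and `holdout_gap`, along with the metric names and numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondaryMetric:
    name: str
    baseline: float           # value before the optimization loop started
    max_relative_drop: float  # e.g. 0.05 means investigate after a 5% drop


def flag_regressions(metrics, observed):
    """Return names of secondary metrics that fell past their threshold.

    Assumes higher is better and that each baseline is nonzero.
    """
    flagged = []
    for m in metrics:
        drop = (m.baseline - observed[m.name]) / m.baseline
        if drop > m.max_relative_drop:
            flagged.append(m.name)
    return flagged


def holdout_gap(loop_score, holdout_score):
    """Score on cases the agent sees minus score on cases it never sees."""
    return loop_score - holdout_score


# Hypothetical values for a support-agent system.
metrics = [
    SecondaryMetric("first_contact_resolution", baseline=0.90, max_relative_drop=0.05),
    SecondaryMetric("csat", baseline=4.2, max_relative_drop=0.03),
]
observed = {"first_contact_resolution": 0.80, "csat": 4.15}

print("Investigate:", flag_regressions(metrics, observed))
print("Holdout gap:", round(holdout_gap(0.92, 0.78), 2))
```

A holdout gap that keeps widening while the loop score climbs is one signal that the agent is fitting the visible test cases, not the underlying task; the gap size that warrants investigation depends on the domain.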
STEP 4 — DELIVER THE EVALUATION DIVERSITY PLAN
Synthesize the above into a single actionable document.
See Also
- karpathy-triplet-diag — Define the system, editable surface, and metric this red-team is evaluating.
- karpathy-trace-infrastructure — Ensure traces capture enough to detect the gaming vectors identified here.
- harness-engineering — Place mechanical guardrails in the harness to block detected gaming patterns.
- ai-systems-architect — Architect secondary metrics and evaluator diversity into the pipeline.
- outcome-based-system-prompt — Remove prompt-level duct tape that creates proxy divergence risk.