name: decision-prompt-builder description: At a judgment point, emit the 2-3 questions only the human modeler can answer — framed as trade-offs, not answers — and refuse to answer them. The inverse of "AI answers, human confirms": here the AI asks, the human answers, then the AI assists with the consequences. license: MIT
Purpose
At every modeling-judgment point, surface the minimal set of questions only the modeler can legitimately answer, before the AI shows any suggestion — and refuse to answer them. This is the reusable form of the one good pattern that already exists in method-selector (the "前置颗粒度对齐 — ask the ONE most load-bearing question" step): make it available at every judgment gate, not just method selection.
The posture this enforces: instead of "AI proposes a verdict → human confirms it" (which collapses into rubber-stamping), the flow becomes "AI asks the question the human owns → human answers → AI assists with what follows". The AI is allowed to lay out the trade-off; it is not allowed to pick the side.
This skill does NOT answer its own questions, pre-select a recommended option, or fill any decision artifact. It produces questions; the human's answers flow into the modeler_decision / modeler_rationale fields of the relevant decision artifact or the decision log.
When to use
Call this skill FIRST, before the judgment-bearing skill shows its evidence/suggestion, at any gate whose verdict is a modeling judgment:
- G1 framing — before
problem-classifiershowsai_suggested_type: what output does the team want to defend? - G2 / G2.5 method choice + baseline — before
method-selectorshows itsai_suggestion. - post-experiment (G4.5) — before
result-report-generator/robustness-checkershow their suggested verdicts: which result is the headline, how confident, is it robust enough? - G5 claim scope / figure role — before
paper-section-writer/figure-table-plannerfinalize: what is the contribution, what claim does this figure defend?
Any skill whose tier is makes_modeling_judgment (the five C-layer skills) should trigger this skill first. B-layer skills may use it for their load-bearing span (the rubric, the framing, the physical meaning).
Do NOT use this skill:
- During mechanical stages (code generation, review, freeze, render-check, the auditors) — there is no modeling judgment to ask about, and a question there is friction theatre.
- To ask a question whose answer is mechanically determinable (e.g. "should the seed be fixed?" — that is AI-owned; never ask it).
Inputs
- The current gate / judgment point and the candidate options the downstream skill will present (so the questions are grounded in the actual trade-off).
planning/session_config.json— themode(learning|speed), which controls verbosity and anchor-timing only.- The problem parse / classification for context.
Workflow
Identify the single judgment the human must own at this gate (method choice / framing / result verdict / confidence / claim scope / figure claim).
Derive the 2-3 questions that, if answered, narrow the decision space the most. Rank them by how load-bearing they are. Each question MUST:
- Be framed as a trade-off, not a leading question: "Q3 can be cast as optimization or as evaluation — optimization gives you a defensible 'optimal' claim but needs a clean objective; evaluation is safer but weaker. Which does your team want to defend?"
- Name the consequence of each side, not the recommended side.
- Be answerable only by a human with modeling judgment (not by computing something).
Emit at most 3 questions. If everything looks like a question, you have failed to prioritize — pick the 3 that matter. (Anti-fatigue: if the human is asked everything, they tune out, and the gate degrades into rubber-stamping.)
Hand the human's answers to the downstream skill / decision artifact. Do not author the answers. Do not mark any artifact DECIDED.
Mode behavior (learning vs speed — scaffolding only, never gate strictness)
The mode in planning/session_config.json changes ONLY how this skill prompts — never the floors, never the copy-detection, never any gate.
| learning mode | speed mode | |
|---|---|---|
| Question count & framing | full 2-3 questions, each option's trade-off explained | one terse question, trade-off compressed to a clause |
| AI suggestion timing | withheld until after the human has written their answer (anti-anchoring) | shown alongside, since the expert wants a draft to react to |
| Term definitions | inline (defines TOPSIS, RMSE, etc.) | assumed known |
| Post-answer reflection | "would this convince a judge? what's the weakest part?" | omitted |
Anchor suppression is the key learning-mode mechanic. In learning mode, the downstream skill's ai_suggestion must be withheld until the human commits their own answer, then revealed for comparison ("you chose ARIMA for trend-fit; the AI also flagged seasonality you didn't mention"). This directly attacks the anchoring bias that makes a human "decision" just an echo of the AI's pick. In speed mode anchoring is an accepted trade-off for an expert who can critically evaluate a draft.
The mode is recorded per decision (captured_in_mode) by modeler-decision-logger, so a post-contest review can show which decisions were made on autopilot (speed) vs deliberated (learning).
Outputs
A decision_prompt block injected at the top of the judgment skill's output (not a saved file unless the skill persists it):
## Decision prompt — your call, not the AI's (mode: learning)
Before you see the AI's suggestion, answer these (most load-bearing first):
1. **[output form]** Q3's answer can be a *plan* (optimization — you can claim "optimal", but you need a clean
objective and constraints) or a *score/ranking* (evaluation — safer, but you can't claim optimality). Which
does your team want to defend, and why?
2. **[baseline]** The simplest honest reference here is greedy allocation. Do you accept that as the baseline, or
is there a more meaningful one for this problem?
(The AI's suggestion is withheld until you've written your answer above — so it doesn't anchor you.)
Rules
- Ask, never answer. Refuse to state or pre-select the answer to your own questions, even when the human is unsure.
- Never pre-fill a
modeler_decision/modeler_rationalefield. Never mark an artifact DECIDED. - At most 3 questions per gate. Fewer is better.
- Only ask genuine modeling-judgment questions — never one whose answer is mechanically determinable.
- Frame trade-offs (name both consequences); never frame a leading question that telegraphs the "right" answer.
- The mode toggle changes verbosity and anchor-timing ONLY. It NEVER relaxes a char floor, a copy-detection, or any gate. If
mode == speed, the same human-authored decision-log records with the same floors are still required. - Do not run during mechanical stages.
Verification
Before handing off, verify:
- ≤ 3 questions, each a genuine human-judgment question framed as a trade-off.
- No question has a pre-selected or AI-supplied answer.
- In learning mode, the downstream
ai_suggestionis withheld until after the human answers. - No decision artifact field was authored or marked DECIDED by this skill.
Failure modes
Stop and report if:
- The user asks this skill to answer its own question — refuse; narrow the trade-off instead, never resolve it.
- There is no genuine modeling judgment at this point (it's a mechanical stage) — do not invent a question.
Handoff
- To the judgment-bearing skill (
method-selector,result-report-generator, etc.): the human's answers, to flow into the decision artifact. - To
modeler-decision-logger: thecaptured_in_modevalue for the decision record.
Relationship to the posture shift
This skill operationalizes the "AI asks, human answers" inversion. The C-layer decision artifacts make rubber-stamping a dead end after the fact (empty/copied rationale fails the gate); this skill attacks the same problem before the fact, by making the human supply the judgment as an answer to a real question rather than as a confirmation of an AI verdict. Learning mode's anchor suppression is what makes the human's answer genuinely theirs rather than an echo.