goal-elicit

name: goal-elicit description: | Interview the user and write a verifiable goal artifact — a Goal Contract (GOAL.md), a JSON checklist, or a phased design doc, chosen by triage. Never executes the goal or invokes downstream tools — it writes the artifact and stops; hand off to goal-drive (or any agent) to execute. Use for "clarify what I want", "define done_when/acceptance criteria", "make this unambiguous", "plan this out", XY-problem requests, or untestable success. Skip trivial edits, factual questions, routine commands, and code review. user-invocable: true allowed-tools: Read, Write, Edit, Bash(git:), Bash(date:), AskUserQuestion

Interview the user and write a verifiable goal artifact. The deliverable is a single file, chosen by triage — a Goal Contract (GOAL.md), a JSON checklist, or a phased design doc. The skill terminates when that artifact is written — the user takes it from there and hands it to whichever agent or tool pursues it (e.g. [[goal-drive]]).

This skill never plans, edits other files, runs code, or invokes other skills. It writes one artifact and stops.

Runtime portability. One source of truth for Claude Code, Codex, and Grok — the skill never detects its runtime. Defaults are agent-neutral. One /goal execution message (Phase 5) is always emitted for an executable, ready artifact and serves both Claude Code and Codex unchanged — the transcript-anchored form launches goal-drive and guards completion on Claude (its evaluator judges only the transcript, never files) and is a valid native objective on Codex (see references/goal-guardrail.md); on Grok, paste the same body without the /goal prefix. A Claude-only tool is used when present and degrades to plain text otherwise (e.g. AskUserQuestion → numbered plain-text options; see Phase 3).

What this skill produces

One artifact, chosen by triage (Phase 0), then the skill stops:

Contract — GOAL.md at the repo root (or $PWD if not in a git repo); for multiple goals, .goals/<id>.goal.md (legacy .claude/goals/ still read). The default shape. Schema: references/contract-template.md.
Checklist — .goals/<id>.checklist.json for enumerable, batchable work. Schema: references/checklist-template.md.
Phased doc — .goals/<id>.plan.md for staged builds with per-phase acceptance. Schema: references/phased-doc-template.md.

The contract carries one of three terminal states in its frontmatter (the checklist/phased shapes carry the analogous per-unit state — see their templates):

ready — every required field is filled, every done_when item is mapped to evidence, the user has explicitly assented. The artifact is ready to use.
blocked — interview hit the round ceiling or the user can't supply enough context. blocking_unknowns enumerates what's missing. Tell the user the artifact is incomplete and what is needed.
draft — interview in progress; the file is the durable session state.

Execution is out of scope — a separate skill, goal-drive, drives any of these artifacts to done. goal-elicit writes the artifact and stops; it never invokes goal-drive.

Phase 0 — Triage (Cynefin)

Before asking a single question, classify the request. The triage decides how aggressive the interview is. See references/cynefin-triage.md for the decision tree and worked examples.

Domain	Signals	Interview
Clear	Obvious outcome and verification, single file, single-step, reversible.	Fast lane: one confirmation, write a one-shot contract, done. `rounds_used: 1`.
Complicated	Knowable but under-specified, multi-step, has tradeoffs.	Standard interview, 3–5 user rounds.
Complex	Outcome is partly emergent, no single right answer, exploratory or research.	Exploration interview, 4–6 rounds. Contract describes the next safe probe, not a full implementation. Includes a learning criterion and rollback.

If the request is Clear, do not run the full interview — it's a tax on simple work. Just confirm scope and write the contract. If you misclassify and find yourself asking a third question for what looked Clear, escalate to Complicated and continue.

Triage also selects the work shape — which artifact to write (contract / checklist / phased doc) — orthogonally to the domain. See the "Work shape" section in references/cynefin-triage.md. Default to a contract; promote to a checklist (enumerable, script-generatable items) or a phased doc (staged build with per-phase acceptance) only on a clear signal.

The interview

Five phases for Complicated/Complex domains. Round count below counts user turns, not assistant turns.

Phase 1 — Orient (1 turn)

Restate the seed request neutrally. Maintain two distinct fields throughout: stated_request (verbatim) and underlying_goal (what the user is actually trying to achieve). Ask the smallest set of intent questions needed to populate underlying_goal.

Do not draft the contract yet. Open a scratch artifact with status: draft and the frontmatter only — GOAL.md by default, switching to the checklist/phased path under .goals/<id>. once triage picks the work shape.

Phase 2 — Diverge (1–3 turns)

Widen the frame so you are not anchored on the user's first wording. Cover: job-to-be-done, target user/stakeholder, current pain, prior attempts, anti-goals. Force decisions, not yes/no theater.

By the end of round 2, restate at least two plausible interpretations of the goal and ask the user which is right (or whether neither is). This breaks anchoring.

Phase 3 — Converge (1–2 turns)

Present the leading interpretation, the unresolved choices, and your recommended defaults with rationale. Use this exact pattern when proposing an inference:

"I infer X because Y. Plausible alternatives are A and B. Which is right?"

When there are 2–4 plausible options, ask a forced-choice question — present them as a labelled, numbered list where each option carries its consequence, so the user must pick, rank, or edit (never a bare yes/no, no "or something else" escape). If your runtime has a structured question tool (Claude Code's AskUserQuestion), use it to render the choice; otherwise write the numbered options inline in your reply and have the user answer with a number/letter or an edit. Use free-text questions when the answer space is open. The force-a-decision property is what matters, not the widget.

When a forced-choice batch carries a recommended default on each question, give the user a single-token accept path: tell them they can reply defaults to take every recommendation, or override compactly like 1b 2a — one reply resolves the whole batch. This cuts interview drag without weakening the force-a-decision property; offer it only when the defaults are non-blocking assumptions (not for a P0 fork the user must actually decide).

Phase 4 — Contract (1 turn)

Show the draft artifact in compressed form: objective, scope in/out, constraints, acceptance (at least one Gherkin, or per-item/per-phase acceptance for checklist/phased), done_when (each item mapped to evidence), risks, rollback. Ask the user to approve, edit, or mark blocked — not yes/no.

Phase 5 — Confirm (1 turn)

After incorporating the user's edits, write the final artifact with status: ready. The final confirmation is not "are we good now?" It is:

"I will write this as the goal artifact unless you change one of these fields: [short list of decisions]."

If the contract's execution.commit_policy is per_unit, call it out explicitly — with that setting the executor makes a local commit per verified unit without asking each time.

Once the file is written, tell the user where it is (path) and stop. The user takes it from there.

Always emit the execution message (/goal). For every executable, ready artifact (a contract with an execution: block, a checklist with non-empty items, or a phased doc with a ## Phases section), end the handoff with exactly one ready-to-paste /goal message — the same line for Claude Code and Codex, including Clear-domain one-shots. One message, never per-runtime variants — two versions only add mental burden at copy-paste time.

Use the transcript-anchored form: drive the artifact with goal-drive; done when a GOAL-DRIVE COMPLETE marker + the real verification output appears; by-exception stop; TIMEOUT after N turns. The same string runs on both:

On Claude Code, /goal <it> makes the condition the directive — the paste launches goal-drive and guards completion. Its evaluator (a small fast model, defaults to Haiku) judges only the transcript — it never reads files or runs tools (official docs), which is exactly why done-when keys on goal-drive's printed marker + output, never "the file says done."
On Codex, the same /goal <it> is a valid objective — Codex's native executor drives the artifact and stops on the same marker (goal-drive prints it identically on both).

The transcript-anchored form is the portable superset — required by Claude's blind evaluator and valid for Codex; the Codex-minimal "verify the file's checks yourself" wording is not portable (Claude's evaluator can't open the file), so never emit it as the shared message. Build the one message from references/goal-guardrail.md. Emitting the text is not execution — goal-elicit never runs /goal. (On Grok, no /goal: the user pastes the same body without the /goal prefix — same text, not a second version.) The only carve-out: a blocked/draft artifact has nothing to execute yet — emit no execution message and instead state what's missing (blocking_unknowns/blank fields).

Optional lint check (advisory text — goal-elicit never runs it). For an executable, ready artifact, also hand the user the one-line mechanical check goal-drive ships, so they (or CI, or goal-drive's load-time preflight) can validate the artifact before driving:

python3 <goal-drive skill dir>/scripts/lint_goal_artifact.py <artifact-path>

It checks required fields, every done_when/acceptance mapped to evidence, no leftover placeholders, and no dangerous-vague phrases — the mechanical backstop for the Stop-criteria audit. Emitting the line is not execution (goal-elicit never runs code); skip it for blocked/draft artifacts.

Question batching

3–5 questions per turn by default. Drop to 2 if any question is cognitively heavy.
Group by theme. Examples: orient batch is "audience + outcome + deadline"; converge batch is "scope in + scope out + constraints"; contract batch is "acceptance + verification + rollback".
Every two rounds, post a running summary with three sections: locked decisions, unresolved decisions, inferred defaults. The summary is also the diff with the previous summary — call out anything changed or discarded.

Stop criteria

Mark status: ready only when all of the following are true. If any are false, either continue interviewing or write status: blocked.

Required fields filled: objective, underlying_goal, stakeholders, scope_in, scope_out, constraints, assumptions, acceptance_criteria, done_when (each mapped to evidence), verification_plan, risks, rollback, observability, final_response_contract.
At least one acceptance criterion is written as a Gherkin scenario (Given / When / Then) or an equivalent testable scenario.
Hard gate: every done_when item names a command, file artifact, metric, or user-observable behavior. Reject done when "looks good" style criteria.
Zero P0 open questions. Non-blocking assumptions are written with defaults and rationale.
The user has explicitly assented or supplied edits that you incorporated. Silence is not assent.
Contract audit: each acceptance criterion has a matching verification method; each constraint has a way to notice violation; no success criterion depends only on executor self-report. Any criterion that leans on an external oracle or reference dataset must also name (a) what produces that ground truth and (b) one guard against a wrong oracle — stale/cached source, a different locale/env than the system under test, or the same tool that's being judged.

Hard upper bound: 8 user rounds

Round	Behavior
1–5	Normal interview.
6	Warn the user: "Two rounds left before I either write the contract or write a `blocked` brief."
7	Last chance to resolve P0 unknowns. Be explicit about what's still missing.
8	Stop interviewing. Write either a complete contract or `blocked` with `blocking_unknowns` populated.

Never pretend completion to dodge the cap. A blocked contract is a successful outcome — it tells the user exactly what new input is required before the contract can be considered complete.

Question taxonomy

Ask in roughly this order. Earlier answers determine whether later categories matter. See references/question-bank.md for ~40 worked examples per category, each phrased to force a decision.

Intent — what outcome, for whom, why now?
XY check — what made you choose this requested approach? (Detects when stated_request ≠ underlying_goal.)
Job-to-be-done — "When [situation], I want [capability], so I can [outcome]."
Stakeholders / decision owner — who accepts the result?
Scope in — what must be included?
Scope out / anti-goals — what tempting adjacent work must not be done?
Constraints — stack, style, policy, time, budget, dependency, compatibility, security.
Prior attempts and current state — what exists, what failed, what should not be rediscovered?
Success criteria — what observable facts prove success?
Definition of done — what must be true before this is complete?
Risks and edge cases — what could make the apparent solution wrong?
Rollback / containment — how to undo, disable, or mitigate?
Observability — where will evidence appear (logs, tests, UI, files, metrics)?
Deadline / sequencing — real date or ordering constraint?

Anti-patterns to engineer out

See references/anti-patterns.md for the full failure-mode table with the prompt move that engineers each one out. The most important rules:

No yes/no until final approval. Earlier questions must force a choice, rank, or edit.
No leading questions. Always state your inference and evidence first, then ask the user to choose or correct it.
No faux confidence. Maintain an explicit open_questions ledger in the draft artifact. You may not say "I understand" until the ledger is empty or remaining items are marked non-blocking with defaults.
No "are we good now?" loops. One confirmation protocol at Phase 5, not repeated checks.
No anchoring on first phrasing. Preserve stated_request and underlying_goal as separate fields. By round 2, restate at least two plausible interpretations.
No verification theater. Every done_when needs concrete evidence — command, file artifact, metric, or user-visible behavior.
No runaway scope. Default new ideas to "Could" or "Won't this time" (MoSCoW) unless the user explicitly promotes them.
No execution. Never plan, write code, edit other files, or invoke other skills. The skill writes GOAL.md and stops. The user takes it from there.

Resumption

If a goal artifact already exists in the working directory — GOAL.md, or <id>.checklist.json / .plan.md under .goals/ (or, for artifacts from an earlier version, the legacy .claude/goals/):

Read it.
If status: ready (or, for a checklist, all required header fields are filled), the artifact is complete. Tell the user where it is and stop. Do not re-interview.
If status: draft or blocked (or a checklist whose items/header is incomplete), identify which fields are blank or in blocking_unknowns. Resume the interview by asking only for those fields. Do not re-ask filled fields. Increment rounds_used from where it left off.

What this skill must not do

Do not plan, write code, run the goal, or invoke another skill — ever. The deliverable is the artifact (contract, checklist, or phased doc) only. Execution belongs to goal-drive; hand the artifact off, but do not invoke goal-drive or run the work yourself.
Do not infer the user's goal silently and proceed.
Do not pretend completion at the round ceiling — write blocked instead.
Do not seek an external second opinion from another model or agent during the interview unless the user explicitly asks for one.
Do not over-question Clear-domain tasks. Fast-lane them.

Files in this skill

SKILL.md — this file.
references/contract-template.md — the GOAL.md skeleton with all required frontmatter and body sections (+ the optional execution: block for goal-drive).
references/checklist-template.md — the JSON checklist artifact (enumerable / batched work).
references/phased-doc-template.md — the phased design-doc artifact (staged work with per-phase acceptance).
references/cynefin-triage.md — Phase 0 decision tree with worked examples.
references/question-bank.md — worked questions per taxonomy category.
references/anti-patterns.md — failure-mode table with the prompt move that engineers each one out.
references/goal-guardrail.md — the Phase 5 /goal execution message (Claude Code + Codex): emit templates for both, per-shape derivation, and caveats.

Read the references when you need them — not all up front. Long-form material is there so the model loads it only when relevant.