cli-spec-to-goal - SKILL.md Agent Skill

name: cli-spec-to-goal
description: Convert a Node.js CLI tool specification (even a vague one) into a Codex /goal-ready bundle (GOAL.md, VERIFY.md, PROGRESS.md inside goals/<slug>/) with AI-agent-friendly patterns (--json, stdout/stderr separation, exit codes, structured errors, TTY detection, --dry-run) baked into every goal. Asks clarifying questions in focused batches (each with options + recommendation + reasoning), reaches 95% confidence, optionally scaffolds a missing CLI project (bin/, src/, package.json, AGENTS.md, CLAUDE.md behavioral snippet), and emits a tailored /goal command. Handles both new CLIs and adding commands to existing ones. Supports JavaScript (default) and TypeScript, commander.js, Vitest, conditional SQLite and cosmiconfig. For Node.js CLI tools, not web apps, servers, or non-Node projects.

cli-spec-to-goal — Vague CLI Idea → Codex /goal Bundle

Turn a rough Node.js CLI tool idea into the three files Codex /goal needs to drive autonomous implementation:

goals/<slug>/
  GOAL.md       what done looks like (full CLI tool template with AI-agent patterns)
  VERIFY.md     how /goal proves it's done (host-native verification)
  PROGRESS.md   the audit trail /goal will populate

The skill optionally scaffolds a missing CLI project (bin/, src/, package.json, AGENTS.md, CLAUDE.md) and prints a tailored /goal command at the end.

Why this skill exists

A full multi-prompt workflow kit is overkill for CLI tools that fit in one or two goal slices. This skill collapses the "spec → goal trio" hop into a single guided conversation that:

batches clarifying questions (so the user makes 2-3 decisions per round, not 20)
recommends a default for every choice (so they can move fast when the recommendation looks right)
bakes in six AI-agent patterns that make every CLI tool machine-friendly from the start
writes the same template shapes already validated by the Codex /goal operator pattern

It does not replace the multi-prompt kit for complex CLI suites with 10+ commands. Use the kit for big builds; use this skill for the small-to-medium ones.

The six AI-agent patterns

Every GOAL.md, VERIFY.md, and scaffold generated by this skill includes these patterns. They make CLI tools safe and useful for AI agents to invoke — not just humans.

--json flag on every command — stdout receives only valid JSON; no spinners, colors, or progress bars. Gold standard: the gh CLI.
stdout / stderr separation — data to stdout, diagnostics to stderr, always.
Non-interactive TTY detection — never block on stdin when !process.stdin.isTTY.
Meaningful exit codes — 0 = success, 1 = user/input error, 2 = usage error.
Structured error format — when --json is active: { "error": "CODE", "message": "...", "suggestion": "..." }.
--dry-run on mutating commands — any create/update/delete command previews without executing.

These patterns reinforce each other: --json + structured errors = machine-parseable success and failure; TTY detection + --dry-run = safe automated execution.
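
A minimal sketch of how patterns 1, 2, 3, 4, and 6 might look inside a command, assuming a hypothetical delete command. The names EXIT, renderResult, canPrompt, and deleteItem are illustrative and are not prescribed by the skill.

```javascript
// Exit codes as listed above (the constant names are hypothetical).
const EXIT = { SUCCESS: 0, USER_ERROR: 1, USAGE_ERROR: 2 };

// Patterns 1 and 2: the string returned here is what goes to stdout.
// With --json it is JSON only; diagnostics belong on stderr instead.
function renderResult(data, { json }) {
  return json ? JSON.stringify(data) : `${data.action} ${data.id}`;
}

// Pattern 3: only prompt when stdin is an interactive terminal.
function canPrompt(stdin = process.stdin) {
  return Boolean(stdin.isTTY);
}

// Pattern 6: a mutating command that previews instead of executing.
// `remove` stands in for whatever side effect the real command performs.
function deleteItem(id, { dryRun }, remove = () => {}) {
  if (!dryRun) remove(id);
  return { action: "delete", id, executed: !dryRun };
}
```

A caller would write renderResult(...) to process.stdout, send any progress text to process.stderr, and set process.exitCode from EXIT.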

Core flow

Probe the project — inspect the cwd to learn what's already in place.
Decide single goal vs. multi-goal — judge complexity from the spec; if multi-shaped, offer to split via goals-plan.md.
Ask in focused batches — 2-4 questions per round, each with options + recommendation + reasoning, until 95% confident.
Optionally scaffold — if the project is missing pieces and the user agrees, create only what they confirm.
Write the goal trio — GOAL.md + VERIFY.md + PROGRESS.md at goals/<slug>/.
Hand off — print the file paths + a tailored /goal command + a short "how to run" note.

Never write any files until confidence in what is about to be produced reaches 95% or higher.

Step 1 — Probe the project

Before asking the user anything, inspect the cwd. The point is to ground recommendations in reality so questions don't waste the user's time on things already settled by the project state.

Look for:

Signal	What it tells us
`package.json` → `bin` field	Existing CLI entry point; binary name already chosen
`package.json` → `type` field	ESM vs CJS — skip module system question
`package.json` → `engines`	Node version constraint
`package.json` → `scripts`	Existing test/lint/build commands
`package.json` → `dependencies`	Arg parsing lib already chosen (commander/yargs/meow/citty)
`package.json` → `devDependencies`	Test framework (vitest/jest), TS, linter (eslint/biome)
`tsconfig.json`	TypeScript project — skip language question
`bin/` or `src/cli.*`	Existing entry points
`.env` / `.env.example`	External service config pattern in use
`goals/` folder	Previous goal runs; pick non-conflicting slug
`AGENTS.md`	Existing conventions to respect
`CLAUDE.md`	Existing behavioral guidance to reference
`vitest.config.*` / `jest.config.*`	Test infra in place — skip framework question
`.gitignore`	Check coverage for node_modules, .env, dist/
`eslint.config.*` / `biome.json`	Lint setup exists
`*.db` / `*.sqlite` / `drizzle/` / `migrations/`	Data storage in use — triggers data storage questions
`.toolrc` / `*.config.js` / cosmiconfig in deps	Config file pattern already wired
`src/auth.*` / `src/oauth.*`	OAuth flow already implemented
`open` or `opener` in deps	Browser-opening for OAuth consent flow
`.env` containing `CLIENT_ID`, `CLIENT_SECRET`, `REDIRECT_URI`, or `REFRESH_TOKEN`	OAuth credentials pattern in use

Use this to inform Step 3 recommendations. Skip questions that the probe already answers.

New vs. existing CLI detection:

bin field in package.json or src/cli.* exists → existing CLI, focus on adding commands
Neither present → new CLI from scratch
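
The detection rule and a few of the "skip this question" signals from the probe table could be sketched as below. This assumes the probe has already parsed package.json into pkg (or null if absent) and collected repo-relative file paths into paths. The function name and the return shape are hypothetical.

```javascript
const ARG_PARSERS = ["commander", "yargs", "meow", "citty"];
const TEST_RUNNERS = ["vitest", "jest"];

function summarizeProbe(pkg, paths) {
  const deps = pkg?.dependencies ?? {};
  const devDeps = pkg?.devDependencies ?? {};

  // New vs. existing: a bin field or any src/cli.* file means "existing".
  const hasBin = Boolean(pkg?.bin);
  const hasCliEntry = paths.some((p) => /^src\/cli\.[^/]+$/.test(p));

  // Questions the probe has already answered.
  const skip = [];
  if (pkg?.type) skip.push("module system");
  if (paths.includes("tsconfig.json")) skip.push("language");
  if (ARG_PARSERS.some((name) => name in deps)) skip.push("arg parsing library");
  if (TEST_RUNNERS.some((name) => name in devDeps)) skip.push("test runner");

  return { kind: hasBin || hasCliEntry ? "existing" : "new", skip };
}
```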

Step 2 — Judge complexity

Read the user's spec and decide whether it fits one goal or wants to be split.

Multi-goal signals:

More than ~3 distinct commands/subcommands being built
More than ~5 AC spread across unrelated concerns (parsing + API + file I/O + formatting)
Clear "core library first, then CLI wrapper" or "MVP then enhancements" reading
Walking-skeleton step needed (e.g., basic --help before vertical features)
Multiple unrelated subsystems (HTTP + filesystem + database + streams)
Multiple output format requirements (JSON, table, plain text)

If multi-goal, surface it before other clarifying questions:

This spec looks like it could be 2-3 separate goals. I can either generate one combined goal or write a goals-plan.md and scaffold the first slice. Which do you prefer?

If the user picks split, write goals-plan.md at the project root with a numbered list of proposed goals (each with a 1-2 line description), then ask which slice to generate first. Re-run the skill later for the next slice.

Step 3 — Ask clarifying questions

Use the AskUserQuestion tool when available. Fall back to natural-language Q&A with the same shape: 2-4 questions per round, each with 2-4 options, recommended option first and labeled (Recommended), plus a one-line reason for the recommendation.

Aim for ≤3 rounds total. Skip any question already answered by the spec or the probe.

Round 1 — Identity + Project Context

Only ask questions not settled by the probe.

Package name: Suggest kebab-case of the tool name. Only for new CLIs.
Binary name: Usually same as package name. Only for new CLIs.
New vs. existing: Only if the probe is ambiguous (e.g., package.json exists but no bin field).
Language: JavaScript (recommended) or TypeScript. Skip if tsconfig detected. JS is the default because it's simpler and good enough for most CLIs; recommend TS if the user mentions types or the project is already TypeScript.
Module system: ESM (recommended) or CJS. Skip if type field found in package.json. ESM is recommended because it is the standard module system for current Node.js releases.
Arg parsing library: commander (recommended for ≤10 commands) or yargs (for complex subcommand trees) or citty or none. Skip if an arg parser is already in dependencies.
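
One possible way to derive the suggested kebab-case package name from a free-form tool name, plus a collision check against names already present under goals/. Both helper names are hypothetical.

```javascript
function toKebabCase(name) {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2") // split camelCase boundaries
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every other run of characters to one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
}

// True when the proposed slug already exists, so the user should be asked.
function slugCollides(slug, existingSlugs) {
  return existingSlugs.includes(slug);
}
```

For example, toKebabCase("My Cool Tool") and toKebabCase("myCoolTool") both yield "my-cool-tool".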

If existing CLI: Replace identity questions with:

"Which command are we adding?" (extract from spec)
"Any existing patterns to follow?" (probe-informed — show detected conventions)

Round 2 — Scope + Behavior

Acceptance criteria: Draft 3-6 AC bullets from the spec yourself, then ask the user to confirm, edit, or add. Don't make them write ACs from scratch — give them a starting list.
Out of scope: Suggest boundaries (GUI/web interface, daemon/watch mode, Windows-specific behavior, breaking public API changes, platform packaging/installers).
User stories: For non-trivial specs (>2 AC), propose 1-3 user stories. Skip for single-AC features.
Subcommand structure: If the spec mentions multiple actions, propose a subcommand tree.
I/O contract: What does the tool read? What does it produce? What formats? Infer from spec; confirm with user. This feeds the CLI Contract table in GOAL.md.

Round 3 — Configuration + Verification

Configuration surface: Env vars + dotenv (recommended for secrets), cosmiconfig (for persistent non-secret settings when >3 of them), or CLI flags only. See the decision matrix below.
External dependencies / auth: If the spec mentions an API or service, ask about credentials and auth.
Verification approach: Integration tests (recommended for CLI-visible behavior), unit tests (for internal logic), or both (recommended when substantial library code exists alongside CLI).
Test runner: Vitest (recommended — native ESM/TS, fast) or Jest. Skip if test framework already in project.

Configuration decision matrix:

Scenario	Recommendation
API credentials only	Env vars + dotenv only
1-3 non-secret settings	CLI flags only
>3 non-secret persistent settings	Env vars + dotenv + cosmiconfig
Per-project tool	cosmiconfig
User-global tool	Env vars or XDG config
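
The first three rows of the matrix can be read as a small decision function. This sketch assumes two inputs (whether the tool has secrets, and how many persistent non-secret settings it has) and deliberately does not model the per-project and user-global rows. The function name is hypothetical.

```javascript
function recommendConfig({ hasSecrets, persistentSettings }) {
  // More than 3 persistent non-secret settings: add a config file layer.
  if (persistentSettings > 3) return "env vars + dotenv + cosmiconfig";
  // Credentials with few or no settings: env vars are enough.
  if (hasSecrets) return "env vars + dotenv";
  // A handful of non-secret settings: keep everything on the command line.
  return "CLI flags only";
}
```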

Conditional: Data Storage Round

Only surface when the probe detects storage signals (*.db, *.sqlite, better-sqlite3 in deps, drizzle/, migrations/) OR the spec mentions persistence keywords (store, save, history, cache, log, database, persist, record, track, remember, catalog, index).

Does this tool need to persist data between runs? Yes (describe what) / No.
Storage approach: SQLite via better-sqlite3 (recommended for structured data) / JSON file (for simple key-value) / No persistence.

SQLite via better-sqlite3 is recommended because: zero setup (no server, connection string, or Docker), single file, synchronous API, AI-agent-friendly (no daemon to manage).
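
The keyword trigger for this round could be approximated with a whole-word scan of the spec text, as in the sketch below. It is intentionally naive: inflected forms such as "saved" or "caching" are not matched, and a phrase like "log in" would be a false positive, so the result is a hint for asking the question rather than a decision. The function name is hypothetical.

```javascript
const PERSISTENCE_KEYWORDS = [
  "store", "save", "history", "cache", "log", "database",
  "persist", "record", "track", "remember", "catalog", "index",
];

// Returns the persistence keywords that appear as whole words in the spec.
function persistenceKeywords(specText) {
  const words = specText.toLowerCase().match(/[a-z]+/g) ?? [];
  return PERSISTENCE_KEYWORDS.filter((keyword) => words.includes(keyword));
}
```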

Conditional: Config File Round

Only surface when the tool has many non-secret settings that users would want to persist across runs.

Should the tool support a config file? Yes (cosmiconfig — .toolrc, tool.config.js, etc.) / No (CLI flags only). Recommend yes if >3 persistent settings.

Conditional: OAuth Round

Only surface when the probe detects OAuth signals (existing src/auth.*, open in deps, CLIENT_ID in .env) OR the spec mentions OAuth authentication with an external API that requires user consent (not just a static API key).

OAuth keywords: oauth, login, authenticate, authorization, consent, token, refresh token, access token, redirect, callback URL.

Does this tool need OAuth authentication? Yes / No (env var API key is sufficient). If the spec just mentions an API key, OAuth is not needed — env vars handle it.
OAuth provider? (e.g., Reddit, Google, Spotify, GitHub). Needed to document the authorization and token endpoints.
Token storage tier? Tier 1: refresh token in .env (recommended — simple, gitignored) / Tier 2: refresh token encrypted in SQLite (for shared machines or compliance) / Let user choose at runtime during auth login.
Cache access tokens? Yes, in SQLite with expiry (recommended for frequent invocations — skips network call when token is valid) / No, in-memory only (re-mint each run from refresh token — more secure but slower).

The OAuth flow uses Authorization Code with a temporary localhost redirect server (like gh auth login). The scaffold adds src/auth.js, src/token-store.js, and src/commands/auth.js with auth login, auth status, and auth logout subcommands.

If OAuth is active, it may independently trigger SQLite as a dependency (for Tier 2 or access token caching). When both OAuth and the data storage layer are active, they share the same SQLite database.
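
The access-token caching choice comes down to an expiry check before each run. A minimal sketch follows, assuming a hypothetical cached record shaped as { accessToken, expiresAt } with expiresAt in milliseconds since the epoch. The 60-second safety margin is also an assumption, not something the skill specifies.

```javascript
// True when the cached access token can be used without a network call.
function isTokenUsable(cached, now = Date.now(), skewMs = 60000) {
  return Boolean(cached && cached.accessToken && cached.expiresAt - skewMs > now);
}

// Reuse the cached token when valid; otherwise mint a new one.
// `mint` stands in for the refresh-token exchange with the provider.
function resolveAccessToken(cached, mint, now = Date.now()) {
  return isTokenUsable(cached, now) ? cached : mint();
}
```

With in-memory-only caching, cached is simply undefined at the start of every run, so mint is always called.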

Confidence check after each round

After each round, ask yourself: "If I started writing files now, what could go wrong because of something I don't know?"

If "not much" → proceed to Step 4.
If meaningful unknowns → ask one more focused round.
If 3 rounds done and still under 95% → the spec may need a longer conversation or a different approach. Tell the user rather than guessing.

Recommendation heuristics

Area	Heuristic
Identity	Kebab-case from tool name; honor existing package.json name
Scope	Narrowest scope that achieves the spec. Out-of-scope wins ties
Security	Env vars for secrets, never in stdout. Strictest reasonable approach
Verification	Integration tests for CLI behavior, unit tests for library code, both when crossing the boundary
Goal slicing	Single goal unless AC clearly spans unrelated subsystems
Language	JS default; TS when user mentions types or when adding to a TS project
Framework	commander for ≤10 commands; yargs for complex trees; respect existing choice

Step 4 — Optionally scaffold

If the probe found a new CLI (no existing project structure) and the user agrees, create missing project files. Templates are in references/scaffold-templates.md. Read that file when scaffolding.

Default scaffold (JavaScript):

bin/<tool-name>.js      shebang + import
src/cli.js              Commander setup, --json, --version, TTY detection
src/config.js           loadConfig() from env vars via dotenv
src/errors.js           AppError with code, message, suggestion, exitCode
src/commands/           one file per subcommand
tests/cli.test.js       smoke tests
.env.example            documented env var template
.gitignore              node_modules, .env, dist/, *.db
package.json            bin, type:module, engines, scripts, deps
AGENTS.md               stack, conventions, canonical commands
CLAUDE.md               behavioral snippet: when/how AI agents use each command
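
For orientation, src/errors.js might look roughly like the sketch below. The real template lives in references/scaffold-templates.md and may differ; the constructor signature and the default exit code of 1 are assumptions. It is shown without an export statement so it can be pasted into either module system.

```javascript
// Error type carrying the fields named in the scaffold listing:
// code, message, suggestion, exitCode.
class AppError extends Error {
  constructor(code, message, { suggestion = "", exitCode = 1 } = {}) {
    super(message);
    this.name = "AppError";
    this.code = code;
    this.suggestion = suggestion;
    this.exitCode = exitCode;
  }

  // JSON.stringify(err) yields the structured error shape used with --json.
  toJSON() {
    return { error: this.code, message: this.message, suggestion: this.suggestion };
  }
}
```

A top-level handler can then serialize the error with JSON.stringify when --json is active and set process.exitCode from err.exitCode.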

TypeScript additions (when TS is chosen):

tsconfig.json (strict, nodenext, es2022, outDir: dist)
bin/ imports from dist/ instead of src/
build and typecheck scripts added to package.json
typescript added to devDependencies
files field set to ["bin", "dist"]

Conditional additions:

src/db.js — when data storage round chose SQLite (better-sqlite3 singleton, WAL mode, foreign keys, initSchema())
cosmiconfig integration in src/config.js — when config file round chose yes
src/auth.js — when OAuth round confirmed (OAuth Authorization Code flow with local redirect server)
src/token-store.js — when OAuth round confirmed (token storage with optional AES-256-GCM encryption)
src/commands/auth.js — when OAuth round confirmed (auth login, auth status, auth logout subcommands)

Collision handling: Never overwrite without explicit user confirmation. Exceptions:

.gitignore — offer to merge missing lines
AGENTS.md and CLAUDE.md — offer to append rather than overwrite
Existing CLI path — skip scaffold entirely, but still generate CLAUDE.md and AGENTS.md updates

Step 5 — Write the goal trio

Generate three files at goals/<slug>/. Templates live in:

references/goal-template.md — full GOAL.md template (16 sections with CLI defaults and AI-agent patterns)
references/verify-template.md — full VERIFY.md template (host-native verification: binary smoke, functional, exit code, integration checks)
references/progress-template.md — initial PROGRESS.md skeleton

Read these reference files when generating. Substitute placeholders with everything confirmed in Step 3 and observed in Step 1.

Filling rules

Spec placeholders (e.g., {{Tool Name}}, {{slug}}, {{AC ID}}) — replace with concrete values from Step 3 answers and Step 1 probe. Never leave one in.
Audit placeholders in the VERIFY.md evidence format section and the PROGRESS.md final evidence block: leave them as empty cells for /goal to fill at completion time.
If a whole section doesn't apply (e.g., "Data / Migration Requirements" for a tool with no persistence), write Not applicable. plus a one-line reason rather than deleting the section — /goal's completion audit expects the section to exist.
For every AC, give it a stable ID like AC-001.1. The audit table in GOAL.md and the evidence table in PROGRESS.md reference these IDs.
Match the user's stated scope literally. If they said "read-only tool," omit --dry-run from the AI-agent patterns and note why.
Before writing, scan the generated text for any remaining {{ outside of code-fenced template-instruction comments. If you find any, you missed something — fix it.
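
The final scan for leftover placeholders could be done with a line-by-line pass that ignores fenced code, as sketched here. It treats any line starting with three backticks as a fence toggle and does not handle nested or tilde fences. The function name is hypothetical.

```javascript
const FENCE = "`".repeat(3); // the Markdown code-fence marker

// Returns every line outside fenced code that still contains "{{".
function findUnfilledPlaceholders(text) {
  const hits = [];
  let inFence = false;
  text.split("\n").forEach((line, index) => {
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence; // toggle on every opening or closing fence
      return;
    }
    if (!inFence && line.includes("{{")) {
      hits.push({ line: index + 1, text: line.trim() });
    }
  });
  return hits;
}
```

A non-empty result means a placeholder was missed and the file should not be written yet.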

Host execution rule

Every command in the goal trio runs directly on the host. There is no container layer. The binary is invoked as node bin/<name>.js (or just <name> if the package is linked). Tests run with npm test. No routing decisions needed.

Simplified sandbox rule

If the harness running /goal sandboxes shell access:

Local-only tools (no network, no external services) → no sandbox bypass needed.
API-calling tools → network access to target API hosts.
Mock-tested API tools → localhost binding on the test port.
OAuth tools → network access to the provider's authorization and token endpoints, plus localhost binding for the redirect server (default port 8910).

No Docker socket, no Playwright browser, no container filesystem concerns.

Test artifact convention

Everything lives on the host:

Test specs → tests/
Fixtures → tests/fixtures/
Goal verification scripts → goals/<goal-dir>/checks/
Test output artifacts → goals/<goal-dir>/test-artifacts/

Step 6 — Hand off

After writing the files, print:

Generated:
  goals/<slug>/GOAL.md
  goals/<slug>/VERIFY.md
  goals/<slug>/PROGRESS.md

To run with Codex /goal:
  /goal Complete goals/<slug>/GOAL.md. Use goals/<slug>/VERIFY.md as the verification contract. Update goals/<slug>/PROGRESS.md continuously. Treat uncertainty as incomplete.

How to run:
  1. Open Codex inside this repo:    codex
  2. (Optional) /plan Read goals/<slug>/GOAL.md and VERIFY.md and propose an implementation plan.
  3. Paste the /goal command above.
  4. Review changes via `git diff` before committing.

If scaffold files were created in Step 4, list them above the goal trio under Scaffolded:. If goals-plan.md was written in Step 2, append a Next slice: note.
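
Assembling the hand-off text from a slug is mechanical. A sketch follows, with hypothetical function names, that reproduces the Generated and /goal parts of the message above and places the optional Scaffolded: list first. It omits the "How to run" steps and the Next slice: note for brevity.

```javascript
function buildGoalCommand(slug) {
  const dir = `goals/${slug}`;
  return [
    `/goal Complete ${dir}/GOAL.md.`,
    `Use ${dir}/VERIFY.md as the verification contract.`,
    `Update ${dir}/PROGRESS.md continuously.`,
    "Treat uncertainty as incomplete.",
  ].join(" ");
}

function handoffMessage(slug, scaffolded = []) {
  const lines = [];
  if (scaffolded.length > 0) {
    // Scaffold files from Step 4 are listed above the goal trio.
    lines.push("Scaffolded:", ...scaffolded.map((file) => `  ${file}`), "");
  }
  lines.push("Generated:");
  for (const file of ["GOAL.md", "VERIFY.md", "PROGRESS.md"]) {
    lines.push(`  goals/${slug}/${file}`);
  }
  lines.push("", "To run with Codex /goal:", `  ${buildGoalCommand(slug)}`);
  return lines.join("\n");
}
```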

New CLI vs. existing CLI

Scenario	Detection	Behavior difference
New CLI	No `bin` field in package.json, no `src/cli.*`	Full scaffold offered, all questions asked
Existing CLI	`bin` field exists, entry points found	No scaffold, questions focus on the new command/feature, probe informs existing conventions to respect

For existing CLIs:

Respect existing arg parsing library (don't recommend commander if yargs is already in use)
Respect existing module system (don't ask ESM/CJS)
Respect existing test framework
Focus questions on the new command's scope, I/O contract, and acceptance criteria
Still generate the full goal trio, but GOAL.md "Source of Truth" references existing files

Stop conditions

Stop and ask the user instead of guessing if:

The spec implies an external API with credentials not described.
The spec implies a database schema but the data model isn't clear.
The slug or project folder collides with something in the repo.
More than 3 question rounds and confidence still under 95%.
The probe state contradicts the user's spec (e.g., "new CLI" but bin/ already exists).
The spec describes a daemon, server, or long-running process (not a CLI tool).
The spec requires platform-specific behavior (Windows services, macOS keychain).

What this skill does not do

It does not implement the CLI tool. That's /goal's job.
It does not run tests. That's /goal's job.
It does not work for non-Node.js projects (Python CLIs, Go CLIs, etc.).
It does not work for web apps, servers, or daemons.
It does not fix bugs. For bug fixes, use the bug-fix templates in codex-goal-templates.md directly.
It does not handle publishing readiness (npm publish config, GitHub releases, etc.).