name: consult
description: "Lightweight conversational design consultation via external AI reviewer(s) — for abstract design questions, architectural tradeoffs, and 'should I X or Y?' decisions before any artifact exists. Single-codex by default; add --panel for a 3-model (codex+grok+agy) parallel find-holes panel, or --panel --repo for informed multi-model review of a real codebase. Pick a single model (--codex/--grok/--agy), or add --web for opt-in deep web research (live community practice). Triggers on 'help me choose between', 'compare options', 'search the web', 'talk through this architecture', 'what tradeoffs am I missing', 'what am I overlooking', 'find the holes', 'sanity check', 'ask all three models', 'Помоги выбрать', 'обсудим архитектурное решение', 'какие тут компромиссы', 'спроси codex', 'спроси все три модели'. Do NOT use single-consult to review a file/diff on disk as the target — use bulldozer:check, or --panel --repo for a multi-model read of real code (a question that merely names a file as context is fine)."
argument-hint: "[design question] — or: [--codex|--grok|--agy|--panel] [--web[=models]] [--repo PATH] [--verdict] "
allowed-tools: ["Bash", "Read", "AskUserQuestion"]
Bulldozer Consult — Conversational Design Validation
Core principle: Send a design question as inline text to an isolated codex process, get a decisive verdict, iterate cheaply. No files, no state, no ledger.
This skill is the lightweight sibling of /bulldozer:check. check is for artifact-grounded adversarial review (files, specs, configs); consult is for abstract design Q&A before any artifact exists.
When to Use
- "Help me choose between X and Y" — architectural decisions
- "What tradeoffs am I missing in this approach?" — design exploration
- "Is this approach reasonable before I build it?" — pre-implementation sanity check
- "Should we use pattern A or B for this problem?" — pattern selection
- "Talk through this design with me" — second opinion on an idea
Do NOT use for:
- Reviewing a file/diff/artifact on disk as the review target → use
/bulldozer:check(a question that merely names a file as context is fine — see Step 2, Case B) - Quick factual questions with deterministic answers → ask directly without codex
- Code review where tests exist → run the tests
- Implementation details (variable naming, exact syntax) → consult is design-level
Step 1: Model Selection (every launch)
Same flow as /bulldozer:check. Read saved preference from .bulldozer/config.md (key: reviewer_model), show user 4 options via AskUserQuestion, save choice.
Selection rules (in order):
- ALWAYS include current global model from
~/.codex/config.toml - ALWAYS include last used model from
.bulldozer/config.md - Fill remaining slots from: gpt-5.5, gpt-5.3-codex-spark, gpt-5.4-mini
- Mark saved choice as "(Recommended)"
Save choice → use as -m <model> argument to codex.
Step 2: Artifact Reference Notice (soft warn, NOT a hard stop)
Before invoking codex, scan the user's question for artifact references. If found, classify intent — do not blanket-block. The old hard-stop never once fired in 209 production launches (#107); the real hallucination guard is the Step 3 prompt directive + Step 4 isolation, not this step.
Artifact patterns to detect (case-insensitive):
| Pattern | Example |
|---|---|
| Absolute or relative file path | /0/DEV/foo.md, src/bar.py, ./config.json |
| File extensions in prose | *.md, *.py, *.ts, *.swift, *.yml, *.sql |
| Repo-style references | "see specs/X", "in path Y", "at line N" |
| Artifact pointers | "attached", "this spec", "this code", "the diff", "the file", "@file" |
| Code identifiers | function names with parens, class.method, fenced code blocks with language tags |
On a match, classify by INTENT:
- Case A — artifact as review TARGET ("review
foo.py", "what's wrong with this code", "check this diff"): STOP, recommend/bulldozer:check <path>. consult can't read files, so a review of a file belongs incheck(or--panel --repofor a multi-model read).Вопрос просит ревью самого артефакта (
<excerpt>).consultне читает файлы — запусти/bulldozer:check <path>для file-based ревью (или--panel --repo <path>для multi-model чтения кода). - Case B — artifact as decisional CONTEXT ("should we refactor
foo.pybecause X?", "we have N call-sites inbar.py— pattern A or B?"): show a one-line notice, then PROCEED. The question is an abstract design tradeoff that merely names a file; codex answers from the supplied context and discloses its basis (verified: a clean verdict with"Basis: from the supplied context only; I did not inspect files", zero file hallucination).Вопрос упоминает артефакт (
<excerpt>) как контекст, не как цель ревью — продолжаю consult (codex отвечает по тексту, файлы не читает). Нужно ревью файла →/bulldozer:check <path>. - Ambiguous (can't tell target from context): ask the user with AskUserQuestion (rare path).
Why this is a soft warn, not a stop: consult runs codex in a fully isolated tmpdir with no project access (Step 4), and the Step 3 prompt directive ("Do not inspect files") already prevents grounded hallucination. The pre-flight is a routing aid for Case A, not a gate that blocks Case B.
Step 3: Wrap the User Prompt
Build the prompt with belt-and-suspenders skill suppression (SKIP SKILLS at both ends) and a structured verdict requirement:
SKIP SKILLS. Do not inspect files or run tools. Text-only consultation.
---
<user's design question, verbatim>
---
SKIP SKILLS. Give a decisive verdict. Under 200 words. End with one sentence
stating the basis or limits of this advice, then exactly one final standalone
line — one of:
VERDICT: GO
VERDICT: NO-GO
VERDICT: MINOR-FIXES
The required final line is the anchored VERDICT: <X> form the Step 5
classifier accepts — keeping the prompt and parser in lock-step (no drift).
The SKIP SKILLS. prefix is prompt-level suppression (weak on its own — codex may still load skills if the prompt mentions skill design). The flags in Step 4 give process-level enforcement (strong).
Step 4: Invoke Codex in Full Isolation
TMPDIR_RUN="/tmp/bulldozer-consult-$$"
mkdir -p "$TMPDIR_RUN"
SESSION="${CLAUDE_CODE_SESSION_ID:0:8}"
OUT="$TMPDIR_RUN/verdict-r${ROUND}.txt"
(
cd "$TMPDIR_RUN"
timeout 180s codex exec \
--skip-git-repo-check \
--ignore-user-config \
--ignore-rules \
--ephemeral \
-s read-only \
-c model_reasoning_effort=xhigh \
-m "$MODEL" \
"$WRAPPED_PROMPT" \
< /dev/null > "$OUT" 2>"$OUT.err"
)
EXIT_CODE=$?
Every flag is load-bearing (validated empirically — see "Why this isolation" below):
| Flag | Purpose | Without it |
|---|---|---|
--skip-git-repo-check |
Allow exec from non-git tmpdir | codex refuses to start outside trusted dir |
--ignore-user-config |
Skip ~/.codex/config.toml & user skills |
codex loads superpowers/skill-creator into reasoning |
--ignore-rules |
Skip AGENTS.md hierarchy | codex inherits project rules into context |
--ephemeral |
No rollout, not resumable | Session persisted (data retention risk) |
-s read-only |
Block all writes | codex may inadvertently modify files |
Empty $TMPDIR_RUN cwd |
No project file access | codex reads project files via tools |
timeout 180s |
Hard cap on runtime | Runaway xhigh reasoning hangs the skill |
< /dev/null |
Block stdin re-auth prompt | codex hangs waiting for input |
2>"$OUT.err" |
Split stderr to a side file (NOT 2>&1) |
Chatty errors land in $OUT and get misread as a substantive prose answer → false INCONCLUSIVE instead of fail-closed NO-GO |
gtimeout (GNU coreutils on macOS) is an acceptable fallback if plain timeout is missing.
Check exit code:
0→ proceed to Step 5124→ timeout hit; tell user "codex exceeded 180s, try a shorter prompt or lower reasoning effort"- Other non-zero → read last 20 lines of
$OUT, report to user, do NOT silently retry
Step 5: Parse the Verdict (Fail-Closed)
Apply the verdict classifier — the §3.7 logic shared with scripts/consult_panel.py::classify_verdict. The old sed + loose \bGO\b matching is gone (it misfired on prose like "a go-to pattern"):
- Only an anchored standalone line matching
^\s*VERDICT:\s*(GO|NO-GO|MINOR-FIXES)\s*$(case-insensitive) counts. Incidental prose is ignored. - Multiple anchored lines → the FINAL one wins (the model's conclusion beats an earlier echoed option).
- No anchored line → strip the CLI banner/footer, then:
- a substantive remaining line (≥3 words) → INCONCLUSIVE — show the prose and prompt the user to re-ask for a crisp verdict. INCONCLUSIVE does not count toward the Step 7 escalation trigger.
- empty / whitespace / banner-only → NO-GO (fail-closed for real failures).
Run the classifier directly rather than matching by hand:
VERDICT=$(python3 - "$OUT" <<'PY'
import os, sys, glob
# Resolve the consult scripts dir WITHOUT relying on $CLAUDE_PLUGIN_ROOT — it is NOT exported
# to the Bash tool's environment (empirically empty, CC 2.1.185 — #221). Prefer it if set
# (future-proof), else the newest installed plugin-cache copy.
cands = ([os.path.join(os.environ["CLAUDE_PLUGIN_ROOT"], "skills/consult/scripts")]
if os.environ.get("CLAUDE_PLUGIN_ROOT") else [])
cands += sorted(glob.glob(os.path.expanduser(
"~/.claude/plugins/cache/*/bulldozer/*/skills/consult/scripts")),
key=os.path.getmtime, reverse=True)
scripts = next((d for d in cands if os.path.isdir(d)), None)
if not scripts:
sys.exit("consult scripts dir not found (#221 resolver)")
sys.path.insert(0, scripts)
from consult_panel import classify_verdict
print(classify_verdict(open(sys.argv[1]).read()))
PY
)
Do NOT use os.environ["CLAUDE_PLUGIN_ROOT"] here — that var is resolved only in plugin manifests / markdown substitution, NOT exported to the Bash tool's shell (empirically empty in CC 2.1.185, KeyError — #221). The plugin code lives in ~/.claude/plugins/cache/… and the Bash tool's cwd is the consumer project, so the snippet self-resolves the scripts dir: it honors $CLAUDE_PLUGIN_ROOT if ever set, else falls back to the newest plugin-cache copy (the abs path the skill header also prints as "Base directory for this skill"). A bare relative skills/consult/scripts would not resolve.
Why fail-closed: a missing/malformed verdict signals codex didn't follow instructions — silent GO would let bad advice through. INCONCLUSIVE is the middle path so a substantive prose answer that merely lacks the token is no longer a false NO-GO. Capture codex to $OUT with split streams (> "$OUT" 2>"$OUT.err") so a chatty error on stderr can't be misread as a substantive prose answer — a real failure leaves $OUT empty/banner-only → fail-closed NO-GO.
Step 6: Report to User
Show the cleaned verdict text (not the raw $OUT — too noisy). Format:
**Verdict: <GO|NO-GO|MINOR-FIXES|INCONCLUSIVE>** (round <N>, <time>s, <tokens> tokens, model <M>)
<verdict body>
Then prompt for next action:
- GO → "Готово. Применять рекомендацию?"
- MINOR-FIXES → "Учесть фиксы и переспросить? (новый round)"
- NO-GO → "Переформулировать или эскалировать на
/bulldozer:check?" - INCONCLUSIVE → codex gave a substantive prose answer but no crisp verdict line. Show the prose and ask: "Переспросить с явным требованием вердикта (
VERDICT: ...)?" This is a re-ask, not a NO-GO, and per Step 7 it does not count toward the escalation trigger — it also does not occupy a slot in thelast_two_verdictswindow (skip it).
Step 7: Multi-Round Escalation Rule
consult supports multiple rounds in the same Claude session — the user adjusts the question text and we re-invoke codex. Each round is independent (stateless by design).
Track round state in this session's conversation context (NOT in any file):
round_count— how many codex invocations so farlast_two_verdicts— sliding window
Escalation trigger: if round_count >= 3 AND both last_two_verdicts == "NO-GO":
Третий раунд consult, два последних вердикта — NO-GO. Это означает, что вопрос вышел за рамки lightweight консультации. Рекомендую переключиться на
/bulldozer:checkс конкретным артефактом (создай design doc файл, потом проверь его). Продолжить consult всё равно?
User can choose to continue or escalate. We do NOT auto-invoke check.
Step 8: Cleanup
After every invocation (success or failure):
rm -rf "$TMPDIR_RUN"
Tmpdir is per-PID so concurrent invocations don't collide. No state survives between invocations. No logs with prompt content (by design — see "What we don't do" below).
Panel Mode (--panel) — Multi-Model Find-Holes
Opt-in: run three models (codex + grok + agy) in parallel instead of one codex, for diversity on hard questions. Orchestrator: scripts/consult_panel.py (Steps 1–8 above are the single-codex default; panel is a separate entrypoint). agy = Antigravity CLI (Gemini models); it replaced the retired gemini CLI (#189).
When panel beats single-consult:
- "What am I overlooking / what are the holes here?" — find-holes diversity (~50% of findings are unique to one model; verdict diversity is ≈0, so don't use panel just for a GO/NO-GO).
- A high-stakes design call where one reviewer's blind spot is costly. Cost: 4 model calls, ~17–60s typical. Worst case approaches 2×
--timeout(default 180s): the summarizer runs serially after the parallel triad, so a slow-but-not-failed model plus a slow merge stack up.
Invocation:
# Resolve the plugin dir WITHOUT $CLAUDE_PLUGIN_ROOT (NOT exported to the Bash tool — #221):
# honor it if set, else the newest plugin-cache copy.
BULLDOZER_DIR=$( { [ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && [ -d "$CLAUDE_PLUGIN_ROOT/skills/consult" ] \
&& printf '%s\n' "$CLAUDE_PLUGIN_ROOT"; } || ls -dt ~/.claude/plugins/cache/*/bulldozer/*/ 2>/dev/null | head -1 )
PANEL="$BULLDOZER_DIR/skills/consult/scripts/consult_panel.py"
# isolated find-holes (abstract question, no file access) — all three, panel default
python3 "$PANEL" "<question>"
# pick specific models (any combination); bare panel-script call = all three
python3 "$PANEL" --grok "<question>" # one model alone
python3 "$PANEL" --codex --grok "<question>" # a subset
# informed find-holes (models READ the real repo) — for questions ABOUT a codebase
python3 "$PANEL" --repo <path> "<question>"
# verdict mode (per-model GO/NO-GO, no merge) — add --verdict (works with --repo too)
python3 "$PANEL" [--repo <path>] --verdict "<question>"
# --web: opt-in DEEP web research (web search + subagents). ALWAYS use the =form so it
# can't eat the question; blanket = all selected, scoped = a comma-list.
python3 "$PANEL" --grok --web=grok "<question>" # web on grok only
python3 "$PANEL" --panel --web=codex,grok "<question>" # web on codex+grok; agy isolated
The BULLDOZER_DIR resolver above is required because $CLAUDE_PLUGIN_ROOT is not exported to the Bash tool (#221) and the script lives in the plugin cache (~/.claude/plugins/cache/…), not the consumer project the Bash tool runs from.
Output: a merged ## SHARED / ## UNIQUE synthesis (find-holes) or a per-model verdict line (verdict), with raw per-model blocks below. Exits non-zero iff every model failed; one model failing degrades to a [<model>: failed — …] block and the panel continues with survivors.
--repo is the deliberate exception to Step 2's artifact ban. When the question is about real code, --repo grants the three models read access to that repo (informed mode). Split-test (2026-06-02): informed ≫ isolated for repo-specific questions, tie for abstract — so informed is opt-in (your code goes to the cloud reviewers only when you ask). Single-consult and isolated --panel stay file-free.
Isolation: all three run on the REAL HOME now — none gets a HOME sandbox. codex isolates via flags (--ephemeral etc.); grok via --no-memory/--no-subagents (a HOME-sandbox broke grok's --repo tool-worker auth — it cancelled on every informed run, #147); agy's auth is macOS-Keychain-bound + OAuth-only, so there is no copyable token to seed a sandbox HOME and a fake HOME triggers re-auth (#189); non-interactive -p (stdin /dev/null) onboards an unknown repo with no OAuth prompt and starts no MCP server. Read-only enforcement (#189): agy --print AUTO-ACCEPTS every tool call — verified that NO flag/config disables it (not dropping --dangerously-skip-permissions, not --sandbox, not tools.autoAccept:false, not deny lists, not per-project settings; print mode can't prompt so it just executes). The deterministic gate is a fail-closed PreToolUse hook: the agy leg runs with cwd = a temp dir seeded with .agents/hooks.json whose hook ALLOWS only an EXACT set of known read tools (read_file/view_file/list_dir/grep_search/… — view_file+list_dir empirically confirmed) and DENIES everything else before it runs — any unlisted/mutating/command-exec tool, and any malformed input, returns {"decision":"deny"}. An exact allowlist fails closed; the earlier substring blocklist let save_memory/shell/exec/save_file and malformed stdin through (#189 code-review). The repo is reached ONLY via --add-dir (read), never as cwd — so a denied write can't touch it. Reads/search are allowed, so the review still works. grok/agy need their own logins.
agy notes (#189): agy prints plain text (no JSON envelope), so the parser is parse_codex. It persists each -p call's prompt+response in a per-call session dir ~/.gemini/antigravity-cli/brain/<conversationId>/ (plaintext transcripts — verified by a unique-marker probe) plus a conversations/<id>.db. For statelessness the run injects a unique NONCE into agy's prompt (which agy logs into its transcript); _run_one snapshots the brain id-set BEFORE the run and afterward deletes only NEW UUID-named dirs whose transcript carries the nonce — plus conversations/<that-id>.db*. A CONCURRENT agy session (e.g. the user's visual/IDE Antigravity app running at the same time) creates its own brain/<uuid> WITHOUT our nonce, so it is never swept — that is what makes the diff visual-safe (verified live: a single panel run while the GUI created 2 new dirs left all 40+ untouched). (Earlier id-via-hook capture failed in the panel-default isolated mode — agy makes ZERO tool calls there, so the hook never fired and the transcript LEAKED, #189 code-review; the nonce works in both modes. The id is validated as a full UUID and the glob anchored to <id>.db*, so a prefix-sibling conversation is never swept.) Note: the panel does NOT touch ~/.gemini/.../cache/projects.json, which accumulates a harmless stale entry per run for the agy leg's temp cwd. Override the model with BULLDOZER_AGY_MODEL (an agy models label) if the default Gemini 3.1 Pro (High) is renamed. Prompt framing matters: the informed find-holes wrapper uses BEHAVIORAL wording ("where the code could behave incorrectly, surprisingly, or not as a caller expects"), NOT "holes/bugs/vulnerabilities/security" — the latter trips Gemini's safety refusal on security-flavoured code (proven on an auth file: trigger-word framing → refused ×2; behavioral framing → full review naming the same issues). The no-write_file clause is kept (agy, like the old gemini CLI, can otherwise save findings to a file and return empty). The retired gemini CLI returned empty response on informed --repo runs (agentic plan-mode) and was deprecated 2026-06-15; agy replaced it. Display label in the panel output stays "Gemini" (same models). Design: docs/superpowers/specs/2026-06-02-consult-panel-design.md.
Web lane (--web) — opt-in deep web research
--web lets the selected models do deep web research (web search + their own subagents), grounding the answer in CURRENT real-world practice — for questions that need live community practices, not just training knowledge. Per-model: --web=grok,agy (scoped) or bare --web (blanket = all selected). Always emit the = form (or put a bare --web LAST): argparse otherwise binds the question to --web and errors (empirically confirmed). Per-model effect: codex -c web_search="live" (canonical key); grok drops --no-subagents/--disable-web-search but KEEPS read-only --permission-mode plan (verified: still spawns parallel subagents); agy's read-only hook ALLOW gains search_web/read_url_content.
READ-side only, by design. --web enables web fetch/search + subagents (reading), NEVER write/shell. So --web --repo lets a model read your real code AND reach the web — it reverses the #189 no-egress guarantee (opt-in, like --repo: code/design can egress to the web), but it can NOT mutate the repo. Default (no --web) stays fully isolated / no-egress.
Output & persistence: each web model's raw research (large, sometimes garbled by parallel subagents) is pre-compressed to a findings+URL digest before the merge (inline shows the digest); the full raw is saved to .bulldozer/consult-<ts>/ (self-ignoring .gitignore; research.md + raw-<model>.md; keep-last-10) for drill-down. The default --timeout rises to 600 s under --web (research runs ~3 min).
Logging
Two kinds of line land in ~/.claude/hooks/bulldozer-consult.log, told apart by their fields:
1. Start-marker — the UserPromptSubmit hook (hooks/hooks.json) writes one lean line the moment /bulldozer:consult is typed. It records only that an invocation started — nothing has run yet, so it carries NO verdict/tokens/model:
<ISO8601-ts> | event=consult-invoke | project=<P>
(#107 lesson: the marker used to carry always-empty verdict= / tokens=0 / model=; those were dropped — empty fields read as missing completion data, not telemetry.)
2. Completion line — one line per COMPLETED invocation, when the outcome is known. The panel path (--panel / per-model --codex/--grok/--agy / --web) writes it deterministically from consult_panel.py::_log_completion; the inline single-codex flow writes it from Step 6 below. Two shapes, both metadata-only:
Inline single-codex — model= (one model), written by Step 6. Strict 8 fields, this order:
<ISO8601-ts> | session=<S> | round=<N> | verdict=<V> | tokens=<T> | time=<X>s | model=<M> | project=<P>
2026-05-25T03:15:00+03:00 | session=f7186873 | round=1 | verdict=GO | tokens=4500 | time=4.3s | model=gpt-5.5 | project=/path/to/repo
Panel / per-model / --web — models= (comma list) + web=, written by the script:
<ISO8601-ts> | session=<S> | round=1 | verdict=<V> | tokens=NA | time=<X>s | models=<m,…> | web=<m,…> | project=<P>
2026-06-22T00:30:08+07:00 | session=NA | round=1 | verdict=find-holes | tokens=NA | time=12.4s | models=codex,grok | web=codex | project=/path/to/repo
| Field | Value — and the ONLY accepted "no data" form |
|---|---|
session |
${CLAUDE_CODE_SESSION_ID:0:8}; NA if unset — never the project name |
round |
integer ≥ 1 (panel is single-shot → always 1) |
verdict |
inline: GO / NO-GO / MINOR-FIXES / INCONCLUSIVE. Panel: find-holes (no GO/NO-GO) · the collapsed per-model verdict or mixed (disagreement) in --verdict mode · ERROR (all models failed). Never TBD or empty. |
tokens |
inline: integer, NA if codex didn't report it. Panel: always NA (no per-model token capture yet). One sentinel, not N/A/na/-/~/``/0. |
time |
seconds, one decimal, s suffix (e.g. 4.3s) |
model / models |
inline: the -m model id. Panel: comma-joined selected models (plus web= = the subset doing web research, empty when none) |
project |
git rev-parse --show-toplevel 2>/dev/null || pwd |
What we do NOT log: the prompt content, the verdict body, any user-supplied text. Only metadata. This is a deliberate privacy property — see "What we don't do".
What We Don't Do (and Why)
These are not oversights — they are validated design choices:
| Anti-feature | Why we don't have it |
|---|---|
Persistent mode (codex exec resume) |
3 of 4 dogfood runs voted REMOVE: data retention, stale context, 2× implementation surface, broken reproducibility. Realistic continuity scenarios are addressable by user copying prior verdict text into new prompt. |
| Session log with prompt content | Raw prompt snippets leak sensitive design text. Metadata-only log preserves observability without retention risk. |
Auto-invoke /bulldozer:check on escalation |
Escalation is advisory only — user decides. Silent skill chaining hides decisions. |
| File reading / project access | Process-level isolation (empty tmpdir + --ignore-user-config --ignore-rules) is the security boundary. Prompt-level "SKIP SKILLS." alone is theatrical. |
| Counting blockers in verdict prose | Empirically unparseable from short codex output (≤200 words). We use round count + NO-GO repetition instead — simple string match, reliable. |
Custom config file in .bulldozer/consult.md |
YAGNI. Model preference is shared with check via .bulldozer/config.md (single key: reviewer_model). Adding consult-specific config drifts both. |
| A wrapper script for the SINGLE-codex flow | The single-consult bash flow fits inline (no shared state, no complex logic). The --panel mode is the deliberate exception — it DOES ship scripts/consult_panel.py because 3 parallel models + per-model isolation + merge is genuinely complex logic that does not fit inline. |
Common Mistakes
| Mistake | Fix |
|---|---|
Running codex from project root (-C $PROJECT_ROOT) |
Use empty tmpdir cwd — codex with project access reads files and hallucinates with false grounding |
Skipping --ignore-user-config because --ephemeral was set |
--ephemeral blocks rollout, NOT skill loading. Codex will still load user skills if the prompt mentions skill design. Use both. |
| Trusting "SKIP SKILLS." prefix alone | Empirically observed: codex loaded skill-creator anyway when prompt mentioned "skill design". Always combine prompt-level + process-level. |
| Hard-blocking every file-path mention | Step 2 is a soft warn: route Case A (artifact = review target) to check; PROCEED on Case B (artifact named as context). Blanket blocks are false positives — the hard-stop never fired in 209 prod launches (#107). |
| Treating missing verdict as silent success | Fail-closed: no anchored VERDICT: line AND no substantive prose → NO-GO. Substantive prose without the token → INCONCLUSIVE (re-ask), not a false NO-GO. |
Parsing full output for verdict by hand |
Use classify_verdict (§3.7): anchored VERDICT: line only, final one wins, banner stripped. The old codex↔tokens used sed misfired on prose like "a go-to pattern". |
| Running multi-round in background | FOREGROUND ONLY. User explicitly invokes each round. No automation, no cron. |
Red Flags — STOP and Reassess
- Codex returns GO on first round with zero specifics → likely didn't engage; re-prompt with clearer scope
- Same NO-GO content repeats across rounds → the question may need a
checkartifact (escalation rule fires) - Verdict references files you didn't mention → codex hallucinating; verify isolation flags are actually applied
- Tmpdir wasn't cleaned (find
/tmp/bulldozer-consult-*from a prior run) → previous invocation crashed before cleanup; safe torm -rfbut investigate why - User keeps adding artifact references to dodge pre-flight → the right answer is
check, not a more clever consult
Quick Reference — Full Invocation Template
# 1. Pre-flight (Claude-side regex on $ARGUMENTS)
# 2. Model selection (AskUserQuestion + .bulldozer/config.md)
# 3. Wrap prompt
WRAPPED_PROMPT=$(printf 'SKIP SKILLS. Do not inspect files or run tools. Text-only consultation.\n---\n%s\n---\nSKIP SKILLS. Give a decisive verdict. Under 200 words. End with one sentence stating the basis or limits of this advice, then exactly one final standalone line — one of:\nVERDICT: GO\nVERDICT: NO-GO\nVERDICT: MINOR-FIXES\n' "$USER_PROMPT")
# 4. Isolated invocation
TMPDIR_RUN="/tmp/bulldozer-consult-$$"
mkdir -p "$TMPDIR_RUN"
(
cd "$TMPDIR_RUN"
timeout 180s codex exec \
--skip-git-repo-check \
--ignore-user-config \
--ignore-rules \
--ephemeral \
-s read-only \
-c model_reasoning_effort=xhigh \
-m "$MODEL" \
"$WRAPPED_PROMPT" \
< /dev/null > verdict.txt 2>verdict.err # split streams: stderr noise out of the verdict file
)
EXIT=$?
# 5. Parse verdict (fail-closed) — §3.7 classifier, not sed/loose-regex
# (self-resolves the scripts dir — $CLAUDE_PLUGIN_ROOT is NOT in the Bash env, #221)
VERDICT=$(python3 - "$TMPDIR_RUN/verdict.txt" <<'PY'
import os, sys, glob
cands = ([os.path.join(os.environ["CLAUDE_PLUGIN_ROOT"], "skills/consult/scripts")]
if os.environ.get("CLAUDE_PLUGIN_ROOT") else [])
cands += sorted(glob.glob(os.path.expanduser(
"~/.claude/plugins/cache/*/bulldozer/*/skills/consult/scripts")),
key=os.path.getmtime, reverse=True)
scripts = next((d for d in cands if os.path.isdir(d)), None)
if not scripts:
sys.exit("consult scripts dir not found (#221 resolver)")
sys.path.insert(0, scripts)
from consult_panel import classify_verdict
print(classify_verdict(open(sys.argv[1]).read()))
PY
)
# 6. Log metadata, cleanup
S="${CLAUDE_CODE_SESSION_ID:0:8}"; S="${S:-NA}" # session id, never the project name (#107 schema)
T="${TOKENS:-NA}"; [ "$T" = "0" ] && T="NA" # 0 is the no-data stub the schema forbids, not a count (#107)
echo "$(date -Iseconds) | session=$S | round=$ROUND | verdict=$VERDICT | tokens=$T | time=${ELAPSED}s | model=$MODEL | project=$(git rev-parse --show-toplevel 2>/dev/null || pwd)" >> ~/.claude/hooks/bulldozer-consult.log
rm -rf "$TMPDIR_RUN"
Why This Isolation (Empirical Basis)
Each design choice traces back to a measured failure mode. Dogfooded against codex over 11 independent runs before shipping (see PR description for full reproducibility trail).
| Decision | Measurement |
|---|---|
| Process-level isolation vs prompt-level | Without --ignore-user-config: 43s, 51K tokens, 1900 lines of skill-loading noise. With all 5 flags: 4s, 4.5K tokens, 0 noise. ~10× faster, ~12× cheaper. |
| Stateless only | 3 of 4 cross-framing dogfood runs (neutral A2 + adversarial A3 + adversarial dogfood-2) independently voted REMOVE persistent. |
| Artifact pre-flight | Soft routing aid, NOT a kill-switch (#107): the hard-stop framing never fired in 209 prod launches — Case A (review target) routes to check, Case B (context ref) proceeds. The real hallucination guard is the Step 3 directive + Step 4 isolation. |
| Fail-closed verdict parsing | Empirical: codex output noise can suppress GO matches; silent default to NO-GO forces user re-prompt rather than acting on absent advice. |
| Escalation rule (round≥3 + 2× NO-GO) | Alternative rule "count blockers ≥ 5" empirically unparseable from short prose (verified directly via B2). |
Integration with Other Skills
/bulldozer:check— escalation target when consult is too lightweight (artifact exists or stuck after 3 rounds)/receiving-code-review— discipline for evaluating codex's verdict honestly before applying it to the design/brainstorming— runs BEFORE consult to shape the question; consult validates the resulting design idea
Feedback
If you encounter friction while using this skill — documentation mismatch, missing capability, unclear error, or need a workaround — create a GitHub issue.
Create issue when:
- SKILL.md describes behavior X, reality is Y
- Had to use a workaround instead of the standard path
- Need a feature that doesn't exist
- Script failed with unhelpful error
- Codex returned consistently unusable output for a valid design question
- Pre-flight artifact detection misfired (false positive or false negative)
Do NOT create issue when: own mistake in arguments, external problem (Codex CLI not installed, network down), or behavior documented as a known limitation.
Command:
gh issue create --repo A3IO/jaine-plugins \
--label "feedback,bulldozer,consult" \
--title "[feedback/consult] short description" \
--body "$(cat <<ISSUE
## What I was doing
{task description}
## What I expected
{expected behavior}
## What happened
{actual behavior, errors}
## Workaround used
{what was done instead, or "none — blocked"}
## Environment
- Plugin version: $(jq -r .version "$(ls -dt ~/.claude/plugins/cache/*/bulldozer/*/.claude-plugin/plugin.json 2>/dev/null | head -1)" 2>/dev/null || echo unknown)
- Skill: consult
- Project: $(pwd)
ISSUE
)"
After creating the issue, tell the user:
"Я создал feedback issue про consult: {URL}. Продолжить с workaround или сначала пофиксим?"