slice - SKILL.md Agent Skill

name: slice description: Trust-first recursive decomposition of a development task — baseline → vertical slices → Canon-TDD execution → adversarial verification → per-leaf commits → deterministic integration + owner's briefing. Use whenever a coding task is big, risky, multi-file, or vague enough that "just doing it" could silently break things — "implement X across the app", "refactor this subsystem", "migrate Y", "fix this and also clean up around it" — and whenever the user asks to decompose work, wants TDD discipline enforced, or wants changes they can verify and trust rather than take on faith. Works on any language/stack (the baseline step learns the repo's own build/test physics). Not for a single-file fix already diagnosed to a line — do that directly. Args = the task description (optionally with a repo path).

/slice — recursive, trust-first decomposition

You are the front door to the recursive-slice workflow engine (bundled as recursive-slice.js in this skill's base directory). It encodes Kent Beck's Mastering Programming heuristics (Slicing, One-thing-at-a-time, Baseline Measurement, Concrete hypotheses) in service of the Trust Factory objective: every step leaves verifiable evidence and avoids silent surprise.

Trust Factory sets the ceremony — read first

Priority: Trust Factory first, then TDD + Tidy First as its mechanisms — never the reverse. The test is never "did I run the full ceremony?" but "did I manufacture the trust this change actually lacked, and no more?" Trust you already hold is free — a clean compile, a green filtered test, a diff you can read whole and revert in one commit. Spend the costly ceremony (executor≠verifier, adversarial verify, baseline/slice/integrate rounds, parallel worktrees) only on the trust deficit. Uniform max-ceremony lowers trust-per-hour; a 10-minute change that takes an hour is a tier error, not rigor.

Two reflexes before the tiers (reflexes, not research projects): (1) Scope — the cheapest, highest-trust change is the one you don't make: does it need to exist (YAGNI)? stdlib / native / one line before fifty? (the scope floor — details in code-fundamentals). (2) Deliberation — don't spend longer deciding what not to build than building it would take; if your explanation is longer than the change, cut the explanation (prose defending a simplification is complexity smuggled back in). A care-process that ruminates is slower for zero trust gain.

Ceremony ladder — pick the LOWEST tier that still guarantees the baseline floor:

T0 deterministic — a compiler / type-check / filtered test already proves it → just run it. No agent, no engine.
T1 legible — one change you can read whole + a filtered test → do it inline yourself: failing test → fix → filtered suite → one commit. Two hats (behavior, then structure), one head. Default for simple work — most tasks exit here, no engine.
T2 manufactured — ≥2 independent risky leaves, an unknown interface/API, cross-cutting plumbing, a security gate, or an irreversible seam → the engine earns its 100k–700k tokens: executor≠verifier, adversarial verify, and independent leaves run in parallel (worktree groups + coordinator) by default — serial buys no extra trust, only wall-clock.

This T0/T1/T2 ladder is the front-door routing decision (do nothing / inline yourself / launch the engine). It is ORTHOGONAL to the engine's internal riskTier: light/standard/heavy, which — once you are inside a T2 run — sets how hard each leaf is verified. Don't conflate them: a T2 lane still has light, standard, and heavy leaves within it.

The floor (baseline invariants, per-leaf reversibility, evidence) is non-negotiable at every tier; only the ceremony above it scales with risk. Step 2 below is the per-group mechanics.

What to do

Resolve the task and repo.
- The task = the user's $ARGUMENTS (or, if empty, the work currently under discussion).
- The repo = the relevant project root. If ambiguous, infer from the conversation/cwd; ask only if you genuinely cannot tell.
Don't loop what doesn't need a loop (judgment stays with you, not the engine). A diagnosed, single-file, single-leaf fix (you already have file:line + the failing behavior) is cheaper and safer done DIRECTLY in-conversation with the same discipline inline: failing test first → fix → filtered suite → full gate → one commit. Reserve the engine for work with ≥2 genuinely separate leaves, an unknown decomposition, or a risky seam needing adversarial verification. A full lane spends 100k-700k tokens on ceremony that a 10-line wiring fix does not need. This rule applies PER GROUP, not per task. When the spec already carries the decomposition with file:line evidence and KNOWN remedies (typical after an audit), do those groups YOURSELF inline — same discipline, no agent ceremony: failing test → known fix → filtered suite → one commit per fix — and lane ONLY the genuinely risky seams (unknown APIs, cross-cutting plumbing, security gates). Measured on a live remediation lane: a known-fix leaf costs ~~5 agents (~~25min) through the engine vs ~10min inline; running audit-prescribed groups through the full engine roughly DOUBLED the lane's wall-clock for no trust gain (the audit was the verification). Mixed task → split it: inline the known, lane the risky.
Write the task as a LANE SPEC, not a wish. Runs given precise specs went 11/11–12/12 trusted; vague specs produced verifier rejections and repair loops. Include, in the task string:
- Evidence: the defect/feature anchored to file:line + the observed symptom (paste the line of code if you've already diagnosed it). Diagnose BEFORE launching, not inside the run.
- MUST PRESERVE: behaviors the run may not regress (the baseline protects tests; this protects un-tested behaviors you know matter).
- Purpose: what the USER should observe afterward — this feeds the purposeGap check.
- Wiring clause (UI/feature lanes): name the REAL production path the change must be reachable from ("wire into the screen/endpoint the app actually serves, not a legacy variant") — the #1 recurring defect class is built-tested-but-unwired.
- Known flakes: name them (e.g. "exit 137 = watchdog timeout, not a failure"). The engine already auto-retries the integrate gate once on exit 137.

Right-size the call. Tune the workflow args before launching:

maxDepth: 2 for a contained task, 3–4 for a genuinely large one. Keep it small — the floor is the anti-explosion guard.
parallel — ON by default (the engine's native disposition: parallelize the independent). You rarely pass it; pass parallel: false only to FORCE sequential. The partition ENGINEERS independence (file-disjoint cores in parallel; shared-file wiring as a final sequential group), and light overlap is fine — the Coordinator merges branches and resolves conflicts honoring both sides, with the deterministic full-suite net behind it. The default is cheap because the engine auto-falls back to sequential whenever parallel can't help — no git, dirty main tree, compile-bound builds (unless sharedScratch/forceParallel lifts it), or <2 independent groups. So single-seam work runs sequentially on its own; you don't opt out.
sharedScratch — now AUTO-ON for compile-bound repos (the engine enables it whenever the Baseliner reports coldBuildCost: "expensive"; you no longer pass it). It lifts the compile-bound fallback PROPERLY: all worktrees share one build dir (--scratch-path), deps compile once, builds serialize on its lock (measured: 3×cold ≈ 9-15min → serialized-warm ≈ 1-2min). This kills the recurring drift of forgetting it on Swift/Rust/CMake lanes and silently crawling. Pass sharedScratch: false ONLY to force the old per-worktree-cold behavior (rarely what you want).
confirmTier — the engine stops before any leaf runs when it detects an over-tier launch: a compile-bound repo whose plan is ≤3 slices ALL judged low-risk (light) — i.e. inline-T1-shaped work the engine would only make slower (the 5.8h-run lesson). The stop is a clean half-nothing (zero leaves, commits, worktrees; lock released) and names the escape. Pass confirmTier: true to override and force the engine anyway. A single non-light slice OR any completeness-critic expansion bypasses the gate — so it fires only on the narrowest, explicitly-lowest-risk run, and trust stays exactly where it is fragile. (Verifier speed: on compile-bound repos the verifier no longer adds a redundant full build when the engine already ran a deterministic build+filtered-test gate.)

skills: [paths] — domain-guidance guide files threaded into every executor/verifier (executors apply them, verifiers enforce them; repo conventions win on conflict). AUTO-SELECT these yourself — selection is part of right-sizing the call, never something the user must ask for. Resolve each name to its SKILL.md absolute path, checking in order: project .agents/skills/<name>/, user ~/.agents/skills/<name>/, plugin cache ~/.claude/plugins/cache/*/*/<sha>/skills/<name>/ (latest sha), then any local skill-library index. Selection table (COMPOSE matching rows; cap ~4 entries):

lane touches	add
any substantial code (default)	`code-fundamentals`
shipping code with no/weak test rig (the testing-readiness gate's fix path)	`test-foundations`
fuzzy / ambiguous spec or exploratory entry point	`spec-first`
React/Next.js UI (.tsx/.jsx, components/hooks/routes)	`toss-frontend-fundamentals` + `vercel-react-best-practices` (if-installed) + `vercel-composition-patterns` (if-installed)
build config / deps / workspace / codemod migration	`build-config-drift`
bug-hunt / regression / rootcause lane	`issue-rootcause-workflow`

The test-foundations row is the named REMEDY the engine's testing-readiness gate points at: when the Baseliner reports rigPresent:false the run halts before any work, and scaffolding a scripts/verify.sh (or a real test command) via test-foundations is the zero-flag fix that flips the gate green on re-run (the confirmNoRig:true arg is the explicit-bypass alternative). The vercel-* guides are (if-installed): resolve them only if present in the skill-library search path (see install note below); a missing guide is silently dropped, never a blocker. A lane in a domain with no matching guide gets none. Every entry taxes every leaf's attention, but the engine's per-leaf RELEVANCE GATE (leaves skip guides whose domain doesn't match their contract) bounds the cost for mixed lanes — so a full-stack lane may safely carry both backend and frontend guides.

Pass repo as an absolute path so the agents (whose cwd may differ) find it.

Check the runway, then launch via the Workflow tool, pointing at the bundled engine: Workflow({ scriptPath: '<skill-base-dir>/recursive-slice.js', args: { task, repo, maxDepth, parallel } }) (or copy recursive-slice.js to ~/.claude/workflows/ once and use { name: 'recursive-slice' }) It runs in the background; you'll be notified on completion. Tell the user they can watch live progress with /workflows.
- QUOTE AN ETA BEFORE YOU LAUNCH (delivery-predictability is a reliability axis). A correct result that lands at an unpredictable hour erodes trust as much as a wrong one — "a slow tool trains its owner to distrust it" applies to납기 too. Before launching, state a rough wall-clock estimate to the user, derived from the shape: ~(leaf count) × (per-leaf build+test cost), with compile-bound repos dominated by serial rebuilds. If that number is uncomfortable for a task you diagnosed to file:line, that is the signal you mis-tiered — drop the low-risk leaves to inline T1 (step 2) and reserve the engine for the genuinely risky seams. Never let "long" be a surprise. BEFORE launching, enforce the operational rules:
- ONE workflow per WORKING TREE, always (rs-lock enforces it — two runs mutating one tree corrupt each other; correctness, not cost). ACROSS repos, concurrent workflows are fine when the owner accepts the burn rate — they share one API quota, and three at once once starved every agent into stall-kills, so check what else is consuming quota before stacking. Treat "its git deposits landed" as proof of NOTHING — a run can stall-retry for hours after its last commit.
- Quiesced, clean tree: nobody (human or other run) edits the target tree while a run is live. The engine refuses a tree another run holds (lock in the tree's gitdir, rs-lock); if a run crashed and left a stale lock, confirm no workflow task is alive, then remove it.
- Crash recovery: the orchestrator dying loses NO work (per-leaf commits survive). git stash any unverified in-flight edits (never trust uncommitted debris), then relaunch with Workflow({ scriptPath: <persisted script>, resumeFromRunId: <run id> }) — completed agents replay from the journal cache; only the stalled leaf re-runs live.
- Quota-halt auto-resume: if the run returns with an aborts entry starting quota-halt: (the engine's circuit breaker tripped on a session/usage limit), do NOT hand it back to the owner to babysit: parse the reset time from the quoted error when present (e.g. "resets 12:20am"), ScheduleWakeup for a few minutes past it (fallback ~60min if unknown), and on wake relaunch with the SAME args + resumeFromRunId — cached leaves replay free. The quiesce rules still apply first: stash crash debris, remove a stale rs-lock.
- After killing a run, also kill its ORPHANED test processes: stopping a workflow does NOT kill the test runner its executor had in flight (whatever the stack spawns: swift-test/swiftpm-testing-helper, jest/vitest workers, cargo test runners…). The orphan keeps the build/test lock forever (its parent is dead), and the NEXT test run hangs at 0% CPU waiting on it — looks like "tests are slow", is actually a deadlock. Check ps for stale runners (old etime, 0 CPU) and kill them before re-running anything.
- Hang triage in one line: ps -o pcpu,etime — old etime + ~0% CPU = deadlock, not work (a working test/build burns CPU). Don't wait on a 0%-CPU runner; capture the last-started test (serial/PTY run if output is block-buffered), sample the host process, then kill.
- Background Bash tasks OUTLIVE the tool timeout (observed: a 8-min-timeout suite ran 38 minutes). For anything that can hang, build the wall-clock watchdog INTO the command (e.g. ( sleep N && kill … ) &) — never rely on the harness timeout in background mode.
On completion, report the trust ledger — not a wall of text:
- the baseline that was protected,
- how the task was sliced (the decomposition tree),
- per-leaf: passed + whether the verifier trusted it,
- the final integration verdict (deterministic full-suite green/red),
- anything the agents flagged (funList tangents, untrusted leaves, MAX_LEAVES truncation, aborts — a unit halted by the untrusted-streak guard means the APPROACH failed, lead with it). Surface untrusted leaves and false-green risks prominently — that is the whole point. Then relay the briefing field (the Owner's Briefing) in full — it is the owner's guided read for repaying comprehension debt (reading order, decisions made for them, buried bodies, what to verify by hand). Never summarize it away; the owner reading code is the loop's one unautomatable obligation. Persist the briefing durably: append it to docs/briefings/<date>-<lane>.md in the target repo (create the dir if missing) and commit it with the lane. /tmp copies die with the machine; the briefing IS the comprehension-debt ledger and must live with the code it explains. Also report wiringGaps (the engine's built-tested-unwired audit) right after untrusted leaves — a gap there means a feature landed unreachable; queue the wiring fix before anything else.
Visually verify UI lanes before handover. A green ledger proves tests pass, not that the user can SEE the feature — after a lane that changes UI, render the real interface and read the pixels yourself (the purposeGap check, done by you, not the owner). If the screenshot contradicts the green ledger, that's a wiring gap: report it as a failure, not a success. Per platform:
- Web: headless-browser screenshot (playwright/puppeteer) of the changed route/state.
- macOS app (needs Screen Recording permission): capture the app's window WITHOUT stealing focus via <skill-base-dir>/scripts/capture-window.sh <AppName> /tmp/verify.png — window-ID capture works on occluded windows; never fight the owner for displays/focus. Driving the UI (clicks) needs an idle display and explicit with timeout; never run an entire contents AX query against an app — it wedges the app's accessibility server for minutes and every subsequent AppleEvent times out (-1712).
- CLI/TUI: capture real terminal output (PTY run if output is block-buffered).
Compound the lane before closing it (each unit of work must make the next one easier — not just ship). After relaying the briefing, spend one explicit pass on three questions:
- Repo convention? If the lane established a pattern (e.g. "thread data explicitly into views, never via store selection state"), add ONE line to the target repo's CLAUDE.md conventions — that's how the next lane's agents inherit it without re-deriving it.
- Caught next time? For every defect class the lane fixed, name the regression canary that now catches it (a test, a compile-gated parameter, a t0 gate). If none exists, that's unfinished work — queue it, don't close.
- Feel regression checklist: every owner complaint about look/feel (a janky scroll, a wrong icon, a missing header) becomes a permanent entry in the repo's docs/polish-checklist.md; after any UI lane, walk that checklist yourself with the platform's capture method (step 7) before handover. Owner taste, once stated, is system property — never make them say it twice.
Keep the loop's memory in the repo, not the conversation (the agent forgets, the repo doesn't). The durable queue lives in docs/BACKLOG.md of the target repo: every briefing's "buried bodies"/follow-ups, deferred funList items, and known-gap canaries get appended there (with file:line evidence) when you relay the briefing; completed items get checked off at the next lane's start. Point the next lane's spec at the relevant BACKLOG entries instead of re-deriving them from chat history — conversation summaries are lossy; BACKLOG.md is not.

Notes

The roles are also standalone subagents (slice-baseliner, slice-slicer, slice-executor, slice-verifier) — you can spawn any one directly via the Agent tool for a lighter, interactive pass instead of the full workflow. (slice-slicer also owns the recursion-termination decision — the former slice-assessor is folded into it.) One-time setup (npx-installed copies): plugin installs register these automatically; a plain skill install does not. If slice-* agents are missing from the agent registry and this skill's base directory has an agents/ folder, copy those files to ~/.claude/agents/ (tell the user you did) — idempotent, skip if already present.
No Workflow tool in this harness?
- opencode: first-class adapter available — copy adapters/opencode/slice-engine.ts to ~/.config/opencode/tools/ and the slice-engine tool runs the SAME engine artifact (AsyncFunction-hosted PORT; agent() = opencode run subprocess on the user's own plan). First use returns a needsSetup question — lane→agent/model mapping, with oh-my-openagent auto-detection (recommended: sisyphus-junior for shell/light, momus for the heavy lens — cross-model criticism preserved). Verify install free with node adapters/opencode/host-smoke.mjs. v1 degradations are documented in the adapter header (no per-call model overrides, no resume journal, unlimited budget).
- Anything else (Codex CLI, subagent-only setups): don't give up the discipline — read references/portable-orchestration.md and drive the same algorithm yourself with subagents. The four invariants survive any port: executor ≠ verifier, shell truth before model judgment, one commit per trusted leaf, full suite only at integrate.
If the target repo is not under git, note it: small reversible commits are themselves a trust mechanism, and git unlocks worktree isolation for parallel slices. Offer to git init.

Integrating Vercel Agent Skills (Optional)

Two React/Next.js selection-table rows reference vercel-react-best-practices and vercel-composition-patterns, tagged (if-installed). These wrap Vercel's open-source agent-skills (perf + API/composition guidance) and are NOT bundled with slice — install them once if you do frontend work, or skip them entirely.

Install (source: github.com/vercel-labs/agent-skills) into a path the front-door resolver searches (~/.agents/skills/<name>/ — the user-level skill-library dir from §4):

mkdir -p ~/.agents/skills/vercel-react-best-practices ~/.agents/skills/vercel-composition-patterns
cp <clone>/skills/react-best-practices/SKILL.md  ~/.agents/skills/vercel-react-best-practices/
cp <clone>/skills/composition-patterns/SKILL.md  ~/.agents/skills/vercel-composition-patterns/

Role split on a React/Next.js lane (compose, do not duplicate): toss-frontend-fundamentals owns usability + a11y + the Toss 4-axis judgment; vercel-composition-patterns owns component/API composition design; vercel-react-best-practices owns React runtime performance.

If-installed treatment: the front door resolves each vercel-* name against the §4 search path (project .agents/skills/, user ~/.agents/skills/, plugin cache, local index). If a guide is NOT found it is silently dropped from the composed skills:[...] list — the lane still runs with whatever guides resolved (an absent path is never passed, and the per-leaf RELEVANCE GATE skips non-matching guides anyway). No flag, no error, no blocked run — the Vercel guides are strictly additive when present.