batch-bug-shepherd - SKILL.md Agent Skill

name: batch-bug-shepherd
description: >-
  Use this skill to drive a batch of suspected bugs in microsoft/apm from raw issue list to mergeable PR queue. Fan out one triage subagent per issue (LEGIT / UNCLEAR / FIXED-AT-HEAD), gate every legit bug against PRINCIPLES.md via an apm-ceo strategic-alignment pass, cross-reference legit issues against open PRs, then open a fix PR (TDD + mutation-break gate) for greenfield bugs. Drive every PR -- community in-flight and own fix alike -- to mergeable by composing the shepherd-driver skill: one driver per PR runs the review panel, folds non-blocking recommendations, pushes (preserving author), and watches CI to green. Re-probe mergeability and resolve conflicts via shepherd-driver. Maintain a plan.md ground-truth table as canonical state. Activate when the maintainer asks to triage issues, sweep the bug queue, shepherd bug-flagged issues, run a weekly community sweep, or drive in-flight community PRs to merge -- even if "shepherd" or "batch" is not named.

batch-bug-shepherd - Outer-loop bug-queue orchestrator

This skill is an A10 ORCHESTRATOR-SAGA over fan-out waves (triage, strategic-alignment, PR-cross-reference, fix, drive-to-merge, conflict-resolution) with a persisted ground-truth table between phases. It COMPOSES the shepherd-driver skill as the per-PR drive-to-merge engine -- it does NOT re-implement the review + fold + push + CI loop. shepherd-driver transitively COMPOSES apm-review-panel; this skill inherits that edge and never reaches into panel internals directly. It also COMPOSES the apm-ceo persona (host-repo agent at .apm/agents/apm-ceo.agent.md) for the strategic-alignment gate, which checks every LEGIT bug against PRINCIPLES.md before allowing fix / drive work to proceed. Per-PR shepherding is delegated to shepherd-driver; per-issue verification, strategic alignment, PR-in-flight branching, greenfield fix dispatch, post-wave mergeability re-probe, and the cross-session table are owned here.

The skill is ADVISORY at the panel layer and EXECUTIVE at the orchestrator layer: through its subagents it WILL push commits, open PRs, post comments, and close superseded PRs. Every consequential write goes through a deterministic CLI (gh, git, uv run ruff) wrapped in plan + execute + verify (A9 SUPERVISED EXECUTION).

Architecture invariants

These 17 rules bind every wave. The one-line essence is below; the FULL binding text (rationale, edge cases, inherited-from-driver detail) lives in references/invariants.md. Load references/invariants.md before planning Phase 0 -- the summaries here are dispatch anchors, not the complete contract.

Fan-out, not serial. Triage / alignment / fix / drive run as parallel child threads; single-loop is an anti-pattern.
Verify before fix. No fix dispatched until the bug reproduces on HEAD (LEGIT); UNCLEAR -> human, FIXED-AT-HEAD -> close.
PR-in-flight detection is mandatory. gh pr list every legit issue before any fix; duplicating community work is the worst failure mode this skill defends against.
Drive, do not split shepherd from complete. ONE shepherd-driver subagent owns the whole per-PR loop; no separate panel + completion waves.
Mutation-break gate. A regression trap is real only if deleting the production guard makes the test FAIL.
Superseding-PR fallback (inherited). On contributor-fork push failure the driver opens an authorship-preserving PR under microsoft/apm and returns superseded.
Single-writer interlock. One idempotent panel comment + one driver advisory per PR; the orchestrator never posts to a PR.
ASCII only. Printable ASCII in every artifact (cp1252 safety).
Lint contract is the push gate (inherited). ruff check + ruff format --check silent before any git push.
Ground-truth table is the single source of truth. One plan.md table, rewritten on every return, re-read at each wave start (B4 PLAN MEMENTO + B8 ATTENTION ANCHOR).
Cross-session message reports only on green. Failures stay in the subagent session until resolved or escalated to a human.
Operator visibility is a contract. Progress mermaid + live table at every boundary; dispatch table before every fan-out (assets/progress-diagram.md).
Mergeability is post-wave truth. Re-probe mergeStateStatus before claiming ready; Phase 5 resolves conflicts with --force-with-lease (bare --force prohibited).
Two-comment-per-PR cap. Driver advisory + resolution confirmation only; the in-loop panel comment is idempotent and does not add to the count. No third comment, ever.
Bias toward folding (inherited). The driver folds in-scope follow-ups into the PR; only genuinely separable work becomes a tracking issue.
Strategic-alignment gate before drive. Phase 1.5 runs one apm-ceo subagent per LEGIT row; demoted rows skip Phase 2-5; the gate fails open to aligned, aborts only if the persona / PRINCIPLES.md is missing.
Worktree isolation. Every fix and drive child runs in its OWN git worktree (one per issue/PR); never fan out mutating children against a shared REPO_ROOT (they would race on .git/index and the checked-out branch). Triage is read-only and may share one.

Composition with shepherd-driver

shepherd-driver is the per-PR drive-to-merge engine. This skill spawns ONE shepherd-driver subagent per PR (both in-flight community PRs and own greenfield fix PRs) using the spawn body ../shepherd-driver/assets/shepherd-driver-prompt.md. The driver owns the whole convergence loop -- Copilot classification, apm-review-panel, fold-vs-defer, push, CI watch, with its own caps -- and returns a completion_return matching ../shepherd-driver/assets/completion-schema.json (ready-to-merge | advisory-with-deferred | superseded | blocked).

The cross-PR conflict-resolution / mergeability phase is ALSO shepherd-driver's: Phase 5 delegates to ../shepherd-driver/assets/conflict-resolution-prompt.md with the step-by-step gate in ../shepherd-driver/references/mergeability-gate.md.

shepherd-driver is a same-repo LOCAL SIBLING declared in apm.yml (dependencies.apm: [../shepherd-driver]). PROBE for it before the drive wave -- a tool call, not an assertion from recall (A9 SUPERVISED EXECUTION):

test -f ../shepherd-driver/assets/shepherd-driver-prompt.md \
  && test -f ../shepherd-driver/assets/completion-schema.json \
  && echo "shepherd-driver present" \
  || echo "MISSING shepherd-driver - stop and ask the operator"

On a probe MISS, STOP and ask the operator to restore the sibling; do NOT re-implement the loop inline (avoids HAND-ROLLED HALLUCINATION and PHANTOM DEPENDENCY). The driver transitively composes apm-review-panel and probes for it at its own preflight, returning status: blocked on a miss. The orchestrator NEVER reaches into shepherd-driver or apm-review-panel internals.

Phases

Work through the phases in order. Reload the ground-truth table at each phase boundary. Do not skip the cross-reference phase.

At every phase boundary (and once at the run start, once at the end), render the progress mermaid diagram + the live ground-truth table to chat per assets/progress-diagram.md. Before every fan-out wave, also render the dispatch table mapping subagent_id to target. These are not optional -- they are the operator's only real-time window into a multi-wave parallel saga.

Phase 0 - scope resolution

Input is either (a) an explicit issue list (e.g. #123 #456 #789) or (b) the sweep-all flag, which expands to:

gh issue list --label bug --state open --json number,title,labels,body
plus gh issue list --state open --search "is:open no:label" filtered by suspicion keywords (error, crash, broken, regression, unexpected, traceback, does not work, cannot, fails).
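
The keyword filter for unlabeled issues can be sketched as a small shell helper. This is a minimal illustration, not the skill's own code: the helper name is_suspicious is hypothetical, and it assumes the issue title and body have already been fetched (for example with gh issue list --json number,title,body) and joined into one string.

```shell
#!/bin/sh
# Suspicion keywords copied from the Phase 0 description above.
SUSPICION_PATTERN='error|crash|broken|regression|unexpected|traceback|does not work|cannot|fails'

# is_suspicious TEXT -> exit 0 when TEXT contains any keyword (case-insensitive).
# Hypothetical helper name; the skill does not define one.
is_suspicious() {
  printf '%s\n' "$1" | grep -Eiq "$SUSPICION_PATTERN"
}

# Example: decide whether an unlabeled issue joins the candidate list.
if is_suspicious "Install crashes with a Traceback on Windows"; then
  echo "candidate"
else
  echo "skip"
fi
```

Substring matching is deliberately loose here (it also catches "crashes" and "errors"); a stricter word-boundary match is possible but would need tuning against real issue titles.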

Initialize the ground-truth table (assets/ground-truth-table.md) with one row per candidate. Print a brief plan to the user: candidate count, expected wave shape, and the disciplines that will be enforced (mutation-break, ASCII, lint). Ask for confirmation only if sweep-all produced more than 20 candidates -- otherwise proceed.

Then render the progress mermaid diagram for the first time per assets/progress-diagram.md -- every phase pending, with the candidate count N substituted into the P0 and P1 labels. Print the live (empty) ground-truth table below it. This is the operator's anchor frame for the run.

Phase 1 - triage fan-out (WAVE 1)

Re-render the progress diagram with P1 styled active. Print the dispatch table mapping each triage-<issue> subagent_id to its target issue BEFORE issuing the parallel spawns.

Spawn one child thread per candidate using assets/triage-prompt.md. Each subagent:

Reproduces the bug on HEAD via the smallest possible repro.
Returns a verdict JSON matching assets/verdict-schema.json (triage verdict shape).

Schema-validate every return (S4). On a malformed return, re-spawn that subagent ONCE with a clarifying note. On a second malformed return, mark the row "UNCLEAR -- subagent malformed" and continue.

Update the table. Move on only when every row has a triage verdict.
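
The retry-once policy above can be sketched as a wrapper around the spawn and the validator. Everything named here is an assumption for illustration: collect_verdict, the SPAWN command, and the VALIDATE command are hypothetical, and the real validation against assets/verdict-schema.json is left to whatever JSON-schema tool is available.

```shell
#!/bin/sh
# collect_verdict SPAWN VALIDATE ISSUE
#   SPAWN    - command that prints one subagent return for ISSUE (hypothetical);
#              it receives the attempt number so a retry can carry a clarifying note
#   VALIDATE - command that reads the return on stdin and exits 0 when it
#              matches the verdict schema (hypothetical; bring your own validator)
collect_verdict() {
  cv_spawn=$1 cv_validate=$2 cv_issue=$3
  cv_attempt=1
  while [ "$cv_attempt" -le 2 ]; do
    cv_out=$("$cv_spawn" "$cv_issue" "$cv_attempt")
    if printf '%s' "$cv_out" | "$cv_validate"; then
      printf '%s\n' "$cv_out"
      return 0
    fi
    cv_attempt=$((cv_attempt + 1))
  done
  # Second malformed return: mark the row and let the wave continue.
  printf '%s\n' 'UNCLEAR -- subagent malformed'
  return 1
}
```

The wrapper never blocks the wave: a twice-malformed subagent yields a marker the orchestrator can write into the row, so every row still ends Phase 1 with a verdict.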

Phase 1.5 - strategic-alignment gate (WAVE 1.5)

Re-render with P15 active (substitute L, the LEGIT row count, into its label). If L = 0, render P15 as skipped and pass through.

Load references/strategic-alignment-gate.md when entering this phase -- it holds the binding procedure (external-dep probes, fail-open semantics, deferred-PR strategic-rejection subagent).

Probe .apm/agents/apm-ceo.agent.md and PRINCIPLES.md. If either is missing, ABORT. Print the dispatch table for the ceo-align-<issue> subagents, then spawn L parallel threads with assets/strategic-alignment-prompt.md. Returns are strategic_alignment_return JSON (verdict in aligned | aligned-with-reservations | out-of-scope | wrong-direction). Schema-validate each return, retrying once on a malformed return; on a second malformed return, route the row as aligned with a gate_note (fail-open).

Update strategic_verdict + strategic_rationale columns. Demoted rows flip to status triaged-deferred and are SKIPPED by Phase 2/3/4/5. aligned-with-reservations rows stay in saga; downstream phases MUST surface the reservations.
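
The routing from strategic verdict to row status can be sketched as a single case statement. This assumes that out-of-scope and wrong-direction are the two demoting verdicts (the text above only says "demoted rows"), and the function name route_alignment is hypothetical; references/strategic-alignment-gate.md remains the binding procedure.

```shell
#!/bin/sh
# route_alignment VERDICT -> prints the row status after Phase 1.5.
# Assumption: out-of-scope and wrong-direction demote; everything else stays.
route_alignment() {
  case "$1" in
    aligned|aligned-with-reservations)
      echo "triaged" ;;            # stays in the saga (reservations must be surfaced later)
    out-of-scope|wrong-direction)
      echo "triaged-deferred" ;;   # skipped by Phase 2/3/4/5
    *)
      echo "triaged" ;;            # fail-open: unusable verdict is treated as aligned
  esac
}

route_alignment wrong-direction
```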

Phase 2 - PR-in-flight cross-reference

Re-render the progress diagram with P1 done and P2 active. Substitute L (LEGIT row count) into the P2 label.

Skip every row with status triaged-deferred (Phase 1.5 demoted). Run a LIGHTWEIGHT gh pr list probe against demoted rows only to feed the deferred-PR strategic-rejection comment procedure in references/strategic-alignment-gate.md; this read-only probe does not route demoted rows back into Phase 2.

For every LEGIT row (status triaged), run gh pr list --search "<issue-ref-or-keywords>" --state open --json number,title,headRefName,headRepository,headRepositoryOwner,author,maintainerCanModify. Also inspect each linked PR on the issue itself. Two outcomes per row:

pr_in_flight = false -> route to FIX in Phase 3 (greenfield).
pr_in_flight = true -> capture and store, for the Phase 4 driver spawn, every input shepherd-driver requires: PR_NUMBER (number), AUTHOR (author.login), HEAD_REPO (headRepositoryOwner.login + "/" + headRepository.name), HEAD_BRANCH (headRefName), MAINTAINER_CAN_MODIFY (maintainerCanModify), and ORIGIN = community. Route to DRIVE in Phase 4.

Store these driver-input fields in the row (table columns head_repo, head_branch, maintainer_can_modify). They are the crash-survivable evidence the Phase 4 spawn reads -- never re-derived from recall. Update the table. This phase MUST complete before any Phase 3 or Phase 4 spawn.
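
One way to capture the driver inputs in a single call is to let gh reshape its own JSON with --jq. This is a sketch under assumptions: the function name pr_driver_inputs is hypothetical, the search string is whatever issue reference or keywords the row carries, and linked PRs on the issue itself still need the separate inspection described above.

```shell
#!/bin/sh
# pr_driver_inputs SEARCH
#   Emits one JSON object per open PR matching SEARCH, keyed by the
#   driver-input names listed above. Hypothetical helper name.
pr_driver_inputs() {
  gh pr list \
    --repo microsoft/apm \
    --search "$1" \
    --state open \
    --json number,title,headRefName,headRepository,headRepositoryOwner,author,maintainerCanModify \
    --jq '.[] | {
      PR_NUMBER: .number,
      AUTHOR: .author.login,
      HEAD_REPO: (.headRepositoryOwner.login + "/" + .headRepository.name),
      HEAD_BRANCH: .headRefName,
      MAINTAINER_CAN_MODIFY: .maintainerCanModify,
      ORIGIN: "community"
    }'
}
```

An empty result means pr_in_flight = false for that row; any object returned is written into the row verbatim so Phase 4 reads it from the table rather than from recall.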

Phase 3 - greenfield fix fan-out (WAVE 2)

Re-render the progress diagram with P0..P2 done and P3 active. Substitute m (greenfield count) into the P3 label. If m = 0, render P3 as skipped (dashed border) and pass straight to Phase 4.

This wave is FIX-ONLY. In-flight community PRs do NOT pass through here -- they go directly to the Phase 4 drive wave. Filter out any row with status triaged-deferred (strategically demoted by Phase 1.5).

Print the fix-<issue> dispatch table (subagent_id -> issue number) BEFORE spawning. For each LEGIT && !pr_in_flight row, provision one git worktree (git worktree add <path> origin/main), record its slug in the row's worktree column, and spawn a child thread with assets/fix-prompt.md passing that worktree path as REPO_ROOT (per the Worktree-isolation invariant). The fix subagent:

Writes failing tests FIRST (TDD).
Implements the minimum fix.
Runs the mutation-break gate (delete the new guard, confirm tests FAIL).
Runs the lint contract.
Opens a PR under microsoft/apm referencing the issue.
Returns {kind, issue, pr, branch}.
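
The mutation-break step in the list above can be sketched as a swap, run, restore cycle. This is an illustration under assumptions, not the contents of assets/fix-prompt.md: the function name and its calling convention are hypothetical, and the caller is assumed to have prepared a copy of the production file with the new guard deleted.

```shell
#!/bin/sh
# mutation_break_gate GUARD_FILE MUTATED_COPY TEST_CMD [ARGS...]
#   GUARD_FILE   - production file containing the new guard
#   MUTATED_COPY - same file with the guard deleted (prepared by the caller)
#   TEST_CMD     - command that runs the new regression test
# Exit 0: trap is real. Exit 1: test survives the mutation. Exit 2: baseline red.
mutation_break_gate() {
  mb_guard=$1 mb_mutated=$2
  shift 2
  if ! "$@"; then
    echo "baseline tests fail; fix that first" >&2
    return 2
  fi
  mb_backup=$(mktemp)
  cp "$mb_guard" "$mb_backup"
  cp "$mb_mutated" "$mb_guard"     # apply the mutation: guard deleted
  if "$@"; then mb_result=1; else mb_result=0; fi
  cp "$mb_backup" "$mb_guard"      # always restore the production code
  rm -f "$mb_backup"
  if [ "$mb_result" -ne 0 ]; then
    echo "test still passes without the guard: not a regression trap" >&2
  fi
  return "$mb_result"
}
```

A call might look like mutation_break_gate src/module.py /tmp/module.noguard.py uv run pytest tests/test_issue.py, where both paths are placeholders. The gate only proves the test depends on the guard; it says nothing about whether the fix is minimal.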

On each return, store the Phase 4 driver inputs for the new PR: PR_NUMBER (pr), AUTHOR = the maintainer's own gh handle (these are own PRs), HEAD_REPO = microsoft/apm (same-repo head), HEAD_BRANCH (branch), MAINTAINER_CAN_MODIFY = true, and ORIGIN = own-fix. Write head_repo, head_branch, maintainer_can_modify into the row. Hold until every spawn returns.
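
The per-row worktree provisioning that precedes each fix spawn can be sketched as follows. The slug format (<kind>-<id>) and the default base directory are assumptions of this sketch; only the git worktree add <path> origin/main call comes from the phase description.

```shell
#!/bin/sh
# provision_worktree KIND ID [BASE_DIR]
#   Creates one worktree at origin/main and prints its path on stdout.
#   KIND is "issue" or "pr"; slug format and default BASE_DIR are hypothetical.
provision_worktree() {
  pw_slug="$1-$2"
  pw_path="${3:-../apm-worktrees}/$pw_slug"
  # Refresh origin/main first so the fix starts from current HEAD.
  git fetch --quiet origin main >&2 || return 1
  # git progress output goes to stderr so stdout carries only the path.
  git worktree add "$pw_path" origin/main >&2 || return 1
  printf '%s\n' "$pw_path"
}
```

The printed path is what gets recorded in the row's worktree column and passed to the child as REPO_ROOT, so two children never share an index or a checked-out branch.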

Phase 4 - drive-to-merge fan-out (WAVE 3)

PROBE for shepherd-driver (see "Composition with shepherd-driver"). On a probe MISS, STOP and ask the operator; do NOT inline the loop.

Re-render with P4 active. Let D be the count of PRs to drive -- EVERY PR in the table that is not triaged-deferred: the in-flight community PRs (ORIGIN = community, from Phase 2) PLUS the own fix PRs (ORIGIN = own-fix, from Phase 3). Substitute D into the P4 label; if D = 0, render P4 as skipped.

Print the drive-<pr> dispatch table (subagent_id -> PR number), then spawn ONE shepherd-driver subagent per PR using ../shepherd-driver/assets/shepherd-driver-prompt.md. Each driver runs in its OWN worktree (Worktree-isolation invariant): own-fix rows REUSE the worktree their Phase 3 fix child recorded; community rows get a fresh git worktree add + gh pr checkout, recorded in worktree. Pass the inputs the prompt declares, reading each from the row (never recall); REPO_ROOT is the row's worktree path. For rows with strategic_verdict = aligned-with-reservations, ALSO pass PANEL_PRIOR = {"reservations": [<the strategic reservations as {summary} objects>]} so the driver surfaces them in the panel run and the PR advisory comment.

If D is large, batch the spawns (e.g. groups of 3) to bound nested-panel fan-out rather than launching all drivers at once.
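
Batching the driver spawns can be sketched with background jobs and wait. This is a shape illustration only: drive_in_batches and the DRIVER command are hypothetical stand-ins for however the host agent runtime actually spawns a shepherd-driver subagent.

```shell
#!/bin/sh
# drive_in_batches BATCH_SIZE DRIVER PR...
#   Launches DRIVER once per PR, at most BATCH_SIZE at a time, and waits
#   for each group to finish before starting the next.
drive_in_batches() {
  db_size=$1 db_driver=$2
  shift 2
  db_running=0
  for db_pr in "$@"; do
    "$db_driver" "$db_pr" &
    db_running=$((db_running + 1))
    if [ "$db_running" -ge "$db_size" ]; then
      wait                 # block until the whole group has returned
      db_running=0
    fi
  done
  wait                     # drain the final, possibly partial, group
}
```

Waiting on the whole group is the simplest bound; a sliding window that refills as each driver returns would finish sooner but needs per-job tracking.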

Each driver owns the full convergence loop end-to-end and returns a completion_return matching ../shepherd-driver/assets/completion-schema.json (status enum per the Composition section). Schema-validate every return (retry-once; on second malformed, mark the row blocked and continue). Write head_sha, mergeable, merge_state_status, and ci_status from the return into the table, and remove the status/shepherding label from the driven issue (assignment stays). The orchestrator owns only schema-validation, table update, and label cleanup -- it does NOT post to any PR.

Phase 5 - mergeability gate (WAVE 4)

Re-render with WAVE4 active. Substitute R (ready-PR count) and C (CONFLICTING-PR count) into the P5a / P5b labels. If R = 0, skip Phase 5 entirely; if C = 0, render P5b as skipped.

Load ../shepherd-driver/references/mergeability-gate.md when entering this phase -- it holds the binding step-by-step (probe CLI flags, retry policy, four-way partition, trust-but-verify re-probe). The contract summary:

5a (read-only): probe every Phase-4 ready PR via S7 DETERMINISTIC TOOL BRIDGE (gh pr view --json with the S7 mergeability fields -- full flag list in mergeability-gate.md). Skip triaged-deferred rows. Partition CLEAN / UNSTABLE / HAS_HOOKS (verified-ready) from BEHIND / DIRTY / CONFLICTING (route to 5b). BLOCKED is not a conflict.
5b (fan-out, one subagent per CONFLICTING PR): print dispatch table, spawn resolve-conflicts-<pr> subagents using ../shepherd-driver/assets/conflict-resolution-prompt.md. Each owns its PR end-to-end: rebase, faithful conflict merge, lint silent, push with --force-with-lease (NEVER bare --force), re-probe, post the single resolution-confirmation comment. Returns are conflict_resolution_return matching ../shepherd-driver/assets/completion-schema.json.
5c (read-only): trust-but-verify re-probe; partition into the schema's four conflict_resolution_return statuses; update the table.
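
The 5a partition described above can be sketched as a case statement over the probed state token. The output labels and the function name are hypothetical, and the handling of any token not named in the contract summary is an assumption; ../shepherd-driver/references/mergeability-gate.md remains the authority on probe fields and retry policy.

```shell
#!/bin/sh
# partition_merge_state STATE -> prints the 5a routing bucket for one PR.
# STATE is the token read from the probe, for example:
#   gh pr view "$pr" --json mergeStateStatus --jq .mergeStateStatus
# (minimal field shown; the full flag list lives in mergeability-gate.md)
partition_merge_state() {
  case "$1" in
    CLEAN|UNSTABLE|HAS_HOOKS)
      echo "verified-ready" ;;
    BEHIND|DIRTY|CONFLICTING)
      echo "route-to-5b" ;;
    BLOCKED)
      echo "blocked-not-a-conflict" ;;   # never sent to conflict resolution
    *)
      echo "unrecognized" ;;             # assumption: left for the gate's retry policy
  esac
}

partition_merge_state DIRTY
```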

Phase 6 - final report

Re-render with every phase done (or blocked where the human-escalation queue is non-empty). Render assets/final-report-template.md: per-issue verdict, PR link, post-gate status (the template's status set), and subagent session refs. Phase-1.5-demoted rows land in the template's "Recommend close as out-of-scope" partition, each citing the principle that fired.

Use clickable GitHub links (issue + pull URLs under github.com/microsoft/apm) and @<author> profile links, not plain issue numbers.

After the report renders, tear down ONLY the worktrees this run created (git worktree remove each slug in the worktree column); leave branches on origin for the open PRs.
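
The teardown can be sketched as a loop over the paths recorded in the worktree column. The function name is hypothetical, and the sketch assumes the column values resolve to worktree paths; it deliberately touches nothing that is not passed in.

```shell
#!/bin/sh
# teardown_worktrees PATH...
#   Removes only the worktrees named on the command line (the ones this run
#   recorded). A worktree that git refuses to remove, for example because it
#   has uncommitted changes, is reported and left for the operator.
teardown_worktrees() {
  for tw_path in "$@"; do
    git worktree remove "$tw_path" \
      || echo "could not remove $tw_path; leaving it for the operator" >&2
  done
}
```

No branch is deleted here, which keeps the open PRs' head branches on origin intact.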

Bundled assets

This skill bundles ONLY the assets unique to its triage-and-batch orchestration. Everything PR-drive related is owned by the composed shepherd-driver sibling, loaded from ../shepherd-driver/, not duplicated here.

assets/verdict-schema.json -- JSON schema for the TWO BBS-owned subagent return shapes (triage_return, strategic_alignment_return). Schema-validate every return (S4).
assets/ground-truth-table.md -- canonical table template (issue | verdict | pr | pr_in_flight | author | head_repo | head_branch | maintainer_can_modify | worktree | status | strategic_verdict | strategic_rationale | notes).
assets/triage-prompt.md -- WAVE 1 spawn body.
assets/strategic-alignment-prompt.md -- WAVE 1.5 spawn body (loads apm-ceo persona + PRINCIPLES.md).
assets/fix-prompt.md -- WAVE 2 greenfield-fix spawn body.
assets/final-report-template.md -- user-facing report shape.
assets/progress-diagram.md -- mermaid progress diagram, color contract, dispatch-table render rules (Phase 1, 1.5, 3, 4, 5b).
references/strategic-alignment-gate.md -- Phase 1.5 step-by-step (external-dep probes, fail-open semantics, deferred-PR strategic-rejection subagent). Load WHEN ENTERING PHASE 1.5.
references/invariants.md -- full binding text of the 17 architecture invariants. Load BEFORE PLANNING PHASE 0.

Composed from shepherd-driver (loaded by relative path, NOT bundled): shepherd-driver-prompt.md (Phase 4 drive), conflict-resolution-prompt.md (Phase 5b), completion-schema.json (driver + resolution returns), references/mergeability-gate.md (Phase 5 gate) -- all under ../shepherd-driver/.

Operating contract for the orchestrator thread

The orchestrator loop, beyond the invariant anchors above:

Before each phase: re-read plan.md ground-truth table. After each subagent return: schema-validate, update the table, write it back.
Delegate every PR-side write to the responsible subagent; the orchestrator never posts to a PR and never flips merge state.
Render the progress mermaid + live table at every phase boundary, and the dispatch table before every fan-out (assets/progress-diagram.md).

Out of scope

Authoring panel personas (lives in apm-review-panel).
Computing coverage percentages (lives in test-coverage-expert persona, invoked via apm-review-panel).
Single-PR review without a batch (use apm-review-panel directly).
Driving a single PR to merge without a batch (use shepherd-driver directly).
Auto-merge or auto-label. The orchestrator does not flip merge state; the maintainer ships.