name: upgrade-agent-fleet description: "Upgrade a FLEET of Tangle agent products (tax, gtm, legal, creative, agent-builder) to the current shell standard — adopt @tangle-network/agent-app, delete hand-rolled shell duplication, strip fork-inherited domain residue, and contribute generic improvements DOWN to the substrate. Runs as a dynamic workflow fanout with adversarial completion gates that make a false 'done' fail. Loops per agent until measurable completion criteria pass. Use when one product (the reference) has advanced past the others and the fleet must catch up, or to audit fleet drift."
Upgrade Agent Fleet — adversarial, looped, fanned-out
You have N agent products that forked from a common substrate and drifted. One of them (the reference) has advanced: it adopted a shared shell, deleted its duplicated plumbing, fixed bugs, sharpened its domain. The others lag — each still hand-rolls the same shell, and some still carry residue from whatever they forked FROM. This skill drags the whole fleet up to the reference standard, in parallel, and refuses to believe "done" until it's proven.
It is only the fleet-orchestration layer. The per-agent "how to adopt agent-app" already exists — this skill does NOT re-specify it. It calls these as its payload:
| Layer | Skill | Role | This skill duplicates it? |
|---|---|---|---|
| This skill | upgrade-agent-fleet |
Fanout + adversarial gates + loop-until-green across many agents at once | — |
| Per-agent adoption (the payload) | build-agent-app |
How ONE product adopts agent-app: discovery interview, module seams, layering rule, sandbox/non-sandbox wiring, the migration lift-loop, anti-patterns | NO — delegate to it |
| Eval-stack pipeline | agent-stack-adoption / agent-eval-adoption |
The 9-phase self-improving stack + primitive reference | NO |
| Publishing improvements | substrate-release |
Contribute-down + fleet propagation when the upgrade surfaces a generic fix | NO |
If you're upgrading one agent, you don't need this skill — use
build-agent-appdirectly. Reach for this only when you're moving the fleet and want parallelism + adversarial completion-gating + loop-until-green. Everything below is the orchestration aroundbuild-agent-app, not a second copy of it.
The current job (instantiate these via args before running)
- Reference (the standard to match): the product furthest along on the shell — fully delegating its shell mechanism, fork residue removed. Pass it as
args.reference. - Shell target:
@tangle-network/agent-app(published, provenance-signed). Its modules (/tools /runtime /eval /knowledge /knowledge-loop /config /preset-cloudflare /integrations /tangle /billing /delegation /crypto /web /redact /stream) are what a product otherwise hand-rolls. - Fleet to upgrade: the remaining agent products, passed as
args.agents. Nothing here hard-codes a repo name — the orchestration is repo-agnostic.
The three hard-won laws (this skill exists because these were learned the expensive way)
DISCOVER before you upgrade — you cannot pre-script it. Each agent NAMES the same mechanism differently. Grepping the reference's vocabulary (
streamAppToolLoop,submit_proposal,HubExecClient) across a target returns zero — not because the duplication is absent but because it's there under a different name. Every upgrade starts with a per-agent discovery pass that maps that agent's shell surface onto agent-app's modules. A pre-written checklist of symbol names is a lie.A fork carries the donor domain's whole layer — hunt the residue. A product that was forked from another agent silently keeps the donor's live knowledge-gate (a
buildXKnowledgeRequirementswired into the production chat loop), its whole flow library, its schema tables, and a donor-shaped eval harness — all under the new product's branding. The upgrade is not done when the shell is adopted; it's done when no foreign-domain code remains. Probe:grep -rinE "<donor-domain tokens>"(the caller passes the donor's vocabulary viaargs.donorDomainTokens, e.g. a filings/contracts donor:matter|filing|jurisdiction|incorporation|contract). Treat every hit as residue until proven domain-correct.Extend the substrate, never re-duplicate it. If the upgrade reveals a generic mechanism still hand-rolled, the fix is to contribute it DOWN to
agent-app/ the engine (additive subpath, peer-dep), then have the agent consume it — not to copy the reference's local copy. Engine =peerDependency, never bundled. (substrate-releaseowns the publish.)
Architecture — dynamic fanout, per-agent loop-until-green
fleet ─┬─ agent A ─► [Discover] ─► [Upgrade] ─► [Adversarial Verify] ─┐
├─ agent B ─► [Discover] ─► [Upgrade] ─► [Adversarial Verify] ─┤── loop until
└─ agent C ─► [Discover] ─► [Upgrade] ─► [Adversarial Verify] ─┘ verdict=PASS
▲ │
└── refutations ───┘ (reopen with the
refutation as the
next task)
- Fanout across agents (parallel; each agent independent).
- Per agent, a bounded retry loop:
Upgrade → Verify; if the adversarial panel refutes, feed the refutations back as the next Upgrade's directive. Stop when the panel returns a clean verdict K rounds running, or a retry cap is hit (and thenlog()the agent as NOT complete — never silently pass). - Isolation: each agent runs in its own git worktree (
isolation: 'worktree') off its current branch. Never push, never touch main. The output is a verified worktree + a report; a human merges.
Per-agent procedure — DELEGATE to build-agent-app
The per-agent upgrade IS the build-agent-app migration path — discovery interview,
module seams, the layering rule, the lift-loop, sandbox/non-sandbox wiring,
anti-patterns. Do not restate it here. Every Upgrade agent this skill spawns is
told to follow build-agent-app for the actual adoption. This skill adds exactly
two things on top of that procedure, because they're fleet- and rigor-specific:
- Delta A — hunt foreign-domain residue (law 2).
build-agent-appcovers adopting the shell; it does not assume the agent is a fork carrying another domain's whole layer. This skill makes residue-stripping a first-class, separately-verified step (thedomainskeptic). [Upstream candidate: fold this intobuild-agent-app's migration section.] - Delta B — measurable completion criteria + an adversarial panel (below) instead of self-assessed "done." This is the whole point of running the fleet through a workflow rather than by hand.
USER STORIES — validate behavior, not just compilation
Each agent must satisfy these (verified by a real test or a runtime trace, not by assertion). Phrase them per the agent's domain; the mechanism is shared:
- Proposal routing: "As a user, when the agent recommends a regulated action, it emits a validated
submit_proposaltool call (via agent-app/tools) that lands in the approval queue with the correct taxonomy — not a fenced-text block, not a direct side effect." - Bounded tool loop: "As a user, a turn that triggers integration tool calls runs the agent-app
/runtimeloop: streams a turn → executes tools → folds results → re-runs, capped atmaxToolTurns— and the same loop drives both the sandboxed and (if applicable) browser path." - Integration invoke: "As a user, a read action executes immediately through the hub client (
/integrations); a write returns approval-required and surfaces as a proposal." - No foreign domain: "As a user of this agent, nothing in the product references the donor domain — no foreign flows in the UI, no foreign tables queried, no foreign knowledge requirements gating my chat."
- Eval parity: "The eval harness runs through agent-app
/evalre-exporting the engine verifier; the agent's completion gate scores its own domain, not the donor's."
The implementer writes/keeps a test per story. A story with no executable check is not validated.
COMPLETION CRITERIA — measurable, per agent (the gate)
A target is complete only when ALL hold, each PROVEN by command output:
| # | Criterion | Proof |
|---|---|---|
| 1 | typecheck clean | pnpm typecheck exit 0, zero errors |
| 2 | tests ≥ baseline | pnpm test pass count ≥ the pre-upgrade baseline (record both) |
| 3 | build green | pnpm build exit 0 |
| 4 | agent-app adopted | @tangle-network/agent-app resolves in the lockfile; imported in ≥1 file per swapped module |
| 5 | no duplication | each swapped local module is a re-export/deleted — grep finds no second copy of the lifted logic |
| 6 | no foreign domain | donor-domain token probe returns 0 hits in src/ (or every hit is justified in the report) |
| 7 | no shims/regressions | no @ts-expect-error, no .skip added, no --no-verify, no fake fallback introduced by the upgrade |
| 8 | LOC down | net .server/src shell LOC strictly decreased (record delta) |
Criteria are literals checked by the verifier agents, not vibes. "Mostly done" is FAIL.
ADVERSARIAL VERIFICATION — designed so false "done" cannot survive
After each Upgrade, spawn a panel of skeptics, each with a DISTINCT refutation lens, each prompted "default to REFUTED unless you can prove otherwise by running/grepping; cite the command and its output."
| Lens | Tries to prove |
|---|---|
| Build skeptic | typecheck/test/build actually fail, or the test baseline dropped |
| Shim skeptic | the "deletion" is actually a second copy / a re-export that still pulls local logic |
| Resolution skeptic | the adoption is HOLLOW — git grep -c "<shell>" -- src/ is 0 (or didn't rise ~per-swapped-module) despite adoption-named commits; or <shell> is only an array entry (e.g. minimumReleaseAgeExclude) not a versioned dependency; or imports resolve to a local path / the lockfile never moved. Commit messages are not evidence — the real import count is. (Caught live: an agent shipped 5 "delegate to the shell" commits with 0 actual imports.) |
| Domain skeptic | foreign-domain residue remains (tokens, tables, flows, knowledge gates, UI) |
| Story skeptic | a user story has no executable check, or its check is trivially true |
| Silent-cap skeptic | the upgrade narrowed scope (skipped a module, sampled, TODO'd) without saying so |
Verdict rule: complete only if every lens fails to refute (or ≥ a strong majority, with the holdouts' objections resolved). Any surviving refutation becomes the directive for the next Upgrade iteration. Loop-until-dry: require 2 consecutive clean panels before declaring an agent done — a single clean pass is not enough (the tail is where the lie hides).
Cap the loop (e.g. 4 Upgrade↔Verify rounds). On cap, the agent is reported INCOMPLETE with the open refutations — never upgraded to "done." Silent truncation reads as success when it isn't.
How to run
The executable form is workflows/upgrade-agent-fleet.workflow.js in this plugin. Invoke the Workflow tool with that script (or paste it), passing the fleet as args:
Workflow({ scriptPath: ".../upgrade-agent-fleet.workflow.js",
args: { reference: "<reference-product>",
shell: "@tangle-network/agent-app",
donorDomainTokens: { "<forked-product>": ["matter","filing","jurisdiction","incorporation","contract"] },
agents: ["<product-a>","<product-b>","<product-c>"],
codeRoot: "<your code root>",
maxRounds: 4 } })
It fans out, loops each agent to a clean adversarial verdict, isolates each in a worktree, and returns a per-agent report (baseline vs final test counts, LOC delta, modules swapped, residue removed, upstream candidates, open refutations if capped). It does not push. Read the report, then merge the green worktrees and run substrate-release for any upstream candidates.
When the fan-out stalls on one hard module — switch to a focused subagent
The adversarial fan-out is excellent at audit, breadth, and catching hollow/partial adoption (the coverage + shim + hollow skeptics earned their keep — they caught a forked redact traversal line-for-line and repeatedly flagged a capability-token-only tools claim). It is bad at landing one deep, single-module surgery: a finish agent given many modules will do the easy ones and keep skipping the hard one (e.g. migrating a :::proposal regex parser onto /tools across the chat pipeline + MCP + HTTP routes), and seeding the gap as a prior-refutation does not reliably force it.
Rule: if two finish rounds leave the same module unadopted (verified by import count, not the agent's self-report), stop re-fanning. Hand that one module to a single focused subagent with a precise brief + the reference consumer to copy + hard grep-based gates — the deterministic path. (Proven repeatedly: a deep schema/flow de-wire and a tools side-channel migration both landed clean via a focused subagent after the fan-out loop stalled on them.) Then re-audit.
Always verify with the repo's ACTUAL layout. A git grep -- src returns 0 on a monorepo (packages/*/src, apps/*/src) and you will wrongly call a real adoption "hollow." Grep the whole worktree, or detect the layout first.
Safety / guardrails
- Worktree isolation per agent; no push, no force, no main mutation, no
--no-verify. - Each agent's pre-existing uncommitted work + feature branch is respected — the worktree forks the current HEAD.
- Bounded retries; capped agents are reported incomplete, not faked.
- Engines stay peer-deps; never bundle. Improvements go DOWN, not sideways.