name: dportsv3-agentic-analysis
description: Analyze how the DeltaPorts agentic loop (triage → patch → convert) handled a given port. Use when the user says "analyze port X" or "analyze the agentic run for X". Produces a structured report covering correctness, efficiency, and bugs against the current code's expected behavior. This skill is constantly improving: when you spot a new failure mode or expected-behavior gap, note it at the end of the report under "Skill update suggestions" so it can be folded back in.
DeltaPorts agentic analysis
What this skill is for
The DeltaPorts agentic loop watches dsynth failures and tries to fix them. Each failure produces a bundle with triage output, patch attempts, tool trace, and a rebuild proof. This skill walks you through pulling that data for a given port and judging what the loop did well, did badly, or got silently wrong.
Inputs you need before starting:
- Port origin (e.g. devel/gperf) or a bundle ID.
- Tracker base URL: read from the DP_TRACKER_URL env var, or ask the user. There is no hardcoded default; the tracker host is operator-specific.
Output: a report with what the agent did, whether that matches the expected contract, inefficiencies, bugs, and skill update suggestions.
If the analysis will read large traces (>50 KB of artifacts), delegate the read+summarize step to a sonnet 4.6 subagent via the Agent tool so the main context stays clean. Give the subagent the artifact URLs and a copy of the "Expected behavior" + "Known failure modes" sections from this file.
How to fetch data
Prefer the dportsv3 tracker get-* CLI over curl. It talks to the same tracker HTTP API but returns clean structured output and handles URL encoding for you. Set DPORTSV3_TRACKER_URL once and every command picks it up; or pass --server URL.
Locating the dportsv3 binary
The CLI is not always on $PATH. At the start of your analysis, resolve the binary once and reuse it:
# Repo-root-anchored fallback. The venv binary is the project's
# own and is the safe deterministic location.
DPORTSV3="${DPORTSV3:-$(command -v dportsv3 || echo /Users/tuxillo/s/DeltaPorts/scripts/generator/.venv/bin/dportsv3)}"
# Verify before using it:
"$DPORTSV3" tracker --help >/dev/null \
|| { echo "dportsv3 unusable at $DPORTSV3" >&2; exit 1; }
Then every later call uses "$DPORTSV3" tracker get-bundle … etc. If you can't make the CLI work after that probe, fall back to curl and flag the gap.
CLI commands (prefer these)
| Purpose | Command |
|---|---|
| List recent bundles for a port | dportsv3 tracker list-bundles --origin <category/port> --limit 10 |
| Get one bundle's detail (incl. artifact list) | dportsv3 tracker get-bundle <bundle-id> |
| Same, structured JSON | dportsv3 tracker get-bundle <bundle-id> --json |
| Fetch one artifact's raw bytes (logs/diffs/JSON) | dportsv3 tracker fetch-artifact <bundle-id> <relpath> |
| Get one job by ID | dportsv3 tracker get-job <job-id> |
| List jobs (filter by state) | dportsv3 tracker list-jobs --state dead --limit 20 |
| Activity log for one job | dportsv3 tracker get-activity --job <job-id> --limit 200 |
| Activity log filtered by stage | dportsv3 tracker get-activity --job <id> --stage tool: |
HTML pages (use only when no equivalent API exists)
| Purpose | URL |
|---|---|
| Agentic dashboard (rendered overview) | GET /agentic |
| Bundle detail rendering | GET /agentic/bundles/<bundle-id> |
Discovery flow
1. Run dportsv3 tracker list-bundles --origin devel/<port> --limit 5 to find recent bundle IDs.
2. Run dportsv3 tracker get-bundle <id> --json for full detail with the artifact list.
3. For each artifact you actually need to read, run dportsv3 tracker fetch-artifact <id> <relpath>.
4. If multiple bundles exist for the same port, analyze the most recent and scan timestamps of prior failures to see if the loop kept retrying.
Why not curl?
- The CLI handles URL encoding of slashed paths (devel/gperf → devel%2Fgperf) without you remembering.
- Structured output (or --json for full JSON) means no grep/sed/jq pipelines that break on whitespace or special characters in error messages.
- The single binary already knows the tracker URL (DPORTSV3_TRACKER_URL) so you don't repeat it.
Fall back to curl only if the CLI is missing a command you need; then file a SKILL-update suggestion so the next analyzer doesn't have to fall back.
Artifacts that matter
All under /api/bundles/<bundle-id>/artifacts/:
| Path | What it tells you |
|---|---|
| meta.txt | Origin, target, timestamps, dsynth profile. |
| logs/errors.txt | dsynth tail: the actual failure the agent reacted to. |
| analysis/triage.md + triage.json | Classification, confidence, root cause, suggested fix. |
| analysis/patch.md | Agent's narrative of what it did and why. |
| analysis/patch_audit.json | Status, model, token usage, attempts breakdown. |
| analysis/rebuild_proof.json | rebuild_ok + build command: the success gate. |
| analysis/changes.diff | The operator-applyable diff. Empty diff with rebuild_ok=true is always a bug. Since Step 42 there is no intent-log fallback; this is the canonical record. |
| analysis/tool_trace.jsonl | Per-turn tool calls + their results, lifecycle events (attempt_start, llm_turn, tool_call, attempt_end, token_budget_exhausted). Cheap inefficiency scan. Does NOT contain assistant message text or reasoning_content; for that, read the session dumps below. |
| analysis/sessions/*.jsonl.gz | Full LLM message transcripts (one file per attempt per job). The single most informative artifact: every assistant turn's content + reasoning_content, every tool_calls entry with full arguments, every tool result message the model actually received. Read these when tool_trace.jsonl isn't enough, i.e. whenever you need to understand why the agent did what it did, not just what tools fired. Available only when DP_HARNESS_DUMP_SESSION was on at run time. See below for record structure + parsing. |
| analysis/proposed_fix.md | Operator-facing recipe written when rebuild_ok=true. |
| analysis/manual_handoff.md | Operator-facing escalation summary written when the job escalates (MANUAL tier, retry cap, budget exhaustion, gave-up). Either this or proposed_fix.md is present, not both. |
| analysis/convert_result.json | Convert phase typed result (when a convert job ran). Carries status, reapply_ok, deferred_patches. |
| analysis/triage_result.json / patch_result.json | Typed phase results (Step 36): root cause + evidence lifted into structured fields, consumed by downstream phases. |
| port/Makefile, port/distinfo, port/pkg-plist | Snapshot of the port at failure time. |
LLM session dumps — structure and parsing
Each analysis/sessions/<timestamp>-<target>-<origin>-<pid>[-<role>].job.attempt<N>.jsonl.gz is the complete messages array as it stood at the end of one tool_loop attempt. The filename embeds:
- the role suffix: -convert, -patch, or no suffix for triage
- attempt number: attempt1, attempt2 (patch can retry, convert/triage are single-attempt)
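The naming scheme above can be decoded mechanically when a bundle carries many session dumps. The following is a minimal sketch, assuming the filename template holds exactly as described; the helper name and the sample filenames in the usage note are hypothetical.

```python
import re

# Assumed from the filename template described above:
# <timestamp>-<target>-<origin>-<pid>[-<role>].job.attempt<N>.jsonl.gz
SESSION_RE = re.compile(
    r"(?:-(?P<role>convert|patch))?\.job\.attempt(?P<attempt>\d+)\.jsonl(?:\.gz)?$"
)


def session_role_and_attempt(filename):
    """Return (role, attempt) for a session dump name, or None if it does not match."""
    m = SESSION_RE.search(filename)
    if m is None:
        return None
    # No role suffix means the dump belongs to a triage job.
    return (m.group("role") or "triage", int(m.group("attempt")))
```

For example, a hypothetical name ending in `-4242-patch.job.attempt2.jsonl.gz` would be read as the second patch attempt, while one ending in `-4242.job.attempt1.jsonl.gz` would be read as triage.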
Decompressed: one JSON object per line, each a message in the LLM-API shape:
{"role": "system", "content": "<system prompt verbatim>"}
{"role": "user", "content": "<initial task payload, plus any failure-context message inserted at attempt N>1>"}
{"role": "assistant", "content": "<text>", "reasoning_content": "<deepseek thinking>", "tool_calls": [...]}
{"role": "tool", "content": "<JSON tool result>", "tool_call_id": "..."}
What each role tells you:
- system (record 0): the exact prompt the agent saw. Verify any prompt-rule reference (e.g. "the agent went to /work/DPorts despite the prompt saying not to") by grepping the system content.
- user (record 1): the assembled payload. Always check len(content); anything >50KB warrants a section-by-section breakdown by ## heading. Common offenders: ## Build Errors (dsynth log tail), ## Port Files (every file under port/ inlined), ## Agent Playbooks (matched playbook content).
- user (record 2+, attempt 2+ only): the failure-context message from attempt_loop._failure_context_message. After Step 42 this is just "Previous attempt #N did not succeed.\nTail of your prior response:\n<last 2KB>\n", with no structured intent-log summary. If the agent re-submits the exact same wrong edit in attempt 2, the lesson is in those last 2KB of attempt-1 prose.
- assistant: the agent's per-turn output. Three fields you care about:
  - content (text the model wrote): usually small except in the final "## Patch Log / ## Rebuild Proof" turn.
  - reasoning_content (deepseek thinking-mode output): often the largest single field. Carried on every subsequent turn per deepseek's contract, so it accumulates quadratically. A 13KB single-turn reasoning blob is ~3K tokens × every remaining turn.
  - tool_calls: array of {id, function: {name, arguments}}. The arguments are JSON-stringified; parse before inspecting.
- tool: the JSON result the model received. Contents match what worker.py returned: {ok, ...}.
  - For materialize_dports: stdout_tail is what the agent reads to decide the apply landed.
  - For validate_dops: {ok, stderr_tail} with E_* error codes on failure.
  - For get_effective_overlay: {target, effective_ops, filtered_out}, i.e. what compose WILL apply vs. ops scoped to other build lines.
  - For put_file: {ok, sha256, ...}.
Fetch + decompress in the same shell pipeline as the rest:
BID=<bundle-id>
# List every session dump in the bundle, then fetch and decompress each one.
"$DPORTSV3" tracker get-bundle "$BID" --jobs --json | python3 -c "
import json,sys
d=json.load(sys.stdin)
for a in d.get('artifacts',[]):
    if 'sessions/' in a['relpath']:
        print(a['relpath'])
" | while read -r rel; do
  out="$(basename "$rel")"
  "$DPORTSV3" tracker fetch-artifact "$BID" "$rel" > "$out" 2>/dev/null
  gunzip -k "$out"  # keep both .gz and decompressed
done
Walking a session (Python, one-shot):
import json

# One JSON message per line; skip blank lines.
with open('<file>.jsonl') as fh:
    recs = [json.loads(l) for l in fh if l.strip()]
print(f"records={len(recs)} system={len(recs[0]['content'])}B user={len(recs[1]['content'])}B")
for i, r in enumerate(recs):
    if r['role'] in ('system', 'user'):
        continue
    if r['role'] == 'assistant':
        tcs = [t['function']['name'] for t in (r.get('tool_calls') or [])]
        rc = r.get('reasoning_content') or ''
        print(f"#{i} ASST text={len(r.get('content') or '')}B reason={len(rc)}B tools={tcs}")
        # show big reasoning blobs
        if len(rc) > 1000:
            print(f"  R: {rc[:400].replace(chr(10), ' ')!r}...")
    elif r['role'] == 'tool':
        # Tool results are JSON; a non-JSON result is reported as ok=None.
        try:
            d = json.loads(r['content'])
            ok = d.get('ok')
        except Exception:
            ok = None
        print(f"#{i} TOOL ok={ok} sz={len(r['content'])}B")
When tool_trace.jsonl is enough vs. when you need session dumps:
- tool_trace alone suffices for: "which tools fired in what order", "did the agent call dsynth_log after build failure", "how many put_file writes hit overlay.dops", basic efficiency scan.
- Need session dumps for: "why did the agent go down path X" (reasoning_content), "what did the agent actually observe in the materialize summary" (full tool result content vs. trace's truncated keys), "did the agent see the failure-context message between attempts" (user-record-2 in attempt 2), "what was the static prompt's exact text" (system record). Anything that requires understanding the agent's mental model needs the session dump.
A session dump may not exist for older bundles or bundles run without DP_HARNESS_DUMP_SESSION=1. analysis/tool_trace.jsonl is always present and is the fallback.
Bulk-fetch the bundle in ONE shell command, not N small calls.
Every Bash invocation costs a permission prompt and a tool turn. The whole analysis should fit in 2–4 Bash calls total:
1. List + pick (1 call):
"$DPORTSV3" tracker list-bundles --origin <port> --limit 5
2. Full bundle dump in one shell pipeline (1 call):
BID=<bundle-id>
echo "===== bundle detail + jobs ====="
"$DPORTSV3" tracker get-bundle "$BID" --jobs --json
for f in meta.txt logs/errors.txt analysis/triage.md analysis/patch.md \
         analysis/patch_audit.json analysis/rebuild_proof.json \
         analysis/changes.diff analysis/tool_trace.jsonl \
         analysis/proposed_fix.md analysis/manual_handoff.md \
         analysis/convert_result.json; do
  echo "===== $f ====="
  "$DPORTSV3" tracker fetch-artifact "$BID" "$f" 2>/dev/null \
    || echo "(absent)"
done
3. Activity log if needed (1 call):
"$DPORTSV3" tracker get-activity --job <job-id> --limit 200
That's it. Three Bash calls cover ~95% of analyses. Do NOT make one call per artifact, one call per job ID, or one call per stage filter; the shell loop above is a single Bash invocation from the permission system's perspective. Reach for more calls only when the first dump leaves a specific gap (e.g. an unexpected stage you want to grep for).
Note: proposed_fix.md and manual_handoff.md are mutually exclusive (one per terminal outcome); convert_result.json is absent on patch-only bundles. The || echo "(absent)" handles all three.
Expected behavior (current code contract)
This section is what the agent should be doing if the code is working. Update it when the code evolves.
Lifecycle
- A failure bundle is written by the dsynth hook. The runner enqueues a triage job.
- Triage job runs dportsv3.agent.triage.run: single LLM call (no tools), may do up to DP_HARNESS_MAX_SNIPPET_ROUNDS (default 5) snippet-extractor rounds.
- Triage emits classification, confidence, root-cause text. config/agentic-policy.json resolves a tier (AUTO / ASSIST / MANUAL).
- Routing is driven by assess_dops (the substrate gate), not by triage classification: convert is a substrate prerequisite for patch and is never skipped/deferred/escalated based on classification.
- AUTO/ASSIST → auto-enqueue patch (and convert first, if substrate isn't dops). MANUAL → stop.
- Patch job runs dportsv3.agent.patch.run → attempt_loop.run up to tier.max_iterations attempts, each driving a tool_loop until the LLM stops requesting tools.
- Success gate: rebuild_proof.json parsed from the LLM's ## Rebuild Proof (JSON) block with rebuild_ok=true. Convert jobs additionally require validate_dops_ok=true.
Tool surface
There is one patch-agent tool surface (Step 42 deleted the edit-intent layer). The patch agent edits ports/<origin>/overlay.dops directly in dops DSL — the same surface convert uses (put_file + validate_dops + dops_reference, reading with grep / get_file) — plus the build-loop tools convert doesn't need (extract, dupe, genpatch, install_patches, dsynth_build, dsynth_log, materialize_dports) and the read-only views emit_diff and get_effective_overlay. System prompt: prompts.PATCH_SYSTEM. Tool list: tools.patch_tool_names() (returns the full registry).
Convert agent: env_verify, list_dir, get_file, put_file, grep, dops_reference, validate_dops. Build-loop tools are deliberately excluded so the model doesn't wander into source exploration. System prompt: prompts.CONVERT_SYSTEM. Tool list: tools.CONVERT_TOOL_NAMES.
Triage agent has no tools — single-turn LLM call with snippet rounds. System prompt: prompts.TRIAGE_SYSTEM.
Check dportsv3/agent/tools.py and dportsv3/agent/prompts.py directly if uncertain.
Substrate contract
- All chroot-bound operations route through dportsv3 dev-env exec <env> -- ....
- Host filesystem ops only on env_dir/writable/... (resolved via dportsv3 dev-env path <env> --writable).
- No git commits, branches, push, or PRs from the loop. The writable overlay is the workspace; analysis/changes.diff is the audit trail. Delivery to upstream (PR open / patch outbox) is a separate operator-triggered phase via dportsv3.delivery.orchestrator.
- Guards collapse to three generic gates: validate_dops / check_dsl (DSL syntax + semantics), assess_dops (substrate-state gate that decides whether patch can proceed), and the _resolve_path path-escape backstop. assert_port_clean (the pre-job clean preflight) runs unconditionally.
Substrate / mode handling
Patch operates ONLY on dops-converted substrate (Step C, commit 0b7ed09fc26 onward, reinforced by Step 42). Compat-shaped ports (Makefile.DragonFly + dragonfly/patch-* without overlay.dops) get converted by the convert agent first; if convert hasn't produced a dops overlay, patch refuses with blocked_by: state:<state> and the runner escalates to MANUAL.
The put_file boundary refuses any write to a dops port's Makefile.DragonFly (relocated there from the old intent translator). Half-migrated substrate (both Makefile.DragonFly AND overlay.dops present) is detected by assess_dops and routed back to convert.
dops grammar (what overlay.dops looks like)
The dops DSL is the source of truth for substrate. The canonical reference is scripts/generator/dportsv3/agent/dops_quickref.md — read it before judging dops edits. Real examples live at ports/devel/readline/overlay.dops, ports/editors/vim/overlay.dops, ports/ports-mgmt/pkg/overlay.dops.
Quick orientation (full grammar in the quickref):
- file.materialize { source = "src/path"; dest = "dest/path" }: copy a file from the dragonfly source tree into the port at compose time. Used for porting in DragonFly-specific replacements of upstream files.
- file.copy { source = "..."; dest = "..." }: copy a file from port-local resources (no dragonfly source tree).
- patch.apply { target = "path/in/wrksrc"; diff = """...""" }: apply an inline unified diff at patch phase. The diff payload is the actual patch content; when context drifts, the diff string inside overlay.dops is what needs editing, not a dragonfly/patch-* file.
- mk set / mk add / mk remove / mk replace-if: Makefile.DragonFly variable directives. Sequential mk set on the same key REPLACES (the engine collapses duplicates); the agent's mental model should not produce duplicate mk set lines.
- mk target set/append: recipe lines for a make target.
- Per-target scoping via target @main / target @2026Q[1-4] / target @any. @any is the right default for fresh overlays; @main filters every op away on quarterly builds (silent-skip bug class, see "Target-mismatch ghost" below).
Bright-line rule: if the upstream-source file lives in the dragonfly source tree, prefer file.materialize; if it's port-local, use file.copy. The convert agent's system prompt teaches this — when judging a convert run, check it followed the rule.
Mode-correctness checks for the analyzer
For any patch or convert bundle, verify in the trace:
- Did the agent read overlay.dops early? (It should; that's the source of truth.) Prefer get_effective_overlay reads over raw get_file overlay.dops reads: on multi-target overlays the effective view filters out ops scoped to other build lines, eliminating manual filtering errors.
- If a patch hunk drifted: did the agent edit the diff = """...""" block inside overlay.dops (correct), or did it put_file to dragonfly/patch-* (wrong: silently clobbered on next reapply; the put_file boundary should refuse this on dops ports, but verify it did)?
- If a patch became obsolete: did the agent remove the corresponding patch.apply / file.materialize block from overlay.dops, or did it put_file an empty dragonfly/patch-*? Only the first is durable.
- Did validate_dops run after every put_file overlay.dops? It should; that's the dops equivalent of "does this even parse." Skipping it and going straight to materialize_dports is a discipline regression.
- Did materialize_dports re-run between the dops edit and dsynth_build? Compose needs to re-render the port tree from the edited overlay before dsynth sees the change.
- For a fresh overlay: does the header carry target @any (correct) or target @main (silent-skip regression, see failure modes)?
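The validate-then-materialize ordering in the checks above can be scanned mechanically from the ordered tool calls. Below is a rough sketch, assuming you have already extracted `(tool_name, raw_arguments)` pairs from the session dump's `tool_calls` entries. It only substring-matches `overlay.dops` in the raw arguments, so it makes no assumption about `put_file`'s argument schema; the function name and issue strings are hypothetical.

```python
def overlay_discipline_issues(calls):
    """calls: ordered (tool_name, raw_arguments) pairs from one attempt.

    Returns (index, issue) pairs for the two ordering regressions:
    materialize without a prior validate, and build without a prior materialize.
    """
    needs_validate = False      # an overlay.dops write has not been validated yet
    needs_materialize = False   # an overlay.dops write has not been composed yet
    issues = []
    for i, (name, raw_args) in enumerate(calls):
        if name == "put_file" and "overlay.dops" in raw_args:
            needs_validate = True
            needs_materialize = True
        elif name == "validate_dops":
            needs_validate = False
        elif name == "materialize_dports":
            if needs_validate:
                issues.append((i, "materialize_dports before validate_dops"))
                needs_validate = False
            needs_materialize = False
        elif name == "dsynth_build" and needs_materialize:
            issues.append((i, "dsynth_build before materialize_dports"))
            needs_materialize = False
    return issues
```

An empty result means the ordering looked disciplined; it says nothing about whether the edits themselves were correct.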
Output contract for operators
- changes.diff must contain the operator-applyable diff. Empty diff with rebuild_ok=true is a contract violation. Since Step 42 there's no intent-log fallback; this single artifact is the canonical record.
- On success, proposed_fix.md is written by dportsv3.agent.proposed_fix and must reference a non-zero diff.
- On escalation (MANUAL tier, retry cap, budget exhaustion, gave-up), manual_handoff.md is written by dportsv3.agent.manual_handoff instead.
Playbook library (Step 27 + Step 42 reframe)
The legacy KEDB (docs/known-errors/) and prompt-embedded pattern content were replaced by a single playbook library at docs/agent-playbooks/ (markdown entries with YAML-subset frontmatter triggers). The selector is dportsv3/agent/playbooks.py::load_playbooks. After Step 42 there are four live filename patterns (three prefixes plus the single flow-patch.md entry):
- error-*.md: triggered by triage classification (triggers.classifications:); attached to triage + patch payloads.
- convert-*.md: attached to convert payloads when triggers.flows contains convert.
- toolchain-*.md: triggered by mechanical toolchain detection on the port Makefile (playbooks.py::detect_toolchains, which parses USES=, GNU_CONFIGURE=, and file-presence signals like CMakeLists.txt, Cargo.toml).
- flow-patch.md: the single consolidated patch playbook (Step 42 collapsed the 12 deleted intent-*.md entries into one). Triggered by flows: [patch]. Carries the durable knowledge previously split per-intent: mk-directive traps, scoping judgment (@any vs @main), the static-patch workflow, broken-patch recovery, PORTREVISION handling.
The intents trigger axis and the per-call intent_reference on-demand attachment path are both gone. Every playbook that fires is attached at payload-build time.
Telemetry signal — playbooks_selected activity row. Every payload build emits one. The row carries role (triage / patch / convert), included (list of entry filenames that fired), skipped_count, and a skipped_sample of up to 8 {file, reason} pairs. Fetch via dportsv3 tracker get-activity --job <id> and filter for event=playbooks_selected. Reasons take shapes like flow:patch-not-in-['convert'], classification:'patch-error'-not-in-['compile-error'], toolchains:no-overlap-with-['autoconf'], or budget:N+M>BUDGET.
What to check in an analysis:
- For a triage/patch bundle: did playbooks_selected fire, and does included include the entries you'd expect given the classification and the port's Makefile? E.g. a patch-error triage on a USES=cmake port should pull error-* entries gated on patch-error plus toolchain-cmake.md. Empty included on a port with a recognized toolchain is a red flag: likely a missing trigger or a find_playbooks_dir() failure.
- Every patch bundle should include flow-patch.md. Its absence on a patch role row points at the flows: [patch] trigger or the selector wiring.
- Convert bundles: did playbooks_selected with role=convert fire and include convert-target-directive.md and convert-classify-patch-domain.md?
- Does the skipped_sample reveal a likely-buggy entry (e.g. a toolchain-cmake.md skipped with toolchains:no-overlap-with-['cmake'] when the port's Makefile clearly has USES=cmake)? That points at detect_toolchains() not seeing the Makefile.
Known parser quirks (smoke-testing surface, mention if you spot symptoms):
- _parse_inline_list only parses inline-form YAML ([a, b]). Block-form list (- a\n - b) silently yields (), which means wildcard in the selector. A new entry that "fires on everything" probably has block-form triggers.
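To make the quirk concrete, here is a hypothetical stand-in that reproduces the described behavior. It is not the real `_parse_inline_list`, whose source is not shown here; it only illustrates why a block-form trigger ends up as an empty tuple.

```python
def parse_inline_list_sketch(value):
    """Hypothetical stand-in: accept only the inline form "[a, b]".

    Anything else, including the empty value left on the key line when the
    list is written in block form, yields ().
    """
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        items = (item.strip().strip("'\"") for item in value[1:-1].split(","))
        return tuple(item for item in items if item)
    return ()
```

With a frontmatter line such as `classifications: [compile-error, patch-error]` the value parses to two entries. With block form, the text after `classifications:` on that line is empty, so the sketch returns `()`, which the selector (per the note above) treats as a wildcard.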
Analysis checklist
For each bundle, work through these questions and write the report against them. Skip ones that don't apply, but say so.
1. Pipeline shape
Before drilling into per-job correctness, sketch the pipeline. The canonical flows are:
- Patch only (substrate already dops): triage → patch. One triage session.
- Convert-then-patch: triage → convert → triage (re-classifies against the converted substrate) → patch. Two triage sessions is expected here, not a bug.
- Convert only (substrate-only fix): triage → convert. No patch session.
- Convert failed: triage → convert (gave up / budget). No patch session.
List every job that ran with dportsv3 tracker get-bundle <id> --jobs --json. The jobs array carries each job's type, state, retire_reason. Two triages with identical classifications is normal (convert didn't change the failure shape); two triages with different classifications is a substrate change worth narrating.
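The four canonical flows can be matched against the jobs array as a first pass. This is a small sketch, assuming the array is in execution order and that each job's `type` is one of the strings `triage`, `convert`, or `patch`; neither assumption is confirmed by the CLI output shown here, so check a real payload first.

```python
# Assumption: job "type" values are the literal strings below.
CANONICAL_FLOWS = {
    ("triage", "patch"): "patch only",
    ("triage", "convert", "triage", "patch"): "convert-then-patch",
    ("triage", "convert"): "convert only or convert failed (check the convert job's state)",
}


def pipeline_shape(jobs):
    """jobs: list of dicts with a 'type' key, in execution order."""
    seq = tuple(job["type"] for job in jobs)
    return CANONICAL_FLOWS.get(seq, "non-canonical: " + " -> ".join(seq))
```

Anything reported as non-canonical is worth narrating in the report instead of being forced into one of the four shapes.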
2. Triage correctness
- Does the classification match what logs/errors.txt actually shows? (e.g. a fetch failure misclassified as compile-error is a triage bug.)
- Is the confidence appropriate?
- Did snippet rounds happen, and were they useful? (Look for snippets/round_N/ artifacts.)
- If two triage rounds ran (pre-convert + post-convert), did classifications change between them? If yes, note why: typically convert promoted compat→dops, which changes the substrate's failure expression.
- Record each triage's token usage; the second triage often duplicates the first when convert didn't change failure shape (cheap re-run, expected).
3. Convert correctness (if a convert job ran)
- Status from analysis/convert_result.json: verified / failed / no_conversion_proof_block. reapply_ok=true means convert produced a valid overlay that compose accepts.
- Verify target @any in the produced overlay. Read analysis/changes.diff or grep the env's overlay.dops for the header. Anything other than target @any is a regression of the post-2026-05-26 fix (commits d71f605c206 + 47846e7a392). @main in a fresh overlay means every op will silently skip at compose against @2026Q2 (per engine/apply.py:296-313).
- Deferred patches: list deferred_patches from convert_result. Each entry says what the dropped framework patch was DOING (intent, not authority). The downstream patch agent should address each one; verify it does.
- Tokens, attempts. Convert's budget is tighter than patch's; a convert that hit budget-exhausted after one validate_dops parse error is a known weak spot.
4. Patch correctness
- Did the agent reach rebuild_ok=true?
- Does the fix actually fix the root cause, or did it bypass the problem? (E.g. removing a patch the agent declared obsolete vs. actually adapting it: both may produce rebuild_ok=true, but only one is right. Cross-check patch.md's reasoning against the upstream code it read.)
- Did the agent edit overlay.dops (correct) or dragonfly/* files directly (wrong; the put_file boundary should refuse, so verify it did)?
- For each put_file overlay.dops, was validate_dops called before the next materialize_dports?
- Did the agent escalate cleanly when blocked by assess_dops substrate gates (e.g. half-migration), or did it keep retrying?
- Turn-to-first-meaningful-edit. Count tool turns from session start to the first put_file that targets overlay.dops (or dragonfly/* on the rare valid case). Floor on a clean success appears to be ~10-15 turns (opening + investigation + dops_reference if writing fresh). Values >20 suggest over-exploration; 0 means the agent never committed to a hypothesis (paralysis, flag it).
- Self-correction. Count edits that were later reversed within the same attempt (e.g. an overlay.dops write replaced by a corrected version after a failed materialize_dports). One self-correction per run is healthy (the agent learned from compose/build feedback). Many self-corrections suggest the agent is thrashing; zero self-corrections paired with budget-exhausted may mean the agent never tried anything concrete enough to fail informatively.
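The turn-to-first-meaningful-edit count can be pulled from a parsed session dump. The sketch below assumes `recs` is the list of message dicts loaded from one attempt's `.jsonl` file, and it substring-matches `overlay.dops` in the JSON-stringified arguments so it does not depend on `put_file`'s exact argument names. It returns `None` for the "never committed" case that the checklist describes as 0.

```python
def turns_to_first_overlay_edit(recs):
    """Count assistant tool-calling turns up to and including the first
    put_file whose arguments mention overlay.dops; None if there is none."""
    turns = 0
    for rec in recs:
        if rec.get("role") != "assistant" or not rec.get("tool_calls"):
            continue
        turns += 1
        for call in rec["tool_calls"]:
            fn = call["function"]
            if fn["name"] == "put_file" and "overlay.dops" in fn["arguments"]:
                return turns
    return None
```

Compare the result against the rough 10-15 turn floor and the >20 over-exploration threshold given above; both are observations from this skill, not hard limits.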
5. Path discipline (scan tool calls)
- Reads of /work/DPorts/<origin>/...? Per prompts.PATCH_SYSTEM the agent may NOT read from this path: it's the LOCK ROOT, last-known-good, and will disagree with extract output. Note every occurrence.
- Reads of /xports/... or any other chroot-internal path that isn't under /work/? Tools fail with ValueError: path must be under /work (worker.py _resolve_path). One occurrence = honest mistake; multiple = the agent didn't read the build log's path notation correctly.
- Hand-constructed /work/obj/<origin>/... paths that didn't come from extract's wrksrc field? The prompt explicitly forbids constructing these. Compare against extract's wrksrc return value.
- Host-side path leaks: tools returning host paths (e.g. /root/.cache/dports-dev/...) that the agent then passes to chroot-path-expecting tools. genpatch's output_dir return was a known case.
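A quick way to apply the four checks above is to bucket every path the agent passed to a tool. This is a sketch only: the bucket labels are hypothetical, and it assumes `wrksrc` is the string that `extract` returned for the current attempt.

```python
def classify_path(path, wrksrc=None):
    """Bucket one tool-call path according to the path-discipline checks."""
    if path.startswith("/work/DPorts/"):
        return "lock-root read"
    if path.startswith("/work/obj/"):
        # Only paths at or below extract's wrksrc count as legitimately derived.
        if wrksrc and (path == wrksrc or path.startswith(wrksrc.rstrip("/") + "/")):
            return "ok"
        return "hand-constructed /work/obj path"
    if path.startswith("/work/"):
        return "ok"
    if path.startswith("/"):
        # Covers /xports/... and leaked host paths such as /root/.cache/...
        return "outside /work"
    return "relative"
```

Count occurrences per bucket across the whole attempt: a single stray `outside /work` hit reads as an honest mistake, while repeated hits match the "didn't read the path notation" pattern described above.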
6. Materialize cycle signal (P0a/P0b regression check)
For every materialize_dports call (each attempt typically has 1-3), check the summary: line in stdout_tail:
- applied=N>0: ops actually applied to the compose tree. Healthy.
- applied=0 with skipped>0: ops were filtered by target mismatch. Expect the I_COMPOSE_DOPS_ALL_OPS_SKIPPED warning in the same stage line (per commit 663a8eae819). If the warning is present, surface it; the agent should see it too. If applied=0 skipped>0 but the warning is absent, that's a P0b regression.
- applied=0 with errors>0: at least one op failed (parser, executor error). Distinct from skipping; check dops_failed_op_results in compose report.
- applied=N>0 and skipped=0 on a freshly-created overlay confirms P0a is functioning (the convert / patch agents emit target @any headers per commits d71f605c206 / 47846e7a392).
If you see target @main in any agent-emitted fresh overlay, that's a P0a regression — flag immediately.
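The summary-line cases above reduce to a small decision function. This sketch assumes the counts appear in `stdout_tail` as `applied=N`, `skipped=N`, and `errors=N` tokens, which is inferred from the notation used in this section and should be checked against a real materialize result. When both `errors` and `skipped` are non-zero it reports the error first, which is an arbitrary choice.

```python
import re

WARNING_CODE = "I_COMPOSE_DOPS_ALL_OPS_SKIPPED"


def classify_materialize(stdout_tail):
    """Map one materialize_dports stdout_tail to a health verdict."""
    # Assumed token format; if several summaries appear, the last one wins.
    counts = {k: int(v) for k, v in re.findall(r"\b(applied|skipped|errors)=(\d+)", stdout_tail)}
    if not counts:
        return "no summary found"
    applied = counts.get("applied", 0)
    skipped = counts.get("skipped", 0)
    errors = counts.get("errors", 0)
    if applied > 0:
        return "healthy"
    if errors > 0:
        return "op failure"
    if skipped > 0:
        if WARNING_CODE in stdout_tail:
            return "target-mismatch skip (warning present)"
        return "possible P0b regression (skip without warning)"
    return "nothing applied"
```

Run it over every `materialize_dports` result in the attempt and report the sequence of verdicts, since a healthy final call can hide an earlier silent skip.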
7. Build verification (after dsynth_build)
- After dsynth_build returns rebuild_ok=true, did the agent verify by grepping the extracted source for the original error symptom, or rely on the tool exit code alone? For deterministic failures (linker duplicate symbol, missing-include compile error), trusting exit code is fine. For symptom classes where multiple bug sites can produce the same error (e.g. __result undefined in N headers), grep-the-symptom catches incomplete fixes.
- After dsynth_build failed, did dsynth_log immediately follow? The prompt says to call it immediately on build failure. If the agent went back to exploring instead, that's a discipline regression.
- Did the agent run dsynth_build at least once? budget-exhausted with zero build calls is the worst possible signal: the agent never tested anything, never learned from substrate feedback, just analyzed.
8. Output contract
- Is analysis/changes.diff non-empty when rebuild_ok=true?
- On success: is proposed_fix.md present and does it reference the diff?
- On escalation: is manual_handoff.md present? rebuild_proof.json should also exist with rebuild_ok=false and a reason; its absence on a terminal-failure bundle is a regression (see failure modes).
- Does the diff actually match what patch.md says was changed?
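The presence checks in this list can be run in one pass over the fetched artifacts. The sketch below assumes a terminal bundle and a dict mapping artifact relpaths to their text, with absent artifacts simply missing from the dict; the function name and message strings are hypothetical. It cannot judge whether the diff matches patch.md, which still needs a human read.

```python
def output_contract_violations(rebuild_ok, artifacts):
    """artifacts: {relpath: text} for the artifacts that exist in the bundle."""
    issues = []
    diff = (artifacts.get("analysis/changes.diff") or "").strip()
    has_fix = "analysis/proposed_fix.md" in artifacts
    has_handoff = "analysis/manual_handoff.md" in artifacts
    if rebuild_ok and not diff:
        issues.append("empty changes.diff with rebuild_ok=true")
    if rebuild_ok and not has_fix:
        issues.append("proposed_fix.md missing on success")
    if not rebuild_ok and not has_handoff:
        issues.append("manual_handoff.md missing on escalation")
    if has_fix and has_handoff:
        issues.append("proposed_fix.md and manual_handoff.md both present")
    if "analysis/rebuild_proof.json" not in artifacts:
        issues.append("rebuild_proof.json missing")
    return issues
```

An empty list only means the artifacts are present in the expected combination; the content checks above still apply.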
9. Efficiency — quantitative breakdown
Don't say "the loop was expensive". Break it down:
- Static prompt cost. Sizes of messages[0] (system) + messages[1] (initial user). Multiply by turn count for the cumulative static cost, since both are re-sent on every turn. On a clean ASSIST run this is typically 35-50% of total tokens; on bloated cases (>50KB user prompt) it dominates.
- User prompt composition. When the user prompt is >50KB, break it by ## section heading and flag oversized sections (>10KB) the agent never read via subsequent tool calls. Common offender: ## Port Files, which inlines every file under port/; the agent has get_file and can pull on demand.
- Reasoning_content accumulation. Sum reasoning_content byte sizes across all assistant turns. Note single-turn outliers (>5KB is a "thinking hard" turn: fine on hard ports, suspicious on simple ones). Deepseek thinking-mode requires reasoning_content carry on every subsequent turn, so this accumulates quadratically.
- Tool result carry. Identify the top-3 biggest tool returns by byte size. Each dsynth_log is ~10-16KB. A 16KB result carried across 10 subsequent turns is ~40K tokens. Also watch get_effective_overlay on large multi-target overlays.
- Completion. Usually small unless the agent wrote a long Patch Log.
- Sum the per-turn prompt sizes (from the llm_turn activity events if present, or estimate as static + cumulative reasoning + cumulative tool results). Compare to patch_result.tokens_total; a big discrepancy may indicate the trace is missing entries.
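When llm_turn events are missing, the estimate described in the last item can be approximated from the session dump. This sketch assumes each assistant turn's prompt is the sum of every earlier record's `content` and `reasoning_content`, ignores `tool_calls` argument sizes, and uses character counts as a byte proxy. The 4-bytes-per-token ratio is only a rough conversion, consistent with the figures quoted in this document (13KB ≈ 3K tokens).

```python
def estimate_prompt_sizes(recs):
    """Return the estimated prompt size (in characters) seen by each assistant turn."""
    sizes = []
    carried = 0  # everything already in the message array
    for rec in recs:
        if rec.get("role") == "assistant":
            sizes.append(carried)
        carried += len(rec.get("content") or "") + len(rec.get("reasoning_content") or "")
    return sizes


def estimate_prompt_tokens(recs, bytes_per_token=4):
    """Rough cumulative prompt tokens across all turns (assumed 4 bytes/token)."""
    return sum(estimate_prompt_sizes(recs)) // bytes_per_token
```

Because every earlier record is counted again on each later turn, the sum grows roughly quadratically with turn count, which is the effect the reasoning_content and tool-result items above warn about.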
Other efficiency checks:
- Redundant tool calls, e.g. multiple emit_diff calls in a row, or materialize_dports called twice when once would do.
- Did the agent re-read files it had already read?
- Did it call tools with the wrong args (e.g. passing origin where relpath was expected)?
- Did the agent use raw get_file overlay.dops on a multi-target port instead of get_effective_overlay? On wide overlays this can double the per-turn carry.
10. Playbook coverage (Step 27)
- Did `playbooks_selected` fire for each role this bundle ran? (triage always; patch if it reached patch; convert if it's a convert bundle.)
- Does `included` look right for the bundle's classification + detected toolchains? Empty `included` on a port with recognized `USES=` is a red flag.
- For every patch bundle: is `flow-patch.md` in `included`? Its absence is a Step-27/42 wiring regression.
- Does the `skipped_sample` reveal a likely-buggy entry (e.g. a `toolchain-cmake.md` skipped with `toolchains:no-overlap-with-['cmake']` when the port's Makefile clearly has `USES=cmake`)? That points at `detect_toolchains()` not seeing the Makefile.
- Caveat: the text-format `playbooks_selected` activity row only shows counts (`included=N skipped=M`), not filenames, which means the "does `included` look right" check above is unverifiable from the default `get-activity` output. To actually verify which entries fired, pull raw JSON: `dportsv3 tracker get-activity --job <id> --json` and inspect the row's `payload` field for the filename list. If `--json` doesn't expose it either, fall back to `curl http://<tracker>/api/jobs/<id>/activity` and grep for `playbooks_selected`. If no surface exposes the filenames, flag as a tracker feature gap and treat the count as a black box.
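
Once the raw JSON is in hand, the checks above can be sketched like this. The row layout is an assumption: this sketch guesses that each activity row carries an `event` name and a `payload` with `role` and `included` keys, which the tracker may not actually expose in that form. Adjust the key names after inspecting a real row.

```python
def check_playbooks(rows: list[dict], roles_ran: list[str]) -> list[str]:
    """Return findings for missing events, empty selections, and flow-patch.md.

    Assumed row shape (hypothetical):
      {"event": "playbooks_selected",
       "payload": {"role": "patch", "included": ["flow-patch.md", ...]}}
    """
    included_by_role: dict[str, list[str]] = {}
    for row in rows:
        if row.get("event") != "playbooks_selected":
            continue
        payload = row.get("payload", {})
        included_by_role[payload.get("role")] = payload.get("included", [])

    findings = []
    for role in roles_ran:
        if role not in included_by_role:
            findings.append(f"{role}: playbooks_selected event missing")
            continue
        included = included_by_role[role]
        if not included:
            # Only a red flag when the port has recognized USES= entries.
            findings.append(f"{role}: included is empty")
        if role == "patch" and "flow-patch.md" not in included:
            findings.append("patch: flow-patch.md not in included")
    return findings


rows = [
    {"event": "playbooks_selected",
     "payload": {"role": "triage", "included": ["toolchain-cmake.md"]}},
    {"event": "playbooks_selected",
     "payload": {"role": "patch", "included": ["toolchain-cmake.md"]}},
]
print(check_playbooks(rows, ["triage", "patch", "convert"]))
```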
11. Lifecycle hygiene
- Always run `list-bundles --origin <category/port> --limit 10` for the port even if the current bundle looks like a clean one-shot. It's one cheap CLI call and gives ground truth about prior agentic activity on this port. The `port/STATUS` file (when present in the bundle snapshot) is a compat-era artifact that convert deletes; it can hint at upstream version history but is not a substitute for the tracker query, and it disappears entirely on already-dops ports.
- Was this port previously bundled? If so, did the loop converge (older bundles `accepted`/`verified`) or thrash (multiple `dead`/`budget-exhausted` over time)?
- If MANUAL tier: was the classification one that should have been AUTO/ASSIST?
- `assert_port_clean` is now unconditional. If the patch job aborted with a clean-check failure, that's an env-state problem (uncommitted edits from a prior run, chroot in unknown state), not a patch agent bug.
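
The converge-or-thrash question can be reduced to a small classifier over the bundle states returned by `list-bundles`. This is a sketch under assumptions: the input is a plain list of state strings, "multiple" is read as two or more, and the `converged` label is an addition for illustration. None of these thresholds come from the tracker itself.

```python
GOOD_STATES = {"accepted", "verified"}
BAD_STATES = {"dead", "budget-exhausted"}


def loop_behavior(states: list[str]) -> str:
    """Classify a port's bundle history from its bundle states.

    Thresholds are assumptions: two or more dead/budget-exhausted bundles
    count as thrashing; any accepted/verified bundle otherwise counts as
    converged.
    """
    if len(states) <= 1:
        return "one-shot"
    bad = sum(1 for state in states if state in BAD_STATES)
    if bad >= 2:
        return "thrashing"
    if any(state in GOOD_STATES for state in states):
        return "converged"
    return "retrying"


print(loop_behavior(["dead", "budget-exhausted", "dead"]))
print(loop_behavior(["accepted", "dead"]))
```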
Known failure modes (extend this list)
Patterns seen in the wild. When you see a new one, append it here and flag it in your report's "Skill update suggestions" section so the operator folds it in.
- Empty `changes.diff` with `rebuild_ok=true`. Agent edited files inside the writable overlay but the runner's diff capture didn't pick it up. Suspected causes: overlay isn't a git working tree, or the diff scope path is wrong. Operator gets "agent fixed" with nothing to land. Always flag this as a bug. Confirmed historically for dops-mode `put_file` writes to `overlay.dops` (`devel_gperf-20260523-094119Z`) and for freshly-created `overlay.dops` files (`multimedia_v4l_compat-20260523-101601Z`); the bug fires on any `put_file` write the overlay's baseline misses. Verify on every current bundle that the diff is non-empty whenever a `put_file` against `overlay.dops` succeeded.
- Agent passes origin where `emit_diff` wants a relpath. Tool signature is `emit_diff(env, origin, relpath)`; the LLM sometimes passes only `origin` (e.g. `"devel/gperf"`). Either the prompt is unclear or the schema is.
- Agent proceeds against a substrate `assess_dops` flagged as not-yet-converted. The runner should refuse `state:<state>` and escalate; if the trace shows the patch agent continuing past such a block it's a runner/policy bug.
- Agent declares a patch "obsolete" based on shallow upstream inspection. Removing a patch and getting a green dsynth is not proof the patch was actually obsolete; it may have addressed a runtime or platform-specific issue dsynth doesn't catch. Flag when the agent's justification is thin.
- Wasted `get_file` turn from a mis-guessed offset. When inspecting a C source file for include directives, the agent sometimes first reads from a non-zero offset and then re-reads from the top, burning a turn. Steer the agent toward `grep` for include-presence checks instead of speculative offset reads.
- `rebuild_proof.json` missing on budget-exhaustion / give-up bundles. Expected per the success-gate contract: even on terminal failure the runner should emit a proof JSON with `rebuild_ok=false` and a reason. Operator skimming the artifact list can't distinguish a clean "agent gave up" from "agent crashed mid-attempt." Flag whenever `patch_audit.json` reports a terminal status but `rebuild_proof.json` is absent.
- Knowledge gap: `.for`-parsed Makefile list variables and value-with-spaces. Variables like `BINARY_ALIAS`, `MAKE_ENV`, `PLIST_SUB` are iterated by `.for var1 var2 in ${VAR}`, which tokenizes on whitespace and expects N words per row. A value with an embedded space (e.g. `BINARY_ALIAS=gmd5sum=md5 -r`) produces the compose error `Wrong number of words (N) in .for substitution list with M variables`. The agent typically misdiagnoses by toggling flags rather than escaping the value or switching to a wrapper script. Worth a dedicated `error-for-substitution-list.md` playbook.
- Attempt-boundary amnesia. When attempt 1 fails on a specific value and attempt 2 receives a fresh context, the agent sometimes re-emits the exact same edit that already failed in attempt 1. After Step 42 the failure-context message is just the last 2KB of attempt-1 prose, so if attempt 1's prose didn't surface the specific edit that broke, attempt 2 has no signal. Flag whenever attempt N writes the same `overlay.dops` content as attempt N-1.
- Premature `materialize_dports` on the consumer origin before the provider overlay is activated. When a port uses `MASTERDIR` (or otherwise shares compose artifacts with a sibling origin), the agent sometimes materializes the consumer origin immediately after writing the dops overlay for the provider, before materializing the provider itself. Compose runs against the wrong origin, shows `modes: compat=1`, and the wasted call is only caught because the agent then self-corrects with a second call to the right origin. Seen in `multimedia_v4l_compat-20260523-101601Z`. Prompt should steer the agent to always `materialize_dports` the origin that owns `overlay.dops` first.
- Target-mismatch ghost (`target @main` in fresh overlay → all ops silently skipped). When an agent emits a fresh `overlay.dops`, the header MUST be `target @any`. Compose runs against `@2026Q2` (or whatever the env's build target is); per `engine/apply.py:296-313` every op with `target=@main` is filtered with `status="skipped"` and an `info`-level `I_APPLY_TARGET_MISMATCH` diagnostic that didn't bubble to stage output. The `summary applied=0` reads as "patch didn't take" and the agent typically diagnoses it as a compose bug, burning hundreds of thousands of tokens chasing the ghost. Mitigated by commit `663a8eae819` (compose stage warning `I_COMPOSE_DOPS_ALL_OPS_SKIPPED`). When analyzing a bundle: verify the fresh overlay has `target @any` and that the warning fires on any dead overlay. Confirmed historically in skalibs / libfyaml / gnome_subr 20260601 bundles.
- Analysis paralysis: 0 meaningful edits, 0 `dsynth_build` calls, full ASSIST budget consumed. Agent investigates indefinitely without committing a hypothesis. Often correlated with: (a) a complex `## Deferred from Convert` section that invites verdict-first investigation, (b) a port class where the agent can't easily map the bug to a dops edit shape, (c) the prompt's "4+ tool calls without an edit = drifting" rule failing to fire (it's soft, no enforcement). The agent never gets concrete substrate feedback because it never tested anything. Reasoning_content can hit 50K+ chars total across the session. Confirmed historically in `lang_python311-20260601-222113Z`. Flag: count of `put_file overlay.dops` and count of `dsynth_build` calls; both zero with `budget-exhausted` is the signature.
- Static-prompt bloat from `## Port Files` section. The runner inlines every file under `port/<origin>/` into the user prompt regardless of whether the agent will read it. On ports with many `files/patch-*` + a giant pkg-plist (python311's was 533KB; the inlined section was 48KB of a 96KB user prompt), this section can dominate the static prompt and re-ship 10-12K tokens per turn for files the agent never queries. The agent has `get_file` and can pull on demand; pre-emptive inlining pays a quadratic cost. Confirmed historically in `lang_python311-20260601-222113Z`.
- Raw `get_file overlay.dops` on multi-target overlays instead of `get_effective_overlay`. The raw read returns ops for every target; the agent then has to manually filter by the env's compose target. On wide overlays this is both error-prone (silent inclusion of out-of-scope ops in the agent's mental model) and expensive (per-turn carry of unscoped ops). Patch-flow tool `get_effective_overlay` returns the filtered view as structured data. Flag any patch trace that reads raw and never reads effective.
- Proof-block orphan: successful build but budget cut off before the proof JSON was written. Distinct from plain `budget-exhausted`: `dsynth_build` returned `rebuild_ok=true` and the agent attempted one more tool call (typically `emit_diff` or a final `get_file`), but the budget check after the LLM turn refused dispatch. The runner then synthesizes a `rebuild_ok=false` proof, writes `manual_handoff.md` instead of `proposed_fix.md`, and classifies the run as `budget-exhausted`, misrepresenting a correct fix as a failure. Detection signature: `rebuild_proof.json.synthetic=true` AND `changes.diff` is non-empty AND `manual_handoff.md` exists AND the tool_trace shows a successful `dsynth_build` followed by one more tool call. When you see this, surface it as `[high]` and tell the operator the fix is actually applicable; the handoff document will be misleading because its "Last Failing Build" section reflects the pre-success state, not the actual final outcome. Confirmed in `devel_nspr-20260606-001249Z` (budget overrun ~59K tokens, ~5% over a 1.2M ceiling). Calibration signal: patch-error bundles with N stale patches appear to need ~N × headroom; the current ASSIST budget may be undersized for multi-patch decay.
- `genpatch` → `install_patches` always fails. `genpatch` returns an `output_dir` value that is a chroot-internal path (`/root/.cache/dports-dev/envs/<env>/writable/work/genpatch-out`). `install_patches` then looks for files at that path host-side, where it does not exist, and raises `FileNotFoundError`. Every trace that calls `install_patches` after `genpatch` will hit this. The correct post-`genpatch` sequence is `get_file <patch_path under wrksrc>` → `put_file dragonfly/patch-<...>` to write the patch directly into the repo. Agents that find this on their own typically burn ~2 turns + ~50K tokens recovering; agents that don't may loop. Flag any trace containing `install_patches` and note whether the agent self-corrected. Real fix is either changing `genpatch`'s return to the host-side sidecar path, or removing `install_patches` from the tool surface entirely. Confirmed in `devel_nspr-20260606-001249Z` turns 8-10.
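
Two of the detection signatures above are mechanical enough to sketch as predicates. The inputs are assumptions for illustration: the trace is taken to be a list of `{"tool": ..., "args": {...}}` entries with `rebuild_ok` stored on the `dsynth_build` entry, and the `relpath` argument name is hypothetical. The real artifacts may lay this out differently.

```python
def is_proof_block_orphan(
    proof: dict, diff_bytes: int, artifacts: set[str], trace: list[dict]
) -> bool:
    """Synthetic proof + non-empty diff + handoff + good build then one more call."""
    # Index of the last successful dsynth_build, if any.
    last_ok_build = None
    for index, call in enumerate(trace):
        if call["tool"] == "dsynth_build" and call.get("rebuild_ok") is True:
            last_ok_build = index
    # "Followed by one more tool call": the good build is second-to-last.
    build_then_one_more = (
        last_ok_build is not None and last_ok_build == len(trace) - 2
    )
    return (
        proof.get("synthetic") is True
        and diff_bytes > 0
        and "manual_handoff.md" in artifacts
        and build_then_one_more
    )


def is_analysis_paralysis(status: str, trace: list[dict]) -> bool:
    """budget-exhausted with zero overlay.dops writes and zero dsynth_build calls."""
    overlay_edits = sum(
        1
        for call in trace
        if call["tool"] == "put_file"
        and call.get("args", {}).get("relpath") == "overlay.dops"
    )
    builds = sum(1 for call in trace if call["tool"] == "dsynth_build")
    return status == "budget-exhausted" and overlay_edits == 0 and builds == 0


# Hypothetical trace: edit, successful build, then one trailing call.
orphan_trace = [
    {"tool": "put_file", "args": {"relpath": "overlay.dops"}},
    {"tool": "dsynth_build", "rebuild_ok": True},
    {"tool": "emit_diff"},
]
print(is_proof_block_orphan(
    {"synthetic": True}, 512, {"manual_handoff.md", "changes.diff"}, orphan_trace
))
print(is_analysis_paralysis("budget-exhausted", orphan_trace))
```

A positive orphan match is the case to surface as `[high]`; a negative result does not clear the bundle, since the trace layout assumed here may not match what the runner records.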
Historical (pre-Step-42) failure modes
These were intent-flow-specific (the deleted edit-intent layer: apply_intent/intent_reference tools, intent_log.json artifact, per-directive intent renderers and schemas, intent-*.md playbooks). They no longer apply to current bundles, but if you're analyzing an archived bundle from before commit 3788ed20b58 (2026-06-06) you may still see these patterns. The list is also preserved as forensic context for why the layer was deleted.
- Intent-flow: agent retries past `intent_log_full=True`. Should have escalated to MANUAL; the log had hit either its count cap or its byte cap.
- Intent-flow: agent ignores `blocked_by: substrate_invariant`. Half-migrated substrate had to be resolved by an operator or convert agent before patch intents could land; ignoring the block kept getting refused.
- Intent-flow: agent ignores `blocked_by: transaction_mode_drift`. Once the first `apply_intent` pinned `mode_at_apply`, subsequent calls in a different mode were refused.
- Intent-flow: substrate_diff disagrees with the rendered changes.diff. Concat of ok=true `substrate_diff` values should have equalled `changes.diff`; drift meant either the diff accumulator was buggy or the canonical-log path was bypassed (`canonical_log_broken=true`).
- Intent-flow: patch job aborted with `patch_preflight_dirty` or `patch_preflight_error`. Now subsumed under the unconditional `assert_port_clean` preflight: same diagnosis, the env had uncommitted edits or was in an unknown state.
- Intent-flow: `change_makefile op=set` appended duplicate `mk set` lines. Two sequential `set` intents on the same key produced additive substrate_diffs and a final overlay with duplicate `mk set KEY` lines; the agent rationalized "last set wins" in `patch.md`. Confirmed in `databases_redis-20260526-205826Z`. One of the structural reasons the layer was deleted.
- Intent-flow: `add_patch` for a wrksrc-only target shipped the wrong overlay shape. Should have been `add_file kind=materialize`; agent typically self-corrected after 2-3 turns. Confirmed in `devel_libuv-20260601-222117Z`.
- Intent-flow: `drop_patch` left the patch file orphaned on disk. Removed the overlay reference but not the file under `dragonfly/`; a subsequent `add_patch` with the same target failed with "patch already exists." Confirmed in `devel_libuv-20260601-222117Z`.
- Intent-flow: pre-emptive `intent_reference` batching. Agent fetched references for 3-5 intent types upfront before committing to any. Each unused reference was 3-6KB of context carry. Confirmed in `devel_libuv-20260601-222117Z`.
- Step-27 telemetry: `playbooks_selected` activity rows missing. Confirmed absent in `databases_redis-20260526-205826Z`. Cause traced to `queue_root` handling in `_log_playbook_selection`. Worth a spot-check on current bundles too; if `playbooks_selected` is still missing post-Step-42 the same regression survived.
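
When reading an archived overlay, the duplicate `mk set KEY` pattern can be spotted with a quick scan. This sketch only assumes that such lines start with the words `mk set` followed by the key; the value syntax in the sample is made up for illustration and is not taken from the dops grammar.

```python
from collections import Counter


def duplicate_mk_set_keys(overlay_text: str) -> list[str]:
    """Return keys that appear in more than one `mk set KEY ...` line."""
    keys = []
    for line in overlay_text.splitlines():
        words = line.split()
        # Assumed shape: "mk set KEY <value...>"; anything else is ignored.
        if len(words) >= 3 and words[0] == "mk" and words[1] == "set":
            keys.append(words[2])
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical overlay content; the quoting of values is invented.
overlay = """target @any
mk set CFLAGS "-O2"
mk set LDFLAGS "-lm"
mk set CFLAGS "-O2 -fPIC"
"""
print(duplicate_mk_set_keys(overlay))
```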
Report shape
Produce something like this (markdown, no fluff):
# Agentic analysis — <origin> (<bundle-id>)
## Summary
<2-3 sentences: what the agent tried to do, did it land, is the result trustworthy>
## Pipeline
<which jobs ran, in order, with state>
- triage-1: <classification> (<tokens>)
- convert: <status> (<tokens>) ← if convert ran
- triage-2: <classification> (<tokens>) ← if convert ran; same/different from triage-1?
- patch: <status>, attempts=N (<tokens>)
## Triage
- Classification: <X> (confidence: <Y>) — <assessment: correct / questionable / wrong>
- Root cause as stated: <quote>
- Actual root cause from logs: <if different>
- Round changes: <if 2 triages, same/different conclusion, why>
## Convert (if convert ran)
- Status: <verified / failed / no-op>
- reapply_ok: <true / false>
- Overlay target directive: <@any (correct) / @main (REGRESSION) / other>
- Deferred patches: <count> [<paths>]
- Tokens: <prompt / completion / total>
## Patch
- Status: <success / needs-help / budget-exhausted / blocked-by-substrate>
- Attempts: <N> / tier max
- Tokens: <prompt / completion / total>
- Tool sequence: <one-line summary of the trace>
- overlay.dops edits: <N put_file writes, each followed by validate_dops? Y/N>
- Fix narrative: <what patch.md claims>
- Fix verdict: <is the fix real?>
## Per-bundle metrics
- Turn-to-first-meaningful-edit: <N> (floor ~10-15 on clean success; 0 = paralysis)
- overlay.dops edits emitted: <N> / reversed mid-attempt: <M> (self-correction count)
- dsynth_build calls: <N> (must be ≥1 for any meaningful run)
- After dsynth_build success/fail, verification approach: <tool exit code only / grep extracted source for symptom / no build run>
## Materialize signal (P0a/P0b regression check)
- materialize_dports calls: <N>
- Each call's summary: <applied=N skipped=M errors=K, warnings present>
- I_COMPOSE_DOPS_ALL_OPS_SKIPPED: <absent on every materialize (healthy) / present on N calls (overlay target mismatch)>
- Any fresh overlay with `target @main`? <yes/no> (yes = P0a regression — flag immediately)
## Path discipline
- /work/DPorts/<origin> reads: <count> (forbidden; lock root)
- /xports/ or other non-/work paths passed to chroot tools: <count>
- Hand-constructed /work/obj paths not derived from extract.wrksrc: <count>
- Host-side path leaks (e.g. genpatch output_dir confusion): <count>
- Raw `get_file overlay.dops` on a multi-target port (should use get_effective_overlay): <count>
## Output contract
- changes.diff: <bytes> — <ok / empty-bug / mismatched>
- proposed_fix.md: <present and references diff / present but stale / absent on success (bug)>
- manual_handoff.md: <present on escalation / absent on escalation (bug)>
- rebuild_proof.json: <present / absent on terminal failure (bug)>
## Playbooks (Step 27 + Step 42)
- Triage: included=<list or "—">, skipped=<count> — <looks right / suspicious / missing event>
- Patch: included=<list or "—">, skipped=<count>, flow-patch.md present? <yes/no>
- Convert (if convert bundle): included=<list>, skipped=<count> — <…>
## Token shape
- Static prompt (system + user): <bytes> ≈ <tokens>/turn × <N> turns = <subtotal>
- User prompt composition (if >50KB): break down by `## section`, flag sections >10KB the agent never read
- Reasoning_content cumulative: <chars>; single-turn outliers >5KB: <list>
- Top-3 biggest tool returns: <name, bytes>
- Estimated breakdown by source vs. actual total from `patch_result`: <static/reasoning/tool_carry/completion percentages>
## Inefficiencies
- <bullets>
## Bugs / contract violations
- <bullets, each tagged with severity: low / medium / high>
## Lifecycle context
- Prior bundles for this port: <count, dates>
- Loop behavior: <one-shot / retrying / thrashing>
## Skill update suggestions
- <anything this analysis surfaced that this SKILL.md should have warned about>
Keep it terse. The operator skims this. Per-bundle metrics + materialize signal + path discipline are the minimum data set — every bundle gets these even when nothing surprising shows up, so we can spot drift over time.
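
A small self-check for the minimum data set, useful when the report comes back from a subagent. It only looks for the three heading prefixes named above; the function name is hypothetical and nothing here validates the content under each heading.

```python
REQUIRED_SECTIONS = (
    "## Per-bundle metrics",
    "## Materialize signal",
    "## Path discipline",
)


def missing_sections(report: str) -> list[str]:
    """Return required section headings that the report does not contain."""
    headings = [
        line.strip() for line in report.splitlines() if line.startswith("## ")
    ]
    # Prefix match so "## Materialize signal (P0a/P0b regression check)" counts.
    return [
        required
        for required in REQUIRED_SECTIONS
        if not any(heading.startswith(required) for heading in headings)
    ]


report = """# Agentic analysis
## Summary
## Per-bundle metrics
## Path discipline
"""
print(missing_sections(report))
```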
Delegating to a subagent
If the bundle is large or you have several to analyze, spawn a sonnet 4.6 subagent:
Agent({
subagent_type: "general-purpose",
model: "sonnet",
description: "Analyze agentic bundle <id>",
prompt: "<paste the Expected behavior + Known failure modes + Report shape sections from
.claude/skills/dportsv3-agentic-analysis/SKILL.md, plus the bundle ID and tracker
base URL>. Fetch the artifacts via /api/bundles/<id>/artifacts/<path> with curl
(HTTP only, no HTTPS upgrade). Produce the report in the shape specified."
})
Read the returned report, sanity-check the bug claims against the raw artifacts if anything looks wrong, then hand it to the user.