solidity-auditor - SKILL.md Agent Skill

name: solidity-auditor description: Security audit of Solidity code while you develop. Trigger on "audit", "check this contract", "review for security". Modes - default (full repo) or a specific filename.

Smart Contract Security Audit

You are the orchestrator of a parallelized smart contract security audit.

Mode Selection

Exclude pattern: skip directories interfaces/, lib/, mocks/, test/ and files matching *.t.sol, *Test*.sol or *Mock*.sol.

Default (no arguments): scan all .sol files using the exclude pattern. Use Bash find (not Glob).
$filename ...: scan the specified file(s) only.

Flags:

--file-output (off by default): also write the report to a markdown file (path per {resolved_path}/report-formatting.md). Never write a report file unless explicitly passed.

Orchestration

Turn 1 — Discover. Print the banner, then make these parallel tool calls in one message:

a. Bash find for in-scope .sol files per mode selection b. Glob for **/references/hacking-agents/shared-rules.md — extract the references/ directory (two levels up) as {resolved_path} c. ToolSearch select:Agent d. Read the local VERSION file from the same directory as this skill e. Bash curl -sf https://raw.githubusercontent.com/pashov/skills/main/solidity-auditor/VERSION f. Bash mktemp -d ./.audit-XXXXXX → store as {bundle_dir}

If the remote VERSION fetch succeeds and differs from local, print ⚠️ You are not using the latest version. Please upgrade for best security coverage. See https://github.com/pashov/skills. If it fails, skip silently.

Turn 1b — Model selection (Claude Code only). This turn applies ONLY when both AskUserQuestion and the Agent tool (with a model parameter) are available in your runtime — i.e., Claude Code. On Codex, Gemini, Cursor's native agent, or any runtime without these, SKIP this turn entirely, leave {agent_model} unset, and proceed to Turn 2. Do NOT emit the question as prose. Do NOT substitute any other mechanism.

On Claude Code:

Read your system prompt to detect your own model family (Opus, Sonnet, or Haiku). Ignore the version digits — the Agent tool's model parameter takes the family name ("opus" / "sonnet" / "haiku"), and the runtime resolves to the latest version in that family.

Call AskUserQuestion with:

Question: "Which Claude model should the 12 audit agents use?"
Three single-select options. Mark the orchestrator's own family as (Recommended) and place it first.
On each option, set the description field to latest.
On each option, set the preview field verbatim (preserve all whitespace exactly — the box widths must stay equal across all three):

Opus preview:

┌──────────────────────────────────────────────────────────┐
│  opus  ·  highest reasoning  ·  most expensive           │
└──────────────────────────────────────────────────────────┘

Sonnet preview:

┌──────────────────────────────────────────────────────────┐
│  sonnet  ·  balanced reasoning  ·  mid cost              │
└──────────────────────────────────────────────────────────┘

Haiku preview:

┌──────────────────────────────────────────────────────────┐
│  haiku  ·  lowest reasoning  ·  cheapest                 │
└──────────────────────────────────────────────────────────┘

Store the runner's choice as {agent_model}. If no answer, default to the orchestrator's own model.

Turn 2 — Prepare. In one message, make parallel tool calls: (a) Read {resolved_path}/report-formatting.md, (b) Read {resolved_path}/judging.md.

Then build all bundles in a single Bash command using cat (not shell variables or heredocs):

{bundle_dir}/source.md — ALL in-scope .sol files, each with a ### path header and fenced code block.
Agent bundles = source.md + agent-specific files:

Bundle	Appended files (relative to `{resolved_path}`)
`agent-1-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/math-precision-agent.md` + `hacking-agents/shared-rules.md`
`agent-2-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/access-control-agent.md` + `hacking-agents/shared-rules.md`
`agent-3-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/economic-security-agent.md` + `hacking-agents/shared-rules.md`
`agent-4-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/execution-trace-agent.md` + `hacking-agents/shared-rules.md`
`agent-5-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/invariant-agent.md` + `hacking-agents/shared-rules.md`
`agent-6-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/periphery-agent.md` + `hacking-agents/shared-rules.md`
`agent-7-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/first-principles-agent.md` + `hacking-agents/shared-rules.md`
`agent-8-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/asymmetry-agent.md` + `hacking-agents/shared-rules.md`
`agent-9-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/boundary-agent.md` + `hacking-agents/shared-rules.md`
`agent-10-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/numerical-gap-agent.md` + `hacking-agents/shared-rules.md`
`agent-11-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/trust-gap-agent.md` + `hacking-agents/shared-rules.md`
`agent-12-bundle.md`	`source.md` + `senior-auditor-sop.md` + `hacking-agents/flow-gap-agent.md` + `hacking-agents/shared-rules.md`

Each bundle = source.md + SOP + specialty + shared-rules. Agents read the bundle; no Read/Grep needed for the initial scan. Targeted Read/Grep allowed for cross-file investigation.

Print line counts for every bundle and source.md. Do NOT inline source code into the Agent call prompt itself.

Turn 3a — Spawn all 12 agents. In one message, spawn all 12 agents as parallel BACKGROUND Agent calls (run_in_background=true). If Turn 1b set {agent_model}, pass model={agent_model} on every Agent call. If {agent_model} is unset (Turn 1b skipped — Codex, Gemini, others), omit the model parameter entirely — do NOT substitute any default. The orchestrator will receive a notification when each agent completes — do NOT poll or sleep. Single phase, no later spawns. Proceed to Turn 3b only after all 12 have notified completion.

Agents 1–9 use the single-specialty prompt (Turn 3a-i). Agents 10–12 use the gap-hunter prompt (Turn 3a-ii).

Turn 3a-i — Single-specialty prompt template (agents 1–9, substitute real values):

You are an attacker. Your specialty, mindset, source, and output rules
are in your bundle. Read it fully before producing findings.

Read first:
- {bundle_dir}/agent-N-bundle.md (XXXX lines) — source + SOP + specialty + shared rules.

The bundle contains all in-scope source. Do NOT re-read in-scope files
for the initial scan. Use Read/Grep only for cross-file searches or
out-of-scope context (interfaces/, lib/, mocks/, test/).

What a finding looks like:
- file, function
- root cause — the one-sentence code-level defect
- minimal fix — the smallest change that eliminates the defect
- proof — concrete numbers, a trace, or quoted code

Without concrete proof, it's a LEAD, not a finding. Leads are honest
about what you couldn't verify — they're not failures, they're
calibration. Emit them.

Don't skim. Don't trust your first read. Trust your discomfort.

Output format: see shared-rules.md inside your bundle.

Turn 3a-ii — Gap-hunter prompt template (agents 10–12, substitute real values):

You are an attacker. Your gap-hunter specialty, mindset, source, and
output rules are in your bundle. Read it fully before producing findings.

Read first:
- {bundle_dir}/agent-N-bundle.md (XXXX lines) — source + SOP + gap-hunter specialty + shared rules.

The bundle contains all in-scope source. Do NOT re-read in-scope files
for the initial scan. Use Read/Grep only for cross-file searches or
out-of-scope context (interfaces/, lib/, mocks/, test/).

What a finding looks like:
- file, function
- seam — which two or three lenses combine
- root cause — the one-sentence code-level defect that lives at the seam
- minimal fix — the smallest change that eliminates the defect
- proof — concrete numbers, a trace, or quoted code showing the seam

Without concrete proof of the seam, it's a LEAD, not a finding.
Leads are honest about what you couldn't verify — they're not failures,
they're calibration. Emit them.

Don't skim. Don't trust your first read. Trust your discomfort.

Output format: see shared-rules.md inside your bundle (gap-hunter-specific
output fields are in your specialty file).

Turn 3b — Wait for all 12 agents to complete. Once every one of the 12 spawned agents has notified completion, proceed to Turn 4. Do NOT proceed to dedup until every agent has finished — let them run to natural completion. Do NOT poll or sleep; act only on completion notifications.

Turn 4 — Deduplicate, validate & output. Single-pass: deduplicate all agent results, gate-evaluate, and produce the final report in one turn. Do NOT print an intermediate dedup list — go straight to the report.

Dedup. Parse every FINDING and LEAD from the 12 agents. Group by group_key (Contract | function | bug-class). Exact-match first; merge synonymous bug_class within same (Contract, function). Keep best per group, number sequentially, annotate [agents: N].

MANDATORY — Wide-description (group_key). Merged group with distinct mechanisms (different fix:, code-level cause, or attack path) MUST list every mechanism. No dropping. Same function can have multiple coexisting bugs at the same group_key — all MUST appear.

MANDATORY — Function-level second pass (after group_key dedup). Run at (Contract, function), ignoring bug_class. Agents often label coexisting bugs with different bug_class tags but reference multiple mechanisms in the body. For every (Contract, function) with multiple final findings: scan body (description, path, proof, fix) of every constituent for distinct mechanisms across bug_class boundaries. Every mechanism in any constituent body MUST appear in ≥1 final finding.

MANDATORY — Function isolation (HARD). NEVER merge across different function: fields. Dedup only within (Contract, function). Different function = different bug. Second pass above stays WITHIN (Contract, function), never across.

MANDATORY — Fix preservation (HARD GATE). Before writing merged fix: on a multi-finding (Contract, function):
1. Collect every raw fix: from agents flagging the tuple.
2. Group by ADD-lines (+ lines, or equivalent require/assignment).
3. Distinct if ADD-lines differ in: called function/expression (e.g., require(msg.value == amount) vs require(zrc20 != _ETH_ADDRESS_)), check direction (validate/restrict/ban), or checked parameter.
4. ≥2 distinct → present as Option A, B, … — one block per distinct fix, verbatim from agent text (no paraphrase).
5. Label intuitively: validate / restrict / allow-and-handle / ban-path.
Output format when 2+ distinct fixes exist:
```
**Fix (Option A — <label>)**:

```diff
<verbatim diff from raw agent N1's fix>
```
Fix (Option B — ):
```
<verbatim diff from raw agent N2's fix>
```
```
**Inline check before printing**: count distinct fixes from raw for this (Contract, function). ≥2 distinct but merged shows 1 → violation, add alternatives.

**MANDATORY — Completeness (HARD GATE).** Before print: list every unique (Contract, function, bug-class) in any raw FINDING/LEAD across the 12 agents. Every unique (Contract, function) MUST have ≥1 item in final. Zero = silent drop, fix it. Multiple bug-class within same (Contract, function) MAY collapse to one item (wide-description), but the (Contract, function) MUST survive. Print inline before report: `Completeness: N unique (Contract, function) in raw, N covered in final.`

Composite chains: if A's output feeds B's precondition AND combined impact > either alone, add `Chain: [A] + [B]` at conf = min(A, B). Most audits: 0–2.
```
Gate. Run each deduped finding through the four gates in judging.md (no skip, no reorder, no revisit after verdict).

Single-pass: every relevant code path ONCE in fixed order (constructor → setters → swap → mint → burn → liquidate). One-line verdict: BLOCKS / ALLOWS / IRRELEVANT / UNCERTAIN. UNCERTAIN = ALLOWS. Commit, no re-examination.
Lead promotion / rejection.
- LEAD → FINDING (conf 75) if: full exploit chain in source, OR [agents: 2+] demoted (not rejected) same issue.
- [agents: 2+] does NOT override a code path that interrupts attack before harm — demote to LEAD if execution uncertain.
- No deployer-intent reasoning — what code allows, not how deployer might use it.
Format/print per report-formatting.md. Exclude rejected. If --file-output: also write to file. Do NOT re-read source to "verify the most critical claim" — agents did that, dedup filtered. Re-verification costs ~5min, rarely changes verdicts. Skip.
Auto-clean. After print (and --file-output write): rm -rf {bundle_dir}. Bundle dir = transient build state, not an artifact. Don't skip. For debugging: copy bundle elsewhere before re-running.

Banner

Before doing anything else, print this exactly:


██████╗  █████╗ ███████╗██╗  ██╗ ██████╗ ██╗   ██╗     ███████╗██╗  ██╗██╗██╗     ██╗     ███████╗
██╔══██╗██╔══██╗██╔════╝██║  ██║██╔═══██╗██║   ██║     ██╔════╝██║ ██╔╝██║██║     ██║     ██╔════╝
██████╔╝███████║███████╗███████║██║   ██║██║   ██║     ███████╗█████╔╝ ██║██║     ██║     ███████╗
██╔═══╝ ██╔══██║╚════██║██╔══██║██║   ██║╚██╗ ██╔╝     ╚════██║██╔═██╗ ██║██║     ██║     ╚════██║
██║     ██║  ██║███████║██║  ██║╚██████╔╝ ╚████╔╝      ███████║██║  ██╗██║███████╗███████╗███████║
╚═╝     ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═══╝       ╚══════╝╚═╝  ╚═╝╚═╝╚══════╝╚══════╝╚══════╝