---
name: elon
description: "Evaluate a codebase or diff through the Musk engineering algorithm (question requirements, DELETE, simplify, accelerate cycle time, automate last), scored against hard git/grep evidence, not impressions. Produces an evidence-grounded 6-criterion scorecard (deletion ratio, idiot index, requirements sanity, cycle time, automate-last, vertical ownership) plus a VERIFIED, import-checked, tiered deletion list with per-cut coupled-edit checklists. Orchestrates the built-in /code-review command (find what's broken) and /simplify command + code-simplifier agent (reduce mass, apply cuts) as phases inside the algorithm. Always renders the report via html-artifact. Use when the operator says 'put the Elon hat on', 'elon', 'musk hat', 'first-principles audit', 'what can we delete', 'deletion ratio', 'idiot index', 'is this codebase too big', 'what should we delete', 'audit this codebase for bloat', 'find dead/duplicate code to cut', or pairs delete/cut/prune/simplify with first-principles/Musk/should-this-exist language. NOT for finding correctness bugs alone (use /code-review), pure style cleanup (use /simplify), agent-harness scoring (agentharness-audit), or project-quality/CLAUDE.md freshness (/audit)."
allowed-tools: Read, Edit, Bash, Skill, Agent, Write
---
# elon: first-principles codebase evaluation (the Musk hat)
The operator wants the opposite of a conventional review: not "is this well-written?" but "how much of this should exist at all?" This skill scores a codebase or diff against Musk's engineering algorithm using hard git/grep evidence, then hands back a scorecard and a deletion list where every cut is import-verified and carries its coupled-edit checklist.
Self-validate after edits: run `./scripts/validate.sh` from the skill directory.
## Decision aid: the algorithm IS the phase order (run in sequence)
| # | Phase | Question | Tool / sub-skill | Evidence produced |
|---|---|---|---|---|
| 1 | Requirements sanity | Should this exist? Who asked? | grep dead/unused surfaces; read owner docs | unsupported lanes, orphan features |
| 2 | Delete | What can be removed? | `scripts/evidence.py` + import-trace | deletion ratio, dead code, duplicate surfaces |
| 3 | Simplify | Is complexity earned? | `/simplify` + code-simplifier agent | idiot-index hotspots, over-abstraction |
| 4 | Accelerate | How fast idea→prod? | time the test/build/import | cycle-time numbers |
| 5 | Automate (last) | Only after 1–4 | review tooling/codegen mass | automation atop deletable surfaces |
| — | Correctness pass | Is what remains broken? | `/code-review` | bugs in code that survives the cut |
Phase 3 and the correctness pass wrap the built-in commands (`/simplify`, `/code-review`); phases 1, 2, 4, and 5 are the hat's own evidence work. Full procedure and exact recipes: `references/operate.md`.
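A minimal sketch of how the Accelerate row's cycle-time number could be measured, assuming a Python repo. It is illustrative only, not the recipe in `references/operate.md`. It runs a command in a fresh process several times and reports the median, and it refuses to time a failing command so a red build never scores as fast. `time_command` and the probe command are hypothetical placeholders.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], runs: int = 5) -> float:
    """Median wall-clock seconds for `cmd` over `runs` runs (hypothetical helper).

    check=True raises CalledProcessError on a non-zero exit, so a broken
    build or test run is never reported as a fast one.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder probe: cold import cost of one module in a fresh interpreter.
    # Swap in the repo's real test/build command, e.g. ["pytest", "-q"].
    secs = time_command([sys.executable, "-c", "import json"], runs=3)
    print(f"import json: {secs:.3f}s median")
```

Taking the median rather than the mean keeps a single cold-cache run or a single lucky warm run from setting the grade.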
Branch out for speed. The evidence work in phases 1–2 is embarrassingly parallel: fan it out across concurrent read-only `Explore` subagents (one subsystem each, ~5–7 per turn), each returning only `file:line` plus a one-line verdict. `/simplify` and `/code-review` go through the `Skill` tool (sequentially; `/simplify` already parallelizes internally). See `references/operate.md` § Branch out.
## Hard rules
- Evidence over impressions. Every grade and every LOC figure traces to a command (git numstat, `wc -l`, route/import grep). No estimates in the output: if a figure is unmeasured, write `—`. This is the whole point of the hat.
- Import-trace before any verdict, and remember that a grep hit is not an import. A surface is "delete" only after grepping for its importers outside the surface itself and its tests, and "keep" only after a real importer is confirmed. Open every matching line: a hit can be a string literal, an error code, or an attribute name rather than an import. The trap cuts both ways: the precedent run over-claimed a −3,628 LOC delete that import-tracing cut to ~1,164, and a later run wrongly KEPT a dead 225-LOC gate for two passes because the string literal `"missing_runtime_boundary"` matched its name. Two cross-checks catch both failure modes: (a) a module with 0 importers while its peers have 6–19 is the dead tell; (b) before calling anything DEAD, check dynamic dispatch (`importlib`, by-name registry loading like `f"...{name}"`, entry points), which static grep cannot see (a script loaded by name in `executor.py` looked orphaned but was live). Shared-but-mislocated ≠ dead. See `references/agent-handbook.md`.
- Every cut ships its coupled-edit checklist. A delete that leaves callers, factory branches, config refs, or tests red is a break, not a delete. List the same-commit edits per cut (repo rule: behavioral change → test update in the same commit).
- Order is law: delete before simplify, simplify before automate. Never praise automation built atop a surface that should be deleted (Musk's Tesla-line trap). Judge phase 5 only after 1–4.
- Legitimate axis ≠ duplication. A split along a real product axis (per-runtime, per-SDK) is kept; a split that is not a product axis (duplicate apps, dead lanes) is a cut. Ask which axis before flagging.
- State the blind spots. The hat under-weights correctness, security, and long-horizon maintainability by design. The output must carry that caveat so it pairs with a conventional review, not replaces it.
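The import-trace rule can be sketched with Python's `ast` module, assuming the surfaces under audit are Python modules. Parsing instead of grepping separates real importers from string-literal hits like `"missing_runtime_boundary"`, and any call to `importlib.import_module` or `__import__` is flagged so that zero static importers is not mistaken for proof of death. `trace`, `dead_tells`, and the `peer_floor=6` threshold (borrowed from the 6–19 peer range in the rules) are hypothetical helpers, not part of `scripts/evidence.py`. Relative imports, entry points, and f-string registry keys are not resolved here and still need a manual read.

```python
from __future__ import annotations

import ast
import statistics

# Call names that load modules by name at runtime; static tracing can't follow them.
DYNAMIC_LOADERS = {"import_module", "__import__"}


def imported_modules(source: str) -> set[str]:
    """Dotted module names that `source` really imports (parsed, not grepped)."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute `from pkg import x`: x may itself be a submodule of pkg.
            found.add(node.module)
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def uses_dynamic_dispatch(source: str) -> bool:
    """True if `source` calls a by-name loader such as importlib.import_module."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name in DYNAMIC_LOADERS:
                return True
    return False


def trace(target: str, sources: dict[str, str]) -> dict[str, list[str]]:
    """Classify each file in `sources` (path -> text) against module `target`.

    Exclude the target's own file and its tests from `sources` before calling.
    """
    leaf = target.rsplit(".", 1)[-1]
    result: dict[str, list[str]] = {"importers": [], "string_hits": [], "dynamic": []}
    for path, text in sources.items():
        mods = imported_modules(text)
        if any(m == target or m.startswith(target + ".") for m in mods):
            result["importers"].append(path)
        elif leaf in text:
            result["string_hits"].append(path)  # a grep hit, but not an import
        if uses_dynamic_dispatch(text):
            result["dynamic"].append(path)  # 0 importers is not proof of death here
    return result


def dead_tells(importer_counts: dict[str, int], peer_floor: int = 6) -> list[str]:
    """Modules with 0 importers while their peers' median importer count >= peer_floor."""
    flagged = []
    for mod, count in importer_counts.items():
        peers = [c for m, c in importer_counts.items() if m != mod]
        if count == 0 and peers and statistics.median(peers) >= peer_floor:
            flagged.append(mod)
    return flagged
```

A non-empty `string_hits` list with an empty `importers` list is exactly the shape of the 225-LOC gate trap: grep would have counted those files as users.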
## Closeout (mandatory)
- Emit the scorecard (6 criteria, each graded 0–10 with the evidence that set it) and the tiered deletion list (per cut: files, verified LOC, coupled-edit checklist, confirm-first caveats).
- Always render the report as an HTML artifact: it is a standard deliverable of this skill, not optional. Use the `html-artifact` report lane (via the `Skill` tool), save to `docs/audits/<repo>-musk-hat-<YYYY-MM-DD>.html`, embed an `#artifact-data` JSON block (scorecard + tiered list + verification refutations) so the numbers stay machine-readable, and `xdg-open` it. Precedent: `docs/audits/musk-hat-audit-2026-05-30.html`.
- Render the artifact AFTER any approved execution, not before. If the operator approves and you execute cuts in the same session, the artifact must reflect outcomes (which cuts shipped, which verdicts changed under deeper tracing). A report rendered as a recommendation snapshot and then left unedited goes stale and self-contradictory (e.g. listing a "KEEP" for something you then deleted). Render last, or update it post-cut before declaring done.
- Route findings per repo convention (builder-side → `docs/goal/ROADMAP.md` or `builder backlog`; app-side → managed-app backlog). Do not execute deletions unless the operator approves a tier.
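The `#artifact-data` block could be assembled along these lines. The field names, the numeric `tier`, and the `<script>` wrapper are assumptions for illustration, not the schema of the precedent artifact. What the sketch does encode from the rules: exactly the six criteria, each graded 0–10 with its evidence; every cut carries a coupled-edit checklist; and an unmeasured LOC figure is written as the unmeasured marker rather than estimated.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

UNMEASURED = "\u2014"  # the em dash the hard rules mandate for unmeasured figures
CRITERIA = (
    "deletion ratio", "idiot index", "requirements sanity",
    "cycle time", "automate-last", "vertical ownership",
)


@dataclass
class Criterion:
    name: str
    grade: int      # 0-10
    evidence: str   # the command (and output) that set the grade


@dataclass
class Cut:
    tier: int
    files: list[str]
    verified_loc: int | None  # None means nobody measured it
    coupled_edits: list[str]
    confirm_first: list[str] = field(default_factory=list)


def artifact_data(scorecard: list[Criterion], cuts: list[Cut], refutations: list[str]) -> str:
    """Validate the closeout against the hard rules and emit the JSON block."""
    names = sorted(c.name for c in scorecard)
    if names != sorted(CRITERIA):
        raise ValueError(f"scorecard must grade exactly the six criteria, got {names}")
    for c in scorecard:
        if not 0 <= c.grade <= 10 or not c.evidence.strip():
            raise ValueError(f"{c.name}: needs a 0-10 grade and the evidence that set it")
    for cut in cuts:
        if not cut.coupled_edits:
            raise ValueError(f"cut {cut.files} has no coupled-edit checklist")
    payload = {
        "scorecard": [asdict(c) for c in scorecard],
        "cuts": [
            {**asdict(cut), "verified_loc": UNMEASURED if cut.verified_loc is None else cut.verified_loc}
            for cut in sorted(cuts, key=lambda c: c.tier)
        ],
        "refutations": refutations,
    }
    # Escape "</" so a string containing "</script>" can't close the tag early;
    # "<\/" is still valid JSON for "</".
    body = json.dumps(payload, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/json" id="artifact-data">\n{body}\n</script>'
```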
## Load references as needed
| When | Load |
|---|---|
| Running the phases (exact recipes; how to invoke `/code-review` & `/simplify`) | `references/operate.md` |
| Grading a criterion / what earns each score / the caveat text | `references/best-practices.md` |
| Editing this skill, the precedent run, the over-claim lesson | `references/agent-handbook.md` |
| Gathering quantitative evidence deterministically | `scripts/evidence.py` (deletion ratio, LOC, largest files, dead-import candidates) |
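`scripts/evidence.py` is the deterministic source for these numbers, and its internals are not reproduced here. As an illustration only, a deletion-ratio figure could be derived from git's `--numstat` output as below. The ratio definition (deleted lines as a share of all changed lines) is an assumption, not taken from the script, and `git_numstat`, `parse_numstat`, and `deletion_ratio` are hypothetical names.

```python
from __future__ import annotations

import subprocess
from collections import Counter


def git_numstat(rev_range: str, repo: str = ".") -> str:
    """Raw `git log --numstat` output for rev_range, with commit headers suppressed."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--numstat", "--format=", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_numstat(text: str) -> tuple[int, int, Counter]:
    """Sum added/deleted lines and return per-path churn.

    Numstat rows are tab-separated: added, deleted, path. Binary files report
    "-" instead of counts and are skipped, as are blank lines.
    """
    added = deleted = 0
    churn: Counter = Counter()
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        a, d = int(parts[0]), int(parts[1])
        added += a
        deleted += d
        churn[parts[2]] += a + d
    return added, deleted, churn


def deletion_ratio(added: int, deleted: int) -> float | None:
    """ASSUMED definition: deleted lines as a share of all changed lines.

    Returns None (render it as the unmeasured marker) when nothing changed.
    """
    total = added + deleted
    return deleted / total if total else None
```

Usage would look like `added, deleted, churn = parse_numstat(git_numstat("main..HEAD"))`, where `main` stands in for the repo's real base branch, followed by `churn.most_common(10)` for the hottest paths.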