autoresearch - SKILL.md Agent Skill

name: autoresearch description: Autonomous Goal-directed Iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat forever until stopped. version: 1.0.0

Claude Autoresearch — Autonomous Goal-directed Iteration

Inspired by Karpathy's autoresearch. Applies constraint-driven autonomous iteration to ANY work — not just ML research.

Core idea: You are an autonomous agent. Modify → Verify → Keep/Discard → Repeat. Never stop.

When to Activate

User invokes /autoresearch or /ug:autoresearch
User says "work autonomously", "iterate until done", "keep improving", "run overnight"
Any task requiring repeated iteration cycles with measurable outcomes

Setup Phase (Do Once)

Read all in-scope files for full context before any modification
Define the goal — What does "better" mean? Extract or ask for a mechanical metric:
- Code: tests pass, build succeeds, performance benchmark improves
- Content: word count target hit, SEO score improves, readability score
- Design: lighthouse score, accessibility audit passes
- If no metric exists → define one with user, or use simplest proxy (e.g. "compiles without errors")
Define scope constraints — Which files can you modify? Which are read-only?
Create a results log — Track every iteration (see references/results-logging.md)
Establish baseline — Run verification on current state. Record as iteration #0
Confirm and go — Show user the setup, get confirmation, then BEGIN THE LOOP

The Loop (Runs Forever)

Read references/autonomous-loop-protocol.md for full protocol details.

LOOP FOREVER:
  1. Review: Read current state + git history + results log
  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED → Keep commit, log "keep", advance
     - SAME/WORSE → Git revert, log "discard"
     - CRASHED → Try to fix (max 3 attempts), else log "crash" and move on
  7. Log: Record result in results log
  8. Repeat: Go to step 1. NEVER STOP. NEVER ASK "should I continue?"

Critical Rules

NEVER STOP — Loop until manually interrupted. User may be asleep.
Read before write — Always understand full context before modifying
One change per iteration — Atomic changes. If it breaks, you know exactly why
Mechanical verification only — No subjective "looks good". Use metrics
Automatic rollback — Failed changes revert instantly. No debates
Simplicity wins — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
Git is memory — Every kept change committed. Agent reads history to learn patterns
When stuck, think harder — Re-read files, re-read goal, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access/permissions

Principles Reference

See references/core-principles.md for the 7 generalizable principles from autoresearch.

Adapting to Different Domains

Domain	Metric	Scope	Verify Command
Backend code	Tests pass + coverage %	`src/*/.ts`	`npm test`
Frontend UI	Lighthouse score	`src/components/**`	`npx lighthouse`
ML training	val_bpb / loss	`train.py`	`uv run train.py`
Blog/content	Word count + readability	`content/*.md`	Custom script
Performance	Benchmark time (ms)	Target files	`npm run bench`
Refactoring	Tests pass + LOC reduced	Target module	`npm test && wc -l`

Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.