name: rde-eval description: Run the react-doctor-evals (RDE) harness to test a local rule change against the OSS corpus — locally for fast iteration, or fanned out to the cloud (Vercel Sandbox) for scale. Use after rule tests pass, when checking a new or changed rule for false positives across real repos, or whenever rule-validate's "RDE Rule Validation" step calls for it.
RDE Eval
Test a react-doctor rule change against thousands of OSS repos with the
react-doctor-evals (RDE) harness. Two modes:
- local — spawn react-doctor as a child process on your laptop. Fast, no creds, uses your working tree (incl. uncommitted edits).
- cloud — Vercel Sandbox microVMs (4 vCPU / 8 GB). Fan out to 50+ repos in parallel. Requires a pushed branch.
Validating an oxlint lint rule? Use local. Observed: cloud runs currently fire only the project-level analyzers (dead-code, supply-chain, security — ~15 rules) and not the oxlint AST rule layer (the 100+ per-element rules in
oxlint-plugin-react-doctor). A 139-repo cloud run produced 15 distinct rules / 0 oxlint hits; the same corpus locally produced 150+ distinct rules / 1000s of hits. So for any rule in the oxlint plugin, cloud will report a misleading 0 — run it local. Cloud is still valid for dead-code / supply-chain / security rules. (Root cause not yet diagnosed — treat as a known limitation.)
$RD is your react-doctor checkout — the one with the rule changes, at the
monorepo root (the CLI appends packages/react-doctor/...). $EVALS is the
react-doctor-evals checkout. Run every node dist/cli.js command from $EVALS.
Setup (every run)
export RD=~/projects/million/react-doctor # checkout with your rule changes (monorepo root)
export EVALS=~/projects/million/react-doctor-evals # the harness
cd $EVALS && git pull && pnpm install && pnpm build # stale builds break cloud workers
(cd $RD && pnpm build) # local mode needs the built bin
Cloud mode also needs $EVALS/.env.local with five vars — the harness won't
boot (or sandbox provisioning 403s) if any is missing:
| var | how to get it |
|---|---|
VERCEL_TOKEN |
vercel.com → Account Settings → Tokens (with access to the team). An empty/expired value provisions a 403 invalidToken. Sanity-check: curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $TOKEN" https://api.vercel.com/v2/user → 200. |
VERCEL_TEAM_ID |
node setup.mjs, or orgId in $EVALS/.vercel/project.json. |
VERCEL_PROJECT_ID |
node setup.mjs, or projectId in $EVALS/.vercel/project.json. ⚠️ vercel env pull does not always write this one — copy it from project.json if absent. |
AXIOM_DATASET |
Axiom dashboard. Required even though traces only ship when NODE_ENV=production — the harness reads it at boot, so a missing value is a hard ConfigError, not a silent no-op. |
AXIOM_TOKEN |
Axiom dashboard → API token. |
Version specs
| spec | reaches cloud? | notes |
|---|---|---|
path:$RD |
NO — local only | your working tree incl. uncommitted edits; --runner local only |
git:https://github.com/millionco/react-doctor@<ref> |
yes | sandbox clones + turbo run builds the monorepo; ref must be pushed |
npm:<x.y.z> |
yes | published baseline |
path:does not work with--pool vercel: the Vercel worker only uploads the evals tree, never the react-doctor checkout. (EVAL.md Flow 1 is stale on this.) Rules live inoxlint-plugin-react-doctor, aworkspace:*dep of react-doctor, so you can't ship react-doctor alone —git:works because it builds the whole monorepo in the sandbox. For uncommitted changes in the cloud, push a scratch branch and usegit:.
--take N caps repos (default = all of repos.json, ~8.4k — always cap while
iterating). --dataset small|medium|large = 50 | 500 | 1000.
Local fast loop (no push)
node dist/cli.js run path:$RD --runner local --take 100
node dist/cli.js digest path:$RD --rule <rule> # all hits for your rule
node dist/cli.js digest path:$RD --json --rule <rule> > hits.json
Cloud loop (fan out)
# in $RD first: git push -u origin <branch>
B=git:https://github.com/millionco/react-doctor@main
C=git:https://github.com/millionco/react-doctor@<branch>
node dist/cli.js run "$B" --runner worker-pool --pool vercel --take 100
node dist/cli.js run "$C" --runner worker-pool --pool vercel --take 100
node dist/cli.js parity "$B" "$C" --verbose # +added / -removed by your branch
node dist/cli.js parity "$B" "$C" --json > parity.json # machine-readable
Force a fresh run (cache)
Results are cached per spec in $EVALS/.evals/<sanitized-spec>.jsonl. Re-running
the same spec logs N repos already processed. continuing… and serves the
cache as-is — including repos that previously errored, so a failed run blocks
its own retry and returns stale/empty results. To force a clean run, delete that
spec's file first:
# cloud (git: spec) — slashes/colons become dashes
rm "$EVALS/.evals/https---github-com-millionco-react-doctor-<branch>.jsonl"
# local (path: spec) — the absolute path with slashes → dashes
rm "$EVALS/.evals/-Users-...-<your-rd-checkout>.jsonl"
Tip: check wc -l on the file and confirm entries aren't all "error" before
trusting a digest. Pinning a commit SHA (git:…@<sha>) instead of a branch name
is also a fresh cache key.
Read the diff
+= your branch adds a diagnostic;-= drops one.- New rule → every hit is a
+. On an existing rule: net-negative = less noise (good if dropping false positives), net-positive = more findings (good if true positives). - Net 0 with identical files/lines = no functional change (likely message-text diffs only).
Validate hits, then fix
For each hit (or a sample per rule when counts are high):
git clone --depth 1 <url> /tmp/xat the pinnedref.- Read
filePath:line:column; is the pattern really there, and is the rule's call correct for that context? - Classify true positive / false positive.
- Add a regression test for every false positive, re-run, confirm it's gone.
Feed the counts (repos scanned, rootDir scans, target diagnostics, false positives found) into rule-validate's eval table.
Troubleshooting
checks skipped (likely OOM)— oxlint OOM'd in a sandbox; huge single-project repos can hit the 8 GB cap. Skip that repo.CreateSandboxError/InstallDepsError: npm install exited 1— usually a stale evals build shipped to the worker.cd $EVALS && git pull && pnpm install && pnpm build.ConfigError: Expected string … ["AXIOM_DATASET"]—.env.localis missingAXIOM_DATASET/AXIOM_TOKEN. Set both (required at boot even with tracing off).403 … "invalidToken": trueon sandbox create —VERCEL_TOKENis empty or expired. Refresh it; verify with thecurl … /v2/user → 200check above.Service not found: rde/WorkerPool— theruncommand's layer stack doesn't provide the pool service.Runner.layerWorkerPoolmustLayer.provide(WorkerPool.layer)(itsWorker+WorkerPoolConfigdeps are supplied by the command). Without it,--pool verceldies before any scan.- Whole run crashes mid-way on one repo (e.g.
ENOENT … pnpm-workspace.yaml→ schema-decode defect) — a single repo'sPlatformErrormust not be fatal. The per-repo invoke inRunner.runisolates defects withEffect.catchDefect(fold intoRepository.WithFailure); otherwise one odd monorepo takes down all N scans. - No Vercel creds — drop
--runner worker-pool --pool vercel; local is the default.