testing-d-research-scripts

name: testing-d-research-scripts
description: Verify the d-research-skill helper scripts (Node + Python). Use after editing any file under `scripts/`, after regenerating a script, or before publishing a new version of the skill.

Testing D Research Skill Scripts

When to use this sub-skill

Run these checks whenever you have:

edited a file under scripts/
regenerated or replaced a helper script
bumped a dependency in package.json
prepared a release of d-research-skill

The same checks run automatically in CI on every pull request (see "CI" below), so local runs are mainly to fail fast before pushing.

Prerequisites

Node.js 18+ (for *.mjs scripts and npm run)
Python 3.9+ (for *.py scripts; stdlib only — no pip install needed)
pandoc >= 2.11 (only for the citation_render.py self-test; the script degrades cleanly when pandoc is missing)
Optional: npx playwright install — only needed for real-world browser runs, not for any self-test

No external API keys are required. Every self-test runs offline.
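
The version floors above can be checked up front. The following is a minimal sketch, assuming `node --version` prints something like `v18.17.0` and the first line of `pandoc --version` looks like `pandoc 2.11.4`. The helper names (`parse_version`, `meets_minimum`, `tool_version`, `report`) are illustrative and are not part of the repo.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the prerequisites list above.
MINIMUMS = {"node": (18,), "pandoc": (2, 11)}


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples; a missing version never satisfies the minimum."""
    return found is not None and found >= minimum


def tool_version(tool):
    """Run `<tool> --version` and parse its first output line; None if the tool is absent."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return parse_version(lines[0]) if lines else None


def report():
    """Print one status line per prerequisite."""
    print("python", "ok" if sys.version_info >= (3, 9) else "too old")
    for tool, minimum in MINIMUMS.items():
        # pandoc is only needed for the citation_render.py self-test,
        # so a "missing" result for it is informational rather than fatal.
        found = tool_version(tool)
        status = "ok" if meets_minimum(found, minimum) else "missing or too old"
        print(tool, status)
```

Call `report()` from a Python prompt in the repo root to print the three status lines.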

Quick validation: one command

From the repo root:

npm run self-test

This is the same chain CI runs. It executes every script's offline self-test in sequence and exits non-zero on the first failure. Pass criteria: exit code 0 and the final command (check_internal_refs.py) prints OK: all backticked internal refs resolve.
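
The fail-fast behaviour described above amounts to a loop that stops at the first non-zero exit code. The sketch below is illustrative only: the real chain is defined in the self-test entry of package.json, `run_chain` is a hypothetical name, and the demo commands are stand-ins that just exit with a fixed status.

```python
import subprocess
import sys


def run_chain(commands):
    """Run commands in order; return (exit_code, index_of_failure_or_None)."""
    for index, command in enumerate(commands):
        completed = subprocess.run(command)
        if completed.returncode != 0:
            # Stop immediately so later commands never run, as a fail-fast chain does.
            return completed.returncode, index
    return 0, None


# Stand-in commands: each one exits with a fixed status code.
DEMO = [
    [sys.executable, "-c", "raise SystemExit(0)"],
    [sys.executable, "-c", "raise SystemExit(3)"],
    [sys.executable, "-c", "raise SystemExit(0)"],
]
```

With `DEMO`, the second command fails, so the third never starts and the chain reports that command's exit code.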

Quick validation: individual scripts

To isolate a failure, run the scripts one at a time. The repo ships 35 files in total: 33 research helpers with offline self-tests (the Python research utilities, 6 top-level Node scripts, and 1 Node helper at scripts/lib/http_cache.mjs) plus 2 pre-commit utility scripts (check_node_syntax.py, check_no_plan_files.py) that run as checks rather than self-tests. Python scripts take a self-test subcommand; Node scripts take a --self-test flag. run_python.mjs is a thin wrapper for invoking a Python script through Node.

Node scripts (6 top-level + 1 helper)

node scripts/playwright_probe.mjs   --self-test   # → "playwright_probe self-test ok"
node scripts/playwright_extract.mjs --self-test   # → "playwright_extract self-test ok"
node scripts/playwright_crawl.mjs   --self-test   # → "playwright_crawl self-test ok"
node scripts/api_fetch.mjs          --self-test   # → 4× "✓ PASS" (parseArgs, Link header, cursor, offset)
node scripts/lib/http_cache.mjs     --self-test   # → "http_cache.mjs self-test ok"
node scripts/web_search.mjs         --self-test   # → "web_search self-test ok"

Python scripts (24)

python3 scripts/evidence_ledger.py     self-test   # → "evidence_ledger self-test ok" (incl. tamper detection)
python3 scripts/data_clean.py          self-test   # → "ALL TESTS PASSED" (5 subtests: clean/stats/dedup/validate/merge)
python3 scripts/citation_export.py     self-test   # → "All self-tests passed!" (6 subtests)
python3 scripts/citation_render.py     self-test   # → "All self-tests passed!" (incl. pandoc integration)
python3 scripts/extract_tables.py      self-test   # → "All self-tests passed!" (5 subtests)
python3 scripts/score_source.py        self-test   # → "All self-tests passed!" (4 subtests)
python3 scripts/research_plan.py       self-test   # → "OK: research_plan self-test passed (NN sub-tests)."
python3 scripts/run_dogfood.py         self-test   # → "OK: eval benches valid; dogfood-bench.json: 12 tasks, frontier-bench.json: 52 tasks."
python3 scripts/pdf_extract.py         self-test   # → "pdf_extract self-test ok"
python3 scripts/wayback.py             self-test   # → "wayback self-test ok"
python3 scripts/wikidata.py            self-test   # → "wikidata self-test ok"
python3 scripts/social_snapshot.py     self-test   # → "social_snapshot self-test ok"
python3 scripts/citation_resolver.py   self-test   # → "citation_resolver self-test ok"
python3 scripts/report_render.py       self-test   # → "report_render self-test ok"
python3 scripts/ocr.py                 self-test   # → "ocr self-test ok"
python3 scripts/translate.py           self-test   # → "translate self-test ok"
python3 scripts/embed_corpus.py        self-test   # → "embed_corpus self-test ok"
python3 scripts/citation_graph.py      self-test   # → "citation_graph self-test ok"
python3 scripts/multi_extract.py       self-test   # → "multi_extract self-test ok"
python3 scripts/dedup_near.py          self-test   # → "dedup_near self-test ok"
python3 scripts/http_cache.py          self-test   # → "http_cache self-test ok"
python3 scripts/bench_harness_check.py self-test   # → "bench_harness_check self-test ok"
python3 scripts/run_metadata.py        self-test   # → "run_metadata self-test ok"
python3 scripts/harvest_terms.py       self-test   # → "harvest_terms self-test ok"
python3 scripts/check_internal_refs.py             # → "OK: all backticked internal refs resolve."
python3 scripts/check_internal_refs.py --decision-tree   # → "OK: every references/*.md is reachable from the decision tree."

Pre-commit utility scripts (checks, not self-tests)

Two stdlib helpers exist solely to drive cross-platform pre-commit hooks. They have no self-test subcommand because they are checks themselves.

python3 scripts/check_node_syntax.py                       # → runs `node --check` on every .mjs; exit 0 on success
python3 scripts/check_no_plan_files.py README.md package.json   # → exit 0 (no PLAN file in the list)
python3 scripts/check_no_plan_files.py PLAN-foo.md         # → exit 1, prints "blocked: PLAN-foo.md"

On Windows, use python if python3 is not on PATH.
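
A wrapper that has to choose the interpreter automatically could probe PATH in the same order. This is a sketch of the idea under that assumption, not the contents of run_python.mjs; `pick_python` is a hypothetical helper.

```python
import shutil


def pick_python(candidates=("python3", "python"), which=shutil.which):
    """Return the first interpreter name found on PATH, or None if none exists."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use call `pick_python()` with no arguments.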

Pass criteria (universal)

Exit code 0 for every command
Output contains a positive marker: ok, PASS, ALL TESTS PASSED, All self-tests passed!, or OK: …
No Python tracebacks, no FAIL, no unhandled-promise warnings from Node

The evidence_ledger.py self-test intentionally exercises the tamper-detection path; a TAMPER DETECTED line in the middle of its output is expected and is followed by the success marker.
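
The three criteria can be applied mechanically to a captured exit code and output. This is a rough sketch: the positive markers come from the list above, while the exact failure strings (`Traceback (most recent call last)`, `UnhandledPromiseRejection`) are assumptions about what Python and Node print, and `self_test_passed` is a hypothetical name.

```python
# Positive markers as listed in the pass criteria above.
POSITIVE_MARKERS = ("ok", "PASS", "ALL TESTS PASSED", "All self-tests passed!", "OK:")
# Assumed spellings of the failure signals; adjust to what the scripts actually print.
NEGATIVE_MARKERS = ("Traceback (most recent call last)", "FAIL", "UnhandledPromiseRejection")


def self_test_passed(returncode, output):
    """Apply the three universal pass criteria to one command's result."""
    if returncode != 0:
        return False
    if any(marker in output for marker in NEGATIVE_MARKERS):
        return False
    return any(marker in output for marker in POSITIVE_MARKERS)
```

Because "TAMPER DETECTED" matches none of the failure strings, the evidence_ledger.py output still passes. Plain substring matching is crude (for example, "ok" also matches "token"), so treat this as a triage aid rather than a gate.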

Real-world smoke tests (optional)

These hit live public APIs. Use them to verify network paths after a script change, not as gates for CI.

`api_fetch.mjs` — OpenAlex

node scripts/api_fetch.mjs \
  --url "https://api.openalex.org/works?search=machine+learning&per_page=5" \
  --max-pages 1 \
  --out openalex.json

Expected: openalex.json is a JSON array; each item has id, title, and (usually) doi.
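
The expected shape can be verified with a few lines of Python. This is a minimal sketch based only on the expectation stated above; `check_openalex_items` and `check_openalex_file` are hypothetical helpers, not part of the repo.

```python
import json


def check_openalex_items(items):
    """Return a list of problems found in the parsed contents of openalex.json."""
    if not isinstance(items, list):
        return ["top-level value is not a JSON array"]
    problems = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index} is not an object")
            continue
        for key in ("id", "title"):
            if key not in item:
                problems.append(f"item {index} is missing '{key}'")
        # 'doi' is only usually present, so its absence is not reported.
    return problems


def check_openalex_file(path="openalex.json"):
    """Load the smoke-test output from disk and check its shape."""
    with open(path, encoding="utf-8") as handle:
        return check_openalex_items(json.load(handle))
```

An empty list from `check_openalex_file()` means the output matches the expected shape.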

`data_clean.py` — CSV dedup

python3 scripts/data_clean.py clean --file input.csv --out cleaned.csv

Expected: duplicates collapsed, whitespace normalized, ISO 8601 dates.
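
One way to spot-check cleaned.csv is sketched below. It rests on several assumptions: "whitespace normalized" is read as "no leading or trailing whitespace in any cell", date values live in a column named `date` (a hypothetical default), and `datetime.fromisoformat` is used as an approximation of ISO 8601. `check_cleaned_csv` is not part of the repo.

```python
import csv
import io
from datetime import datetime


def check_cleaned_csv(text, date_columns=("date",)):
    """Return a list of problems found in CSV text that is supposed to be clean."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    problems = []
    seen = set()
    for number, values in enumerate(reader, start=1):
        key = tuple(values)
        if key in seen:
            problems.append(f"row {number} duplicates an earlier row")
        seen.add(key)
        for column, value in zip(header, values):
            if value != value.strip():
                problems.append(f"row {number}: '{column}' has stray whitespace")
            if column in date_columns and value:
                try:
                    # Approximation: accepts the ISO 8601 subset that Python parses.
                    datetime.fromisoformat(value)
                except ValueError:
                    problems.append(f"row {number}: '{column}' is not ISO 8601")
    return problems
```

Pass the file contents, for example `check_cleaned_csv(open("cleaned.csv", encoding="utf-8", newline="").read())`, and expect an empty list.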

`citation_export.py` — BibTeX export

python3 scripts/citation_export.py export \
  --file evidence.csv --format bibtex --out refs.bib

Expected: refs.bib contains @misc{ (or @article{) entries with title = {…} and url = {…} fields.
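
The expectation above can be checked without a BibTeX parser. The sketch below uses regular expressions as a rough heuristic, assuming entries start with `@misc{` or `@article{` and fields are written as `name = {`; `check_bibtex` is a hypothetical helper.

```python
import re

ENTRY_START = re.compile(r"@(misc|article)\{", re.IGNORECASE)
FIELDS = {
    "title": re.compile(r"\btitle\s*=\s*\{", re.IGNORECASE),
    "url": re.compile(r"\burl\s*=\s*\{", re.IGNORECASE),
}


def check_bibtex(text):
    """Split text at entry starts and report entries lacking a title or url field."""
    starts = [match.start() for match in ENTRY_START.finditer(text)]
    if not starts:
        return ["no @misc{ or @article{ entries found"]
    problems = []
    ends = starts[1:] + [len(text)]
    for number, (begin, end) in enumerate(zip(starts, ends), start=1):
        entry = text[begin:end]
        for name, pattern in FIELDS.items():
            if not pattern.search(entry):
                problems.append(f"entry {number} has no {name} field")
    return problems
```

Run it on the contents of refs.bib; an empty list means every entry has both fields.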

For the full list of npm shortcuts (npm run probe, npm run plan:render, npm run citation:render, …) see the "npm scripts" section of README.md.

CI

Two GitHub Actions workflows replicate these checks on every pull request:

.github/workflows/lint-and-self-test.yml
- ruff check scripts/ — Python lint
- node --check on every scripts/*.mjs — JS syntax
- npm run self-test — every offline self-test (with pandoc installed for citation_render)
.github/workflows/link-check.yml
- scripts/check_internal_refs.py — backticked in-repo path references
- lychee --offline on all markdown — standard [text](url) link integrity
- A weekly lychee-external job (non-blocking) validates external URLs

If any of the 33 research-helper self-tests (or the four supplementary checks: check_internal_refs.py, check_internal_refs.py --decision-tree, check_node_syntax.py, check_no_plan_files.py) fail locally, the same failure will block the PR. Fix locally before pushing.

Common failure modes

Symptom	Likely cause	Fix
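`ImportError` or `ModuleNotFoundError`	Generated script missing a stdlib import	Add the import, confirm the file still compiles with `python3 -c "import py_compile; py_compile.compile('scripts/<name>.py', doraise=True)"`, then re-run the script's self-test (byte-compiling checks syntax only and will not catch a missing import)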
`ERR_UNKNOWN_FILE_EXTENSION ".py"`	Tried to invoke `.py` with `node` directly	Use `python3` (or `node scripts/run_python.mjs scripts/<name>.py …`)
Pandoc-related FAIL in `citation_render`	Pandoc not installed or `< 2.11`	Install or upgrade pandoc to `>= 2.11`; if pandoc is missing entirely, the self-test skips the pandoc-dependent subtest rather than failing
`playwright_*` self-test hangs	Real browser launch attempted	Self-tests must not require a browser; check the script wasn't edited to drop the offline branch
`check_internal_refs.py` reports a missing path	A markdown file backticks an in-repo path (e.g. a reference, adapter, template, or script) that no longer exists	Update the reference, restore the file, or remove the link
Eval bench schema error	New task missing required keys or frontier/refusal rule violation	Compare against an existing task; required keys are listed in `docs/eval.md`
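
The py_compile one-liner from the first row can be applied to every Python script at once. This is a sketch with a hypothetical helper name, `compile_all`. Note that byte-compiling only catches syntax errors, so a missing import still surfaces only when the self-test itself runs, and that py_compile writes its output into a `__pycache__` directory next to the sources.

```python
import py_compile
from pathlib import Path


def compile_all(directory="scripts"):
    """Byte-compile every .py file directly under directory; return {path: error message}."""
    failures = {}
    for path in sorted(Path(directory).glob("*.py")):
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as error:
            failures[str(path)] = error.msg
    return failures
```

An empty dictionary means every file compiled; otherwise each key names a file with a syntax error.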

Adding a new script

When you add a new helper to scripts/:

Implement an offline self-test: a self-test subcommand for Python scripts, a --self-test flag for Node scripts. CI runs offline, so any network dependency must degrade cleanly.
Append the new self-test to the chained self-test script in package.json so it runs in CI.
Add the script to SKILL.md's "Optional bundled scripts" list and link it from a reference doc.
Update the script-count notes in README.md if the total changes.
Re-run npm run self-test locally before opening a PR.
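
For a Python helper, the first step above might look like the skeleton below. It is a sketch under stated assumptions: `example_helper.py`, `normalize`, and the `run` subcommand are placeholders, and the success line simply imitates the "<name> self-test ok" pattern used by the existing scripts. CONTRIBUTING.md remains the authority on the actual conventions.

```python
import argparse


def normalize(text):
    """Placeholder for the script's real logic: collapse runs of whitespace."""
    return " ".join(text.split())


def self_test():
    """Exercise the logic on fixed inputs only; no network or file access."""
    assert normalize("  a   b ") == "a b"
    assert normalize("") == ""
    print("example_helper self-test ok")
    return 0


def main(argv=None):
    """Parse the command line and dispatch to the chosen subcommand."""
    parser = argparse.ArgumentParser(prog="example_helper.py")
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser("self-test", help="run the offline self-test")
    run = subcommands.add_parser("run", help="normalize the given text")
    run.add_argument("text")
    args = parser.parse_args(argv)
    if args.command == "self-test":
        return self_test()
    print(normalize(args.text))
    return 0
```

In a real script, end the file with `if __name__ == "__main__": raise SystemExit(main())` so the return value becomes the process exit code that the self-test chain checks.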

See CONTRIBUTING.md for the full conventions (argparse, shebangs, error formatting, etc.).
