name: research-team
description: "Milestone-based research-team workflow for theory+computation projects with reproducible artifacts, independent parallel workstreams (default: host-native subagents; configurable), and a strict convergence gate.\n"
Research Team (Lean Entry)
This is the trigger-loaded entry for the research-team skill.
For the full manual (English), see references/usage_guide.md.
For the Chinese manual (human-oriented), see references/usage_guide.zh.md.
For the KB index exporter docs (English), see references/kb_index.md.
When to use
Use research-team when you want a project workflow with:
- deterministic preflight gates (fail-fast),
- a human notebook (research_notebook.md) plus a machine contract (research_contract.md),
- reproducible artifacts (manifests/summaries/figures),
- and a strict 2-member convergence loop (Member A + Member B).
Workflow authority boundary
- Generic literature workflow authority does not live inside research-team; it lives in the checked-in literature-workflows workflow-pack (packages/literature-workflows/recipes/ + session protocol) and the checked-in public stateful autoresearch workflow-plan front door.
- research-team consumes that authority during prework / KB building and later evidence-oriented stages; it should not redefine provider-neutral literature workflow truth.
- scripts/bin/literature_fetch.py is a source-adapter helper for INSPIRE/arXiv/Crossref/DataCite/GitHub/DOI and local KB preparation; when it needs workflow truth, it must call the checked-in front door or lower-level consumer path rather than restating recipe semantics locally.
- When a literature pull is too shallow (KB notes left metadata-only, no cross-paper synthesis), the deep-literature-review skill is the right surface: it consumes the same recipes, deep-reads sources to fill this skill's KB note template (with locators), synthesizes consensus/tensions/gaps into a checkable literature_survey_v1, and hands the extracted claims to claim-grounding.
Non-negotiable contracts (fail-fast)
- Strict convergence: if either member reports mismatch/fail/needs revision, you must fix and rerun until converged (or explicitly narrow/kill as SCOPE/MATCHING).
- Notebook split: research_notebook.md is the human entry; research_contract.md is the machine-stable gate surface.
- Reproducibility Capsule (mandatory): research_contract.md must include a filled capsule block (between <!-- REPRO_CAPSULE_START --> and <!-- REPRO_CAPSULE_END -->).
- Sweep semantics (mandatory): capsule must include ### G) Sweep semantics / parameter dependence (MANDATORY) (even if “no sweep”: declare baseline + held-fixed constants).
- Branch semantics (mandatory when applicable): capsule must include ### H) Branch Semantics / Multi-root Contract (MANDATORY); if multi-root quantities exist (multiple solutions/branches), you must declare branches/assignment/outputs/invariants/diagnostics.
- Method-validity preconditions (capsule ### J; mandatory whenever a recorded result depends on an implemented / discretized / projected / effective method precondition, novel or textbook alike; otherwise the section must state not applicable: <reason>): when a result's validity rests on an operator/structural identity (an operator commuting with a projector/symmetrizer, Hermiticity, self-adjointness, idempotency, unitarity, positivity, or a variational/Galerkin subspace being invariant under the operator), the capsule's ### J) Method-validity preconditions section must (i) name the identity/property, (ii) give a disconfirming residual (non-zero iff the property fails), and (iii) report that residual at the exact configuration that produced the headline number (not only at the smallest/cheapest setting; if scale-invariance is itself claimed, add a residual scan across settings). For any eigenvalue/variational result from a projected or effective operator, report the true-operator residual ‖Oψ − λψ‖ / ‖Oψ‖ (with a documented norm; guard a near-zero denominator with a fixed scale) and the variance, not merely that ψ has the right symmetry. A precondition verified only at a smaller/cheaper setting than the result is NOT verified: discretization/finite-size artifacts (aliasing, grid parity, periodic wrapping) typically appear only above the minimal size. Backed by a fail-fast: check_reproducibility_capsule.py requires §J (filled or an explicit not applicable: …) when require_method_precondition is enabled, so "MANDATORY" is enforced, not advisory. The novelty of the domain result is irrelevant; what matters is that the implemented operator carries the precondition.
- Cross-check resolution honesty (mandatory): a cross-check whose tolerance/resolution is coarser than the discrepancy it must detect is non-diagnostic, not a pass. For every cross-check, record explicitly what it cannot resolve (e.g. an order-unity / scheme-ambiguous agreement cannot certify a finer absolute value, and cannot detect a non-variational shift smaller than its bracket). Never mark such a check "passed" for a property finer than its resolution.
- Scope-qualified confidence (mandatory): every confidence label on a computed / implemented / discretized result ("machine-precise", "converged", "verification-grade", "exact") must carry the scale/configuration at which it was established; an unqualified label next to a headline produced at a different configuration is a defect, not shorthand. For a purely symbolic / continuum / analytic claim the qualifier is the scope itself — label it "symbolic/continuum only" (no numerical scale needed), and never let that be read as covering a discretization/implementation.
- Falsification gate, not agreement gate: convergence requires, per precondition above, the smallest test that could falsify it plus evidence that test was actually run at the production configuration. Reviewer agreement on a derivation, or any check whose outcome was guaranteed before running it, does not substitute for executing the production-scale falsifiers.
- Reference reproduction (mandatory whenever a recorded result claims to reproduce / match a published value): a claim that a result reproduces / matches / agrees with a published reference value is earned by computing the claimed observable on a comparable state / regime / configuration and comparing to the published number numerically, not by a qualitative "same scale / same sign" assertion and not by citing the source. Compare term by term where the claim is term-level (a net total can agree while individual contributions are suppressed or sign-flipped); an order-of-magnitude same-direction discrepancy, or a sign reversal, is a finding, not convergence. Independently: any established cross-validation must not silently lapse. A structurally different-model engine, or a check valid only in a degenerate / limit regime, is labeled as a different-model / limit-regime comparison, never presented as validation, and the absence of an apples-to-apples independent check is recorded as an explicit limitation. Routed to numerical-reliability-gate G8 and the review-swarm reference-reproduction reviewer; the failure modes are catalogued in research-integrity (Reference-reproduction fidelity).
- Pointer lint (mandatory): code pointers in the notebook must be resolvable under the configured pointer_lint.strategy.
- No silent retries: when a gate fails, stop, apply the minimal fix, rerun with a new tag (M2-r2, M2-r3, ...).
- Run artifact identity: the canonical project artifact root is artifacts/runs/<run_id>/. Use a safe, sortable, readable run_id, preferably <YYYYMMDDTHHMMSSZ>-<milestone>-<short-topic>-rN. team/runs/<tag>/ is the research-team reviewer packet/log surface; it is not the project artifact SSOT unless the project explicitly mirrors or summarizes it under artifacts/runs/<run_id>/research_team/.
- Tag relation: with --auto-tag, pass a meaningful base tag such as 20260502T023000Z-m3-branch-scan; the resolved <base>-rN is the research-team cycle tag and may be used as the control-plane run_id for that reviewed cycle. Do not use bare UUIDs or run_<uuid> as human-facing research tags.
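The true-operator residual named in the method-validity contract, ‖Oψ − λψ‖ / ‖Oψ‖, can be illustrated with a small pure-Python sketch. This is not part of the skill's scripts: the function names, the Euclidean norm, and the guard value `floor` are assumptions made for the example (the contract only asks for a documented norm and a fixed-scale guard on a near-zero denominator). The sketch covers the residual only, not the variance.

```python
import math

def matvec(O, psi):
    """Apply a dense operator (list of rows) to a vector."""
    return [sum(o_ij * p_j for o_ij, p_j in zip(row, psi)) for row in O]

def norm2(v):
    """Euclidean norm; one possible choice of documented norm."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def relative_residual(O, psi, lam, floor=1e-12):
    """Return ||O psi - lam psi|| / max(||O psi||, floor).

    `floor` is a hypothetical fixed scale that guards a near-zero denominator.
    """
    O_psi = matvec(O, psi)
    diff = [a - lam * b for a, b in zip(O_psi, psi)]
    return norm2(diff) / max(norm2(O_psi), floor)

# Toy "true operator": symmetric 2x2 with eigenpairs (3, [1, 1]) and (1, [1, -1]).
O_true = [[2.0, 1.0], [1.0, 2.0]]

# An exact eigenpair gives a zero residual.
good = relative_residual(O_true, [1.0, 1.0], 3.0)

# psi = [1, 0] with lam = 2 (its Rayleigh quotient) looks like a plausible
# eigenvalue estimate, but the residual exposes that it is not an eigenpair.
bad = relative_residual(O_true, [1.0, 0.0], 2.0)
```

In this toy case `good` is zero while `bad` is of order one, which is the kind of disconfirming signal the capsule is meant to record at the production configuration.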
Quick Start (3 commands)
Commands below stay install-location-portable by resolving the skill via SKILL_DIR, with a host-neutral fallback that probes known agent skill homes (~/.claude, ~/.codex, ~/.config/opencode).
- Environment check (optional flags shown):
SKILL_DIR="${SKILL_DIR:-$(for r in "${CLAUDE_CONFIG_DIR:-$HOME/.claude}" "${CODEX_HOME:-$HOME/.codex}" "$HOME/.config/opencode"; do [ -d "$r/skills/research-team" ] && echo "$r/skills/research-team" && break; done || true)}"
bash "${SKILL_DIR}/scripts/bin/check_environment.sh" --require-codex
# or (if you explicitly want A=Claude, B=Gemini):
# bash "${SKILL_DIR}/scripts/bin/check_environment.sh" --require-claude --require-gemini
- Scaffold the workflow into a project repo:
SKILL_DIR="${SKILL_DIR:-$(for r in "${CLAUDE_CONFIG_DIR:-$HOME/.claude}" "${CODEX_HOME:-$HOME/.codex}" "$HOME/.config/opencode"; do [ -d "$r/skills/research-team" ] && echo "$r/skills/research-team" && break; done || true)}"
bash "${SKILL_DIR}/scripts/bin/scaffold_research_workflow.sh" \
--root /path/to/project \
--project "My Project" \
--profile mixed \
--full
Scaffold creates prompts/_system_member_a.txt and prompts/_system_member_b.txt (note the leading underscore; they are copied from the skill assets system_member_a.txt / system_member_b.txt).
Use --full when you want those research-team host-local assets immediately; the default scaffold stays minimal.
The public scaffold and contract-refresh entrypoints now run in real_project mode: use an external project root, and keep real-project run/intermediate outputs outside the autoresearch-lab development repo. Internal maintainer fixtures remain a lower-level contract mode only, not part of the public workflow.
- Run a team cycle from the project root:
cd /path/to/project
SKILL_DIR="${SKILL_DIR:-$(for r in "${CLAUDE_CONFIG_DIR:-$HOME/.claude}" "${CODEX_HOME:-$HOME/.codex}" "$HOME/.config/opencode"; do [ -d "$r/skills/research-team" ] && echo "$r/skills/research-team" && break; done || true)}"
bash "${SKILL_DIR}/scripts/bin/run_team_cycle.sh" \
--tag 20260502T023000Z-m0-topic --auto-tag \
--notes research_contract.md \
--out-dir team \
--member-a-system prompts/_system_member_a.txt \
--member-b-system prompts/_system_member_b.txt
Tip: add --preflight-only to run deterministic gates without calling external LLMs.
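To give a feel for what a deterministic gate of this kind checks, here is a minimal sketch of a capsule presence test built only from the markers and heading prefixes quoted in the contract list. It is an illustration under assumptions, not the logic of check_reproducibility_capsule.py: the function name, the prefix-matching rule, and the default set of required sections are all hypothetical (in the real workflow, which sections are mandatory depends on applicability and project configuration).

```python
START = "<!-- REPRO_CAPSULE_START -->"
END = "<!-- REPRO_CAPSULE_END -->"

def missing_capsule_sections(contract_text, required=("### G)", "### H)", "### J)")):
    """Return the required heading prefixes that are absent from the capsule block.

    Raises ValueError when the marker pair is missing or out of order.
    Matching headings by prefix is an assumption of this sketch.
    """
    start = contract_text.find(START)
    end = contract_text.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("capsule markers missing or out of order")
    capsule = contract_text[start + len(START):end]
    headings = [line.strip() for line in capsule.splitlines()
                if line.lstrip().startswith("###")]
    return [prefix for prefix in required
            if not any(h.startswith(prefix) for h in headings)]

# A toy contract whose capsule declares sections G and J but not H.
sample = "\n".join([
    "# Contract",
    START,
    "### G) Sweep semantics / parameter dependence (MANDATORY)",
    "no sweep: baseline and held-fixed constants declared",
    "### J) Method-validity preconditions",
    "not applicable: <reason>",
    END,
])
result = missing_capsule_sections(sample)
```

For the toy contract above, `result` lists only the H prefix, so a fail-fast wrapper around this function would stop before any reviewer is invoked.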
By default, Member A and Member B should be assigned through the current host agent's official subagent mechanism with config-derived reasoning depth. run_team_cycle.sh keeps CLI compatibility runners for shell-only environments; use --member-a-runner-kind / --member-b-runner-kind or research_team_config.json only when you explicitly want a provider-specific CLI runner.
Keep --out-dir on a real-project path as well; do not point real-project team outputs back into the development repo.
The command above writes reviewer-cycle packets and logs under team/runs/<tag>/.
Durable research outputs and claims should point to the canonical project root
artifacts/runs/<run_id>/; when the team cycle is evidence for that run, record
or summarize it under artifacts/runs/<run_id>/research_team/ and keep the
team/runs/<tag>/ paths as reviewer provenance.
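The preferred run_id shape, <YYYYMMDDTHHMMSSZ>-<milestone>-<short-topic>-rN, can be checked mechanically. The sketch below is an assumption-laden illustration, not the behavior of next_team_tag.py or --auto-tag: the regular expression (lowercase alphanumeric slug segments), the helper names, and the "one past the highest existing revision" rule are all hypothetical.

```python
import re
from datetime import datetime

# Assumed shape: UTC timestamp, lowercase slug segments, then -rN with N >= 1.
RUN_ID_RE = re.compile(
    r"^(?P<ts>\d{8}T\d{6}Z)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)-r(?P<rev>[1-9]\d*)$"
)

def parse_run_id(run_id):
    """Return (timestamp, slug, revision) or None if the id does not fit the shape."""
    m = RUN_ID_RE.match(run_id)
    if m is None:
        return None
    try:
        # Reject timestamps that have the right digits but are not real dates.
        datetime.strptime(m.group("ts"), "%Y%m%dT%H%M%SZ")
    except ValueError:
        return None
    return m.group("ts"), m.group("slug"), int(m.group("rev"))

def next_cycle_tag(base, existing_tags):
    """Return <base>-rN, one past the highest revision already used for `base`."""
    prefix = base + "-r"
    used = [int(t[len(prefix):]) for t in existing_tags
            if t.startswith(prefix) and t[len(prefix):].isdigit()]
    return f"{prefix}{max(used, default=0) + 1}"

base = "20260502T023000Z-m3-branch-scan"
tag = next_cycle_tag(base, [base + "-r1", base + "-r2", "other-topic-r9"])
parsed = parse_run_id(tag)
rejected = parse_run_id("run_123e4567-e89b-12d3-a456-426614174000")
```

A check like this makes the "no bare UUIDs" rule enforceable: UUID-style identifiers fail to parse, while timestamped tags sort chronologically as plain strings.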
Workspace disk policy: team/runs/<tag>/workspaces/ contains per-member
project snapshots that the reviewer subprocess runs against. They can grow
large (full project tree × 2 members × N runs). They are ephemeral scratch
space, not durable artifacts — every piece of forensic data needed to audit
a run lives at the run_dir top level (cycle_state.json, <tag>_member_*.md,
member_*_evidence.json, member_*_audit.jsonl, logs/member_*/).
- run_team_cycle.sh deletes workspaces/ automatically when the cycle completes successfully. Set RESEARCH_TEAM_KEEP_WORKSPACES=1 to disable.
- On failure the workspaces are preserved by default for debugging. Set RESEARCH_TEAM_KEEP_WORKSPACES_ON_FAILURE=0 to also clean up on failure.
- At the start of every new cycle, run_team_cycle.sh also sweeps orphaned workspaces from any earlier cycle whose on_exit trap could not fire (SIGKILL / OOM kill / power loss). The startup sweep only deletes workspaces of clean successful exits (status completed|converged|early_stop|preflight_only) and only when the workspace mtime is at least 30 minutes old. Set RESEARCH_TEAM_KEEP_WORKSPACES_AT_STARTUP=1 to disable the sweep.
- To reclaim disk on existing projects whose old runs still carry workspaces, use the prune utility. Defaults are --keep-last 0, --keep-failed unset, and dry-run; pass --apply to actually delete and any combination of the flags below to widen what survives:
# Dry-run preview of all eligible workspaces (deletes nothing).
python3 "${SKILL_DIR}/scripts/bin/prune_team_workspaces.py" --root /path/to/project
# Apply, preserving the 3 most-recent runs and any failed runs:
python3 "${SKILL_DIR}/scripts/bin/prune_team_workspaces.py" --root /path/to/project \
--keep-last 3 --keep-failed --apply
The tool only touches team/runs/<tag>/workspaces/ subdirectories; all forensic data at the run_dir top level is preserved. Pass --json to emit a machine-readable plan plus, after --apply, a result envelope.
- For projects that run cycles in an unattended loop, either keep the RESEARCH_TEAM_KEEP_WORKSPACES_ON_FAILURE=1 default and schedule a periodic prune_team_workspaces.py --root <project> --keep-last N --keep-failed --apply sweep (cron / launchd / scheduled task), or set RESEARCH_TEAM_KEEP_WORKSPACES_ON_FAILURE=0 to clean up failures inline once the loop is known healthy.
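The retention rules above reduce to two small predicates, sketched here for orientation. This is not the code of run_team_cycle.sh or prune_team_workspaces.py. The function names are hypothetical, "failed" is assumed to mean any status outside the clean-exit set, and "most recent" is assumed to follow the order in which the caller lists the runs.

```python
CLEAN_STATUSES = {"completed", "converged", "early_stop", "preflight_only"}
MIN_AGE_SECONDS = 30 * 60  # startup sweep requires an mtime at least 30 minutes old

def sweep_eligible(status, workspace_mtime, now):
    """Startup sweep rule: clean successful exit AND workspace old enough."""
    return status in CLEAN_STATUSES and (now - workspace_mtime) >= MIN_AGE_SECONDS

def prune_plan(runs, keep_last=0, keep_failed=False):
    """Return the tags whose workspaces/ would be deleted.

    `runs` is a list of (tag, status) ordered oldest -> newest (assumed ordering).
    Treating every non-clean status as "failed" is an assumption of this sketch.
    """
    protected = {tag for tag, _ in runs[-keep_last:]} if keep_last > 0 else set()
    plan = []
    for tag, status in runs:
        if tag in protected:
            continue  # survives because of --keep-last
        if keep_failed and status not in CLEAN_STATUSES:
            continue  # survives because of --keep-failed
        plan.append(tag)
    return plan

runs = [
    ("m1-r1", "converged"),
    ("m1-r2", "failed"),
    ("m1-r3", "converged"),
    ("m1-r4", "completed"),
    ("m1-r5", "converged"),
]
default_plan = prune_plan(runs)
narrow_plan = prune_plan(runs, keep_last=3, keep_failed=True)
```

With the defaults every listed run is eligible, which is why the real tool's dry-run default matters; each added flag only narrows what would be removed.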
Capabilities index (discoverability)
- Team cycle (core): scripts/bin/run_team_cycle.sh (preflight → A/B → convergence).
- Draft (TeX) review cycle: scripts/bin/run_draft_cycle.sh (TeX-source-first; optional 3-party convergence).
- Autopilot: scripts/bin/run_autopilot.sh (plan autofill + loop coordinator; uses scripts/bin/autopilot_loop.py).
- Packet build only: scripts/bin/build_team_packet.py, scripts/bin/build_draft_packet.py.
- Literature fetch (INSPIRE/arXiv/Crossref/DataCite/DOI/GitHub): scripts/bin/literature_fetch.py (project-leader source-adapter helper for prework/KB building; reviewers must not use network).
  - Generic literature workflow sequencing authority lives in literature-workflows recipes / session protocol plus the checked-in public front door, not in this script.
  - Use python3 "${SKILL_DIR:-$(for r in "${CLAUDE_CONFIG_DIR:-$HOME/.claude}" "${CODEX_HOME:-$HOME/.codex}" "$HOME/.config/opencode"; do [ -d "$r/skills/research-team" ] && echo "$r/skills/research-team" && break; done || true)}/scripts/bin/literature_fetch.py" workflow-plan ... when you need the lower-level literature workflow plan consumer during skill-side prework.
  - Literature/reference/knowledge-evidence work must maintain both knowledge_base/methodology_traces/literature_queries.md and knowledge_base/methodology_traces/literature_saturation.json; a single result page or fixed paper count is not a completion criterion.
  - Subcommands (arXiv): arxiv-search, arxiv-get --write-note, arxiv-source (syntax: python3 "${SKILL_DIR:-$(for r in "${CLAUDE_CONFIG_DIR:-$HOME/.claude}" "${CODEX_HOME:-$HOME/.codex}" "$HOME/.config/opencode"; do [ -d "$r/skills/research-team" ] && echo "$r/skills/research-team" && break; done || true)}/scripts/bin/literature_fetch.py" <subcommand> ...; downloads LaTeX source to references/arxiv_src/<arxiv_id>/ by default).
- Export a portable bundle: scripts/bin/export_paper_bundle.sh (wrapper) / scripts/bin/export_paper_bundle.py.
- KB index export (deterministic/L1): scripts/bin/kb_export.py + scripts/bin/validate_kb_index.py + scripts/schemas/kb_index.schema.json.
- Demo generation: scripts/bin/generate_demo_milestone.sh.
- Project kickstart prompt: scripts/bin/generate_project_start_prompt.py.
- Deterministic hygiene tools (as needed): scripts/bin/fix_markdown_*, scripts/bin/fix_bibtex_revtex4_2.py, scripts/bin/upgrade_reference_anchors.py; use the standalone markdown-hygiene skill for manual Markdown math/TOC cleanup outside a team-cycle preflight.
- Claim DAG & evidence (optional): render via autoresearch graph --kind claims (the domain-neutral @autoresearch/shared/graph-viz front door; auto-rendered best-effort to knowledge_graph/ at convergence when an autoresearch CLI is reachable) + gates under scripts/gates/.
- Roadmap dependency-map (plan-summary / milestone-handoff): assets/roadmap_dependency_map_template.md + autoresearch graph --kind roadmap (a planning view of milestones/lanes; complements, but does not replace, the Claim DAG; see below).
- Exploration stage debt helper: scripts/bin/exploration_debt_dashboard.py.
- Scaffold pruning (move/archive optional files): scripts/bin/prune_optional_scaffold.py.
- Environment snapshot: scripts/bin/capture_env_snapshot.sh.
- Lifecycle updates: scripts/bin/update_project_map.py, scripts/bin/update_research_plan_progress.py, scripts/bin/update_trajectory_index.py.
- Secondary utilities (advanced; see references/usage_guide.md):
  - Autofill: scripts/bin/auto_fill_prework.py, scripts/bin/auto_fill_research_plan.py
  - Tag helpers: scripts/bin/next_team_tag.py, scripts/bin/next_draft_tag.py
  - Claim gates: scripts/bin/auto_enable_claim_gates.py
  - Post-run helpers: scripts/bin/summarize_team_reports.py, scripts/bin/validate_evidence.py
  - Diagnostics/hygiene: scripts/bin/check_md_double_backslash.sh, scripts/bin/check_low_order_quadrature_usage.py, scripts/bin/discover_latex_zero_arg_macros.py, scripts/bin/format_kb_reference_links.py
  - Adjudication: scripts/bin/build_adjudication_response.py
  - Member review (debug): scripts/bin/run_member_review.py
  - Internal helpers: scripts/bin/team_cycle_*.py (used by run_team_cycle.sh; usually not called directly)
Plan-summary / milestone-handoff: roadmap dependency-map
At a plan-summary or milestone-handoff moment (communicating a multi-phase
plan to a stakeholder, closing out a milestone, or handing off), produce a
roadmap dependency-map from assets/roadmap_dependency_map_template.md. It is
a one-page planning view with five parts: (1) a roadmap summary table (per
milestone/lane: status · effort estimate with uncertainty · resource/compute
cost · upstream deps · unlocks); (2) a milestone/lane dependency graph where node
fill encodes status and edge type encodes dependency kind (solid = hard "unlocks";
dashed = soft "feeds into"), with the critical path marked; (3) a binding-constraint
callout (the single hardest resource/feasibility limit, with its scaling); (4) a
critical-path recommendation (minimal ordered chain + what is parallelizable +
"later upgrade ≠ prerequisite"); (5) honest estimate discipline (numbers are
estimates with stated uncertainty, distinct from measurements).
Render the graph through the autoresearch graph --kind roadmap --spec <roadmap.json>
front door (consumes the @autoresearch/shared/graph-viz engine: always writes DOT;
optional PNG/SVG only if Graphviz is installed). This is a planning view and is
intentionally distinct from the Claim DAG (knowledge_graph/, which encodes
what we believe — claims + evidence): it reuses the Claim DAG's rendering
conventions but shares no input files and must not be conflated with it.
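To make the graph conventions concrete (node fill encodes status, solid edges are hard "unlocks", dashed edges are soft "feeds into", critical path marked), here is a minimal hand-rolled DOT emitter. It is only a sketch: the input structure, the status names, the color palette, and the use of a thicker pen for the critical path are hypothetical choices, and it does not reproduce the roadmap.json spec consumed by autoresearch graph, which is not described in this document.

```python
# Hypothetical status -> fill color mapping (standard Graphviz X11 color names).
STATUS_FILL = {"done": "palegreen", "active": "khaki", "planned": "white"}

def roadmap_to_dot(nodes, edges, critical_path=()):
    """Build DOT text for a small roadmap.

    nodes: {name: status}
    edges: [(src, dst, kind)] with kind "hard" (unlocks) or "soft" (feeds into)
    critical_path: ordered node names; consecutive pairs get a thicker edge
    """
    on_path = set(zip(critical_path, critical_path[1:]))
    lines = ["digraph roadmap {", "  node [shape=box, style=filled];"]
    for name, status in nodes.items():
        lines.append(f'  "{name}" [fillcolor="{STATUS_FILL[status]}"];')
    for src, dst, kind in edges:
        style = "solid" if kind == "hard" else "dashed"
        width = 2 if (src, dst) in on_path else 1
        lines.append(f'  "{src}" -> "{dst}" [style={style}, penwidth={width}];')
    lines.append("}")
    return "\n".join(lines)

dot = roadmap_to_dot(
    nodes={"M1": "done", "M2": "active", "M3": "planned"},
    edges=[("M1", "M2", "hard"), ("M2", "M3", "hard"), ("M1", "M3", "soft")],
    critical_path=["M1", "M2", "M3"],
)
```

The resulting text can be rendered by any Graphviz installation; for real plan summaries the front door described above remains the supported path.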
Deep dive (read only when needed)
- Full manual (English): references/usage_guide.md
- Chinese manual (human-oriented): references/usage_guide.zh.md
- KB index exporter (English): references/kb_index.md
- Troubleshooting / rerun recipes: RUNBOOK.md
- Gate contract notes: FULL_VALIDATION_CONTRACT.md
- Artifact contract: references/artifact_contract.md