systematic-review - SKILL.md Agent Skill

name: systematic-review
description: Use when planning or running an end-to-end literature review with this framework. Guides question framing, search-term design, PRISMA/PRISMA-S reporting, config drafting, pilot sampling, QA gates, rule versioning, PDF handling, and failure-mode safeguards.

Use this skill when the user wants to set up, refine, or run a literature review workflow in this repo.

Start with the review question, scope, audience, inclusion/exclusion criteria, and the minimum acceptable audit trail.
Work from review.example.toml and fill in source, stage, model, QA, and parser settings instead of inventing ad hoc commands.
Require a pilot run before any large review stage. The pilot must specify an explicit manual QA sample size and a pass threshold.
Keep prompts/rules versioned in SQLite via init-db or register-rules; config should select rule sets and versions, not serve as the long-term prompt ledger.
Prefer conservative screening. If a record is ambiguous or has no abstract, default to maybe.
Never write model-generated summaries into the canonical abstract field.
Use PRISMA 2020 and PRISMA-S as the reporting baseline. For study-selection/data-collection expectations, use the Cochrane references below.
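
The conservative-screening rule above can be sketched as a small decision function. This is a minimal illustration, not the framework's code: the Record shape, the conservative_decision name, and the min_confidence cutoff of 0.8 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical record shape; the framework's real schema may differ.
    title: str
    abstract: Optional[str] = None


def conservative_decision(
    record: Record,
    model_label: str,
    model_confidence: float,
    min_confidence: float = 0.8,  # assumed cutoff, not a framework default
) -> str:
    """Return 'include', 'exclude', or 'maybe', biasing to 'maybe' when unsure."""
    # No abstract (or only whitespace): too little evidence to exclude safely.
    if not record.abstract or not record.abstract.strip():
        return "maybe"
    # Unrecognized labels and low-confidence calls are treated as ambiguous.
    if model_label not in {"include", "exclude"} or model_confidence < min_confidence:
        return "maybe"
    return model_label
```

Under these assumptions, a confident exclude on a record without an abstract still lands in the maybe queue, so a human or a later stage sees it.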

Read workflow for the end-to-end sequence.
Read prisma when the user needs methodology/reporting guidance.
Read safeguards before finalizing prompts, QA gates, or abstract-recovery workflows.
Read config when drafting or editing TOML.

uv run --project literature_review literature-review init-db --config literature_review/review.example.toml
uv run --project literature_review literature-review ingest-manual --config literature_review/review.example.toml --file literature_review/examples/unicellular_learning/sample_records.jsonl
uv run --project literature_review literature-review sample-review --config literature_review/review.example.toml --stage title_abstract --seed 7
uv run --project literature_review literature-review qa-import-labels --config literature_review/review.example.toml --run-id <RUN_ID> --labels literature_review/examples/unicellular_learning/sample_labels.jsonl --reviewer human
uv run --project literature_review literature-review qa-evaluate --config literature_review/review.example.toml --run-id <RUN_ID> --min-accuracy 0.9
uv run --project literature_review literature-review commit-run --config literature_review/review.example.toml --run-id <RUN_ID>
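
The --min-accuracy 0.9 gate in the sequence above can be pictured as a plain agreement check between human labels and model decisions on the pilot sample. The sketch below is illustrative only and is not the framework's qa-evaluate implementation; qa_gate, its parameters, and the label dictionaries keyed by record ID are hypothetical.

```python
from __future__ import annotations


def qa_gate(
    human_labels: dict[str, str],
    model_labels: dict[str, str],
    min_accuracy: float = 0.9,
    min_sample_size: int = 1,  # assumed knob for the manual QA sample size
) -> tuple[bool, float]:
    """Return (passed, accuracy) for records labeled by both human and model."""
    # Only records that carry both a human label and a model label are scored.
    shared = sorted(set(human_labels) & set(model_labels))
    # An empty or undersized sample fails the gate rather than passing by default.
    if not shared or len(shared) < min_sample_size:
        return False, 0.0
    agreements = sum(1 for rid in shared if human_labels[rid] == model_labels[rid])
    accuracy = agreements / len(shared)
    return accuracy >= min_accuracy, accuracy
```

Failing closed on an undersized sample matches the requirement that the pilot states its QA size up front: a run with too few human labels should not be committed.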

Single-agent review plus human QA is the default.
Escalate the maybe queue with a stronger model or multi-model voting only after the baseline pilot has passed its QA gate.
Keep the skill concise. Load the reference files only as needed.
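
Multi-model voting on the maybe queue could look like the sketch below. It is one possible scheme under stated assumptions, not something the framework prescribes: vote_on_maybe is a hypothetical helper, and the strict-majority rule is a choice made here to keep escalation conservative.

```python
from collections import Counter


def vote_on_maybe(votes: list[str]) -> str:
    """Resolve a 'maybe' record only when a strict majority agrees.

    Each vote is one model's label: 'include', 'exclude', or 'maybe'.
    """
    if not votes:
        return "maybe"
    label, count = Counter(votes).most_common(1)[0]
    # Ties, split votes, and 'maybe' majorities leave the record in the queue.
    if label in {"include", "exclude"} and count > len(votes) / 2:
        return label
    return "maybe"
```

Anything short of a strict majority for include or exclude stays maybe, so voting can shrink the queue but never forces a decision on a split.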