name: hypothesis-review-pipeline
description: Run the decomposed review pipeline for a single hypothesis and persist each review stage as a structured artifact.
hypothesis-review-pipeline
Goal:
- Run the decomposed review pipeline for a single hypothesis and persist each review stage as a structured artifact.
Inputs:
- `ResearchPlan`
- `MetaReview`
- `Hypothesis`
- optional resume context from `state/PIPELINE_STATE.json`
Outputs:
- `hypotheses/<id>/REVIEW/INITIAL_REVIEW.json`
- `hypotheses/<id>/REVIEW/FULL_REVIEW.json`
- `hypotheses/<id>/REVIEW/DEEP_VERIFICATION.json`
- `hypotheses/<id>/REVIEW/OBSERVATION_REVIEW.json`
- `hypotheses/<id>/REVIEW/SIMULATION_REVIEW.json`
- `hypotheses/<id>/REVIEW/REVIEW_SUMMARY.json`
- `literature/queries/<query_id>/*` when full review or deep verification needs external literature evidence
- updated `hypotheses/<id>/HYPOTHESIS.json`
- updated `state/PIPELINE_STATE.json`
- updated `state/CURRENT_STAGE.json`
Sub-skills:
- `hypothesis-initial-review`
- `literature-search`
- `hypothesis-full-review`
- `hypothesis-deep-verification`
- `hypothesis-observation-review`
- `hypothesis-simulation-review`
- `hypothesis-review-summary`
Context Loading:
- Open `skills/shared-references/schema-index.md`.
- Read `packages/agent_contracts/review.py` before writing any per-stage review artifact under `hypotheses/<id>/REVIEW/`.
- Read `packages/agent_contracts/hypothesis.py` before updating the embedded review payload inside `hypotheses/<id>/HYPOTHESIS.json`.
- Read `packages/agent_contracts/literature.py` and `skills/shared-references/literature-search-contract.md` before dispatching `hypothesis-full-review` or `hypothesis-deep-verification`.
- Read `packages/agent_contracts/pipeline_runtime.py` before updating `state/PIPELINE_STATE.json` or `state/CURRENT_STAGE.json`.
- Read `skills/shared-references/codex-reviewer-routing.md` before using any optional Codex reviewer subagent route.
- Read `RUN_POLICY.yaml` before dispatching review work so the active `review_rigor` setting is honored.
- Read the current hypothesis, research plan, and any optional run-level review guidance before starting the stage sequence.
Execution Contract:
- This pipeline skill does not own hidden prompt templates.
- It orchestrates atomic review skills so the runtime can trace each review stage independently.
- Downstream atomic review skills remain responsible for the exact field-level contents of each standalone review artifact.
- The canonical `hypotheses/<id>/HYPOTHESIS.json` artifact must be updated after every review stage so the embedded `review` payload stays in sync with the per-stage review files.
- Use `from tools import sync_hypothesis_review` after each stage artifact write. The canonical synchronization helper is implemented in `packages/run_artifacts/review_sync.py`; do not hand-edit embedded review JSON.
- Use `from tools import sync_pipeline_stage_artifacts` as the canonical paired write surface for entering the `Reflection` substage.
- Treat `tools.sync_hypothesis_review(run_dir, hypothesis_id)` as a hard gate. If it fails, stop the review sequence immediately instead of continuing with later stages.
- On entry, call `tools.sync_pipeline_stage_artifacts(run_dir, current_phase="Reflection", current_skill="hypothesis-review-pipeline")` before starting the review sequence.
- When consumed by a host agent, validate both the embedded hypothesis review payload and the per-stage review artifacts. If validation fails, stop and repair the review bundle before running the next stage or returning control.
- Full review and deep verification must use `tools.search_literature(run_dir, request)` or an existing matching `EvidenceBundleContract` before making external literature support claims.
- The review pipeline must not accept prompt-invented paper metadata or model-memory citations as formal evidence.
- The review pipeline must not accept empty completed review artifacts. A completed or passing stage must contain concrete, hypothesis-specific findings that downstream evolution and ranking can use.
- Do not use placeholder review phrases such as `Viable evolved hypothesis`, `Refined from parent`, `Must outperform parent`, `Synthesize evolved catalyst`, or `Benchmark against parent` as the substantive content of any completed review artifact.
- Codex reviewer subagents are optional. If unavailable, run the same review contract in the main thread and record `reviewerRoute = local_main_thread` in the trace.
- If a Codex reviewer subagent is used, record `state/agent_traces/codex/<skill>/<timestamp>.json`; the trace is audit evidence and must not replace canonical review artifacts.
- Reviewer subagents must not write deterministic mechanics artifacts or bypass `tools.sync_hypothesis_review` and schema validation.
- Regardless of `review_rigor`, run the full review stack for every hypothesis:
  - `hypothesis-full-review`
  - `hypothesis-deep-verification`
  - `hypothesis-observation-review`
  - `hypothesis-simulation-review`
- `review_rigor` may shorten or deepen the reasoning inside those stages, but it must not disable any review artifact.
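The ban on placeholder review phrases lends itself to a mechanical pre-check. The following is a minimal sketch, not the project's validator: the helper name `find_placeholder_fields`, the normalization rule, and the example field names (`status`, `findings`, `summary`) are all hypothetical, and the real review schema lives in `packages/agent_contracts/review.py`. It flags only string fields whose entire value is one of the listed phrases.

```python
from typing import Any

# Phrases the contract forbids as the substantive content of a completed review.
PLACEHOLDER_PHRASES = (
    "Viable evolved hypothesis",
    "Refined from parent",
    "Must outperform parent",
    "Synthesize evolved catalyst",
    "Benchmark against parent",
)


def _normalize(text: str) -> str:
    """Trim whitespace and trailing '.'/'!' and lowercase, for comparison only."""
    return text.strip().strip(".!").strip().lower()


def find_placeholder_fields(node: Any, path: str = "") -> list[str]:
    """Return the paths of string fields whose whole value is a placeholder phrase.

    Hypothetical helper: it walks any JSON-like structure and assumes nothing
    about the actual review artifact schema.
    """
    banned = {_normalize(phrase) for phrase in PLACEHOLDER_PHRASES}
    hits: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            hits += find_placeholder_fields(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_placeholder_fields(value, f"{path}[{index}]")
    elif isinstance(node, str) and _normalize(node) in banned:
        hits.append(path)
    return hits
```

A non-empty result would be a reason to stop and repair the stage artifact before syncing. A check like this cannot confirm that the remaining findings are concrete and hypothesis-specific; that judgment stays with the atomic review skills.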
Execution Steps:
- Open `skills/shared-references/schema-index.md` and `skills/shared-references/codex-reviewer-routing.md`, then read `packages/agent_contracts/review.py`, `packages/agent_contracts/hypothesis.py`, `packages/agent_contracts/literature.py`, and `packages/agent_contracts/pipeline_runtime.py` before writing any per-stage review artifact, embedded review state, literature evidence artifact, reviewer trace, or run-level stage artifact.
- Read the research plan, current hypothesis, optional meta-review guidance, and active `RUN_POLICY.yaml`.
- Before dispatching atomic review work, call `tools.sync_pipeline_stage_artifacts(run_dir, current_phase="Reflection", current_skill="hypothesis-review-pipeline")`.
- Run `hypothesis-initial-review` and persist the resulting stage artifact.
- Immediately call `tools.sync_hypothesis_review(run_dir, hypothesis_id)` to synchronize `review.initial_review` back into `hypotheses/<id>/HYPOTHESIS.json`. If sync fails, stop here and repair the artifact instead of continuing.
- Determine the required reasoning depth from `review_rigor`, then run the full remaining atomic review stack for the hypothesis. When `hypothesis-full-review` or `hypothesis-deep-verification` needs external literature evidence, require those skills to call `tools.search_literature(run_dir, request)` or consume an existing matching evidence bundle.
- After each later review stage, call `tools.sync_hypothesis_review(run_dir, hypothesis_id)` again so the standalone stage artifact and aggregate hypothesis artifact remain consistent. If sync fails, stop before the next stage.
- Run `hypothesis-review-summary` after the required review stages finish.
- Run `python -m tools.validation.contract_validation <run_dir> --skill hypothesis-review-pipeline`. If validation fails, stop and repair the review bundle before declaring completion or returning control.
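The stage ordering and the hard-gate behavior of the sync call can be sketched as a small driver loop. This is an illustration under stated assumptions, not the runtime's implementation: `run_review_sequence`, `ReviewGateError`, and the injected callables are hypothetical names; the sketch assumes a failed sync surfaces as a raised exception; and syncing once more after `hypothesis-review-summary` is an assumption based on the "after each stage artifact write" rule. The callables are injected so the sketch runs without the `tools` package.

```python
from typing import Callable

# Stage order taken from the steps above; the summary runs last.
REVIEW_STAGES = (
    "hypothesis-initial-review",
    "hypothesis-full-review",
    "hypothesis-deep-verification",
    "hypothesis-observation-review",
    "hypothesis-simulation-review",
    "hypothesis-review-summary",
)


class ReviewGateError(RuntimeError):
    """Hypothetical error: a sync call failed, so the sequence must stop."""


def run_review_sequence(
    run_dir: str,
    hypothesis_id: str,
    run_stage: Callable[[str, str, str], None],
    sync_review: Callable[[str, str], None],
    enter_stage: Callable[[str], None],
) -> list[str]:
    """Run every stage in order, syncing after each one.

    Assumption: `sync_review` raises on failure. The first failure aborts the
    loop, so no later stage runs against an out-of-sync hypothesis artifact.
    """
    enter_stage(run_dir)  # mark entry into the Reflection substage first
    completed: list[str] = []
    for stage in REVIEW_STAGES:
        run_stage(stage, run_dir, hypothesis_id)
        try:
            sync_review(run_dir, hypothesis_id)
        except Exception as exc:
            raise ReviewGateError(f"sync failed after {stage}") from exc
        completed.append(stage)
    return completed
```

In a real run, `sync_review` would presumably wrap `tools.sync_hypothesis_review(run_dir, hypothesis_id)` and `enter_stage` would wrap `tools.sync_pipeline_stage_artifacts(run_dir, current_phase="Reflection", current_skill="hypothesis-review-pipeline")`. The final `contract_validation` command still has to run after the loop returns.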
Completion Rule:
- This skill is complete only when all review-stage artifacts exist, the embedded review payload in `hypotheses/<id>/HYPOTHESIS.json` matches those stage artifacts, and the resulting files validate for downstream routing.
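The first clause of the completion rule, that every review-stage artifact exists, is easy to pre-check before invoking validation. A minimal sketch follows, assuming the `hypotheses/<id>/REVIEW/` paths are resolved relative to the run directory; the helper name `missing_review_artifacts` is hypothetical. It checks existence only and says nothing about whether the embedded payload matches or whether the files validate.

```python
from pathlib import Path

# File names taken from the Outputs list of this skill.
REQUIRED_REVIEW_FILES = (
    "INITIAL_REVIEW.json",
    "FULL_REVIEW.json",
    "DEEP_VERIFICATION.json",
    "OBSERVATION_REVIEW.json",
    "SIMULATION_REVIEW.json",
    "REVIEW_SUMMARY.json",
)


def missing_review_artifacts(run_dir: str, hypothesis_id: str) -> list[str]:
    """Return the required review file names that are not present on disk.

    Assumption: artifacts live under <run_dir>/hypotheses/<id>/REVIEW/.
    """
    review_dir = Path(run_dir) / "hypotheses" / hypothesis_id / "REVIEW"
    return [
        name for name in REQUIRED_REVIEW_FILES
        if not (review_dir / name).is_file()
    ]
```

An empty list means only that the files are present; the payload match and contract validation still decide completion.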