rh-inf-discovery

─────────────────────────────────────────────────────────────────────────────

REQUIRED FRONTMATTER — Level 1 disclosure (loaded into system prompt at startup)

─────────────────────────────────────────────────────────────────────────────

name: "rh-inf-discovery" description: > Interactive research assistant for healthcare informatics evidence discovery. Searches PubMed, PMC, ClinicalTrials.gov, and curated government/society sources to build an evidence-based discovery plan (L1 → L2 lifecycle stage). Modes: plan · verify. compatibility: "rh-skills >= 0.1.0" context_files:

reference.md # domain advice checklist, source taxonomies, US gov coverage
examples/plan.yaml # worked example: diabetes-ccm discovery-plan.yaml (structured sources)
examples/readout.md # worked example: diabetes-ccm discovery-readout.md (domain narrative)
examples/output.md # worked example: plan transcript excerpt metadata: author: "RH Skills" version: "1.0.0" source: "skills/.curated/rh-inf-discovery/SKILL.md" lifecycle_stage: "l1-discovery" reads_from:
- RESEARCH.md
- sources/ reads_via_cli:
- "rh-skills search pubmed"
- "rh-skills search pubmed --offline"
- "rh-skills search pmc"
- "rh-skills search clinicaltrials"
- "rh-skills source scan"
- "rh-skills validate --plan" writes_via_cli:
- "rh-skills source add"
- "rh-skills source download --url"

Overview

rh-inf-discovery is the L1 evidence discovery stage of the lifecycle. It guides a clinical informaticist through finding, evaluating, and documenting evidence-based source material for a clinical research area. The result is a topic-scoped discovery-plan.yaml (structured source list, the single source of truth), discovery-readout.md (generated domain narrative), and downloaded open-access source files. rh-inf-ingest consumes these to normalize, classify, and annotate sources for the same topic.

The skill acts as an interactive research assistant — it does not stop at a single search pass. After each pass the agent explicitly prompts the user with expansion suggestions and awaits direction. The plan is a living document written to disk only when the user approves it.

Discovery is topic-centered. Discovery runs against a concrete topic and writes its artifacts under that topic's process/plans/ directory. The optional --domain <label> flag is descriptive metadata used to frame domain advice and search strategy; it does not replace the topic identity.

Guiding Principles

Discovery is pure research. All searches are delegated to rh-skills CLI commands. All clinical reasoning, source evaluation, and research synthesis happen in this skill.
Download at the end, not during research. Source files are only downloaded after the user approves and saves the plan (Step 12). The plan can be freely re-run, revised, and reviewed before any acquisition occurs.
Single source of truth. discovery-plan.yaml is the authoritative source list. discovery-readout.md is a generated narrative derived from it and should never be edited directly.
Always close the loop. Every response must end with a status block followed by a friendly user prompt and lettered next-step options. The user must never have to ask "what do I do next?"

User Input

$ARGUMENTS

Inspect $ARGUMENTS before proceeding. The first word is the mode (plan or verify). Discovery runs against a required <topic> positional argument. The optional --domain <label> flag names the clinical research area used to frame domain advice and search strategy.

Mode	Arguments	Example
`plan`	`<topic> [--domain <label>] [--force]`	`plan diabetes-ccm --domain diabetes`
`verify`	`<topic>`	`verify diabetes-ccm`

Mode Defaulting: If the first token is not plan or verify, treat the input as plan <topic> ....

<topic>: required kebab-case lifecycle identifier for artifact location and tracking context.
--domain <label>: optional freeform research area label used for domain advice, search framing, and plan/readout metadata. If omitted, infer from the topic/context.
--force: overwrite existing discovery-plan.yaml and discovery-readout.md (plan mode only).
If the mode is unrecognized (neither plan nor verify), print this table and exit with an error.

Pre-Execution Checks

For plan mode only: check whether a plan already exists for the topic:

ls topics/<topic>/process/plans/discovery-plan.yaml 2>/dev/null

If it exists and --force was not passed: warn the user, offer to load it for continuation or start fresh with --force. Wait for the user's choice.
If it exists and --force was passed: proceed, overwriting on save.
If it does not exist: proceed normally.

When continuing an existing plan: Load the saved plan into memory as the starting sources[] list. Resume at Step 1 to refresh domain advice, then run Steps 2–5 to fill any identified gaps. Always run the full Steps 7–10 before saving — source review table, access advisories (for any authenticated/manual sources), expansion suggestions, and interactive prompt. Do not skip these steps even when the session prompt is task-framed ("add them and re-validate") or when running in an automated eval context.

Mode: `plan`

The plan is an interactive research loop. Complete all steps; prompt the user after each major pass before proceeding.

Step 0 — Scan for Manually Placed Files

Before starting searches, check whether the user has pre-placed any source files (PDFs, CSVs, etc.) in the sources/ directory:

rh-skills source scan

Interpret the output:

UNTRACKED files — the user has placed a file that is not yet in the plan. Read the filename and use head -20 sources/<file> (for text/CSV) to infer content. Preview each as an access: manual source entry via rh-skills source add --dry-run --access manual ... to generate and review the YAML snippet. Hold these entries in memory — do not use --append-to-plan here because no plan file exists yet. They are merged into sources[] when the plan is written in Step 11. Infer --type from extension: .pdf/.md/.txt → document or clinical-guideline; .csv/.tsv/.xlsx → dataset.
SHA-CHANGED files — a previously tracked file has changed on disk. Warn the user and suggest re-running ingest to pick up changes.
TRACKED files — already registered; nothing to do.

If sources/ does not exist or is empty, skip this step silently and proceed to Step 1.

Emit status block:

  Step:   0 — Manual File Scan · Complete
  Found:  <N> untracked, <N> SHA-changed, <N> tracked
  Plan:   <N> sources in memory
  Next:   Step 1 — Domain Advice

Step 1 — Domain Advice

Read reference.md → Domain Advice Checklist. Present domain-specific guidance for the research area, addressing all checklist items relevant to the domain:

CMS program alignment (eCQMs, MIPS/QPP, Medicaid, CMMI models)
SDOH relevance (Gravity Project domain taxonomy)
Health equity considerations
Existing quality measure landscape
Key terminology systems likely needed (SNOMED, LOINC, ICD-10, RxNorm)
Health economics angle (cost of care, disease burden) — required when the topic involves a chronic condition, preventive intervention, or CMS program

This advice is captured in the Domain Advice section of discovery-readout.md (the generated narrative file) and guides which sources to prioritize.

Emit status block:

  Step:   1 — Domain Advice · Complete
  Plan:   0 sources in memory
  Next:   Step 2 — run PubMed/PMC searches

Step 2 — PubMed Search

rh-skills search pubmed --query "<terms>" --max 20 --json
rh-skills search pmc   --query "<terms>" --max 20 --json

Choose search terms based on the topic name, clinical synonyms, and domain advice from Step 1. If --domain <label> was provided, use it to refine the search framing and readout metadata. Run at least one PubMed search AND one PMC search per plan.

Parse the JSON results. For each result evaluate:

Relevance to the topic
Evidence level (see reference.md Evidence Level Taxonomy)
Access: PMC articles are open; PubMed abstracts are open (URL to abstract); full-text may require institutional access → authenticated

If the search commands fail (e.g. network/DNS error in a sandboxed environment), retry with --offline to get reference links and record the query, then continue with domain-knowledge-based source selection:

rh-skills search pubmed --offline --query "<terms>"
rh-skills search pmc    --offline --query "<terms>"

Mark the step as SKIPPED (offline) in the status block and log the attempted queries in the plan's notes field so they can be re-run later.

Emit status block — use the appropriate variant:

  Step:   2 — PubMed/PMC Search · Complete         ← network available
  Step:   2 — PubMed/PMC Search · SKIPPED (offline) ← network unavailable
  Plan:   <N> sources in memory
  Next:   Step 3 — ClinicalTrials.gov search

Step 3 — ClinicalTrials.gov Search

rh-skills search clinicaltrials --query "<terms>" --max 20 --json

Include active or completed trials relevant to the topic. These are registry type with evidence_level: n/a (trials are not yet evidence until published).

If the search command fails, retry with --offline:

rh-skills search clinicaltrials --offline --query "<terms>"

Mark as SKIPPED (offline) and add a note in the plan to check ClinicalTrials.gov manually.

Emit status block:

  Step:   3 — ClinicalTrials.gov Search · Complete          ← network available
  Step:   3 — ClinicalTrials.gov Search · SKIPPED (offline) ← network unavailable
  Plan:   <N> sources in memory
  Next:   Step 4 — US government sources

Step 4 — US Government Healthcare Sources

For any topic with a US clinical care or population health angle, actively search the following (not via rh-skills search — reason from your knowledge and present URLs for the user to confirm). Do not use web search tools. Supply URLs from domain knowledge; if uncertain about a specific URL, record the field as tbd with a note to verify rather than searching the web:

Source	Check for
CMS eCQM Library / CMIT	Existing quality measures for the condition
QPP / MIPS	Clinician performance measures (if clinician-facing)
USPSTF	Preventive service grades (A/B especially)
Gravity Project	SDOH domain taxonomy and FHIR IGs
AHRQ	Evidence-based practice reports, PCORI-funded studies
CDC / MMWR	Surveillance data, epidemiological burden
HCUP / MEPS / GBD	Health economics and cost-of-care data

See reference.md → US Government Healthcare Coverage for full URLs and guidance.

Emit status block:

  Step:   4 — US Government Sources · Complete
  Plan:   <N> sources in memory
  Next:   Step 5 — medical society and journal sources

Step 5 — Medical Society and Journal Sources

Identify the relevant medical societies and journals for the topic. Present these with recommended search terms even if they require authentication:

Include society clinical practice guidelines
Include specialty-specific journals from reference.md → Medical Journals
For authenticated sources, set access: authenticated and include auth_note with the specific login mechanism and suggested search terms

Do not attempt to download authenticated sources — flag them in the plan with recommended: true where they are high-value, and include an inline access advisory (FR-011a/FR-011b).

Emit status block:

  Step:   5 — Medical Society Sources · Complete
  Plan:   <N> sources in memory
  Next:   Step 6 — compile and validate source list

Step 6 — Build the Living Plan

Maintain a sources[] list in memory throughout the plan. After Steps 2–5, compile the list applying these rules:

Minimum 5 sources: if fewer than 5, search additional databases or source categories before presenting
Maximum 25 sources: if more than 25 high-quality candidates, select the 25 most relevant; log extras as expansion candidates
At least one terminology entry (SNOMED, LOINC, ICD-10, RxNorm, etc.). Add a terminology source for each system explicitly named in the domain advice — one entry is the floor, not the target. For most clinical topics, domain advice will identify SNOMED CT, LOINC, ICD-10-CM, and RxNorm; each should have its own entry.
At least one health-economics entry when topic involves chronic conditions, preventive interventions, or CMS programs
All mandatory fields per FR-005: name, type, rationale, search_terms[], evidence_level, access. Plus url when access: open.

Emit status block:

  Step:   6 — Source List · Complete
  Plan:   <N> sources in memory
  Next:   Step 7 — present list for approval

Step 7 — Present to User

Present a formatted summary table of proposed sources:

| # | Name | Type | Evidence | Access | Notes |
|---|------|------|----------|--------|-------|
| 1 | ADA Standards of Care | guideline | grade-a | open | URL provided |
| 2 | NEJM article ... | pubmed-article | grade-b | authenticated | requires login |
...

For access: open sources, confirm a URL is recorded. For access: authenticated sources, note the auth method. For access: manual sources, note what the user must retrieve. Open-access sources are downloaded in Step 12 of this discovery workflow via rh-skills source download --url ...; authenticated/manual sources remain advisory and are gathered by the user.

Ask the user to approve, modify, or add/remove sources. Incorporate feedback and loop back if needed.

Emit status block:

▸ rh-inf-discovery  <topic>
  Step:   7 — Source Review
  Plan:   <N> sources in memory
  Next:   approve list / modify / add sources
What would you like to do next?

A) Approve all sources — proceed to access advisories B) Modify the list — add, remove, or edit sources C) Ask a question about a specific source

You can also ask for status rh-inf-status at any time.

Step 8 — Access Advisories

For each approved access: authenticated or access: manual source, print an access advisory so the user knows what to gather before running rh-inf-ingest:

⊘ <Source Name>
   Why relevant: <rationale for this topic>
   Access URL:   <url>
   Auth method:  <institutional login | free registration | society membership | library proxy>
   Search terms: <specific terms to use once authenticated>

For access: manual sources without a URL, describe where to find the artifact and what filename convention to use when placing it in sources/.

Command usage boundary:

Use rh-skills source scan during discovery for inventory and drift detection (UNTRACKED, TRACKED, SHA-CHANGED).
Use rh-skills ingest list-manual [<topic>] during ingest plan as a registration pre-check to list untracked files and print per-file rh-skills ingest implement ... commands.

Emit status block:

  Step:   8 — Access Advisories · Complete
  Plan:   <N> sources in memory
  Next:   Step 9 — expansion suggestions

Step 9 — Research Expansion Suggestions

After the source pass, generate 3–7 Research Expansion Suggestions. These are prospective adjacent areas — NOT sources already in the plan. Each must include:

The adjacent topic
Why it is relevant to the primary topic
The first rh-skills command the user would run to explore it

Cover at minimum (when applicable):

(a) Adjacent comorbidities or closely related conditions
(b) Healthcare economics angle (cost, burden, cost-effectiveness)
(c) Health equity or disparate-population angle
(d) Implementation science gap (barriers to guideline adoption)
(e) Data/registry gap (limited evidence or active trial inquiry)

Do NOT add suggestions to sources[] automatically. They are offered for the user to act on.

Present the suggestions as a numbered list and explicitly offer to explore any of them now. Example:

Research Expansion Suggestions:
  1. Diabetic kidney disease — comorbidity with shared CMS quality measures
     → rh-skills search pubmed --query "diabetic nephropathy CKD quality measures"
  2. Health equity in diabetes management — disparities by race/ethnicity
     → rh-skills search pubmed --query "diabetes management disparities SDOH"
  ...

Would you like to explore any of these now? Reply with the number, or proceed
to save the plan.

If the user selects an expansion area, loop back to Steps 2–9 for that area, merge new findings into the living plan, then return here.

Emit status block:

  Step:   9 — Expansion Suggestions · Complete
  Plan:   <N> sources in memory
  Next:   explore an expansion area / save the plan

Step 10 — Interactive Prompt

After presenting the plan and suggestions, emit the status block followed by the options below. No lead-in, no preamble before the status block. Do not pre-answer choices not yet made. Do not add guidance like "if you choose C..." or "the next step would be...". Wait for the user's reply.

If the user wants to explore an expansion area, loop back to Steps 2–9 for that area and merge the findings into the living plan.

Emit status block:

▸ rh-inf-discovery  <topic>
  Step:   10 — Awaiting Direction
  Plan:   <N> sources in memory
  Next:   explore expansion / modify / save plan
What would you like to do next?

A) Explore expansion area — reply with the number (e.g. "explore 2") B) Add, remove, or modify sources C) Save the plan

You can also ask for status rh-inf-status at any time.

Step 11 — Save Checkpoint

When the user approves saving:

Write topics/<topic>/process/plans/discovery-plan.yaml — the structured source list (see Level 3 below for the required format)
Write topics/<topic>/process/plans/discovery-readout.md — the generated domain narrative (Domain Advice + Research Expansion Suggestions prose); add a note at the top that it is derived from discovery-plan.yaml
Update RESEARCH.md root portfolio row for the topic (source count, date)

After saving, emit status block:

▸ rh-inf-discovery  <topic>
  Step:   11 — Save Checkpoint · Complete
  Plan:   saved · <N> sources → topics/<topic>/process/plans/discovery-plan.yaml
  Next:   Step 12 — Download open-access sources
What would you like to do next?

A) Proceed to download open-access sources now B) Return to the plan to revise sources

You can also ask for rh-inf-status at any time.

Step 12 — Download Open-Access Sources

After the plan is saved, download all access: open sources immediately. Launch one subagent per source in parallel:

rh-skills source download --url <url> --name <name> --type <source-type> --topic <topic>

Type passthrough policy: For each open-access source, pass the source type from discovery-plan.yaml into source download --type so tracking has an initial classification hint at registration time. rh-skills ingest classify remains authoritative and may refine or replace this value later.

Topic passthrough: Always pass --topic <topic> on every download call. Discovery is topic-scoped, so downloaded sources should be associated with the same topic at registration time.

NEVER use curl, wget, Python requests, or any other download method. rh-skills source download --url is the only permitted download mechanism.

Once all subagents complete, display a summary:

Downloads complete:
  ✓ ada-guidelines-2024       sources/ada-guidelines-2024.pdf
  ✓ cms-ecqm-cms122           sources/cms-ecqm-cms122.html
  ⊘ cochrane-review           exit 3 — auth redirect (see auth_note)
  ⛔ nice-guidelines           exit 4 — network blocked (sandbox)

Exit 3 → authentication redirect — print the auth_note advisory and skip
Exit 2 → file already exists and checksum matches — skip (idempotent)
Exit 1 → network error — report and continue
Exit 4 → sandbox network restriction — outbound network is blocked. Stop retrying downloads. Inform the user:

"Downloads require outbound network access, which is blocked in this sandbox. Please run the following commands in a shell with network access:" Then list every blocked rh-skills source download --url … --topic <topic> command.

For access: authenticated or access: manual sources: print the auth_note advisory only. Do not attempt to download them.

--append-to-plan boundary: use rh-skills search ... --append-to-plan <topic> only when appending to an existing topic plan file. During the initial discovery planning loop, keep entries in memory and write once in Step 11.

Emit status block:

  Step:   12 — Download · Complete
  Downloaded: <N> open · <M> skipped (auth) · <P> skipped (manual)
  Next:   Step 13 — Verify Recommendation

Step 13 — Verify Recommendation

Remind the user that verify mode runs non-destructive checks on the saved plan. Once it passes, the plan and downloaded sources are ready to hand off to rh-inf-ingest for normalization, classification, and annotation.

Mode: `verify`

Read-only — no file writes, no tracking.yaml modifications.

rh-skills validate --plan topics/<topic>/process/plans/discovery-plan.yaml

Report the output verbatim. Exit with the same code as rh-skills validate --plan.

On exit 0, before emitting the status block, emit:

Verification passed! Your discovery plan is well-formed and ready for rh-inf-ingest.
▸ rh-inf-discovery  <topic>
  Mode:    verify
  Result:  PASS
  Next:    rh-inf-ingest plan <topic>
What would you like to do next?

A) Run rh-inf-ingest plan <topic> — begin source acquisition B) Return to the discovery plan to revise sources

You can also ask for status rh-inf-status at any time.

On exit 1, emit:

Verification failed. Please address the following issues in topics/<topic>/process/plans/discovery-plan.yaml:
▸ rh-inf-discovery  <topic>
  Mode:    verify
  Result:  FAIL — <N> check(s) failed
  Next:    fix: <specific issue(s) listed above>, then re-run verify
What would you like to do next?

A) Fix the listed issues and re-run rh-inf-discovery verify B) Return to the plan to modify sources

You can also ask for status rh-inf-status at any time.

Level 3 — Discovery Plan Format

The discovery plan is two files written to the same directory under topics/<topic>/process/plans/:

`discovery-plan.yaml` — Structured Source List

Pure YAML. This is what rh-skills validate --plan and rh-inf-ingest operate on.

topic: "<topic>"                    # required; matches topics/<topic> directory
date: "YYYY-MM-DD"
domain: "<freeform research area label>"   # optional, from --domain flag or inferred
sources:
  - name: "<display name>"
    type: "<source type>"            # see reference.md Source Type Taxonomy
    rationale: "<why this source>"
    search_terms:
      - "<term1>"
      - "<term2>"
    evidence_level: "<level>"        # see reference.md Evidence Level Taxonomy
    access: "<open|authenticated|manual>"
    url: "<url>"                     # required when access: open
    auth_note: "<how to access>"     # required when access: authenticated
    recommended: true                # optional; true for high-value authenticated sources

See examples/plan.yaml for a complete worked example.

`discovery-readout.md` — Generated Domain Narrative

Markdown prose. Generated by the agent; consumed by agents and humans for context; not machine-parsed. Must begin with the derivation note.

> **Note:** This file is a narrative readout derived from `discovery-plan.yaml`.
> It is generated by the `rh-inf-discovery` skill and should not be edited directly.
> The structured source list in `discovery-plan.yaml` is the single source of truth.

## Domain Advice

<Prose addressing the Domain Advice Checklist from reference.md:
 CMS program alignment, SDOH relevance, health equity, quality measure landscape,
 terminology systems, health economics angle.>

## Research Expansion Suggestions

1. **<Adjacent Topic>** — <Why relevant to primary topic>.
   Start with: `rh-skills search pubmed --query "<terms>" --max 20`

2. ...

See examples/readout.md for a complete worked example.

Error Handling

Condition	Action
Fewer than 5 sources after all searches	Search additional databases; do not save
More than 25 sources	Select top 25, log extras in expansion suggestions
`rh-skills validate --plan` exits 1	Report failures; do not proceed to ingest
Plan already exists (no `--force`)	Offer continuation or fresh start
`rh-skills search` network error	Retry with `--offline` flag; mark step `SKIPPED (offline)`; continue with domain knowledge

CLI Quick Reference

Command	Purpose
`rh-skills search pubmed --query "..." --json`	Search PubMed (live)
`rh-skills search pubmed --query "..." --append-to-plan <topic>`	Append PubMed results to an existing topic plan
`rh-skills search pubmed --offline --query "..."`	Record query; get reference links (no network)
`rh-skills search pmc --query "..." --json`	Search PMC open-access (live)
`rh-skills search pmc --query "..." --append-to-plan <topic>`	Append PMC results to an existing topic plan
`rh-skills search clinicaltrials --query "..." --json`	Search ClinicalTrials.gov (live)
`rh-skills search clinicaltrials --query "..." --append-to-plan <topic>`	Append ClinicalTrials results to an existing topic plan
`rh-skills source scan`	List untracked/SHA-changed files in sources/
`rh-skills source add --type <type> --title "..." --rationale "..."`	Add a single source entry
`rh-skills source add --dry-run ...`	Preview source entry without writing
`rh-skills schema show discovery-plan`	Show plan schema and allowed taxonomies
`rh-skills validate --plan topics/<topic>/process/plans/discovery-plan.yaml`	Validate a saved discovery plan
`rh-skills validate --plan -`	Validate via stdin (pipe or heredoc)

rh-inf-discovery

─────────────────────────────────────────────────────────────────────────────

REQUIRED FRONTMATTER — Level 1 disclosure (loaded into system prompt at startup)

─────────────────────────────────────────────────────────────────────────────

rh-inf-discovery

Overview

Guiding Principles

User Input

Pre-Execution Checks

Mode: plan

Step 0 — Scan for Manually Placed Files

Step 1 — Domain Advice

Step 2 — PubMed Search

Step 3 — ClinicalTrials.gov Search

Step 4 — US Government Healthcare Sources

Step 5 — Medical Society and Journal Sources

Step 6 — Build the Living Plan

Step 7 — Present to User

Step 8 — Access Advisories

Step 9 — Research Expansion Suggestions

Step 10 — Interactive Prompt

Step 11 — Save Checkpoint

Step 12 — Download Open-Access Sources

Step 13 — Verify Recommendation

Mode: verify

Level 3 — Discovery Plan Format

discovery-plan.yaml — Structured Source List

discovery-readout.md — Generated Domain Narrative

Error Handling

CLI Quick Reference

Mode: `plan`

Mode: `verify`

`discovery-plan.yaml` — Structured Source List

`discovery-readout.md` — Generated Domain Narrative