haipipe-probe - SKILL.md Agent Skill

name: haipipe-probe
description: "Research probe pipeline — drives how tasks/runs in a project roll out. Each probe is a claim-directed research thread: design (hypothesis + planned arms), bridge (scaffold tasks/runs in C_task), result (harvest arms → claim), review (structural QA + Codex semantic verdict), explore (coverage + propose next), loop (review→propose→materialize→re-review). Contains no code — pure steering layer on top of C_task execution. Feeds F_paper. Trigger: probe, claim, hypothesis, drive probe, plan next runs, aggregate runs, statistical test, paired-t, coverage, propose next probe, review-loop, iterate until claim holds, implement the plan, deploy probes, /haipipe-probe."
argument-hint: [function] [probe_ref_or_path] [args...]
allowed-tools: Bash, Read, Grep, Glob, Skill
metadata:
  version: "1.2.0"
  last_updated: "2026-06-01"
  summary: "Research probe pipeline — drives how tasks/runs in a project roll out."
  changelog:
    - "1.0.0 (2026-05-31): baseline metadata added."
    - "1.1.0 (2026-06-01): document lightweight probe folder naming (MM-NN_slug) plus year archive folders."
    - "1.2.0 (2026-06-01): probe folder naming switches to date-based MMDD_slug + P.MMDD refs (same-day collisions get a letter suffix)."

Skill: haipipe-probe (orchestrator)

User-facing entry for the research probe pipeline.

Naming note: the command and folder remain /haipipe-probe and probes/ for compatibility. Conceptually this layer is D_probe: each probe folder is a focused probe that asks reality whether a candidate claim or story direction survives contact with evidence.

Two pipelines live side-by-side in a project; this skill owns the research side and never crosses into execution:

EXECUTION PIPELINE          (C_task)
  task / run = what to do and how to do it
  artifacts:  code, notebooks, configs, runtime.yaml, metrics.json
  question:   "this run, did it work?"

RESEARCH PROBE PIPELINE     (D_probe ← this skill; project folder probes/)
  probe      = which direction to explore, why, and what to do next
  artifacts:  probe.yaml, daily logs, review.md, claim
  question:   "across these runs, does the hypothesis hold?"

A probe is a research thread, not a claim repository. It steers how tasks and runs roll out: defines hypothesis → bridges plan into C_task tasks → harvests arms → judges → proposes next move → iterates. It contains NO code, NO notebooks, NO metrics computation; all that lives in tasks/. It only holds steering state (plan + links + verdict + narrative).

The two pipelines have a strict one-way dependency: probes read task artifacts and link to them; tasks never reference probes.

/haipipe-probe                                -> dashboard (list probes)
/haipipe-probe design new <slug>              -> define new probe folder
/haipipe-probe design link <probe> <run-path> -> link a run to an arm
/haipipe-probe result <probe>                 -> aggregate + claim
/haipipe-probe review <probe>                 -> structural QA gate
/haipipe-probe review integrity <probe>       -> Codex fraud-pattern audit
/haipipe-probe review claim <probe>           -> Codex semantic verdict
/haipipe-probe bridge <probe>                 -> scaffold arms in C_task + deploy
/haipipe-probe explore [project-path]         -> coverage map + propose
/haipipe-probe loop <probe>                   -> iterate review→propose→materialize
/haipipe-probe inspect [<probe> | <project>]  -> list / status / audit
/haipipe-probe "<natural language>"           -> infer, dispatch

Specialists

haipipe-probe-design       PRE-RUN:  new / link arms (defines what to test)
haipipe-probe-bridge       BRIDGE:   scaffold arms as tasks in C_task + deploy
haipipe-probe-result       POST-RUN: aggregate stats + write claim
haipipe-probe-review       QA:       structural checks + Codex claim verdict
haipipe-probe-explore      META:     coverage map + propose next
haipipe-probe-loop         ITERATE:  chain review→propose→materialize→re-review
haipipe-probe-inspect      READ:     list / status / audit (no writes)

Function Verb Map

new, define, create, design, hypothesis           -> design (new)
link, attach, add run, assign run                 -> design (link)
bridge, deploy, scaffold, materialize, implement,
make-runnable, run the plan, 实现实验, 部署        -> bridge
aggregate, compute, mean+std, paired-t            -> result (aggregate)
claim, conclude, write statement                  -> result (claim)
review, qa, quality check                         -> review (structural)
audit, integrity, fraud, fake-GT, phantom results,
honesty check, scope check, leakage check         -> review (integrity)
verdict, judge, supports?, semantic check         -> review (claim)
explore, coverage, gap, propose, suggest          -> explore
loop, iterate, until passes, auto-review-loop,
review-loop, keep improving                       -> loop
inspect, list, status, show probes                -> inspect
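
The verb map above can be read as a lookup table. The following is a minimal sketch of one way to apply it, not the skill's actual resolver: the names `VERB_MAP`, `TRIGGERS`, and `resolve_verb` are hypothetical, and longest-trigger-first matching is an assumption. Explicit two-word commands such as `review integrity` are not handled here; the sketch only covers single trigger phrases.

```python
# Trigger phrases copied from the verb map above; the data structure and
# the matching strategy are assumptions made for illustration.
VERB_MAP = {
    "design (new)": ["new", "define", "create", "design", "hypothesis"],
    "design (link)": ["link", "attach", "add run", "assign run"],
    "bridge": ["bridge", "deploy", "scaffold", "materialize", "implement",
               "make-runnable", "run the plan", "实现实验", "部署"],
    "result (aggregate)": ["aggregate", "compute", "mean+std", "paired-t"],
    "result (claim)": ["claim", "conclude", "write statement"],
    "review (structural)": ["review", "qa", "quality check"],
    "review (integrity)": ["audit", "integrity", "fraud", "fake-GT",
                           "phantom results", "honesty check",
                           "scope check", "leakage check"],
    "review (claim)": ["verdict", "judge", "supports?", "semantic check"],
    "explore": ["explore", "coverage", "gap", "propose", "suggest"],
    "loop": ["loop", "iterate", "until passes", "auto-review-loop",
             "review-loop", "keep improving"],
    "inspect": ["inspect", "list", "status", "show probes"],
}

# Invert to phrase -> target, lowercased for case-insensitive matching.
TRIGGERS = {
    phrase.lower(): target
    for target, phrases in VERB_MAP.items()
    for phrase in phrases
}


def resolve_verb(text):
    """Return (target, rest) for the longest trigger that starts `text`."""
    stripped = text.strip()
    # Longest first so a multi-word trigger is never shadowed by a shorter one.
    for trigger in sorted(TRIGGERS, key=len, reverse=True):
        head, tail = stripped[:len(trigger)], stripped[len(trigger):]
        # Require a word boundary so "review" does not match "review-loop".
        if head.lower() == trigger and (tail == "" or tail[0].isspace()):
            return TRIGGERS[trigger], tail.strip()
    return None, stripped
```

A call such as `resolve_verb("add run P.0601 tasks/x")` yields the target `design (link)` plus the remaining arguments; unknown input returns `None` so the caller can fall back to natural-language inference.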

Files Owned by This Umbrella

SKILL.md                       (this file)
ref/                           shared across specialists:
  probe-yaml-schema.md    probe.yaml field spec
  probe-cycle-audit-template.txt       CYCLE.md — per-probe closed-loop audit.
                              Self-contained: stage trail · method · evidence ·
                              verdict · the DIKW THIS cycle produced (its 🟦 D +
                              own 🟨 K). One probe cycle = probe.yaml + CYCLE.md.
  probe-run-dashboard-template.txt     TASK-RUNS.md — campaign run dashboard:
                              arms × runs (proposed vs existing/done), seed fill.
  probe-status-template.txt   canonical campaign status tracker (4 sections).
  (DIKW is PARTITIONED, not duplicated: per-probe 🟦 D + 🟨 K live in the cycle
   (CYCLE.md); cross-probe 🟩 I + meta-🟨 K + 🟧 W live in insights/INDEX.md —
   the narrative layer. No separate campaign DIKW dashboard.)
                              Arc is a LOOP: probe = machine hub (judge before
                              insights); the HUMAN narrative closes it by
                              proposing the next probe (explore only feeds that
                              human decision):
                              probe(plan)->task(run)->probe(judge)->insights
                              ->explore->NARRATIVE(human: propose next)-> ...
                              NEVER `paper` (that is a downstream application).
  probe-entry-template.txt  per-probe entry template (project log)
  probe-headline-template.txt headline scoreboard skeleton
  probe-caveats-checklist.txt 8+ confound categories
  _legacy-scope-expmt.md       migrated content reference (read-only)

Where probes live (project-level)

examples/Proj-X/
├── probes/                            ← project-level folder
│   ├── INDEX.md                            (auto: list all probes)
│   ├── coverage.md                         (auto: /explore coverage output)
│   ├── propose.md                          (auto: /explore propose output)
│   ├── comparison.md                       (auto: /result render output)
│   │
│   ├── 0601_framing_loss-aversion/        ← active folder-per-probe
│   │   ├── probe.yaml                      source of truth (claim + arms + result)
│   │   ├── review.md                       latest QA + Codex verdict (overwritten)
│   │   ├── CLAIMS_FROM_RESULTS.md          Codex verdict snapshot
│   │   └── logs/                           daily narrative
│   │       ├── 2026-06-01.md
│   │       └── 2026-06-02.md
│   │
│   ├── 0602_simplification_plain-language/
│   │   └── ...
│   │
│   └── 2026-archive/                       inactive/completed/deprecated probes
│       ├── 0501_social-norm/
│       └── 0502_long-message/
│
├── tasks/...                               (execution, C_task owns)
└── paper/...                               (claims feed F_paper)

Naming: active probe folders live directly under probes/ as <MMDD>_<short-name>/, where MMDD is the creation date (MM = month, DD = day). A second probe created on the same day gets the next free lowercase letter suffix (0601 → 0601b). The canonical source ref is P.<MMDD> (e.g. P.0601). Inactive, completed, or deprecated probes move to probes/<YYYY>-archive/ with the original folder name preserved. probe.yaml is the source of truth; comparison.md and INDEX.md are derived; logs/<DATE>.md is an append-only daily narrative. NO code in probe folders: figures, tables, and notebooks live in tasks/ and are referenced via the evidence: field in probe.yaml.
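
The naming rule can be sketched as a small helper. This is an illustration under stated assumptions, not the skill's implementation: `next_probe_folder` is a hypothetical name, and the sketch assumes that only active folders directly under probes/ count toward same-day collisions (archived probes are ignored).

```python
import datetime
import re
import string
from pathlib import Path


def next_probe_folder(probes_dir: Path, slug: str, today: datetime.date) -> str:
    """Return <MMDD>[letter]_<slug> using the first free same-day suffix."""
    mmdd = today.strftime("%m%d")
    # Collect the date prefixes already used by active probe folders.
    # "2026-archive" does not match because the prefix must end in "_".
    taken = set()
    if probes_dir.is_dir():
        for child in probes_dir.iterdir():
            match = re.match(r"^(\d{4}[a-z]?)_", child.name)
            if child.is_dir() and match:
                taken.add(match.group(1))
    # First probe of the day has no suffix; later ones get b, c, ...
    for suffix in [""] + list(string.ascii_lowercase[1:]):
        if mmdd + suffix not in taken:
            return f"{mmdd}{suffix}_{slug}"
    raise RuntimeError(f"no free suffix left for {mmdd}")
```

The suffix sequence starts at `b` for the second probe, matching the `0601 → 0601b` example in the naming rule.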

Probe identity contract:

folder:             probes/0601_framing_loss-aversion/
source of truth:    probes/0601_framing_loss-aversion/probe.yaml
yaml id:            P.0601
mixed source refs:  P.0601
resolver accepts:   P.0601 | 0601 | probes/0601_framing_loss-aversion/
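
The three accepted input forms can be normalized to a single key before looking up the folder. The sketch below is hypothetical: `probe_key` and `resolve_probe` are illustrative names, the letter-suffixed ref form (`P.0601b`) is an assumption drawn from the same-day collision rule, and only active folders are searched.

```python
import re
from pathlib import Path

REF_RE = re.compile(r"^(?:P\.)?(\d{4}[a-z]?)$")   # P.0601 or 0601
FOLDER_RE = re.compile(r"^(\d{4}[a-z]?)_")        # 0601_<short-name>


def probe_key(ref: str):
    """Normalize 'P.0601', '0601', or 'probes/0601_slug/' to '0601'."""
    text = ref.strip()
    match = REF_RE.match(text)
    if match:
        return match.group(1)
    # Otherwise treat the input as a path and read the folder-name prefix.
    match = FOLDER_RE.match(Path(text).name)
    return match.group(1) if match else None


def resolve_probe(ref: str, probes_dir: Path):
    """Return the unique active probe folder for `ref`, or None."""
    key = probe_key(ref)
    if key is None:
        return None
    matches = sorted(p for p in Path(probes_dir).glob(f"{key}_*") if p.is_dir())
    return matches[0] if len(matches) == 1 else None
```

Returning `None` on zero or multiple matches keeps the sketch conservative: an ambiguous ref is reported rather than guessed.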

Legacy grouped layouts such as probes/A_baseline_controls/01_lhm_vs_baseline/ may be read during migration, but new probe folders should use the lightweight active/archive layout.

Routing Logic

Step 1: Parse $ARGUMENTS.
Step 2: Resolve verb -> specialist via verb map.
Step 3: Validate target (probe ref/folder/path or project path).
Step 4: Dispatch: Skill("haipipe-probe-<specialist>", args="<verb> <rest>").
Step 5: Surface the specialist's tail (the return block described under Specialist Return Contract).
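
Steps 1 to 4 can be sketched as a function that turns the argument string into the skill name and args for the Skill call. This is a simplified illustration: `plan_dispatch` is a hypothetical name, the first token is assumed to already be a specialist name (a fuller version would consult the verb map), sending the bare command to the inspect specialist is an assumption, and target validation is reduced to a presence check.

```python
import shlex

# Specialist names from the Specialists table; everything else is assumed.
SPECIALISTS = {"design", "bridge", "result", "review",
               "explore", "loop", "inspect"}
# explore and inspect show their target in brackets in the command list.
TARGET_OPTIONAL = {"explore", "inspect"}


def plan_dispatch(arguments: str):
    """Return (skill_name, args) for the Skill call, or raise ValueError."""
    tokens = shlex.split(arguments)                      # Step 1: parse
    if not tokens:
        # Bare command shows the dashboard; routing it to inspect is assumed.
        return "haipipe-probe-inspect", "list"
    verb, rest = tokens[0].lower(), tokens[1:]
    if verb not in SPECIALISTS:                          # Step 2: resolve
        raise ValueError(f"unknown verb: {verb!r}")
    if not rest and verb not in TARGET_OPTIONAL:         # Step 3: validate
        raise ValueError(f"{verb} needs a probe ref, folder, or path")
    # Step 4: the caller would invoke Skill(skill_name, args=args).
    return f"haipipe-probe-{verb}", shlex.join([verb] + rest)
```

Step 5 is left to the caller, which would relay whatever the specialist returns.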

Specialist Return Contract

status:    ok | blocked | failed
summary:   2-3 sentences
artifacts: [paths created / read]
next:      suggested next command (often inspect or explore)
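
The contract above could be modeled as a small record with a validated status. This is a sketch only: the `SpecialistReturn` class and its `render` method are hypothetical, and the contract itself does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_STATUS = ("ok", "blocked", "failed")


@dataclass
class SpecialistReturn:
    status: str                      # one of ALLOWED_STATUS
    summary: str                     # 2-3 sentences
    artifacts: List[str] = field(default_factory=list)  # paths created / read
    next: str = ""                   # suggested next command

    def __post_init__(self):
        # Reject anything outside the three statuses the contract lists.
        if self.status not in ALLOWED_STATUS:
            raise ValueError(f"status must be one of {ALLOWED_STATUS}")

    def render(self) -> str:
        """Format the record using the field labels from the contract."""
        return "\n".join([
            f"status:    {self.status}",
            f"summary:   {self.summary}",
            f"artifacts: [{', '.join(self.artifacts)}]",
            f"next:      {self.next}",
        ])
```

Validating the status at construction time means a malformed specialist result fails early instead of reaching the user as an unlabeled block.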

Relation to other top-level skills

A_discover    feeds ideas      → suggested probes
C_task        provides runs    → linked into probe arms
E_insight     consumes claims  → analysis methodology
F_paper       consumes claims  → paper writing

D_probe is the central hub: it reads runs from C_task and writes claims
that feed both E_insight (analysis) and F_paper (writing).