name: iterate-ml-experiment
description: >
Owns the iteration loop on top of an ML workspace: the
journal/JOURNAL.md index and the per-experiment
journal/NN_short_name.md design notes that must be drafted and
approved by the user before experiments/NN_short_name.py is
created. Drives the propose → iterate → approve → implement →
record loop; dispatches to iterate-from-skore /
iterate-from-user for sourcing.
TRIGGER — any of:
- A session opens in an ML workspace (whether or not
`journal/` exists yet — missing/placeholder → bootstrap mode).
- User says "what's next", "resume", "where were we", "let's iterate", "propose next", "first baseline".
- About to create a new `experiments/NN_*.py` (the matching
`journal/NN_*.md` must exist and be approved first).
- User wants to record an outcome from a finished run.
- User asks to compare past experiments or review what's been tried ("compare X and Y", "where are we?").
SKIP when: no journal/ yet AND no workspace scaffold (route to
organize-ml-workspace); the work is mechanical inside
pipeline.py / evaluate.py / data.py with no journal-level
implication (owned by build-ml-pipeline /
evaluate-ml-pipeline); the user asks for a symbol lookup
(python-api); the user is diagnosing a single skore report
without a "what next" framing (evaluate-ml-pipeline).
HOW TO USE: read journal/JOURNAL.md first, classify the turn via
the Mode picker (table near the top), then read only the
matching section. Sibling skills open just-in-time when a step
requires them — do not pre-read all sibling skills at session start.
Design notes are the only artifact this skill writes; read,
compare, and overview modes don't write.
Iterate ML Experiment
The loop on top of experiments/: what to try next, why, what
counts as a result, how the trail is recorded. Pipeline / evaluation
mechanics live in sibling skills.
Next-step pointers — flow at a glance
session open
│
├── JOURNAL.md missing / placeholder ──► § 0 Bootstrap
│ │
│ ├─► G-EDA (explore-ml-data: run | skip)
│ │
│ └─► design note → G-DESIGN → § 3 implement
│
├── "what's next?" with ≥1 done row ───► § 1 → § 2 (sourcing) → § 3 implement
│
├── "run finished" ─────────────────────► § 4 record outcome
│ │
│ └─► dispatch audit-ml-pipeline
│
└── "status?" / "compare X Y" ──────────► references/maintenance_modes.md
Always re-emit the Pre-flight checklist with evidence before declaring the turn done.
First action — read state + emit read-set tracker
Open each sibling SKILL.md just-in-time when a step calls for
it (e.g. open evaluate-ml-pipeline before § 3's CV-strategy
step). Do not pre-read all at session start.
Sibling skills (just-in-time):
- organize-ml-workspace, data-science-python-stack,
python-env-manager, python-api, python-code-style,
explore-ml-data, build-ml-pipeline, evaluate-ml-pipeline,
test-ml-pipeline, smoke-test-ml-pipeline,
iterate-from-skore / iterate-from-user
Then before answering:
- Read `journal/JOURNAL.md`. Missing/placeholder → bootstrap (§ 0). This is the canonical project digest (Status, Data understanding (EDA), History, Backlog).
- Check the `Workspace decisions` block for pre-recorded gates (tabular, env_manager, package, skore_mode, cv_splitter) — a recorded decision skips its `AskUserQuestion`.
- Emit the Pre-flight checklist with each box filled.
- Use the Mode picker to find which section to read.
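
The "recorded decision skips its question" rule can be sketched as a lookup. This is an illustration only: the function name `gates_to_ask` is hypothetical, and the mapping from the `Workspace decisions` keys to gate IDs is inferred from the names used in this document rather than taken from a specification.

```python
from typing import Dict, List, Optional

# Assumed mapping from Workspace-decisions keys to the gate each one resolves.
GATE_FOR_KEY = {
    "tabular": "G-TABULAR",
    "env_manager": "G-ENV-MGR",
    "package": "G-PKG-NAME",
    "skore_mode": "G-SKORE-MODE",
    "cv_splitter": "G-CV-SPLITTER",
}


def gates_to_ask(recorded: Dict[str, Optional[str]]) -> List[str]:
    """Return the gates that still need a structured question.

    A key with a non-empty recorded value is treated as already decided.
    """
    return [gate for key, gate in GATE_FOR_KEY.items() if not recorded.get(key)]


if __name__ == "__main__":
    # Two decisions are on record; the other three gates still fire.
    print(gates_to_ask({"tabular": "polars", "env_manager": "pixi"}))
```

Gates that are absent from the block, or recorded with an empty value, are returned so the caller can ask about them at the point where each gate is due.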
Mode picker — read this before navigating the body
You read one mode section per turn. Match the user's signal, then jump.
| Signal / workspace state | Mode | Section |
|---|---|---|
| `JOURNAL.md` missing / placeholder / 0 History rows | Bootstrap | § 0 |
| `journal/` not scaffolded (no `src/`, no `experiments/`) | Bootstrap → handoff first | → `organize-ml-workspace`, then § 0 |
| "what's next?" / "let's iterate" / "propose next" — with ≥1 done row | Iterate (propose) | §§ 1–3 + Dispatch table |
| "the run finished" / "log the result" / "we got X = …" | Iterate (record) | § 4 |
| "where are we?" / "status?" / "what have we tried?" | Project overview | references/maintenance_modes.md § "Project overview" |
| "compare X and Y" / "X vs Y" | Compare (read-only) | references/maintenance_modes.md § "Compare past experiments" |
| "let's pivot the goal" / "actually we care about …" | Goal pivot | references/maintenance_modes.md § "Goal pivots" |
| "abandon X" / "drop X" | Abandoned | references/maintenance_modes.md § "Abandoned experiments" |
| Re-do a prior experiment under different conditions | Re-run | references/maintenance_modes.md § Re-runs |
If two modes seem to match ("compare X and Y, then propose"), pick the read mode first, stop. Re-entering § 1 is a separate turn.
Stop conditions — read before anything else
- No design note, no script. Never create or edit `experiments/NN_*.py` until `journal/NN_*.md` exists, is filled, and the user has explicitly approved it.
- `JOURNAL.md` is read at session start, not improvised. Don't reconstruct history from `experiments/` filenames or `git log` — those don't carry the why.
- Strategy is picked, not assumed. Name the sourcing strategy in every proposal (`skore` / `user` / `my-pick` / `B<N>`). Don't silently default. Exception: bootstrap — baseline is forced by workspace defaults; no strategy dispatch.
- Approval is explicit. "approved" / "yes" / "go" / "looks good" from the user is the gate. Ambiguous → re-ask via `AskUserQuestion`.
- Outcomes are recorded, not narrated. When the run finishes, the outcome lands in `JOURNAL.md` AND the Status block before the conversation moves on.
- Prior experiments stay reproducible. Every `done` row must remain runnable on `main` with the same result. When touching `src/<pkg>/`, default behavior preserves prior experiments' shape (see `build-ml-pipeline` § Reproducibility). Cheap check: `tests/smoke/` — any prior smoke test going red means default behavior is broken.
- Three skills, in order, before any code in `src/<pkg>/`. After G-DESIGN:
  - `build-ml-pipeline` → `pipeline.py` / `features.py` / `data.py`.
  - `evaluate-ml-pipeline` → `evaluate.py`. Owns CV-strategy via `AskUserQuestion`. Writing `evaluate.py` without invoking it is the most common shortcut.
  - `test-ml-pipeline` → `smoke-test-ml-pipeline` → smoke test.

  Only then assemble `experiments/NN_*.py`.
- Harness "no clarifying questions" hints do NOT waive gates. G-DESIGN, G-RUN, the §1 mode pick, the §2 sourcing menu, the §0 config gates are operating-contract gates.
- Post-hoc audit — required before ending the turn. Walk every pre-flight row; surface unfilled Evidence cells explicitly.
Forbidden shortcuts
| Shortcut | Why it's wrong |
|---|---|
| User said "quick baseline" → skip G-DESIGN | G-DESIGN is non-negotiable; "quick" never waives it. The design note is the postmortem's frozen Method |
| Scaffold + implement in one turn before G-DESIGN | Inverts the contract. Code that lands before approval has no Motivation/Risks the user signed off on |
| Skipped `evaluate-ml-pipeline` because `KFold(5)` "feels right" | Even empty `split_kwargs` is a justified pick the skill exists to surface. Bypass = user never got the choice |
| Bootstrap mode → skip ALL questions, not just the sourcing menu | Bootstrap forbids the sourcing menu only. G-PKG-NAME / G-ENV-MGR / G-TABULAR / G-SKORE-MODE / G-EDA / G-DESIGN / G-CV-SPLITTER / G-RUN still fire |
| Ambiguous "hmm interesting" / "I guess" read as approval | Approval is explicit. Ambiguity → re-ask, never silent yes |
| Auto-detect run finished via `reports/` mtime | § 4 is user-triggered (v1). The skill never auto-records |
| § 4 finishes recording → declare done, skip audit dispatch | § 4 audit dispatch is part of record-outcome, not optional. The audit digest carries the headline metrics for the JOURNAL row |
| Run experiment in same turn as G-RUN → declare done without § 4 | § 4 follows G-RUN in the same turn when the run completes successfully. Don't stop at "I ran it" — record the outcome |
| Pre-read every sibling SKILL.md file at session start | Read-set tracker is not a blocking gate. Open siblings just-in-time; emit pending list but proceed |
Pre-flight — emit before any design-note write
Compact checklist; Evidence-format spec in
references/preflight_evidence.md.
Pre-flight (iterate-ml-experiment):
- [ ] `journal/JOURNAL.md` read this turn (or confirmed missing → bootstrap)
Evidence: Read journal/JOURNAL.md (this turn) | "missing — bootstrap"
- [ ] `Workspace decisions` block checked for pre-recorded gates
Evidence: lists each <gate>: <value | not recorded>
- [ ] Mode: bootstrap | iterate-propose | iterate-record |
overview | compare | goal-pivot | abandoned | re-run
Evidence: rule that matched (Mode picker row)
- [ ] Last experiment + status: <NN_name> | n/a — bootstrap
Evidence: last row of JOURNAL.md History
- [ ] (Iterate-propose only) Sourcing menu presented; user picked
Evidence: AskUserQuestion id=<id>, answer=<skore|user|my-pick|B<N>>
| user free-text quote turn N
| "n/a — bootstrap / read-only mode"
- [ ] (Bootstrap only) Upfront config gates fired (G-PKG-NAME,
G-ENV-MGR, G-TABULAR, G-SKORE-MODE)
Evidence: per-gate ask id OR JOURNAL.md Status reference
| "n/a — iterate mode"
Note: G-CV-SPLITTER is NOT an upfront gate — it fires later, in
the § 3 chain at the evaluation step (after G-DESIGN).
- [ ] (Bootstrap only) G-EDA fired BEFORE the baseline draft
Evidence: explore-ml-data dispatched; answer=<run|skip>;
JOURNAL.md `## Data understanding (EDA)` section present
| "n/a — iterate mode"
- [ ] Design note drafted (or Backlog enriched, for `skore`)
Evidence: Write journal/<NN>_<name>.md (this turn) | "Backlog
rows B<x>..B<y> appended" | "n/a — read-only mode"
- [ ] G-DESIGN: user approved before any `experiments/NN_*.py` touched
Evidence: AskUserQuestion id=<id>, answer=approved | user quote |
"n/a"
- [ ] (§ 3 only) Three-skill chain ran in order:
build → evaluate → test
Evidence: each owning skill produced its file this turn
| "n/a outside § 3"
- [ ] (§ 3 only) G-CV-SPLITTER resolved during the evaluate step
Evidence: evaluate-ml-pipeline fired the splitter AskUserQuestion
(or mapped split_kwargs) before `evaluate.py` write
| "n/a outside § 3"
- [ ] (§ 3 only) G-RUN resolved: run now | leave for later
Evidence: AskUserQuestion id=<id> | "n/a outside § 3"
- [ ] (§ 4 only) All artifacts written: Status block + JOURNAL row +
Backlog hygiene + audit dispatch
Evidence: list each artifact written | "n/a outside § 4"
- [ ] python-api consulted for any new external symbol
Evidence: Read/Write scratch/api/<lib>/<v>/<topic>.md (this turn)
| "n/a — only re-using cached symbols"
- [ ] Pre-flight re-emitted with evidence before final message.
Evidence: this checklist appears in the end-of-turn summary.
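
The post-hoc audit ("walk every pre-flight row; surface unfilled Evidence cells") can be sketched as a small parser over the emitted checklist. This is a minimal illustration under assumptions: the function name `unfilled_rows` is hypothetical, a row counts as unfilled when its box is unchecked or its `Evidence:` line is empty, and the real Evidence-format rules live in references/preflight_evidence.md.

```python
import re
from typing import List, Optional

# A checklist row looks like "- [ ] label" or "- [x] label".
ROW = re.compile(r"^- \[(?P<box>[ xX])\] (?P<label>.+)$")


def unfilled_rows(checklist: str) -> List[str]:
    """Return labels of rows that are unchecked or have an empty Evidence line."""
    problems: List[str] = []
    label: Optional[str] = None
    checked = False
    evidence = ""

    def flush() -> None:
        # Judge the row collected so far before moving on to the next one.
        if label is not None and (not checked or not evidence.strip()):
            problems.append(label)

    for raw in checklist.splitlines():
        line = raw.strip()
        match = ROW.match(line)
        if match:
            flush()
            label = match["label"]
            checked = match["box"] != " "
            evidence = ""
        elif line.startswith("Evidence:"):
            evidence = line[len("Evidence:"):]
    flush()
    return problems
```

Continuation lines of a label or of an Evidence cell are ignored here; only the first line of each is inspected.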
§ 0 Bootstrap (first session only)
Workspace is in bootstrap mode when journal/JOURNAL.md is missing,
placeholder, or has 0 History rows.
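
The three bootstrap conditions above can be sketched in Python. This is a rough illustration, not the skill's implementation: the names `is_bootstrap` and `count_history_rows` are hypothetical, the `PLACEHOLDER_MARKER` string is an invented stand-in for however the placeholder file is recognised, and History is assumed to be a markdown table under a heading that contains the word "History".

```python
from typing import Optional

# Hypothetical marker; the real placeholder JOURNAL.md may be detected differently.
PLACEHOLDER_MARKER = "<!-- placeholder -->"


def count_history_rows(journal_text: str) -> int:
    """Count data rows of the markdown table under a 'History' heading."""
    in_history = False
    rows = 0
    for line in journal_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Each heading either opens or closes the History section.
            in_history = "history" in stripped.lower()
            continue
        if in_history and stripped.startswith("|"):
            cells = [c.strip() for c in stripped.strip("|").split("|")]
            is_separator = all(c and set(c) <= set("-: ") for c in cells)
            if not is_separator:
                rows += 1
    # The first counted row is the table header, so it is not an experiment.
    return max(rows - 1, 0)


def is_bootstrap(journal_text: Optional[str]) -> bool:
    """True when JOURNAL.md is missing, a placeholder, or has 0 History rows."""
    if journal_text is None:  # file missing
        return True
    if not journal_text.strip() or PLACEHOLDER_MARKER in journal_text:
        return True
    return count_history_rows(journal_text) == 0
```

Passing `None` for a missing file keeps the check free of filesystem access, so the caller decides how JOURNAL.md is read.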
Procedure (compact — full version in references/bootstrap.md):
- Scaffold first if needed. No `src/` / `experiments/` / `journal/` → hand off to `organize-ml-workspace`, return when the placeholder `JOURNAL.md` exists.
- Rewrite `JOURNAL.md` from `templates/JOURNAL.md`.
- Derive the goal default from `data/README.md` before asking. Propose one sentence; user confirms or amends.
- Explore the data BEFORE designing the model (G-EDA). Dispatch to `explore-ml-data`. The gate is binary (run / skip); on run it executes `data/eda.py`, writes `data/eda.md` + HTML, and fills the `## Data understanding (EDA)` JOURNAL section. The findings (target balance / skew, datetime / group columns, missingness, cardinality) feed the next step's learner and metric defaults and inform the CV strategy chosen later at the evaluation step. The run path needs the agent feature (ipython) and may trigger `G-AGENT-FEATURE` here, before the baseline; if the user declines it, EDA falls back to skip. On skip, the JOURNAL section records `Status: skipped`.
- Auto-draft `journal/01_baseline.md` via the consultation chain, informed by the EDA findings: learner default (`build-ml-pipeline`) and metric default (`python-api` on `skore.evaluate`). Do NOT fix a splitter here — the cross-validation strategy is data-driven and decided at the evaluation step (G-CV-SPLITTER, owned by `evaluate-ml-pipeline`) once the pipeline's X-marker exists; the note simply records that it is decided then. Conflicts with the EDA findings or the goal → flag in Risks, don't override.
- User's role in bootstrap is approve or amend — not invent.
- Exit bootstrap once the baseline is approved and recorded. Audit file lands at first § 4 record-outcome.
Bootstrap skips the sourcing menu — NOT the config gates
Skipped: sourcing menu, § 1 resume/record/propose pick.
Still fires:
| Gate ID | Picks | Owner | Fires |
|---|---|---|---|
| `G-PKG-NAME` | `src/<pkg>/` import name | `organize-ml-workspace` | before manifest creation |
| `G-ENV-MGR` | Env manager | `python-env-manager` | before any install command |
| `G-TABULAR` | Tabular library (pandas / polars) | `data-science-python-stack` | before `data.py` write |
| `G-SKORE-MODE` | Skore Project mode (local / hub / mlflow) + hub workspace name or MLflow tracking URI | `organize-ml-workspace` | before `pyproject.toml` write |
| `G-EDA` | Explore the data (run / skip) before the baseline is designed | `explore-ml-data` | before the `journal/01_baseline.md` draft |
| `G-AGENT-FEATURE` | Install ipython + pyright (install / skip) | `python-env-manager` | conditional — when G-EDA = run and the agent feature isn't present (else first audit at § 4) |
| `G-DESIGN` | User approval of `journal/01_baseline.md` | this skill | before any `src/<pkg>/` or `experiments/` code — i.e. before the § 3 chain |
| `G-CV-SPLITTER` | CV family for `skore.evaluate` | `evaluate-ml-pipeline` | inside the § 3 chain, AFTER G-DESIGN — at the evaluate step, before `evaluate.py` write; mandatory even with empty `split_kwargs` |
| `G-RUN` | "run now" vs "leave for later" | this skill | before executing the experiment script |
Free-text "quick baseline" / "you pick" do NOT resolve any of
these — fall through to structured AskUserQuestion.
→ next: G-DESIGN, then § 3 implementation chain.
§ 1 Session start (iterate mode)
- Read `JOURNAL.md`.
- Summarize to the user in 2–3 lines: dataset, goal, last experiment + status, what's ripe in Backlog.
- Ask via `AskUserQuestion` — three options, no silent default:
  - resume — last experiment still planned/approved/unfinished.
  - record outcome — last one ran; enter § 4.
  - propose next — last one is `done` or `abandoned`; → § 2.
Free-text "let's keep going" / "yeah" is ambiguous — wait.
§ 2 Propose the next experiment
The sourcing menu — surface VERBATIM
Every time § 2 runs in iterate mode, surface this menu with the JOURNAL.md Backlog table. Never silently default.
How would you like me to source the next experiment?
skore — read the audit digest at scratch/audit/<stem>/audit.md
from the latest run; follow each surfaced check's
documentation_url to draft a Backlog row, summarize,
re-present this menu.
user — you tell me what to try: article URL, GitHub issue,
spec / reference repo, or free text.
my-pick — I synthesize 2–4 candidate ideas; you pick one.
B<N> — promote a Backlog row directly.
Backlog (pick by index):
<paste JOURNAL.md Backlog table here>
Use AskUserQuestion for the pick. Fall back to a plain-text
enumeration only if AskUserQuestion is unavailable.
Free-text handling — first match wins
| User said… | Resolves to |
|---|---|
| Exact label (skore / user / my-pick / B<N>) | that pick |
| B2 / "let's do B2" | B<N> pick |
| Scientific article URL pasted | user → article-link branch |
| GitHub issue URL / org/repo#N / spec path | user → resource-link branch |
| "give me ideas" / "you decide" | my-pick |
| "let me try X" / "use Y instead" | user → free-text branch |
| Ambiguous / off-menu | fire AskUserQuestion, don't guess |
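
The first-match-wins table above can be sketched as an ordered rule list. This is an illustration under stated assumptions: the function name `resolve_pick` and the returned labels are hypothetical, the article hosts (`arxiv.org`, `doi.org`) are an invented stand-in for "scientific article URL", spec paths are not covered, and the regular expressions are deliberately loose.

```python
import re
from typing import Callable, List, Pattern, Tuple

Rule = Tuple[Pattern[str], Callable[["re.Match[str]"], str]]

# Rules are tried top to bottom, mirroring the table order; the first hit wins.
RULES: List[Rule] = [
    # Exact menu label typed on its own.
    (re.compile(r"^\s*(skore|user|my-pick)\s*$", re.I), lambda m: m.group(1).lower()),
    # A Backlog index such as "B2" or "let's do B2".
    (re.compile(r"\bB(\d+)\b"), lambda m: f"B{m.group(1)}"),
    # Article URL (assumed hosts only).
    (re.compile(r"https?://(?:www\.)?(?:arxiv\.org|doi\.org)/\S+"), lambda m: "user:article-link"),
    # GitHub issue URL or org/repo#N shorthand.
    (
        re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+/issues/\d+|\b[\w.-]+/[\w.-]+#\d+"),
        lambda m: "user:resource-link",
    ),
    (re.compile(r"give me ideas|you decide", re.I), lambda m: "my-pick"),
    (re.compile(r"let me try|use .+ instead", re.I), lambda m: "user:free-text"),
]


def resolve_pick(text: str) -> str:
    """Map a free-text reply to a sourcing pick, or 'ask' when nothing matches."""
    for pattern, outcome in RULES:
        match = pattern.search(text)
        if match:
            return outcome(match)
    # Ambiguous or off-menu: fall through to a structured question, never guess.
    return "ask"
```

Keeping the rules in a list rather than a dict makes the precedence explicit, which is the property the table relies on.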
Branches
- `skore` → dispatch to `iterate-from-skore`. Returns Backlog-candidate rows + summary. Write rows with stable `B<N>`, surface summary, re-present sourcing menu. No design note this turn.
- `user` → dispatch to `iterate-from-user`. Returns a Proposal. Draft into `journal/NN_short_name.md`.
- `my-pick` → handled inline. Read JOURNAL.md Status, last Implication / Risks, current Backlog. Synthesize 2–4 candidates, present via `AskUserQuestion`. Draft the design note on pick.
- `B<N>` → promote the row. The row's `Item` becomes the seed; the row's `Source` becomes `Sourcing strategy`. Remove from Backlog on approval.
For user / my-pick / B<N>: write draft to
journal/NN_short_name.md using templates/experiment_design.md.
NN is the next free integer; short_name is the user's call.
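
Picking `NN` can be sketched as follows. This is a minimal illustration with assumptions: the function name `next_experiment_number` is hypothetical, "next free integer" is read as one past the highest prefix already used (gaps are not refilled), and the prefix is zero-padded to two digits as in `01_baseline.md`.

```python
import re
from typing import Iterable

# Design notes look like "01_baseline.md"; JOURNAL.md has no numeric prefix.
STEM = re.compile(r"^(\d{2,})_.+\.md$")


def next_experiment_number(journal_filenames: Iterable[str]) -> str:
    """Return the next prefix given the file names found in journal/."""
    used = {int(m.group(1)) for name in journal_filenames if (m := STEM.match(name))}
    return f"{max(used, default=0) + 1:02d}"


if __name__ == "__main__":
    print(next_experiment_number(["JOURNAL.md", "01_baseline.md"]))
```

Using the highest existing prefix rather than the first gap keeps numbering chronological even after an experiment is abandoned.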
→ next: § 3.
§ 3 Iterate on the design note + implement
- Surface the draft: file path + 3–5 line summary (Question / Method / Risks).
- Mid-iteration feedback is free-text. Edit `journal/NN_*.md` in place; loop here.
- Final approval gate is `AskUserQuestion` with two options:
  - approved — flip status, add JOURNAL History row, hand off to the three-skill chain.
  - more changes — back to amendment loop.

  Clear free-text "approved" / "go" / "looks good" resolves; ambiguous → structured ask.
- Do not create `experiments/NN_*.py` during design iteration.
- Track provenance honestly. Risks-only edits keep the original `Sourcing strategy`. Method changes → `<original> + user override`.
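
The "approval is explicit" rule can be sketched as a strict whitelist. This is an illustration only: the function name `resolve_approval` is hypothetical, the phrase set is taken from the approval phrases quoted in this document, and requiring an exact match (after trimming case and trailing punctuation) is an assumption that errs on the side of re-asking.

```python
# Phrases this document lists as explicit approval.
APPROVALS = {"approved", "yes", "go", "looks good"}


def resolve_approval(reply: str) -> str:
    """Return 'approved' only for an explicit phrase; anything else is 're-ask'."""
    normalized = reply.strip().strip(".!").lower()
    return "approved" if normalized in APPROVALS else "re-ask"


if __name__ == "__main__":
    for reply in ("Looks good!", "hmm interesting", "I guess"):
        print(repr(reply), "->", resolve_approval(reply))
```

A reply outside the whitelist never counts as a silent yes; it routes back to the structured question.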
Three-skill implementation chain — non-skippable
After G-DESIGN passes, dispatch in order:
- `build-ml-pipeline` → `src/<pkg>/{pipeline,features,data}.py`.
- `evaluate-ml-pipeline` → `src/<pkg>/evaluate.py`. Owns the CV-strategy via `AskUserQuestion`. Bypassing is the named forbidden shortcut.
- `test-ml-pipeline` → `smoke-test-ml-pipeline` → matching smoke test at `tests/smoke/test_NN_<short_name>.py`.
Only then assemble experiments/NN_*.py. Confirm signatures via
python-api, not memory.
G-RUN — post-smoke run gate
Once tests/smoke/ passes (the new test AND every prior one):
ask via AskUserQuestion:
- run now — execute `pixi run python experiments/NN_<short_name>.py`.
- leave for later — do NOT print the command, do NOT auto-propose. Surface JOURNAL Status + Backlog verbatim, stop.
No silent default.
→ next: if the run completed in this turn, continue immediately to § 4. Don't stop at "I ran it" — record the outcome.
§ 4 Record outcome
Trigger: user says "the run finished" / "log it", OR the
agent ran the experiment in the same turn (G-RUN = run now) and it
completed successfully. Do NOT auto-detect via reports/ mtime
or polling for runs the user kicked off themselves.
Procedure
- Audit-first: dispatch to `audit-ml-pipeline` to place + execute `audit/NN_<short_name>.py`. The audit reads the report read-only via the bundled runner and streams a markdown digest that carries the headline metrics. The audit replaces scratch probes — don't write `scratch/<ts>_inspect_*.py` to extract metrics from the report when the audit is the canonical path.
  - Agent feature must be installed; if not, audit-ml-pipeline routes to `python-env-manager` § Agent feature (G-AGENT-FEATURE).
- Read the audit digest. The metrics + checks summary are the source for the next 3 steps.
- Fill all four Status-block fields in `journal/NN_*.md`:
  - State: `done` (or `abandoned` with one-line reason).
  - Approved by user on: unchanged from approval.
  - Headline result: metric + uncertainty (e.g. `RMSE 0.083 ± 0.004 (5-fold CV)`).
  - Implication for next iteration: 1–2 sentences.
- Smoke-test gate before `done` — all `tests/smoke/` must pass. Prior failures = reproducibility regression → route to `build-ml-pipeline` § Reproducibility. The CV report can still land in skore Project, but the JOURNAL row stays `approved` until full smoke suite is green. Abandonment doesn't require passing smoke.
- Append the headline to `JOURNAL.md` History.
- Backlog hygiene: scan for items the new run answered or killed. Delete or strikethrough (`~~old~~ — resolved in NN_X`). Diagnostic mining of the new report is `iterate-from-skore`'s job, not § 4's.
- (Opt-in) GitHub issue close-the-loop — if the experiment's `Source` is a GitHub issue, ask via `AskUserQuestion` whether to `gh issue comment <N>` with the headline. Never auto-post.
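
The "metric + uncertainty" headline format from the Status-block step can be sketched like this. It is an illustration under assumptions: the function name `format_headline` is hypothetical, the uncertainty is taken to be the sample standard deviation across folds, and in the procedure above the numbers come from the audit digest rather than being recomputed.

```python
from statistics import mean, stdev
from typing import Sequence


def format_headline(metric: str, fold_scores: Sequence[float]) -> str:
    """Render 'METRIC mean ± std (k-fold CV)' from per-fold scores."""
    k = len(fold_scores)
    if k < 2:
        # A spread cannot be estimated from a single fold.
        raise ValueError("need at least two fold scores")
    return f"{metric} {mean(fold_scores):.3f} ± {stdev(fold_scores):.3f} ({k}-fold CV)"


if __name__ == "__main__":
    # Made-up fold scores, for illustration only.
    print(format_headline("RMSE", [0.079, 0.083, 0.087, 0.080, 0.086]))
```

Three decimals match the example in the list above; a real headline should follow whatever precision the audit digest reports.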
Stop here. Do NOT auto-propose the next experiment in the same
turn. Surface the implication, ask via AskUserQuestion:
- draft it now — re-enter § 1 with the implication as seed.
- not yet — record the implication in Backlog, stop.
The user controls cadence; this skill records, it doesn't propose-and-record in one breath.
Dispatch table — which iterate-from-* skill
| Situation | Action |
|---|---|
| No prior experiment (bootstrap) | § 0 forces auto-drafted baseline. No strategy skill |
| User names a Backlog row (B2, "let's do B5") | Promote directly; no strategy skill |
| "mine the report" / "what does skore see?" | iterate-from-skore — enriches Backlog, re-shows menu. No design note this turn. |
| "I want to try X" / article URL / GitHub issue | iterate-from-user — three-branch ask. If free-text already resolved, pass pre-resolved branch |
| "give me ideas" / "you decide" | my-pick — handled inline. Synthesize 2–4 candidates, AskUserQuestion |
| Open-ended "what's next?" with ≥1 recorded experiment | Present sourcing menu verbatim + Backlog. No silent default |
The strategy skills are intentionally shallow: they source, this
skill drafts. The skore strategy requires a prior experiment
with an on-disk report — bootstrap can't use it.
If iterate-from-skore returns zero candidates: append a
one-liner to JOURNAL Status (`Audit checks clean on <stem> as of <date>` or `Audit digest inaccessible on <stem> as of <date>`).
No History row. Re-present sourcing menu.
Maintenance modes — pointers
Each is read-only or rare. Full procedures in
references/maintenance_modes.md:
- Project overview — read-only summary from JOURNAL + Backlog. Don't generate a separate document.
- Compare past experiments — read-only. v1 is pairwise side-by-side. Don't draft a design note. Don't add JOURNAL rows.
- Goal pivots — update Status with date + reason, insert a horizontal divider in History, flag incomparability in the next experiment's Risks.
- Abandoned experiments — `AskUserQuestion` (abandon / defer / run now). Status becomes `abandoned` with one-line reason.
- Re-runs — single (`NN_<stem>_rerun`) or batch (`NN_paired_comparison`). New design note; original notes unchanged.
Files this skill owns
journal/
├── JOURNAL.md # status + history + backlog (index)
├── 01_baseline.md # design note for experiments/01_baseline.py
├── 02_<short_name>.md
└── …
Pairing rule (hard, four-way): journal/NN_<short_name>.md ↔
experiments/NN_<short_name>.py ↔
tests/smoke/test_NN_<short_name>.py ↔
audit/NN_<short_name>.py, identical stems, 1:1.
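
The four-way pairing rule can be sketched as a path derivation from a single stem. This is a minimal illustration: the function names `paired_paths` and `missing_partners` and the role labels are hypothetical, while the directory layout is the one stated in the rule above.

```python
from pathlib import Path
from typing import Dict, List


def paired_paths(stem: str, root: Path = Path(".")) -> Dict[str, Path]:
    """Derive the four files that must share one stem, e.g. '01_baseline'."""
    return {
        "journal": root / "journal" / f"{stem}.md",
        "experiment": root / "experiments" / f"{stem}.py",
        "smoke_test": root / "tests" / "smoke" / f"test_{stem}.py",
        "audit": root / "audit" / f"{stem}.py",
    }


def missing_partners(stem: str, root: Path = Path(".")) -> List[str]:
    """List the roles whose file does not exist yet for this stem."""
    return [role for role, path in paired_paths(stem, root).items() if not path.exists()]
```

A missing partner is not always an error: the design note exists before the script, and the audit file only lands at the first § 4 record-outcome.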
JOURNAL.md shape
- Status — 2-3 lines: dataset, goal, last experiment + status.
- Data understanding (EDA) — short summary + link to `data/eda.md`. Owned by `explore-ml-data` (written at the G-EDA bootstrap step); this skill only reserves the section.
- History (chronological) — one row per experiment: stem, intent, status, headline, design-note link.
- Backlog (forward-looking) — indexed table; columns `#`, `Item`, `Source` (`skore:<stem>` / `my-pick:<stem>` / `user`).
Template: templates/JOURNAL.md. These four are the only sanctioned
sections — don't invent others.
Per-experiment design-note shape
Template: templates/experiment_design.md. Sections:
- Question / hypothesis — one sentence.
- Motivation — pulled from sourcing strategy; cite concretely.
- Method — what changes vs. previous, in prose. Mechanics live in `build-ml-pipeline` / `evaluate-ml-pipeline`.
- Risks — what would make the metric move for the wrong reason.
- Status block — `planned` → `approved` → `done` | `abandoned`.
No "Success criteria" section. The user judges post-run.
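
The Status-block lifecycle can be sketched as a small transition table. This is an illustration under assumptions: the function name `next_state` is hypothetical, and allowing `planned` → `abandoned` directly is a guess, since the document only states that an experiment ends as `done` or `abandoned`. The smoke-test condition reflects the § 4 rule that a row stays `approved` until the smoke suite is green.

```python
# Allowed moves between Status-block states (planned -> abandoned is assumed).
ALLOWED = {
    "planned": {"approved", "abandoned"},
    "approved": {"done", "abandoned"},
    "done": set(),
    "abandoned": set(),
}


def next_state(current: str, target: str, smoke_green: bool = False) -> str:
    """Return the state after attempting a transition.

    Moving to 'done' needs a green smoke suite; otherwise the state is kept.
    Abandonment does not require passing smoke tests.
    """
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    if target == "done" and not smoke_green:
        return current
    return target
```

`done` and `abandoned` are terminal here, which matches the re-run rule that a repeat gets a new design note instead of reopening the original.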
What this skill does NOT do
- Run experiments (user / runner does that).
- Explore / profile the dataset (`explore-ml-data` owns the G-EDA step and the `## Data understanding (EDA)` section).
- Open or query the skore Project (`evaluate-ml-pipeline` + `python-api`).
- Edit `pipeline.py` / `features.py` / `data.py` (`build-ml-pipeline`).
- Decide whether a workspace exists or where things go (`organize-ml-workspace`).
- Write commits / PRs.
- Define what counts as a successful experiment.
- Pick a sourcing strategy on the user's behalf.
Companion skills
| Skill | Relationship |
|---|---|
| `organize-ml-workspace` | Scaffold + stem-pairing rule |
| `explore-ml-data` | § 0 fires G-EDA before the baseline; the EDA findings seed the baseline note's Method / Risks and the `## Data understanding (EDA)` JOURNAL section |
| `iterate-from-user` | User-sourced proposals (article / resource / free text) |
| `iterate-from-skore` | Report-sourced Backlog enrichment |
| `build-ml-pipeline` | `pipeline.py` / `features.py` / `data.py` body; reproducibility mechanics |
| `evaluate-ml-pipeline` | `evaluate.py` body; CV-strategy decision; report inspection |
| `test-ml-pipeline` → `smoke-test-ml-pipeline` | Smoke-test body; § 4 won't flip `done` until smoke is green |
| `audit-ml-pipeline` | § 4 dispatch; audit digest carries the headline metrics for the JOURNAL row |
| `python-api` | Signature lookups |
| `python-env-manager` | G-AGENT-FEATURE for audit AND `explore-ml-data` (EDA) prerequisites |
References (load on demand)
- `references/bootstrap.md` — full bootstrap procedure, config-gate details, baseline-template substitution.
- `references/record_outcome.md` — full § 4 procedure with Backlog hygiene examples, GitHub comment template.
- `references/maintenance_modes.md` — overview / compare / goal-pivot / abandoned / re-runs with full procedures.
- `references/preflight_evidence.md` — Evidence-format spec.
Templates
- `templates/JOURNAL.md` — four-section index skeleton.
- `templates/experiment_design.md` — design note with Status block.
Copy, don't rewrite.