ds-analysis-campaign - SKILL.md Agent Skill

name: ds-analysis-campaign
description: Use when a quest needs one or more follow-up runs such as ablations, robustness checks, error analysis, or failure analysis after a main experiment.
skill_role: stage
license: MIT
metadata:
  author: ResearAI/DeepScientist
  version: "1.0.0"

Analysis Campaign

Use this skill when one or more follow-up runs are needed and the quest needs a coordinated evidence campaign.

This is the shared DeepScientist protocol for supplementary experiments after a durable result. Use the same route for:

ordinary ablations / robustness / sensitivity work
review-driven evidence gaps
rebuttal-driven extra experiments
writing-driven evidence gaps

For paper-facing work, treat “analysis campaign” broadly:

not only post-hoc interpretation
also ablations, sensitivity checks, robustness checks, efficiency or cost checks, highlight-validation runs, and limitation-boundary work beyond the main result

Do not assume a writing-facing campaign means “analysis only”.

Do not invent a separate experiment system for those cases.

Interaction discipline

Follow the shared interaction contract injected by the system prompt.
For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
Hard execution rule: every terminal command in this stage must go through bash_exec; do not use any other terminal path for slice execution, smoke tests, Git, Python, package-manager, or file-inspection commands.
Routing campaign slice commands through bash_exec gives each run a durable session id, a quest-local log folder, and later read/list/kill control.
Keep ordinary subtask completions concise. When an analysis campaign or a stage-significant campaign checkpoint is complete, upgrade to a richer artifact.interact(kind='milestone', reply_mode='threaded', ...) report.
That richer campaign milestone report should normally cover: which slices completed, the main takeaway, whether the claim got stronger or weaker, and the exact recommended next route.
That richer milestone report is still normally non-blocking. If the post-campaign route is already clear, continue automatically after reporting instead of waiting for explicit acknowledgment.
If the active communication surface is QQ and QQ milestone media is enabled in config, prefer at most one aggregated campaign summary PNG on a meaningful campaign milestone.
That attachment should summarize the campaign as a whole; do not auto-send one image per slice.
Treat connector-facing campaign PNGs as report charts, not draft paper figures.
Preferred connector-chart palettes are Morandi-like and restrained:
- sage-clay: #E7E1D6, #B7A99A, #7F8F84 for the default aggregated campaign summary
- mist-stone: #F3EEE8, #D8D1C7, #8A9199 for conservative or uncertainty-heavy summaries
- dust-rose: #F2E9E6, #D8C3BC, #B88C8C only as a secondary accent when an extra comparison is necessary
Connector-facing campaign chart requirements:
- one campaign-level message, not a crowded slice dashboard
- low saturation and limited color count
- clear aggregation labels and direct comparison against the main run or baseline
- prefer one summary figure that communicates the boundary change honestly
Preferred campaign summaries are:
- point-range or bar summaries for slice-to-slice endpoint comparisons
- line plots only when the x-axis is truly ordered and comparable across slices
- small multiples instead of one rainbow figure when slices answer different questions
If a campaign view uses continuous color, keep it sequential for ordered magnitude and diverging only for signed deltas around a meaningful center.
Avoid rainbow / jet-like maps and decorative heatmaps when a simpler comparison plot would communicate the result better.
Keep the same muted palette semantics across the full campaign so the same color means the same role in every slice summary.
If a campaign figure is milestone-facing, paper-facing, or otherwise durable, open figure-polish/SKILL.md and complete its render-inspect-revise pass before treating the figure as final.
If plotting in Python, reuse the fixed Morandi plotting starter from the system prompt and keep the same palette discipline across the whole campaign.
If the runtime starts an auto-continue turn with no new user message, resume from the current campaign state and active requirements instead of replaying the previous user turn.
Progress message templates are references only. Adapt to the actual context and vary wording so messages feel human, respectful, and non-robotic.
If a threaded user reply arrives, interpret it relative to the latest campaign progress update before assuming the task changed completely.
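
The palette rules above can be kept in one small lookup so the same color keeps the same role across every slice summary. This is a minimal sketch in plain Python, not the Morandi plotting starter from the system prompt: the hex values come from the palette list above, while `pick_palette`, `assign_role_colors`, and the role names are assumptions made for illustration.

```python
# Hex values are copied from the palette list above; helper names are hypothetical.
PALETTES = {
    "sage-clay": ("#E7E1D6", "#B7A99A", "#7F8F84"),   # default aggregated campaign summary
    "mist-stone": ("#F3EEE8", "#D8D1C7", "#8A9199"),  # conservative or uncertainty-heavy summaries
    "dust-rose": ("#F2E9E6", "#D8C3BC", "#B88C8C"),   # secondary accent only
}


def pick_palette(uncertainty_heavy=False):
    """Return the primary palette for a campaign summary chart."""
    return PALETTES["mist-stone" if uncertainty_heavy else "sage-clay"]


def assign_role_colors(roles, palette):
    """Map each role to one fixed color, reusable across all slice summaries.

    Refuses more roles than colors so the color count stays limited.
    """
    if len(roles) > len(palette):
        raise ValueError("too many roles for a restrained palette; aggregate or use small multiples")
    return dict(zip(roles, palette))


# Build the mapping once per campaign and pass it to every plot.
ROLE_COLORS = assign_role_colors(["baseline", "campaign"], pick_palette())
```

Building the mapping once and passing it to each plotting call is one way to keep palette semantics stable without repeating hex codes per slice.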

Stage purpose

The analysis-campaign stage exists to test the strength, boundaries, and failure modes of a result. It preserves the core discipline of the earlier DeepScientist analysis-experimenter:

each analysis run should correspond to one clear question
campaign runs should stay isolated and comparable
negative results must remain visible
campaign-level conclusions should be aggregated explicitly

The campaign should behave like a disciplined evidence program, not an unstructured pile of extra runs.

For campaign prioritization and writing-facing slice design, read references/campaign-design.md. When the campaign is paper-facing and the mapping fields are not obvious, also read references/writing-facing-slice-examples.md.

Quick workflow

Treat this as the compressed campaign map. The authoritative slice protocol and aggregation rules remain in Workflow.

Bind the campaign to the parent run or idea and, when writing-facing, to the selected outline.
When the campaign is writing-facing, refresh paper/paper_experiment_matrix.* before freezing the slice frontier.
Before launching slices, create PLAN.md and CHECKLIST.md.
Use PLAN.md as the durable charter and CHECKLIST.md as the living execution surface while launching, monitoring, recording, and aggregating slices.
Run claim-critical slices first and smoke-test long slices before their real runs.
Revise the plan and matrix if slice feasibility, ordering, comparators, or campaign interpretation changes materially, and record every slice durably, including honest non-success states.
Close meaningful campaign milestones with a concise 1-2 sentence summary that says whether the claim gained stable support, partial support, contradiction, or unresolved ambiguity, what the matrix frontier now looks like, and what happens next.

Non-negotiable rules

Every analysis run must be code-based and fully automatable.
Do not introduce human evaluation or subjective assessment into a campaign.
Do not bring in a new dataset unless the quest scope explicitly changed.
Every analysis slice must have a specific research question and a falsifiable or at least decision-relevant expectation.
If the campaign is supporting a paper or paper-like report, do not launch it until a selected outline exists.
When a selected outline exists, every slice should map to a named research_question and experimental_design from that outline.
When the campaign is supporting a paper or paper-like report, do not launch or reorder the slice set without first reading paper/paper_experiment_matrix.md when it exists.
For writing-facing campaigns, every slice should correspond to a stable matrix row such as exp_id, not just a free-form note.
For writing-facing campaigns, every todo item must also carry section_id, item_id, claim_links, and paper_role; otherwise the slice is not paper-ready.
Do not aggregate campaign conclusions without per-run evidence.
Do not bury null or contradictory findings.

Use when

writing reveals evidence gaps
a main result needs ablations
robustness or sensitivity needs to be checked
a failure mode needs explanation
efficiency or environment variation matters to the claim

Do not use when

the quest still lacks a credible main run or accepted baseline
the next step is obviously another main experiment rather than follow-up evidence work

Preconditions and gate

Before launching a campaign, confirm:

the reference main run or accepted idea line
the claim or question being tested
the comparison target
the metric or observable of interest
the list of specific analysis questions
the current quest / user-provided assets that each planned slice will actually use
whether each slice is executable with the current assets, tooling, and available credentials
for paper-facing campaigns, the current paper experiment matrix frontier and which rows are actually feasible now
if durable state exposes active_baseline_metric_contract_json, read that JSON file before defining slice success criteria or comparison tables
treat active_baseline_metric_contract_json as the default baseline comparison contract unless a slice is explicitly testing a different evaluation contract

If the question list is fuzzy, sharpen it before running anything.
Treat quest files, attached user assets, checkpoints, configs, extracted texts, baselines, and existing code paths as the first-choice asset pool.
Do not design slices around hypothetical resources that the current system cannot actually access or run.
If a slice cannot be executed with the current system, redesign it around available assets or explicitly report that the task cannot currently be completed.
If infeasibility appears mid-run, attempt bounded recovery first; if still blocked, record the slice with a non-success status and explain why.
If ids, active refs, or current quest state are unclear after restart, call artifact.get_quest_state(detail='summary') and artifact.resolve_runtime_refs(...) before launching or recording slices.
If the exact quest brief / plan / status wording matters for campaign scope, call artifact.read_quest_documents(...).
If earlier user instructions materially affect campaign scope or ordering, call artifact.get_conversation_context(...) before changing the slice set.

For concrete paper-facing cases:

if the slice is the only thing keeping a main-text section unsupported, make it main_required / main_text
if the slice is useful but non-blocking, make it appendix
if the slice is informative but not meant for the manuscript, keep it durable and mark it reference_only with a reason
after every completed paper-facing slice, verify the return path immediately:
- the matching outline result_table row is updated
- the section notes are updated when the outline folder exists
- paper/evidence_ledger.json reflects the new mapping
- the active paper line summary no longer treats that slice as missing

Do not leave a slice "completed" while the paper contract still looks stale.
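
The return-path verification above can be expressed as a small staleness check. This is a sketch only: the check names are hypothetical labels for the four return-path items, and how each one is actually verified is left to the caller.

```python
# Hypothetical check names; each mirrors one return-path item listed above.
RETURN_PATH_CHECKS = (
    "outline_result_table_row_updated",
    "section_notes_updated",
    "evidence_ledger_updated",
    "paper_line_summary_updated",
)


def stale_return_path(checks, outline_folder_exists=True):
    """Return the return-path items that are still stale for a finished paper-facing slice."""
    required = [
        name for name in RETURN_PATH_CHECKS
        # Section notes only apply when the outline folder exists.
        if name != "section_notes_updated" or outline_folder_exists
    ]
    return [name for name in required if not checks.get(name, False)]
```

A slice would only be reported as completed when this list comes back empty.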

Required plan and checklist

Before launching any real campaign slice, create a quest-visible PLAN.md and CHECKLIST.md.

Use references/campaign-plan-template.md as the canonical structure for PLAN.md.
Use references/campaign-checklist-template.md as the canonical structure for CHECKLIST.md.
PLAN.md is the durable campaign charter and should cover the claim under test, slice table, comparability boundary, available assets, required comparators, smoke and main-run strategy, monitoring and sleep rules, reporting expectations, and a revision log.
CHECKLIST.md is the living campaign execution list; update it during launch, asset preparation, slice execution, aggregation, and route changes.
If slice ordering, feasibility, required baselines, campaign interpretation, or the writing-facing outline mapping changes materially, revise PLAN.md before continuing.
The later charter report, slice artifacts, and aggregate report remain required, but PLAN.md and CHECKLIST.md should be the canonical campaign-control surface during execution.

Truth sources

Use:

main experiment artifacts
baseline artifacts
active_baseline_metric_contract_json when available
recent decisions and milestone reports
code and configs used in the accepted main line
actual analysis outputs and logs
bash_exec session ids and managed shell logs for campaign runs

Do not summarize a campaign from impressions alone.

Required durable outputs

A campaign should usually leave behind:

a campaign identifier
a selected outline reference when the campaign is writing-facing
a refreshed paper/paper_experiment_matrix.md
a refreshed paper/paper_experiment_matrix.json
one directory per analysis run
any supplementary baseline reproduced for analysis under baselines/local/<baseline_id>/ or attached under baselines/imported/<baseline_id>/
one quest-level supplementary baseline inventory at artifacts/baselines/analysis_inventory.json
one run artifact per analysis slice
one outline-bound todo manifest when the campaign is writing-facing
an aggregated campaign report
a decision about the next move

In the current runtime, represent that with existing artifact actions only:

one decision artifact with action='launch_analysis_campaign'
one charter report
one run artifact per slice
optional progress artifacts during execution
one aggregated report
one closing decision

Workflow

0. Launch the campaign durably

Before launching any slice, record the campaign start through artifacts:

write a decision artifact with:
- action='launch_analysis_campaign'
- campaign_id
- parent_run_id or parent_idea_id
- why the campaign is needed now
write a charter report with the planned slice list
update plan.md if the campaign materially changes the quest path

Do not start a multi-slice campaign from chat-only intent, with or without vague notes. Write PLAN.md and CHECKLIST.md first, using references/campaign-plan-template.md and references/campaign-checklist-template.md as the default structures.
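
As an illustration of the launch decision described above, the sketch below assembles the payload as a plain dictionary before it is written as a decision artifact. The helper name and the `reason` key are assumptions; only `action`, `campaign_id`, `parent_run_id`, and `parent_idea_id` are named in this document, and the real artifact call may expect a different shape.

```python
def build_launch_decision(campaign_id, reason, parent_run_id=None, parent_idea_id=None):
    """Assemble a launch decision payload (hypothetical helper, not a runtime API)."""
    if not (parent_run_id or parent_idea_id):
        raise ValueError("a launch decision needs parent_run_id or parent_idea_id")
    if not reason.strip():
        raise ValueError("state why the campaign is needed now")
    payload = {
        "action": "launch_analysis_campaign",
        "campaign_id": campaign_id,
        "reason": reason,  # assumed key for "why the campaign is needed now"
    }
    # Record whichever parent reference the campaign is bound to.
    if parent_run_id:
        payload["parent_run_id"] = parent_run_id
    if parent_idea_id:
        payload["parent_idea_id"] = parent_idea_id
    return payload
```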

After the charter and launch decision are durably recorded, send one threaded artifact.interact(kind='milestone', ...) update naming:

why the campaign exists now
the claim-critical slices that will run first
the first thing the user should expect from the campaign
the first real checkpoint for the user
if the active surface is QQ, keep that campaign-launch milestone text-first unless a single summary image is already genuinely useful

0.1 Bind the campaign to the selected outline when writing-facing

If the campaign exists to support a paper or paper-like report:

do not proceed until one selected outline exists
if no selected outline exists yet, route to write or decision first so the outline can be created and selected durably
before deciding the slice list, create or refresh paper/paper_experiment_matrix.md when it is missing or stale
treat that matrix, rather than todo_items alone, as the upstream paper experiment contract
use the matrix to decide:
- which rows are main_required
- which are main_optional
- which are appendix-only
- which are optional or should be dropped
do not start stable experiments-section drafting while currently feasible non-optional matrix rows remain unresolved
call artifact.create_analysis_campaign(...) with:
- selected_outline_ref
- research_questions
- experimental_designs
- todo_items
ensure each todo item names at least:
- exp_id
- todo_id
- slice_id
- title
- research_question
- experimental_design
- tier
- paper_placement
- completion_condition

For writing-facing campaigns, every slice should also carry paper-contract identity, not just free-form text:

section_id
item_id
claim_links
paper_role

Do not treat a completed analysis slice as paper-ready until those fields exist and the slice is mappable back into the selected outline or paper experiment matrix. Use references/writing-facing-slice-examples.md when the correct field values are not obvious.

This keeps the analysis campaign aligned with the paper plan instead of becoming a free-floating batch of slices.
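
A quick pre-flight check can catch todo items that are not yet paper-ready before artifact.create_analysis_campaign(...) is called. The sketch below uses the field names listed above; the helper itself is hypothetical and treats empty values as missing.

```python
# Field names are taken from the lists above.
BASE_FIELDS = (
    "exp_id", "todo_id", "slice_id", "title", "research_question",
    "experimental_design", "tier", "paper_placement", "completion_condition",
)
PAPER_CONTRACT_FIELDS = ("section_id", "item_id", "claim_links", "paper_role")


def missing_todo_fields(todo_item, writing_facing=True):
    """Return the required fields that are absent or empty in one todo item."""
    required = BASE_FIELDS + (PAPER_CONTRACT_FIELDS if writing_facing else ())
    return [name for name in required if not todo_item.get(name)]
```

An empty result means the item carries the full paper-contract identity; anything else names exactly what still has to be filled in.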

1. Define the campaign charter

State:

campaign id
parent run or parent idea
main claim under test
list of analysis questions
what will be held fixed
what may vary

The charter should also include:

campaign type priority order
expected slice count
dependency structure between slices
the matrix path and current execution frontier
whether any slice requires isolated code changes or only reruns/config changes
the top-level success condition for ending the campaign
the top-level abandonment condition for stopping it early

Prefer to keep this charter in PLAN.md first and mirror the execution frontier in CHECKLIST.md.

For each analysis question, also state:

why it matters to the main claim
whether it exists mainly to support a core claim, validate a highlight, answer an efficiency or cost concern, or bound a limitation
what result would strengthen the claim
what result would weaken or complicate the claim
whether the run is:
- ablation
- robustness
- sensitivity
- error analysis
- efficiency
- environment variation

If there are many possible slices, order them by decision value:

most claim-critical ablation or contradiction check
strongest robustness or sensitivity checks
failure-mode explanation
efficiency or secondary supporting analyses

Do not spend half the campaign budget on secondary slices before the claim-critical ones run. When the parent line is still below solid evidence quality, use the campaign first to move it from minimum to solid before chasing broader polish.
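
The decision-value ordering above amounts to a stable sort over a fixed priority list. In this sketch the `priority_class` key and its four labels are assumptions chosen to mirror the four tiers; slices without a known class sort last.

```python
# Hypothetical labels mirroring the four decision-value tiers above, highest value first.
DECISION_VALUE_ORDER = (
    "claim_critical",
    "robustness_or_sensitivity",
    "failure_mode",
    "efficiency_or_secondary",
)


def order_slices(slices):
    """Sort slices by decision value; ties keep their original order."""
    rank = {name: i for i, name in enumerate(DECISION_VALUE_ORDER)}
    return sorted(slices, key=lambda s: rank.get(s.get("priority_class"), len(rank)))
```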

2. Split into isolated analysis runs

Each analysis run should correspond to one need, such as:

remove one component
vary one hyperparameter family
run additional seeds
inspect one failure bucket
test one environment variation
measure one efficiency or cost dimension
validate one highlight hypothesis

Avoid changing many factors at once unless the campaign is explicitly exploratory.

For each slice, define at minimum:

research question
hypothesis or expected pattern
intervention
controls or fixed conditions
metric or observable
stop condition
evidence path expectations
required_baselines when the slice depends on an extra comparator that is not yet available in the quest

Recommended extra per-slice fields:

exp_id
slice_id
run_kind
parent_run_id
slice_class, such as auxiliary, claim-carrying, or supporting
tier, such as main_required, main_optional, appendix, or optional
paper_placement
highlight_ids
required_baselines, where each item records at least baseline_id plus the reason, benchmark, and split when known
whether a code diff is required
whether an isolated branch/worktree is required
quantitative success criteria
quantitative abandonment criteria
contingency trigger for the next slice

If a slice needs an extra comparator baseline:

reproduce it under baselines/local/<baseline_id>/ unless it is attached under baselines/imported/<baseline_id>/
keep the usual durable baseline notes there, including analysis_plan.md, setup.md, execution.md, and verification.md
do not overwrite the canonical quest baseline gate just because an analysis slice needed a supplementary baseline
after the comparator is ready, record it back through artifact.record_analysis_slice(..., comparison_baselines=[...]) with its baseline_id, path, benchmark/split, and metrics summary

Recommended run_kind naming in the current runtime:

analysis.ablation
analysis.robustness
analysis.sensitivity
analysis.error
analysis.efficiency
analysis.environment

Create the campaign with artifact.create_analysis_campaign(...) before starting any slice.
Even one extra experiment should still be represented as a one-slice campaign so Git and Canvas show a real child node.
Branch that campaign from the current workspace/result node rather than mutating the completed parent node in place.
Pass the full slice list to that tool; each returned slice worktree becomes the required execution location for that slice.
Only create the campaign after you have verified that the listed slices are actually executable with the current quest assets and runtime.
When the campaign is writing-facing, the same call should also carry selected_outline_ref, research_questions, experimental_designs, and todo_items.
If ids or refs are unclear, recover them first with artifact.resolve_runtime_refs(...), artifact.get_analysis_campaign(...), or artifact.list_paper_outlines(...) instead of guessing.
Treat campaign_id as system-owned, and treat slice_id / todo_id as agent-authored semantic ids.
Do not replace the normal campaign flow with repeated manual artifact.prepare_branch(...) calls.
After each slice finishes, call artifact.record_analysis_slice(...) immediately so the result is mirrored back to the parent branch and the next slice can be activated.
If a slice fails or becomes infeasible, still call artifact.record_analysis_slice(...) with an honest non-success status plus the real blocker and next recommendation; do not leave the campaign state ambiguous.

After every completed, excluded, or blocked writing-facing slice:

reopen paper/paper_experiment_matrix.md
update the row status, feasibility, and result artifacts
update whether the row now belongs in main text, appendix, or omission
update the remaining execution frontier before choosing the next slice

Do not keep launching writing-facing slices from stale memory when the matrix has changed.
For slice recording, deviations and evidence_paths are optional context fields, not mandatory ceremony; include them only when they materially help explanation or auditability.
Each artifact.record_analysis_slice(...) call should also include an evaluation_summary with exactly these six fields:

takeaway
claim_update
baseline_relation
comparability
failure_mode
next_action

Use those six fields to keep each slice readable at a glance from Canvas, stage tabs, review, and rebuttal. The longer prose still matters, but the six-field summary is the stable routing summary.
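
Because the summary must carry exactly six fields, it is cheap to check the key set before recording a slice. A minimal sketch, assuming the summary is passed as a dictionary; `check_evaluation_summary` is a hypothetical helper, not part of the artifact API.

```python
# The six routing fields named above.
EVALUATION_SUMMARY_FIELDS = frozenset({
    "takeaway", "claim_update", "baseline_relation",
    "comparability", "failure_mode", "next_action",
})


def check_evaluation_summary(summary):
    """Raise ValueError unless the summary has exactly the six expected keys."""
    keys = set(summary)
    missing = sorted(EVALUATION_SUMMARY_FIELDS - keys)
    extra = sorted(keys - EVALUATION_SUMMARY_FIELDS)
    if missing or extra:
        raise ValueError(f"evaluation_summary mismatch: missing={missing}, extra={extra}")
```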

For writing-facing campaigns, prefer running claim-carrying slices before supporting slices unless an auxiliary check is required to make the main slice interpretable.

For slices that run longer than a quick smoke check:

first run a bounded smoke test so the slice command, outputs, and metric path are validated cheaply
once the smoke test passes, launch the real slice with bash_exec(mode='detach', ...) and normally leave timeout_seconds unset for that long run
bash_exec(mode='read', id=...) returns the full rendered log when it is 2000 lines or fewer; for longer logs it returns the first 500 lines plus the last 1500 lines and a hint to inspect omitted sections with start and tail
if you need a middle section that was omitted from that default preview, use bash_exec(mode='read', id=..., start=..., tail=...)
monitor running slices with bash_exec(mode='list') and bash_exec(mode='read', id=..., tail_limit=..., order='desc')
after the first read, prefer bash_exec(mode='read', id=..., after_seq=last_seen_seq, tail_limit=..., order='asc') for incremental monitoring
if ids become unclear, recover them through bash_exec(mode='history')
launch long slices with a structured comment such as {stage, goal, action, expected_signal, next_check}
use silent_seconds, progress_age_seconds, signal_age_seconds, and watchdog_overdue from bash_exec(mode='list'|'read', ...) as the default stall checks
use an explicit wait-and-check cadence of about 60s, 120s, 300s, 600s, 1800s, then every 1800s while still running
if needed, use an explicit bounded wait such as bash_exec(command='sleep 60', mode='await', timeout_seconds=70) or bash_exec(mode='await', id=..., timeout_seconds=...) between checks
canonical sleep choice:
- if you only need wall-clock waiting between checks, use bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...)
- keep a real buffer on that sleep timeout; do not set timeout_seconds exactly equal to N
- if you are waiting on an already running managed session, prefer bash_exec(mode='await', id=..., timeout_seconds=...) instead of starting a new sleep command
after the first meaningful signal and then at real checkpoints (e.g., completion, blocker, recovery, or a materially changed evidence frontier), send artifact.interact(kind='progress', ...) so the user sees the newest real state
after each completed sleep / await monitoring cycle for an active slice, inspect state first; only send another artifact.interact(kind='progress', ...) update if the user-visible state materially changed
include the estimated next reply time or next check time in those monitoring updates
stop a slice with bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...) if it is invalid, wedged, or superseded; add force=true when immediate termination is required
when you control the slice code, prefer a throttled tqdm progress reporter and, when feasible, pair it with concise __DS_PROGRESS__ lines carrying phase and ETA
do not mark a slice complete until the managed log and outputs both confirm completion
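
The monitoring numbers above can be sketched in plain Python. The helpers below are illustrative and do not call bash_exec: they reproduce the wait cadence, the buffered sleep timeout, and the default log preview rule as described. The default 10-second buffer is an assumption taken from the `sleep 60` / `timeout_seconds=70` example.

```python
from itertools import chain, repeat


def check_intervals():
    """Return an iterator of waits in seconds: 60, 120, 300, 600, 1800, then 1800 repeatedly."""
    return chain((60, 120, 300, 600, 1800), repeat(1800))


def sleep_timeout(seconds, buffer=10):
    """Return timeout_seconds for a bounded sleep, always larger than the sleep itself."""
    if buffer <= 0:
        raise ValueError("keep a real buffer; timeout must exceed the sleep duration")
    return seconds + buffer


def preview_line_ranges(total_lines):
    """Return which line indexes the default read preview covers.

    Logs of 2000 lines or fewer are shown in full; longer logs show the
    first 500 lines plus the last 1500.
    """
    if total_lines <= 2000:
        return [range(total_lines)]
    return [range(500), range(total_lines - 1500, total_lines)]
```

For a 5000-line log, lines 500 through 3499 fall outside the default preview and need an explicit start/tail read.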

3. Keep comparability

Comparability rules:

keep the same evaluation contract unless the variation is the point
when active_baseline_metric_contract_json exists, keep slice comparisons aligned with it unless the slice explicitly records why it differs
state exactly what changed
state exactly what stayed fixed
keep naming and output paths clean so multiple runs can coexist

For code-modifying slices, the default durable layout should stay interpretable:

working surface:
- .ds/worktrees/<slice_id>/ when isolated worktrees are used
experiment surface:
- experiments/analysis/<campaign_id>/<slice_id>/
artifact surface:
- artifacts/runs/<artifact_id>.json
- artifacts/reports/<artifact_id>.json

If the variation itself changes the evaluation setup, record that explicitly and do not present the run as a direct apples-to-apples comparison.
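
The default layout above can be derived from ids so that paths stay consistent across slices. A sketch using `pathlib`; the helper names are hypothetical, and the ids in the usage test are placeholders.

```python
from pathlib import PurePosixPath


def slice_paths(campaign_id, slice_id, isolated_worktree=True):
    """Return the working and experiment surfaces for one slice, relative to the quest root."""
    paths = {
        "experiment": PurePosixPath("experiments/analysis") / campaign_id / slice_id,
    }
    if isolated_worktree:
        # Only present when the slice runs in an isolated worktree.
        paths["worktree"] = PurePosixPath(".ds/worktrees") / slice_id
    return paths


def artifact_path(kind, artifact_id):
    """Return the artifact surface path for a run or report artifact."""
    if kind not in ("runs", "reports"):
        raise ValueError("kind must be 'runs' or 'reports'")
    return PurePosixPath("artifacts") / kind / f"{artifact_id}.json"
```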

4. Record each analysis slice

Before a long slice starts, emit a progress artifact or artifact.interact(kind='progress', ...) update so the quest shows that the slice is active.

For each run, record:

analysis question
intervention
metric or qualitative evidence
whether the result strengthens, weakens, or complicates the claim
paths to the evidence

Preferred per-slice summary shape:

question
implementation change
main metric delta
interpretation
caveats

Each completed slice should also leave a run artifact containing at least:

campaign_id
slice_id
run_kind
parent_run_id
analysis_question
fixed_conditions
changed_factors
metrics_summary
metric_deltas
success_criteria
abandonment_criteria
verdict
reason
paths

If a slice fails before producing evidence, still record it as a failed or partial run artifact rather than silently skipping it.
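
One way to keep run artifacts uniform, including failed ones, is a small record type with the fields listed above. This is a sketch under assumptions: the dataclass is not the runtime's schema, the field types are guesses, and the `"failed"` verdict value and example ids are placeholders.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RunArtifact:
    """Per-slice run record; field names follow the list above, types are assumed."""
    campaign_id: str
    slice_id: str
    run_kind: str
    parent_run_id: str
    analysis_question: str
    fixed_conditions: list
    changed_factors: list
    success_criteria: str
    abandonment_criteria: str
    verdict: str
    reason: str
    # Evidence fields default to empty so a slice that failed early is still recordable.
    metrics_summary: dict = field(default_factory=dict)
    metric_deltas: dict = field(default_factory=dict)
    paths: list = field(default_factory=list)


# A slice that failed before producing evidence is still written down.
failed = RunArtifact(
    campaign_id="camp-001",
    slice_id="abl-no-attn",
    run_kind="analysis.ablation",
    parent_run_id="run-main",
    analysis_question="Does removing the component change the main metric?",
    fixed_conditions=["same evaluation contract"],
    changed_factors=["component removed"],
    success_criteria="metric delta measured on the full evaluation split",
    abandonment_criteria="slice cannot run with current assets",
    verdict="failed",
    reason="out of memory during the smoke test",
)
record = asdict(failed)
```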

When a slice materially changes the recommended route or weakens the main claim, do not wait until the final synthesis to mention it. Send a threaded artifact.interact(kind='milestone', ...) update at that point with the new boundary or risk.

5. Aggregate the campaign

The campaign report should explain:

which findings are stable
which findings are fragile
what changed the interpretation of the main result
which open questions still remain

Campaign reporting rules:

focus on the highest-impact findings first
results matter more than process narration
if using tables, show only the most decision-relevant rows
separate:
- stable support
- partial support
- contradiction
- unresolved ambiguity

When there are many slices, summarize the top 3-5 most important ones first, then point to the full evidence paths.
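
The reporting rules above can be sketched as two small helpers: one that separates slices into the four outcome classes, and one that picks the headline slices. The `outcome` and `impact` keys are assumptions for this example; how impact is scored is left open.

```python
# The four outcome classes named in the reporting rules above.
OUTCOME_CLASSES = ("stable support", "partial support", "contradiction", "unresolved ambiguity")


def group_by_outcome(slices):
    """Group slice ids by outcome class; every class appears, even when empty."""
    groups = {name: [] for name in OUTCOME_CLASSES}
    for s in slices:
        if s["outcome"] not in groups:
            raise ValueError(f"unknown outcome class: {s['outcome']!r}")
        groups[s["outcome"]].append(s["slice_id"])
    return groups


def headline_slices(slices, limit=5):
    """Return the ids of the highest-impact slices, most important first."""
    ranked = sorted(slices, key=lambda s: s["impact"], reverse=True)
    return [s["slice_id"] for s in ranked[:limit]]
```

Keeping empty classes in the grouped output makes it explicit when, for example, no slice contradicted the claim.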

The aggregated report should also answer:

should the main claim be strengthened, weakened, narrowed, or abandoned?
which slice changed the interpretation most?
which slice is still worth rerunning, and why?
which planned slices were intentionally skipped because earlier results made them low value?

When the aggregated campaign report is complete, send a richer threaded artifact.interact(kind='milestone', ...) update. Lead that milestone with a concise 1-2 sentence campaign outcome summary before expanding into slice-level detail.

If QQ milestone media is enabled and the aggregated report materially changes the claim boundary, you may attach one campaign summary PNG to that closing milestone update. That update should explicitly classify the campaign outcome in the same language as the report:

stable support
partial support
contradiction
unresolved ambiguity

6. Route the next step

A campaign should end with an explicit next move:

continue the campaign
return to experiment
move to write
stop or reset the current line

Record the post-campaign route as a decision artifact. When helpful, include a reflection block with:

what_worked
what_failed
learned_constraints

and a next_direction block that states:

objective
key steps
success criteria
abandonment criteria

This makes the next stage executable without guesswork.
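
A closing decision with the optional blocks above might be assembled as follows. This is a sketch: the route labels and snake_case keys for the next_direction block are hypothetical renderings of the options listed above, and the real decision artifact may use different names.

```python
# Hypothetical labels for the four next moves listed above.
ROUTES = ("continue_campaign", "return_to_experiment", "move_to_write", "stop_or_reset")
REFLECTION_KEYS = ("what_worked", "what_failed", "learned_constraints")
NEXT_DIRECTION_KEYS = ("objective", "key_steps", "success_criteria", "abandonment_criteria")


def build_route_decision(route, reflection=None, next_direction=None):
    """Assemble a post-campaign decision payload with optional reflection and next_direction."""
    if route not in ROUTES:
        raise ValueError(f"unknown route: {route!r}")
    decision = {"route": route}
    if reflection is not None:
        # Missing reflection entries become empty lists rather than being dropped.
        decision["reflection"] = {key: reflection.get(key, []) for key in REFLECTION_KEYS}
    if next_direction is not None:
        missing = [key for key in NEXT_DIRECTION_KEYS if not next_direction.get(key)]
        if missing:
            raise ValueError(f"next_direction is missing: {missing}")
        decision["next_direction"] = dict(next_direction)
    return decision
```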

Analysis-quality rules

Good campaign behavior:

one clear question per run
one-factor-at-a-time changes when possible
clear comparison against the accepted reference line
visibility of null and negative findings
a logically ordered suite rather than a random batch

Strong campaign ordering usually looks like:

most claim-critical ablation or comparison
strongest robustness or sensitivity checks
failure-mode or error analysis
efficiency or secondary analysis

The exact order can vary, but the most claim-relevant evidence should appear first.

Weak campaign behavior:

hidden scope expansion
many untracked simultaneous changes
campaign summary without per-run evidence
ignoring contradictory analysis results
reporting every minor slice with equal weight instead of prioritizing the important ones

Memory rules

Stage-start requirement:

begin every analysis campaign pass with memory.list_recent(scope='quest', limit=5)
then run at least one analysis-relevant memory.search(...) before launching or resuming slices
if several campaigns, parent runs, or idea lines exist, narrow retrieval to the current campaign_id, parent_run_id, idea_id, or branch instead of mixing unrelated slice memory

Write to memory only when the campaign yields reusable lessons, such as:

robust failure patterns
evaluation caveats
reproducible sensitivity findings

Stage-end requirement:

if the campaign produced a durable cross-slice lesson, failure pattern, or comparability caveat, write at least one memory.write(...) before leaving the stage

The campaign’s main record belongs in run artifacts and the aggregated report. When synthesizing the campaign, read the per-slice evaluation_summary fields first, then expand into longer evidence only where the short summaries are still ambiguous.

Artifact rules

Typical artifact sequence:

decision artifact to launch the campaign
report artifact for the charter
progress artifacts during long campaigns
run artifacts per analysis slice
report artifact for the aggregated campaign summary
decision artifact for the next anchor

Failure and blocked handling

Record blocked or failed campaign states explicitly, such as:

missing parent run
analysis question under-specified
campaign run failed before evidence was produced
metrics not comparable
campaign conclusion still ambiguous

A blocked campaign should still name the next best action.

Exit criteria

Exit the analysis-campaign stage once one of the following is durably true:

the campaign produced enough evidence for writing or decision-making
the campaign exposed a problem that requires returning to experiment or idea
the campaign is blocked and the blocker is durably recorded