name: paper-outline description: Use when creating, revising, validating, or repairing a research-paper outline before writing; turns experiment evidence into a clear paper idea, scoped claims, method abstraction, evaluation plan, analysis plan, and evidence boundaries without copying run logs into the manuscript. skill_role: companion
Paper Outline
Use this before write when the outline feels like a run log, result dump, engineering note, or group-meeting report instead of a paper plan.
One-Sentence Summary
Keep one selected outline, but split two views:
paper_view: what the paper will say to readers.evidence_view: where the exact runs, paths, rows, settings, and reproducibility details live.
The paper should be faithful to the actual evidence, but it should not repeat the agent workflow.
Basic Workflow
- Read the current paper state.
Use
artifact.get_paper_contract(detail='full'),artifact.list_paper_outlines(...), and thenartifact.validate_academic_outline(detail='full')if an outline exists. - Find the one-sentence paper idea. Ask: "What should a researcher remember after reading this paper?" This is not a metric row and not an implementation setting.
- Separate facts from interpretation. Facts are measured results. Interpretations are the careful academic lesson supported by those facts. Unsupported claims go into "must not claim."
- Write or repair
paper_view. Fill the paper idea, problem/gap/method/result/limit, 1-3 scoped claims, method intuition, evaluation plan, and 4-8 useful analysis jobs. - Keep engineering details out of the story.
Put ports, worktrees, batch shorthand, route decisions, user requests, artifact ids, exact file paths, and local commands into
evidence_viewor appendix-only reproducibility fields. - Validate and compile.
Run
artifact.validate_academic_outline(detail='full'). If it passes, runartifact.compile_outline_to_writing_plan(detail='full').
What Good Means
A good outline does three things:
- It has a point: one clear claim or lesson, not a list of what the agent did.
- It is honest: every claim is tied to durable evidence, and limits are explicit.
- It is useful to a reader: the method and analyses teach something beyond "this setup got a number."
Strong papers often start from simple code but make a useful idea legible. Residual connections are more than a code shortcut; the paper teaches how to make depth trainable. Attention is more than a module; the paper teaches how to remove a bottleneck. Do the same only when the quest evidence supports that kind of interpretation.
Mature Outline Reminder
A mature paper outline is not just a section list. For paper_type: full_empirical and outline_maturity: mature, surface reminders when these are missing:
- a central thesis and a central insight that are reader-facing, not just metric summaries
- an
insight_laddershowing how observed facts become allowed interpretations - 1-3 scoped claims, each with
evidence_neededandwhat_would_falsify_it - a closest-neighbor / novelty boundary explaining what the paper is and is not claiming against prior or obvious alternatives
- at least three likely reviewer objections, each mapped to planned evidence, manuscript revision, claim downgrade, or accepted limitation
- 4-8 reviewer-facing analysis jobs beyond the headline result unless an explicit analysis-budget waiver downgrades the paper scope
Analysis quantity has two reminder levels:
paper_view.analysis_plan: normally 4-8 planned analysis jobs for a mature empirical paper.- paper-facing evidence package: normally 5-10 ready experiment/analysis groups total before treating the manuscript as strong. If the user specifies a number such as 4-8 analyses, track that target visibly until completed, waived, or explicitly downgraded.
Required Shape
Use this inside artifact.submit_paper_outline(..., detailed_outline={...}).
{
"paper_view": {
"paper_type": "full_empirical",
"outline_maturity": "mature",
"working_title": "Paper-native title",
"narrative_strategy": {
"central_thesis": "The one idea the paper wants readers to remember",
"central_insight": "The reusable lesson suggested by the evidence",
"reader_takeaway": "What another researcher can learn or reuse"
},
"insight_ladder": [
{
"level": "Observed fact -> interpretation",
"statement": "What this fact teaches",
"evidence": ["main-result-id"],
"claim_links": ["C1"],
"risk": "What could make the interpretation too strong"
}
],
"story_spine": {
"problem": "What scientific problem exists?",
"gap": "What prior/easy approach fails to address?",
"method": "What abstract method is introduced?",
"main_result": "What measured result supports the claim?",
"scope_limit": "Where the claim stops"
},
"positioning": {
"closest_neighbor": "The closest existing method, baseline, or obvious alternative",
"novelty_boundary": "Exactly what is new or reusable here",
"not_claiming": ["Claims this paper does not make"]
},
"core_claims": [
{
"claim_id": "C1",
"claim": "A scoped claim, not a section summary",
"scope": "Dataset/model/setting boundary",
"evidence_needed": ["main-result-id", "analysis-id"],
"what_would_falsify_it": "A result pattern that would weaken the claim"
}
],
"method_abstraction": {
"paper_name": "Method name if stable",
"intuition": "Why the method should work",
"mechanism_steps": ["Step 1", "Step 2", "Step 3"],
"appendix_only_details": ["local serving topology", "exact batch/query budget"]
},
"evaluation_plan": {
"setting": "The scientific evaluation setting",
"datasets_or_benchmarks": [],
"baselines": [],
"metrics": [],
"controlled_factors": []
},
"analysis_plan": [
{
"analysis_id": "A1",
"title": "Component ablation",
"analysis_role": "component ablation",
"reviewer_question": "Does the claimed mechanism actually cause the gain?",
"claim_links": ["C1"],
"target_display": "Main-text ablation table",
"main_or_appendix": "main_text",
"failure_interpretation": "How the claim should change if this fails"
}
],
"reviewer_objections": [
{
"objection": "Why a skeptical reviewer might reject or downgrade the paper",
"answer_route": "analysis | writing | claim_downgrade | limitation",
"linked_claims": ["C1"],
"needed_evidence": ["analysis-id"]
}
],
"evidence_grounding": {
"observed_facts": ["Facts directly visible in durable results"],
"allowed_interpretations": ["Careful interpretations allowed by the facts"],
"must_not_claim": ["Claims the paper must avoid"],
"evidence_gaps": ["Missing checks or unresolved risks"]
}
},
"evidence_view": {
"claim_to_items": [],
"sections": [],
"unmapped_items": [],
"appendix_reproducibility": []
}
}
The field names are machine-facing. The thinking should stay simple:
central_thesis: one-sentence paper idea.central_insight: what readers learn.story_spine: problem -> gap -> method -> result -> limit.evidence_grounding: facts, allowed interpretations, and things not to claim.analysis_plan: the checks a reviewer would ask for.
Analysis Plan
A mature empirical paper usually needs 4-8 analysis jobs beyond the main result. Choose them because they support the story, not because of a fixed checklist.
Useful analysis roles:
- component ablation
- robustness or sensitivity
- stronger-baseline comparison
- subgroup or case breakdown
- failure taxonomy
- mechanism or attribution check
- cost, budget, or efficiency tradeoff
- limitation or residual headroom analysis
If there are fewer than 4, mark outline_maturity: "idea_seed" or provide analysis_budget_waiver with a real reason.
Bad To Good Examples
Bad:
- "The abstract reports dual ports and 64+64."
Good:
- "All methods are compared under the same evidence budget; the exact serving setup is appendix-only."
Bad:
- "The latest route selected outline-008 and reran opposite-port probes."
Good:
- "The method performs an independent evidence pass and updates a decision only when the new support satisfies preset checks."
Bad:
- "Section 3 reports all experiments and Section 4 reports more experiments."
Good:
- "The main result tests whether the method improves the target task. The analyses then ask why: whether the gain comes from the proposed component, whether it survives stronger baselines, where it fails, and what budget it costs."
Bad:
- "We did only two follow-up analyses because those were the latest completed runs."
Good:
- "The outline plans six follow-ups: ablation, stronger baseline, sensitivity, failure taxonomy, subgroup breakdown, and cost. If only two can be run, the paper is marked early/narrow instead of mature."
Validation
Before handing to write, check:
artifact.validate_academic_outline(detail='full')passes.- The paper has one clear idea and 1-3 scoped claims.
- If the outline is mature/full-empirical,
insight_ladder, novelty boundary, reviewer objections, claim falsification criteria, and analysis-count reminders are present or explicitly waived. - The outline says what was observed, what can be interpreted, and what must not be claimed.
- The analysis plan has 4-8 useful jobs, or a waiver.
- Main-text experiment/analysis item ids are checked for stale duplicates that inflate evidence count.
paper_viewdoes not mention quest, worktree, selected outline, route history, user requests, ports, or64+64.- Exact engineering details are in
evidence_viewor appendix-only fields.
Read references/outline-patterns.md when you need more examples.