---
name: research-report
description: Create or revise a structured markdown research or experiment report with integrated plots, optional literature/reference support, plot manifest tracking, report version history, and report-body validation inside a single Harness without external Codex or Gemini calls. Use when you need to generate report.md, inventory or validate plots, create plots/plot_manifest.json, manage report_versions.json, connect report sections to supporting references, or turn an experiment or output directory into a reusable report workflow.
---
# Research Report
Use this skill to produce a self-contained research or experiment report from an output directory that may contain notes, source code, tests, metrics, tables, and plots. The skill is harness-only: it never calls external Gemini or Codex MCP tools. Where the magi-researchers pipeline uses BALTHASAR (Gemini) and CASPER (Codex) reviewers, this skill spawns Claude subagents with cognitive-style framing instead.
## Inputs to confirm
Ask for only what is missing:
- target output directory
- report title
- domain or audience (used to load `references/domains/<domain>.md`; default: `general`)
- whether plots already exist in `plots/`
- whether a lightweight plot metadata file should be supplied for better captions or section mapping
- whether section-level reference support is needed for background, methodology, baseline comparisons, benchmark context, or claim-heavy passages
## Expected directory shape
The skill works best when the target directory looks roughly like this:

```text
{output_dir}/
  report.md
  report_versions.json
  plots/
    *.png
    *.pdf | *.svg
    plot_manifest.json
    _plot_style.py      # copy of the helper used by every plot script
    *.py                # plot generation scripts
    data/<stem>.csv     # raw data consumed by each script
  src/
  tests/
  notes/ | brainstorm/ | plan/ | results/ | tables/
  references/ | bib/ | related_work/
```

Missing folders are acceptable. Adapt the report to whatever evidence actually exists.
## Report workflow conventions
When tailoring this skill to a project that already uses `outputs/` directories:
- Prefer a self-contained target like `outputs/{report_slug}/`.
- Keep `report.md`, `report_versions.json`, and `plots/plot_manifest.json` in the same report root.
- Copy or regenerate only artifacts that belong to the current report narrative. Do not mix unrelated experiment outputs.
- Treat `report_v{N}.md` files as immutable archives once versioned.
- Keep plot paths relative to the report root so the directory can move without edits.
- Reuse existing project metrics, CSV or JSON summaries, tables, and previous report drafts before creating new artifacts.
## Single-Harness workflow
The workflow has nine ordered steps. Steps marked (gate) must pass before continuing.
### Step 1 — Gather materials
- Inventory `src/`, `tests/`, notes, metrics, tables, any prior `report.md` or `report_v*.md`, and any existing `references/`, `bib/`, or related-work notes.
- Read the report template in `references/report_template.md`.
- Read the domain template at `references/domains/<domain>.md` for tone and visualization conventions. Fall back to `references/domains/general.md` when the domain is unknown.
- If `plots/` exists, build or refresh `plots/plot_manifest.json`:

  ```bash
  python skills/research-report/scripts/build_plot_manifest.py \
    "{output_dir}/plots" \
    --report-root "{output_dir}"
  ```

  Pass `--metadata path/to/plot_metadata.json` for richer captions or section hints.
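
To make the manifest's role concrete, here is a minimal sketch of the kind of inventory it holds. It is not the real `build_plot_manifest.py`: the `plots`, `plot_id`, and `files` keys and the way metadata is merged are assumptions chosen for illustration. What it does show is the convention this skill relies on, namely figure files grouped by stem with every path stored relative to the report root.

```python
import json
from pathlib import Path
from typing import Optional

FIGURE_SUFFIXES = {".png", ".pdf", ".svg"}


def build_minimal_manifest(plots_dir: Path, report_root: Path,
                           metadata: Optional[dict] = None) -> dict:
    """Group figure files by stem and store each path relative to report_root.

    Hypothetical schema: {"plots": [{"plot_id": ..., "files": [...], **metadata}]}.
    """
    metadata = metadata or {}
    entries = {}
    for path in sorted(plots_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in FIGURE_SUFFIXES:
            continue  # skips plot_manifest.json, *.py scripts, data/, ...
        entry = entries.setdefault(path.stem, {"plot_id": path.stem, "files": []})
        entry["files"].append(path.relative_to(report_root).as_posix())
    for stem, entry in entries.items():
        # Optional per-plot metadata (caption, section_hint, ...) keyed by stem.
        entry.update(metadata.get(stem, {}))
    return {"plots": list(entries.values())}


def write_manifest(plots_dir: Path, report_root: Path,
                   metadata: Optional[dict] = None) -> Path:
    """Serialize the manifest next to the plots as plot_manifest.json."""
    manifest = build_minimal_manifest(plots_dir, report_root, metadata)
    out = plots_dir / "plot_manifest.json"
    out.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out
```

Storing POSIX-style relative paths is what lets the report directory move without edits.
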
### Step 2 — Plot-style pre-flight (gate)
Before drafting prose, audit every plot generation script for compliance with the shared style helper. This is the local equivalent of magi's BALTHASAR/CASPER style-compliance check, but it is enforced statically over the scripts instead of relying on an LLM to spot regressions.
- Ensure `plots/_plot_style.py` exists. If the project does not have it yet, copy it from `assets/plot_templates/_plot_style.py` so `from _plot_style import ...` resolves.
- Run the plot-script auditor:

  ```bash
  python skills/research-report/scripts/validate_plot_scripts.py "{output_dir}" --json
  ```

  The auditor flags:
  - scripts that import `matplotlib` but do not import `_plot_style` or `scienceplots`,
  - scripts that call `plt.style.use(['science', ...])` without `'no-latex'` while also setting `text.usetex=False` (silent-fallback bug, pitfall #21),
  - scripts that override `font.family` or `font.size` after `apply_style()`,
  - scripts that save only PNG via `plt.savefig(...)` (PDF/SVG missing),
  - scripts that hardcode `dpi` below 300,
  - scripts that do not call `assert_english(...)` on label and title strings.
- Resolve every error before continuing. Warnings should be fixed or explicitly justified in the report.
- If a plot is non-compliant, regenerate it from the `assets/plot_templates/*.py` family and re-run `build_plot_manifest.py`.
- Run the JSON-artifact validator afterwards:

  ```bash
  python skills/research-report/scripts/validate_artifacts.py "{output_dir}" --json
  ```

  Treat errors as blockers. Treat warnings as items to fix or explicitly mention in the report.
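
The auditor's internals are not documented here. As a rough, hypothetical approximation, four of the listed checks can be expressed as regular-expression heuristics over a script's source text. The real `validate_plot_scripts.py` may work differently; an AST walk would be more robust against comments and string literals.

```python
import re

MPL_IMPORT = re.compile(r"^\s*(?:import|from)\s+matplotlib\b", re.M)
HELPER_IMPORT = re.compile(r"^\s*(?:import|from)\s+(?:_plot_style|scienceplots)\b", re.M)
SAVEFIG_EXT = re.compile(r"savefig\(\s*['\"][^'\"]*\.(\w+)['\"]")
DPI_VALUE = re.compile(r"\bdpi\s*=\s*(\d+)")
STYLE_USE = re.compile(r"plt\.style\.use\(([^)]*)\)")
# Matches rcParams["text.usetex"] = False and {"text.usetex": False}.
USETEX_OFF = re.compile(r"['\"]text\.usetex['\"]\s*(?:\]\s*=|:)\s*False")


def audit_plot_script(source: str) -> list[str]:
    """Return human-readable findings for one plot script (heuristic sketch)."""
    findings = []
    if MPL_IMPORT.search(source) and not HELPER_IMPORT.search(source):
        findings.append("imports matplotlib without _plot_style or scienceplots")
    extensions = {ext.lower() for ext in SAVEFIG_EXT.findall(source)}
    if extensions and extensions <= {"png"}:
        findings.append("saves PNG only (PDF/SVG missing)")
    for value in DPI_VALUE.findall(source):
        if int(value) < 300:
            findings.append(f"hardcoded dpi={value} is below 300")
    science_without_nolatex = any(
        "science" in args and "no-latex" not in args
        for args in STYLE_USE.findall(source)
    )
    if science_without_nolatex and USETEX_OFF.search(source):
        findings.append("science style with text.usetex=False but no 'no-latex' (pitfall #21)")
    return findings
```
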
### Step 3 — Map evidence to sections
- Background: problem, context, assumptions, prior notes, and literature context.
- Analysis or discovery summary: exploratory findings, trade-offs, or problem framing.
- Methodology: approach, algorithms, data flow, experimental design, and method citations when external grounding matters.
- Implementation or setup: architecture, components, environment, constraints.
- Results and visualization: quantitative outcomes, comparisons, plots, tables, and baseline or benchmark context when needed.
- Validation: tests, checks, edge cases, failure modes, limitations, and benchmark-protocol references when they clarify interpretation.
- Conclusion: contributions, limitations, next steps.
Plot → section mapping uses the manifest's `section_hint`:

| `section_hint` | Report section |
|---|---|
| `methodology` | §3 Methodology |
| `results` | §5.1 Primary Results |
| `comparison` | §5.2 Comparative or Ablation Findings |
| `validation` | §6 Validation |
| `testing` | §6 Validation |
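
The same mapping can be applied mechanically when grouping manifest entries. This sketch assumes each entry is a dict with a `plot_id` and an optional `section_hint` (the manifest's actual layout may differ); anything without a recognized hint lands in an `unmapped` bucket that still needs a manual placement decision.

```python
SECTION_FOR_HINT = {
    "methodology": "§3 Methodology",
    "results": "§5.1 Primary Results",
    "comparison": "§5.2 Comparative or Ablation Findings",
    "validation": "§6 Validation",
    "testing": "§6 Validation",
}


def group_plots_by_section(plots: list[dict]) -> dict[str, list[str]]:
    """Bucket manifest entries by target report section, keeping manifest order."""
    groups: dict[str, list[str]] = {}
    for plot in plots:
        section = SECTION_FOR_HINT.get(plot.get("section_hint", ""), "unmapped")
        groups.setdefault(section, []).append(plot["plot_id"])
    return groups
```
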
### Step 4 — Attach literature support where needed
- Use the companion `reference-search` skill whenever a section depends on prior work, external benchmark framing, method lineage, or an externally grounded claim that cannot be supported by local artifacts alone.
- Typical mappings:
  - background or introduction → `background` or `survey`
  - methodology rationale → `method`
  - comparison targets → `baseline`
  - dataset, metric, or benchmark protocol context → `evaluation`
  - standalone factual claims → `claim-support`
- Prefer weaving only the strongest 2–5 references into the report prose instead of dumping long bibliographies.
- Only save section-level reference notes when the user asks for them or the project already maintains a `references/` or similar folder.
### Step 5 — Handle plots
- Reuse existing plots when they already support the narrative.
- Generate missing plots only from real data already present in the workspace.
- Preferred stack: `matplotlib` + `scienceplots` + the shared helper at `assets/plot_templates/_plot_style.py`. Copy the helper into the project's `plots/` directory next to any template script you adapt so `from _plot_style import ...` resolves.
- Plot text must be English. Korean or other non-ASCII characters in axis labels, titles, legends, or tick labels typically render as missing-glyph boxes (□), and the failure is silent. Move localized commentary to the report caption. The shared `assert_english(...)` helper enforces this at runtime; call it on every user-facing string before plotting.
- LaTeX `%` rule. When `text.usetex=True`, `%` starts a comment and silently truncates the rest of the string (`"95%"` becomes `"95"`). Always route user-controlled strings through `latex_escape(...)` (covers `% & # _ $ { }`) before passing them to axis labels, legends, or titles.
- Output formats: PNG at `dpi=300` plus PDF (vector). SVG is an acceptable PDF substitute when PDF is impractical. The shared `save_figure()` enforces this and always closes the figure.
- LaTeX is the default rendering path for the `science` and `nature` styles. `apply_style()` defaults to `use_latex=True` and probes for `latex` + `dvipng` at runtime. If LaTeX is unavailable, the helper automatically appends scienceplots' `no-latex` style modifier and emits a `RuntimeWarning` so you know the rendering downgraded. Never call `plt.style.use(['science'])` and then set `text.usetex=False` by hand: that combination silently substitutes DejaVu Sans for Times, and the resulting figures are indistinguishable from default matplotlib (scienceplots is loaded but invisible). Always go through `apply_style()` so the LaTeX/no-LaTeX decision is made coherently.
- Recording the decision: `apply_style()` returns a dict containing `latex_active`, `latex_probe_error`, and `scienceplots_available`. Persist these into `plot_manifest.json` (or per-plot metadata) so downstream readers can tell which rendering mode produced each figure.
- PDF font embedding: the shared helper sets `pdf.fonttype=42` and `ps.fonttype=42` so PDFs embed TrueType fonts. Type-3 fonts are rejected by many journals.
- Color palette: Okabe-Ito (colorblind-safe) by default; `TAB10` is available as a fallback constant. The cycle is unified across all templates so the same series gets the same color across plots in a report. When series ordering varies between plots, pin colors with an explicit `series -> color` dict.
- Figure size standards (constants in `_plot_style.py`):
  - `FIGSIZE_SINGLE` = (3.5, 2.6): single-column journal figure (Nature single-column = 3.5 in)
  - `FIGSIZE_ONE_HALF` = (5.0, 3.2): 1.5-column
  - `FIGSIZE_DOUBLE` = (7.0, 3.6): full-width / double-column (Nature double-column = 7.2 in)
  - `FIGSIZE_PANEL_WIDE` = (7.2, 2.8): 2-up panel base height
- In-figure title vs. caption: journal-style figures usually keep the title in the caption only. Templates include `ax.set_title(...)` for convenience; remove it (or leave it empty) when the report caption already states the same thing.
- Raw-data contract: save the CSV consumed by each plot script to `plots/data/<stem>.csv` and record the path in `plot_metadata.source_context`. Scripts should be reproducible from a single CSV input.
- Keep filenames stable and descriptive; rebuild `plot_manifest.json` after any plot change.
- Use the templates in `assets/plot_templates/` as starting points for learning curves, grouped comparisons, or multi-panel ablations.
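
The shared helper's source is not reproduced in this document. For illustration only, the escaping and English-only checks described above could look roughly like the hypothetical re-implementations below; the `_sketch` suffix marks them as stand-ins, not the real `latex_escape` and `assert_english`.

```python
# One-pass translation table for the specials listed above: % & # _ $ { }
_LATEX_SPECIALS = str.maketrans({
    "%": r"\%",
    "&": r"\&",
    "#": r"\#",
    "_": r"\_",
    "$": r"\$",
    "{": r"\{",
    "}": r"\}",
})


def latex_escape_sketch(text: str) -> str:
    """Escape LaTeX specials in a plain-text label in a single pass."""
    return text.translate(_LATEX_SPECIALS)


def assert_english_sketch(*strings: str) -> None:
    """Raise ValueError if any user-facing plot string contains non-ASCII characters."""
    for s in strings:
        if not s.isascii():
            offenders = sorted({ch for ch in s if not ch.isascii()})
            raise ValueError(f"non-ASCII plot text {s!r}: {''.join(offenders)}")
```

Apply the escape only to plain-text fragments: running it over a label that intentionally contains math such as `$\alpha$` would escape the dollar signs too. A backslash is not in the listed set and passes through unchanged.
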
#### Common plot pitfalls to prevent
These are the most frequent silent failures when generating research plots. The shared `_plot_style.py` helper guards against most of them; the rest belong to drafting discipline.
| # | Pitfall | Why it bites | Prevention |
|---|---|---|---|
| 1 | Unescaped `%` under `text.usetex=True` | `%` is a LaTeX comment; `"95%"` silently becomes `"95"` | `apply_style()` defaults to `use_latex=True`, so always wrap text with `latex_escape()` |
| 2 | Other unescaped LaTeX specials (`& # _ $ { }`) | Same class as #1: silent truncation or noisy LaTeX errors | `latex_escape()` covers all of them |
| 3 | Korean / CJK in axis labels, legends, titles | Glyph fallback to □, no warning | English-only rule + `assert_english()` runtime check |
| 4 | Type-3 fonts in saved PDFs | Many journals reject; reviewers cannot select text | pdf.fonttype=42, ps.fonttype=42 set in apply_style() |
| 5 | Non-colorblind-safe palette (red+green) | ~8% of male readers cannot distinguish | Okabe-Ito default in apply_style() |
| 6 | Inconsistent series-to-color mapping across plots | "Series A" is blue in fig 1 but red in fig 2 | Single shared cycle; pin colors via dict when ordering varies |
| 7 | `legend(loc="best")` on dense plots | Placement is recomputed from the data, so the legend jumps when the data changes, and the search is slow on large datasets | Pick an explicit `loc=...` for production figures |
| 8 | `tight_layout()` clipping `suptitle` | The suptitle gets cut off | `constrained_layout=True` or `subplots_adjust(top=0.85)` |
| 9 | Forgetting `plt.close(fig)` in batch generation | Memory grows with every figure; matplotlib warns "More than 20 figures have been opened" | `save_figure()` always closes |
| 10 | `np.log` of zero or negative values on a log axis | `-inf`/NaN values (only a `RuntimeWarning`) and points that silently drop out of the plot | Validate inputs; use `symlog` if zero is meaningful |
| 11 | Categorical x-axis without `set_xticklabels` | Numeric tick labels appear instead of names | Always call `set_xticks` and `set_xticklabels` together |
| 12 | Grid drawn over data | Distracting; hurts data legibility | `axes.axisbelow=True` (set in shared helper) |
| 13 | `plt.show()` in scripts run by automation | With an interactive backend it blocks the pipeline and CI hangs | Templates never call `plt.show()` |
| 14 | `plt.savefig()` called after `plt.close()` | pyplot saves a new, empty figure, so the PNG/PDF comes out blank | `save_figure()` enforces save-then-close order |
| 15 | 96 dpi PNG used for print | Pixelation in printed reports | `savefig.dpi=300` enforced |
| 16 | Hardcoded font (Times New Roman, etc.) that is not installed | Easily missed fallback to DejaVu Sans; the output looks different from the preview | Do not override `font.family` per script; use the shared helper |
| 17 | `transparent=True` on a white-background figure | Background drops out and looks wrong on colored pages | Leave the default white background |
| 18 | Mismatched DPI between PNG and PDF | Tick label spacing differs across formats | Set `savefig.dpi` once globally |
| 19 | Empty alt text in `![](...)` | Validator warns; accessibility regression | Always pass meaningful alt text; copy it from the manifest caption |
| 20 | Plot manifest path drift after directory rename | Captions reference stale paths; build fails | Keep paths relative to the report root; rebuild the manifest after moves |
| 21 | `science`/`nature` styles without LaTeX rendering | scienceplots' `science.mplstyle` sets `text.usetex: True` and `font.family: serif`. Overriding `text.usetex=False` makes Times silently fall back to DejaVu Sans on machines without Times, so figures look like default matplotlib and scienceplots is invisible. | `apply_style()` defaults to `use_latex=True`, auto-probes for `latex` + `dvipng`, and appends scienceplots' `no-latex` style to the chain when LaTeX is missing. Never set `text.usetex=False` after `plt.style.use(['science'])` without also adding `'no-latex'` to the chain. |
| 22 | Silent fallback when scienceplots is missing | Final figures look unlike previews | Record `scienceplots_available` from `apply_style()` in plot metadata |
| 23 | Hardcoded `font.family` override after `apply_style()` | Re-introduces pitfall #21 | Treat the helper output as authoritative; do not touch `font.*` rcParams in scripts |
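
Pitfall #6 recommends an explicit `series -> color` dict when series order varies between plots. A minimal sketch, assuming the eight standard Okabe-Ito hex colors in the order shown (the shared helper's own cycle order may differ):

```python
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def pin_series_colors(series_names):
    """Assign each series a color once, in first-seen order, for the whole report."""
    mapping = {}
    for name in series_names:
        if name not in mapping:
            # Wraps after eight series; more than that usually needs another encoding.
            mapping[name] = OKABE_ITO[len(mapping) % len(OKABE_ITO)]
    return mapping
```

Build the mapping once from every series that appears anywhere in the report, then look colors up by name when plotting (for example `ax.plot(x, y, color=colors[name])`), so a series keeps its color even when a figure omits or reorders the others.
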
### Step 6 — Draft the report
- Use `references/report_template.md` as the starting structure.
- Embed each plot inline near the paragraph that interprets it. Never write a "list of figures" appendix table; that is an anti-pattern.
- Never use passive references such as "as shown in the figure below" or "see Figure X" without an accompanying concrete quantitative observation in the same paragraph (e.g., specific deltas, percentages, R², slope, p-values, runtime). The figure earns its space by being interpreted.
- When a section depends on prior work or benchmark framing, use curated results from `reference-search` and mention only references you actually reviewed.
- Avoid fake citation placeholders or generic "prior work shows" wording without concrete support.
- Avoid orphaned figures and unsupported quantitative claims.
- If a canonical section has no source material, rename or repurpose it instead of leaving a hollow placeholder.
- For dense reports, add `<!-- EVIDENCE BLOCK: ev-N -->` markers near the paragraph that consumes evidence id `ev-N`, and list the evidence inventory at the top of the report or in `evidence/inventory.json` so the validator can cross-check.
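
The validator's cross-check logic is not specified here. A minimal sketch, assuming the evidence inventory has already been reduced to a set of ids such as `ev-1` (how those ids are stored in `evidence/inventory.json` is left open):

```python
import re

EVIDENCE_MARKER = re.compile(r"<!--\s*EVIDENCE BLOCK:\s*(ev-\d+)\s*-->")


def cross_check_evidence(report_text: str, inventory_ids: set[str]) -> dict[str, list[str]]:
    """Compare evidence markers in the report body with the evidence inventory."""
    cited = set(EVIDENCE_MARKER.findall(report_text))
    return {
        # A marker points at evidence nobody listed: add it or fix the id.
        "missing_from_inventory": sorted(cited - inventory_ids),
        # Listed evidence that no paragraph consumes: use it or drop it.
        "never_consumed": sorted(inventory_ids - cited),
    }
```
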
#### Report-body math conventions (LaTeX-only)
Every mathematical expression in `report.md` MUST use LaTeX. Unicode math symbols are not acceptable in the report body: they render inconsistently across Markdown renderers and PDF exporters (Typora, Pandoc, Marp, GitHub), and many journal stylesheets refuse to typeset them.
Inline math (`$...$`): use for variable names, parameter values, complexity notation, and short expressions. Examples:
| Concept | Wrong (Unicode) | Right (LaTeX) |
|---|---|---|
| Greek | α = 0.05, σ₁, λ, θ̂ | `$\alpha = 0.05$`, `$\sigma_1$`, `$\lambda$`, `$\hat{\theta}$` |
| Number sets | ℝⁿ, ℕ, ℤ, ℂ | `$\mathbb{R}^n$`, `$\mathbb{N}$`, `$\mathbb{Z}$`, `$\mathbb{C}$` |
| Operators | ≈, ≤, ≥, ≠, ≪, ≫, ±, ×, ·, ÷ | `$\approx$`, `$\leq$`, `$\geq$`, `$\neq$`, `$\ll$`, `$\gg$`, `$\pm$`, `$\times$`, `$\cdot$`, `$\div$` |
| Logic / sets | ∈, ∉, ⊂, ⊆, ∪, ∩, ∀, ∃ | `$\in$`, `$\notin$`, `$\subset$`, `$\subseteq$`, `$\cup$`, `$\cap$`, `$\forall$`, `$\exists$` |
| Sub/super | ², ³, ⁴, ⁿ, xᵢ, H₂O | `$^2$`, `$^3$`, `$^4$`, `$^n$`, `$x_i$`, `$\mathrm{H}_2\mathrm{O}$` |
| Arrows | →, ←, ↔, ⇒, ⇔ | `$\to$`, `$\leftarrow$`, `$\leftrightarrow$`, `$\Rightarrow$`, `$\Leftrightarrow$` |
| Calculus | ∂, ∇, ∫, ∑, ∏, √ | `$\partial$`, `$\nabla$`, `$\int$`, `$\sum$`, `$\prod$`, `$\sqrt{}$` |
| Infinity / constants | ∞, π, ℏ, ° | `$\infty$`, `$\pi$`, `$\hbar$`, `$^{\circ}$` |
Display math (`$$...$$`): use for key formulas, derivations, loss functions, and main results.

✘ Inline-style display equation (not reliably treated as a display block across Markdown exporters such as Pandoc and Typora):

```markdown
$$L = \frac{1}{N}\sum_i (y_i - \hat{y}_i)^2$$
```

✓ Display equation on its own lines (correct):

```markdown
$$
L = \frac{1}{N}\sum_i (y_i - \hat{y}_i)^2
$$
```

Always:
- Put `$$` on a line by itself, with one blank line above and below the block.
- Use `\text{...}` for textual labels inside math (`$L_{\text{train}}$`, not `$L_{train}$`, which typesets "train" as a product of the italic variables $t$, $r$, $a$, $i$, $n$).
- Use `\,`, `\;`, `\quad` to control spacing inside math. Literal spaces are ignored in math mode, and Unicode whitespace is not portable across TeX engines.
- Use `\mathrm{}` for upright multi-letter operators (e.g., `$\mathrm{erf}(x)$`, `$\mathrm{KL}(p \,\|\, q)$`).
The validator (validate_artifacts.py) flags single-line $$..$$ and unicode math characters in the report body as errors.
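
A hypothetical approximation of these two checks, plus a helper that rewrites a single-line display equation into the required form. The symbol set is only a subset of the table above, and the real `validate_artifacts.py` rules may differ; fenced code blocks are skipped because they are not report prose.

```python
# Subset of the Unicode symbols from the table above; extend as needed.
UNICODE_MATH = set("αβγδεθλμπσωℝℕℤℂ≈≤≥≠≪≫±×·÷∈∉⊂⊆∪∩∀∃²³⁴ⁿᵢ₁₂→←↔⇒⇔∂∇∫∑∏√∞ℏ°")
FENCE = "`" * 3  # Markdown code-fence marker


def find_math_issues(report_text: str) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for report prose outside code fences."""
    issues = []
    in_fence = False
    for lineno, line in enumerate(report_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # "$$" alone on a line is the correct delimiter; "$$...$$" on one line is not.
        if len(stripped) > 4 and stripped.startswith("$$") and stripped.endswith("$$"):
            issues.append((lineno, "single-line display math"))
        found = sorted({ch for ch in line if ch in UNICODE_MATH})
        if found:
            issues.append((lineno, "unicode math: " + "".join(found)))
    return issues


def split_display_math(line: str) -> str:
    """Rewrite '$$x = y$$' as three lines with each $$ on its own line."""
    body = line.strip()[2:-2].strip()
    return f"$$\n{body}\n$$"
```
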
### Step 7 — Coverage & gap-detection loop
After the first complete draft, walk every section and answer this checklist before presenting the draft.
| Question | If the answer is bad |
|---|---|
| Does every quantitative claim point to a number, a table cell, a CSV column, or a figure? | Add the missing artifact, weaken the claim, or remove the claim |
| Does every figure get at least one paragraph that quotes specific numbers from it? | Either add the interpretation, or remove the figure |
| Does the methodology cite the canonical reference for any non-trivial method? | Use reference-search and weave the citation into the prose |
| Do comparisons report uncertainty (CI, std, error bars) or at least say why they don't? | Add error bars or an explicit caveat |
| Could a reader regenerate every figure from `plots/data/*.csv` + the script in `plots/`? | Restore the data file, point `source_script` at the right file, or remove the figure |
| Are limitations honest and specific (not "may not generalize")? | Rewrite with the specific failure modes you actually observed |
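
Part of the regeneration question can be checked mechanically. This sketch assumes manifest entries carry `plot_id`, `source_script`, and `source_context` as paths relative to the report root (field names from the metadata contract in this document; the exact manifest layout is an assumption). It flags missing fields, paths that escape the root, and files that do not exist.

```python
from pathlib import Path, PurePosixPath


def reproducibility_gaps(plots: list[dict], report_root: Path) -> list[str]:
    """List plots whose script or data file is missing, absolute, or outside the root."""
    gaps = []
    for plot in plots:
        for field in ("source_script", "source_context"):
            rel = plot.get(field)
            if not rel:
                gaps.append(f"{plot['plot_id']}: no {field}")
                continue
            pure = PurePosixPath(rel)
            if pure.is_absolute() or ".." in pure.parts:
                gaps.append(f"{plot['plot_id']}: {field} is not relative to the report root")
            elif not (report_root / rel).is_file():
                gaps.append(f"{plot['plot_id']}: {field} not found ({rel})")
    return gaps
```
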
Gap-fill plot budget (per draft pass):
| Draft depth | Max new plots | Max iterations |
|---|---|---|
| `min` | 2 | 1 |
| `default` | 4 | 2 |
| `high` | 8 | 3 |
If a gap requires data the workspace does not have, do not fabricate. Add an explicit caveat instead.
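
Read as code, the budget table is a simple guard. This sketch assumes the plot limit counts all new plots across the whole draft pass and that iterations are numbered from 1; adjust it if the workflow counts them differently.

```python
# depth -> (max new plots per draft pass, max gap-fill iterations)
PLOT_BUDGET = {"min": (2, 1), "default": (4, 2), "high": (8, 3)}


def may_add_plot(depth: str, plots_added: int, iteration: int) -> bool:
    """True if one more gap-fill plot still fits the budget for this draft pass."""
    max_plots, max_iterations = PLOT_BUDGET[depth]
    return iteration <= max_iterations and plots_added < max_plots
```
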
### Step 8 — Dual-subagent traceability review
Spawn two Claude subagents simultaneously (single message, two Agent tool uses, `subagent_type: general-purpose`). They must run independently — neither sees the other's output. This is the harness-only equivalent of magi's BALTHASAR + CASPER pair.
Subagent A — Scientific Rigor (Creative-Divergent framing):
Prompt template: "Use the Read tool to read `{output_dir}/report.md` and `{output_dir}/plots/plot_manifest.json`. Review for claim–evidence integrity and identify, per issue, the section, the problematic text or figure, and a concrete fix.
- Orphaned claims — text assertions without a supporting figure, table, metric, or citation.
- Orphaned plots — figures embedded but never discussed.
- Weak links — a claim references a figure that does not actually support the claim.
- Caption quality — captions must be precise, quantitative, and publication-ready (state what the figure shows in numbers, not just `Comparison of methods`).
- Math rendering — flag any Unicode math symbol or single-line `$$..$$` you encounter.

Return structured text. Do not save to a file."
Subagent B — Visualization Quality (Analytical-Convergent framing):
Prompt template: "Use the Read tool to read `{output_dir}/report.md`, `{output_dir}/plots/plot_manifest.json`, and any `{output_dir}/plots/*.py` scripts. Review for visualization correctness and identify, per issue, the section, the figure, and a concrete fix.
- Missing visualizations — quantitative claims that would benefit from a chart but have none.
- Plot–narrative mismatch — caption or surrounding text does not match what the plot shows.
- Chart-type fixes — better encodings (bar→box, linear→log, grouped bars→error-bar dot plot) for clarity.
- Reproducibility gaps — plots without a `source_script` or `source_context` in the manifest.
- Style compliance — every script must import `_plot_style` (or scienceplots) and call `apply_style()`; no per-script `font.*` overrides; PNG at 300 dpi + PDF; Nature widths; Okabe-Ito palette.

Return structured text. Do not save to a file."
After both reviews return, synthesize:
- Consensus issues (flagged by both subagents) → fix first.
- Divergent suggestions → evaluate on merit; apply when defensible.
- Apply revisions:
  - orphaned claim → add a supporting plot/table/citation OR weaken the claim,
  - orphaned plot → add an interpretation paragraph OR remove the figure,
  - weak link → strengthen the connecting prose or replace with a more apt figure,
  - chart-type fix → regenerate via `assets/plot_templates/*.py`,
  - reproducibility gap → restore the missing CSV / source script.
- Re-run `validate_artifacts.py` and `validate_plot_scripts.py` after all edits.
Anti-consensus discipline. When both subagents agree on an issue, the fix MUST cite at least one independent piece of evidence — a concrete number, a specific section line, a named pitfall — not just "both reviewers agreed".
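
The synthesis step is essentially set arithmetic over the two reviews. A hypothetical sketch, assuming each finding has already been normalized into a `(section, kind, target)` record so identical issues from both subagents compare equal; real reviewer output is free text and needs that normalization first.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Issue:
    section: str  # e.g. "§5.1 Primary Results"
    kind: str     # e.g. "orphaned_claim", "orphaned_plot", "weak_link"
    target: str   # quoted report text or figure id


def synthesize(review_a: list[Issue], review_b: list[Issue]) -> tuple[list[Issue], list[Issue]]:
    """Split findings into consensus (both reviewers) and divergent (exactly one)."""
    a, b = set(review_a), set(review_b)
    return sorted(a & b), sorted(a ^ b)


def consensus_fixes_without_evidence(consensus: list[Issue],
                                     evidence: dict[Issue, str]) -> list[Issue]:
    """Anti-consensus check: every consensus fix must cite some evidence string.

    This only checks that evidence exists; whether it is genuinely independent
    (a number, a section line, a named pitfall) still needs judgment.
    """
    return [issue for issue in consensus if not evidence.get(issue, "").strip()]
```
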
### Step 9 — Versioning & finalization
- If `report_versions.json` already exists, archive the current report before overwriting:
  - read `current_version`
  - copy `report.md` to `report_v{current_version}.md`
- After writing the new `report.md`, record the new version:

  ```bash
  python skills/research-report/scripts/record_report_version.py \
    "{output_dir}" \
    --summary "Summarize the update" \
    --tier 1
  ```

  `--tier` corresponds to the feedback-loop tiers below:
  - `1`: wording, structure, captions, formatting
  - `2`: plot/figure changes, scale/encoding swaps, manifest updates
  - `3`: substantive methodology or result changes (re-run experiments, change baselines)
- Add structured change records with repeated `--change '{...json...}'` arguments when useful.
- Re-run the JSON-artifact validator and the plot-script auditor.
- Report file locations, plot count, validation findings, thin sections, missing figure references, sections that still need stronger literature support, and any known caveats.
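
The JSON layout of `report_versions.json` is not specified beyond `current_version`, so the `versions` list and its fields below are hypothetical, and this is not the real `record_report_version.py`. The sketch mirrors the order of operations above: archive before overwriting `report.md`, then record the new version after writing it.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def _load_versions(output_dir: Path) -> Optional[dict]:
    path = output_dir / "report_versions.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else None


def archive_current_report(output_dir: Path) -> Optional[Path]:
    """Copy report.md to report_v{current_version}.md before it is overwritten."""
    data = _load_versions(output_dir)
    if data is None:
        return None  # first draft: nothing to archive yet
    archive = output_dir / f"report_v{data['current_version']}.md"
    if not archive.exists():  # archives are immutable once written
        shutil.copyfile(output_dir / "report.md", archive)
    return archive


def record_version(output_dir: Path, summary: str, tier: int) -> int:
    """Bump current_version and append a change record after report.md is written."""
    if tier not in (1, 2, 3):
        raise ValueError("tier must be 1, 2, or 3")
    data = _load_versions(output_dir) or {"current_version": 0, "versions": []}
    data["current_version"] += 1
    data.setdefault("versions", []).append({
        "version": data["current_version"],
        "tier": tier,
        "summary": summary,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    path = output_dir / "report_versions.json"
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data["current_version"]
```
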
## User-feedback loop (Tier 1 / 2 / 3)
When the user reviews report.md and asks for changes, classify the request before applying it. The classification keywords mirror magi's tiered feedback loop.
| Tier | Signals | Action |
|---|---|---|
| 1 — Cosmetic | "reword", "rephrase", "move section", "fix typo", "shorten", "expand on", "rename", "reformat", "caption" | Edit report.md directly. Archive previous version, bump tier=1. |
| 2 — Visualization | "add plot", "change chart", "log scale", "bar chart instead", "overlay", "heatmap", "color", "axis", "resize figure", "add error bars" | Generate or modify plot via templates, rebuild manifest, re-run dual-subagent review on the affected sections only, archive previous version, bump tier=2. |
| 3 — Substantive | "rerun", "different method", "add experiment", "change algorithm", "new baseline", "fix the code", "wrong results" | This skill cannot resolve substantive changes alone. Tell the user which experiment / source / test must be re-executed and pause. Do not bump the version. |
If the request is mixed-tier, decompose: apply Tier 1 / Tier 2 first, then escalate Tier 3.
If a request does not clearly match any tier, ask the user to confirm the classification before acting.
Maximum 3 feedback iterations per entry into the loop without explicit user re-approval.
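
Keyword matching can give a first-pass classification before the judgment call. A naive sketch using the signal phrases from the table; it relies on plain substring matching, so a word like "colorblind" would also trigger the Tier 2 signal "color".

```python
TIER_SIGNALS = {
    1: ["reword", "rephrase", "move section", "fix typo", "shorten", "expand on",
        "rename", "reformat", "caption"],
    2: ["add plot", "change chart", "log scale", "bar chart instead", "overlay",
        "heatmap", "color", "axis", "resize figure", "add error bars"],
    3: ["rerun", "different method", "add experiment", "change algorithm",
        "new baseline", "fix the code", "wrong results"],
}


def classify_request(request: str) -> list[int]:
    """Return matched tiers in handling order (Tier 1 and 2 before Tier 3).

    An empty list means no tier matched: ask the user to confirm before acting.
    """
    text = request.lower()
    return sorted(tier for tier, signals in TIER_SIGNALS.items()
                  if any(signal in text for signal in signals))
```

A result such as `[1, 2]` means decompose and handle both tiers in order; a result containing `3` means pause and name the experiment to re-run.
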
## Anti-patterns (flag and fix on sight)
- Ending the report with a "List of Figures" or "Figure Inventory" table that re-lists already-embedded plots. Every figure should be embedded inline beside its interpretation.
- "As shown in the figure below" / "see Figure X" / "the plot illustrates" without a quantitative observation in the same paragraph. Replace with concrete numbers.
- Captions that name only the chart type (`Bar chart of metrics`). Captions must state the takeaway in numbers (`AdamW lowers val NLL from 0.94 to 0.71 at 200 steps; SGD plateaus at 1.02`).
- Display equations on a single line (`$$x = y$$` inline). Always put `$$` on its own line with the equation between.
- Unicode math characters in the report body (`σ`, `²`, `≈`, `→`, `ℝ`, `∈`, `±`, `°`, ...). Use the LaTeX equivalents from the table above.
- Hand-rolled `plt.style.use(['science', 'nature'])` in plot scripts that bypass `apply_style()`. The hand-rolled call leaves the Times-vs-DejaVu fallback unhandled (pitfall #21).
- "Prior work shows that X" without a citation. Either cite or remove the claim.
- Section bodies that are only bullet points with no narrative. Each section needs at least one paragraph of prose so a reader can follow the argument without reverse-engineering bullets.
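
Two of these anti-patterns, plus the empty-alt-text pitfall (#19), are easy to lint for. A heuristic sketch that only recognizes the phrasings quoted above, so it will miss paraphrases:

```python
import re

PASSIVE_REF = re.compile(
    r"as shown in the figure below|see figure\s*\w+|the plot illustrates", re.IGNORECASE
)
FIGURE_NUMBER = re.compile(r"figure\s*\d+", re.IGNORECASE)
EMPTY_ALT = re.compile(r"!\[\s*\]\(")


def passive_figure_paragraphs(report_text: str) -> list[str]:
    """Paragraphs with a passive figure reference but no number besides the figure's own."""
    flagged = []
    for paragraph in re.split(r"\n\s*\n", report_text):
        if PASSIVE_REF.search(paragraph):
            # Drop "Figure 3" itself so its number does not count as a data point.
            remainder = FIGURE_NUMBER.sub("", paragraph)
            if not re.search(r"\d", remainder):
                flagged.append(paragraph.strip())
    return flagged


def count_empty_alt_text(report_text: str) -> int:
    """Count Markdown images written as ![](...) with no alt text."""
    return len(EMPTY_ALT.findall(report_text))
```
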
## Plot metadata file contract
Use a JSON object keyed by plot stem or plot id when default inference is not enough.

```json
{
  "training_curves": {
    "description": "Training and validation metrics across optimization steps",
    "section_hint": "results",
    "caption": "Validation loss stabilizes after the early rapid descent phase while the baseline remains consistently higher.",
    "source_context": "metrics/train_history.csv",
    "source_script": "plots/training_curves.py",
    "source_function": "main",
    "style": ["science", "nature"],
    "palette": "okabe_ito",
    "language": "en",
    "dpi": 300
  }
}
```
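
A hypothetical pre-flight check of a metadata file against the conventions in this skill: known `section_hint` values, English plot text, at least 300 dpi, and a recorded data source. `build_plot_manifest.py` may validate differently or not at all, so treat this as a sketch.

```python
import json

KNOWN_HINTS = {"methodology", "results", "comparison", "validation", "testing"}


def check_plot_metadata(metadata: dict) -> list[str]:
    """Flag metadata entries that conflict with the conventions in this skill."""
    problems = []
    for stem, entry in metadata.items():
        hint = entry.get("section_hint")
        if hint is not None and hint not in KNOWN_HINTS:
            problems.append(f"{stem}: unknown section_hint {hint!r}")
        if entry.get("language", "en") != "en":
            problems.append(f"{stem}: plot text must be English")
        if entry.get("dpi", 300) < 300:
            problems.append(f"{stem}: dpi below 300")
        if not entry.get("source_context"):
            problems.append(f"{stem}: missing source_context (raw data path)")
    return problems


def load_and_check(path: str) -> list[str]:
    """Read a plot metadata JSON file and return any problems found."""
    with open(path, encoding="utf-8") as fh:
        return check_plot_metadata(json.load(fh))
```
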
## Template resources
- `references/report_template.md`: baseline report structure aligned with the report workflow, with embedded `<!-- EVIDENCE BLOCK -->` and `<!-- WORD_BUDGET -->` hints.
- `references/domains/<domain>.md`: tone, methodology, and visualization conventions per domain (`ai_ml`, `physics`, `statistics`, `mathematics`, `general`).
- `reference-search`: companion skill for background references, method citations, baseline references, benchmark context, and claim-support searches while drafting the report.
- `assets/plot_templates/_plot_style.py`: shared styling helper (color palette, figure sizes, rcParams, `apply_style`, `save_figure`, `assert_english`, `latex_escape`). Copy it into the project's `plots/` directory next to any template you adapt.
- `assets/plot_templates/training_curves_template.py`: line-plot template for learning curves or time-series diagnostics.
- `assets/plot_templates/comparison_bars_template.py`: grouped comparison template for model, dataset, or ablation summaries.
- `assets/plot_templates/multi_panel_ablation_template.py`: faceted multi-panel template for ablation or per-regime comparisons.
- `assets/plot_templates/plot_metadata_template.json`: starter metadata payload for `build_plot_manifest.py --metadata`.
## Quality bar
- Prefer concrete quantitative statements over vague summaries.
- Keep every file path relative to the report root.
- Never fabricate data for missing plots.
- Preserve reproducibility hints: source script, source context, and generation timestamp.
- Keep `plot_id` stable once published.
- Plot text is English-only; localized commentary belongs in the report caption, not the figure.
- Every plot script goes through `apply_style()` and `save_figure()` from `_plot_style.py`; do not hand-roll style/save logic in new scripts.
- Every math expression in `report.md` is LaTeX. Unicode math is a validator error.
- Every figure is embedded inline next to its interpretation and accompanied by at least one quantitative observation.
## When sections are missing
Adapt rather than apologizing:
- no brainstorming artifacts → rename to `Analysis Summary` or `Problem Framing`
- no tests → use `Validation & Limitations`
- no implementation code → use `Experimental Setup` or `Materials`
## Outputs
- `report.md`
- `report_versions.json`
- `plots/plot_manifest.json`
- optional archived reports: `report_v{N}.md`
- optional section-level reference support notes, only when the user requests saved citation outputs or the project already uses a references directory
After creating or updating this skill, suggest starting a new session so the new skill is discoverable from session start.