paper-banana - SKILL.md Agent Skill

name: paper-banana
description: >
  This skill should be used when the user asks to "generate a scientific diagram", "create an academic illustration", "make a figure for my paper", "generate a plot for my paper", "create a publication-ready diagram", "illustrate my methodology", "visualize my paper's architecture", "draw a pipeline diagram", "visualize my method", or "create a chart for my research". Also triggers when the user mentions "PaperBanana", "paper figure", "NeurIPS diagram", "conference figure", or "academic figure".

PaperBanana: Academic Illustration Generator

Generate publication-ready scientific diagrams and statistical plots from paper methodology sections and figure captions using a multi-agent pipeline.

Overview

PaperBanana adapts the PaperVizAgent framework into a Claude-orchestrated pipeline. Claude handles the text reasoning roles (planning, styling, critiquing) while a Python script calls the Gemini API for native image generation (Nano Banana).

The pipeline follows five stages: Retrieve (optional) -> Plan -> Style -> Visualize -> Critique (iterative).

Prerequisites

Ensure the following before starting:

Python packages: pip install google-genai Pillow
API key: Set the GOOGLE_API_KEY environment variable. A free key is available at https://aistudio.google.com/apikey (no credit card needed, 500 images/day).
Dataset (optional): Run python ${CLAUDE_PLUGIN_ROOT}/scripts/paper_banana.py setup to download PaperBananaBench for reference-driven generation
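
A quick preflight check can catch a missing key or package before the pipeline starts. The sketch below is illustrative only: `missing_prerequisites` is a hypothetical helper, not part of paper_banana.py. It relies on the fact that the google-genai package imports as `google.genai` and Pillow imports as `PIL`.

```python
import importlib.util
import os


def missing_prerequisites(env=None, modules=("google.genai", "PIL")):
    """Return the names of missing prerequisites (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = []
    # The API key must be present and non-empty.
    if not env.get("GOOGLE_API_KEY"):
        missing.append("GOOGLE_API_KEY")
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling `missing_prerequisites()` with no arguments checks the real environment; an empty list means the required items are in place.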

Core Workflow

Step 1: Gather Input

Collect two pieces of information from the user:

Methodology text: The paper's method section describing the approach
Figure caption: What the figure should depict (e.g., "Figure 1: Overview of our framework")

Also determine:

Task type: diagram (architectural/framework figures) or plot (statistical charts)
Aspect ratio: 1:1, 3:2, 16:9, or 21:9 (default: 16:9 for diagrams, 1:1 for plots)
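
The inputs and defaults above can be captured in a small record type. This is only a sketch: `FigureRequest` and its field names are hypothetical and are not defined by paper_banana.py. The allowed ratios and per-task defaults are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_RATIOS = ("1:1", "3:2", "16:9", "21:9")
DEFAULT_RATIO = {"diagram": "16:9", "plot": "1:1"}


@dataclass
class FigureRequest:
    """Inputs gathered in Step 1 (hypothetical container)."""

    methodology: str
    caption: str
    task: str = "diagram"
    aspect_ratio: Optional[str] = None

    def __post_init__(self):
        if self.task not in DEFAULT_RATIO:
            raise ValueError(f"task must be 'diagram' or 'plot', got {self.task!r}")
        # Fall back to the per-task default when no ratio was given.
        if self.aspect_ratio is None:
            self.aspect_ratio = DEFAULT_RATIO[self.task]
        if self.aspect_ratio not in ALLOWED_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
```

Validating at construction time means a typo such as `16x9` is caught before any API call is made.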

Step 2: Retrieve References (Optional)

If the PaperBananaBench dataset is available, retrieve relevant reference examples:

python ${CLAUDE_PLUGIN_ROOT}/scripts/paper_banana.py retrieve \
  --task diagram \
  --content "methodology text here" \
  --intent "Figure 1: caption here" \
  --data-dir ./data \
  --output refs.json

Read the returned reference images with the Read tool and use them as in-context examples for the planning step.
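
When the retrieve command is launched from Python rather than typed into a shell, building it as an argument list avoids quoting problems with long methodology text. A minimal sketch follows, using only the flags shown in the command above; `build_retrieve_command` is a hypothetical helper, and falling back to the current directory when `CLAUDE_PLUGIN_ROOT` is unset is an assumption.

```python
import os


def build_retrieve_command(task, content, intent,
                           data_dir="./data", output="refs.json",
                           plugin_root=None):
    """Build the argv list for the retrieve subcommand (hypothetical helper)."""
    # Assumption: default to "." if CLAUDE_PLUGIN_ROOT is not set.
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = f"{root}/scripts/paper_banana.py"
    return [
        "python", script, "retrieve",
        "--task", task,
        "--content", content,
        "--intent", intent,
        "--data-dir", data_dir,
        "--output", output,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because no shell is involved, quotes and newlines inside the methodology text need no escaping.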

Step 3: Plan (Claude as Planner Agent)

Generate a detailed figure description. This is the most critical step.

For diagrams, produce a description that covers:

Every visual element and its connections
Background style (typically pure white or very light pastel)
Colors with specific hex codes, line thickness, icon styles
Layout direction (typically left-to-right flow)
Typography choices (sans-serif for labels, serif for math variables)
Do NOT include figure titles/captions in the description

Use this system context for planning:

Given the methodology section and figure caption, produce a detailed description of an illustrative diagram. The description must be extremely detailed: semantically describe each element and their connections; formally specify background style, colors, line thickness, icon styles. Vague specifications produce worse figures.
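
Some of the diagram requirements above can be checked mechanically before moving on to styling. The sketch below is a rough heuristic, not part of the skill: `lint_diagram_description` is a hypothetical helper, and the regular expressions are assumptions about what a hex code and a caption line look like.

```python
import re

# Six-digit hex colors such as #4C72B0.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}\b")
# Lines that start like a caption, e.g. "Figure 1:" or "Fig. 2."
CAPTION = re.compile(r"^\s*(Figure|Fig\.)\s*\d+\s*[:.]",
                     re.IGNORECASE | re.MULTILINE)


def lint_diagram_description(text):
    """Return a list of likely problems in a planned diagram description."""
    problems = []
    if not HEX_COLOR.search(text):
        problems.append("no hex color codes")
    if CAPTION.search(text):
        problems.append("contains a figure caption")
    if "background" not in text.lower():
        problems.append("background style not specified")
    return problems
```

An empty list only means these three surface checks passed; it says nothing about whether the description is detailed enough.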

For plots, the description must include:

Precise mapping of variables to visual channels (x, y, hue)
Every raw data point's coordinates
Exact aesthetic parameters: specific HEX color codes, font sizes, line widths, marker dimensions, legend placement, grid styles
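
One way to make sure no data point is dropped is to keep the plot specification as structured data and render the description text from it. The sketch below assumes a hypothetical spec layout (`chart`, `x`, `y`, `hue`, `series`); the values in `EXAMPLE_SPEC` are placeholders, not real results.

```python
# Placeholder spec for illustration only; the numbers are not real data.
EXAMPLE_SPEC = {
    "chart": "line",
    "x": "training step",
    "y": "accuracy",
    "hue": "method",
    "series": [
        {
            "name": "baseline",
            "color": "#4C72B0",
            "line_width": 1.5,
            "points": [(1, 0.5), (2, 0.6)],
        },
    ],
}


def describe_plot(spec):
    """Render a plot spec (hypothetical layout) as description text."""
    lines = [
        f"Chart type: {spec['chart']}",
        f"x = {spec['x']}, y = {spec['y']}, hue = {spec['hue']}",
    ]
    for series in spec["series"]:
        # Spell out every raw coordinate so nothing is left implicit.
        points = ", ".join(f"({x}, {y})" for x, y in series["points"])
        lines.append(
            f"Series '{series['name']}' in {series['color']}, "
            f"line width {series['line_width']} pt: {points}"
        )
    return "\n".join(lines)
```

Font sizes, marker dimensions, legend placement, and grid style would be added to the spec in the same way.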

If reference examples were retrieved, use them as few-shot examples to guide the description style and level of detail.

Step 4: Style (Claude as Stylist Agent)

Refine the planned description using the NeurIPS style guidelines.

Read the appropriate style guide:

Diagrams: ${CLAUDE_PLUGIN_ROOT}/skills/paper-banana/references/neurips2025-diagram-style-guide.md
Plots: ${CLAUDE_PLUGIN_ROOT}/skills/paper-banana/references/neurips2025-plot-style-guide.md

Apply these styling rules:

Preserve semantic content: do not alter logic, structure, or data
Preserve existing high-quality aesthetics: only intervene when the description lacks detail or looks outdated
Respect domain diversity: agent papers use illustrative styles, CV papers use spatial styles, theory papers use minimalist styles
Enrich missing details: add specific colors, fonts, line styles from the guidelines
Handle icons carefully: snowflake = frozen/non-trainable, flame = trainable; verify intent before changing

Output only the refined description with no commentary.

Step 5: Visualize (Script)

Save the styled description to a temp file and generate the image:

For diagrams:

python ${CLAUDE_PLUGIN_ROOT}/scripts/paper_banana.py generate \
  --description-file /tmp/description.txt \
  --output figure.png \
  --aspect-ratio 16:9

For plots:

python ${CLAUDE_PLUGIN_ROOT}/scripts/paper_banana.py plot \
  --description-file /tmp/description.txt \
  --output plot.png

For diagrams, the script calls the Gemini API with response_modalities=["IMAGE"] for native image generation. For plots, it generates matplotlib code and executes it.
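
The save-then-generate step can also be driven from Python. The sketch below writes the description to a uniquely named temp file and picks the subcommand by task type. Both helpers are hypothetical, and `--aspect-ratio` is passed only for diagrams because the plot command above does not show that flag.

```python
import os
import tempfile


def write_description(text):
    """Write the styled description to a temp file and return its path."""
    fd, path = tempfile.mkstemp(prefix="description-", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def build_visualize_command(task, description_file, output,
                            aspect_ratio=None, plugin_root=None):
    """Build the argv list for the generate or plot subcommand."""
    # Assumption: default to "." if CLAUDE_PLUGIN_ROOT is not set.
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = f"{root}/scripts/paper_banana.py"
    if task == "diagram":
        cmd = ["python", script, "generate"]
    elif task == "plot":
        cmd = ["python", script, "plot"]
    else:
        raise ValueError(f"unknown task: {task!r}")
    cmd += ["--description-file", description_file, "--output", output]
    if task == "diagram" and aspect_ratio:
        cmd += ["--aspect-ratio", aspect_ratio]
    return cmd
```

Using `tempfile.mkstemp` instead of a fixed `/tmp/description.txt` keeps concurrent runs from overwriting each other's descriptions. The command list can then be run with `subprocess.run(cmd, check=True)`.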

Step 6: Critique (Claude as Critic Agent, Interactive)

After the image is generated, read it with the Read tool (Claude can view images) and perform a critique.

For diagrams, check:

Fidelity: Does the diagram accurately reflect the methodology? No hallucinated content.
Text QA: Check for typos, nonsensical text, unclear labels
Example validation: Verify molecular formulas, math expressions, etc.
Caption exclusion: The figure caption must NOT appear inside the image
Clarity: Is the flow confusing or the layout cluttered?
Legend management: Remove redundant text-based legends

For plots, check:

Data fidelity: All quantitative values must be correct, no hallucinated data
Text QA: Check axis labels, legend entries, annotations
Value validation: Verify axis scales and data points against raw data
Overlap: Check for obscured labels and overlapping elements
Generation failures: If the plot failed to render, simplify the description

Present the critique to the user and ask whether to:

Accept the current image
Revise with suggested improvements (loop back to Step 5 with the revised description)
Regenerate from scratch

If revising, produce a JSON critique:

{
  "critic_suggestions": "specific issues found...",
  "revised_description": "the full revised description..."
}

The revised description should modify the original description in place rather than rewrite it from scratch. Run at most 3 critique rounds.
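
The bounded critique loop can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: `render` and `critique` are hypothetical callables standing in for Step 5 and for Claude's critique, and `critique` is assumed to return `None` when the user accepts the image or a JSON string in the format shown above when a revision is wanted.

```python
import json

MAX_ROUNDS = 3
REQUIRED_KEYS = ("critic_suggestions", "revised_description")


def parse_critique(raw):
    """Parse a critique JSON string and check that both keys are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("critique must be a JSON object")
    missing = [key for key in REQUIRED_KEYS
               if not isinstance(data.get(key), str) or not data[key].strip()]
    if missing:
        raise ValueError(f"critique is missing: {', '.join(missing)}")
    return data


def refine(description, render, critique, max_rounds=MAX_ROUNDS):
    """Render, critique, and revise for at most max_rounds critique rounds."""
    image = render(description)
    for _ in range(max_rounds):
        raw = critique(image, description)
        if raw is None:  # accepted, stop early
            break
        description = parse_critique(raw)["revised_description"]
        image = render(description)
    return description, image
```

Validating the JSON before using it keeps a malformed critique from silently replacing a good description with an empty one.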

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Yes | Free Gemini API key (https://aistudio.google.com/apikey) |
| `GEMINI_IMAGE_MODEL` | No | Image model (default: `nano-banana-pro-preview`) |
| `GEMINI_TEXT_MODEL` | No | Text model for plot code (default: `nano-banana-pro-preview`) |
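
A sketch of how these variables might be resolved with their documented defaults. `load_settings` is a hypothetical helper; how paper_banana.py actually reads them is not shown in this document.

```python
import os

# Default taken from the table above.
DEFAULT_MODEL = "nano-banana-pro-preview"


def load_settings(env=None):
    """Resolve the required key and optional model overrides."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is required but not set")
    return {
        "api_key": api_key,
        "image_model": env.get("GEMINI_IMAGE_MODEL", DEFAULT_MODEL),
        "text_model": env.get("GEMINI_TEXT_MODEL", DEFAULT_MODEL),
    }
```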

Key Tips

More detail = better output. Vague descriptions produce poor figures. Always specify exact colors, positions, and relationships.
Diagrams vs plots: Diagrams use Gemini's native image generation; plots generate matplotlib code and execute it.
Aspect ratios: Use 16:9 for wide pipeline/framework diagrams, 1:1 for square module diagrams, 3:2 for balanced layouts.
Domain awareness: Match the visual style to the paper's domain (see style guides for agent, CV, and theory paper conventions).
Timeout: The default timeout is 300s. For complex diagrams, increase with --timeout 600.

Additional Resources

Reference Files

For detailed styling guidelines, consult:

${CLAUDE_PLUGIN_ROOT}/skills/paper-banana/references/neurips2025-diagram-style-guide.md: Color palettes, shapes, arrows, typography, domain-specific styles for diagrams
${CLAUDE_PLUGIN_ROOT}/skills/paper-banana/references/neurips2025-plot-style-guide.md: Color palettes, axes, typography, chart-type-specific guidelines for plots

Script Reference

The main script at ${CLAUDE_PLUGIN_ROOT}/scripts/paper_banana.py supports:

generate: Render a diagram via Gemini image generation
plot: Generate and execute matplotlib code for plots
retrieve: Search PaperBananaBench for relevant reference examples
setup: Download the PaperBananaBench dataset