synoptic - SKILL.md Agent Skill

name: synoptic
description: >
  Analyze PDF documents and visualize their structure, citations, relationships,
  and summaries. Use this skill whenever the user wants to understand a PDF's
  structure, see how sections relate to each other, visualize citation networks,
  get section-by-section summaries, or see information density across pages.
  Trigger on phrases like "analyze this PDF", "show me the structure", "visualize
  this document", "summarize this paper", "how is this document organized", "show
  citation map", "what are the key sections", or when a user uploads/references a
  PDF and wants to understand its content at a structural level.

Synoptic: PDF Document Analysis and Visualization

Complete Workflow

Step 1: PDF Text Extraction

Extract text from the input PDF using pdftotext (from poppler-utils) as the primary method:

pdftotext -layout <input.pdf> extracted.txt

If pdftotext is not installed or fails, fall back to the Python extraction script:

python scripts/extract_pdf.py <input.pdf> extracted.txt
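The primary/fallback logic can be sketched in Python as below. This is a minimal sketch, not the skill's own code: the function name `extract_text` and its injectable `which`/`run` parameters (added only so the logic can be exercised without poppler installed) are hypothetical. Only the two commands themselves come from this step.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional


def extract_text(
    pdf: Path,
    out: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Write the PDF's text to `out` and return the name of the extractor used."""
    # Primary method: pdftotext (poppler-utils), tried only if it is on PATH.
    if which("pdftotext") is not None:
        result = run(["pdftotext", "-layout", str(pdf), str(out)])
        if result.returncode == 0:
            return "pdftotext"
    # Fallback: the bundled script; check=True raises if it fails as well.
    run([sys.executable, "scripts/extract_pdf.py", str(pdf), str(out)], check=True)
    return "extract_pdf.py"
```

Calling `extract_text(Path("input.pdf"), Path("extracted.txt"))` uses the real `shutil.which` and `subprocess.run`.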

Output: extracted.txt — page-delimited plain text of the entire PDF.
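pdftotext ends each page with a form-feed character (\f), so its output can be split back into pages as sketched below. This assumes scripts/extract_pdf.py uses the same delimiter, which is not stated here; `split_pages` is a hypothetical helper.

```python
def split_pages(text: str) -> list[str]:
    """Split form-feed-delimited text into pages (page 1 at index 0)."""
    pages = text.split("\f")
    # A form feed after the last page leaves an empty trailing piece; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages
```

Blank pages in the middle of the document are kept as empty strings, so list indices stay aligned with page numbers.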

Step 2: Local Statistics Calculation (No API Needed)

Run the page statistics script to compute per-page metrics locally:

python scripts/page_stats.py extracted.txt page_stats.json

Output: page_stats.json — per-page statistics including word count, character count, and information density metrics.
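The exact metrics in scripts/page_stats.py are not specified beyond word count, character count, and information density. The sketch below illustrates the idea under stated assumptions: it tokenizes on runs of word characters and uses the ratio of distinct words to total words as a stand-in density measure. The function names and the camelCase keys (chosen to match the analysis.json style) are hypothetical.

```python
import json
import re
from pathlib import Path

# Hypothetical tokenizer: a "word" is a run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")


def page_stats(pages: list[str]) -> list[dict]:
    """Compute illustrative per-page metrics (1-based page numbers)."""
    stats = []
    for number, page in enumerate(pages, start=1):
        words = WORD_RE.findall(page.lower())
        stats.append({
            "page": number,
            "wordCount": len(words),
            "charCount": len(page),
            # Assumed density proxy: distinct words / total words (type-token ratio).
            "uniqueWordRatio": round(len(set(words)) / len(words), 3) if words else 0.0,
        })
    return stats


def write_page_stats(txt_path: Path, json_path: Path) -> None:
    """Read form-feed-delimited text and write the per-page stats as JSON."""
    pages = txt_path.read_text(encoding="utf-8").split("\f")
    if pages and not pages[-1].strip():
        pages.pop()  # drop the empty piece after the final form feed
    json_path.write_text(json.dumps({"pages": page_stats(pages)}, indent=2), encoding="utf-8")
```

For this step, the call would be `write_page_stats(Path("extracted.txt"), Path("page_stats.json"))`.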

Step 3: AI Analysis (Claude Performs This Directly)

Read the contents of extracted.txt and analyze the document. Produce analysis.json conforming to the following schema:

{
  "title": "Document title",
  "structure": {
    "sections": [
      {
        "id": "sec-1",
        "title": "Section Title",
        "level": 1,
        "page": 1,
        "charCount": 1234,
        "summary": "Brief summary of the section content.",
        "children": [
          {
            "id": "sec-1-1",
            "title": "Subsection Title",
            "level": 2,
            "page": 2,
            "charCount": 567,
            "summary": "Brief summary of the subsection.",
            "children": []
          }
        ]
      }
    ]
  },
  "citations": {
    "references": [
      {
        "id": "ref-1",
        "label": "[1]",
        "title": "Referenced work title",
        "authors": "Author A, Author B",
        "year": 2023
      }
    ],
    "inTextCitations": [
      {
        "referenceId": "ref-1",
        "page": 3,
        "context": "Surrounding sentence where the citation appears."
      }
    ]
  },
  "relationships": {
    "edges": [
      {
        "from": "sec-1",
        "to": "sec-2",
        "type": "prerequisite",
        "label": "Section 1 introduces concepts used in Section 2"
      }
    ]
  },
  "summary": {
    "overall": "A 2-3 sentence overall summary of the document.",
    "keywords": ["keyword1", "keyword2", "keyword3"],
    "sections": [
      {
        "id": "sec-1",
        "summary": "Summary of this section."
      }
    ]
  }
}

Allowed values for relationships.edges[].type: prerequisite, supports, contradicts, extends, references.

Write the completed analysis to analysis.json.
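Before running the report generator, it can help to confirm that analysis.json is internally consistent. The validator below is a hypothetical helper, not one of the skill's scripts. It walks the nested section tree and flags duplicate section ids, edge types outside the five listed above, edges or section summaries that name no existing section, and in-text citations whose referenceId matches no reference.

```python
RELATIONSHIP_TYPES = {"prerequisite", "supports", "contradicts", "extends", "references"}


def iter_sections(sections):
    """Yield every section in the nested tree, parents before children."""
    for section in sections:
        yield section
        yield from iter_sections(section.get("children", []))


def validate_analysis(analysis: dict) -> list[str]:
    """Return a list of consistency problems; an empty list means all checks passed."""
    problems = []
    ids = [s["id"] for s in iter_sections(analysis["structure"]["sections"])]
    known = set(ids)
    if len(known) != len(ids):
        problems.append("duplicate section ids")

    for edge in analysis.get("relationships", {}).get("edges", []):
        if edge["type"] not in RELATIONSHIP_TYPES:
            problems.append(f"unknown relationship type: {edge['type']}")
        for end in ("from", "to"):
            if edge[end] not in known:
                problems.append(f"edge '{end}' names a missing section: {edge[end]}")

    citations = analysis.get("citations", {})
    ref_ids = {ref["id"] for ref in citations.get("references", [])}
    for cite in citations.get("inTextCitations", []):
        if cite["referenceId"] not in ref_ids:
            problems.append(f"citation names a missing reference: {cite['referenceId']}")

    for entry in analysis.get("summary", {}).get("sections", []):
        if entry["id"] not in known:
            problems.append(f"summary names a missing section: {entry['id']}")
    return problems
```

Load the file with `json.load` and pass the resulting dict in. Fix any reported problems before running Step 4.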

Step 4: HTML Report Generation

Generate the interactive HTML visualization report:

python scripts/generate_report.py page_stats.json analysis.json synoptic_report.html

Output: synoptic_report.html — a self-contained interactive report with section structure visualization, citation network graph, relationship diagram, summaries, and page-level statistics.
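How generate_report.py builds its self-contained page is not documented here. One common way to make a single HTML file carry its own data is to inline the JSON inside a <script type="application/json"> element that the page's JavaScript parses at load time. The sketch below shows that technique under that assumption; `embed_json` and `build_report` are hypothetical names. Escaping every "<" as \u003c keeps text such as "</script>" inside a section summary from ending the element early.

```python
import html
import json


def embed_json(data: dict, element_id: str) -> str:
    """Wrap `data` in an inert <script type="application/json"> element."""
    payload = json.dumps(data, ensure_ascii=False)
    # "<" can only occur inside JSON strings, where \u003c decodes back to "<";
    # escaping it stops "</script>" or "<!--" in the data from breaking the element.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/json" id="{html.escape(element_id)}">{payload}</script>'


def build_report(page_stats: dict, analysis: dict) -> str:
    """Assemble a minimal single-file page; a real report would add its charts here."""
    title = html.escape(analysis.get("title", "Synoptic report"))
    return "\n".join([
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{title}</title></head><body>',
        f"<h1>{title}</h1>",
        embed_json(page_stats, "page-stats"),
        embed_json(analysis, "analysis"),
        "<script>",
        "  // Client-side code reads the inlined data back with JSON.parse.",
        "  const analysis = JSON.parse(document.getElementById('analysis').textContent);",
        "  console.log(Object.keys(analysis));",
        "</script>",
        "</body></html>",
    ])
```

Writing the returned string with `Path("synoptic_report.html").write_text(..., encoding="utf-8")` yields a file that needs no server or external data files.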

Step 5: Open Report

Open the generated report in the user's default browser:

start synoptic_report.html      # Windows
open synoptic_report.html       # macOS
xdg-open synoptic_report.html   # Linux
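A platform-independent alternative to these three commands is Python's standard webbrowser module, sketched below. The helper name `open_report` and its `opener` parameter (injectable for testing) are hypothetical. `Path.as_uri()` requires an absolute path, hence the `resolve()` call.

```python
import webbrowser
from pathlib import Path
from typing import Callable


def open_report(path: str = "synoptic_report.html",
                opener: Callable[[str], bool] = webbrowser.open) -> str:
    """Open the report in the default browser and return the file:// URI used."""
    # Path.as_uri() raises ValueError on relative paths, so resolve to absolute first.
    uri = Path(path).resolve().as_uri()
    opener(uri)
    return uri
```

In a headless environment `webbrowser.open` may simply return False. In that case, tell the user where the report file is.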

PDF Size Handling

Adapt the analysis strategy based on document length:

Small (10 pages or fewer): Analyze the full extracted text in a single pass. All sections, citations, and relationships are identified from the complete text.
Medium (11-50 pages): Split the text into manageable segments, analyze each chunk independently, then merge the results into a unified analysis.json. Resolve cross-chunk references and relationships during the merge step.
Large (more than 50 pages): Combine two passes: a structure-only pass over the full text that captures the complete section hierarchy and inter-section relationships, and a chunk-by-chunk pass whose summarized results supply the detailed content (summaries, citations, keywords). This stays within context limits while preserving structural accuracy.
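The tier selection and the chunk-and-merge flow can be sketched as follows. Everything here is an assumption rather than the skill's prescribed method. A document of exactly 10 pages counts as small and one of exactly 50 pages counts as medium. Chunks are page-aligned, with a hypothetical default of 10 pages per chunk and a one-page overlap so text at a boundary appears in both neighboring chunks. `merge_sections` drops the duplicates that overlap produces by keying on (title, page). Section ids from different chunks would still need renumbering during the merge, and that step is not shown.

```python
def choose_strategy(page_count: int) -> str:
    """Map a page count to one of the three size tiers."""
    if page_count <= 10:
        return "single-pass"
    if page_count <= 50:
        return "chunked"
    return "summarized-chunks+structure-pass"


def chunk_pages(pages: list[str], pages_per_chunk: int = 10,
                overlap: int = 1) -> list[tuple[int, str]]:
    """Group pages into overlapping chunks as (first 1-based page number, text) pairs."""
    if pages_per_chunk <= overlap:
        raise ValueError("pages_per_chunk must exceed overlap")
    step = pages_per_chunk - overlap
    chunks = []
    for start in range(0, len(pages), step):
        # Re-join with form feeds so page boundaries survive inside each chunk.
        chunks.append((start + 1, "\f".join(pages[start:start + pages_per_chunk])))
        if start + pages_per_chunk >= len(pages):
            break  # this chunk already reaches the last page
    return chunks


def merge_sections(chunk_results: list[list[dict]]) -> list[dict]:
    """Concatenate per-chunk top-level sections, skipping repeats from overlap pages."""
    merged, seen = [], set()
    for sections in chunk_results:
        for section in sections:
            key = (section["title"], section["page"])
            if key not in seen:
                seen.add(key)
                merged.append(section)
    return merged
```

With 25 pages and the defaults, the chunks start at pages 1, 10, and 19.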