document-analysis

Read and analyze documents — PDF, DOCX, Markdown, HTML, CSV, XLSX, JSON, YAML. Provides read_document tool with no output truncation and page-range support for PDFs. Use when the user shares a document or asks to explain, summarize, or extract information from files.

By LiboMa. Updated 3/26/2026

---
name: document-analysis
description: "Read and analyze documents — PDF, DOCX, Markdown, HTML, CSV, XLSX, JSON, YAML. Provides read_document tool with no output truncation and page-range support for PDFs. Use when the user shares a document or asks to explain, summarize, or extract information from files."
metadata:
  author: agenticops
  version: "1.0"
  domain: operations
  tools:
    - agenticops.tools.file_tools.read_document
---

Document Analysis Skill

Overview

When this skill is activated, the read_document tool is dynamically registered on the agent:

| Tool | Purpose | Key Args |
|------|---------|----------|
| read_document | Read full document content (no truncation) | path, pages |

Unlike read_local_file, which truncates at 4K chars for operational safety, read_document returns the complete content so you can fully understand and explain the document. The exceptions are listed under Edge Cases.

Supported Formats

| Format | Library | Notes |
|--------|---------|-------|
| PDF | pymupdf or pypdf | Page-range support (pages="1-5") |
| DOCX | python-docx | Full paragraph extraction |
| Markdown | built-in | Full content, no truncation |
| HTML | built-in | Full content, no truncation |
| CSV | built-in | Full content, no truncation |
| JSON/YAML | built-in | Full content, no truncation |
| XLSX | openpyxl | Multi-sheet, all rows |
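
The table above can be read as a lookup from file extension to extraction library. The sketch below is only an illustration of that mapping: the names READERS and reader_for are hypothetical, the extension list is an assumption, and the skill does not show how read_document actually dispatches.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the library named in the table.
# The real dispatch logic inside read_document is not shown in this skill.
READERS = {
    ".pdf": "pymupdf or pypdf",
    ".docx": "python-docx",
    ".md": "built-in",
    ".html": "built-in",
    ".csv": "built-in",
    ".json": "built-in",
    ".yaml": "built-in",
    ".yml": "built-in",
    ".xlsx": "openpyxl",
}


def reader_for(path: str) -> str:
    """Return the library expected to handle `path`, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported format: {suffix or '(no extension)'}")
    return READERS[suffix]
```

A check like this can run before calling the tool, so an unsupported file is reported to the user up front rather than after a failed read.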

Quick Decision Trees

User Shares a Document

User provides a document (via @path or upload)
  |
  +-- Already injected as context?
  |     +-- Yes (attached via @path or web upload) → analyze directly
  |     +-- No (user mentions a file path) → read_document(path="...")
  |
  +-- Document too large?
  |     +-- PDF: use pages="1-5" to read in chunks
  |     +-- Other: summarize what you can, note truncation
  |
  +-- What does the user want?
        +-- "Explain this" → structured summary
        +-- "Summarize" → executive summary (key points + conclusions)
        +-- "Find X in this" → targeted extraction
        +-- "Compare with Y" → side-by-side analysis

Analyzing PDF Reports

PDF document received
  |
  +-- Large (>10 pages)?
  |     +-- Start with read_document(path, pages="1-3") for overview
  |     +-- Then read specific sections as needed
  |
  +-- What type of document?
  |     +-- Architecture/design doc → focus on components, data flow, decisions
  |     +-- Incident report → focus on timeline, root cause, remediation
  |     +-- Compliance/audit → focus on findings, risk level, recommendations
  |     +-- Cost report → focus on top spenders, trends, anomalies
  |     +-- Runbook/SOP → focus on steps, prerequisites, rollback
  |
  +-- Output format?
        +-- Brief: 3-5 bullet points
        +-- Detailed: section-by-section breakdown
        +-- Actionable: extract TODOs and next steps
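
Reading a large PDF in chunks comes down to generating successive values for the pages argument. A minimal sketch follows, assuming the total page count is already known (the skill does not say how to obtain it); page_ranges is a hypothetical helper, not part of the tool.

```python
from typing import Iterator


def page_ranges(total_pages: int, chunk_size: int = 5) -> Iterator[str]:
    """Yield 1-based page-range strings such as "1-5", "6-10", "11"."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be at least 1")
    first = 1
    while first <= total_pages:
        last = min(first + chunk_size - 1, total_pages)
        # A single trailing page is written as "11", matching pages="5" style.
        yield f"{first}-{last}" if last > first else str(first)
        first = last + 1


# Each string can be passed as the pages argument, for example:
#   read_document(path="/path/to/report.pdf", pages=page_range)
```

Starting with a small first chunk for the overview and then requesting only the ranges that matter keeps the amount of text in context manageable.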

Analyzing Spreadsheets

CSV or XLSX received
  |
  +-- Understand structure first
  |     +-- Column headers, row count, data types
  |
  +-- What does the user want?
        +-- "What's in this?" → schema + sample rows + summary stats
        +-- "Find anomalies" → look for outliers, missing data, spikes
        +-- "Trends" → time-series patterns if date column exists
        +-- "Top N" → sort/rank by a metric column
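
The "understand structure first" and "Top N" branches can be sketched with the standard csv module. This is an illustrative helper under stated assumptions: summarize_csv is a hypothetical name, the input is CSV text that has already been read, and the metric column is assumed to hold plain numbers.

```python
import csv
import io


def summarize_csv(text: str, metric: str, top_n: int = 3) -> dict:
    """Report headers, row count, missing cells per column, and top rows by `metric`."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    headers = list(reader.fieldnames or [])

    # Count empty or absent cells per column to surface missing data.
    missing = {h: sum(1 for r in rows if not (r[h] or "").strip()) for h in headers}

    def as_float(value):
        try:
            return float(value)
        except (TypeError, ValueError):
            return None  # non-numeric or missing cells are skipped in the ranking

    numeric = [(as_float(r.get(metric)), r) for r in rows]
    ranked = sorted(
        (pair for pair in numeric if pair[0] is not None),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return {
        "headers": headers,
        "row_count": len(rows),
        "missing": missing,
        "top": [row for _, row in ranked[:top_n]],
    }
```

The same summary gives a starting point for the "Find anomalies" branch, since columns with many missing cells and unexpectedly large top values are both visible in the result.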

Analysis Workflow

Step 1: Read the Document

read_document(path="/path/to/report.pdf")
# or with page range for large PDFs:
read_document(path="/path/to/report.pdf", pages="1-5")

Step 2: Identify Structure

  • Document type (report, spec, spreadsheet, log dump)
  • Key sections / headings
  • Tables, figures, or data present

Step 3: Analyze Based on User Intent

  • Explain: Walk through each section, clarify technical terms
  • Summarize: Extract key findings, conclusions, action items
  • Extract: Pull specific data points the user asked about
  • Compare: Side-by-side with another document or known state

Step 4: Present Findings

  • Lead with the answer, not the process
  • Use structured format (headers, bullets, tables)
  • Quote specific passages when relevant
  • Note any limitations (scanned PDF with no text, truncated content)

Tool Reference Quick Card

| Example | Description |
|---------|-------------|
| read_document(path="report.pdf") | Read entire PDF |
| read_document(path="report.pdf", pages="1-3") | Read pages 1-3 only |
| read_document(path="report.pdf", pages="5") | Read page 5 only |
| read_document(path="spec.docx") | Read Word document |
| read_document(path="data.csv") | Read CSV file |
| read_document(path="metrics.xlsx") | Read Excel workbook |
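
The examples show two forms of the pages argument: a range ("1-3") and a single page ("5"). The sketch below shows one way such a string could be expanded into page numbers. parse_pages is a hypothetical helper; how read_document parses the argument internally, and whether it accepts other forms, is not stated in this skill.

```python
def parse_pages(pages: str) -> list[int]:
    """Expand "1-3" or "5" into a list of 1-based page numbers."""
    first, sep, last = pages.strip().partition("-")
    start = int(first)
    end = int(last) if sep else start  # a single page is a range of length one
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {pages!r}")
    return list(range(start, end + 1))
```

Validating the string before the call makes it easier to give the user a clear message when a range such as "3-1" is requested.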

Edge Cases

  • Scanned PDFs (image-only): text extraction will return empty — inform the user that OCR is needed
  • Password-protected PDFs: will fail — ask user to provide an unprotected copy
  • Very large spreadsheets: output is truncated to 6000 chars — suggest filtering or specifying columns of interest
  • Mixed content PDFs (text + images): only text is extracted — note that charts/diagrams are not visible
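
Two of the edge cases above can be detected from the returned text alone. A minimal sketch, assuming the caller already has the extracted text and the file extension: extraction_notes is a hypothetical helper, and treating any spreadsheet output of 6000 characters or more as possibly truncated is an assumption based on the limit quoted above.

```python
SPREADSHEET_LIMIT = 6000  # chars, taken from the edge-case note above


def extraction_notes(text: str, suffix: str) -> list[str]:
    """Return caveats to pass on to the user for the given extraction result."""
    notes = []
    if suffix == ".pdf" and not text.strip():
        # Empty text from a PDF suggests an image-only (scanned) document.
        notes.append("No text was extracted; the PDF may be scanned and need OCR.")
    if suffix in {".csv", ".xlsx"} and len(text) >= SPREADSHEET_LIMIT:
        # Assumption: output at or above the limit may have been cut off.
        notes.append("Output may be truncated; ask which rows or columns matter.")
    return notes
```

Any notes returned can be appended to the findings as limitations, as described in Step 4.
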
Install via CLI
npx skills add https://github.com/LiboMa/agenticops-chat --skill document-analysis