name: pdf description: "Use this Canvas-owned clean-room skill for PDF work where layout, extraction, generation, forms, page operations, or visual verification matter. Trigger for reading PDFs beyond plain text, creating PDFs, merging, splitting, rotating, watermarking, extracting tables or images, filling forms, OCR handoff, and final PDF QA." license: "Canvas Notebook Sustainable Use License"
PDF Workflows
This skill covers PDF creation, inspection, extraction, page operations, forms, and visual QA in Canvas Notebook. It is a clean-room Canvas skill; do not copy vendor skill material, hidden prompts, or proprietary helper code into this folder.
Core Contract
- Treat rendered pages as the source of truth for layout-sensitive tasks.
- Use text extraction for content discovery, not as proof that a PDF looks correct.
- Keep source PDFs immutable unless the user explicitly asks to overwrite them.
- Return only the final requested artifact unless the user asks for extracted intermediates.
- Use deterministic scripts for repeated operations, but keep them project- or workspace-local unless they are intentionally added to this skill later.
Recommended Tools
pypdffor merge, split, rotate, metadata, encryption, simple form filling, and page-level operations.pdfplumberfor text, coordinates, tables, and layout-aware inspection.reportlabfor creating new PDFs.- Poppler tools such as
pdfinfo,pdftoppm, andpdftocairofor rendering and page diagnostics. - OCR tools only when available and required; if OCR dependencies are missing, explain the blocker instead of pretending scanned text was read.
Workflow
- Classify the task: read/review, generate, page operation, form filling, extraction, redaction, OCR, or repair.
- Make a working copy for destructive operations.
- Use the simplest reliable library for the operation. Avoid manipulating raw PDF bytes manually.
- Render updated pages to PNG and inspect alignment, margins, clipping, legibility, form values, signatures/watermarks, and page order.
- If a visual issue appears, repair the source operation and re-render.
Common Tasks
- Extract text: start with
pdfplumber; preserve page numbers and note extraction uncertainty for scanned or complex PDFs. - Extract tables: use
pdfplumbertable strategies and validate against rendered pages. - Create a PDF: use
reportlab; define margins, fonts, section hierarchy, tables, and pagination explicitly. - Merge or split: use
pypdf; verify page count and order withpdfinfoand rendered thumbnails. - Fill forms: inspect AcroForm fields first, write a copy, flatten only if requested or necessary, then render filled pages.
- Redact: do not cover text with a black rectangle alone. Remove or rewrite underlying content where possible, then verify extraction and rendering.
Final Checks
- Latest rendered pages match the requested content and order.
- Text is readable at normal zoom and no content is clipped.
- Page size, orientation, margins, headers, footers, and numbering are correct.
- Any extraction, OCR, or redaction limitations are stated clearly.