name: pdf
description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
license: Proprietary. LICENSE.txt has complete terms
PDF Toolkit
Use this skill for PDF extraction, generation, splitting/merging, annotation, form filling, visual conversion, and batch processing.
Choose Path
- Text extraction: try `pdftotext` or `pypdf`; use `pdfplumber` for layout and tables.
- Merge/split/rotate/encrypt: use `qpdf` or `pypdf`.
- Create PDFs: use `reportlab`; read `reference.md` for advanced layout.
- Fill forms: read `forms.md` first and use bundled scripts.
- Visual analysis: convert pages to images with `scripts/convert_pdf_to_images.py` or Poppler tools.
- Scanned PDFs: OCR with `pytesseract` after image conversion.
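
The first and last paths above can be combined into one decision: extract text, then check whether the result is thin enough to suggest a scanned document that needs OCR. The sketch below is illustrative only. The helper names `extract_text` and `looks_scanned` and the 20-characters-per-page threshold are assumptions, not part of this skill, and the threshold should be tuned per document set.

```python
import shutil
import subprocess


def extract_text(path: str) -> str:
    """Try the pdftotext CLI first, then fall back to pypdf."""
    if shutil.which("pdftotext"):
        # "-layout" preserves the physical layout; "-" writes to stdout.
        result = subprocess.run(
            ["pdftotext", "-layout", path, "-"],
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",
            check=False,
        )
        if result.returncode == 0:
            return result.stdout
    # Imported lazily so this module still loads when pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def looks_scanned(text: str, page_count: int, min_chars_per_page: int = 20) -> bool:
    """Heuristic (assumed threshold): very little text per page suggests a scan."""
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < min_chars_per_page * max(page_count, 1)
```

If `looks_scanned` returns true, the scanned-PDF path applies: convert the pages to images and run OCR on them.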
Bundled References
Open only when needed:
- `forms.md`: required for fillable forms and annotation workflow.
- `reference.md`: advanced `pypdfium2`, `pdf-lib`, layout, and troubleshooting.
Bundled Scripts
Treat the scripts as black boxes; run each with `--help` first:
- `check_fillable_fields.py`
- `extract_form_field_info.py`
- `fill_fillable_fields.py`
- `fill_pdf_form_with_annotations.py`
- `check_bounding_boxes.py`
- `create_validation_image.py`
- `convert_pdf_to_images.py`
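
Checking a script's usage before running it can be scripted. This is a minimal sketch, assuming the scripts live in a `scripts/` directory (as the `scripts/convert_pdf_to_images.py` path suggests) and are run with the current Python interpreter. The helper name `script_help` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def script_help(script_name: str, scripts_dir: str = "scripts") -> str:
    """Return the --help output of a bundled script without reading its source."""
    script = Path(scripts_dir) / script_name
    if not script.is_file():
        raise FileNotFoundError(f"No such script: {script}")
    result = subprocess.run(
        [sys.executable, str(script), "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some scripts print usage to stderr, so fall back to it when stdout is empty.
    return result.stdout or result.stderr
```

For example, `print(script_help("check_fillable_fields.py"))` would show that script's expected arguments.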
Core Patterns
Minimal text extraction:
    from pypdf import PdfReader

    reader = PdfReader("input.pdf")
    # `or ""` guards against pages that yield no text.
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
Minimal merge:
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    # Append every page of each input, in order.
    for path in ["a.pdf", "b.pdf"]:
        for page in PdfReader(path).pages:
            writer.add_page(page)
    # Write the combined document in binary mode.
    with open("merged.pdf", "wb") as f:
        writer.write(f)
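
Splitting is the counterpart of the merge pattern above and uses the same `PdfReader`/`PdfWriter` calls. The following is a sketch under assumptions: the helper names `chunk_ranges` and `split_pdf`, the fixed chunk size, and the output file naming are all illustrative choices, not part of this skill.

```python
from pathlib import Path


def chunk_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) page index ranges covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(path: str, chunk_size: int, out_dir: str = ".") -> list[Path]:
    """Write one PDF per chunk of pages and return the output paths."""
    # Imported lazily so chunk_ranges stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(path)
    outputs = []
    for start, end in chunk_ranges(len(reader.pages), chunk_size):
        writer = PdfWriter()
        for index in range(start, end):
            writer.add_page(reader.pages[index])
        # Name files by 1-based page range (hypothetical naming scheme).
        target = Path(out_dir) / f"{Path(path).stem}_{start + 1}-{end}.pdf"
        with open(target, "wb") as f:
            writer.write(f)
        outputs.append(target)
    return outputs
```

Keeping the range arithmetic in its own function makes the page boundaries easy to check before any file is written.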
Verification
Always verify produced PDFs. For structural operations, reopen with pypdf and
check page count/metadata. For forms or annotations, create validation images
and inspect visible placement.
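
For structural operations, the reopen-and-count check could look like the sketch below. The helper names `has_pdf_markers` and `verify_pdf` are hypothetical, and the byte-level pre-check assumes the file was written by a tool such as `pypdf` or `qpdf`, which place the `%PDF-` header at the very start of the file.

```python
from pathlib import Path


def has_pdf_markers(path: str) -> bool:
    """Cheap pre-check: PDF header at the start, %%EOF marker near the end."""
    data = Path(path).read_bytes()
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def verify_pdf(path: str, expected_pages: int) -> bool:
    """Reopen the output and confirm it has the expected number of pages."""
    if not has_pdf_markers(path):
        return False
    # Imported lazily so the pre-check works even without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return len(reader.pages) == expected_pages
```

After the merge pattern, for instance, `verify_pdf("merged.pdf", expected_pages)` would be called with the sum of the input page counts. This covers page count only; forms and annotations still need the validation images.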