---
name: pdf-processing
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---
PDF Processing
This skill provides utilities for working with PDF documents.
Quick Start
Use pdfplumber to extract text from PDFs:
```python
import pdfplumber

# Open the PDF and extract the text of the first page (pages are 0-indexed)
with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
print(text)
```
Available Operations
- Text Extraction: Extract text content from PDF pages
- Table Extraction: Extract tabular data from PDFs
- Form Filling: Fill PDF forms with provided data
- Document Merging: Combine multiple PDFs into one
Advanced Features
Form filling: See FORMS.md for the complete guide.
Utility scripts:
- Run scripts/analyze_form.py to extract form fields
- Run scripts/extract_text.py to extract text from a PDF
Best Practices
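- Always validate input files before processing: confirm they are readable, well-formed PDFs
- Handle password-protected PDFs gracefully: detect encryption and decrypt with a supplied password, or report the problem instead of crashing
- Check for scanned PDFs, which contain page images without a text layer and may require OCR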