pdf-processing

Extract text and tables from PDF files.

By pcingola. Updated 1/24/2026

name: pdf-processing
description: Extract text and tables from PDF files.
compatibility: Requires PyMuPDF library.

Use this skill when the task involves reading, extracting, or transforming content from PDF documents.

For standard extraction, run the bundled script:

python scripts/extract.py <file.pdf>

The script reads the PDF and outputs structured content with page numbers and detected tables.
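The contents of scripts/extract.py are not shown here, so the following is only a minimal sketch of what such an extraction could look like with PyMuPDF. The function names `build_report` and `read_pdf` and the output shape (a list of dicts with `page`, `text`, and `tables` keys) are assumptions for illustration, not the bundled script's actual interface.

```python
def build_report(pages):
    """Turn per-page (text, tables) pairs into a page-numbered structure.

    Hypothetical output shape: one dict per page, numbered from 1.
    """
    report = []
    for number, (text, tables) in enumerate(pages, start=1):
        report.append({"page": number, "text": text.strip(), "tables": tables})
    return report


def read_pdf(path):
    """Yield (text, tables) for each page of the PDF at `path` using PyMuPDF."""
    # Imported lazily so build_report can be used without PyMuPDF installed.
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        for page in doc:
            # find_tables() returns a TableFinder; extract() gives rows of cell strings.
            tables = [table.extract() for table in page.find_tables().tables]
            yield page.get_text(), tables
```

Under these assumptions, `build_report(read_pdf("file.pdf"))` would return a structure that can be serialized with `json.dumps` and returned with page numbers attached.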

Return extracted text with page numbers and any detected tables in a structured format.

Scanned PDFs (image-only pages without a text layer) and complex layouts may not extract cleanly; in those cases OCR processing may be required.
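One way to decide when OCR might be needed is to flag pages whose extracted text is nearly empty. The sketch below is a heuristic, not part of the skill: the function name `pages_needing_ocr` and the `min_chars` threshold of 20 are assumptions chosen for illustration.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is shorter than min_chars.

    Heuristic only: a page with almost no extractable text is likely a scanned image.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]
```

The input is the list of per-page text strings from a normal extraction pass; the flagged pages can then be routed to whatever OCR step is available.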

Install via CLI

npx skills add https://github.com/pcingola/book_agentic_patterns --skill pdf-processing

Repository Details

Stars: 1

Forks: 0

Branch: main

Path: SKILL.md