iyeque-pdf-reader


PDF reader skill using PyMuPDF (fitz) for text extraction and metadata retrieval. Supports encrypted PDFs and can limit extraction to the first N pages of large documents.

By wentorai. Updated 3/25/2026.

name: iyeque-pdf-reader
description: PDF reader skill using PyMuPDF (fitz) for text extraction and metadata retrieval. Supports encrypted PDFs and handles large documents efficiently.
tags: [pdf, reader, text-extraction, metadata, pymupdf]
version: 1.0.0
author: iyeque
source: https://clawhub.ai/iyeque/iyeque-pdf-reader
requirements:
  - PyMuPDF (fitz)
install: pip install pymupdf

PDF Reader (Iyeque)

PDF reader skill for text extraction and metadata retrieval using PyMuPDF.


Installation

pip install pymupdf

Tool API

The skill provides two commands:

1. extract — Extract Text

Extracts plain text from the specified PDF file.

Parameters:

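| Parameter   | Type    | Required | Description |
|-------------|---------|----------|-------------|
| file_path   | string  | Yes      | Path to the PDF file (positional) |
| --max_pages | integer | No       | Maximum number of pages to extract, starting from the first page; all pages if omitted |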

Usage:

# Extract all text
python3 skills/pdf-reader/reader.py extract /path/to/document.pdf

# Extract first 5 pages only
python3 skills/pdf-reader/reader.py extract /path/to/document.pdf --max_pages 5

Output: Plain text content from the PDF.
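The page limit behind --max_pages can be sketched in a few lines. This is a minimal, hypothetical illustration, not the skill's actual reader.py: the name `extract_text` is invented, and joining pages with a newline is an assumption. It accepts any iterable of objects with a `get_text()` method, so an open PyMuPDF document works.

```python
from itertools import islice


def extract_text(pages, max_pages=None):
    """Join the text of the first `max_pages` pages (all pages if None).

    Hypothetical helper: `pages` is any iterable of objects with a
    get_text() method, such as an open PyMuPDF Document.
    """
    if max_pages is not None and max_pages < 1:
        raise ValueError("max_pages must be a positive integer")
    # islice stops iterating after max_pages items, so later pages are never visited
    selected = pages if max_pages is None else islice(pages, max_pages)
    return "\n".join(page.get_text() for page in selected)
```

With PyMuPDF installed, pass an open document, for example `with fitz.open("/path/to/document.pdf") as doc: text = extract_text(doc, max_pages=5)`.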


2. metadata — Get Document Info

Retrieve metadata about the document.

Parameters:

| Parameter | Type   | Required | Description |
|-----------|--------|----------|-------------|
| file_path | string | Yes      | Path to the PDF file (positional) |

Usage:

python3 skills/pdf-reader/reader.py metadata /path/to/document.pdf

Output: Structured JSON with document metadata.
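The command-line surface described above (two subcommands, a positional file_path, and an optional integer --max_pages) maps directly onto argparse subparsers. The sketch below is a hypothetical reconstruction for illustration only; reader.py's real argument handling is not shown in this document and may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented reader.py commands."""
    parser = argparse.ArgumentParser(prog="reader.py",
                                     description="PDF text and metadata reader")
    sub = parser.add_subparsers(dest="command", required=True)

    # extract: plain text, optionally limited to the first N pages
    extract = sub.add_parser("extract", help="extract plain text")
    extract.add_argument("file_path", help="path to the PDF file")
    extract.add_argument("--max_pages", type=int, default=None,
                         help="maximum number of pages to extract (default: all)")

    # metadata: document info printed as JSON
    metadata = sub.add_parser("metadata", help="print document metadata as JSON")
    metadata.add_argument("file_path", help="path to the PDF file")
    return parser
```

A script would then dispatch on `args.command` to run either the extract or the metadata path.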


Metadata Fields

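| Field        | Description |
|--------------|-------------|
| title        | Document title |
| author       | Document author |
| subject      | Document subject |
| creator      | Software that created the original document |
| producer     | Software that produced the PDF |
| creationDate | Creation date, as a PDF date string (e.g. D:20240115120000Z) |
| modDate      | Modification date, in the same format |
| format       | PDF format version (e.g. PDF 1.7) |
| encryption   | Encryption information, if the document is encrypted |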
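In PyMuPDF, `doc.metadata` is a plain dict that also carries keys not listed above (such as `keywords` and `trapped`). The sample metadata output below leaves out `subject`, `modDate` and `encryption`, which suggests empty values are dropped. Assuming that, a hypothetical `metadata_to_json` helper could look like this; it is an illustration, not the skill's code.

```python
import json

# Fields documented in the table above, in the same order.
FIELDS = ["title", "author", "subject", "creator", "producer",
          "creationDate", "modDate", "format", "encryption"]


def metadata_to_json(metadata):
    """Keep documented fields with non-empty values and serialize them.

    Dropping empty values ("" or None) is an assumption inferred from the
    sample output, not confirmed behaviour of the skill.
    """
    kept = {key: metadata[key] for key in FIELDS if metadata.get(key)}
    return json.dumps(kept, indent=2)
```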

Example Output

Extract Output

Plain text content from the PDF...

Metadata Output

{
  "title": "Annual Report 2024",
  "author": "John Doe",
  "creationDate": "D:20240115120000Z",
  "creator": "Microsoft Word",
  "producer": "Adobe PDF Library",
  "format": "PDF 1.7"
}
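As the sample shows, creationDate is a raw PDF date string: D:YYYYMMDDHHmmSS followed by an optional UTC offset (Z, +HH'mm' or -HH'mm'), as defined in the PDF specification, where every component after the year may be omitted. A caller who needs a real timestamp could convert it with a small parser like this hypothetical `parse_pdf_date` sketch.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional offset (Z, +HH'mm' or -HH'mm').
# Every component after the year is optional in the PDF date format.
_PDF_DATE = re.compile(
    r"(?:D:)?(?P<Y>\d{4})(?P<m>\d{2})?(?P<d>\d{2})?"
    r"(?P<H>\d{2})?(?P<M>\d{2})?(?P<S>\d{2})?"
    r"(?:(?P<sign>[Z+\-])(?P<th>\d{2})?'?(?P<tm>\d{2})?'?)?"
)


def parse_pdf_date(value):
    """Convert a PDF date string such as 'D:20240115120000Z' to a datetime."""
    match = _PDF_DATE.match(value)
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = match.groupdict()
    # Missing month/day default to 1, missing time fields to 0
    dt = datetime(int(g["Y"]), int(g["m"] or 1), int(g["d"] or 1),
                  int(g["H"] or 0), int(g["M"] or 0), int(g["S"] or 0))
    if g["sign"] is None:
        return dt  # no offset given: return a naive datetime
    offset = timedelta(hours=int(g["th"] or 0), minutes=int(g["tm"] or 0))
    if g["sign"] == "-":
        offset = -offset
    return dt.replace(tzinfo=timezone(offset))
```

Empty strings, which PyMuPDF uses for missing dates, raise ValueError here, so callers can decide how to treat an absent date.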

Features

  • Fast extraction using PyMuPDF (fitz)
  • Metadata retrieval including creation/modification dates
  • Page limiting with --max_pages for large documents
  • Encrypted PDF support (password required if applicable)
  • Error handling for corrupted or malformed PDFs

When to Use This Skill

  • Extract text content from PDF files
  • Get document metadata (title, author, dates)
  • Process multiple PDFs for analysis
  • Quick PDF content preview
  • Academic paper text extraction
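For processing multiple PDFs, the metadata command can be driven from Python through subprocess. This is a sketch under stated assumptions: the command prefix is copied from the usage examples, the helper names `read_metadata` and `metadata_for_folder` are hypothetical, and it assumes reader.py prints only the JSON object to stdout and exits non-zero on failure. If the script instead prints an error message with exit status 0, `json.loads` will raise.

```python
import json
import subprocess
from pathlib import Path

# Command prefix taken from the usage examples; adjust to where the skill lives.
READER = ("python3", "skills/pdf-reader/reader.py")


def read_metadata(pdf_path, reader=READER):
    """Run the `metadata` command on one file and parse its JSON output."""
    # check=True raises CalledProcessError if the command exits non-zero
    result = subprocess.run([*reader, "metadata", str(pdf_path)],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def metadata_for_folder(folder, reader=READER):
    """Map each *.pdf file name in `folder` (sorted) to its metadata dict."""
    return {path.name: read_metadata(path, reader)
            for path in sorted(Path(folder).glob("*.pdf"))}
```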

Python API Alternative

You can also use PyMuPDF directly in Python:

import fitz  # PyMuPDF

# Open the PDF; the with-block closes the file when done
with fitz.open("document.pdf") as doc:
    # Get metadata (a dict with keys such as title, author, creationDate)
    print(doc.metadata)

    # Extract text from all pages
    for page in doc:
        text = page.get_text()
        print(text)

    # Extract from a specific page
    page = doc[0]  # First page (pages are 0-indexed)
    text = page.get_text()

Notes

  • Uses PyMuPDF (imported as fitz) for fast, reliable PDF processing
  • Detects encrypted PDFs and reports when a password is required
  • Keeps work on large PDFs bounded with the --max_pages option
  • Returns error message if file not found or invalid PDF

Error Handling

| Error           | Cause              | Solution |
|-----------------|--------------------|----------|
| File not found  | Invalid path       | Check the file path |
| Not a valid PDF | Corrupted file     | Verify file integrity |
| Encrypted PDF   | Password protected | Provide the password if required |
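The reader.py commands as documented take no password argument, so the following is only a sketch of how a caller using PyMuPDF directly might handle the missing-file and encrypted cases from the table. The helper names `check_pdf_path` and `unlock` are hypothetical; `Document.needs_pass` and `Document.authenticate()` are PyMuPDF APIs, and `authenticate()` returns 0 when the password is rejected. Corrupted files make `fitz.open()` raise, but the exact exception type has varied between PyMuPDF versions, so it is not caught here.

```python
import os


def check_pdf_path(path):
    """Fail early with a clear message before handing the path to PyMuPDF."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"File not found: {path}")


def unlock(doc, password=None):
    """Return an open PyMuPDF document that is ready to read.

    Raises ValueError if the document is encrypted and the password is
    missing or wrong.
    """
    if not doc.needs_pass:
        return doc
    if password is None:
        raise ValueError("Encrypted PDF: a password is required")
    # authenticate() returns 0 on failure and a positive code on success
    if not doc.authenticate(password):
        raise ValueError("Encrypted PDF: the password was rejected")
    return doc
```

Typical use: `check_pdf_path(path)`, then `with fitz.open(path) as doc: unlock(doc, password)` before extracting text.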
Install via CLI
npx skills add https://github.com/wentorai/Research-Claw --skill iyeque-pdf-reader