name: processing-invoices
description: Extracts and validates structured data from PDF invoices. Use when processing invoice PDFs, extracting billing information, or verifying invoice data.
Invoice Processing
Workflow
Copy this checklist and track progress:
Invoice Processing:
- [ ] Step 1: Log start time
- [ ] Step 2: Extract text from PDF
- [ ] Step 3: Parse invoice fields
- [ ] Step 4: Validate extracted data
- [ ] Step 5: Save to JSON AND eval log
Step 1: Log start time
Record the start time for eval tracking:
from datetime import datetime
start_time = datetime.now().isoformat()
Step 2: Extract text
from pypdf import PdfReader
reader = PdfReader("invoice.pdf")
text = ""
for page in reader.pages:
    page_text = page.extract_text()
    if page_text:
        text += page_text + "\n"
Step 3: Parse fields
Extract these from the text:
- vendor: Company name at top of invoice
- invoice_number: Look for "Invoice #", "INV-", "#"
- date: Convert whatever format appears on the invoice to YYYY-MM-DD
- total: Amount after "Total:", "Amount Due:"
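The field rules above could be sketched in Python roughly as follows. This is a minimal illustration, not a definitive parser: the helper names (`parse_fields`, `normalize_date`), the regular expressions, the assumption that the date follows a "Date:" label, and the list of accepted date formats are all hypothetical and will need tuning for real invoices. Note that a date such as 05/03/2024 is ambiguous; this sketch assumes month-first.

```python
import re
from datetime import datetime

# Hypothetical patterns, tried in order; real invoice layouts vary widely.
INVOICE_NUMBER_PATTERNS = [
    re.compile(r"Invoice\s*#\s*:?\s*(\S+)", re.IGNORECASE),
    re.compile(r"\b(INV-[A-Za-z0-9-]+)"),
    re.compile(r"#\s*([A-Za-z0-9-]+)"),
]
# \b keeps "Subtotal:" from matching as "Total:".
TOTAL_RE = re.compile(
    r"\b(?:Total|Amount Due)\s*:\s*\$?\s*([\d,]+(?:\.\d{1,2})?)", re.IGNORECASE
)
# Assumption: the invoice date sits on a line labelled "Date:".
DATE_LINE_RE = re.compile(r"\bDate\s*:\s*(.+)", re.IGNORECASE)
# Assumption: month-first for slash-separated dates.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%b %d, %Y", "%d %B %Y"]


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def parse_fields(text):
    """Pull vendor, invoice_number, date, and total out of extracted text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    vendor = lines[0] if lines else None  # company name at top of invoice

    invoice_number = None
    for pattern in INVOICE_NUMBER_PATTERNS:
        match = pattern.search(text)
        if match:
            invoice_number = match.group(1)
            break

    date_match = DATE_LINE_RE.search(text)
    date = normalize_date(date_match.group(1)) if date_match else None

    total_match = TOTAL_RE.search(text)
    total = float(total_match.group(1).replace(",", "")) if total_match else None

    return {
        "vendor": vendor,
        "invoice_number": invoice_number,
        "date": date,
        "total": total,
    }
```

Any field the sketch cannot find comes back as `None`, which leaves the decision about missing data to the validation step.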
Step 4: Validate
Required fields MUST be present and valid:
- vendor: non-empty string
- invoice_number: non-empty string
- date: valid YYYY-MM-DD format
- total: positive number
If any field is missing or fails these checks, re-examine the PDF text.
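One way to express these checks in code, as a sketch: the function names `validate_invoice` and `is_iso_date` are hypothetical, and the input is assumed to be a dict keyed by the four field names listed above. The function returns a list of problems, so an empty list means the data passed.

```python
import math
from datetime import datetime


def is_iso_date(value):
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not isinstance(value, str):
        return False
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # Round-trip to reject unpadded forms such as "2024-3-5".
    return parsed.strftime("%Y-%m-%d") == value


def validate_invoice(fields):
    """Return a list of validation problems; empty means all checks passed."""
    errors = []

    for key in ("vendor", "invoice_number"):
        value = fields.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{key}: must be a non-empty string")

    if not is_iso_date(fields.get("date")):
        errors.append("date: must be a valid YYYY-MM-DD date")

    total = fields.get("total")
    is_number = isinstance(total, (int, float)) and not isinstance(total, bool)
    if not is_number or not math.isfinite(total) or total <= 0:
        errors.append("total: must be a positive number")

    return errors
```

Returning every problem at once, rather than stopping at the first, makes it easier to see which parts of the PDF text need a second look.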
Step 5: Save results
Save two files:
- Output file (requested by user):
{
  "vendor": "string",
  "invoice_number": "string",
  "date": "YYYY-MM-DD",
  "total": 0.00,
  "currency": "USD"
}
- Eval log (always append to eval_results/all_evals.jsonl):
python scripts/collect_eval.py "<task_id>" "<original_task_prompt>" "<output_file>" "<notes>"
Example:
python scripts/collect_eval.py "invoice-validate" "Extract and validate invoice data" "output.json" "validation passed"
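If the save step is driven from Python rather than the shell, it might look like the sketch below. The helper names (`save_output`, `build_eval_command`, `log_eval`) are hypothetical. The argument order passed to `scripts/collect_eval.py` mirrors the command shown above; nothing else about that script's behavior is assumed. Defaulting `currency` to "USD" is also an assumption, taken from the sample output, since the parsing step does not extract a currency.

```python
import json
import subprocess
import sys
from pathlib import Path


def save_output(fields, output_path, currency="USD"):
    """Write the validated fields to the user's output file as JSON."""
    record = {
        "vendor": fields["vendor"],
        "invoice_number": fields["invoice_number"],
        "date": fields["date"],
        "total": round(float(fields["total"]), 2),
        # Assumption: fall back to USD when no currency was extracted.
        "currency": fields.get("currency", currency),
    }
    Path(output_path).write_text(
        json.dumps(record, indent=2) + "\n", encoding="utf-8"
    )
    return record


def build_eval_command(task_id, prompt, output_file, notes):
    """Build the argument list for the eval logger, in the documented order."""
    return [
        sys.executable,  # same interpreter as the caller, instead of "python"
        "scripts/collect_eval.py",
        task_id,
        prompt,
        str(output_file),
        notes,
    ]


def log_eval(task_id, prompt, output_file, notes):
    """Run the eval logger; raises CalledProcessError if it exits non-zero."""
    subprocess.run(
        build_eval_command(task_id, prompt, output_file, notes), check=True
    )
```

Passing the arguments as a list avoids shell quoting problems when the task prompt or notes contain spaces or quotation marks.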