name: processing-invoices
description: Extracts structured data from PDF invoices and validates it automatically. Use when processing invoice PDFs, extracting billing information, or when validation is required.
Invoice Processing
Workflow
Invoice Processing:
- [ ] Step 1: Log start time
- [ ] Step 2: Extract PDF text
- [ ] Step 3: Parse invoice fields
- [ ] Step 4: Run validation: python scripts/validate_invoice.py output.json
- [ ] Step 5: Fix errors if validation fails
- [ ] Step 6: Save final output AND eval log
Step 1: Log start time
Record the start time for eval tracking:
from datetime import datetime
start_time = datetime.now().isoformat()
Step 2: Extract text
from pypdf import PdfReader

reader = PdfReader("invoice.pdf")
text = ""
for page in reader.pages:
    # Guard against pages with no extractable text (e.g. scanned images)
    text += (page.extract_text() or "") + "\n"
Step 3: Parse fields
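- vendor: Company name (usually at the top of the invoice)
- invoice_number: Identifier following a label such as "Invoice #", or starting with a prefix such as "INV-"
- date: Invoice date in any format, converted to YYYY-MM-DD
- total: Final amount due (positive number)
- currency: Three-letter currency code (e.g. "USD")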
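The field rules above can be sketched as a small heuristic parser for vendor, invoice_number, date, and total. This is a minimal sketch, not a definitive implementation: the names `parse_invoice`, `normalize_date`, and `DATE_FORMATS` are hypothetical, the regular expressions assume English labels such as "Date" and "Total", and the list of date formats is an assumption to extend for the invoices you actually see.

```python
import re
from datetime import datetime

# Assumed date formats, tried in order; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y", "%b %d, %Y")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def parse_invoice(text):
    """Heuristically pull the invoice fields out of extracted text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is the vendor name.
    vendor = lines[0] if lines else None

    # Prefer the value after an "Invoice #" label, else any "INV-" identifier.
    num = (re.search(r"Invoice\s*#\s*:?\s*([A-Za-z0-9-]+)", text, re.I)
           or re.search(r"\b(INV-[A-Za-z0-9-]+)", text))

    # Take the rest of the first line that carries a "Date" label.
    date = re.search(r"\bDate\s*:?\s*(.+)", text, re.I)

    # \b keeps "Subtotal" from matching; "-" is excluded so a negative
    # amount is skipped rather than silently read as positive.
    totals = re.findall(r"\bTotal[^\d\n-]*(\d[\d,]*(?:\.\d+)?)", text, re.I)

    return {
        "vendor": vendor,
        "invoice_number": num.group(1) if num else None,
        "date": normalize_date(date.group(1)) if date else None,
        # The last "Total" line is taken as the final amount due.
        "total": float(totals[-1].replace(",", "")) if totals else None,
    }
```

Any field that comes back as `None` is a signal to inspect the extracted text by hand rather than guess a value.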
Step 4: Validate
Run: python scripts/validate_invoice.py output.json
Step 5: Fix errors
If validation fails:
- Read error messages
- Fix the specific issues
- Run validation again
- Only proceed when it passes
Validation rules: See VALIDATION.md
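The validate-and-fix loop can be sketched as below. This is an illustration under stated assumptions: it assumes `scripts/validate_invoice.py` exits with code 0 on success and a non-zero code on failure (the source does not confirm this), and `run_validation`, `fix`, `max_attempts`, and `command` are hypothetical names introduced here.

```python
import subprocess
import sys


def run_validation(output_path, fix, max_attempts=3, command=None):
    """Run the validator until it passes; return the attempt that passed.

    `fix` is a hypothetical callback that receives the validator's error
    text and corrects the output file before the next attempt.
    Assumption: exit code 0 means validation passed.
    """
    command = command or [sys.executable, "scripts/validate_invoice.py"]
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(
            command + [output_path], capture_output=True, text=True
        )
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Hand the error messages to the fixer, then validate again.
            fix(result.stdout + result.stderr)
    raise RuntimeError(f"validation still failing after {max_attempts} attempts")
```

The returned attempt count can feed the notes field of the eval log, for example "validation passed after 1 attempt". Capping the attempts avoids looping forever on an error the fixer cannot resolve.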
Step 6: Save results
Save two files:
- Output file (requested by user):
{
"vendor": "Company Name",
"invoice_number": "INV-001",
"date": "YYYY-MM-DD",
"total": 1250.00,
"currency": "USD"
}
- Eval log (always append to eval_results/all_evals.jsonl):
python scripts/collect_eval.py "<task_id>" "<original_task_prompt>" "<output_file>" "<notes>"
Example:
python scripts/collect_eval.py "invoice-auto-validate" "Extract and validate invoice with automated loop" "output.json" "validation passed after 1 attempt"
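The save step can be sketched as follows. This is a minimal sketch with hypothetical helper names (`save_output`, `eval_command`, `REQUIRED_KEYS`); it assumes all five keys of the output schema are mandatory, and it only builds the `collect_eval.py` call, using the argument order given above.

```python
import json
import sys

# Keys taken from the output schema above; treating all as required is an assumption.
REQUIRED_KEYS = ("vendor", "invoice_number", "date", "total", "currency")


def save_output(fields, path="output.json"):
    """Write the invoice fields as JSON, refusing to save an incomplete record."""
    missing = [key for key in REQUIRED_KEYS if fields.get(key) is None]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(fields, f, indent=2)


def eval_command(task_id, prompt, output_file, notes):
    """Build the collect_eval.py call as an argument list.

    Passing a list avoids shell-quoting problems when the prompt or
    notes contain spaces or quotes.
    """
    return [sys.executable, "scripts/collect_eval.py",
            task_id, prompt, output_file, notes]
```

The list returned by `eval_command` can be passed to `subprocess.run(..., check=True)` so that a failed eval-log write raises an error instead of passing silently.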