freight-doc-processor

Automatically detect, OCR-parse, and verify POD (Proof of Delivery) and BOL (Bill of Lading) documents received as email attachments. Matches documents to load records, flags discrepancies, and triggers invoicing. Use when a broker receives delivery documents, needs POD confirmation, wants to verify delivery details, or needs to process BOLs. Triggered by phrases like "missing PODs", "check documents", "verify delivery", "resend POD", or "process BOL", or automatically when an email with a PDF or image attachment arrives.

By wasay1200 · Updated 3/9/2026

Freight Document Processor

Detects, OCR-parses, and verifies POD and BOL documents from email attachments. Matches each document to a load record, then confirms delivery or flags issues.

Setup

# Install Python dependencies for the Tesseract-based path (ocr_extractor.py)
pip3 install pdfplumber pytesseract Pillow

# Install tesseract OCR engine:
# macOS: brew install tesseract
# Ubuntu/Debian: sudo apt-get install tesseract-ocr
# CentOS/RHEL: sudo yum install tesseract

Usage

Detect document type

cd skills/freight-doc-processor/scripts
python3 doc_detector.py --file /path/to/document.pdf

OCR extract text

python3 ocr_extractor.py --file document.pdf --type POD
python3 ocr_extractor.py --file scan.png --type BOL --json

Match to load record

python3 doc_matcher.py --bol-number BOL-12345
python3 doc_matcher.py --shipper "ABC Corp"

Document Types

| Type | Description | Typical Fields |
|------|-------------|----------------|
| POD | Proof of Delivery | Delivery date, receiver name, signature, exceptions |
| BOL | Bill of Lading | Shipper, consignee, commodity, weight, units |
| RATE_CON | Rate Confirmation | Agreed rate, terms, carrier MC, load details |
| INVOICE | Invoice | Amount due, invoice number, payment terms |
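
The typical fields listed above could be encoded as a lookup so that extracted output can be checked for completeness. This is a minimal sketch: the snake_case field names and the `missing_fields` helper are hypothetical, since the actual keys emitted by the scripts are not documented here.

```python
# Hypothetical field names derived from the table above;
# the real scripts may use different keys.
EXPECTED_FIELDS = {
    "POD": ["delivery_date", "receiver_name", "signature", "exceptions"],
    "BOL": ["shipper", "consignee", "commodity", "weight", "units"],
    "RATE_CON": ["agreed_rate", "terms", "carrier_mc", "load_details"],
    "INVOICE": ["amount_due", "invoice_number", "payment_terms"],
}


def missing_fields(doc_type, extracted):
    """Return the expected fields that are absent or empty in an extracted record."""
    expected = EXPECTED_FIELDS.get(doc_type, [])
    return [name for name in expected if extracted.get(name) in (None, "")]
```

A record with missing fields might be routed to manual review rather than matched automatically, though that policy is a design choice and not something the source specifies.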

Broker Text Commands

| Command | Action |
|---------|--------|
| missing PODs | List loads awaiting POD confirmation |
| resend POD [load ID] | Request POD from carrier |
| check documents | Show recently processed documents |
| verify delivery [load ID] | Confirm if load is marked delivered |
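
One way to recognize these commands in an incoming text message is a small pattern table. The sketch below is an assumption about how parsing might work; the action labels and the `parse_command` function are hypothetical and do not come from the skill's scripts.

```python
import re

# Patterns mirror the command table; the action names are hypothetical labels.
COMMANDS = [
    (re.compile(r"^missing pods$", re.IGNORECASE), "list_missing_pods"),
    (re.compile(r"^resend pod\s+(\S+)$", re.IGNORECASE), "request_pod"),
    (re.compile(r"^check documents$", re.IGNORECASE), "show_recent_documents"),
    (re.compile(r"^verify delivery\s+(\S+)$", re.IGNORECASE), "check_delivery_status"),
]


def parse_command(text):
    """Return (action, load_id) for a recognized command, or (None, None)."""
    cleaned = text.strip()
    for pattern, action in COMMANDS:
        match = pattern.match(cleaned)
        if match:
            # Only the load-specific commands capture a load ID.
            load_id = match.group(1) if match.groups() else None
            return action, load_id
    return None, None
```

Matching is case-insensitive here so that "Missing PODs" and "missing pods" are treated the same.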

OCR Accuracy

  • Digital PDFs: 95%+ accuracy with pdfplumber
  • Scanned documents: 70-90% depending on scan quality
  • Poor-quality scans: may be flagged as unreadable, with a request for a clearer copy
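
The "unreadable" decision could be based on per-word OCR confidence. The sketch below assumes confidences on a 0-100 scale with negative values marking non-text entries (as Tesseract's tabular output reports them); the threshold of 70 and both function names are illustrative assumptions, not values taken from the scripts.

```python
def mean_word_confidence(confidences):
    """Average per-word confidences (0-100), ignoring negative placeholder values."""
    valid = [c for c in confidences if c >= 0]
    return sum(valid) / len(valid) if valid else 0.0


def is_readable(confidences, threshold=70.0):
    """Treat a page as readable when its mean word confidence meets the threshold.

    The default threshold is an illustrative assumption.
    """
    return mean_word_confidence(confidences) >= threshold
```

A page with no recognized words scores 0.0 and is therefore treated as unreadable.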

Example SMS Outputs

POD Confirmed:

✅ POD RECEIVED - Load #12345
Delivered: Dec 15 at 3:45pm
Receiver: J. Smith | Chicago, IL
Units: 24 pallets ✓ | No exceptions

Ready for invoicing. Reply 'invoice' to generate.

POD with discrepancies:

⚠️ DISCREPANCY - Load #12345
BOL shows: 48 pallets | POD shows: 46 pallets
Shortage noted by receiver

Reply 'override' to invoice 46 units
Reply 'resolve' to contact carrier
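
A message like the one above could be produced by comparing the unit counts extracted from the BOL and the POD. This is a minimal sketch under that assumption; `compare_units` and its parameters are hypothetical, and the emoji prefixes are omitted for simplicity.

```python
def compare_units(load_id, bol_units, pod_units, unit_label="pallets"):
    """Build a broker-facing message comparing BOL and POD unit counts."""
    if bol_units == pod_units:
        return (
            f"POD RECEIVED - Load #{load_id}\n"
            f"Units: {pod_units} {unit_label} | No exceptions"
        )
    # Fewer units delivered than billed is a shortage; more is an overage.
    kind = "Shortage" if pod_units < bol_units else "Overage"
    return (
        f"DISCREPANCY - Load #{load_id}\n"
        f"BOL shows: {bol_units} {unit_label} | POD shows: {pod_units} {unit_label}\n"
        f"{kind} of {abs(bol_units - pod_units)} {unit_label}"
    )
```

For example, `compare_units("12345", 48, 46)` yields a discrepancy message that reports a shortage of 2 pallets.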

Unmatched document:

📄 DOCUMENT RECEIVED
Type: POD (confidence: medium)
Could not match to any load record

Reply 'match [load ID]' to link this document

Processing Chain

Email attachment arrives
        ↓
doc_pipeline.py (orchestrator)
        ↓
  1. Docling (local, free, preferred)
        ↓ fails or low confidence
  2. Mistral OCR (cloud API fallback)
        ↓ fails
  3. Alert broker for manual review
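
The fallback order in this chain can be sketched as a loop over processors. The code below is not the implementation in doc_pipeline.py: the `run_chain` function, the result dictionary shape, and the 0.8 confidence threshold are all assumptions for illustration. As a simplification, it applies the confidence check to every stage, whereas the diagram only mentions low confidence for the first stage.

```python
def run_chain(path, processors, min_confidence=0.8):
    """Try each (name, processor) pair in order; return the first confident result.

    Each processor is a callable that takes a file path and returns a dict
    with 'text' and 'confidence' keys (a hypothetical contract for this sketch).
    """
    for name, processor in processors:
        try:
            result = processor(path)
        except Exception:
            continue  # any failure falls through to the next processor
        if result.get("confidence", 0.0) >= min_confidence:
            return {"source": name, **result}
    # Nothing succeeded: signal that the broker should review manually.
    return {"source": None, "needs_manual_review": True}
```

In use, the list would hold the local processor first and the cloud fallback second, for example `[("docling", docling_fn), ("mistral", mistral_fn)]`, where both callables are placeholders.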

Scripts Reference

doc_pipeline.py — Main orchestrator (start here)

python3 doc_pipeline.py --demo                    # Test with sample POD (no credentials)
python3 doc_pipeline.py --file document.pdf       # Process one file
python3 doc_pipeline.py --scan                    # Process all new files in attachments folder
python3 doc_pipeline.py --scan --json             # JSON output

docling_processor.py — Primary OCR (local, no API key needed)

pip3 install docling
python3 docling_processor.py --file document.pdf
python3 docling_processor.py --file scan.jpg --type POD --json

mistral_ocr.py — Fallback OCR (cloud)

export MISTRAL_API_KEY=your_key_here   # get at console.mistral.ai
python3 mistral_ocr.py --file document.pdf --json

doc_detector.py — Classify document type

python3 doc_detector.py --file /path/to/document.pdf
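
The rules doc_detector.py uses are not shown in this document. As a rough illustration of how classification with a confidence level could work, the sketch below scores extracted text against keyword lists. The keywords, the `classify` function, and the high/medium/low cutoffs are all assumptions.

```python
# Hypothetical keyword lists; the actual rules in doc_detector.py may differ.
KEYWORDS = {
    "POD": ["proof of delivery", "received by", "delivery receipt"],
    "BOL": ["bill of lading", "consignee", "shipper"],
    "RATE_CON": ["rate confirmation", "carrier mc", "agreed rate"],
    "INVOICE": ["invoice", "amount due", "payment terms"],
}


def classify(text):
    """Return (doc_type, confidence), where confidence is 'high', 'medium', or 'low'."""
    lowered = text.lower()
    # Count how many keywords of each type appear in the text.
    scores = {t: sum(kw in lowered for kw in kws) for t, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    hits = scores[best]
    if hits == 0:
        return "UNKNOWN", "low"
    return best, "high" if hits >= 2 else "medium"
```

A single keyword hit yields "medium" confidence, which is one plausible way to arrive at an output such as "Type: POD (confidence: medium)".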

ocr_extractor.py — Extract text fields (Tesseract-based)

python3 ocr_extractor.py --file document.pdf --type POD
python3 ocr_extractor.py --file scan.png --type BOL --json

doc_matcher.py — Match document to load record

python3 doc_matcher.py --bol-number BOL-12345
python3 doc_matcher.py --shipper "ABC Corp"
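
Conceptually, the two lookups above correspond to an exact match on BOL number and a looser match on shipper name. The sketch below illustrates that idea over an in-memory list of load records; the `match_load` function and the record keys (`load_id`, `bol_number`, `shipper`) are hypothetical, as the load record schema is not described here.

```python
def match_load(loads, bol_number=None, shipper=None):
    """Return loads matching an exact BOL number or, failing that, a shipper name."""
    if bol_number:
        wanted = bol_number.strip().upper()
        exact = [load for load in loads if load.get("bol_number", "").upper() == wanted]
        if exact:
            return exact
    if shipper:
        # Case-insensitive substring match, since OCR'd names are rarely exact.
        wanted = shipper.strip().lower()
        return [load for load in loads if wanted in load.get("shipper", "").lower()]
    return []
```

An empty result would correspond to the "Could not match to any load record" case, after which the broker links the document manually.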

Install

# Core (required for Docling path)
pip3 install docling requests

# Optional: Tesseract-based fallback
pip3 install pdfplumber pytesseract Pillow
brew install tesseract   # macOS

Integrations

See INTEGRATIONS.md for full details.

| Service | Purpose | Status |
|---------|---------|--------|
| Docling | Primary processor (local, free) | ✅ Built |
| Mistral OCR | Cloud fallback | ✅ Built |
| Azure Document Intelligence | Enterprise alternative | Documented only |
| AWS Textract | AWS cloud alternative | Documented only |
| Tesseract OCR | Legacy local fallback | ✅ Built (ocr_extractor.py) |

Cron Setup

# Run every 5 minutes to process new attachments
*/5 * * * * cd /path/to/freight-doc-processor/scripts && python3 doc_pipeline.py --scan >> /tmp/doc-pipeline.log 2>&1

Storage

  • Documents: ~/.freight-broker/attachments/
  • OCR cache: ~/.freight-broker/ocr_cache/
  • Processed log: ~/.freight-broker/processed_docs.json
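
A processed log lets a scan skip files it has already handled. The format of processed_docs.json is not specified in this document, so the sketch below assumes it is a JSON list of file names; the three helper functions are likewise hypothetical.

```python
import json
from pathlib import Path

# Default locations taken from the Storage list above.
BASE_DIR = Path.home() / ".freight-broker"
ATTACHMENTS_DIR = BASE_DIR / "attachments"
PROCESSED_LOG = BASE_DIR / "processed_docs.json"


def load_processed(log_path):
    """Read the set of already-processed file names; empty if the log is absent."""
    if not log_path.exists():
        return set()
    return set(json.loads(log_path.read_text()))


def new_attachments(attachments_dir, log_path):
    """List files in the attachments folder that are not yet in the log."""
    processed = load_processed(log_path)
    return sorted(
        p for p in attachments_dir.iterdir()
        if p.is_file() and p.name not in processed
    )


def mark_processed(log_path, name):
    """Add a file name to the log (assumed here to be a JSON list of names)."""
    processed = load_processed(log_path)
    processed.add(name)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text(json.dumps(sorted(processed)))
```

A scan would call `new_attachments(ATTACHMENTS_DIR, PROCESSED_LOG)`, process each file, and then call `mark_processed` so that the next run does not repeat the work.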

Install via CLI

npx skills add https://github.com/wasay1200/freight-broker-ai --skill freight-doc-processor