name: fidelity-1099-parser description: Extract and process Fidelity Tax Reporting Statements (1099-B and 1099-DIV). Use for parsing Fidelity consolidated 1099 PDFs to extract sales transactions, dividends, and distributions. Generates estate accounting schedules with monthly totals using auditor terminology (Total Income, Total Disbursements, Net FMV Change). Triggers on Fidelity 1099, tax statement, estate accounting, brokerage tax forms, 1099-B sales, 1099-DIV dividends.
Fidelity 1099 Parser
Parse Fidelity Tax Reporting Statement PDFs to extract sales and dividend transactions, with accurate 1099-B section classification and 1099-DIV box mapping.
Quick Start
Generate Estate Report
# Auto-generates YYYY-XXXX-Estate-Report.xlsx from PDF filename
python scripts/generate_estate_report.py path/to/fidelity_1099.pdf
# Or specify custom output path
python scripts/generate_estate_report.py path/to/fidelity_1099.pdf output_report.xlsx
Output filename is normalized to YYYY-XXXX-Estate-Report.xlsx format (e.g., 2020-9147-Estate-Report.xlsx).
Programmatic Usage
from scripts.generate_estate_report import generate_report
generate_report("path/to/fidelity_1099.pdf") # Auto-generates output filename
# Or use individual extractors:
from scripts.extract_sales import extract_sales_from_pdf
from scripts.extract_dividends import extract_dividends_from_pdf
sales = extract_sales_from_pdf("fidelity_1099.pdf")
dividends, totals = extract_dividends_from_pdf("fidelity_1099.pdf")
Extraction Workflow
1. Sales Transactions (1099-B)
Extract sales from text using regex patterns:
# Symbol line format: COMPANY NAME,SYMBOL,CUSIP (3 fields)
# Also handles 2-field format: DESCRIPTION,CUSIP (for exchanged securities)
symbol_pattern_3field = r'^([A-Z0-9][A-Z0-9 &\-\.\'#/\*!()\$]+?),([A-Z]{1,5}),([A-Z0-9]{9})$'
symbol_pattern_2field = r'^([A-Z][A-Z0-9 &\-\.\'\*#/()\$]+),([A-Z0-9]{9})$'
# Sale line format: Type Quantity DateAcquired DateSold Proceeds CostBasis
sale_pattern = r'^(Sale|Cash In Lieu|Merger|Redemption|Exchange)\s+([\d,.]+)\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}/\d{2}/\d{2})\s+([\d,.]+)\s+([\d,.]+)'
# Unknown term pattern: Type Quantity Unknown DateSold Proceeds Unknown
unknown_term_pattern = r'^(Sale|Cash In Lieu|...)\s+([\d,.]+)\s+Unknown\s+(\d{2}/\d{2}/\d{2})\s+([\d,.]+)\s+Unknown'
Classify by section headers:
- Box A: "Short-Term Transactions...Basis Reported to IRS"
- Box B: "Short-Term Transactions...Basis NOT Reported"
- Box D: "Long-Term Transactions...Basis Reported to IRS"
- Box E: "Long-Term Transactions...Basis NOT Reported"
- Unknown Term: "Transactions...Term is Unknown"
2. Dividends (1099-DIV)
Critical: Must exclude Capital Gains Distribution pages from dividend extraction.
def is_dividend_section(text):
"""Check if page contains dividends (not capital gains)."""
if "Dividends and Distributions" not in text:
return False
# CRITICAL: Exclude Capital Gains pages
lines = text.split('\n')[:25]
for line in lines:
if "Total Capital Gains Distributions Detail" in line:
return False
if "Capital Gain Distr" in line and "2a" in line:
return False
return True
Use word-level extraction with x-position column mapping:
words = page.extract_words()
for word in words:
x = word['x0']
if 100 <= x < 200: # Box 1a Ordinary Dividends
elif 340 <= x < 420: # Box 1b Qualified Dividends
elif 480 <= x < 575: # Box 11 Exempt Interest
elif 575 <= x < 700: # Box 12 Private Activity Bond
elif 700 <= x < 800: # Box 7 Foreign Tax Paid
Critical: Handle !C prefix (CORRECTED entries):
# Matches: "01/10/20", "!C 02/11/20", "!C02/11/20"
date_pattern = r'^!?C?\s*(\d{2}/\d{2}/\d{2})(?:\s|$)'
3. Symbol Tracking
Track current symbol while iterating through text lines. Reset on separator (dashes):
current_symbol = None
for line in lines:
if SEPARATOR_PATTERN.match(line): # Dashes reset context
current_symbol = None
continue
if symbol_pattern.match(line):
current_symbol = extract_symbol(line)
elif date_pattern.match(line) and current_symbol:
# Associate this transaction with current_symbol
Validation
The extraction automatically validates against 1099 summary totals from the PDF:
from scripts.validate_extraction import validate_extraction
report = validate_extraction(pdf_path, sales, dividends, div_totals)
print(report) # Shows pass/fail for each field with $0.02 tolerance
Validated fields:
- 1099-B: Proceeds, Cost Basis, Gain/Loss for each section (Box A, B, D, E, Unknown Term)
- 1099-DIV: Box 1a, 1b, 7, 11, 12 totals
Common Validation Issues and Fixes
| Issue | Symptom | Fix |
|---|---|---|
| Capital Gains in Dividends | Box 1a over-reported by ~$2,500+ | Update is_dividend_section() to exclude pages with "Total Capital Gains Distributions Detail" header |
| Missing !C entries | Box totals under-reported | Ensure date pattern handles !C prefix with/without space |
| Wrong symbol association | Amounts on wrong symbol | Reset symbol context on separator lines (dashes) |
| Column value mismatch | Wrong box amounts | Recalibrate x-position boundaries from actual PDF |
| 2-field securities missed | Missing exchanged securities | Add pattern for DESCRIPTION,CUSIP format (no symbol) |
Debugging Validation Failures
- Identify the discrepancy: Note which field failed and the difference amount
- Search for the amount: Look for that exact dollar amount in the PDF
- Check page context: Is it on a Capital Gains page? Dividend page? Sales page?
- Trace the extraction: Add debug output to see which pages/lines are being processed
- Fix the filter/pattern: Update section detection or regex patterns
Example debug for Box 1a failure ($2,661.91 over):
# Found $2,499.87 AEMSX on page 226 - a Capital Gains Distribution page
# Fix: Added check for "Total Capital Gains Distributions Detail" header
Output Sheets
Generated Excel report with 7 sheets, designed for auditor review. Uses auditor terminology:
- Total Income = Dividends + Interest
- Total Disbursements = Sales Proceeds
- Net FMV Change = Realized Gain/Loss
Sheet Structure
1. START HERE - Instructions and navigation guide
- Source document info and generation timestamp
- Step-by-step audit workflow
- 1099-B and 1099-DIV box definitions
2. Reconciliation - Formula-based validation (auditor's first stop)
- SUMIF formulas calculate totals from transaction detail sheets
- Compares against expected values from 1099 summary pages
- Shows ✓ PASS / ✗ FAIL for each field with $0.02 tolerance
- Overall status: "✓ ALL DATA RECONCILED" or "✗ RECONCILIATION FAILED"
- Formulas reference sheets
'6. Sales Detail'and'7. Dividend Detail'
3. Annual Summary - High-level totals by 1099 category
- 1099-B: Proceeds, Cost Basis, Gain/Loss by section (Box A, B, D, E)
- 1099-DIV: Box 1a, 1b, 7, 11, 12 totals
4. Monthly Detail - Monthly breakdown for abbreviated reporting
- Columns: Month, Dividend Income, Interest Income, Total Income, Total Disbursements, Net FMV Change, Foreign Tax Paid
- Annual totals row
- Column definitions section
5. Monthly by Symbol - Monthly breakdown by security symbol
- Same columns as Monthly Detail with Symbol and Description added
- Month subtotals with green highlighting
- Annual totals
6. Sales Detail - Individual 1099-B transactions
- Columns: Symbol, Description, CUSIP, Type, Quantity, Date Acquired, Date Sold, Proceeds, Cost Basis, Gain/Loss, 1099-B Section
- Section values: "Short-Term Box A", "Long-Term Box D", "Unknown Term (Box B/E)"
7. Dividend Detail - Individual 1099-DIV transactions
- Columns: Date, Symbol, Description, CUSIP, Corrected, Box 1a Ordinary, Box 1b Qualified, Box 11 Exempt, Box 12 PAB, Box 7 Foreign Tax
Excel Formula References
Reconciliation sheet formulas reference these columns:
- Sales Detail: Column H=Proceeds, I=Cost Basis, J=Gain/Loss, K=1099-B Section
- Dividend Detail: Column F=Box 1a, G=Box 1b, H=Box 11, I=Box 12, J=Box 7
Output Filename Convention
Reports are automatically named YYYY-XXXX-Estate-Report.xlsx:
YYYY= Tax year extracted from PDF filenameXXXX= Last 4 digits of account number
Examples:
2020-Executor-9147-Consolidated-Form-1099.pdf→2020-9147-Estate-Report.xlsx2020-Trust-Under-Agreement-9431-Consolidated-Form-1099.pdf→2020-9431-Estate-Report.xlsxY97-127622_2020-Individual-7622-Consolidated-Form-1099.pdf→2020-7622-Estate-Report.xlsx
Resources
scripts/
extract_sales.py- Extract and classify 1099-B sales transactionsextract_dividends.py- Extract 1099-DIV dividends with x-position mappinggenerate_estate_report.py- Create 7-sheet Excel report with validationvalidate_extraction.py- Validate extracted totals against PDF summary pages
references/
column_positions.md- X-position boundaries for dividend columns1099_box_reference.md- IRS 1099-B and 1099-DIV box meanings
Dependencies
pip install pdfplumber openpyxl