fidelity-1099-parser

star 0

Extract and process Fidelity Tax Reporting Statements (1099-B and 1099-DIV). Use for parsing Fidelity consolidated 1099 PDFs to extract sales transactions, dividends, and distributions. Generates estate accounting schedules with monthly totals using auditor terminology (Total Income, Total Disbursements, Net FMV Change). Triggers on Fidelity 1099, tax statement, estate accounting, brokerage tax forms, 1099-B sales, 1099-DIV dividends.

spinjob By spinjob schedule Updated 2/15/2026

name: fidelity-1099-parser description: Extract and process Fidelity Tax Reporting Statements (1099-B and 1099-DIV). Use for parsing Fidelity consolidated 1099 PDFs to extract sales transactions, dividends, and distributions. Generates estate accounting schedules with monthly totals using auditor terminology (Total Income, Total Disbursements, Net FMV Change). Triggers on Fidelity 1099, tax statement, estate accounting, brokerage tax forms, 1099-B sales, 1099-DIV dividends.

Fidelity 1099 Parser

Parse Fidelity Tax Reporting Statement PDFs to extract sales and dividend transactions, with accurate 1099-B section classification and 1099-DIV box mapping.

Quick Start

Generate Estate Report

# Auto-generates YYYY-XXXX-Estate-Report.xlsx from PDF filename
python scripts/generate_estate_report.py path/to/fidelity_1099.pdf

# Or specify custom output path
python scripts/generate_estate_report.py path/to/fidelity_1099.pdf output_report.xlsx

Output filename is normalized to YYYY-XXXX-Estate-Report.xlsx format (e.g., 2020-9147-Estate-Report.xlsx).

Programmatic Usage

from scripts.generate_estate_report import generate_report
generate_report("path/to/fidelity_1099.pdf")  # Auto-generates output filename

# Or use individual extractors:
from scripts.extract_sales import extract_sales_from_pdf
from scripts.extract_dividends import extract_dividends_from_pdf

sales = extract_sales_from_pdf("fidelity_1099.pdf")
dividends, totals = extract_dividends_from_pdf("fidelity_1099.pdf")

Extraction Workflow

1. Sales Transactions (1099-B)

Extract sales from text using regex patterns:

# Symbol line format: COMPANY NAME,SYMBOL,CUSIP (3 fields)
# Also handles 2-field format: DESCRIPTION,CUSIP (for exchanged securities)
symbol_pattern_3field = r'^([A-Z0-9][A-Z0-9 &\-\.\'#/\*!()\$]+?),([A-Z]{1,5}),([A-Z0-9]{9})$'
symbol_pattern_2field = r'^([A-Z][A-Z0-9 &\-\.\'\*#/()\$]+),([A-Z0-9]{9})$'

# Sale line format: Type Quantity DateAcquired DateSold Proceeds CostBasis
sale_pattern = r'^(Sale|Cash In Lieu|Merger|Redemption|Exchange)\s+([\d,.]+)\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}/\d{2}/\d{2})\s+([\d,.]+)\s+([\d,.]+)'

# Unknown term pattern: Type Quantity Unknown DateSold Proceeds Unknown
unknown_term_pattern = r'^(Sale|Cash In Lieu|...)\s+([\d,.]+)\s+Unknown\s+(\d{2}/\d{2}/\d{2})\s+([\d,.]+)\s+Unknown'

Classify by section headers:

  • Box A: "Short-Term Transactions...Basis Reported to IRS"
  • Box B: "Short-Term Transactions...Basis NOT Reported"
  • Box D: "Long-Term Transactions...Basis Reported to IRS"
  • Box E: "Long-Term Transactions...Basis NOT Reported"
  • Unknown Term: "Transactions...Term is Unknown"

2. Dividends (1099-DIV)

Critical: Must exclude Capital Gains Distribution pages from dividend extraction.

def is_dividend_section(text):
    """Check if page contains dividends (not capital gains)."""
    if "Dividends and Distributions" not in text:
        return False
    # CRITICAL: Exclude Capital Gains pages
    lines = text.split('\n')[:25]
    for line in lines:
        if "Total Capital Gains Distributions Detail" in line:
            return False
        if "Capital Gain Distr" in line and "2a" in line:
            return False
    return True

Use word-level extraction with x-position column mapping:

words = page.extract_words()
for word in words:
    x = word['x0']
    if 100 <= x < 200:      # Box 1a Ordinary Dividends
    elif 340 <= x < 420:    # Box 1b Qualified Dividends
    elif 480 <= x < 575:    # Box 11 Exempt Interest
    elif 575 <= x < 700:    # Box 12 Private Activity Bond
    elif 700 <= x < 800:    # Box 7 Foreign Tax Paid

Critical: Handle !C prefix (CORRECTED entries):

# Matches: "01/10/20", "!C 02/11/20", "!C02/11/20"
date_pattern = r'^!?C?\s*(\d{2}/\d{2}/\d{2})(?:\s|$)'

3. Symbol Tracking

Track current symbol while iterating through text lines. Reset on separator (dashes):

current_symbol = None
for line in lines:
    if SEPARATOR_PATTERN.match(line):  # Dashes reset context
        current_symbol = None
        continue
    if symbol_pattern.match(line):
        current_symbol = extract_symbol(line)
    elif date_pattern.match(line) and current_symbol:
        # Associate this transaction with current_symbol

Validation

The extraction automatically validates against 1099 summary totals from the PDF:

from scripts.validate_extraction import validate_extraction
report = validate_extraction(pdf_path, sales, dividends, div_totals)
print(report)  # Shows pass/fail for each field with $0.02 tolerance

Validated fields:

  • 1099-B: Proceeds, Cost Basis, Gain/Loss for each section (Box A, B, D, E, Unknown Term)
  • 1099-DIV: Box 1a, 1b, 7, 11, 12 totals

Common Validation Issues and Fixes

Issue Symptom Fix
Capital Gains in Dividends Box 1a over-reported by ~$2,500+ Update is_dividend_section() to exclude pages with "Total Capital Gains Distributions Detail" header
Missing !C entries Box totals under-reported Ensure date pattern handles !C prefix with/without space
Wrong symbol association Amounts on wrong symbol Reset symbol context on separator lines (dashes)
Column value mismatch Wrong box amounts Recalibrate x-position boundaries from actual PDF
2-field securities missed Missing exchanged securities Add pattern for DESCRIPTION,CUSIP format (no symbol)

Debugging Validation Failures

  1. Identify the discrepancy: Note which field failed and the difference amount
  2. Search for the amount: Look for that exact dollar amount in the PDF
  3. Check page context: Is it on a Capital Gains page? Dividend page? Sales page?
  4. Trace the extraction: Add debug output to see which pages/lines are being processed
  5. Fix the filter/pattern: Update section detection or regex patterns

Example debug for Box 1a failure ($2,661.91 over):

# Found $2,499.87 AEMSX on page 226 - a Capital Gains Distribution page
# Fix: Added check for "Total Capital Gains Distributions Detail" header

Output Sheets

Generated Excel report with 7 sheets, designed for auditor review. Uses auditor terminology:

  • Total Income = Dividends + Interest
  • Total Disbursements = Sales Proceeds
  • Net FMV Change = Realized Gain/Loss

Sheet Structure

  1. 1. START HERE - Instructions and navigation guide

    • Source document info and generation timestamp
    • Step-by-step audit workflow
    • 1099-B and 1099-DIV box definitions
  2. 2. Reconciliation - Formula-based validation (auditor's first stop)

    • SUMIF formulas calculate totals from transaction detail sheets
    • Compares against expected values from 1099 summary pages
    • Shows ✓ PASS / ✗ FAIL for each field with $0.02 tolerance
    • Overall status: "✓ ALL DATA RECONCILED" or "✗ RECONCILIATION FAILED"
    • Formulas reference sheets '6. Sales Detail' and '7. Dividend Detail'
  3. 3. Annual Summary - High-level totals by 1099 category

    • 1099-B: Proceeds, Cost Basis, Gain/Loss by section (Box A, B, D, E)
    • 1099-DIV: Box 1a, 1b, 7, 11, 12 totals
  4. 4. Monthly Detail - Monthly breakdown for abbreviated reporting

    • Columns: Month, Dividend Income, Interest Income, Total Income, Total Disbursements, Net FMV Change, Foreign Tax Paid
    • Annual totals row
    • Column definitions section
  5. 5. Monthly by Symbol - Monthly breakdown by security symbol

    • Same columns as Monthly Detail with Symbol and Description added
    • Month subtotals with green highlighting
    • Annual totals
  6. 6. Sales Detail - Individual 1099-B transactions

    • Columns: Symbol, Description, CUSIP, Type, Quantity, Date Acquired, Date Sold, Proceeds, Cost Basis, Gain/Loss, 1099-B Section
    • Section values: "Short-Term Box A", "Long-Term Box D", "Unknown Term (Box B/E)"
  7. 7. Dividend Detail - Individual 1099-DIV transactions

    • Columns: Date, Symbol, Description, CUSIP, Corrected, Box 1a Ordinary, Box 1b Qualified, Box 11 Exempt, Box 12 PAB, Box 7 Foreign Tax

Excel Formula References

Reconciliation sheet formulas reference these columns:

  • Sales Detail: Column H=Proceeds, I=Cost Basis, J=Gain/Loss, K=1099-B Section
  • Dividend Detail: Column F=Box 1a, G=Box 1b, H=Box 11, I=Box 12, J=Box 7

Output Filename Convention

Reports are automatically named YYYY-XXXX-Estate-Report.xlsx:

  • YYYY = Tax year extracted from PDF filename
  • XXXX = Last 4 digits of account number

Examples:

  • 2020-Executor-9147-Consolidated-Form-1099.pdf2020-9147-Estate-Report.xlsx
  • 2020-Trust-Under-Agreement-9431-Consolidated-Form-1099.pdf2020-9431-Estate-Report.xlsx
  • Y97-127622_2020-Individual-7622-Consolidated-Form-1099.pdf2020-7622-Estate-Report.xlsx

Resources

scripts/

  • extract_sales.py - Extract and classify 1099-B sales transactions
  • extract_dividends.py - Extract 1099-DIV dividends with x-position mapping
  • generate_estate_report.py - Create 7-sheet Excel report with validation
  • validate_extraction.py - Validate extracted totals against PDF summary pages

references/

  • column_positions.md - X-position boundaries for dividend columns
  • 1099_box_reference.md - IRS 1099-B and 1099-DIV box meanings

Dependencies

pip install pdfplumber openpyxl
Install via CLI
npx skills add https://github.com/spinjob/fidelity-1099-parser --skill fidelity-1099-parser
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
Occupations
More from Creator