ir-financials-extractor

Extract H1 2025 and H2 2024 financial data from any listed company's investor relations website using Playwright browser automation. Use when asked to get half-year financials, interim results, semi-annual reports, or compare H1/H2 periods for publicly traded companies. Handles cookie consent, navigation to IR sections, and derives H2 data from FY minus H1 when no standalone H2 report exists.

By brainqub3 · Updated 12/30/2025

IR Financials Extractor

Extract H1 2025 + H2 2024 financials from any listed company IR site (no login required).

Goal

Given a public listed company website, navigate to Investor Relations, locate:

  • H1 2025 report (or "Interim results 2025 / Half-year 2025")
  • H2 2024 report (or a source that clearly represents the second half of FY 2024)

Then record source metadata and key financial details for both periods in the two tables described under Required Output.

Constraints

  • No logins required
  • ONLY use sources from the official company website - the URL will be provided by the user. Do NOT use third-party financial news sites, aggregators, or external databases. All data must come directly from the company's own IR pages, PDFs, or official announcements hosted on their domain.
  • Be robust to different naming conventions and layouts

Guardrails (MANDATORY)

1. Domain Restriction Enforcement

CRITICAL: You MUST stay within the user-specified domain(s) at all times.

  • Extract the domain from the user-provided URL (e.g., www.example.com from https://www.example.com/path)
  • NEVER navigate to any URL outside the specified domain(s)
  • If a link redirects to an external domain, DO NOT follow it - record it as "External link - not followed" in sources.md
  • Before every navigation, verify the target URL is within the allowed domain(s)
  • Subdomains of the main domain are permitted (e.g., investors.example.com is allowed if example.com was specified)

Domain validation checklist before each navigation:

✓ Is the URL on the user-specified domain or its subdomain?
✓ If not, DO NOT navigate - log the blocked URL and continue
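
The checklist above can be sketched as a small predicate that runs before each navigation. This is a minimal illustration, not part of the skill itself: the function names `normalize_host` and `is_allowed_url` are hypothetical, and stripping a leading `www.` so that sibling subdomains such as `investors.example.com` pass is an assumption of this sketch rather than a rule stated above.

```python
from urllib.parse import urlparse


def normalize_host(host: str) -> str:
    """Lowercase a hostname and drop a leading 'www.' (an assumption of this sketch)."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def is_allowed_url(target_url: str, user_url: str) -> bool:
    """Return True if target_url is on the user-specified domain or one of its subdomains.

    Relative links must be resolved to absolute URLs before calling this.
    """
    target = urlparse(target_url)
    if target.scheme not in ("http", "https"):
        return False  # blocks mailto:, javascript:, file: and unresolved relative links
    target_host = normalize_host(target.hostname or "")
    allowed_host = normalize_host(urlparse(user_url).hostname or "")
    if not target_host or not allowed_host:
        return False
    # Exact match, or a true subdomain. The leading dot stops "evilexample.com"
    # from matching "example.com".
    return target_host == allowed_host or target_host.endswith("." + allowed_host)
```

A blocked URL would then be logged in sources.md as "External link - not followed" instead of being opened.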

2. Tool Usage Restrictions

PROHIBITED TOOLS - DO NOT USE:

  • WebFetch - Never use for any purpose during this skill
  • WebSearch - Never use for any purpose during this skill

REQUIRED: Use Playwright browser automation tools for all data gathering.

All navigation, page reading, interactions, and evidence capture must be performed through Playwright MCP tools. This ensures:

  • All navigation is visible and auditable
  • Domain restrictions can be enforced
  • Screenshots provide verifiable evidence
  • No external sources can be accessed inadvertently

Rationale: WebFetch and WebSearch can access any URL on the internet, bypassing domain restrictions. Playwright browser automation ensures all navigation is constrained to user-approved domains.

3. Screenshot Evidence Requirements

MANDATORY: Every piece of extracted financial data must have screenshot evidence.

  • Take a screenshot of each page containing financial data before extracting values
  • Save ALL screenshots to the designated reports/<company_name>/screenshots/ directory
  • Use descriptive, sequential filenames (e.g., 01-h1-2025-financials.png)

Required screenshot evidence (financial material only):

| Content Type | Screenshot Required |
|---|---|
| H1 2025 results/financials page | ✅ Yes |
| H2 2024 results/financials page | ✅ Yes |
| FY 2024 results page (if used for derivation) | ✅ Yes |
| H1 2024 results page (if used for derivation) | ✅ Yes |
| Any financial data table or summary | ✅ Yes |
| PDF pages containing extracted numbers | ✅ Yes |

Evidence chain: Every value in Table B must be traceable to a specific screenshot file referenced in the Evidence column.

4. Compliance Verification

Before completing the extraction, verify:

  • All URLs in sources.md are within the user-specified domain(s)
  • No WebFetch or WebSearch tools were used
  • Every financial value has a corresponding screenshot in the evidence folder
  • All screenshots are saved in reports/<company_name>/screenshots/
  • Table B Evidence column references specific screenshot filenames
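
One of these checks, that every screenshot referenced in Table B actually exists in the evidence folder, could be automated roughly as follows. This is a hedged sketch: `missing_screenshots` is a hypothetical helper, and it assumes the Evidence column has already been reduced to a list of bare filenames.

```python
from pathlib import Path
from typing import Iterable, List


def missing_screenshots(evidence_filenames: Iterable[str], screenshots_dir: str) -> List[str]:
    """Return the referenced screenshot filenames that are not present in screenshots_dir."""
    directory = Path(screenshots_dir)
    missing = {name for name in evidence_filenames if not (directory / name).is_file()}
    return sorted(missing)
```

An empty result means the evidence chain is complete for the filenames supplied; any names returned should be captured again before the report is finalised.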

Output Directory Structure

All outputs must be saved in an organized directory structure. Before starting extraction, create the required directories.

reports/
└── <company_name>/
    ├── screenshots/
    │   ├── 01-h1-2025-results.png
    │   ├── 02-h1-2025-financials.png
    │   ├── 03-fy-2024-results.png
    │   ├── 04-fy-2024-financials.png
    │   ├── 05-h1-2024-results.png
    │   ├── 06-h1-2024-financials.png
    │   └── ...
    ├── report.md              # Final report with Tables A and B
    └── sources.md             # List of all source URLs used

Directory Setup

At the start of each extraction, run:

mkdir -p reports/<company_name>/screenshots

Replace <company_name> with a sanitized version of the company name (lowercase, hyphens instead of spaces, e.g., "Marks and Spencer" → "marks-and-spencer").
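
The sanitisation and directory setup can also be expressed in Python. This is a sketch under stated assumptions: the helper names are hypothetical, and collapsing punctuation such as "&" into a hyphen goes beyond the "lowercase, hyphens instead of spaces" rule given above.

```python
import re
from pathlib import Path


def sanitize_company_name(name: str) -> str:
    """Lowercase the name and replace each run of non-alphanumerics with one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


def create_output_dirs(company_name: str, root: str = "reports") -> Path:
    """Equivalent of `mkdir -p <root>/<company_name>/screenshots`; returns that path."""
    screenshots = Path(root) / sanitize_company_name(company_name) / "screenshots"
    screenshots.mkdir(parents=True, exist_ok=True)
    return screenshots
```

For example, `sanitize_company_name("Marks and Spencer")` yields `marks-and-spencer`, matching the example above.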

Screenshot Naming Convention

Save screenshots with sequential numbering and descriptive names. Only screenshot pages containing financial data:

  • 01-h1-2025-results.png - H1 2025 results page
  • 02-h1-2025-financials.png - H1 2025 financial highlights
  • 03-fy-2024-results.png - FY 2024 results page
  • 04-fy-2024-financials.png - FY 2024 financial highlights
  • 05-h1-2024-results.png - H1 2024 results page (for derivation)
  • 06-h1-2024-financials.png - H1 2024 financial highlights

Use Playwright screenshot tool with the filename parameter pointing to the appropriate path:

filename: "reports/<company_name>/screenshots/01-h1-2025-results.png"

Screenshot Hygiene Rules

  1. Always save to the correct directory - Never save screenshots to the root or working directory
  2. Use sequential numbering - Prefix with two-digit numbers (01, 02, 03...) to maintain order
  3. Use descriptive names - Include what the screenshot shows (e.g., "h1-2025-financials" not "screenshot3")
  4. Capture financial evidence only - Take screenshots of:
    • Financial highlights/summary tables
    • Results announcement pages with key figures
    • Any page you extract numbers from
  5. PNG format preferred - Use PNG for clarity of text and numbers
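
To keep filenames consistent with rules 1 to 3 and 5, the path passed to the screenshot tool can be built by a small helper. The function name `screenshot_path` and its parameters are hypothetical; the sketch only assembles the string and does not take the screenshot.

```python
from pathlib import PurePosixPath


def screenshot_path(company_slug: str, index: int, label: str) -> str:
    """Build 'reports/<company_slug>/screenshots/NN-<label>.png' with a two-digit prefix."""
    if not 1 <= index <= 99:
        raise ValueError("index must fit the two-digit prefix (1-99)")
    name = f"{index:02d}-{label}.png"
    return str(PurePosixPath("reports") / company_slug / "screenshots" / name)
```

Calling `screenshot_path("unilever", 1, "h1-2025-results")` gives `reports/unilever/screenshots/01-h1-2025-results.png`.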

Inputs (provided at runtime)

  • company_name (string) - Name of the company
  • company_website_url (string, required) - Company's official website URL. This is mandatory as all data must be sourced exclusively from the official company website.

Required Output (must produce two markdown tables)

Table A: Source metadata (columns = periods)

| Field | H1 2025 | H2 2024 |
|---|---|---|
| Document title | | |
| Period covered (exact dates if shown) | | |
| Currency + units (e.g. £m, $m) | | |
| Source URL (final page or PDF) | | |
| Evidence pointer (PDF page + table name, or HTML section heading) | | |
| Notes (reported vs adjusted, any caveats) | | |

Table B: Financials matrix (rows = line items, columns = periods)

| Line item | H1 2025 | H2 2024 | Notes (reported/adjusted/derived) | Evidence (page/section) |
|---|---|---|---|---|
| Revenue | | | | |
| Operating profit (or EBIT) | | | | |
| Profit before tax | | | | |
| Net profit attributable (or profit for period) | | | | |
| EPS basic | | | | |
| Free cash flow (or closest stated proxy) | | | | |
| Net debt / (net cash) | | | | |
| Dividend (declared/paid for period) | | | | |

Workflow

Step 0: Setup output directories

Before any browser navigation, create the output directory structure:

  1. Sanitize the company name (lowercase, replace spaces with hyphens)
  2. Create directories using Bash:
    mkdir -p reports/<company_name>/screenshots
    

Step 1: Reach the official IR area

  1. Navigate to company_website_url

    • The URL must be provided by the user. If not provided, ask the user for the official company website URL before proceeding.
  2. Take a snapshot to understand the page structure

  3. Clear blockers:

    • Accept cookies if a consent banner appears (look for "Accept", "Accept all", "I agree" buttons)
    • Close modals or popups
    • Dismiss region/country prompts
  4. Find "Investors / Investor Relations / Results / Reports" via:

    • Top navigation bar
    • Hamburger/mobile menu (may need to click to expand)
    • Footer links
    • Site search functionality (keywords: investor, results, reports, financial)
  5. Click on the IR link to navigate to the Investor Relations section

Success check: You land on an IR section showing Results, Reports, Announcements, or Presentations.

Step 2: Find H1 2025

Within IR, look under sections like:

  • "Results & presentations"
  • "Reports"
  • "Financial results"
  • "Announcements"
  • "Regulatory news"

Scan link titles for H1 2025 indicators:

  • "H1 2025"
  • "Half-year 2025"
  • "Interim results 2025"
  • "Six months ended [date] 2025"
  • "Interim report 2025"
  • "1H 2025"
  • "First half 2025"
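
The indicators above can be turned into a rough title filter. This is an illustrative sketch only: `looks_like_h1_2025` is a hypothetical name, the patterns are one possible encoding of the list, and a match only marks a link as a candidate. A title such as "Six months ended [date] 2025" still needs checking against the company's fiscal year.

```python
import re

# One regex per indicator family listed above (case-insensitive).
H1_2025_PATTERNS = [
    r"\bH1\s*2025\b",
    r"\b1H\s*2025\b",
    r"\bhalf[-\s]?year\b.*\b2025\b",
    r"\bfirst\s+half\b.*\b2025\b",
    r"\binterim\s+(results?|reports?)\b.*\b2025\b",
    r"\bsix\s+months\s+ended\b.*\b2025\b",
]
_H1_2025_RE = re.compile("|".join(H1_2025_PATTERNS), re.IGNORECASE)


def looks_like_h1_2025(title: str) -> bool:
    """Return True if a link title matches any of the H1 2025 indicators."""
    return _H1_2025_RE.search(title) is not None
```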

Open the most authoritative source:

  • Prefer a results release/announcement + any linked PDF
  • Capture the final URL of the page/PDF used
  • Take screenshots of financial data:
    filename: "reports/<company_name>/screenshots/01-h1-2025-results.png"
    filename: "reports/<company_name>/screenshots/02-h1-2025-financials.png"
    

Step 3: Find H2 2024 (handle the common "no standalone H2" case)

First, try to find an explicit H2/second-half document:

  • "H2 2024"
  • "Second half 2024"
  • "Six months ended [date] 2024" (corresponding to H2 of their fiscal year)
  • "2H 2024"

If no standalone H2 exists, use the derivation fallback:

  1. Find FY 2024 results (full-year results announcement or annual report)

    • Take screenshot: reports/<company_name>/screenshots/03-fy-2024-results.png
    • Take screenshot of financials: reports/<company_name>/screenshots/04-fy-2024-financials.png
  2. Find H1 2024 interim results

    • Take screenshot: reports/<company_name>/screenshots/05-h1-2024-results.png
    • Take screenshot of financials: reports/<company_name>/screenshots/06-h1-2024-financials.png
  3. Derive additive H2 metrics using formula:

    H2 2024 = FY 2024 - H1 2024
    
  4. In Table A Notes and Table B Notes, clearly mark values as Derived and cite both FY and H1 sources

Important: Do NOT derive balance-sheet endpoints (like net debt) by subtraction. Balance-sheet items are point-in-time, not cumulative, so record an H2 value only when the document explicitly states the end-of-period figure.
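
The derivation rule can be sketched as a function that subtracts only for flow items and refuses everything else. Which line items count as additive is an assumption of this sketch: EPS is left out here because it depends on the weighted share count of each period, so FY minus H1 is only approximate. The names `ADDITIVE_ITEMS` and `derive_h2` are hypothetical, and both inputs are assumed to be in the same currency and units.

```python
from decimal import Decimal
from typing import Optional

# Period-cumulative (flow) items treated as additive in this sketch.
ADDITIVE_ITEMS = {
    "Revenue",
    "Operating profit",
    "Profit before tax",
    "Net profit attributable",
    "Free cash flow",
}


def derive_h2(line_item: str, fy_value: str, h1_value: str) -> Optional[Decimal]:
    """Return FY minus H1 for additive items, or None if the item must not be derived."""
    if line_item not in ADDITIVE_ITEMS:
        return None  # e.g. net debt: point-in-time, not cumulative
    # Decimal avoids binary floating-point noise in reported figures.
    return Decimal(fy_value) - Decimal(h1_value)
```

With made-up figures, `derive_h2("Revenue", "1250.5", "600.2")` returns `Decimal("650.3")`, while `derive_h2("Net debt", "300", "280")` returns `None`.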

Step 4: Extract numbers with evidence

For each period (or derived period):

  1. Identify the clearest source:

    • "Financial highlights" section
    • Consolidated income statement
    • Key figures / at a glance section
    • Results summary table
  2. For each line item, record:

    • Value (the actual number)
    • Currency and units (e.g., £m, $bn, EUR millions)
    • Type: Whether it is reported or adjusted/underlying (do not mix without labelling)
  3. For every captured number, store an evidence pointer:

    • PDF: page number + table name (e.g., "p.3 Financial highlights")
    • HTML: section heading + nearby table label (e.g., "Key figures section, Results summary table")
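
The fields recorded per number could be held in a small record type that rejects unlabelled or unevidenced values. This is a sketch with hypothetical names (`ExtractedValue`, `basis`); the three allowed labels mirror the Notes column of Table B.

```python
from dataclasses import dataclass

ALLOWED_BASES = {"reported", "adjusted", "derived"}


@dataclass(frozen=True)
class ExtractedValue:
    line_item: str   # e.g. "Revenue"
    period: str      # e.g. "H1 2025"
    value: str       # kept as printed in the source to avoid rounding changes
    units: str       # e.g. "£m"
    basis: str       # one of ALLOWED_BASES
    evidence: str    # e.g. "p.3 Financial highlights"

    def __post_init__(self) -> None:
        # Refuse values that are not labelled or have no evidence pointer.
        if self.basis not in ALLOWED_BASES:
            raise ValueError(f"basis must be one of {sorted(ALLOWED_BASES)}")
        if not self.evidence.strip():
            raise ValueError("every value needs an evidence pointer")
```

Constructing a record with an empty `evidence` string raises `ValueError`, which surfaces a missing pointer at capture time rather than when the report is compiled.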

Step 5: Compile and save the output

  1. Create Table A with all source metadata
  2. Create Table B with all financial line items
  3. Ensure every cell has proper evidence pointers
  4. Mark derived values clearly with "Derived: FY 2024 - H1 2024" in the Notes column
  5. If any line items couldn't be found, note "Not disclosed" or "Not found"
  6. Save the final report to reports/<company_name>/report.md
  7. Save a sources file listing all URLs used to reports/<company_name>/sources.md

The final report.md should include:

  • Company name and extraction date
  • Table A (Source metadata)
  • Table B (Financials matrix)
  • Reference to screenshots folder for evidence

Browser Navigation Tips

Cookie Consent Patterns

Common button texts to look for:

  • "Accept", "Accept all", "Accept cookies"
  • "I agree", "Agree", "OK"
  • "Allow", "Allow all"
  • "Got it", "Understood"
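
A simple way to recognise these buttons is an exact match on normalised text. This sketch uses hypothetical names (`CONSENT_LABELS`, `is_consent_button`) and deliberately avoids substring matching, so a label like "Reject all" or "Cookie settings" is not mistaken for consent.

```python
CONSENT_LABELS = {
    "accept", "accept all", "accept cookies",
    "i agree", "agree", "ok",
    "allow", "allow all",
    "got it", "understood",
}


def is_consent_button(text: str) -> bool:
    """Return True if the button text, lowercased and whitespace-collapsed, is a known label."""
    normalized = " ".join(text.lower().split())
    return normalized in CONSENT_LABELS
```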

Finding IR Links

Common navigation patterns:

  • Top nav: "Investors", "Investor Relations", "Shareholders"
  • Footer: "For Investors", "Investor Information"
  • May be under "About" or "Company" dropdown
  • Corporate sites often have separate IR subdomain (e.g., investors.company.com)

Results Page Patterns

Look for:

  • Tabbed interfaces (Results, Reports, Presentations)
  • Year filters or dropdowns
  • Date-sorted list of announcements
  • "Archive" sections for older reports

PDF Handling

When you encounter a PDF:

  • Note the URL for the Source URL field
  • If the PDF viewer is embedded, try to identify page numbers
  • Look for "Financial highlights" typically on pages 1-5
  • Income statement/P&L usually in first 10-15 pages

Example Usage

User: "Extract H1 2025 and H2 2024 financials for Unilever from https://www.unilever.com"

Expected workflow:

  1. Create directories: mkdir -p reports/unilever/screenshots
  2. Navigate to unilever.com
  3. Find and click Investors link
  4. Navigate to Results section
  5. Locate H1 2025 interim results, take screenshots → 01-h1-2025-results.png, 02-h1-2025-financials.png
  6. Locate FY 2024 and H1 2024 (to derive H2 2024), take screenshots → 03-fy-2024-results.png, 04-fy-2024-financials.png, 05-h1-2024-results.png, 06-h1-2024-financials.png
  7. Extract all financial metrics with evidence
  8. Save reports/unilever/report.md with Tables A and B
  9. Save reports/unilever/sources.md with all source URLs

Error Handling

  • If no URL provided: Ask the user to provide the official company website URL before proceeding. Do NOT search for or guess URLs.
  • If H1 2025 not yet published: Report "H1 2025 results not yet available as of [date]"
  • If company uses different fiscal year: Note the fiscal year end date and adjust period names accordingly
  • If no IR section found on the provided website: Report the issue to the user rather than searching external sources
  • If page requires login: Report "Login required - cannot access without authentication"
  • If numbers appear in multiple currencies: Note primary reporting currency and any conversions

Install via CLI

npx skills add https://github.com/brainqub3/os_browser_automation_skills --skill ir-financials-extractor