fix-scraper

Stars: 121

Diagnose and fix a broken scraper mapping. Use when extraction tests fail, a scraper returns wrong data, or a website has changed its HTML structure.

By RealEstateWebTools. Updated 2/17/2026

name: fix-scraper
description: Diagnose and fix a broken scraper mapping. Use when extraction tests fail, a scraper returns wrong data, or a website has changed its HTML structure.
argument-hint: [scraper-name]

Fix a Broken Scraper Mapping

You are helping the user diagnose and fix a broken scraper mapping. This follows the workflow in astro-app/docs/scraper-maintenance-guide.md.

Inputs

$ARGUMENTS should be a scraper name (e.g. rightmove, idealista). If not provided, ask the user which scraper is broken, or run the full test suite to find failures.

Step-by-step workflow

Phase 1: Identify the problem

  1. Run the scraper's validation tests to see which fields are failing:

    cd astro-app && npx vitest run test/lib/scraper-validation.test.ts -t "<scraper_name>"
    
  2. If no scraper name was given, run the full validation suite and identify which scrapers have failures:

    cd astro-app && npx vitest run test/lib/scraper-validation.test.ts
    
  3. Read the test output and identify the specific fields that are failing. Note the expected vs actual values.
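
Noting expected versus actual values per field can be sketched as a small comparison helper. This is a minimal illustration, not the project's test code: the `FieldMap` shape, the `diffFields` helper, and the sample field names are assumptions made for the example.

```typescript
// Hypothetical shapes: the real manifest and extractor output may differ.
type FieldValue = string | number | boolean | null;
type FieldMap = Record<string, FieldValue>;

interface FieldMismatch {
  field: string;
  expected: FieldValue | undefined;
  actual: FieldValue | undefined;
}

// Return one entry per field whose extracted value differs from the expected one.
function diffFields(expected: FieldMap, actual: FieldMap): FieldMismatch[] {
  const mismatches: FieldMismatch[] = [];
  for (const field of Object.keys(expected)) {
    // Strict comparison on purpose: "2" versus 2 is a type mismatch worth reporting.
    if (actual[field] !== expected[field]) {
      mismatches.push({ field, expected: expected[field], actual: actual[field] });
    }
  }
  return mismatches;
}

// Illustrative values only.
const expectedFields: FieldMap = { title: "2 bed flat", count_bedrooms: 2, price_float: 250000 };
const actualFields: FieldMap = { title: "2 bed flat", count_bedrooms: "2", price_float: 0 };
const failingFields = diffFields(expectedFields, actualFields);
```

Here `failingFields` lists `count_bedrooms` (right value, wrong type) and `price_float` (a zero, which usually points to a selector that matched nothing).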

Phase 2: Diagnose root cause

  1. Read the scraper mapping at config/scraper_mappings/<name>.json. For each failing field, identify the extraction strategy (cssLocator, scriptRegEx, urlPathPart, scriptJsonPath, jsonLdPath).

  2. Read the HTML fixture at astro-app/test/fixtures/<name>.html. Search for the DOM elements the mapping targets.

  3. Check the manifest at astro-app/test/fixtures/manifest.ts to understand expected values.

  4. Identify the root cause using this checklist:

    | Symptom | Likely cause |
    | --- | --- |
    | Field returns 0 or empty string | CSS selector doesn't match any element |
    | Field returns concatenated garbage | Selector matches multiple elements, missing cssCountId |
    | Field is wrong type (string vs int) | Field defined in two sections, last one wins |
    | splitTextCharacter produces wrong result | Cheerio normalizes <br> to \n, not \r |
    | scriptRegEx returns empty | Pattern doesn't match any script tag content |
    | stripString doesn't strip enough | Only removes one exact substring occurrence |
    | JSON-LD / scriptJsonPath returns empty | Structure changed or variable name changed |
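
Three of the symptoms in the checklist can be reproduced with plain string operations. The sketch below does not call the project's extractor or Cheerio. It only mimics the behaviour the checklist describes, and the sample strings are invented for illustration.

```typescript
// 1. A selector that matches several elements: their text runs together.
//    (The array stands in for the text of each matched element.)
const matchedTexts = ["3 beds", "2 baths", "1 reception"];
const concatenatedText = matchedTexts.join(""); // "3 beds2 baths1 reception"
const firstMatchOnly = matchedTexts[0];          // picking one index avoids the garbage

// 2. Splitting on "\r" when the text actually contains "\n" leaves it unsplit.
const rawAddress = "12 High Street\nLondon";
const splitOnCr = rawAddress.split("\r"); // one element: the whole string
const splitOnLf = rawAddress.split("\n"); // ["12 High Street", "London"]

// 3. A single exact-substring removal strips only the first occurrence.
//    Assumption: this mirrors the "one occurrence" behaviour in the checklist.
const rawPrice = "£1,250,000";
const strippedOnce = rawPrice.replace(",", "");   // "£1250,000"
const strippedAll = rawPrice.split(",").join(""); // "£1250000"
```

If a field looks like `concatenatedText` or `strippedOnce`, the matching row of the checklist is the likely cause.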

Phase 3: Fix

  1. Edit the mapping JSON at config/scraper_mappings/<name>.json. Only change the specific fields that are broken. Common fixes:

    • Update CSS selectors to match current DOM structure
    • Add cssCountId when selector matches multiple elements
    • Fix splitTextCharacter (use \n not \r)
    • Update scriptRegEx pattern
    • Add fallback strategies via fallbacks array
    • Move field to correct section if type is wrong
  2. Update expected values in astro-app/test/fixtures/manifest.ts if the site content has legitimately changed.

  3. If the fixture HTML is outdated, offer to recapture it:

    cd astro-app
    npm run capture-fixture -- --file <new_html_file> --url <source_url> --name <scraper_name> --force
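
The idea behind the fallbacks fix in step 1 (try the primary strategy, then each fallback until one yields a value) can be sketched as follows. This is an assumption-laden illustration, not the project's implementation: `Strategy`, `FieldMapping`, `extractWithFallbacks`, and `byRegex` are hypothetical names, and real mappings are JSON entries, not functions.

```typescript
// Hypothetical model: a "strategy" is just a function from HTML to a string.
type Strategy = (html: string) => string;

interface FieldMapping {
  primary: Strategy;
  fallbacks?: Strategy[];
}

// Try the primary strategy first, then each fallback, and stop at the first non-empty value.
function extractWithFallbacks(html: string, mapping: FieldMapping): string {
  const chain = [mapping.primary, ...(mapping.fallbacks ?? [])];
  for (const strategy of chain) {
    const value = strategy(html).trim();
    if (value !== "") return value;
  }
  return "";
}

// Regex-based stand-in for an extraction strategy (illustrative only).
const byRegex = (pattern: RegExp): Strategy => (html) => {
  const match = pattern.exec(html);
  return match?.[1] ?? "";
};

const listingHtml = '<h1 class="listing-title">Flat in Leeds</h1>';
const titleMapping: FieldMapping = {
  // The old class name no longer matches, so the primary returns "".
  primary: byRegex(/<h1 class="property-title">([^<]*)<\/h1>/),
  fallbacks: [byRegex(/<h1[^>]*>([^<]*)<\/h1>/)],
};
const extractedTitle = extractWithFallbacks(listingHtml, titleMapping);
```

A looser fallback keeps the field populated when a site renames a class, at the cost of being less precise, so it is worth re-running the validation tests after adding one.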
    

Phase 4: Verify

  1. Run the specific scraper's tests:

    cd astro-app && npx vitest run test/lib/scraper-validation.test.ts -t "<scraper_name>"
    
  2. Run the full test suite to check for regressions:

    cd astro-app && npx vitest run
    
  3. Present a summary of changes made and offer to commit.

Key references

  • Mapping format: DESIGN.md (Scraper Mapping Schema section)
  • Extraction pipeline order: defaultValues -> images -> features -> intFields -> floatFields -> textFields -> booleanFields
  • Maintenance guide: astro-app/docs/scraper-maintenance-guide.md
  • All strategies: cssLocator, scriptRegEx, urlPathPart, scriptJsonPath, scriptJsonVar, jsonLdPath, jsonLdType, flightDataPath
  • Post-processing: cssAttr, xmlAttr, cssCountId, splitTextCharacter, splitTextArrayId, stripString, stripPunct, stripFirstChar
  • Fallback chains: fallbacks array on any field mapping
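
The pipeline order explains the "last one wins" symptom: a field defined in two sections takes the type of whichever section runs later. The sketch below assumes a simplified pipeline that covers only the four typed sections and receives already-extracted raw strings. `runPipeline`, the converters, and the sample field are illustrative, not the project's code.

```typescript
type Extracted = Record<string, string | number | boolean>;

interface Section {
  name: string;
  convert: (raw: string) => string | number | boolean;
}

// Typed sections in the order given in the key references (converters are assumptions).
const typedSections: Section[] = [
  { name: "intFields", convert: (raw) => parseInt(raw, 10) || 0 },
  { name: "floatFields", convert: (raw) => parseFloat(raw) || 0 },
  { name: "textFields", convert: (raw) => raw.trim() },
  { name: "booleanFields", convert: (raw) => raw.trim().toLowerCase() === "true" },
];

// Hypothetical input: section name -> field name -> raw extracted text.
function runPipeline(mapping: Record<string, Record<string, string>>): Extracted {
  const result: Extracted = {};
  for (const section of typedSections) {
    const fields = mapping[section.name] ?? {};
    for (const [field, raw] of Object.entries(fields)) {
      // A later section silently overwrites the value written by an earlier one.
      result[field] = section.convert(raw);
    }
  }
  return result;
}

// count_bedrooms is defined in both intFields and textFields.
const pipelineResult = runPipeline({
  intFields: { count_bedrooms: "3" },
  textFields: { count_bedrooms: "3" },
});
```

In this sketch `pipelineResult.count_bedrooms` ends up as the string "3" because textFields runs after intFields. Removing the duplicate from the later section restores the integer.
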
Install via CLI
npx skills add https://github.com/RealEstateWebTools/property_web_scraper --skill fix-scraper
Repository Details
Forks: 24
Branch: main
Path: SKILL.md