crawler-scraper

star 267

Use when adding new scraping targets to the crawler

hiroppy By hiroppy schedule Updated 2/13/2026

name: crawler-scraper description: Use when adding new scraping targets to the crawler

Crawler Scraper Skill

Checklist (MUST complete all)

  • Add URL to packages/meta/src/urls.ts
  • Create scraper function in apps/crawler/src/scrapers/
  • Define types in packages/db/src/types.ts
  • Add repository if new data storage needed
  • Integrate into apps/crawler/src/index.ts
  • Add unit tests for parsers

File Locations

Purpose Location
URLs packages/meta/src/urls.ts
Scrapers apps/crawler/src/scrapers/*.ts
Types packages/db/src/types.ts
Repositories packages/db/src/repositories/
Parsers apps/crawler/src/parsers.ts
Entry point apps/crawler/src/index.ts

Template

// apps/crawler/src/scrapers/my-feature.ts
import type { MyData } from "@mf-dashboard/db/types";
import type { Page } from "playwright";
import { mfUrls } from "@mf-dashboard/meta";
import { debug } from "../logger.js";
import { parseJapaneseNumber } from "../parsers.js";

export async function getMyData(page: Page): Promise<MyData> {
  debug("Getting my data from /path...");

  await page.goto(mfUrls.myFeature, {
    waitUntil: "domcontentloaded",
  });
  await page.waitForTimeout(2000);

  // Scraping logic here...
  const rows = page.locator("table tbody tr");
  const count = await rows.count();

  const results: MyItem[] = [];
  for (let i = 0; i < count; i++) {
    const row = rows.nth(i);
    const text = await row
      .locator("td")
      .first()
      .textContent({ timeout: 1000 })
      .catch(() => "");

    results.push({
      // parsed data
    });
  }

  return { items: results };
}

URL Registration

// packages/meta/src/urls.ts
export const mfUrls = {
  // existing urls...
  myFeature: "https://moneyforward.com/path/to/feature",
};

Testing

Test Types

Type When to Use Location
Unit Service-independent logic (parsers, data transformations) *.test.ts next to source
E2E Requires actual account/service interaction vitest.config.ts e2e project

Rules (MUST follow)

  • NEVER write tests that depend on actual data (personal financial data changes constantly)
  • Unit tests: Use hardcoded mock strings, not real scraped data
  • E2E tests: Only verify page navigation and element existence, not actual values

Running Tests

  • Unit: pnpm --filter @mf-dashboard/crawler test
  • E2E: pnpm --filter @mf-dashboard/crawler test:e2e
  • Local manual testing: SKIP_REFRESH=true pnpm --filter @mf-dashboard/crawler start
  • Debug scripts go in debug/ directory
  • Screenshots saved to debug/ directory

Notes

  • Use parseJapaneseNumber() for Japanese currency format (e.g., "1,234円" → 1234)
  • Use debug() from logger for debug output
  • Handle missing elements gracefully with .catch(() => defaultValue)
  • Always use { timeout: 1000 } for individual element queries to avoid hanging
Install via CLI
npx skills add https://github.com/hiroppy/mf-dashboard --skill crawler-scraper
Repository Details
star Stars 267
call_split Forks 31
navigation Branch main
article Path SKILL.md
More from Creator