name: crawler-scraper
description: Use when adding new scraping targets to the crawler
Crawler Scraper Skill
Checklist (MUST complete all)
File Locations
| Purpose |
Location |
| URLs |
packages/meta/src/urls.ts |
| Scrapers |
apps/crawler/src/scrapers/*.ts |
| Types |
packages/db/src/types.ts |
| Repositories |
packages/db/src/repositories/ |
| Parsers |
apps/crawler/src/parsers.ts |
| Entry point |
apps/crawler/src/index.ts |
Template
// apps/crawler/src/scrapers/my-feature.ts
import type { MyData } from "@mf-dashboard/db/types";
import type { Page } from "playwright";
import { mfUrls } from "@mf-dashboard/meta";
import { debug } from "../logger.js";
import { parseJapaneseNumber } from "../parsers.js";
export async function getMyData(page: Page): Promise<MyData> {
debug("Getting my data from /path...");
await page.goto(mfUrls.myFeature, {
waitUntil: "domcontentloaded",
});
await page.waitForTimeout(2000);
// Scraping logic here...
const rows = page.locator("table tbody tr");
const count = await rows.count();
const results: MyItem[] = [];
for (let i = 0; i < count; i++) {
const row = rows.nth(i);
const text = await row
.locator("td")
.first()
.textContent({ timeout: 1000 })
.catch(() => "");
results.push({
// parsed data
});
}
return { items: results };
}
URL Registration
// packages/meta/src/urls.ts
export const mfUrls = {
// existing urls...
myFeature: "https://moneyforward.com/path/to/feature",
};
Testing
Test Types
| Type |
When to Use |
Location |
| Unit |
Service-independent logic (parsers, data transformations) |
*.test.ts next to source |
| E2E |
Requires actual account/service interaction |
vitest.config.ts e2e project |
Rules (MUST follow)
- NEVER write tests that depend on actual data (personal financial data changes constantly)
- Unit tests: Use hardcoded mock strings, not real scraped data
- E2E tests: Only verify page navigation and element existence, not actual values
Running Tests
- Unit:
pnpm --filter @mf-dashboard/crawler test
- E2E:
pnpm --filter @mf-dashboard/crawler test:e2e
- Local manual testing:
SKIP_REFRESH=true pnpm --filter @mf-dashboard/crawler start
- Debug scripts go in
debug/ directory
- Screenshots saved to
debug/ directory
Notes
- Use
parseJapaneseNumber() for Japanese currency format (e.g., "1,234円" → 1234)
- Use
debug() from logger for debug output
- Handle missing elements gracefully with
.catch(() => defaultValue)
- Always use
{ timeout: 1000 } for individual element queries to avoid hanging