name: job-scraper
description: >
  Scrapes Danish job sites for new positions matching your profile.
  Deduplicates across runs. Triggers on: job scrape, find jobs, search jobs,
  new jobs, job search, scrape jobs, /scrape
allowed-tools: Read, Write, Edit, Glob, Grep, WebFetch, WebSearch, Agent, AskUserQuestion
Job Scraper
How It Works
This skill searches multiple Danish job sites using targeted queries based on your profile, deduplicates against previously seen jobs and the application tracker, and presents new matches with a quick fit assessment.
Invocation
The user triggers this skill by saying things like:
- "Find new jobs"
- "Scrape for jobs"
- "Any new positions?"
- "/scrape"
Optional arguments:
- A focus area, e.g. "/scrape data science" or "/scrape geophysics"
- "broad" to run all search categories, e.g. "/scrape broad"
Execution Steps
Step 0: Load State
- Read job_scraper/seen_jobs.json (create if missing - start with {"seen": {}})
- Read job_search_tracker.csv to extract already-applied companies+roles
- Read search-queries.md (this directory) for the search strategy
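A minimal Python sketch of the state-loading step, for illustration only. The function names are hypothetical, and the tracker column names `company` and `role` are assumptions; adjust them to match the actual header row of job_search_tracker.csv.

```python
import csv
import json
from pathlib import Path

EMPTY_STATE = {"seen": {}}


def load_seen(path: Path) -> dict:
    """Return the seen-jobs state, creating the file with an empty state if missing."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(EMPTY_STATE, indent=2), encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))


def load_applied(path: Path) -> set[tuple[str, str]]:
    """Return normalized (company, role) pairs from the tracker CSV."""
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as fh:
        # "company" and "role" are assumed column names, not taken from the tracker.
        return {
            (row["company"].strip().lower(), row["role"].strip().lower())
            for row in csv.DictReader(fh)
            if row.get("company") and row.get("role")
        }
```

Lowercasing and trimming both fields makes the later comparison against scraped postings tolerant of small formatting differences.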
Step 1: Search
Run WebSearch queries from search-queries.md. By default, run the top 3 priority categories. If the user said "broad", run all categories.
If the user specified a focus area (e.g. "data science"), prioritize queries from that category.
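The category selection described above could be sketched as follows. This is one possible interpretation, not a fixed rule: it assumes the categories from search-queries.md are already available as a list ordered by priority, and it reads "prioritize" as "move matching categories to the front". The function name and the sample category names are hypothetical.

```python
def select_categories(categories: list[str], args: str = "", default_count: int = 3) -> list[str]:
    """Pick the categories to search; `categories` is ordered highest priority first."""
    arg = args.strip().lower()
    if arg == "broad":
        # "broad" runs every category.
        return list(categories)
    if arg:
        # A focus area moves matching categories to the front of the default selection.
        focused = [c for c in categories if arg in c.lower()]
        rest = [c for c in categories if c not in focused]
        return (focused + rest)[: max(default_count, len(focused))]
    # No argument: top priority categories only.
    return categories[:default_count]
```

For example, with categories `["geophysics", "data science", "energy", "consulting"]`, the argument `"consulting"` would search consulting first, followed by the two highest-priority remaining categories.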
For each search:
- Use WebSearch with site-specific queries (jobindex.dk, linkedin.com/jobs, karriere.dk, etc.)
- Target your configured geographic area
- Look for postings from the last 14 days
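As an illustration of the points above, a small Python sketch that builds one query per site and checks the 14-day window. The `site:` operator is a common search-engine convention and is assumed, not confirmed, to work with WebSearch; the function names and the example area are hypothetical.

```python
from datetime import date, timedelta

# Sites named in this skill; extend as needed.
SITES = ("jobindex.dk", "linkedin.com/jobs", "karriere.dk")


def build_queries(keywords: str, area: str, sites: tuple[str, ...] = SITES) -> list[str]:
    """Build one site-restricted query per job site."""
    return [f'site:{site} "{keywords}" {area}' for site in sites]


def is_recent(posted: date, today: date, max_age_days: int = 14) -> bool:
    """True if the posting date falls within the last `max_age_days` days."""
    return timedelta(0) <= today - posted <= timedelta(days=max_age_days)
```

Calling `build_queries("data scientist", "Copenhagen")` would yield three queries, the first being `site:jobindex.dk "data scientist" Copenhagen`.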
Step 2: Fetch & Parse
For each promising result from Step 1:
- Use WebFetch to retrieve the job posting page
- Extract: job title, company, location, posting date (or "recent"), URL, key requirements (brief), application deadline (if listed)
- Skip if the URL or company+title combo already exists in seen_jobs.json
- Skip if the company+role already appears in job_search_tracker.csv
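The two skip checks above can be sketched in Python as follows. This is a sketch under stated assumptions: `seen` is the inner "seen" mapping from seen_jobs.json, `applied` is a set of normalized (company, role) pairs taken from the tracker, and the `company|title` key format and helper names are hypothetical.

```python
import re


def norm(text: str) -> str:
    """Collapse whitespace and lowercase so near-identical strings compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def company_title_key(company: str, title: str) -> str:
    """Fallback dedup key for postings whose URL changed or is missing."""
    return f"{norm(company)}|{norm(title)}"


def already_known(job: dict, seen: dict, applied: set[tuple[str, str]]) -> bool:
    """True if the job matches a seen entry (by URL or company+title) or a tracker row."""
    key = company_title_key(job["company"], job["title"])
    in_seen = (
        job["url"] in seen
        or key in seen
        or any(company_title_key(e["company"], e["title"]) == key for e in seen.values())
    )
    in_tracker = (norm(job["company"]), norm(job["title"])) in applied
    return in_seen or in_tracker
```

Checking company+title in addition to the URL catches the same posting republished on a second site under a different link.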
Step 3: Quick Fit Assessment
For each new job, do a rapid fit check (NOT the full evaluation from 04-job-evaluation.md - just a quick signal):
- High match: Role directly involves your core skills
- Medium match: Role is adjacent to your experience
- Low match: Role requires significant skills you lack
Step 4: Deduplicate & Store
- Add ALL fetched jobs (new and skipped) to seen_jobs.json with structure:
{
  "seen": {
    "<url_or_company_title_key>": {
      "title": "...",
      "company": "...",
      "url": "...",
      "first_seen": "YYYY-MM-DD",
      "fit": "high/medium/low",
      "status": "new/skipped/evaluated"
    }
  }
}
- Only present jobs that were NOT in the seen list or tracker before this run (jobs added to seen_jobs.json during this run still count as new).
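One way to write an entry in the structure shown above, as a hedged Python sketch. The function name is hypothetical, and two choices are assumptions rather than requirements of this skill: the URL is preferred as the key, and an existing entry is left untouched so that `first_seen` keeps its original date.

```python
from datetime import date
from typing import Optional


def record_job(state: dict, job: dict, fit: str, status: str, today: Optional[date] = None) -> bool:
    """Add a job to state["seen"]; return True only if it was not recorded before."""
    if fit not in {"high", "medium", "low"}:
        raise ValueError(f"unknown fit: {fit}")
    if status not in {"new", "skipped", "evaluated"}:
        raise ValueError(f"unknown status: {status}")
    # Prefer the URL as key; fall back to a company|title key (assumed format).
    key = job.get("url") or f'{job["company"]}|{job["title"]}'
    seen = state.setdefault("seen", {})
    if key in seen:
        return False  # Already recorded: keep the original first_seen date.
    seen[key] = {
        "title": job["title"],
        "company": job["company"],
        "url": job.get("url", ""),
        "first_seen": (today or date.today()).isoformat(),
        "fit": fit,
        "status": status,
    }
    return True
```

The boolean return value gives a simple way to collect the jobs that are new in this run before presenting them.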
Step 5: Present Results
Present new jobs in a table sorted by fit (high first):
## New Job Matches - YYYY-MM-DD
Found X new positions (Y high, Z medium, W low match).
| # | Fit | Title | Company | Location | Deadline | URL |
|---|-----|-------|---------|----------|----------|-----|
| 1 | High | ... | ... | ... | ... | [Link](...) |
### High-Match Highlights
For each high-match job, add 2-3 bullet points:
- Why it matches your profile
- Key requirements to check
- Any red flags
After presenting, ask:
"Want me to evaluate any of these in detail? Just give me the number(s)."
If the user picks a number, invoke the job-application-assistant skill workflow (fit evaluation first, then CV + cover letter if approved).
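The header and table portion of the results template in this step could be rendered with a sketch like the one below. It assumes each job is a dict with `fit`, `title`, `company`, `location`, `url`, and an optional `deadline`; these field names and the function name are hypothetical.

```python
FIT_ORDER = {"high": 0, "medium": 1, "low": 2}


def render_results(jobs: list[dict], today: str) -> str:
    """Render the results header and table, sorted with high-fit jobs first."""
    ordered = sorted(jobs, key=lambda j: FIT_ORDER[j["fit"]])
    counts = {fit: sum(j["fit"] == fit for j in jobs) for fit in FIT_ORDER}
    lines = [
        f"## New Job Matches - {today}",
        f"Found {len(jobs)} new positions ({counts['high']} high, "
        f"{counts['medium']} medium, {counts['low']} low match).",
        "| # | Fit | Title | Company | Location | Deadline | URL |",
        "|---|-----|-------|---------|----------|----------|-----|",
    ]
    for i, j in enumerate(ordered, 1):
        lines.append(
            f"| {i} | {j['fit'].capitalize()} | {j['title']} | {j['company']} | "
            f"{j['location']} | {j.get('deadline') or '-'} | [Link]({j['url']}) |"
        )
    return "\n".join(lines)
```

Because Python's sort is stable, jobs with the same fit keep the order in which they were found. The row numbers follow the sorted order, so they match what the user sees when picking a job to evaluate.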
Step 6: Update Tracker (Optional)
If the user decides to apply to any job, add a row to job_search_tracker.csv.
Important Rules
- Never fabricate job postings. Only present jobs found via actual WebSearch/WebFetch results.
- Respect deduplication. Always check seen_jobs.json AND job_search_tracker.csv before presenting.
- Focus on the configured geographic area. Skip jobs that require relocation or are clearly outside commute range.
- Only open positions. Skip postings with expired deadlines or those marked as closed.
- Be efficient with WebFetch. Don't fetch every search result - use titles and snippets to pre-filter before fetching.
- Parallel searches. Use the Agent tool or parallel WebSearch calls to speed up the search phase.
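Two of the rules above, "only open positions" and "pre-filter before fetching", lend themselves to a short Python sketch. The helper names are hypothetical, and two behaviors are assumptions: a posting with no listed deadline is treated as open, and a deadline that falls on today still counts as open.

```python
from datetime import date
from typing import Optional


def is_open(deadline: Optional[date], today: date, closed: bool = False) -> bool:
    """True if the posting is not marked closed and its deadline has not passed."""
    if closed:
        return False
    # No listed deadline is treated as open (assumption).
    return deadline is None or deadline >= today


def worth_fetching(title: str, snippet: str, keywords: list[str]) -> bool:
    """Cheap pre-filter: fetch only results whose title or snippet mentions a keyword."""
    text = f"{title} {snippet}".lower()
    return any(k.lower() in text for k in keywords)
```

A plain substring match like this is deliberately crude; it only saves WebFetch calls on obviously irrelevant results and does not replace the fit assessment.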