name: cv-import
description: "Extract a PDF/DOCX CV, enhance it with LLM, and transform it into the qwik-resume-editor JSON schema (Resume v2.0.0). Uses markitdown if available, falls back to LLM extraction."
allowed-tools: [ Read, Write, Bash ]
CV Import — PDF to Resume JSON
Import any CV/resume PDF into the qwik-resume-editor JSON format in three stages: Extract → Enhance → Transform.
Usage
/cv-import /path/to/cv.pdf # writes src/data/default-resume.json (default)
/cv-import /path/to/cv.pdf --output resume.json # write to a custom file instead
Stage 0 — Validate Input
- Confirm the file path was provided. If not, ask the user:
  "Please provide the path to your CV PDF."
- Check the file exists:
  test -f "<filepath>" && echo "exists" || echo "not found"
- Accept .pdf, .docx, .doc, .txt. Warn (but proceed) for other extensions.
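The three checks above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the function name `validate_input` and the exact wording of the "not found" and warning messages are assumptions.

```python
from pathlib import Path
from typing import Tuple

# Extensions the skill accepts without a warning.
ACCEPTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt"}


def validate_input(filepath: str) -> Tuple[bool, str]:
    """Return (ok, message) for a candidate CV path (hypothetical helper)."""
    if not filepath:
        return False, "Please provide the path to your CV PDF."
    path = Path(filepath).expanduser()
    if not path.is_file():  # same check as `test -f`
        return False, f"File not found: {path}"
    if path.suffix.lower() not in ACCEPTED_EXTENSIONS:
        # Unknown extension: warn, but let the import continue.
        return True, f"Warning: unexpected extension '{path.suffix}', proceeding anyway."
    return True, "ok"
```

A caller would stop and ask the user again when `ok` is false, and print the message but continue when it starts with "Warning".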
Stage 1 — Extract Text
1a. Check for markitdown
markitdown --version 2>/dev/null && echo "MARKITDOWN_OK" || echo "MARKITDOWN_ABSENT"
1b-A. If markitdown is available
markitdown "<filepath>"
Capture the full stdout as the extracted markdown. markitdown handles multi-column PDFs and strips headers/footers well.
1b-B. If markitdown is absent — LLM extraction
Use the Read tool to open the PDF directly. Claude can read PDF files natively. If that fails (non-text PDF), instruct
the user:
markitdown is not installed. Install it for best results:
pip install markitdown
# or: pipx install markitdown
Falling back to native PDF reading via Claude…
After reading, extract all visible text, preserving approximate section order. Do not summarise at this stage — get everything.
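The Stage 1 decision (try markitdown, otherwise fall back) could look roughly like this in Python. It is a sketch under assumptions: `extract_with_markitdown` and its `command` parameter are illustrative names, and a return value of `None` is simply this sketch's way of signalling "use the 1b-B fallback".

```python
import shutil
import subprocess
from typing import Optional


def extract_with_markitdown(filepath: str, command: str = "markitdown") -> Optional[str]:
    """Return markdown from the markitdown CLI, or None if the caller should fall back."""
    if shutil.which(command) is None:
        return None  # markitdown absent: use the native-read path (1b-B)
    result = subprocess.run([command, filepath], capture_output=True, text=True)
    if result.returncode != 0:
        # Report the error message, then fall back to LLM extraction.
        print(f"markitdown failed: {result.stderr.strip()}")
        return None
    return result.stdout  # full stdout is the extracted markdown
```

Using `shutil.which` avoids spawning a process just to detect the tool; the shell check in 1a does the same job from the command line.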
Stage 2 — Enhance
After extraction, use LLM reasoning to:
- Identify sections: find headings that correspond to Summary/Objective, Experience/Work History, Education, Skills, Languages, Certifications, Awards/Honors, References.
- Normalise dates: convert any format (Jan 2020, 01/2020, 2020) to "YYYY-MM". Use "" (empty string) for "Present" / "Current" / "Now".
- Clean description text: convert bullet points to HTML <ul><li>…</li></ul>. Wrap paragraphs in <p>…</p>. Preserve bold/italic if detectable.
- Infer contact types: classify each contact detail:
  - email address → type: "email"
  - phone number → type: "tel"
  - URL (http/https, LinkedIn, GitHub, portfolio) → type: "url"
  - plain text (city, country) → type: "string"
- Infer language levels: map free-text proficiency to the enum: Basic | Elementary | Intermediate | Conversational | Proficient | Advanced | Native
- Score expertise: if expertise/competency levels are present, map to 0–100.
- Flag uncertain fields: add a _review key next to uncertain values with a note, e.g. "_review_start": "date was '2020', assumed January".
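To make the date rule concrete, here is a small Python sketch covering the three example formats plus the "Present" case. The helper name `normalise_date` and the decision to return the review note alongside the value are assumptions of this sketch; in the skill itself the LLM performs this step and handles far more formats.

```python
import re
from typing import Optional, Tuple

MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}
PRESENT = {"present", "current", "now"}


def normalise_date(raw: str) -> Tuple[str, Optional[str]]:
    """Return (value, review_note); value is "YYYY-MM" or ""."""
    text = raw.strip()
    if text.lower() in PRESENT:
        return "", None
    # "Jan 2020" or "January 2020": match on the first three letters.
    m = re.fullmatch(r"([A-Za-z]{3,9})\.?\s+(\d{4})", text)
    if m and m.group(1)[:3].lower() in MONTHS:
        return f"{m.group(2)}-{MONTHS[m.group(1)[:3].lower()]:02d}", None
    # "01/2020"
    m = re.fullmatch(r"(\d{1,2})/(\d{4})", text)
    if m and 1 <= int(m.group(1)) <= 12:
        return f"{m.group(2)}-{int(m.group(1)):02d}", None
    # "2020": year only, so assume January and flag it for review.
    if re.fullmatch(r"\d{4}", text):
        return f"{text}-01", f"date was '{text}', assumed January"
    # Unparseable: empty value plus the raw text for the user to fix.
    return "", f"original: {raw}"
```

A non-empty note would be stored under a `_review_*` key next to the field, for example `"_review_start"`.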
Stage 3 — Transform to Resume JSON
Produce a valid Resume object (schema version "2.0.0").
ID generation
Use this pattern for every id field: <prefix>_<7 alphanumeric chars>.
Generate sequential IDs: c_0000001, sec_0000001, exp_0000001, edu_0000001, grp_0000001, lng_0000001,
cer_0000001, awd_0000001, ref_0000001, and xp_0000001 for expertise items.
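One way to produce these IDs is a per-prefix counter, sketched below in Python. The class name `IdGenerator` is hypothetical; the only thing taken from the pattern above is the prefix, the underscore, and the seven-character zero-padded counter.

```python
from collections import defaultdict


class IdGenerator:
    """Hands out sequential IDs such as exp_0000001, one counter per prefix."""

    def __init__(self) -> None:
        self._counters = defaultdict(int)

    def next(self, prefix: str) -> str:
        self._counters[prefix] += 1
        # Zero-pad to 7 digits to match <prefix>_<7 chars>.
        return f"{prefix}_{self._counters[prefix]:07d}"
```

Keeping one counter per prefix means IDs stay unique within the document as long as each prefix is used for a single kind of object.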
JSON schema
{
"version": "2.0.0",
"header": {
"name": "Full Name",
"title": "Professional Title / Headline",
"contacts": [
// Order: email, phone, LinkedIn, website, location last
{ "id": "c_0000001", "type": "email", "label": "you@example.com" },
{ "id": "c_0000002", "type": "tel", "label": "+1 234 567 8900" },
{ "id": "c_0000003", "type": "url", "label": "linkedin.com/in/you", "href": "https://linkedin.com/in/you" },
{ "id": "c_0000004", "type": "string", "label": "City, Country" }
]
},
"theme": { "paletteId": "orange-navy" },
"sections": [
{
"id": "sec_0000001",
"type": "summary",
"title": "Summary",
"visible": true,
"data": { "text": "<p>Professional summary as HTML…</p>" }
},
{
"id": "sec_0000002",
"type": "experience",
"title": "Experience",
"visible": true,
"data": {
"items": [
{
"id": "exp_0000001",
"title": "Job Title",
"company": "Company Name",
"location": "City, Country",
"start": "2021-03",
"end": "",
"description": "<ul><li>Achievement or responsibility</li></ul>"
}
]
}
},
{
"id": "sec_0000003",
"type": "education",
"title": "Education",
"visible": true,
"data": {
"items": [
{
"id": "edu_0000001",
"degree": "Bachelor of Science in Computer Science",
"school": "University Name",
"start": "2015-09",
"end": "2019-06"
}
]
}
},
{
"id": "sec_0000004",
"type": "skills",
"title": "Skills & Knowledge",
"visible": true,
"data": {
"groups": [
{
"id": "grp_0000001",
"label": "Programming Languages",
"skills": ["Python", "TypeScript", "Go"]
}
]
}
},
{
"id": "sec_0000005",
"type": "languages",
"title": "Languages",
"visible": true,
"data": {
"items": [
{ "id": "lng_0000001", "name": "English", "level": "Native" },
{ "id": "lng_0000002", "name": "Mandarin", "level": "Conversational" }
]
}
},
{
"id": "sec_0000006",
"type": "certifications",
"title": "Licenses & Certifications",
"visible": true,
"data": {
"items": [
{ "id": "cer_0000001", "name": "AWS Solutions Architect", "issuer": "Amazon Web Services" }
]
}
}
// Only include section types present in the CV.
// Available types: summary | experience | education | skills | languages |
// certifications | awards | expertise | references
]
}
Section ordering heuristic
Default order (omit sections not found in the CV):
summary → experience → education → skills → expertise → languages → certifications → awards → references
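Applying the default order amounts to a stable sort on a fixed ranking. The Python sketch below assumes sections are plain dicts with a "type" key, as in the schema; placing unknown types last is an assumption of this sketch, not a rule from the schema.

```python
from typing import Dict, List

DEFAULT_ORDER = ["summary", "experience", "education", "skills", "expertise",
                 "languages", "certifications", "awards", "references"]


def order_sections(sections: List[dict]) -> List[dict]:
    """Return sections sorted by the default order (hypothetical helper)."""
    rank: Dict[str, int] = {t: i for i, t in enumerate(DEFAULT_ORDER)}
    # Unknown types rank after every known type (assumption).
    return sorted(sections, key=lambda s: rank.get(s["type"], len(DEFAULT_ORDER)))
```

Sections missing from the CV are simply absent from the input list, so nothing extra is needed to omit them.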
Skills grouping heuristic
If the CV lists skills as a flat list, group by inferred category:
- Programming Languages
- Frameworks & Libraries
- Tools & Platforms
- Databases
- Soft Skills / Other
If no clear grouping exists, use a single group "label": "Skills".
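The grouping rule can be illustrated with a toy lookup table. In the skill the LLM infers the category; the `CATEGORY_HINTS` table below is a deliberately tiny, hypothetical stand-in so that the fallback behaviour (a single "Skills" group when nothing is recognised) is visible in runnable form.

```python
from typing import Dict, List

# Hypothetical lookup; a real import relies on LLM inference instead.
CATEGORY_HINTS = {
    "Programming Languages": {"python", "typescript", "go"},
    "Databases": {"postgresql", "mysql"},
}
OTHER = "Soft Skills / Other"


def group_skills(skills: List[str]) -> List[dict]:
    """Group a flat skill list into the schema's skill groups."""
    buckets: Dict[str, List[str]] = {}
    for skill in skills:
        label = next(
            (cat for cat, names in CATEGORY_HINTS.items() if skill.lower() in names),
            OTHER,
        )
        buckets.setdefault(label, []).append(skill)
    # Nothing recognised: no clear grouping, so use one "Skills" group.
    if not buckets or list(buckets) == [OTHER]:
        buckets = {"Skills": list(skills)}
    return [
        {"id": f"grp_{i:07d}", "label": label, "skills": names}
        for i, (label, names) in enumerate(buckets.items(), start=1)
    ]
```

Groups appear in the order their first skill was seen, which keeps the output close to the order used in the CV.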
Expertise section
Only emit an expertise section if the CV explicitly shows percentage bars, star ratings, or numbered proficiency
levels. Map to 0–100:
- 5/5 stars → 100, 4/5 → 80, 3/5 → 60, 2/5 → 40, 1/5 → 20
- Percentages: use directly.
{
"id": "sec_000000X",
"type": "expertise",
"title": "Industry Expertise",
"visible": true,
"data": {
"items": [
{ "id": "xp_0000001", "label": "Machine Learning", "level": 80 }
]
}
}
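The star and percentage mapping is a simple proportion. The Python sketch below assumes the rating has already been read from the CV as text such as "4/5" or "80%"; the function name `expertise_level` and the choice to return `None` for anything else are assumptions.

```python
import re
from typing import Optional


def expertise_level(raw: str) -> Optional[int]:
    """Map "4/5" or "80%" to a 0-100 level; None if the text is not a rating."""
    text = raw.strip()
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*%", text)
    if m:
        value = float(m.group(1))  # percentages are used directly
    else:
        m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*/\s*(\d+)", text)
        if not m or int(m.group(2)) == 0:
            return None
        value = 100 * float(m.group(1)) / int(m.group(2))  # e.g. 4/5 -> 80
    return max(0, min(100, round(value)))  # clamp into the allowed range
```

Returning `None` for free text such as "expert" matches the rule that an expertise section is only emitted when the CV shows explicit levels.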
Stage 4 — Output
- Write src/data/default-resume.json in the project root (default behaviour). This is the file the app bundles as the seed shown on first visit and on Reset, so no editor login is required to see the result.
  - If --output <file> was specified, write to that path instead.
- Print the full JSON to the conversation so the user can review it.
- Summarise what was imported in a short table:
| Field | Value |
|---|---|
| Name | … |
| Sections found | experience (N jobs), education (N), skills (N groups), … |
| Dates normalised | N |
| Fields to review | N (marked with _review_* keys) |
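Most rows of this table can be derived from the finished Resume object. The Python sketch below is illustrative only: `summary_rows` and `count_review_keys` are hypothetical helpers, the per-section count is printed as a bare number, and the "Dates normalised" row is left out because it has to be tallied during Stage 2 rather than read back from the JSON.

```python
from typing import List, Tuple


def count_review_keys(node) -> int:
    """Count keys starting with "_review" anywhere in the resume."""
    if isinstance(node, dict):
        return sum((1 if key.startswith("_review") else 0) + count_review_keys(value)
                   for key, value in node.items())
    if isinstance(node, list):
        return sum(count_review_keys(child) for child in node)
    return 0


def summary_rows(resume: dict) -> List[Tuple[str, str]]:
    """Build (field, value) rows for the import summary table."""
    found = []
    for section in resume["sections"]:
        data = section.get("data", {})
        # Experience-like sections use "items"; skills uses "groups".
        n = len(data.get("items") or data.get("groups") or [])
        found.append(f"{section['type']} ({n})" if n else section["type"])
    return [
        ("Name", resume["header"]["name"]),
        ("Sections found", ", ".join(found)),
        ("Fields to review", str(count_review_keys(resume))),
    ]
```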
Next steps — tell the user exactly these two steps to preview the resume without logging in:
- Open http://localhost:5173 in the browser.
- Open the DevTools console (F12 / Cmd+Option+I), paste this one line, and press Enter:
  localStorage.removeItem("qwik-resume-editor:v2"); location.reload();
This clears any cached resume from localStorage so the app falls back to the newly written default-resume.json. The page reloads and shows the imported resume on the public home page, with no password needed.
Error handling
| Problem | Action |
|---|---|
| File not found | Report path, ask user to confirm |
| PDF is scanned / image-only | Warn "no text layer detected"; suggest running OCR on the file first, or manual entry. Note that pip install 'markitdown[pdf]' adds PDF text extraction, not OCR |
| Date cannot be parsed | Use "" and add "_review_date": "original: <raw text>" |
| Section heading ambiguous | Pick the closest match; add "_review_section": "guessed from: <heading text>" |
| markitdown exits with error | Fall back to LLM extraction; report the error message |
| Contact type unclear | Default to "string" type; note it in the summary |
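The last row pairs with the contact-type rule from Stage 2: try email, URL and phone in turn, and default to "string" when none fits. A rough Python sketch follows; the regular expressions are simplified assumptions, not validation rules from the schema, and `classify_contact` is a hypothetical name.

```python
import re


def classify_contact(label: str) -> dict:
    """Return a contact dict (without id) with an inferred type."""
    text = label.strip()
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", text):
        return {"type": "email", "label": text}
    if re.match(r"(https?://|www\.|linkedin\.com/|github\.com/)", text, re.I):
        # Keep the label as written; add a scheme to build a usable href.
        href = text if re.match(r"https?://", text, re.I) else f"https://{text}"
        return {"type": "url", "label": text, "href": href}
    digits = re.sub(r"\D", "", text)
    if re.fullmatch(r"\+?[\d\s().-]+", text) and 7 <= len(digits) <= 15:
        return {"type": "tel", "label": text}
    # Unclear: default to "string" and mention it in the summary.
    return {"type": "string", "label": text}
```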
Quality checks before output
- All id fields are unique within the document
- All dates are "YYYY-MM" or "" (no other formats)
- type fields match the allowed SectionType enum exactly
- level fields in languages use the exact enum values: Basic | Elementary | Intermediate | Conversational | Proficient | Advanced | Native
- level in expertise items is a number 0–100
- description fields contain HTML, not plain text with hyphens
- No section type appears more than once in sections
- version is exactly "2.0.0"
- theme.paletteId is one of: orange-navy | navy-black | green-charcoal | purple-slate | magenta-graphite | mono
- JSON is valid (no trailing commas, no // comments in the actual output)
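Several of these checks are mechanical and could be automated. The Python sketch below covers the version, palette, section-type, ID, date and level checks; it does not inspect description HTML, and JSON validity is assumed to come from serialising with a JSON library. The function name `check_resume` and the problem messages are assumptions.

```python
import re
from typing import List

SECTION_TYPES = {"summary", "experience", "education", "skills", "languages",
                 "certifications", "awards", "expertise", "references"}
LANGUAGE_LEVELS = {"Basic", "Elementary", "Intermediate", "Conversational",
                   "Proficient", "Advanced", "Native"}
PALETTES = {"orange-navy", "navy-black", "green-charcoal", "purple-slate",
            "magenta-graphite", "mono"}
DATE_RE = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def check_resume(resume: dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    if resume.get("version") != "2.0.0":
        problems.append("version is not '2.0.0'")
    if resume.get("theme", {}).get("paletteId") not in PALETTES:
        problems.append("unknown theme.paletteId")

    sections = resume.get("sections", [])
    types = [s.get("type") for s in sections]
    problems += [f"unknown section type: {t}" for t in types if t not in SECTION_TYPES]
    if len(types) != len(set(types)):
        problems.append("a section type appears more than once")

    ids: List[str] = []

    def walk(node) -> None:
        # Collect every "id" and validate every "start"/"end" date, at any depth.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "id":
                    ids.append(value)
                elif key in ("start", "end") and value != "" and not DATE_RE.fullmatch(str(value)):
                    problems.append(f"bad date: {value!r}")
                walk(value)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(resume)
    if len(ids) != len(set(ids)):
        problems.append("duplicate id values")

    for section in sections:
        for item in section.get("data", {}).get("items", []):
            level = item.get("level")
            if section.get("type") == "languages" and level not in LANGUAGE_LEVELS:
                problems.append(f"bad language level: {level!r}")
            if section.get("type") == "expertise" and not (
                isinstance(level, (int, float)) and 0 <= level <= 100
            ):
                problems.append(f"bad expertise level: {level!r}")
    return problems
```

Running this before writing the file gives a short list to fix or to surface to the user alongside the `_review_*` notes.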