name: ocr
description: OCR image files using the Qwen VL model via the bundled scripts/ocr.py (lives in this skill's directory, NOT your cwd). Use when the user asks to extract text from an image, perform OCR on a photo or screenshot, or recognize characters in an image file (.jpg, .jpeg, .png, .gif, .webp). Requires QWEN_API_KEY and QWEN_BASE_URL (env vars, or a .env in the skill directory).
OCR Skill
Extract text from images using scripts/ocr.py (Qwen VL OCR model).
Path note: this skill's files live in the directory containing this
SKILL.md, written as <skill-dir> below. It is shown on the
"Skill directory:" line when the skill is activated (the activation message
also lists the script's absolute path). The script is at
<skill-dir>/scripts/ocr.py, NOT in your current working directory.
Prerequisites
QWEN_API_KEY and QWEN_BASE_URL, read in this order (first found wins):
- process environment variables
- .env in your current working directory
- <skill-dir>/.env (recommended place to keep them)
- ~/.env (user-wide fallback)
QWEN_API_KEY=your_key
QWEN_BASE_URL=https://your-base-url
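The lookup order above can be sketched in a few lines of Python. This is an illustration only, not the code in scripts/ocr.py: the helper names `parse_env_file` and `resolve_credentials` are hypothetical, the hand-written parser stands in for python-dotenv, and it assumes "first found wins" applies to each key separately.

```python
import os
from pathlib import Path

KEYS = ("QWEN_API_KEY", "QWEN_BASE_URL")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines (a stdlib stand-in for python-dotenv)."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_credentials(skill_dir, cwd=None, home=None, environ=None):
    """Return {key: value}, taking each key from the first source that sets it."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    environ = os.environ if environ is None else environ
    # Sources in priority order, as listed in Prerequisites.
    sources = [
        dict(environ),
        parse_env_file(cwd / ".env"),
        parse_env_file(Path(skill_dir) / ".env"),
        parse_env_file(home / ".env"),
    ]
    resolved = {}
    for key in KEYS:
        for source in sources:
            if source.get(key):  # assumption: empty values count as missing
                resolved[key] = source[key]
                break
    return resolved
```

If a key is absent from the returned dict, none of the four locations defines it.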
No dependency setup needed: the script carries inline metadata (PEP 723), so
uv run resolves openai / python-dotenv automatically in any directory.
Do NOT run uv add — your cwd is usually not a Python project.
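For readers unfamiliar with PEP 723, the sketch below shows what an inline metadata header of this kind typically looks like and how a tool can locate it. The header is a hypothetical example: the actual header in scripts/ocr.py is not reproduced here, and the `requires-python` line is an assumption. The regular expression is the one given in the PEP 723 reference implementation; `read_script_metadata` is an illustrative helper name.

```python
import re

# Hypothetical header for a script like ocr.py. The dependency names come
# from this document; the Python version pin is an assumption.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "openai",
#     "python-dotenv",
# ]
# ///

print("OCR logic would go here")
'''

# Regular expression from the PEP 723 reference implementation.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(source):
    """Return the TOML text of the 'script' block, or None if there is none."""
    for match in BLOCK_RE.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Drop the leading "# " (or bare "#") from each comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None
```

A runner such as uv reads a block like this and builds a temporary environment from it, which is why no project setup is needed in the working directory.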
Usage
uv run "<skill-dir>/scripts/ocr.py" <image_file>
Substitute <skill-dir> with the absolute skill directory before running —
it is NOT a shell variable. Output is printed to stdout. If credentials are missing, the script exits
with an error telling you where to put them.
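When the command is assembled programmatically, the substitution of <skill-dir> might look like the sketch below. The helper names `build_ocr_command` and `run_ocr` are hypothetical, and the error handling assumes the script signals failure with a non-zero exit code and a message on stderr, which this document does not state explicitly.

```python
import subprocess
from pathlib import Path


def build_ocr_command(skill_dir, image_file):
    """Return the argv list for: uv run <skill-dir>/scripts/ocr.py <image_file>."""
    script = Path(skill_dir) / "scripts" / "ocr.py"
    return ["uv", "run", str(script), str(image_file)]


def run_ocr(skill_dir, image_file):
    """Run the script and return the text it prints to stdout."""
    result = subprocess.run(
        build_ocr_command(skill_dir, image_file),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Assumption: failures (e.g. missing credentials) are reported
        # through a non-zero exit code with the message on stderr.
        raise RuntimeError(result.stderr.strip() or "ocr.py failed")
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the skill directory or image path contains spaces.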
Workflow
- Confirm the image file path with the user if not provided
- Run the script with the image path
- Present the extracted text to the user
- If the user wants to save the output, write it to a .txt file
Notes
- Blurry or overexposed single characters are replaced with ?
- Supported formats: .jpg, .jpeg, .png, .gif, .webp
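A quick pre-check on the file extension can catch unsupported inputs before the script is run. This is a minimal sketch: `is_supported_image` is a hypothetical helper, and treating extensions case-insensitively is an assumption, since the script's own handling of upper-case extensions is not described here.

```python
from pathlib import Path

# Extensions listed under Notes.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def is_supported_image(path):
    """True if the file extension is one of the supported formats."""
    # Assumption: ".PNG" and ".png" are treated the same.
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

For example, `is_supported_image("receipt.PNG")` is true under this assumption, while `is_supported_image("scan.pdf")` is false.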