name: glm-ocr-skill
description: OCR and layout-parsing skill for reading local files or remote PDF/PNG/JPG/JPEG links and producing markdown plus local image assets. Use this skill whenever the agent needs to read PDF content, OCR images/screenshots, or perform document layout parsing.
homepage: https://github.com/cs-qyzhang/glm-ocr-skill
metadata: { "openclaw": { "emoji": "👓", "requires": { "bins": ["python3"], "env": ["GLM_API_KEY"] }, "primaryEnv": "GLM_API_KEY" } }
OCR File/Image Extractor
Use this skill when the task needs:
- Reading PDF text content
- OCR for screenshots/images
- Layout-aware parsing into Markdown
Use scripts/glm_ocr_extract.py as the default execution path.
First-Time Setup
Before first use, check whether <skill-dir>/.env exists.
If .env does not exist:
- Treat the skill as not initialized.
- Copy .env.example to .env.
- Instruct the user to manually edit the newly created .env file, and display the exact absolute file path for their reference.
Example command:
cp .env.example .env
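The initialization check can also be sketched in Python. This is a minimal sketch and not the skill's own code: the helper name ensure_env_file and its skill_dir argument are hypothetical, and it assumes .env.example sits directly in the skill directory.

```python
import shutil
from pathlib import Path


def ensure_env_file(skill_dir: str) -> tuple[Path, bool]:
    """Return (absolute .env path, created) for the given skill directory.

    Hypothetical helper: `created` is True when .env was missing and has just
    been copied from .env.example, i.e. the skill was not yet initialized.
    Raises FileNotFoundError if .env.example is missing as well.
    """
    root = Path(skill_dir).resolve()
    env_path = root / ".env"
    if env_path.exists():
        return env_path, False
    # Not initialized: copy the template so the user has a file to edit.
    shutil.copyfile(root / ".env.example", env_path)
    return env_path, True
```

When `created` comes back True, the returned absolute path is the one to show to the user, who still has to edit the file by hand.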
Run
python3 scripts/glm_ocr_extract.py <local-file-or-url> [--output-dir <dir>]
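If the script is launched from Python rather than a shell, the same command line can be assembled as an argument list. The sketch below assumes it is run from the skill directory; build_command and run_extract are hypothetical names, and only the positional argument and the --output-dir option shown above are used.

```python
import subprocess
from typing import List, Optional


def build_command(source: str, output_dir: Optional[str] = None) -> List[str]:
    """Assemble the argv list for scripts/glm_ocr_extract.py."""
    cmd = ["python3", "scripts/glm_ocr_extract.py", source]
    if output_dir is not None:
        # --output-dir is optional, so it is added only when requested.
        cmd += ["--output-dir", output_dir]
    return cmd


def run_extract(source: str, output_dir: Optional[str] = None) -> int:
    """Run the extractor and return the process exit code."""
    return subprocess.run(build_command(source, output_dir)).returncode
```

Passing a list instead of a shell string avoids quoting problems with file names or URLs that contain spaces or special characters.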
Input
- Local files: pdf, png, jpg, jpeg
- Remote links: http(s) URLs to the same file types
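A caller may want to reject unsupported inputs before invoking the script. The following is a rough pre-check, assuming the file type can be judged from the extension alone; the script itself may decide differently, and classify_input is a hypothetical name.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def classify_input(source: str) -> str:
    """Return "remote" or "local" for a supported input, else raise ValueError."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Use only the URL path so query strings do not hide the extension.
        kind, suffix = "remote", PurePosixPath(parsed.path).suffix
    else:
        kind, suffix = "local", Path(source).suffix
    if suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {source!r}")
    return kind
```

The check does not verify that a local file exists or that a URL is reachable; it only filters by scheme and extension.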
Outputs
- result.md: markdown with remote image links rewritten to local relative paths
- images/: downloaded image assets referenced by result.md
- response.json: raw OCR API response for debugging or structured post-processing, including layout block details such as bbox_2d, label, and content
- result.raw.md: original markdown returned by OCR service
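For structured post-processing of response.json, one cautious approach is to avoid assuming where the layout blocks sit in the response. The sketch below walks the whole JSON tree and collects every object that carries all three of bbox_2d, label, and content. The function names are hypothetical, and the exact nesting of the real response is not assumed.

```python
import json
from typing import Any, Dict, Iterator, List

BLOCK_KEYS = {"bbox_2d", "label", "content"}


def iter_layout_blocks(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict in the JSON tree that has bbox_2d, label, and content."""
    if isinstance(node, dict):
        if BLOCK_KEYS.issubset(node):
            yield node
        for value in node.values():
            yield from iter_layout_blocks(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_layout_blocks(item)


def load_layout_blocks(path: str) -> List[Dict[str, Any]]:
    """Read a response.json file and return its layout blocks in document order."""
    with open(path, encoding="utf-8") as fh:
        return list(iter_layout_blocks(json.load(fh)))
```

A call such as `load_layout_blocks("response.json")` then gives a flat list that can be filtered by label or sorted by bounding box.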
Env
Requires GLM_API_KEY, set either as an environment variable or in .env.
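The lookup can be illustrated with a small resolver. This is a sketch under two assumptions that the text above does not confirm: the process environment takes precedence over .env, and .env holds plain KEY=VALUE lines. read_env_file and resolve_api_key are hypothetical names.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(env_path: Path) -> Optional[str]:
    """Return GLM_API_KEY from the environment, else from the .env file."""
    # Assumed precedence: environment first, then .env; empty values count as unset.
    return os.environ.get("GLM_API_KEY") or read_env_file(env_path).get("GLM_API_KEY") or None
```

A result of None means the key is missing from both places, which is the point at which the user should be directed back to the .env file.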