glm-ocr-skill - SKILL.md Agent Skill

name: glm-ocr-skill
description: OCR and layout-parsing skill for reading local files or remote PDF/PNG/JPG/JPEG links and producing markdown plus local image assets. Use this skill whenever the agent needs to read PDF content, OCR images/screenshots, or perform document layout parsing.
homepage: https://github.com/cs-qyzhang/glm-ocr-skill
metadata: { "openclaw": { "emoji": "👓", "requires": { "bins": ["python3"], "env": ["GLM_API_KEY"] }, "primaryEnv": "GLM_API_KEY" } }

OCR File/Image Extractor

Use this skill when the task needs:

Reading PDF text content
OCR for screenshots/images
Layout-aware parsing into Markdown

Use scripts/glm_ocr_extract.py as the default execution path.

First-Time Setup

Before first use, check whether <skill-dir>/.env exists.

If .env does not exist:

Treat the skill as not initialized.
Copy .env.example to .env.
Ask the user to edit the newly created .env file manually, and show them its exact absolute path so they can find it.

Example command:

cp .env.example .env
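The initialization check above can also be sketched in Python. This is a minimal illustration, not part of the skill itself: the helper name `ensure_env_file` is hypothetical, and it assumes `.env.example` sits directly in the skill directory.

```python
import shutil
from pathlib import Path
from typing import Tuple


def ensure_env_file(skill_dir: Path) -> Tuple[Path, bool]:
    """Return (absolute path to .env, whether it was just created).

    Hypothetical helper: copies .env.example to .env only when .env is missing.
    """
    env_path = (skill_dir / ".env").resolve()
    if env_path.exists():
        # Already initialized; leave the user's file untouched.
        return env_path, False
    # Raises FileNotFoundError if .env.example is missing from skill_dir.
    shutil.copyfile(skill_dir / ".env.example", env_path)
    return env_path, True
```

When the second element of the result is `True`, the agent would print the returned absolute path and ask the user to edit that file before continuing.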

Run

python3 scripts/glm_ocr_extract.py <local-file-or-url> [--output-dir <dir>]
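If the script needs to be launched from other Python code, the command line above might be assembled as follows. This is a sketch only: `build_command` and `run_extract` are hypothetical wrapper names, and it assumes the working directory is the skill directory so that the relative script path resolves.

```python
import subprocess
from typing import List, Optional

# Relative script path as given in this document.
SCRIPT = "scripts/glm_ocr_extract.py"


def build_command(source: str, output_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for one extraction run."""
    cmd = ["python3", SCRIPT, source]
    if output_dir is not None:
        # --output-dir is optional, mirroring the usage line above.
        cmd += ["--output-dir", output_dir]
    return cmd


def run_extract(source: str, output_dir: Optional[str] = None) -> int:
    """Run the extractor and return its exit code (hypothetical wrapper)."""
    return subprocess.run(build_command(source, output_dir), check=False).returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names or URLs that contain spaces or special characters.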

Input

Local files: pdf, png, jpg, jpeg
Remote links: http(s) URLs to the same file types
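A caller may want to check an input against these rules before invoking the script. The sketch below is one possible pre-check, assuming the file type can be read from the extension. The real script may detect types differently, and `classify_input` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types listed in this document.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def classify_input(source: str) -> str:
    """Return "remote" for http(s) URLs, "local" otherwise.

    Raises ValueError when the extension is not a supported type.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Use only the URL path so query strings do not hide the extension.
        kind, path = "remote", parsed.path
    else:
        kind, path = "local", source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported input type: {source!r}")
    return kind
```

Note that this sketch rejects URLs whose path carries no file extension, even if the server would return a supported file type.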

Outputs

result.md: Markdown with remote image links rewritten to local relative paths
images/: downloaded image assets referenced by result.md
response.json: raw OCR API response for debugging or structured post-processing, including layout block details such as bbox_2d, label, and content
result.raw.md: original Markdown returned by the OCR service
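For structured post-processing, the layout blocks in response.json can be collected without knowing the exact response schema. The sketch below walks the whole JSON tree and yields every object that has a `bbox_2d` key. The only assumption taken from this document is that each block is a JSON object carrying `bbox_2d`. Both function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_layout_blocks(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict in a decoded JSON tree that contains a "bbox_2d" key."""
    if isinstance(node, dict):
        if "bbox_2d" in node:
            yield node
        # Keep descending: blocks may be nested at any depth.
        for value in node.values():
            yield from iter_layout_blocks(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_layout_blocks(item)


def load_layout_blocks(path: str) -> List[Dict[str, Any]]:
    """Read a response.json file and return all layout blocks found in it."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_layout_blocks(json.load(handle)))
```

Each returned block can then be filtered by its `label` or read for its `content`, using `.get()` in case a block omits either field.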

Env

Requires GLM_API_KEY, set either as an environment variable or in .env.
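One way such a lookup could work is sketched below. It makes two assumptions that this document does not confirm: the environment variable takes precedence over .env, and .env uses plain `KEY=value` lines. The function names are hypothetical, and the script's actual loading logic may differ.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(env_file: Path, environ: Mapping[str, str] = os.environ) -> str:
    """Return GLM_API_KEY from the environment, falling back to the .env file."""
    key = environ.get("GLM_API_KEY")
    if not key and env_file.is_file():
        key = parse_env_text(env_file.read_text(encoding="utf-8")).get("GLM_API_KEY")
    if not key:
        raise RuntimeError("GLM_API_KEY is not set in the environment or in .env")
    return key
```

Failing with a clear error when the key is missing lets the agent point the user back to the first-time setup steps instead of sending an unauthenticated request.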