name: px-image2pptx description: > Convert static images (slides, posters, infographics) to editable PowerPoint files. Pipeline: OCR detects text → classical CV textmask detects ink pixels → mask-clip ANDs with OCR bboxes (preserves illustrations) → LAMA inpaints clean background → python-pptx assembles editable text boxes with auto-scaled fonts and detected colors. Trigger on 'convert image to pptx', 'make slide editable', 'image to powerpoint', 'extract text from slide as editable', 'reconstruct slide', or when the user has a slide/poster image and wants an editable .pptx file. metadata: author: Jade Liu source: https://github.com/JadeLiu-tech/px-image2pptx risk: low
px-image2pptx: Image to Editable PowerPoint
What It Does
Converts a static image into an editable .pptx file where every text element is a selectable, editable text box over a clean inpainted background. The full pipeline runs in 8-16 seconds:
- OCR (PaddleOCR) — detects text regions with bounding boxes and content
- Textmask (classical CV) — finds text ink pixels via adaptive thresholding
- Mask-clip — ANDs textmask with OCR bboxes to preserve non-text elements
- Inpaint (LAMA) — reconstructs masked regions with neural inpainting
- Assemble — places editable text boxes with auto-scaled fonts and detected colors
When to Use This
| Scenario | Use px-image2pptx? |
|---|---|
| Slide with text on solid/flat background | Yes — best results |
| Slide with photo background | Yes — uses inpainting (warn about overlap areas) |
| Slide with solid background | Yes — use --skip-inpaint for speed |
| Chinese/multilingual slide | Yes — ch OCR handles both Chinese and English |
| Poster or infographic with text | Yes — works well if text is separate from graphics |
| Dense chart with axis labels on bars | Caution — removing text damages chart structure |
| Text directly overlapping a photo | Caution — inpainting artifacts likely on complex areas |
| Extract individual assets as PNGs | No — use px-asset-extract |
| Read text from image without PPTX | No — use OCR directly |
| Edit an existing .pptx file | No — use the pptx skill |
Installation
git clone https://github.com/JadeLiu-tech/px-image2pptx.git
cd px-image2pptx
pip install -e ".[all]"
Usage
CLI
# Full pipeline
px-image2pptx <image> -o <output.pptx>
# Chinese text
px-image2pptx <image> -o <output.pptx> --lang ch
# Skip inpainting (solid-bg slides)
px-image2pptx <image> -o <output.pptx> --skip-inpaint
# Pre-computed OCR (skips PaddleOCR dependency)
px-image2pptx <image> -o <output.pptx> --ocr-json text_regions.json
# Keep intermediate files
px-image2pptx <image> -o <output.pptx> --work-dir ./debug/
Python API
# One-line conversion
from px_image2pptx import image_to_pptx
report = image_to_pptx("slide.png", "output.pptx")
# With options
report = image_to_pptx(
"slide.png", "output.pptx",
lang="ch", # Chinese OCR
sensitivity=16, # textmask sensitivity
dilation=12, # textmask dilation
skip_inpaint=False, # True for solid-bg slides
work_dir="./debug/", # save intermediates
)
# Step-by-step
from px_image2pptx.ocr import run_ocr
from px_image2pptx.textmask import compute_masks
from px_image2pptx.inpaint import inpaint
from px_image2pptx.assemble import assemble_pptx
regions = run_ocr("slide.png", lang="ch")
tight, clipped, dilated = compute_masks(cv2.imread("slide.png"), regions)
background = inpaint(cv2.cvtColor(cv2.imread("slide.png"), cv2.COLOR_BGR2RGB), dilated)
report = assemble_pptx("slide.png", regions, "output.pptx", "background.png", tight)
CLI Options
| Option | Default | Description |
|---|---|---|
-o, --output |
output.pptx |
Output PPTX path |
--ocr-json |
Pre-computed OCR JSON (skips OCR) | |
--lang |
auto |
OCR language: auto, en, ch |
--sensitivity |
16 |
Textmask sensitivity (lower = more) |
--dilation |
12 |
Textmask dilation pixels |
--min-font |
8 |
Min font size in points |
--max-font |
72 |
Max font size in points |
--skip-inpaint |
Skip LAMA inpainting | |
--work-dir |
Save intermediate files |
OCR JSON Format
{
"text_regions": [
{
"id": 0,
"text": "Hello World",
"confidence": 0.95,
"bbox": {"x1": 100, "y1": 50, "x2": 400, "y2": 90}
}
]
}
Key Design Decisions
Font auto-scaling (90-94% of bbox width)
Uses PIL ImageFont for accurate Latin text measurement, 1.0x em for CJK. Bidirectional: shrinks if text overflows, grows if text is too small.
Text color from tight mask
Samples ink pixel colors from the pre-dilation mask (actual ink, no background bleed). Falls back to contrast-based sampling for light-on-dark text where textmask misses.
Column-aware line grouping
Two-pass: vertical proximity, then horizontal gap split (3x median height). Prevents merging left/right column text into one wide text box.
OCR-guided masking
Raw textmask detects ANY dark pixels (including illustration outlines, borders). Mask-clip ANDs with OCR bboxes so only confirmed text gets inpainted. Removes 30-90% of false positive mask pixels depending on image content.
Dependencies
| Package | Extra | Purpose |
|---|---|---|
| Pillow, numpy, opencv-python, python-pptx | core | Always needed |
| paddleocr, paddlepaddle | [ocr] |
Text detection |
| torch, simple-lama-inpainting | [inpaint] |
Background reconstruction |
Models
Models are downloaded automatically on first use (~370 MB total).
| Model | Size | License |
|---|---|---|
| PP-OCRv5_server_det (text detection) | 84 MB | Apache 2.0 |
| PP-OCRv5_server_rec (text recognition) | 81 MB | Apache 2.0 |
| big-lama (inpainting) | 196 MB | Apache 2.0 |
Performance
| Step | Time | Model |
|---|---|---|
| OCR (PaddleOCR PP-OCRv5) | 2-5s | 165 MB |
| Textmask + clip | 1-3s | None |
| Inpaint (LAMA) | 4-8s | 196 MB |
| Assembly | <0.2s | None |
| Total | 8-16s | ~370 MB |
Limitations — When to Warn the User
Before running the pipeline, assess the input image and set expectations:
| Input characteristic | Impact | What to tell the user |
|---|---|---|
| Text on solid/flat background | Best results | No caveats needed |
| Text on textured background (paper, fabric) | Good results | LAMA handles repeating textures well |
| Text overlapping photos or illustrations | Inpainting artifacts likely | Warn: "areas where text covers photos may show blurring or hallucinated details" |
| Chart/diagram with axis labels or legends | Removing text damages the chart | Warn: "text tightly coupled with the chart cannot be cleanly separated — consider --skip-inpaint" |
| Light text on dark background | Blockier inpainting | Warn: "white-on-dark text uses box masks instead of tight ink masks, which may leave rectangular artifacts" |
| WebP image | OCR will fail silently (0 regions) | Convert to PNG before processing: Image.open("input.webp").save("input.png") |
| Very large image (>4000px long side) | Inpainting may take minutes | Suggest --skip-inpaint or downscaling first |
| Decorative/handwritten fonts | Text content is correct, but typeface won't match | Warn: "fonts are reconstructed as Arial/Helvetica — decorative fonts are not preserved" |
| Centered or justified text | Layout will be left-aligned | Warn: "text alignment and paragraph formatting are not preserved" |
When NOT to use this skill
- The user wants to extract individual assets (icons, logos, illustrations) → use
px-asset-extract - The user wants to read text from an image without creating a PPTX → use OCR directly
- The input is already a .pptx file → use the
pptxskill to edit it directly - The image has no text (pure photo, illustration) → there's nothing for this pipeline to do