name: pptx description: Read, create, or edit PowerPoint .pptx decks — build slides from an outline, extract slide text/speaker notes, edit shapes/tables/charts, replace images, or export to PDF/images. Use whenever a .pptx (or .ppt) file is an input or output, or the user mentions a deck, slides, or a presentation. tags:
- tool
- office requires: sandbox: shell
pptx
Work with PowerPoint .pptx files using python-pptx (preinstalled). A .pptx
is a ZIP of XML parts; python-pptx handles the structure so you rarely touch XML.
Drop to raw OOXML only for the few things the library can't express (see Advanced).
All work happens in the sandbox exec shell, in the current workspace dir where
uploaded files land. Write a short Python heredoc or temp .py and run it.
After exec completes, use the Generated artifacts URL from the tool result in
the final answer so the user can download the deck.
Mental model
- A presentation has slides; each slide is built from a layout; layouts
live on slide masters. Layouts define placeholders (title, body,
picture, etc.) by
idxand type. - A slide holds shapes: placeholders, text boxes, pictures, tables, charts.
- Shapes with text expose
.text_frame→.paragraphs→.runs. A run is the unit that carries formatting (font, size, bold, color). - Units are EMU. Use the helpers:
from pptx.util import Inches, Pt, Emu.
Read / extract
from pptx import Presentation
prs = Presentation("deck.pptx")
print(len(prs.slides), prs.slide_width, prs.slide_height) # EMU dims
for i, slide in enumerate(prs.slides, 1):
print(f"--- slide {i} (layout: {slide.slide_layout.name}) ---")
for shape in slide.shapes:
if shape.has_text_frame:
print(shape.text_frame.text) # \n-joined paragraphs
elif shape.has_table:
for row in shape.table.rows:
print([c.text for c in row.cells])
if slide.has_notes_slide:
notes = slide.notes_slide.notes_text_frame.text
if notes:
print("NOTES:", notes)
Iterate slide.placeholders to see placeholder idx / placeholder_format.type.
For a fast text-only dump, just collect shape.text_frame.text across slides.
Create from an outline
List the layouts first — indices vary by template. With the default template, layout 0 = Title, 1 = Title+Content, 5 = Title Only, 6 = Blank.
from pptx import Presentation
from pptx.util import Inches, Pt
prs = Presentation() # or Presentation("template.pptx") to inherit a theme
for idx, lay in enumerate(prs.slide_layouts):
print(idx, lay.name, [(p.placeholder_format.idx, p.name) for p in lay.placeholders])
# Title slide
s = prs.slides.add_slide(prs.slide_layouts[0])
s.shapes.title.text = "My Deck"
s.placeholders[1].text = "Subtitle" # idx from the listing above
# Title + bullets
s = prs.slides.add_slide(prs.slide_layouts[1])
s.shapes.title.text = "Agenda"
tf = s.placeholders[1].text_frame
tf.text = "First point" # first paragraph
for line, lvl in [("Second", 0), ("Sub-point", 1)]:
p = tf.add_paragraph(); p.text = line; p.level = lvl
prs.save("out.pptx")
Always set text via placeholders/shapes — never hand-write bullet glyphs (•);
indentation/bullets come from the layout via paragraph.level.
Add a free text box or picture on any slide:
tb = s.shapes.add_textbox(Inches(1), Inches(1), Inches(8), Inches(1))
r = tb.text_frame.paragraphs[0].add_run(); r.text = "Hi"; r.font.size = Pt(28); r.font.bold = True
s.shapes.add_picture("logo.png", Inches(0.5), Inches(0.5), height=Inches(1)) # omit w to keep ratio
Edit existing
Edit at the run level to preserve a run's formatting; rewriting
text_frame.text collapses to one run and drops inline formatting.
for slide in prs.slides:
for shape in slide.shapes:
if not shape.has_text_frame: continue
for para in shape.text_frame.paragraphs:
for run in para.runs:
if "{{NAME}}" in run.text:
run.text = run.text.replace("{{NAME}}", "Frank")
To delete a shape/placeholder: sp = shape._element; sp.getparent().remove(sp).
If the template has more slots than your data, remove the extra shapes entirely
rather than leaving empty placeholders.
Replace an image in place (keep size/position)
python-pptx has no direct setter; swap the bytes of the related image part. Read
the picture's r:embed rId off its <a:blip>, then overwrite the part's blob.
from pptx.oxml.ns import qn
for shape in slide.shapes:
if shape.shape_type == 13: # MSO_SHAPE_TYPE.PICTURE
blip = shape._element.find(".//" + qn("a:blip"))
rid = blip.get(qn("r:embed"))
with open("new.png", "rb") as f:
shape.part.related_part(rid)._blob = f.read()
Tables and charts
from pptx.util import Inches
tbl = s.shapes.add_table(rows=2, cols=2, left=Inches(1), top=Inches(1),
width=Inches(6), height=Inches(2)).table
tbl.cell(0,0).text = "Header"
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE
cd = CategoryChartData(); cd.categories = ["Q1","Q2","Q3"]
cd.add_series("Sales", (4.5, 5.5, 6.2))
s.shapes.add_chart(XL_CHART_TYPE.COLUMN_CLUSTERED, Inches(1), Inches(1),
Inches(8), Inches(4.5), cd)
Design (only when the user wants a polished deck, not a data dump)
- Pick a topic-specific palette and one accent; don't default to generic blue. One color should dominate. Dark title/closing slides, light content slides.
- Vary layouts across slides (two-column, stat callout, quote, section divider) — repeating one bullet layout reads as low-effort. Give most slides a visual element (image/chart/shape), not just title + bullets.
- Type scale: title 36-44pt, section headers 20-24pt, body 14-16pt. Bold headers and inline labels. Left-align body; center only titles. Keep >=0.5in margins.
- Set
text_frame.word_wrap = Trueand watch for overflow; long replacement text may spill. After rendering (see Export), inspect the images critically — assume there are overlap/overflow/contrast bugs and fix them before declaring done.
Export to PDF / images (optional, needs LibreOffice)
command -v soffice >/dev/null || { echo "soffice not available — cannot export"; exit 0; }
soffice --headless --convert-to pdf out.pptx
command -v pdftoppm >/dev/null && pdftoppm -jpeg -r 150 out.pdf slide && ls -1 "$PWD"/slide-*.jpg
Read the printed JPG paths with your image-view capability to verify visually.
Re-run conversion after every edit — the PDF won't reflect a changed .pptx
otherwise. If soffice (or pdftoppm) is absent, say so and skip export; do NOT
pip-install.
.ppt → .pptx
Legacy .ppt is a different binary format — python-pptx cannot open it. Convert
first if soffice exists, else tell the user it can't be processed:
command -v soffice >/dev/null && soffice --headless --convert-to pptx old.ppt || echo "need soffice for .ppt"
Advanced: raw OOXML (last resort)
Use only for things python-pptx can't do (e.g. exact-fidelity slide duplication,
gradient fills, theme color edits, untyped XML elements). python-pptx already
exposes each shape's XML via shape._element (lxml) — prefer surgical lxml edits
there over a full unzip when you can. For part-level surgery, unzip → edit the
XML part → re-zip with stdlib zipfile.
Package map: slide order in ppt/presentation.xml <p:sldIdLst>; slides in
ppt/slides/slideN.xml with rels in ppt/slides/_rels/slideN.xml.rels; layouts/
masters under ppt/slideLayouts, ppt/slideMasters; media in ppt/media/;
part types in [Content_Types].xml.
Critical invariants if you add/edit parts by hand — break one and PowerPoint reports the file as corrupt:
- Every new part is declared in
[Content_Types].xml(an<Override>for slides; a<Default>per media extension like png/jpeg). - Every cross-part link goes through a
_rels/*.rels<Relationship>; r:id refs in XML must resolve. Adding a slide means: write the part + its.rels, add the content-type override, add a<Relationship>inpresentation.xml.rels, and a<p:sldId>in<p:sldIdLst>. - IDs must be unique:
<p:sldId>ids, and shape ids (<p:cNvPr id=...>) within a slide.sldLayoutId/sldMasterIdare globally unique. - Whitespace: any
<a:t>with leading/trailing spaces needsxml:space="preserve". - Parse/serialize with lxml or
defusedxml; never naive string munging that mangles namespaces or pretty-prints into text nodes.
Minimal text edit by zip surgery (zip members can't be overwritten in place — rebuild the archive, swapping the one part):
import zipfile
target = "ppt/slides/slide1.xml"
with zipfile.ZipFile("in.pptx") as zin:
xml = zin.read(target).decode().replace("Old title", "New title")
with zipfile.ZipFile("out.pptx", "w", zipfile.ZIP_DEFLATED) as zout:
for item in zin.namelist():
zout.writestr(item, xml.encode() if item == target else zin.read(item))
Verify before done
Reopen the output with Presentation("out.pptx") and assert slide count / key
text — a clean reopen catches most corruption. For decks meant to look good,
also export and visually inspect (above). Check templates for leftover
placeholder text (xxxx, lorem, [insert ...]) and fix before declaring done.