name: paleography-analysis
description: Reusable paleographic manuscript-analysis workflow for genealogy source images. Use before tasks involving handwritten wills, parish registers, deeds, court books, FamilySearch manuscript images, paleographic transcription, ambiguous letterforms, crop/snippet generation, image enhancement sheets, comparator sweeps, or recurring recognition lessons for Gurney/Gournay variants.
Paleography Analysis
Start Here
Use this workflow before manuscript transcription or crop work. The goal is to spend local compute on generous candidate images and spend AI attention on reading, comparison, and evidence discipline.
Required preflight:
.\tools\bootstrap_python_toolchain.cmd -CheckOnly
If packages are missing and the user has approved environment setup:
.\tools\bootstrap_python_toolchain.cmd
Use the .cmd wrapper from PowerShell. It runs the repo script with a per-process execution-policy bypass, avoiding the local Windows policy block on direct .ps1 invocation.
Use only the repo-local interpreter for Python commands:
.\.venv\Scripts\python.exe tools\paleography_image_workbench.py ...
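When the workbench is driven from a script rather than typed by hand, one way to keep every call on the repo-local interpreter is to build the argument list in a single helper. This is a minimal sketch: the function names `workbench_command` and `run_workbench` are hypothetical, and the repo root is assumed to be the current working directory.

```python
import subprocess
from pathlib import Path

# Paths mirror the commands above; REPO_ROOT = "." is an assumption.
REPO_ROOT = Path(".")
VENV_PYTHON = REPO_ROOT / ".venv" / "Scripts" / "python.exe"
WORKBENCH = Path("tools") / "paleography_image_workbench.py"


def workbench_command(subcommand, *args):
    """Build an argv list that always starts with the repo-local interpreter."""
    return [str(VENV_PYTHON), str(WORKBENCH), subcommand, *[str(a) for a in args]]


def run_workbench(subcommand, *args):
    """Run the workbench and raise if it exits with a non-zero status."""
    return subprocess.run(workbench_command(subcommand, *args), check=True)
```

For example, `workbench_command("info", "image.jpg")` yields the same invocation as the `info` command in this document, with the interpreter path filled in.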
Image Workflow
- Identify the source image and provenance: record URL/ARK/DGS/image number, repository citation, and whether the image is a local-only master.
- Run `info` to capture dimensions before guessing coordinates.
- Generate a full-page or page-half grid before detailed cropping.
- Use an approximate target region to produce a crop ladder with line strips, not one hand-tuned crop.
- Use enhancement sheets for reading only; cite/transcribe from the primary image and keep uncertainty visible.
Useful commands:
.\.venv\Scripts\python.exe tools\paleography_image_workbench.py info image.jpg
.\.venv\Scripts\python.exe tools\paleography_image_workbench.py grid image.jpg --out snippets\image-grid.png
.\.venv\Scripts\python.exe tools\paleography_image_workbench.py crop-ladder image.jpg --box 1200,1800,2200,420 --out-dir snippets --prefix packet-01-target --scale 2 --line-strips --tile-max-width 900
.\.venv\Scripts\python.exe tools\paleography_image_workbench.py sheet image.jpg --box 1200,1800,2200,420 --out snippets\packet-01-target-sheet.png --scale 2 --tile-max-width 900
.\.venv\Scripts\python.exe tools\paleography_image_workbench.py manifest-summary snippets\packet-01-target-manifest.md --selected-only
For PDFs:
.\.venv\Scripts\python.exe tools\paleography_image_workbench.py render-pdf source.pdf --page 1 --dpi 300 --out images\source-page-001.png
Crop Discipline
Prefer three tiers:
- Locator crop: wider/taller context, usually 3-5 lines, for orientation and comparator hunting.
- Reading strip: one target line with enough padding for ascenders, descenders, word spacing, and nearby ink.
- Token crop: a small ambiguous word or letter, always paired with the reading strip.
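The three tiers can be derived from one token box instead of three separate guesses. The sketch below only illustrates the arithmetic and is not the workbench's own logic. It assumes boxes are `x,y,width,height` (the order shown in the `band-ladder` syntax), that the caller supplies an estimated `line_height` in pixels, and that the reading strip spans the full page width; `Box`, `parse_box`, `expand`, and `crop_tiers` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def parse_box(text):
    """Parse 'x,y,width,height' (assumed order) into a Box."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated integers, got {text!r}")
    return Box(*parts)


def expand(box, pad_x, pad_y, image_w, image_h):
    """Grow a box by padding on every side, clamped to the image bounds."""
    left = max(0, box.x - pad_x)
    top = max(0, box.y - pad_y)
    right = min(image_w, box.x + box.width + pad_x)
    bottom = min(image_h, box.y + box.height + pad_y)
    return Box(left, top, right - left, bottom - top)


def crop_tiers(token, line_height, image_w, image_h):
    """Derive reading-strip and locator boxes around a token box."""
    # Reading strip: the token's line at full page width, padded by half a
    # line above and below to keep ascenders and descenders.
    line = Box(0, token.y, image_w, token.height)
    strip = expand(line, 0, line_height // 2, image_w, image_h)
    # Locator: two more line heights above and below the strip for context.
    locator = expand(strip, 0, 2 * line_height, image_w, image_h)
    return {"locator": locator, "reading_strip": strip, "token": token}
```

Because every tier is computed from the same token box, the token crop always stays paired with its reading strip, and the padding errs toward too much context.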
Default posture:
- Prefer too much context for analysis; make tight crops only after the reading is stable.
- Do not keep playing coordinate hot-and-cold. If a crop is off, generate a grid or a crop ladder.
- If a strip contains multiple lines, run `line-strips` or `crop-ladder --line-strips` and choose from the contact sheet.
- If automatic line stripping collapses dense or slanted manuscript into one large strip, use manual bands from the grid:
  - `line-strips --line-count N` when the target box has a clear number of lines.
  - `line-strips --line-height N` when the line spacing is regular.
  - `band-ladder --band label:x,y,width,height --band label:x,y,width,height` when you want hand-picked full-image boxes from the grid.
- If segmentation is uncertain, keep multiple labelled candidates rather than picking a single "best" crop.
- Record coordinates through the generated manifest so crops can be reproduced.
- Use `--tile-max-width 800` to `1100` for fast review sheets; use `--tile-max-width 0` only when a full-width contact sheet is worth the file size.
- Use `manifest-summary` to paste selected crop names and coordinates into packet reports instead of hand-copying them.
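The two manual banding options amount to simple arithmetic on the target box, which helps when estimating values for `--line-count` or `--line-height` from the grid. The following sketch shows that arithmetic under the assumption that boxes are `x,y,width,height`. It is not taken from the workbench, whose actual splitting may differ, and the function names are hypothetical.

```python
def bands_by_count(box, line_count):
    """Split an (x, y, width, height) box into line_count horizontal bands."""
    if line_count < 1:
        raise ValueError("line_count must be at least 1")
    x, y, w, h = box
    # Integer edges so the bands tile the box with no gaps or overlaps.
    edges = [y + (h * i) // line_count for i in range(line_count + 1)]
    return [(x, top, w, bottom - top) for top, bottom in zip(edges, edges[1:])]


def bands_by_height(box, line_height):
    """Split a box into bands of line_height; the last band may be shorter."""
    if line_height < 1:
        raise ValueError("line_height must be at least 1")
    x, y, w, h = box
    bands = []
    top = y
    while top < y + h:
        bottom = min(top + line_height, y + h)
        bands.append((x, top, w, bottom - top))
        top = bottom
    return bands
```

A short final band from `bands_by_height` is a hint that the chosen line height does not divide the box evenly, in which case a line count or hand-picked bands may fit better.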
Transcription Posture
- Treat the primary image as the authority.
- Treat machine transcripts, OCR, FTS snippets, and poor inherited transcriptions as locator aids only unless the image confirms them.
- Use poor transcripts to find names, neighboring words, approximate page position, and formulaic language.
- Preserve uncertain letters with brackets or alternatives, and attach confidence to the specific reading, not to the whole document.
- Use comparator words from the same hand, same page, or same film before relying on general letterform memory.
- Separate what the line says from what the genealogical finding implies.
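One way to keep uncertainty attached to the specific reading is to record each ambiguous token as its own small record. This is an illustrative sketch only: the `TokenReading` class, its field names, and the rendered layout are hypothetical and not part of the existing tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenReading:
    """One token's reading, with its own confidence and supporting evidence."""

    preferred: str  # brackets mark uncertain letters, e.g. "G[u]rney"
    alternatives: List[str] = field(default_factory=list)
    confidence: str = "medium"  # applies to this token only, not the document
    comparators: List[str] = field(default_factory=list)  # same-hand words used

    def render(self):
        """Format the reading for a transcript note."""
        parts = [self.preferred]
        if self.alternatives:
            parts.append("(alt: " + " / ".join(self.alternatives) + ")")
        parts.append(f"[{self.confidence}]")
        return " ".join(parts)
```

Keeping `comparators` on the record makes it easy to show which same-hand words supported the reading, separately from any genealogical conclusion drawn from it.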
Outputs
During staging, keep generated snippets near the image packet, normally under sources/intake/paleography-staging/ or the current intake session folder.
After a batch is promoted, follow .claude/skills/familysearch-fulltext-research/SKILL.md for disposition of packet reports, master images, working crops, and staging-path cleanup.
Substantive readings and findings belong in the relevant research file or sources/corpus_supplement/, not only in image manifests. Manifests are reproducibility aids, not research destinations.
Continual Improvement
At the end of each paleography task, decide whether a reusable lesson was learned. If yes, append a concise entry to references/recognition-notes.md.
Good entries include:
- collection, film, sourceId, or image range
- date range and document type
- hand or scribe clue, if known
- successful enhancement settings
- useful comparator words
- Gurney/Gournay variant or false friend
- failed reading to avoid rediscovering
Do not add one-off guesses or unresolved impressions. Keep durable reading lessons here; keep genealogical conclusions in research files.
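Appending entries in a consistent shape makes the notes easier to scan later. The sketch below formats the fields listed above as a short bullet block and appends it to the notes file. The entry layout, the dictionary keys, and the helper names are assumptions for illustration, since the existing format of references/recognition-notes.md is not shown here.

```python
from pathlib import Path

# (key, label) pairs follow the "good entries" list; keys are hypothetical.
FIELDS = [
    ("source", "Collection/film/sourceId/image range"),
    ("date_and_type", "Date range and document type"),
    ("hand", "Hand or scribe clue"),
    ("enhancement", "Successful enhancement settings"),
    ("comparators", "Useful comparator words"),
    ("variant", "Gurney/Gournay variant or false friend"),
    ("avoid", "Failed reading to avoid"),
]


def format_note(entry):
    """Render only the fields that were supplied, in a fixed order."""
    lines = [f"- {label}: {entry[key]}" for key, label in FIELDS if entry.get(key)]
    if not lines:
        raise ValueError("entry has no recognised fields")
    return "\n".join(lines) + "\n"


def append_note(entry, path=Path("references/recognition-notes.md")):
    """Append the formatted entry, separated from earlier text by a blank line."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write("\n" + format_note(entry))
```

Raising on an empty entry is a small guard against appending a note that records no durable lesson.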
Reference
Read references/recognition-notes.md when a task involves a previously seen collection, film, hand, Gurney-name variant, or recurring contrast problem.