name: search-external-sites model: claude-sonnet-4-6 description: Generates search URLs for external genealogy sites (Ancestry, MyHeritage, FindMyPast, FindAGrave, Newspapers.com) and walks the user through the click-capture-analyze workflow. Logs each search to research.json (including nil results) and triages results from captured PDFs before passing records to record-extraction. GPS Step 1 — Reasonably Exhaustive Research (external site execution). Use when the user says "search Ancestry", "search MyHeritage", "search FindMyPast", "search FindAGrave", "search Newspapers.com", when a plan item targets a non-FamilySearch repository, or when the user uploads a PDF capture from an external genealogy site. Do NOT use when the target is FamilySearch (use search-records), when the user wants to plan what to search (use research-plan), or when the user wants to analyze a single record already in context (use record-extraction). allowed-tools: - place_search - external_links_search - research_log_append - research_append
Search External Sites
Narration: Read researcher_profile.narration_guidance from research.json and apply it as your narration style for this invocation. If absent, default to a one-line preamble per action.
Places: When resolving or writing places, follow references/places-guidance.md — resolve with place_search and record the standardPlace (and standard_place on persisted facts/assertions/events).
Ancestry, MyHeritage, FindMyPast, FindAGrave, and Newspapers.com have no public APIs and prohibit automated access. So this skill never loads a page itself — it builds a pre-filled search URL, the user clicks it in their own authenticated browser, captures the page as a PDF, and uploads it back. The agent supplies the genealogical expertise; the user's browser supplies the access.
Getting the search parameters right is the core of the task: a URL with the wrong name encoding, a missing date window, or the wrong collection sends the user to a dead end.
References to load when the moment arrives:
references/repository-types.md— before your first search, so you know why a negative online result never proves a record doesn't exist.references/search-strategy-external.md— Boolean techniques, spelling variants, and zero-hit recovery.references/evaluating-compiled-sources.md— the nine criteria for Find A Grave, member trees, and other user-contributed content.
The loop
- Generate a search URL with pre-filled parameters.
- Click — the user opens it in their authenticated browser.
- Capture — the user saves the page as PDF and uploads it.
- Analyze — you read the PDF, triage results, and hand promising records to record-extraction.
Repeat for each external-site plan item.
Before you search
Check subscriptions. Read researcher_profile.subscriptions in
research.json. Use it as a tie-breaker, never as a gate.
| Site | Subscription value |
|---|---|
| Ancestry.com | Ancestry |
| MyHeritage.com | MyHeritage |
| FindMyPast.com | FindMyPast |
| FindAGrave.com | free to search; FindAGrave-Plus adds features |
| Newspapers.com | Newspapers.com |
- If a plan item is repository-agnostic, prefer a site the researcher subscribes to — that search is immediately actionable.
- If the user explicitly names an unsubscribed site, generate the URL anyway and add one line: "You don't have a [SITE] subscription on file — the link will hit a login wall or a limited-results preview. Continue, or pick a subscribed site?" Flag, don't block.
- Profile absent or
subscriptions: ["none"]→ treat all sites equally, don't pester about subscriptions. - FindAGrave is always worth generating — basic search is free.
Classify the target. Is the collection an index (a pointer, not proof), a digitized original (carries evidentiary weight), or user-contributed content (a lead only)? Read the collection description — titles mislead about scope and completeness.
Supported sites
| Site | URL pattern | Notes |
|---|---|---|
| Ancestry.com | ancestry.com/search/collections/{id}/?params |
Largest indexed collection. Paid subscription |
| MyHeritage.com | myheritage.com/research?action=query¶ms |
Independent indexing. Paid subscription |
| FindMyPast.com | findmypast.com/search/results?params |
Strong UK/Ireland coverage. Paid subscription |
| FindAGrave.com | findagrave.com/memorial/search?params |
Cemetery records. Free. User-contributed — treat as compiled source |
| Newspapers.com | newspapers.com/search/?query=params |
Historical newspapers. Ancestry-owned. Paid subscription |
Steps
1. Find the plan item
Read research.json plans[] and pick the item(s) targeting an external
repository. Note the record type, the place, and the year window — you'll
match all three against the curated links below.
2. Resolve the place and fetch curated links
place_search({ placeName: "<place name>" })
Take the standardPlace from the response — do not guess it — and pass it
to external_links_search with the plan item's year window:
external_links_search({ standardPlace: "<standardPlace>", startYear: <year>, endYear: <year> })
What it returns. A flat results[] of { url, linkText } mixing
every curated site for the place (Ancestry, MyHeritage, FindMyPast,
FindAGrave, Newspapers.com, archives, FS wiki lists). It does not group by
site, classify by record type, dedupe, or break out collection IDs — the
collection ID is embedded in the URL path when present.
Consume it like this:
- Filter by host — keep links for your target site, e.g.
result.url.includes("ancestry.com"). - Dedupe by URL — FS repeats the same URL once per record-type category. Collapse duplicates.
- Match
linkTextto the plan item's record type.linkTextnames the collection in plain English ("Pennsylvania Wills and Probate Records"). This match is what step 3 acts on.
totalForPlace vs results.length:
results.length > 0→ a curated URL exists; go to Case A.- empty but
totalForPlace > 0→ FS curates this place but nothing overlaps your year window; widen the window or fall back (Case B). totalForPlace === 0→ no curated links here at all; fall back (Case B).
3. Build the URL
Case A — a curated URL exists for the target site.
First, confirm the curated link actually fits the plan item. Compare its
linkText record type against what you're searching for: a probate plan
item needs a probate/wills/estate collection, not a census or vital-records
one. If the only curated link is for a different record type, do not
present it as the search — that silently sends the user to the wrong
collection. Either pick a curated link whose linkText matches, or, if
none matches, fall back to Case B and say which record type you were
looking for.
When the record type matches, use that URL as the base and append your
search parameters — it already encodes the collection scope (Ancestry URLs
carry /search/collections/{id}/), so don't template the collection ID
separately:
{returned_url}?name=Patrick_Flynn&birth=1845&birthplace=Pennsylvania
If the base already has a query string, append with & instead of ?.
Case B — no curated URL fits (or none for the site).
Tell the user plainly: "No FamilySearch-curated link for [site] in [year window] — using the site-wide search instead." Then build from the site-wide template. These search the whole site index, not a scoped collection:
Ancestry.com
https://www.ancestry.com/search/?name={first}_{last}&birth={year}&birthplace={place}&residence={year}_{place}&father={first}_{last}&mother={first}_{last}&spouse={first}_{last}
name— given and surname, underscore-separatedbirth,death,marriage— event yearbirthplace,deathplace— location stringresidence— year and place, underscore-separatedfather,mother,spouse— relative names
MyHeritage.com
https://www.myheritage.com/research?action=query&first={first}&last={last}&birth_year={year}&birth_place={place}&death_year={year}&death_place={place}&father_first={first}&father_last={last}&mother_first={first}&mother_last={last}
FindMyPast.com
https://www.findmypast.com/search/results?firstname={first}&lastname={last}&yearofbirth={year}&keywordsplace={place}&eventyear={year}&fatherfirstname={first}&motherfirstname={first}
FindAGrave.com
https://www.findagrave.com/memorial/search?firstname={first}&lastname={last}&birthyear={year}&deathyear={year}&location={place}
Newspapers.com
https://www.newspapers.com/search/?query={first}+{last}&dr_year={year}&dr_place={place}
Parameter strategy (full guidance in
references/search-strategy-external.md):
- Unusual name → start broad (surname + place only).
- Common name → start narrow (add dates, relatives, a specific collection).
- Include only parameters you're confident about; omit uncertain ones.
- Add relative names when you have them (Ancestry weights them heavily).
- Widen with spelling variants or wildcards when a search returns little.
4. Log the search, then present the URL
The log entry is what makes the search part of the research record, so
write it before you hand over the URL — this step is not finished
until both exist. If you present the URL and stop, research.json shows
nothing happened: downstream skills, the user's next session, and the
"reasonably exhaustive" audit trail all go blind. The capture may never
come back; the log is the proof the search was launched.
Call research_log_append — it assigns the log_NNN id, stamps
performed, validates the whole project before persisting, and writes
research.json atomically (the log is append-only — see "Re-invocation"):
research_log_append({
projectPath: <absolute path of the current working directory>,
planItemId: "<pli_XXX or null>",
tool: "external_site",
query: { /* the params you encoded in the URL */ },
outcome: "partial",
resultsExamined: 0,
notes: "URL generated; awaiting user capture.",
externalSite: {
site: "<ancestry|myheritage|findmypast|findagrave|newspapers>",
urlGenerated: "<the exact URL you present below>",
captureReceived: false,
captureFilename: null
}
})
projectPath is the project folder you are already operating in (the
directory that contains research.json). Pass its absolute path.
External-site searches retain no result sidecar, so do not pass
stagedResultsRef.
outcome: "partial" + captureReceived: false mark it in-flight. When a
capture comes back you append a new entry (step 6) — never edit this
one.
If the call returns { ok: false, errors }, surface the errors and fix
the inputs rather than retrying blindly or hand-writing the entry; nothing
was written. On success the response carries the logId it assigned.
Then present the URL:
Search: 1850 Census on Ancestry for Patrick Flynn
Click this link to search: Ancestry — 1850 Census, Patrick Flynn
After the page loads:
- Scroll to the bottom of the page and back to the top (forces lazy-loaded results into view)
- Press Cmd+P (Mac) or Ctrl+P (Windows)
- Select "Save as PDF"
- Save the file and upload it here
If the page asks you to log in, please log in first, then click the link again.
5. Triage the captured PDF
When the user uploads a PDF, triage formally before any extraction:
List each result with its key attributes — name, age/birth year, location, record type, any visible record ID.
Classify the source — index (flag that the original must be located), digitized original, or user-contributed compiled source (apply
references/evaluating-compiled-sources.md).Rate each match against the subject's known attributes:
- Strong — name matches, age within ±3 years, correct jurisdiction.
- Possible — name variant, age close, same state / different county.
- No match — wrong gender, wrong decade, wrong state.
Present a numbered list with the rating and the reason for each, then ask which record to examine. For example:
I found 15 results. Three are strong:
- Patrick Flynn, age 5, in Thomas Flynn's household, Schuylkill County, PA — strong match
- Patrick Flyn, age 6, Allegheny County, PA — possible (spelling variant, different county)
- P. Flynn, age 4, Philadelphia, PA — possible (initial only)
Results 4–15 don't match (wrong ages/locations). Examine record #1?
For user-contributed sources, add a note separating photographed evidence (a headstone image) from contributor-entered text (dates, family links).
On selection, request the individual record. "Click result #1 to open the full record page, then save it as a PDF and upload it." That single-record PDF goes to record-extraction.
Don't send the raw search-results PDF straight to record-extraction — the user picks which records are worth examining.
6. Log results, including nil results
Every search gets logged in enough detail to reproduce it — site,
collection, all parameters, filters, results examined. When a capture
comes back, append a new research_log_append entry (never edit the
in-flight one from step 4). Set query to the same params, set
externalSite.captureReceived: true, and choose outcome from your
triage:
- Results found →
outcome: "positive",resultsExamined: <n>,externalSite.captureFilenameset;notessummarize the matches. - Nil result →
outcome: "negative". A search that legitimately finds nothing is a finding, not a failure. Innotesrecord what collection was searched, its known coverage gaps, and whether the absence is conclusive or whether undigitized/unindexed records may still exist ("not found online" ≠ "does not exist"). Never skip the log because "there was nothing to record." - No access (subscription/login wall the user can't pass) →
outcome: "error"with the reason; suggest the fallback plan item.
research_log_append({
projectPath: <absolute path of the current working directory>,
planItemId: "<pli_XXX or null>",
tool: "external_site",
query: { /* the same params you encoded in the URL */ },
outcome: "<positive|negative|error>",
resultsExamined: <n>,
notes: "<one-line summary of what the capture returned>",
externalSite: {
site: "<ancestry|myheritage|findmypast|findagrave|newspapers>",
urlGenerated: "<the URL you generated in step 4>",
captureReceived: true,
captureFilename: "<the uploaded PDF's filename, or null>"
}
})
If the call returns { ok: false, errors }, surface the errors and
correct the inputs — nothing was written.
Before calling a site exhausted on zero results, try at least two
variations (name variant, broader place, dropped parameter) — log each as
its own entry (references/search-strategy-external.md, "Exit criteria").
When online avenues are spent, remember undigitized records may still live
in courthouses, parish archives, and historical societies.
7. Update status and suggest the next step
Set the plan item to completed once the search is logged, found or not,
via research_append (it validates and writes atomically):
research_append({
projectPath: <absolute path of the current working directory>,
section: "plan_items",
op: "update",
planId: "<pl_XXX, the parent plan>",
entryId: "<pli_XXX, the plan item you searched>",
fields: { status: "completed" }
})
If it returns { ok: false, errors }, surface the errors rather than
hand-editing the status. Then offer the natural next move:
- More plan items → "Shall I continue with the next search?"
- A record worth examining → "Capture the full record page for result #1?"
- All done → "All planned searches are complete — evaluate whether research is exhaustive?"
- Nil result → "No matches on [site]; the plan's fallback is [next item]. Proceed?"
- Index hit → "This is an index entry — shall we locate the original image?"
- Compiled source → "This is user-contributed; add a plan item to verify it against originals?"
Handling capture problems
| Problem | Solution |
|---|---|
| PDF shows a login page | "Please log in to [site], then click the link again" |
| PDF cuts off results (lazy loading) | "Scroll to the bottom and back to the top before printing to PDF" |
| PDF missing images/thumbnails | "Record images may not print — screenshot the document viewer separately" |
| PDF links aren't clickable | Construct record URLs from visible record IDs/database names instead of extracted links |
| User can't access the site | Log outcome: "error" with the access limitation; move to the fallback plan item |
User-contributed sources
Find A Grave memorials, public member trees, and crowd-sourced indexes are
compiled sources. Apply the nine criteria in
references/evaluating-compiled-sources.md. In short: separate
photographed evidence from contributor-entered text, never cite them as
primary, and use them as leads — add a plan item to find the originals
they point to.
Re-invocation behavior
This skill writes only to research.json: a new append-only log[] entry
and the status on the matching plans[].items[]. It does not write
source or assertion entries — record-extraction does that when the user
returns a single-record capture.
Re-running a search is itself a logged event by design (the log is the
exhaustive-search audit trail), so always append a new log_ entry and
update the plan item's status. Never modify or delete a prior log_
entry; two runs of the same search correctly produce two entries.