name: daily-qa description: Daily 5am QA pass — 14-day sweep (including today) + data repair + flags research + QA email
Daily QA Pass
Reads today's Google Sheet tab, sweeps MassCourts for missed cases, fixes all flagged data, and emails a QA report.
Run directly in the current conversation — do NOT spawn subagents.
Architecture Overview
Python script (daily_list.py) — runs at 3:00am weekdays on Seth's Mac Mini:
- Scrapes all 11 MA probate counties + Servicemembers (SM) + Tax Lien (TL) on Land Court + foreclosure notices, writing RAW rows to the four STAGING ledgers. NO per-case ArcGIS during the scrape (that per-case cascade was the old 12-19 min bottleneck — now removed).
- PDFs → PNG via pymupdf, read by Claude Haiku for field extraction.
- Records skip / no-image decisions into staging so they aren't re-processed each run.
- THEN builds today's dated tab from staging by running
promote.py: ONE batched ArcGIS pass + dedup-by-key + write. Deterministic (plain script, no agent), so the list is reliably ready ~3:10am without depending on the 5am QA. - Log saved to
seth-memory/logs/daily_list_YYYY-MM-DD.log.
Claude QA (this skill) — runs at 5:00am weekdays:
- The tab is normally already built by the 3am job, so this is effectively a repairs pass.
- Phase 0 here is a fallback: it rebuilds the tab ONLY if today's tab is missing/empty (i.e. the 3am build failed); if the tab already has rows, it skips the build and goes to repairs.
- (The old 14-day MassCourts sweep is RETIRED — the 3am run covers it against staging.)
- Fixes all ⚠️ flagged rows: ArcGIS miss, owner ≠ defendants, likely renter.
- SOS lookups for entity owners; fills missing zips, reformats phones, cleans names.
- Records skips as a staging status (the "Skipped" tab is retired).
- Sends QA report email to jrabb@joehomebuyer.com.
The Python script is the primary data collector. This skill is the safety net + data repair layer.
⚠️ What the QA Pass Actually Does — Read This First
The QA pass has six jobs. All six run on every pass. Not four. Not "what I can easily automate." All six.
- ArcGIS consistency check — for every row, verify that the ArcGIS owner in E actually makes sense given the address in G/H. If the script wrote a wrong parcel, catch it here.
- Research all flags — every ⚠️ row gets researched and resolved. No row leaves with a flag that wasn't actively investigated. A decision is made: keep, skip, or update with a note.
- LLC/Trust/entity research — every entity in E or C/D gets a SOS lookup for the real person's name. Write C/D. Do not leave entity tokens like "Llc" or "Caribbean" in C/D.
- Fill all missing zips, cities, states — J, O, AE get filled. H gets corrected if it's a unit number or county name. Don't leave blanks that are trivially resolvable.
- Name cleanup — fix C/D trimming, casing, comma-split, typos, entity bleed-through.
- Cell format cleanup — phone numbers, zip truncation, G/L address cleanup (a/k/a, commas, notice text extraction).
"Couldn't fix" is only acceptable when every strategy has been genuinely tried and failed — not when the lookup is hard, not when it requires SOS, not when the address is ambiguous. Try everything first. If you truly can't resolve it, say what you tried.
⚠️ First-Pass Completeness Rule
Do everything on the first pass. Jordan should never have to prompt a second pass.
Zips (J, O, AE): Use the CITY_TO_ZIP table to fill any blank zip from the city. Cities like Buzzards Bay (02532), Norton (02766), Billerica (01821), Freetown (02717), Seekonk (02771), Conway (01341), etc. are all deterministic — fill them immediately, no lookup needed.
ArcGIS (missing E): Try ALL of these strategies before giving up:
LIKE '%NUM%NAME%' AND ZIP='zip'LIKE '%NUM%NAME%' AND CITY='city'LIKE '%NAME%' AND ZIP='zip'LIKE '%NAME%' AND CITY='city'UPPER(OWNER1) LIKE '%LASTNAME%'— search by defendant's last name if address fails If any of these return a result with an address number that matches (or owner name that matches), write E and clear the ⚠️ ArcGIS miss flag.
SOS lookup (entity owner): If E is an LLC/INC/TRUST and C/D is blank or wrong, do the SOS lookup in the same pass. Don't flag it — fix it. See Step 5b below.
Status flags: After filling E via ArcGIS, always clear the ⚠️ ArcGIS miss flag from col F. If ArcGIS owner ≠ defendant last name, set ⚠️ owner ≠ defendants instead.
Street address cleanliness (G and L): Strip anything that isn't the street number + street name + unit. Specifically:
- Remove city/town name if it appears after a comma (e.g.
"6 East Hodges Street, Norton"→"6 East Hodges Street") - Remove condo complex or building names (e.g.
"19 Holly Circle, Unit B, Tradewinds 20 Condominium"→"19 Holly Circle Unit B") - Remove a/k/a aliases and lot numbers (e.g.
"31 Woo Drive a/k/a 31 Woo Drive, Lot 10"→"31 Woo Drive") - Remove village qualifiers in parentheses (e.g.
"20 East Public Street, Assonet (Freetown)"→"20 East Public Street") - Remove commas before unit numbers (e.g.
"41 Boston Road, Unit 144"→"41 Boston Road Unit 144") - Remove duplicated street number (e.g.
"42 42 Union Street"→"42 Union Street") - Strip secondary address joined with "and" (e.g.
"10 Cranberry Road and 0 Elderberry Road"→"10 Cranberry Road") Apply the same rules to both G (Street) and L (CustStreet).
Step 0 — Fetch today's daily log from GitHub
Before reading the sheet, fetch the daily list log from GitHub and parse it for a run summary. Include this summary in the QA email at the end.
Log URL pattern:
https://raw.githubusercontent.com/jrabb-lgtm/JHB-Claude/main/seth-memory/logs/daily_list_{YYYY-MM-DD}.log
Use Chrome MCP — navigate the browser tab to the raw URL, then read via javascript_tool:
window.location.href = `https://raw.githubusercontent.com/jrabb-lgtm/JHB-Claude/main/seth-memory/logs/daily_list_${new Date().toISOString().slice(0,10)}.log`;
Wait ~2s for load, then:
const text = document.body.innerText;
const lines = text.split('\n');
const startLine = lines.find(l => l.includes('Daily List starting'));
const endLine = lines.filter(l => l.includes('Done.')).pop();
const startTime = startLine ? startLine.substring(0,19) : null;
const endTime = endLine ? endLine.substring(0,19) : null;
// Expected start is ~03:00; flag if after 04:00
const lateStart = startTime && parseInt(startTime.slice(11,13)) >= 4;
const rowsOk = lines.filter(l => l.includes('✓')).length;
const noImage = lines.filter(l => l.includes('No image found') || l.includes('No Images tab')).length;
const arcgisMiss = lines.filter(l => l.includes('ArcGIS miss flag') || l.includes('No ArcGIS match')).length;
const bondRE = lines.filter(l => l.includes('bond RE flag')).length;
const skipped = lines.filter(l => l.includes('append_skipped') || l.includes('Skipping') || l.includes('skipping')).length;
const errors = lines.filter(l => l.includes('[ERROR]')).length;
const warnings = lines.filter(l => l.includes('[WARNING]')).length;
const notableWarnings = lines.filter(l =>
(l.includes('[ERROR]') || l.includes('[WARNING]')) &&
!l.includes('No Search href') &&
!l.includes('ArcGIS')
);
window._logSummary = {
startTime, endTime, lateStart,
rowsOk, noImage, arcgisMiss, bondRE, skipped, errors, warnings,
notableWarnings: notableWarnings.slice(0, 20),
logFound: text.length > 100
};
"parsed"
If 404 or very short response → logFound: false, skip summary.
Step 1 — Determine today's tab name
from datetime import date
pull_str = date.today().strftime("%m/%d/%Y") # zero-padded: "05/04/2026"
Phase 0 — Promote staging → today's tab (redesign 2026-05-30)
Note (2026-05-31): the tab is normally already built by the 3am job. The scraper now runs
promote.pyitself right after scraping, so today's tab is built deterministically before this QA ever starts — the list is ready for upload without depending on this agent. Phase 0 here is a fallback: it will find nostagedrows (the 3am build consumed them) and no-op, UNLESS the 3am build failed, in which case it rebuilds them. Either way, run it, then proceed to repairs.
This runs FIRST. It reads the four staging ledgers, resolves ownership in ONE batch ArcGIS query,
builds the canonical 43-column rows, writes them to today's dated tab, and stamps each staging row's
status. written rows go in clean; needs-review rows go in with a ⚠️ flag in col F so the
existing repair/research steps below pick them up exactly as they do today.
Uses the ACTIVE deployment (the one with the staging actions) for staging + tab-build. The mechanical
promote uses the tested qa_promote / arcgis_batch logic so the 43-col placement is bug-proof.
SAFE NO-OP: if staging has no staged rows (e.g. the scraper's STAGING_ENABLED isn't on yet),
Phase 0 prints "no staged rows" and the QA falls through to its normal dated-tab repair below. So this
is safe to run any day during the transition.
FIRST — guard: is the tab already built? The 3am job normally builds today's tab itself, so on a normal day Phase 0 here is a no-op. Check before doing anything:
import os, requests
ACTIVE_URL = "https://script.google.com/macros/s/AKfycbx2nAPJS1yXZ1EaWge49JHkk80XQdtpFk9-ybNoPK0rqKPghCRZMCDgnQFoUCRShBXkPQ/exec"
rows = requests.post(ACTIVE_URL, timeout=60, json={"secret": os.environ.get("APPS_SCRIPT_SECRET",""),
"action": "read_tab", "tabName": pull_str}).json().get("rows", [])
print(f"today's tab already has {len(rows)} rows")
- If it already has rows → the 3am build succeeded. SKIP the entire Phase 0 build below and go straight to Step 2 (repairs). Do NOT rebuild — that would wipe a good tab.
- If it's empty/missing → the 3am build failed. Run the fallback build below, then repairs.
Fallback build — run as a detached BACKGROUND job and poll for completion — never inline.
A ~100-row build makes many sheet round-trips and can exceed a single command's time
limit. The standalone promote.py does the entire build (read staging → dedup staged rows by key → batched ArcGIS
→ build today's tab → stamp statuses) and prints PROMOTE COMPLETE (or PROMOTE FAILED)
when finished, so polling a short command is all that's needed.
Step A — launch it detached (returns immediately):
cd ~/daily-list 2>/dev/null || cd ~/jhb-repo 2>/dev/null || cd ~/JHB-Claude
LOG="$HOME/daily-list/qa_phase0_$(date +%F).log"; mkdir -p "$HOME/daily-list"
nohup python3 promote.py > "$LOG" 2>&1 &
echo "launched promote.py → $LOG"
Step B — poll until done. Run this short command repeatedly (each sleeps only 15s,
well under any timeout). Stop when you see 2 min):__DONE__; give up after ~8 tries (
LOG="$HOME/daily-list/qa_phase0_$(date +%F).log"
sleep 15; tail -n 6 "$LOG"; grep -q "PROMOTE COMPLETE\|PROMOTE FAILED" "$LOG" && echo "__DONE__"
Interpreting the result:
PROMOTE COMPLETE: wrote N rows→ Phase 0 done; the dated tab is built and staging statuses are stamped. Continue to Step 2.PROMOTE COMPLETE: 0 rows (nothing staged)→ nothing to promote (e.g. STAGING_ENABLED off); fall through to the normal dated-tab QA below.PROMOTE FAILED: ...→ read the traceback in$LOG, fix, and re-run Step A.
After Phase 0, continue below (those steps read today's tab and repair/research the ⚠️ rows).
The column index reference in Step 2 below is the canonical 43-column layout
(Preforeclosure Case = index 19 / col T, Auction Date = index 20 / col U).
Phase 0b — Integrity checks (run right after Phase 0)
Run these two checks as soon as the tab is built, before the row-by-row repairs.
1. Divergence check (existing) — flags rows whose Owner Name (E) doesn't contain the lead's Last Name (D):
python3 ~/JHB-Claude/qa_divergence_check.py
2. Sequence integrity + image-skip heal (added 2026-06-12) — finds holes in the SM/TL
case-number sequence (cases the scraper dropped at the collection step) and re-arms
image-failure skipped rows back to no-image so the 21-day sweep re-pulls them:
python3 ~/JHB-Claude/qa_sequence_check.py --heal
Parse its stdout:
SEQUENCE_GAPS: N— followed by oneSM gap: <case#>/TL gap: <case#>line per missing case. Each gap is a case the scraper dropped. Look it up with the case-number MassCourts recipe (in CLAUDE.md) and add it to the tab like any flagged row.REARMED_IMAGE_SKIPS: M— followed by onere-armed -> no-image: <case#>line per row re-armed. These will re-pull automatically on the next sweep; no manual action needed.
Capture both numbers and the case lists for the QA email (see Step 6 — Sequence integrity).
Step 2 — Read today's tab via Apps Script
import requests, os
APPS_SCRIPT_URL = os.environ.get("APPS_SCRIPT_URL", "")
APPS_SCRIPT_SECRET = os.environ.get("APPS_SCRIPT_SECRET", "")
resp = requests.post(APPS_SCRIPT_URL, json={
"secret": APPS_SCRIPT_SECRET,
"action": "read_tab",
"tabName": pull_str,
}, timeout=30)
rows = resp.json().get("rows", [])
print(f"Read {len(rows)} data rows from tab {pull_str!r}")
Column index reference (0-based) — 43 columns (CANONICAL):
0=A Pull Date 1=B File Date 2=C First Name 3=D Last Name
4=E Owner Name 5=F Status 6=G Street 7=H City
8=I State 9=J Zip 10=K Notes 11=L Cust Street
12=M Cust City 13=N Cust State 14=O Cust Zip 15=P Sales Date
16=Q Probate Type 17=R Probate Case# 18=S Tax Foreclosure 19=T Preforeclosure Case
20=U Auction Date 21=V Campaign Lists 22=W Lead Record Type 23=X Owner 1 Phone
24=Y Owner 1 Deceased 25=Z Owner 1 Notes 26=AA Rel1 Name 27=AB Rel1 Street
28=AC Rel1 City 29=AD Rel1 State 30=AE Rel1 Zip 31=AF Rel1 Phone
32=AG Rel1 CNAM 33=AH Rel1 Email 34=AI Rel1 Relationship 35=AJ Rel1 Notes
36=AK Rel2 Name 37=AL Rel2 Street 38=AM Rel2 City 39=AN Rel2 State
40=AO Rel2 Zip 41=AP Rel2 Phone 42=AQ Rel2 Email
⚠️ Note vs the OLD 42-col layout: Preforeclosure Case is now at 19 (T), Auction Date at
20 (U), and the Relative blocks shifted +1 (Rel1 Name = 26, Rel1 Phone = 31, Rel2 Phone = 41).
Row type detection (43-col):
row[17](R) has a probate case # → probate (E = ArcGIS OWNER1)row[19](T) has an## SM ######case # → SM (E = ArcGIS OWNER1)row[18](S) has a## TL ######case # → TLrow[20](U) has an auction date → FC
Step 2b — 14-Day Sweep — ❌ RETIRED, DO NOT RUN. Skip directly to Step 3.
This entire section is retired (2026-05-31). The 3am Python run now has its own 14-day sweep that dedups against the staging ledger, so re-running a 14-day MassCourts sweep here is redundant and slow. Do not execute any of the code in this section. Leave
sweep_stats = {"probate": 0, "sm": 0, "tl": 0}andsweep_new_rows = [], then go straight to Step 3. The rest of this section is kept only for historical reference.
(retired sweep — ignore)
Setup:
from datetime import date, timedelta
today = date.today()
sweep_from = today - timedelta(days=14)
sweep_to = today # QA includes today; Python only goes through yesterday
sweep_from_str = sweep_from.strftime("%-m/%-d/%Y")
sweep_to_str = sweep_to.strftime("%-m/%-d/%Y")
existing_cases = set()
for row in rows:
if row[17]: existing_cases.add(row[17].strip())
if row[10]:
k = row[10].strip()
if " - " in k:
existing_cases.add(k.split(" - ")[0].strip())
# CRITICAL (staging model): also treat everything already in the STAGING ledgers as
# "existing", so this sweep never re-pulls a case we've already seen — even if the old
# dated tabs were deleted. Staging is the single source of truth for what's been checked.
# (The 3am Python sweep already re-checks no-image cases, so include those here too to
# avoid double-processing them.)
for _tab in ("probate", "sm", "tl"):
try:
_res = acall({"action": "get_staging_keys", "tab": _tab})
for _r in (_res or {}).get("rows", []):
_k = str(_r.get("key", "")).strip()
if _k:
existing_cases.add(_k)
except Exception as _e:
print(f" WARN: could not load {_tab} staging keys for sweep dedup: {_e}")
sweep_new_rows = []
sweep_stats = {"probate": 0, "sm": 0, "tl": 0}
Probate Sweep — all 11 counties (BA BR DU ES FR MI NA NO PL SU WO)
For each county, run TWO passes: Formal Probate + Informal Probate.
- Navigate to
https://www.masscourts.org/eservices/home.page - Case Search → Search by Case Type → select Dept=Probate and Family Court, Div=[County]
- Case Type = Formal/Informal Probate, Filing Date From/To = sweep range, Submit
- Collect case hrefs, filter against
existing_cases - For each new case → fetch PDF (MPC 160) → send pages 0-1 to Claude Haiku → extract fields
2captcha for MassCourts CAPTCHA (use button.click() NOT form.submit()):
const sitekey = "6Ld9CuYlAAAAAPTI6mtGcd0843Pj2knpu9VsmS7Y";
// Submit job to 2captcha, poll for token, then:
document.getElementById('g-recaptcha-response').value = token;
document.querySelector('input[type="submit"], button[type="submit"]').click();
// NEVER use form.submit() — server rejects it even with a valid token
Build 42-element probate row (matches Python build_probate_row exactly):
row = [""] * 42
row[0] = today.strftime("%m/%d/%Y") # Pull Date
row[1] = file_date
row[2] = dec_first # C
row[3] = dec_last # D
row[4] = dec_full_name # E — decedent name (NOT ArcGIS OWNER1)
row[5] = lead_status # F — ⚠️ flag or blank
row[6] = prop_street # G
row[7] = prop_city # H
row[8] = prop_state or "MA" # I
row[9] = prop_zip # J
row[10] = case_num # K
row[16] = initiating_action # Q — Probate Type
row[17] = case_num # R — Probate Case#
row[25] = pet1_name # Z — Rel1 Name
row[26] = pet1_street # AA
row[27] = pet1_city # AB
row[28] = pet1_state # AC
row[29] = pet1_zip # AD
row[30] = pet1_phone # AE
row[32] = pet1_email # AG
row[33] = pet1_rel # AH
Bond gate: bond > 0 → skip renter check. Bond = 0 or blank → run ArcGIS; no match → ⚠️ ArcGIS miss; entity owner ≠ decedent last name → ⚠️ likely renter.
SM Sweep — Land Court, Servicemembers
Build row matching build_servicemembers_row:
row[10] = f"{case_num} - Servicemembers" # K — MUST include " - Servicemembers"
row[18] = "Yes" # S
# row[4] = ArcGIS OWNER1 (not defendant name)
TL Sweep — Land Court, Tax Lien
Same as SM but row[10] = f"{case_num} - Tax Lien". PDF label = "Complaint eFiled." Page 0 = cover, page 1 = complaint with "Assessed to:" and address.
Write sweep rows
if sweep_new_rows:
requests.post(APPS_SCRIPT_URL, json={
"secret": APPS_SCRIPT_SECRET,
"action": "append_rows",
"tabName": pull_str,
"rows": sweep_new_rows,
}, timeout=60)
# Re-read tab so Steps 3+ process the new rows
rows = requests.post(APPS_SCRIPT_URL, json={
"secret": APPS_SCRIPT_SECRET,
"action": "read_tab",
"tabName": pull_str,
}, timeout=30).json().get("rows", [])
Include in QA email: "Sweep: +N probate, +N SM, +N TL cases added"
(End of retired sweep section — resume here.)
Step 3 — Identify issues for each row
Always check:
| Column | Check | Fix |
|---|---|---|
| C (2) | More than 2 tokens? | Trim to First M. |
| D (3) | More than 2 tokens AND last not suffix? | Trim to first token |
| E (4) | Blank? | ArcGIS re-lookup using G+H |
| E (4) | Entity name AND C blank? | SOS lookup → C/D |
| J (9) | >5 digits? | Truncate to first 5 |
| J (9) | Blank AND H present? | CITY_TO_ZIP lookup |
| O (14) | >5 digits? | Truncate to first 5 |
| AF (31) | Not xxx-xxx-xxxx? |
Reformat |
| AP (41) | Not xxx-xxx-xxxx? |
Reformat |
Suffix whitelist for D: Jr, Jr., Sr, Sr., II, III, IV, V, Esq, Esq.
Foreclosure rows — past auction date check:
auction_dt = parse_auction_date(row[20]) # col U (index 20 = col U after Campaign Lists)
if auction_dt and auction_dt < today:
couldnt_fix.append(f"Row {sheet_row}: auction {row[20]} IS IN THE PAST — exclude")
Step 4 — Apply simple fixes
cell_fixes = {}
CITY_TO_ZIP = {
"ABINGTON":"02351","ACTON":"01720","ACUSHNET":"02743","AMHERST":"01002",
"AMESBURY":"01913","ANDOVER":"01810","ATHOL":"01331","ATTLEBORO":"02703",
"AYER":"01432","BARNSTABLE":"02630","BARRE":"01005","BERLIN":"01503",
"BEVERLY":"01915","BILLERICA":"01821","BLACKSTONE":"01504","BOSTON":"02101",
"BREWSTER":"02631","BRIDGEWATER":"02324","BROCKTON":"02301","BURLINGTON":"01803",
"CAMBRIDGE":"02139","CANTON":"02021","CENTERVILLE":"02632","CHATHAM":"02633",
"CHELMSFORD":"01824","CHICOPEE":"01013","CLINTON":"01510","COHASSET":"02025",
"DARTMOUTH":"02747","DEDHAM":"02026","DENNIS":"02638","DIGHTON":"02715",
"DORCHESTER":"02122","DOUGLAS":"01516","DRACUT":"01826","EASTON":"02334",
"EASTHAM":"02642","FALL RIVER":"02721","FALMOUTH":"02540","FITCHBURG":"01420",
"FRAMINGHAM":"01702","FRANKLIN":"02038","GARDNER":"01440","GLOUCESTER":"01930",
"GRAFTON":"01519","GREENFIELD":"01301","GROTON":"01450","HANSON":"02341",
"HARDWICK":"01037","HARVARD":"01451","HARWICH":"02645","HAVERHILL":"01830",
"HINGHAM":"02043","HOLBROOK":"02343","HOLDEN":"01520","HOLYOKE":"01040",
"HOPEDALE":"01747","HUDSON":"01749","JAMAICA PLAIN":"02130","LANCASTER":"01523",
"LAWRENCE":"01840","LEOMINSTER":"01453","LEXINGTON":"02420","LOWELL":"01851",
"LUNENBURG":"01462","LYNN":"01901","MALDEN":"02148","MANSFIELD":"02048",
"MARLBOROUGH":"01752","MARION":"02738","MARSHFIELD":"02050","MATTAPOISETT":"02739",
"MAYNARD":"01754","MEDFIELD":"02052","MEDFORD":"02155","MEDWAY":"02053",
"MENDON":"01756","METHUEN":"01844","MIDDLEBORO":"02346","MILFORD":"01757",
"MILLIS":"02054","MILLBURY":"01527","MILTON":"02186","NATICK":"01760",
"NEEDHAM":"02492","NEW BEDFORD":"02740","NEWBURYPORT":"01950","NEWTON":"02458",
"NORFOLK":"02056","NORTH ATTLEBORO":"02760","NORTH READING":"01864",
"NORTHAMPTON":"01060","NORTHBOROUGH":"01532","NORTHBRIDGE":"01534",
"NORWELL":"02061","NORWOOD":"02062","NORTON":"02766","OXFORD":"01540",
"PAXTON":"01612","PEABODY":"01960","PEPPERELL":"01463","PITTSFIELD":"01201",
"PLYMOUTH":"02360","QUINCY":"02169","RANDOLPH":"02368","RAYNHAM":"02767",
"READING":"01867","REHOBOTH":"02769","ROCKLAND":"02370","ROXBURY":"02119",
"RUTLAND":"01543","SALEM":"01970","SANDWICH":"02563","SCITUATE":"02066",
"SEEKONK":"02771","SHREWSBURY":"01545","SOMERVILLE":"02145","SPRINGFIELD":"01103",
"STONEHAM":"02180","STOUGHTON":"02072","SWANSEA":"02777","TAUNTON":"02780",
"TEWKSBURY":"01876","TOWNSEND":"01469","UXBRIDGE":"01569","WAKEFIELD":"01880",
"WALTHAM":"02451","WARE":"01082","WAREHAM":"02571","WATERTOWN":"02472",
"WELLESLEY":"02481","WESTBOROUGH":"01581","WESTFORD":"01886","WESTPORT":"02790",
"WEYMOUTH":"02188","WHITMAN":"02382","WILMINGTON":"01887","WOBURN":"01801",
"WORCESTER":"01610","WRENTHAM":"02093","YARMOUTH":"02675","YARMOUTH PORT":"02675",
"STERLING":"01564","ORANGE":"01364","GRANBY":"01033","SOUTH HADLEY":"01075",
"HAMPDEN":"01036","MONSON":"01057","WALES":"01081","BELCHERTOWN":"01007",
"LUDLOW":"01056","PALMER":"01069","WILBRAHAM":"01095","BUZZARDS BAY":"02532",
"CONWAY":"01341","FREETOWN":"02717","BILLERICA":"01821","NORTON":"02766",
}
SUFFIXES = {"jr","jr.","sr","sr.","ii","iii","iv","v","esq","esq."}
for i, row in enumerate(rows):
sheet_row = i + 2
fixes = {}
import re
# C: trim middle name to initial
first = row[2].strip() if len(row) > 2 else ""
if first:
parts = first.split()
if len(parts) >= 2 and len(parts[1].rstrip('.')) > 1:
fixes["C"] = f"{parts[0]} {parts[1][0].upper()}."
# D: comma-split or token trim
last = row[3].strip() if len(row) > 3 else ""
if last:
entity_kw = ("LLC","INC","TRUST","BANK","FUND","SOCIETY","A/K/A","HEIRS","DEVISEES","ESTATE","LEGAL REPRESENTATIVES")
if "," in last and not any(kw in last.upper() for kw in entity_kw):
d_part, c_part = last.split(",", 1)
d_part = d_part.strip().title()
c_part = re.sub(r'\s*\([^)]*(?:HEIRS|DEVISEES|ESTATE|LEGAL|REPRESENTATIVES)[^)]*\)', '', c_part.strip(), flags=re.IGNORECASE).strip().title()
if len(c_part.split()) <= 3 and d_part:
fixes["D"] = d_part
if c_part and not first:
fixes["C"] = c_part
else:
parts = last.split()
if not any(kw in last.upper() for kw in ("LLC","INC","TRUST","BANK","FUND","SOCIETY","A/K/A")):
if len(parts) >= 3 and parts[-1].lower() not in SUFFIXES:
fixes["D"] = parts[0]
elif len(parts) == 2 and parts[-1].lower() not in SUFFIXES and len(parts[-1]) > 1 and len(parts[0]) > 2 and "-" not in parts[-1]:
fixes["D"] = parts[0]
# J: lead zip
zip_j = row[9].strip() if len(row) > 9 else ""
city = row[7].strip() if len(row) > 7 else ""
if zip_j and len(zip_j.replace("-","")) > 5:
fixes["J"] = zip_j[:5] if zip_j[:5].isdigit() else zip_j.split("-")[0]
elif not zip_j and city:
lz = CITY_TO_ZIP.get(city.upper(), "")
if lz: fixes["J"] = lz
# O: customer zip
zip_o = row[14].strip() if len(row) > 14 else ""
if zip_o and len(zip_o.replace("-","")) > 5:
fixes["O"] = zip_o[:5] if zip_o[:5].isdigit() else zip_o.split("-")[0]
# AE (30): Rel1 zip
zip_ae = row[30].strip() if len(row) > 30 else ""
if zip_ae and len(zip_ae.replace("-","")) > 5:
fixes["AE"] = zip_ae[:5] if zip_ae[:5].isdigit() else zip_ae.split("-")[0]
# AF (31): Rel1 phone
ph = row[31].strip() if len(row) > 31 else ""
if ph:
digits = re.sub(r"\D","",ph)
if digits.startswith("1") and len(digits)==11: digits=digits[1:]
if len(digits)==10:
fmt = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
if fmt != ph: fixes["AF"] = fmt
# AP (41): Rel2 phone
ph2 = row[41].strip() if len(row) > 41 else ""
if ph2:
digits = re.sub(r"\D","",ph2)
if digits.startswith("1") and len(digits)==11: digits=digits[1:]
if len(digits)==10:
fmt = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
if fmt != ph2: fixes["AP"] = fmt
if fixes:
cell_fixes[sheet_row] = fixes
Write via update_cells (col must be 1-based integer — use col_num() helper):
def col_num(letter):
n = 0
for c in letter.upper(): n = n * 26 + (ord(c) - ord('A') + 1)
return n
# C=3, D=4, E=5, J=10, O=15, AE=31, AF=32, AP=42
Step 5 — ArcGIS consistency check (every row)
For every row where E is non-blank and F has no renter flag:
- Number mismatch (street number in G ≠ ArcGIS SITE_ADDR number) → re-query and correct E
- Entity in E, no F flag → set F=
⚠️ entity owner — see K, do SOS lookup (Step 5b) - D (last name) not in E (word-boundary check) → set F=
⚠️ owner ≠ defendants
ArcGIS endpoint: https://services1.arcgis.com/hGdibHYSPO59RG1h/arcgis/rest/services/Massachusetts_Property_Tax_Parcels/FeatureServer/0/query
Always use LIKE '%NUM%NAME%' wildcards (not prefix match) — ArcGIS stores double spaces in addresses.
Always verify returned SITE_ADDR number matches G before writing to E.
Step 5b — SOS lookup for all entity owners
Run for every row where E ends with LLC/INC/CORP/REALTY/PROPERTIES/HOLDINGS/TRUST/etc. AND C/D is blank or nonsense.
- Navigate to
https://corp.sec.state.ma.us/CorpWeb/CorpSearch/CorpSearch.aspx - Search entity name, open most recent Annual Report PDF via PDF.js
- Find Section 7 (SOC SIGNATORY) or Section 8 (REAL PROPERTY) — Section 6 is often blank
- Parse
FIRST [MI] LAST→ write C=First, D=Last, K="Owner: [ENTITY]"
PDF.js pattern:
const script = document.createElement('script');
script.src = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.min.js';
document.head.appendChild(script);
// ... wait 3s, then fetch PDF and extract text via getTextContent()
Step 5c — Research all flagged rows
Skipping a case (staging model — the "Skipped" tab is RETIRED): Do NOT call append_skipped.
Instead, record the skip on the case's staging row and remove it from the daily tab:
Set the staging status (self-contained — the staging deployment is ACTIVE_URL; secret may be ""):
import os, requests ACTIVE_URL = "https://script.google.com/macros/s/AKfycbx2nAPJS1yXZ1EaWge49JHkk80XQdtpFk9-ybNoPK0rqKPghCRZMCDgnQFoUCRShBXkPQ/exec" requests.post(ACTIVE_URL, timeout=60, json={ "secret": os.environ.get("APPS_SCRIPT_SECRET", ""), "action": "set_status", "tab": TAB, "updates": [{"key": CASE_KEY, "status": "skipped:<reason>", "status_reason": "<detail>"}]})TAB=probate|sm|tl|fcCASE_KEY= the case number (probate = col R, TL = col S, SM = col T); for FC use thestreet|city|zipaddress key (lowercased).
Remove skipped rows from the dated tab using ONLY the existing
read_tab/clear_tab/append_rowsactions — never a new custom action. For every skip, first set col F =✅ skipped — see staging(viaupdate_cells). Then ONCE, at the very end of the QA (after all other cell fixes are written so the read picks them up), rebuild the tab keeping only non-skipped rows:rows = call({"action": "read_tab", "tabName": pull_str})["rows"] def _is_skipped(r): return str((r + [""] * 43)[5]).startswith("✅ skipped") keep = [r for r in rows if not _is_skipped(r)] if len(rows) - len(keep) > 0: # only if there are skips to remove call({"action": "clear_tab", "tabName": pull_str}) # clears data rows, KEEPS header call({"action": "append_rows", "tabName": pull_str, "rows": keep}) after = call({"action": "read_tab", "tabName": pull_str})["rows"] assert len(after) == len(keep), f"rebuild mismatch: {len(after)} != {len(keep)}"This mirrors how
promote.pybuilds the tab (clear_tab + append_rows), so it is safe and canonical, and it preserves every QA edit becausekeepis read AFTER those edits are written. Works for FC rows too (they're matched by the col-F marker, not a case number).⚠️ NEVER add a custom Apps Script action and redeploy the web app to do this. The deployed web app (the "Utilities" project) has its editor HEAD routinely OVERWRITTEN by throwaway helper functions, so the editor does NOT contain the full web-app source. Deploying a "New version" from that editor would publish a web app with only the stray helper + your snippet and WIPE every live action (create_tab / append_rows / update_cells / read_tab / set_status / …), breaking the entire daily list.
The skip now lives in staging (one place), not a separate Skipped tab. The dated tab no longer keeps skipped rows.
⚠️ ArcGIS miss
- Probate + bond=$0: check MassCourts docket for $0 real estate bond → confirmed renter → skip via staging (see "Skipping a case" above: status
skipped:renter), then remove the row from the daily tab - Otherwise: retry ArcGIS with all 5 strategies → statewide name search → Registry check → decide: found/renter/genuinely unknown
⚠️ owner ≠ defendants
- Registry of Deeds check (grantee + grantor)
- 18-month rule: sold <18mo = bank error/skip; sold >18mo = sub2 gone bad/keep
- MLS fallback if registry inconclusive
⚠️ likely renter
HARD RULE: Every ⚠️ likely renter row MUST be skipped during QA — skip via staging
(see "Skipping a case" above: status skipped:renter) and remove the row from the daily tab.
- Entity building owner with no surname match → immediate skip, no further research
- Ambiguous → ArcGIS statewide → Registry → decide
Step 5d — Re-lookup missing data
Missing E (ArcGIS owner): Use arcgis_owner_lookup(street, city, zip5) with 5-strategy fallback. Strip parentheticals from city. Try parent municipality if village name fails (e.g. "No. Easton" → try "EASTON"). Never use name-based fallback for probate rows.
Missing auction date (col U, index 20): Navigate masspublicnotices.org, search by address, use PDF.js + 2captcha to read full notice if needed. Auction dates rarely visible in snippet — need full PDF.
Missing Rel2 (col AJ/AK, index 35/36): Re-read petition PDF page 1, Section 2 for second petitioner.
Entity owner → SOS: See Step 5b above.
Step 5e — Web search renter check (ArcGIS miss rows)
For probate rows with ⚠️ ArcGIS miss: search "{street} {city} MA" to check for mobile home parks, apartment listings, or owner name matches. Update F and K accordingly.
Step 5f — owner ≠ defendants: ArcGIS mismatch diagnosis
Re-query ArcGIS for LS_DATE (format YYYYMMDD). Classify:
- Entity + LS_DATE < 1yr ago →
⚠️ likely bank error — recent entity sale - Entity + LS_DATE > 1yr ago → sub2 gone bad → clear F flag, write diagnosis to K
- Individual mismatch →
⚠️ likely renter — owner mismatch
Step 5g — Registry of Deeds check (ALL flagged rows)
Run for EVERY row with ANY ⚠️ flag. Search both Grantor (seller) and Grantee (buyer) using defendant's last name. 18-month rule applies.
County → Registry URL:
REGISTRY_URL = {
"BROCKTON":"https://www.masslandrecords.com/plymouth/",
"PLYMOUTH":"https://www.masslandrecords.com/plymouth/",
"HINGHAM":"https://www.masslandrecords.com/plymouth/",
"MARSHFIELD":"https://www.masslandrecords.com/plymouth/",
"HALIFAX":"https://www.masslandrecords.com/plymouth/",
"BRIDGEWATER":"https://www.masslandrecords.com/plymouth/",
"MIDDLEBORO":"https://www.masslandrecords.com/plymouth/",
"ROCKLAND":"https://www.masslandrecords.com/plymouth/",
"HANSON":"https://www.masslandrecords.com/plymouth/",
"WHITMAN":"https://www.masslandrecords.com/plymouth/",
"SCITUATE":"https://www.masslandrecords.com/plymouth/",
"NORWELL":"https://www.masslandrecords.com/plymouth/",
"DEDHAM":"https://www.masslandrecords.com/norfolk/",
"NORWOOD":"https://www.masslandrecords.com/norfolk/",
"RANDOLPH":"https://www.masslandrecords.com/norfolk/",
"STOUGHTON":"https://www.masslandrecords.com/norfolk/",
"CANTON":"https://www.masslandrecords.com/norfolk/",
"MILTON":"https://www.masslandrecords.com/norfolk/",
"QUINCY":"https://www.masslandrecords.com/norfolk/",
"WEYMOUTH":"https://www.masslandrecords.com/norfolk/",
"BRAINTREE":"https://www.masslandrecords.com/norfolk/",
"NEEDHAM":"https://www.masslandrecords.com/norfolk/",
"WELLESLEY":"https://www.masslandrecords.com/norfolk/",
"FRANKLIN":"https://www.masslandrecords.com/norfolk/",
"WRENTHAM":"https://www.masslandrecords.com/norfolk/",
"NORFOLK":"https://www.masslandrecords.com/norfolk/",
"COHASSET":"https://www.masslandrecords.com/norfolk/",
"CAMBRIDGE":"https://www.masslandrecords.com/middlesexsouth/",
"FRAMINGHAM":"https://www.masslandrecords.com/middlesexsouth/",
"NEWTON":"https://www.masslandrecords.com/middlesexsouth/",
"NATICK":"https://www.masslandrecords.com/middlesexsouth/",
"WALTHAM":"https://www.masslandrecords.com/middlesexsouth/",
"WATERTOWN":"https://www.masslandrecords.com/middlesexsouth/",
"LEXINGTON":"https://www.masslandrecords.com/middlesexsouth/",
"MARLBOROUGH":"https://www.masslandrecords.com/middlesexsouth/",
"HUDSON":"https://www.masslandrecords.com/middlesexsouth/",
"ASHLAND":"https://www.masslandrecords.com/middlesexsouth/",
"WESTFORD":"https://www.masslandrecords.com/middlesexsouth/",
"MAYNARD":"https://www.masslandrecords.com/middlesexsouth/",
"ACTON":"https://www.masslandrecords.com/middlesexsouth/",
"HOPKINTON":"https://www.masslandrecords.com/middlesexsouth/",
"LOWELL":"https://www.masslandrecords.com/middlesexnorth/",
"MALDEN":"https://www.masslandrecords.com/middlesexnorth/",
"MEDFORD":"https://www.masslandrecords.com/middlesexnorth/",
"SOMERVILLE":"https://www.masslandrecords.com/middlesexnorth/",
"WOBURN":"https://www.masslandrecords.com/middlesexnorth/",
"BURLINGTON":"https://www.masslandrecords.com/middlesexnorth/",
"BILLERICA":"https://www.masslandrecords.com/middlesexnorth/",
"CHELMSFORD":"https://www.masslandrecords.com/middlesexnorth/",
"TEWKSBURY":"https://www.masslandrecords.com/middlesexnorth/",
"WILMINGTON":"https://www.masslandrecords.com/middlesexnorth/",
"STONEHAM":"https://www.masslandrecords.com/middlesexnorth/",
"WAKEFIELD":"https://www.masslandrecords.com/middlesexnorth/",
"DRACUT":"https://www.masslandrecords.com/middlesexnorth/",
"NORTH READING":"https://www.masslandrecords.com/middlesexnorth/",
"WORCESTER":"https://www.masslandrecords.com/worcestersouth/",
"MILFORD":"https://www.masslandrecords.com/worcestersouth/",
"SHREWSBURY":"https://www.masslandrecords.com/worcestersouth/",
"WESTBOROUGH":"https://www.masslandrecords.com/worcestersouth/",
"AUBURN":"https://www.masslandrecords.com/worcestersouth/",
"OXFORD":"https://www.masslandrecords.com/worcestersouth/",
"GRAFTON":"https://www.masslandrecords.com/worcestersouth/",
"NORTHBOROUGH":"https://www.masslandrecords.com/worcestersouth/",
"MILLBURY":"https://www.masslandrecords.com/worcestersouth/",
"BLACKSTONE":"https://www.masslandrecords.com/worcestersouth/",
"UXBRIDGE":"https://www.masslandrecords.com/worcestersouth/",
"DOUGLAS":"https://www.masslandrecords.com/worcestersouth/",
"STERLING":"https://www.masslandrecords.com/worcestersouth/",
"FITCHBURG":"https://www.masslandrecords.com/worcesternorth/",
"LEOMINSTER":"https://www.masslandrecords.com/worcesternorth/",
"GARDNER":"https://www.masslandrecords.com/worcesternorth/",
"ATHOL":"https://www.masslandrecords.com/worcesternorth/",
"GROTON":"https://www.masslandrecords.com/worcesternorth/",
"LUNENBURG":"https://www.masslandrecords.com/worcesternorth/",
"HOLDEN":"https://www.masslandrecords.com/worcesternorth/",
"SPRINGFIELD":"https://www.masslandrecords.com/hampden/",
"HOLYOKE":"https://www.masslandrecords.com/hampden/",
"CHICOPEE":"https://www.masslandrecords.com/hampden/",
"WESTFIELD":"https://www.masslandrecords.com/hampden/",
"NORTHAMPTON":"https://www.masslandrecords.com/hampshire/",
"AMHERST":"https://www.masslandrecords.com/hampshire/",
"GREENFIELD":"https://www.masslandrecords.com/franklin/",
"BARNSTABLE":"https://www.masslandrecords.com/barnstable/",
"FALMOUTH":"https://www.masslandrecords.com/barnstable/",
"SANDWICH":"https://www.masslandrecords.com/barnstable/",
"YARMOUTH":"https://www.masslandrecords.com/barnstable/",
"DENNIS":"https://www.masslandrecords.com/barnstable/",
"HARWICH":"https://www.masslandrecords.com/barnstable/",
"CHATHAM":"https://www.masslandrecords.com/barnstable/",
"BREWSTER":"https://www.masslandrecords.com/barnstable/",
"ORLEANS":"https://www.masslandrecords.com/barnstable/",
"EASTHAM":"https://www.masslandrecords.com/barnstable/",
"CENTERVILLE":"https://www.masslandrecords.com/barnstable/",
"OSTERVILLE":"https://www.masslandrecords.com/barnstable/",
"TAUNTON":"https://www.masslandrecords.com/bristolnorth/",
"ATTLEBORO":"https://www.masslandrecords.com/bristolnorth/",
"NORTH ATTLEBORO":"https://www.masslandrecords.com/bristolnorth/",
"MANSFIELD":"https://www.masslandrecords.com/bristolnorth/",
"NORTON":"https://www.masslandrecords.com/bristolnorth/",
"RAYNHAM":"https://www.masslandrecords.com/bristolnorth/",
"SEEKONK":"https://www.masslandrecords.com/bristolnorth/",
"REHOBOTH":"https://www.masslandrecords.com/bristolnorth/",
"FALL RIVER":"https://www.masslandrecords.com/bristolfallriver/",
"SOMERSET":"https://www.masslandrecords.com/bristolfallriver/",
"SWANSEA":"https://www.masslandrecords.com/bristolfallriver/",
"DIGHTON":"https://www.masslandrecords.com/bristolfallriver/",
"NEW BEDFORD":"https://www.masslandrecords.com/bristolsouth/",
"DARTMOUTH":"https://www.masslandrecords.com/bristolsouth/",
"FAIRHAVEN":"https://www.masslandrecords.com/bristolsouth/",
"ACUSHNET":"https://www.masslandrecords.com/bristolsouth/",
"MATTAPOISETT":"https://www.masslandrecords.com/bristolsouth/",
"WAREHAM":"https://www.masslandrecords.com/bristolsouth/",
"MARION":"https://www.masslandrecords.com/bristolsouth/",
"BOSTON":"https://www.suffolkdeeds.com/",
"DORCHESTER":"https://www.suffolkdeeds.com/",
"ROXBURY":"https://www.suffolkdeeds.com/",
"JAMAICA PLAIN":"https://www.suffolkdeeds.com/",
"CHELSEA":"https://www.suffolkdeeds.com/",
"REVERE":"https://www.suffolkdeeds.com/",
"WINTHROP":"https://www.suffolkdeeds.com/",
"SALEM":"https://www.salemdeeds.com/",
"BEVERLY":"https://www.salemdeeds.com/",
"GLOUCESTER":"https://www.salemdeeds.com/",
"PEABODY":"https://www.salemdeeds.com/",
"MARBLEHEAD":"https://www.salemdeeds.com/",
"IPSWICH":"https://www.salemdeeds.com/",
"NEWBURYPORT":"https://www.salemdeeds.com/",
"LAWRENCE":"https://www.lawrencedeeds.com/",
"HAVERHILL":"https://www.lawrencedeeds.com/",
"METHUEN":"https://www.lawrencedeeds.com/",
"ANDOVER":"https://www.lawrencedeeds.com/",
"NORTH ANDOVER":"https://www.lawrencedeeds.com/",
"AMESBURY":"https://www.lawrencedeeds.com/",
}
18-month cutoff = today - timedelta(days=548).
- Grantee deed found → ArcGIS lag → clear flag, write defendant as E
- Grantor deed <18mo → sold/bank error →
⚠️ sold — skip - Grantor deed >18mo → sub2 gone bad → clear flag, K note
- No deed → leave 5f classification
Step 6 — Send QA email
Email must be a complete decision log — every flagged row gets a resolution.
lines = [
f"✅ QA Pass — {pull_str}",
f"Rows reviewed: {len(rows)}",
f"Cells fixed: {fixes_applied}",
"",
]
# Add lookups_done and couldnt_fix lists
# Add daily list log summary from Step 0
# --- Sequence integrity (from Phase 0b qa_sequence_check.py --heal) ---
# seq_gaps = int after "SEQUENCE_GAPS:" ; gap_lines = the "SM/TL gap:" lines
# rearmed_count = int after "REARMED_IMAGE_SKIPS:"; rearmed_lines = the "re-armed ->" lines
lines.append("\n— Sequence integrity —")
if seq_gaps == 0:
lines.append("SM/TL case-number sequence: contiguous, no gaps")
else:
lines.append(f"⚠️ {seq_gaps} sequence gap(s) — scraper dropped these; looked up + added to tab:")
lines.extend(f" {g}" for g in gap_lines)
lines.append(f"Image-skip rows re-armed to no-image (will re-pull on next sweep): {rearmed_count}")
lines.extend(f" {r}" for r in rearmed_lines)
lines.append(f"\nhttps://docs.google.com/spreadsheets/d/1JS--FPwrBR0Qt3GalZe-_xAaLT0Hgr_3P_sVttUdGMw/edit")
send_notification_email(subject=f"🔍 QA Pass — {pull_str}", body="\n".join(lines))
Duplicate row detection
Check R (17), S (18), T (19) for duplicate case numbers. Delete later-occurring duplicates in reverse order via Chrome MCP (shift-click row headers → right-click → Delete rows).
Schedule
- Time: 5:00am weekdays
- Scheduled task name:
daily-qa-pass - If tab missing at 5am: wait 15 min, retry once, then send error email
Key gotchas
update_cellscol = 1-based integer, not letter. Usecol_num()helper.- Apps Script CORS: use
Content-Type: text/plainin fetch calls; call from Google domain tab. - SOS PDF: renders as canvas — use PDF.js +
fetch(link.href), notget_page_text. - gviz row offset:
rows[i]= sheet rowi+2. Always verify target row before writing. - ArcGIS double spaces: always use
LIKE '%NUM%NAME%'wildcards. - Village cities: try parent municipality if city query fails (e.g. "No. Easton" → "EASTON").
- Probate E column: always decedent full name, never ArcGIS OWNER1.
- 2captcha on MassCourts: use
button.click(), neverform.submit(). - Apps Script secret: empty string works (secret check short-circuits when SECRET="").
- 2captcha key:
0e4b6415b991cfcbcc1aabf47f2e7561— never commit to GitHub.