name: personal-data-gather description: Persistent data collection across Gmail, Calendar, and Drive — updates daily logs and memory snapshots model: sonnet
GATHER — Personal Data Collection System
This is a scheduled task that runs every 4 hours. It scans personal data sources and routes actionable intelligence into the memory system.
Calendar IDs (all 5 must be queried)
| Calendar | ID | Notes |
|---|---|---|
| Alton (primary) | primary / alto84@gmail.com |
Alton's main calendar |
| Family | family06179810230244859800@group.calendar.google.com |
Shared family calendar: birthdays, vacations (Disney), school events |
| Aneeta | aneetasax@gmail.com |
Aneeta's calendar (owner access): work travel, personal appointments |
| Alton's Tasks | 42418d485f3839dfbc255305ef9839b030193d1a875283cb6884694db7bb5c4c@group.calendar.google.com |
Task list calendar |
| Blue Sombrero (Vayu soccer) | oghb6g9npuam0i4fmhcgh65vmb4e4drn@import.calendar.google.com |
Read-only imported webcal for Vayu's soccer schedule |
Data Sources (in priority order)
Gmail (via Gmail MCP)
- Search for unread/recent messages since last gather
- Categorize: ACTION_REQUIRED, FINANCIAL, FAMILY, BUSINESS, INFORMATIONAL
- Extract deadlines, amounts, contacts, and follow-up dates
- Flag tax-related correspondence (CPA, IRS, state agencies)
Google Calendar (via Calendar MCP)
- Loop through ALL 5 calendars listed above (use gcal_list_events with each calendarId)
- Pull events for next 7 days from each calendar
- Merge results into a single timeline, tagging each event with its source calendar
- The Family calendar is where most shared family events live (birthdays, vacations, school events) -- prioritize these for family-ops routing
- Aneeta's calendar contains her work travel and personal appointments -- flag schedule conflicts between Alton and Aneeta
- Blue Sombrero is read-only; just pull upcoming soccer games/practices for Vayu
- Identify scheduling conflicts across ALL calendars (not just within one)
- Flag events requiring preparation (meetings, appointments, school events)
- Track recurring patterns (commute days, family obligations)
System State
- Check gpuserver1 SSH connectivity
- Pull vast.ai rental status and earnings since last check
- Check heartbeat-log.csv for missed tasks
- Check disk utilization on both machines
[!IMPORTANT] Runner-capability gate (added 2026-06-11). This task may execute on a cloud runner that CANNOT SSH to gpuserver1/rtxserver. When SSH to the GPU fleet is unavailable from the current runner, you CANNOT observe host or vast.ai state — so:
- Mark every infrastructure/host-status item (machine online/offline, listing active, vast.ai rental status, GPU health, disk) as
unverifiable-from-this-runner, not as a fact. - Do NOT escalate an unverifiable host item to P0/critical, and do NOT increment an "unresolved" counter against it. Absence of an SSH reachability check is not evidence the machine is offline.
- Only items the runner can actually verify from its own capabilities (Gmail, Calendar, Drive, local files, logs already on disk) are eligible for status escalation.
- Rationale: a cloud runner without fleet SSH carried and re-escalated a false "vast.ai 52271+124192 OFFLINE — UNRESOLVED" P0 for 21+ runs while both machines were rented and earning. Test SSH first (e.g.
ssh -o ConnectTimeout=5 alton@gpuserver1 'true'); if it fails or the tool is unavailable, downgrade every host item tounverifiable-from-this-runner.
Routing Rules (Karpathy Ingest Pattern)
Each fact gets routed to its PRIMARY target AND all RELATED pages. The Karpathy rule: every ingest should touch 10-15 related pages, not just one. This is how the wiki compounds — cross-references are maintained at ingest time, not re-derived per query.
Primary routing map:
| Fact category | Primary target | Also update |
|---|---|---|
| Tax deadlines, CPA | TAXES.md |
business/solar-inference.md, business/sante-total.md, family/active-todos.md |
| Family events, school | FAMILY.md |
family/active-todos.md, family/vayu.md or vishala.md or vasu.md (whichever kid), family/family-calendar.md |
| Kid-specific (medical, school, activity) | family/{kid}.md |
family/active-todos.md, FAMILY.md |
| Solar Inference LLC | business/solar-inference.md |
TAXES.md, BUSINESS.md, people/doug-paige.md or people/jonathan-francis.md |
| Sante Total | business/sante-total.md |
TAXES.md, BUSINESS.md, people/barbara-weis.md |
| Career/AZ (external only) | business/az-career.md |
ASTRAZENECA.md, ALTON.md |
| Disney trip | family/disney-july-2026.md |
family/active-todos.md, family/family-calendar.md |
| New person encountered | people/README.md (contact card) or people/{name}.md (if enough context) |
Relevant domain page |
| Commute/logistics | ALTON.md or family/active-todos.md |
— |
| Everything else | daily/{date}.md |
— |
When updating a page:
- Read the existing file first. Never overwrite. Always read-then-append.
- Preserve YAML frontmatter verbatim. Only bump
updated:to today's ISO date andupdated_by:to "personal-data-gather". Seesartor/memory/feedback/feedback_preserve_frontmatter.mdfor the full contract. - Preserve all callouts (
> [!deadline],> [!blocker], etc.). Never remove or rewrite them. - Add new facts as callouts when appropriate:
- New deadline found in email? Add
> [!deadline] YYYY-MM-DDto the target page - New blocker or unresolved issue? Add
> [!blocker] - New decision needed? Add
> [!decision] - New verified fact? Add
> [!fact]
- New deadline found in email? Add
- Append new content to a "## Latest from gather (YYYY-MM-DD)" section at the bottom of the target page. Do NOT insert into the middle of existing sections.
- Add wikilinks when referencing related entities (e.g.,
[[people/jonathan-francis|Jonathan Francis]]when noting CPA correspondence). - Deduplicate: if the fact is already present on the page (same claim, same date), skip it. Check the last 20 lines of the target page for duplicates before appending.
Output
- Daily log: Append a timestamped entry to
sartor/memory/daily/{date}.mdwith ALL findings (this is the raw log, kept for audit trail) - Wiki pages: Update the PRIMARY target page and 2-5 RELATED pages per fact (the compiled artifact)
- Active TODOs: If any ACTION_REQUIRED items are found, add them as callouts to
family/active-todos.md(for family items) or the relevant business page (for business items) - Log spine: Append a
## [YYYY-MM-DD] ingest | personal-data-gather run Nentry tosartor/memory/log.mdsummarizing: how many facts gathered, which pages updated, any new deadlines/blockers surfaced - Heartbeat: Write a one-line summary to
data/heartbeat-log.csv - Alerts: If any ACTION_REQUIRED items found, write to
data/gather-alerts.mdwith urgency ranking
Page update contract
CRITICAL: When touching any file under sartor/memory/, you MUST follow the contract in sartor/memory/feedback/feedback_preserve_frontmatter.md:
- Read the file before writing
- Preserve frontmatter verbatim (only bump
updated:andupdated_by:) - Preserve callouts verbatim
- Preserve wikilinks verbatim
- Append, don't overwrite
- If a write would delete more than 10 lines of existing content, STOP and skip that file. Write the fact to
daily/{date}.mdinstead and flag it for manual review.
Constraints
- Convert all relative dates to absolute dates (e.g., "next Thursday" -> "2026-04-10")
- Do not store email bodies or sensitive content — extract facts only
- Do not send any messages or modify any external state
- Skip emails older than 48 hours unless they contain unresolved deadlines
- Record access in .meta/access-log.json for decay tracking