name: debug-user
description: Investigate a user-reported issue using server logs, user events, storage snapshots, and Rollbar. Use when given a user id or email plus an issue description (e.g. from a support email).
argument-hint: [user-id-or-email] [issue description]
Debug User Issue
Investigate a user-reported issue: $ARGUMENTS
Fetch the user's server-side data incrementally, correlate it with their description of the problem and the source code, and explain the root cause. If the data isn't conclusive, instrument the suspect code paths with lgDebug and hand off for deploy.
Prerequisites
- ROLLBAR_READ_TOKEN must be set in the environment for the Rollbar steps.
- All scripts run via npm run r ./lambda/scripts/<script>.ts .... Their stdout includes DynamoDB/log noise around the useful output; when saving to a file, extract the relevant part (e.g. the JSON object) afterwards.
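The extraction mentioned above can be sketched as a small shell helper. This is a minimal sketch, not part of the project: the name extract_json is hypothetical, and it assumes the script pretty-prints the JSON so that the opening { and the closing } each start a line at column 0. If a script prints compact single-line JSON, or noise lines that themselves start with { or }, the pattern needs adjusting.

```shell
# Print everything from the first line that starts with "{" through the last
# line that starts with "}" in the given capture file.
# Assumption: the JSON object is pretty-printed with its outer braces at column 0.
extract_json() {
  awk '
    /^[{]/ && !started { started = 1 }   # first line opening the object
    started {
      buf[++n] = $0                      # buffer every line from there on
      if ($0 ~ /^[}]/) last = n          # remember the latest closing brace
    }
    END { for (i = 1; i <= last; i++) print buf[i] }
  ' "$1"
}
```

For example, extract_json .tmp/debug-<userid>/storage-raw.txt > .tmp/debug-<userid>/storage.json would produce the file that the jq queries in Step 3 read. If no line starts with {, the helper prints nothing, which is a cue to inspect the raw capture by hand.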
Step 1: Resolve the user id
- Looks like a user id (short lowercase string) → use directly.
- Looks like an email → resolve it:
npm run r ./lambda/scripts/get_user_id_by_email.ts <email>
- Looks like a Rollbar occurrence id (long number) → fetch the occurrence (see Step 4 below) and take person.id from it. If the task is purely "fix this exception" rather than "investigate this user's issue", use /fix-rollbar instead.
- No identifier at all → ask for one.
Stage all artifacts under .tmp/debug-<userid>/:
install -d .tmp/debug-<userid>
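The identifier triage in Step 1 could be sketched as a shell function. This is only an illustration: the name classify_identifier is hypothetical, and the 10-digit threshold for a "long number" is an assumption rather than a documented property of Rollbar occurrence ids or of the project's user ids.

```shell
# Classify the first argument as: none, email, rollbar-occurrence-id, or user-id.
# Assumption: an all-digit string of 10 or more characters is a Rollbar
# occurrence id; anything else without "@" is treated as a user id.
classify_identifier() {
  case "$1" in
    "") echo "none" ;;                 # nothing supplied: ask for an identifier
    *@*) echo "email" ;;               # resolve via get_user_id_by_email.ts
    *[!0-9]*) echo "user-id" ;;        # contains a non-digit: use directly
    *)
      if [ "${#1}" -ge 10 ]; then
        echo "rollbar-occurrence-id"   # long number: fetch the occurrence
      else
        echo "user-id"
      fi
      ;;
  esac
}
```

The output only picks which branch of Step 1 to follow; an ambiguous value is still worth confirming with whoever reported the issue.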
Step 2: Always fetch the events timeline first
npm run r ./lambda/scripts/user_events_markdown.ts <userid> > .tmp/debug-<userid>/events.md
This is the cheapest and densest source — the last 2 weeks of client events, grouped by day, each line carrying time, platform (Web/Mobile + iOS/Android app version), and short commit hash:
- ❌ **ERROR** lines: client errors with message, stack, and rollbar_id
- 📸 safesnapshot/mergesnapshot lines: storage snapshots with storage_id and the sync update payload
- Regular named events with extra JSON
Read it looking for: the time window matching the user's description, errors near that window, and suspicious event sequences (e.g. repeated syncs, unexpected screen flows). Quick error scan:
grep -B 3 -A 3 "❌ \*\*ERROR\*\*" .tmp/debug-<userid>/events.md
Step 3: Fetch targeted data based on the issue
Server-side behavior (sync, payments, API errors)
Server logs for the relevant date(s). The script writes logs-<date>-<userid>.txt into the project root:
npm run r ./lambda/scripts/get_logs.ts <YYYY-MM-DD> <userid>
mv logs-<YYYY-MM-DD>-<userid>.txt .tmp/debug-<userid>/
These can be large — don't read whole files; grep around the timestamps identified from events.md.
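One way to grep around a timestamp without opening the whole log is a small wrapper over grep with context lines. This is a sketch: the name log_window is hypothetical, and the pattern you pass depends on how the log lines format their timestamps, which this document does not specify.

```shell
# Print lines containing a literal pattern plus surrounding context,
# with line numbers so the window can be revisited later.
# Usage: log_window <file> <literal-pattern> [context-lines, default 20]
log_window() {
  grep -n -F -C "${3:-20}" -- "$2" "$1"
}
```

For example, log_window .tmp/debug-<userid>/logs-<YYYY-MM-DD>-<userid>.txt "<time-prefix>" 10 shows ten lines either side of each match. In the output, matching lines use ":" after the line number and context lines use "-".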
State/data issues (wrong weights, missing history, broken program)
Current storage:
npm run r ./lambda/scripts/get_user_storage.ts <userid> > .tmp/debug-<userid>/storage-raw.txt
Extract the JSON object from the surrounding stdout noise into storage.json, then query it with jq (storages are big, so don't read the whole file):
jq '{version: .storage.version, programs: [.storage.programs[].name], historyCount: (.storage.history | length)}' .tmp/debug-<userid>/storage.json
Useful paths under .storage: .history[] (workouts, newest first), .programs[], .settings (units, equipment, gyms), .stats.
"It was fine before X" (data corrupted or lost at some point)
Diff storage snapshots from before and after the suspect moment. Snapshot ids come from the 📸 lines in events.md:
npm run r ./lambda/scripts/get_storage_from_snapshot.ts <userid> <storage_id> > .tmp/debug-<userid>/snapshot-<storage_id>.txt
Note: this prints via util.inspect (JS object notation, possibly ANSI-colored), not JSON. Compare targeted fields between snapshots rather than diffing whole files.
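Because the snapshot output may carry ANSI color codes, plain grep can miss fields that are wrapped in escape sequences. A minimal sketch of stripping the colors and then pulling one field from a snapshot follows; the helper names strip_ansi and snapshot_field are hypothetical, and only SGR color sequences (ESC [ ... m) are removed.

```shell
# Remove ANSI SGR color sequences (ESC [ ... m) from stdin.
strip_ansi() {
  sed "s/$(printf '\033')\[[0-9;]*m//g"
}

# Print the lines of a snapshot file that mention a literal field name,
# after stripping colors. Usage: snapshot_field <file> <field-text>
snapshot_field() {
  strip_ansi < "$1" | grep -n -F -- "$2"
}
```

Running snapshot_field on the "before" and "after" snapshot files with the same field text (for example "version:") gives two short outputs that can be compared by eye, in line with the advice to compare targeted fields rather than whole files.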
Step 4: Rollbar errors for the user
Primary path — from events.md. ❌ ERROR lines include rollbar_id. For errors near the issue window:
curl -sS -H "X-Rollbar-Access-Token: $ROLLBAR_READ_TOKEN" \
"https://api.rollbar.com/api/1/instance/<ROLLBAR_ID>" \
-o .tmp/debug-<userid>/rollbar-<ROLLBAR_ID>.json
Drill down like /fix-rollbar does:
jq -r '.result.item_id, .result.data.liftosaur_exception_id, .result.data.timestamp' .tmp/debug-<userid>/rollbar-<ROLLBAR_ID>.json
If liftosaur_exception_id is non-null, fetch the captured state/actions:
npm run r ./lambda/scripts/get_exception.ts <EXCEPTION_ID> > .tmp/debug-<userid>/exception-<EXCEPTION_ID>.json
jq -r '.data' .tmp/debug-<userid>/exception-<EXCEPTION_ID>.json | jq '.lastActions | fromjson | .[-10:]'
jq -r '.data' .tmp/debug-<userid>/exception-<EXCEPTION_ID>.json | jq '.lastState | fromjson | {screenStack, progressCount: (.storage.progress | length)}'
Fallback — when events.md shows no errors but you suspect a crash. RQL is unavailable on the current Rollbar plan; instead page recent occurrences and filter by person (20 per page, only practical for recent crashes — scan ~10–20 pages max):
for page in 1 2 3 4 5 6 7 8 9 10; do
curl -sS -H "X-Rollbar-Access-Token: $ROLLBAR_READ_TOKEN" \
"https://api.rollbar.com/api/1/instances?page=$page" |
jq -r '.result.instances[] | select(.data.person.id == "<userid>") |
"\(.id) \(.timestamp) \(.data.body.trace.exception.message // .data.body.message.body // .data.body.trace_chain[0].exception.message // "?")"'
done
Matching ids are occurrence ids — feed them back into the primary path above.
Step 5: Correlate and explain
Cross-reference events, server logs, storage, and Rollbar data against the user's description and the relevant source code. Key files:
- src/types.ts: IStorage, IProgram, IHistoryRecord
- src/models/state.ts: IState
- src/ducks/reducer.ts, src/ducks/thunks.ts: state transitions and sync
- lambda/index.ts: server endpoints
Produce a root-cause explanation citing specific evidence (event lines, log entries, storage fields), plus a proposed fix when the cause is clear.
Step 6: If inconclusive — instrument with lgDebug
lgDebug(name, userId, extra?) from src/utils/posthog.ts logs an event only when the running client's tempUserId matches — safe to ship for a single user, works on web and native.
- Add lgDebug("descriptive-event-name", "<userid>", { key: value }) calls at the suspect code paths, capturing the values that would confirm or refute each hypothesis.
- Stop and hand off: summarize what was added, where, which hypothesis each call tests, and what to look for in the events afterwards. The developer deploys.
- After the user reproduces the issue, re-run Step 2; the new events appear in events.md.
- When the investigation concludes, remind the developer to remove the lgDebug calls.
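For the cleanup reminder, a quick search can list the lgDebug call sites that are still in the tree. This is a sketch under assumptions: the helper name find_lgdebug_calls is hypothetical, it assumes call sites live in .ts or .tsx files, and it skips utils/posthog.ts because that is where lgDebug itself is defined.

```shell
# List remaining lgDebug call sites under a source directory (default: src),
# excluding the file that defines lgDebug.
# Prints nothing and returns non-zero when no call sites remain.
find_lgdebug_calls() {
  grep -rn -F --include='*.ts' --include='*.tsx' 'lgDebug(' "${1:-src}" |
    grep -v -F 'utils/posthog.ts'
}
```

Each output line has the form path:line:source, which can go straight into the hand-off summary or the removal checklist.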