name: debug-session
description: Investigate a Slack session where Clack went wrong. Given a Slack thread permalink and a description of the issue, read the persisted session, correlate it with the user's complaint, dig into the relevant source code, and produce a written root-cause assessment with suggested fixes. Use whenever the user pastes a Slack link alongside a complaint about Clack's behavior ("Claude did X in this thread and it was wrong", "this session got the answer wrong", "the bot picked the wrong repo here", "something broke in this thread"), even if they don't use the word "debug".
Debug a Slack session
Investigate why Clack handled a specific Slack thread the way it did, and report back with concrete code-level suggestions. The user applies the fix themselves — do not edit source code as part of this skill. The goal is diagnosis, not repair.
Inputs
The user provides:
- A Slack thread permalink
- A short description of the issue (what looked wrong, what they expected)
If either is missing, ask for it before proceeding.
Workflow
1. Parse the Slack link
Slack permalinks look like https://{workspace}.slack.com/archives/{CHANNEL_ID}/p{16DIGIT_TS} and sometimes carry ?thread_ts={TS}&cid={CHANNEL}.
- CHANNEL_ID is the segment after /archives/. It starts with C, G, or D.
- The p-prefixed number is a Slack message timestamp with the decimal point stripped. Reinsert the decimal point six digits from the right: p1775833779602979 → 1775833779.602979.
- If ?thread_ts=X.Y is present, X.Y is the thread root; that's what matches the threadTs field on sessions. The p number is the specific reply being linked, so use thread_ts for session lookup in this case.
- If ?thread_ts is absent, the p number is itself the thread root (the link points at the top-level message).
So from the link, derive channelId and threadTs.
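Here are the parsing rules above as a minimal TypeScript sketch. parseSlackPermalink and pNumberToTs are hypothetical helpers written for illustration; gce-fetch-session.sh does its own parsing. The channel-ID pattern assumes uppercase alphanumeric IDs starting with C, G, or D.

```typescript
interface ThreadRef {
  channelId: string;
  threadTs: string;
}

// Slack strips the decimal point from the ts in the p-segment; put it back
// six digits from the right: "1775833779602979" -> "1775833779.602979".
function pNumberToTs(digits: string): string {
  if (!/^\d{7,}$/.test(digits)) {
    throw new Error(`unexpected p-number: ${digits}`);
  }
  return `${digits.slice(0, -6)}.${digits.slice(-6)}`;
}

// Hypothetical helper: derive { channelId, threadTs } from a Slack permalink.
function parseSlackPermalink(permalink: string): ThreadRef {
  const url = new URL(permalink);
  // Path shape: /archives/{CHANNEL_ID}/p{16DIGIT_TS}
  const match = /^\/archives\/([CGD][A-Z0-9]+)\/p(\d+)\/?$/.exec(url.pathname);
  if (!match) {
    throw new Error(`not a Slack message permalink: ${permalink}`);
  }
  const [, channelId, pDigits] = match;
  // With ?thread_ts the p-number is a reply and thread_ts is the root;
  // without it, the linked message is itself the thread root.
  const threadTs = url.searchParams.get("thread_ts") ?? pNumberToTs(pDigits);
  return { channelId, threadTs };
}
```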
2. Locate the session
Where sessions actually live
Clack is deployed on GCE. The authoritative copy of data/ lives on the VM's persistent disk at /mnt/disks/clack-data/data/. The local ./data/ in this repo is usually NOT the right place to look. It is only populated when the user has explicitly synced it or is running the bot locally.
Always use the fetch script to pull a single session from the VM into a sandboxed local dir:
scripts/gce-fetch-session.sh '<slack-permalink>'
This pulls just the matching Q&A session, worker session (if any), and the corresponding SDK JSONL into data/.debug-sessions/<channelId>-<threadTs>/. It's non-destructive and won't touch your real ./data/. The script handles ?thread_ts= parsing and SDK JSONL discovery for you. Read files from the resulting data/.debug-sessions/<id>/ path for the rest of this skill.
Channelless plugin sessions (cron)
If the fetch script reports NO_MATCH, do NOT assume Clack never engaged. Plugin cron jobs (casual-talk, trivia, etc.) that post via deliver_to or post_to create channelless sessions. These are filed under channelless-<jobId>-...-plugin-<name>-<createdAtMs> directories, keyed by jobId rather than by <channelId>-<threadTs>. The fetch script only matches <channelId>-* prefixes, so it can't see those.
Reliable fallback: SSH to the GCE instance and grep all context.json files for the thread root ts:
gcloud compute ssh clack --zone=northamerica-northeast1-a --quiet --command='sudo grep -rlE "<threadTs>" /mnt/disks/clack-data/data/sessions/*/context.json'
(Also useful: sudo ls -dt /mnt/disks/clack-data/data/sessions/channelless* | head lists recent channelless plugin sessions.) Once you have the matching directory name, read its context.json and SDK JSONL directly over SSH rather than relying on the channel-prefix match. Channelless sessions have triggerType: "scheduled" and trigger.type: "scheduled" with a jobId. That signature confirms a cron-driven post rather than a user-triggered Q&A.
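The channelless signature can be expressed as a small type guard. This is a hypothetical sketch: isCronPluginSession is not a Clack function, and ChannellessCandidate models only the fields named above. It assumes you have already parsed context.json and know the directory name.

```typescript
// Only the fields this check needs; the real context.json has many more.
interface ChannellessCandidate {
  dirName: string;
  context: {
    triggerType?: string;
    trigger?: { type?: string; jobId?: string };
  };
}

// True when a session looks like a cron-driven plugin post rather than a
// user-triggered Q&A: channelless- prefix, scheduled trigger, and a jobId.
function isCronPluginSession(c: ChannellessCandidate): boolean {
  const trigger = c.context.trigger;
  return (
    c.dirName.startsWith("channelless-") &&
    c.context.triggerType === "scheduled" &&
    trigger?.type === "scheduled" &&
    typeof trigger.jobId === "string" &&
    trigger.jobId !== ""
  );
}
```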
Two kinds of persisted sessions exist, and the right one depends on whether the thread triggered a Q&A or a Changes Workflow. Check both.
Q&A sessions (data/sessions/)
Directories are named {channelId}-{tsSecs}-{tsMicros}-{userId}-{createdAtMs}. The threadTs in a session equals messageTs when the session started the thread, so its directory prefix is {channelId}-{tsSecs}-{tsMicros} where {tsSecs}.{tsMicros} is the thread_ts with . replaced by -.
Find candidates by listing data/sessions/ and filtering for names starting with {channelId}-{threadTs with . → -}. Multiple matches are possible (different users triggering in the same thread is rare but possible — pick by opening context.json and matching threadTs exactly).
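Here is a sketch of the prefix filter. It assumes you already have the directory listing as an array of names. candidateSessionDirs is a hypothetical helper. It appends a trailing - to the prefix as a cheap guard on the segment boundary, and it leaves the exact threadTs check against each candidate's context.json to you.

```typescript
// Directory names: {channelId}-{tsSecs}-{tsMicros}-{userId}-{createdAtMs}
function sessionDirPrefix(channelId: string, threadTs: string): string {
  // "1775833779.602979" -> "1775833779-602979"; the trailing "-" ends the ts segment.
  return `${channelId}-${threadTs.replace(".", "-")}-`;
}

// Hypothetical helper: every session directory that may belong to this thread.
function candidateSessionDirs(dirNames: string[], channelId: string, threadTs: string): string[] {
  const prefix = sessionDirPrefix(channelId, threadTs);
  return dirNames.filter((name) => name.startsWith(prefix));
}
```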
Read the session's context.json. The fields that matter for debugging:
- trigger: structured metadata describing what started the session. A discriminated union keyed on type:
  - { type: "reactions", userId, emoji, messageTs, messageText, imageFiles? }: someone reacted with a configured emoji.
  - { type: "mentions", userId, messageTs, messageText, imageFiles? }: @Clack in a channel/thread.
  - { type: "directMessages", userId, messageTs, messageText, imageFiles? }: DM to Clack.
  - { type: "autoRespond", userId, messageTs, messageText, ruleName?, imageFiles?, preAnalysis? }: an auto-respond rule matched; preAnalysis captures the session-creating verdict.
  - { type: "scheduled", jobId?, prompt, preAnalysis? }: a cron job fired; prompt is the cron's instruction.
  - The trigger is independent from messages[]. It records what kicked off the session without being part of the turn log.
- messages[]: the unified temporal log of everything after the trigger. messages[0] is usually Clack's first assistant response (NOT the user's message; that's on trigger). Entries are either { role: "user", source, text, ts, value? } (source: "reply" | "choice" | "followup") or { role: "assistant", ts, text?, payload?, toolCalls?, error?, skipped?, disengaged?, postedTopLevel?, preAnalysis? }.
  - User messages are replies, button clicks, and followups that arrived after the trigger, in temporal order.
  - Assistant messages carry their own per-turn toolCalls[] and payload. skipped: true means Claude declined to reply; error means the turn failed; postedTopLevel: true means the response was posted to the channel, not the thread. preAnalysis on an assistant turn records the gate verdict for THAT specific turn (autoRespond thread replies run pre-analysis per turn).
  - The find_session_transcript tool returns the trigger AND paginated messages[] in one call. This is useful when reading context.json directly is noisy.
- triggerType: reactions, directMessages, mentions, autoRespond, threadReply, or scheduled (mirrors trigger.type for convenience; threadReply only appears on sessions reused via the auto-respond thread path).
- errors[]: { errorMessage, conversationTrace, timestamp }. Present when something threw. (Per-turn failures are also captured on the corresponding messages entry as message.error.)
- stagedIntents: action-button intents Claude queued
- channelName, userId, username, displayName
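For orientation, here is the trigger union above written as TypeScript types, plus a helper that returns the text that started the session (the first thing step 3 asks you to find). These are sketches reconstructed from this document, not Clack's actual type definitions. Fields whose types aren't stated here, such as imageFiles and preAnalysis, are typed unknown.

```typescript
// Reconstructed from the field lists above; not Clack's real declarations.
type Trigger =
  | { type: "reactions"; userId: string; emoji: string; messageTs: string; messageText: string; imageFiles?: unknown }
  | { type: "mentions"; userId: string; messageTs: string; messageText: string; imageFiles?: unknown }
  | { type: "directMessages"; userId: string; messageTs: string; messageText: string; imageFiles?: unknown }
  | {
      type: "autoRespond";
      userId: string;
      messageTs: string;
      messageText: string;
      ruleName?: string;
      imageFiles?: unknown;
      preAnalysis?: unknown;
    }
  | { type: "scheduled"; jobId?: string; prompt: string; preAnalysis?: unknown };

// Hypothetical helper: the cron prompt for scheduled triggers, otherwise the
// Slack message that started the session.
function startingQuestion(trigger: Trigger): string {
  switch (trigger.type) {
    case "scheduled":
      return trigger.prompt;
    default:
      // Every non-scheduled variant carries messageText.
      return trigger.messageText;
  }
}
```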
Pre-migration note: sessions written before the trigger split (and the earlier unified-conversation-log change) persist a flatter shape. This is either the legacy originalQuestion/refinements[]/lastAnswer/lastResponse/toolCallHistory fields, or an early messages[] where messages[0] had source: "initial". Clack's getSession synthesizes (trigger, messages[]) for those on read and writes the new shape back on the next updateSession. If you open context.json directly and see legacy fields, or a user entry at messages[0] with source: "initial", that's expected. The information is the same, just laid out differently.
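The following check tells you which layout a raw context.json uses before you read it. isPreMigrationShape is a hypothetical helper. It checks only the legacy markers named in the note above.

```typescript
// Loosely typed view of a raw context.json, before getSession normalizes it.
interface RawContext {
  messages?: Array<Record<string, unknown>>;
  [key: string]: unknown;
}

const LEGACY_FIELDS = ["originalQuestion", "refinements", "lastAnswer", "lastResponse", "toolCallHistory"];

// True for either pre-migration layout: legacy top-level fields, or an early
// messages[] whose first entry is the user's message with source "initial".
function isPreMigrationShape(ctx: RawContext): boolean {
  if (LEGACY_FIELDS.some((field) => field in ctx)) return true;
  const first = ctx.messages?.[0];
  return first?.role === "user" && first?.source === "initial";
}
```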
Worker sessions (data/worktree-sessions/)
Changes Workflow threads also produce a session here, keyed by branch name (slashes replaced with -), not by channel. To find the right one, scan data/worktree-sessions/*/state.json and match channel + threadTs against the link.
Each folder contains:
- state.json: { sessionId, status, phase, branch, repo, userId, description, prUrl, channel, threadTs, lastMessage, startedAt, lastActivityAt }
- execution.log: timestamped log lines from the worker run. The worker does not persist per-tool-call history the way Q&A does, so this log is your main source of truth for what happened inside the worktree.
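Here is a TypeScript sketch of matching worker sessions to the link. It assumes you've already parsed each state.json into an object and that its channel field holds the same channel ID as the permalink. WorkerState lists only a subset of the fields above, and both functions are hypothetical helpers.

```typescript
// Subset of state.json; the remaining fields are omitted here.
interface WorkerState {
  sessionId: string;
  status: string;
  phase: string;
  branch: string;
  channel: string;
  threadTs: string;
  prUrl?: string;
}

// Folder name under data/worktree-sessions/: the branch with "/" replaced by "-".
function workerDirName(branch: string): string {
  return branch.split("/").join("-");
}

// Hypothetical helper: worker sessions whose channel + threadTs match the link.
function matchWorkerSessions(states: WorkerState[], channelId: string, threadTs: string): WorkerState[] {
  return states.filter((s) => s.channel === channelId && s.threadTs === threadTs);
}
```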
A single thread can have both kinds of session (Q&A that led to a change request, then a worker session for the implementation). Read whichever is relevant to the issue, or both.
SDK conversation log — ALWAYS READ THIS (data/.claude/projects/)
Clack's context.json is a condensed record of what each turn produced. In older sessions, only the latest turn's tool calls and response survive, because earlier turns were overwritten. The underlying Claude Agent SDK log captures every turn, every user message, every tool call, and every tool result, in order. It is the source of truth for what actually happened.
Read the JSONL every time, not just when you suspect multi-turn complexity. Even single-turn sessions have detail in the JSONL (system prompt, full tool_result payloads, intermediate assistant text) that context.json drops. Locating the file is not enough — open it and scan it before forming any hypothesis. The only acceptable reason to skip this step is when the file genuinely doesn't exist on disk.
The SDK persists each session as {sdkSessionId}.jsonl under data/.claude/projects/<project-subdir>/. The project subdir varies depending on which repo Claude was operating in (e.g., -app-data-repositories, -app-data-worktrees-applauz-monorepo-clack-fix-foo), so glob across subdirs:
data/.claude/projects/*/{sdkSessionId}.jsonl
sdkSessionId is on the Clack session at context.json → sdkSessionId. If it's missing or the JSONL file isn't on disk (older sessions, or the SDK evicted it), say so explicitly in your assessment and fall back to what's in context.json — that's the only data available for those.
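Here is the glob above, sketched with Node's fs module so it works without a glob library. findSdkJsonl is a hypothetical helper. projectsDir would be data/.claude/projects under your sandbox, or the corresponding VM path. An empty result is the "JSONL isn't on disk" case described above.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Equivalent of data/.claude/projects/*/{sdkSessionId}.jsonl.
// Returns every match; normally there is at most one.
function findSdkJsonl(projectsDir: string, sdkSessionId: string): string[] {
  if (!existsSync(projectsDir)) return [];
  return readdirSync(projectsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => join(projectsDir, entry.name, `${sdkSessionId}.jsonl`))
    .filter((path) => existsSync(path));
}
```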
The JSONL is one JSON object per line. The line types you care about:
- user: user-role messages. The first one usually contains a long system-injected preamble ("DELIVERY CONTEXT: ...") followed by the actual user input. Subsequent user lines often carry tool_result blocks (results from the previous tool call).
- assistant (under a message wrapper): Claude's text and tool_use blocks. Tool calls live here as { type: "tool_use", name, input, id }.
- system (subtype init): start of a turn; carries the session_id.
- queue-operation, last-prompt, skill_listing, file-history-snapshot, etc.: internal bookkeeping, generally skip.
To reconstruct a turn-by-turn view, walk the lines in order and group by the init boundaries (or just by the natural user → assistant → tool_result → assistant pattern).
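Here is a minimal sketch of that walk. It assumes each line's message.content is either a string or an array of content blocks typed text, tool_use, or tool_result. Those are the Anthropic Messages API block shapes, which the SDK log appears to reuse. buildTimeline and its event shape are hypothetical. Previews are clipped because tool results can be very large.

```typescript
interface ContentBlock {
  type: string;
  text?: string;
  name?: string;
  input?: unknown;
  tool_use_id?: string;
  content?: unknown;
}

interface JsonlLine {
  type?: string;
  subtype?: string;
  message?: { content?: string | ContentBlock[] };
}

type TimelineEvent =
  | { kind: "turn_start" }
  | { kind: "text"; role: "user" | "assistant"; text: string }
  | { kind: "tool_use"; name: string; input: string }
  | { kind: "tool_result"; toolUseId: string; preview: string };

// Keep previews short: tool results can embed large payloads (e.g. base64 images).
const clip = (s: string, max = 200): string => (s.length > max ? `${s.slice(0, max)}...` : s);

function buildTimeline(jsonl: string): TimelineEvent[] {
  const events: TimelineEvent[] = [];
  for (const raw of jsonl.split("\n")) {
    if (raw.trim() === "") continue;
    let line: JsonlLine;
    try {
      line = JSON.parse(raw) as JsonlLine;
    } catch {
      continue; // skip a truncated or partially written line
    }
    if (line.type === "system" && line.subtype === "init") {
      events.push({ kind: "turn_start" });
      continue;
    }
    // queue-operation, last-prompt, etc. are bookkeeping.
    if (line.type !== "user" && line.type !== "assistant") continue;
    const role: "user" | "assistant" = line.type === "user" ? "user" : "assistant";
    const content = line.message?.content;
    if (typeof content === "string") {
      events.push({ kind: "text", role, text: clip(content) });
      continue;
    }
    for (const block of content ?? []) {
      if (block.type === "text" && block.text !== undefined) {
        events.push({ kind: "text", role, text: clip(block.text) });
      } else if (block.type === "tool_use") {
        events.push({ kind: "tool_use", name: block.name ?? "?", input: clip(JSON.stringify(block.input ?? null)) });
      } else if (block.type === "tool_result") {
        const body = typeof block.content === "string" ? block.content : JSON.stringify(block.content ?? null);
        events.push({ kind: "tool_result", toolUseId: block.tool_use_id ?? "?", preview: clip(body) });
      }
    }
  }
  return events;
}
```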
If no Clack session is found at all, stop and tell the user — it likely means the session was evicted (30-day age cap for Q&A sessions) or the link points at a thread that never triggered Clack.
3. Reconstruct the story
From the session data, build a timeline. The SDK JSONL is the primary source — always work from it when the file exists, not from context.json snippets. Use the Clack context.json (messages[] + triggering metadata like channel, user, triggerType, errors[]) as the fallback when the JSONL isn't on disk and to understand the Slack-side framing around each turn.
- What did the user ask? Start with trigger.messageText (or trigger.prompt for scheduled), then walk messages[] in order for subsequent user replies, button clicks, and followups.
- Which tools did Claude call on each turn, in what order, with what args, and what did they return (per-turn toolCalls[] on each assistant message, or the JSONL for the full picture)?
- Did any tool error or return a surprising result?
- What did Claude say back, turn by turn (each assistant messages entry carries text, payload, and flags like skipped/postedTopLevel)?
- Where does that diverge from what the user expected, per their description of the issue?
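The context.json side of the timeline can be summarized the same way. This sketch models only the fields listed in step 2. The element shape of toolCalls[] isn't described in this document, so the summary just counts them. summarizeTurns is a hypothetical helper.

```typescript
// Assumed shapes; only fields named in step 2 are modeled.
interface StartTrigger {
  type: string;
  messageText?: string;
  prompt?: string;
}

type Msg =
  | { role: "user"; source: "reply" | "choice" | "followup"; text: string; ts: string }
  | {
      role: "assistant";
      ts: string;
      text?: string;
      toolCalls?: unknown[];
      error?: unknown;
      skipped?: boolean;
      disengaged?: boolean;
      postedTopLevel?: boolean;
    };

// One line per event: the trigger first, then each message in temporal order.
function summarizeTurns(trigger: StartTrigger, messages: Msg[]): string[] {
  const start = trigger.type === "scheduled" ? trigger.prompt : trigger.messageText;
  const lines = [`[trigger:${trigger.type}] ${start ?? "(no text)"}`];
  for (const m of messages) {
    if (m.role === "user") {
      lines.push(`[user/${m.source} ${m.ts}] ${m.text}`);
      continue;
    }
    const flags = [
      m.skipped ? "skipped" : "",
      m.disengaged ? "disengaged" : "",
      m.postedTopLevel ? "top-level" : "",
      m.error !== undefined ? "error" : "",
    ].filter(Boolean);
    const tools = m.toolCalls?.length ?? 0;
    const flagText = flags.length > 0 ? ` flags=${flags.join(",")}` : "";
    lines.push(`[assistant ${m.ts}] tools=${tools}${flagText} ${m.text ?? ""}`.trimEnd());
  }
  return lines;
}
```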
Name the divergence concretely. Typical shapes:
- Wrong repo chosen: Claude called list_repositories, then picked one that didn't match the question.
- Tool returned nothing useful: e.g., git_log came back empty because history wasn't deep enough, and Claude didn't call deepen_history.
- Misread the question: Claude's plan in its tool args shows a different interpretation than what the user asked.
- Missing capability: Claude had no tool for what was needed and made something up, or stalled.
- Prompt steered it wrong: the system prompt or an instruction file nudged Claude away from the right path.
- Permission/role gate: the tool Claude needed was hidden at the user's role tier.
- Worker-mode failure: execution.log shows the branch/PR steps that failed.
4. Investigate the code
Once you have a candidate root cause, read the relevant source to confirm it and locate the fix site. The map:
- A specific tool misbehaved → src/tools/query/{tool}.ts, src/tools/actions/{tool}.ts, src/tools/worker/{tool}.ts, or src/tools/presentation/submitResponse.ts. Also check src/tools/server.ts for role gating and availability rules.
- Claude was fed bad context → src/tools/context.ts (tool context builders), src/sessions.ts (what's persisted and surfaced).
- System prompt steered Claude wrong → src/claude/promptBuilder.ts assembles the prompt. The shipped instruction defaults live under data/default_configuration/, with user overrides in data/configuration/ taking precedence. Per-repo instructions live at data/configuration/{repo}/ or data/default_configuration/{repo}/.
- Query orchestration / delivery issue → src/slack/handlers/core.ts (processMessage).
- Changes Workflow issue → src/changes/execution.ts (executeChange), src/changes/workflow.ts, src/changes/askClaudeWorktree.ts, src/changes/pr.ts, src/changes/monitor.ts.
- Role / permission issue → src/roles.ts, src/permissions.ts, src/repoAccess.ts.
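When the prompt is the suspect, first confirm which instruction file Claude actually saw. The hypothetical helper below covers the global scope and, optionally, one repo. For each, it lists the user override and the shipped default in precedence order and reports the first one that exists. fileName is whatever file you suspect, since this document doesn't name them. How global and per-repo instructions are combined isn't modeled here; verify that in src/claude/promptBuilder.ts.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

interface ScopeResult {
  scope: string;
  checked: string[]; // override first, then shipped default
  effective?: string; // first existing path, if any
}

function instructionCandidates(dataDir: string, fileName: string, repo?: string): ScopeResult[] {
  const scopes: Array<[string, string[]]> = [["global", []]];
  if (repo !== undefined) scopes.push([`repo:${repo}`, [repo]]);
  return scopes.map(([scope, subdirs]) => {
    // data/configuration/ overrides data/default_configuration/.
    const checked = ["configuration", "default_configuration"].map((layer) =>
      join(dataDir, layer, ...subdirs, fileName),
    );
    return { scope, checked, effective: checked.find((p) => existsSync(p)) };
  });
}
```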
Use the LSP tool (goToDefinition, findReferences, hover) before falling back to Grep/Glob — this is a TypeScript project and navigation is much more reliable that way.
When citing code, use file:line so the user can jump to the location.
5. Write the assessment
Deliver the investigation as a single message to the user with these sections:
What the user asked
One or two sentences summarizing the request, quoting the key line(s) from trigger.messageText / trigger.prompt and user replies in messages[] verbatim where helpful.
What Clack did
A short narrative or bullet list of the tool calls and responses, in order, with the specific args/results that matter. Don't dump the whole history; pick the 3–6 steps that explain the outcome.
Where it went wrong
Name the divergence and tie it to the user's complaint.
Root cause
Point at the code. Use file:line references. Explain why the code produces this behavior — not just what line is responsible. If the root cause is a prompt/instruction rather than code, say so and cite the file.
Suggested fixes
One or more concrete suggestions the user can act on. Each should be specific enough to implement: which file, roughly what change, and why it would address the root cause. If there are tradeoffs between fixes, flag them. If more than one change is needed, list them.
Not applied
End with an explicit one-liner: "I haven't applied any of these. Decide which (if any) you want and I'll implement it when you ask." The skill's contract is diagnosis-only, and stating it explicitly avoids confusion.
Notes
- Sessions with errors[] populated are preserved past the 30-day cleanup window, so old failure cases are often still investigable.
- The SDK JSONL is still the richest per-turn source (full system preamble, every tool_use + tool_result). Always read it (step 2) alongside messages[]. messages[] is authoritative for what the user sent and what Claude returned; the JSONL is authoritative for how Claude reached that outcome.
- The find_session_transcript tool (query mode) returns paginated messages[] for a given sessionId with the full per-turn payload and toolCalls[]. It's useful when you want a structured view without parsing raw JSON. It's subject to the standard privacy rules (owner always, non-owner only for public channels).
- Tool results in both the JSONL and messages[].toolCalls[] are stored as whatever the tool returned, so they may be large (e.g., view_slack_image embeds base64). Skim first, then zoom in on the ones that look decisive.
- If you find the Clack session was never persisted (no match on disk), that itself might be the bug. Surface it and check src/sessions.ts and the handler that would have created it.
- If the user's complaint is about delivery (wrong channel, missing thread, etc.) rather than Claude's reasoning, focus on src/slack/handlers/core.ts, src/slack/dmResponse.ts, and src/slack/messagesApi.ts rather than the tool history.
- The bot sets CLAUDE_CONFIG_DIR=/app/data/.claude in its Dockerfile, which is why SDK JSONLs land under data/.claude/projects/ and travel with the rest of data/. If you ever debug a setup where that env var isn't set, the JSONLs will be under ~/.claude/projects/ instead.
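Here is the lookup described in the last note, as a sketch. sdkProjectsDir is a hypothetical helper. It takes env and home as parameters so it can be checked without touching the real environment, and it treats an unset CLAUDE_CONFIG_DIR as ~/.claude.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Where SDK JSONLs land: $CLAUDE_CONFIG_DIR/projects when the variable is set
// (the bot's Dockerfile sets /app/data/.claude), else ~/.claude/projects.
function sdkProjectsDir(
  env: Record<string, string | undefined> = process.env,
  home: string = homedir(),
): string {
  const configDir = env.CLAUDE_CONFIG_DIR ?? join(home, ".claude");
  return join(configDir, "projects");
}
```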