dashboard

star 612

Real-time harness observability dashboard. Reads campaigns, fleet sessions, telemetry, and pending queues to present a snapshot of harness state at a glance. Invoked by /dashboard, /do status, or phrases like "what's happening" and "show activity".

SethGammon By SethGammon schedule Updated 6/11/2026

name: dashboard license: MIT description: >- Real-time harness observability dashboard. Reads campaigns, fleet sessions, telemetry, and pending queues to present a snapshot of harness state at a glance. Invoked by /dashboard, /do status, or phrases like "what's happening" and "show activity". user-invocable: true auto-trigger: false trigger_keywords: - dashboard - what's happening - what's going on - show activity - harness state - show me status last-updated: 2026-03-26

/dashboard — Harness Observability Dashboard

When to Use

  • "What's happening?" / "Status?" / "What's going on?"
  • "Show activity" / "Show me the dashboard"
  • After returning to a project after time away
  • When /do routes "status", "dashboard", "what's happening", "what's going on", "show activity"
  • Directly: /dashboard

Inputs

None required. Works with whatever state exists on disk.

Protocol

Step 0: RUN DASHBOARD IMPLEMENTATION

Run the local dashboard implementation from the project root:

node scripts/dashboard.js

If the package scripts are available, this equivalent command is also valid:

npm run dashboard

The script is read-only. It renders a user-facing control-plane snapshot from .planning/, telemetry, hook config, coordination state, worktrees, and cost data. Use the manual collection protocol below only as a fallback if the script is missing or fails in the current project.

Step 1: COLLECT STATE

Read the following sources. Each is optional — if a file or directory doesn't exist, treat it as empty. Never crash on missing state.

Campaigns:

  • Glob .planning/campaigns/*.md
  • For each file, read the first 40 lines to extract:
    • Status: field
    • Direction: field (truncate to 60 chars)
    • Phase progress (search for Phase N of M or ## Phase headings)
    • Most recent line starting with - [ from the Decision Log
  • If all phases are complete but status is still active, report needs-completion and show: node scripts/campaign.js complete <slug> --archive
  • If a campaign is marked completed but still lives in .planning/campaigns/, report needs-archive and show the same archive command.
  • If prior build/verify phases are complete but the review-package Exit Evidence row is still pending, missing, or points at a missing local package, report a repair before campaign completion: node scripts/package-delivery.js <slug>

Cost Data (two sources, prefer real):

  • Primary: run node scripts/session-tokens.js --today and --all — reads Claude Code's native session JSONL for exact token counts
  • Fallback: read .planning/telemetry/session-costs.jsonl; cost priority real_cost > override_cost > estimated_cost; group by campaign_slug, sum cost/agents/minutes, compute grand total
  • Live session: read .planning/telemetry/cost-tracker-state.json for burn rate
  • Label real data "(real)" and estimates "(est)"

Fleet Sessions:

  • Glob .planning/fleet/session-*.md
  • For each file, read the first 30 lines to extract:
    • status: field
    • wave: or wave number
    • agents: or agent count

Recent Telemetry:

  • Read last 50 lines of .planning/telemetry/hook-timing.jsonl (if it exists)
  • Read last 50 lines of .planning/telemetry/audit.jsonl (if it exists)
  • Merge and sort by timestamp (descending). Take the 10 most recent entries.
  • For each entry: extract ts (or timestamp), hook (or event), and a short description field. Format as relative time.

Recent Hook Activity (separate from general telemetry):

  • Read last 20 lines of .planning/telemetry/hook-timing.jsonl
  • For event: "timing" entries: extract hook, duration_ms, timestamp (relative), and outcome (pass if no matching error in hook-errors.jsonl within 1s; block if a block entry exists)
  • For event: "counter" entries: extract metric name as the "event" column with count context

Hook Overhead (timing percentiles):

  • Read all of .planning/telemetry/hook-timing.jsonl (if it exists)
  • Keep only entries with a numeric duration_ms; group by hook
  • Per hook compute: count, p50, p95, max (nearest-rank percentile over the sorted durations)
  • Sort rows by p95 descending
  • If the file is missing or contains no timed entries, render the one-line note instead

Routine Quota (account-wide 15 runs / 24h cap):

  • Read .planning/telemetry/routine-runs.jsonl (if it exists)
  • Expected JSONL shape, one record per quota-consuming run: {"ts": "<ISO timestamp>", "kind": "RemoteTrigger" | "CronCreate" | "ScheduleWakeup"}
  • Count records with ts inside the last 24 hours; compare against the cap of 15
  • Warn when the count exceeds 12 (hitting the cap pauses every routine on the account; see docs/ROUTINE-QUOTA.md)
  • The harness does not write this file automatically yet — remote-run logging populates it when a routine mechanism is actually used. Local runners (local-watch.js, local-daemon.js, local-schedule.js) never consume quota and must not be counted.

Pending Queues:

  • Count actionable entries in .planning/telemetry/doc-sync-queue.jsonl where status is pending or needs-review (or 0 if missing)
  • Count lines in .planning/telemetry/merge-check-queue.jsonl (or 0 if missing)
  • Count files in .planning/intake/ (or 0 if missing)

Hook Value Data (for HOOKS VALUE section):

  • Read .planning/telemetry/hook-errors.jsonl (if it exists, last 200 lines)
    • Count entries where hook = "protect-files" (blocked file access)
    • Count entries where hook = "external-action-gate" (gated external actions)
    • Count entries where hook = "quality-gate" (quality violations)
  • Read .planning/telemetry/hook-timing.jsonl (if it exists, last 200 lines)
    • Count entries where hook = "circuit-breaker" and metric = "trips"
    • Count total entries from today (entries containing today's ISO date prefix)
  • Read .planning/telemetry/audit.jsonl (if it exists, last 200 lines)
    • Count entries mentioning "circuit-breaker" or "circuit_breaker"

Hook Problem Taxonomy:

  • Read last 100 entries from .planning/telemetry/hook-errors.jsonl.
  • Classify protect-files blocks and hard external-action-gate blocks as safety-block with info severity; they prove protection fired and do not create a repair action by themselves.
  • Classify error and parse-fail actions as hook-failure with high severity; these are actionable.
  • Classify blocked-restricted as restricted-scope-block with high severity; this is actionable.
  • Classify first-encounter and consent-block from external-action-gate as approval-needed with medium severity; this is actionable.
  • If an external-action-gate approval entry has a later matching tool-call entry in audit.jsonl, classify it as resolved-approval with info severity; it should not create a repair action. Treat git push -u and git push as equivalent for the same branch, and allow a small near-simultaneous timestamp skew between hook and tool-call entries.
  • If an unresolved external approval entry is older than 15 minutes, classify it as stale-approval with low severity; it should not create a current repair action.
  • Classify entries older than 24 hours as stale with low severity and do not create a repair action from stale entries.
  • The /telemetry repair action should appear only when actionable entries are present. Safety blocks remain visible in PROBLEMS and HOOKS VALUE.

Health:

  • Count circuit breaker entries from audit.jsonl (from hook value data above)
  • Count total lines in .planning/telemetry/audit.jsonl written today
  • Count entries in hooks array of .claude/hooks-template.json (or .claude/hooks.json if template not present); use 0 if neither exists
  • Read .claude/harness.jsontrust object:
    • sessions_completed, campaigns_completed counters
    • Compute level: novice (sessions < 5), familiar (5-19), trusted (20+ with 2+ campaigns)
    • If trust.override is set, use that and note "(override)"

Step 2: FORMAT RELATIVE TIMESTAMPS

Convert ISO timestamps: <60s → "just now" | <60min → "{N} min ago" | <24h → "{N} hr ago" | else → "{N} days ago". Display unparseable timestamps as-is.

Step 3: RENDER DASHBOARD

Output verbatim, substituting real values. Always show section headers even when content is "(none active)".

=== Citadel Dashboard ===
As of: {relative timestamp of most recent event, or "now"}

NEXT ACTION
  Command: {exact command}
  Why: {why this is next}
  Confidence: {low | medium | high}
  Repair available: {yes | no}
  Runbook: {docs or skill path}

REPAIR CONSOLE
  {repair|review} | {confidence} | {label}
    command: {exact command}
    why: {short reason}
    runbook: {docs or skill path}

CAMPAIGNS
  {slug}: Phase {N}/{total} — {direction, max 60 chars, ellipsis if truncated}
  Last event: {most recent telemetry entry for this campaign, or "no telemetry"}
  (none active)

COSTS
  This session: ${cost} | {duration} min | ${rate}/min | {messages} msgs | {agents} agents
  Today:        ${today_total} across {today_sessions} sessions
  All time:     ${all_time_total} across {all_time_sessions} sessions ({data_source})

  By campaign:
    {slug}: ${total_cost} across {sessions} sessions ({agents} agents, {minutes} min)
    _unattached: ${total_cost} across {sessions} sessions
  (no cost data recorded yet)

ROUTINE QUOTA
  Runs (last 24h): {N}/15
  WARNING: {N} of 15 routine runs used in the last 24h. Hitting the cap pauses every routine on the account. See docs/ROUTINE-QUOTA.md.
  (remote-run logging populates .planning/telemetry/routine-runs.jsonl - local runners do not consume quota)

HOOKS VALUE
  Circuit breaker: {N} trips (prevented token spirals)
  Quality gate:    {N} violations caught pre-commit
  Protect-files:   {N} blocks (path traversal, secrets)
  External gate:   {N} actions gated
  Total hook fires today: {N}
  (raw facts only -- no inflated savings claims)

FLEET SESSIONS
  {slug}: Wave {N} — {agent count} agents — {status}
  (none active)

RECENT ACTIVITY (last 10 events)
  {relative time} | {hook/event name} | {description}
  (no telemetry recorded yet)

HOOK ACTIVITY (last 10 hook fires)
  {relative time} | {hook name} | {duration_ms}ms | {outcome: pass/block/warn}
  (no hook timing recorded yet — set CITADEL_DEBUG=true in settings.json for verbose output)

HOOK OVERHEAD (sorted by p95 descending)
  hook                        count      p50      p95      max
  {hook name}                   {N}   {N}ms    {N}ms    {N}ms
  (no hook timing data recorded yet)

PROBLEMS
  Actionable: {N} | Safety blocks: {N} | Resolved approvals: {N} | Stale: {N}
  {relative time} | {severity} | {category} | {hook name} | {description}
  (none recorded)

PENDING
  Doc sync:     {N} items queued
  Merge reviews: {N} items queued
  Intake items:  {N} in .planning/intake/

HEALTH
  Circuit breaker trips this session: {N}
  Audit entries today:                {N}
  Hooks installed:                    {N}
  Operator tier:                      {novice | familiar | trusted} ({N} sessions, {N} campaigns)

QUICK COMMANDS
  /do continue    — resume active campaign
  /do rollback    — restore last checkpoint
  /telemetry      — cost breakdown, hook activity, telemetry settings
  /triage prs     — review open PRs
  /pr-watch       — watch PR CI
  /learn          — extract patterns from last completed campaign

Step 4: FRINGE CASE HANDLING

.planning/ missing: All zeros, "(none active)"; add "Run /do setup --express to initialize." harness.json missing or malformed: Show "not configured" for hooks count; do not crash. Malformed campaign file: Skip it; note (N campaign file(s) skipped — malformed). Large telemetry files: Read last 50 lines only. Missing timestamps: Fall back to file modification time; display entry without timestamp if unavailable. All campaigns completed: Note "No active campaigns" at top of CAMPAIGNS section. Completed campaign still active: Show the exact node scripts/campaign.js complete <slug> --archive repair command; suggesting /do continue here is wrong because the campaign is already finished. Campaign ready for review package: Show the exact node scripts/package-delivery.js <slug> repair command before showing campaign completion. All fleet sessions idle: Note "No active fleet sessions" under FLEET SESSIONS. routine-runs.jsonl missing or no runs in window: Show Runs (last 24h): 0/15 plus the one-line population hint; only show the WARNING line when more than 12 runs are counted. Mixed state: Proceed with whatever state exists; note each missing directory inline. Doc-sync backlog: Surface /learn --doc-sync as a repair action with skills/learn/SKILL.md as runbook. Dirty worktree: Surface git status --short as a review action; do not suggest destructive cleanup. Only safety blocks recorded: Show them in PROBLEMS and HOOKS VALUE, but do not surface /telemetry as NEXT ACTION. Actionable hook problem recorded: Surface /telemetry as repair action with skills/telemetry/SKILL.md as runbook.

Contextual Gates

Disclosure: "Displaying harness dashboard. No files modified." Reversibility: green — read-only; no files modified Trust gates:

  • Any: view the full dashboard

Quality Gates

  • Dashboard must render even when all state files are missing
  • Never display raw JSON to the user — always parse and format
  • Relative timestamps required — never show raw ISO strings in output
  • Campaign direction truncated to 60 chars with "..." if longer
  • NEXT ACTION must include command, why, confidence, repair availability, and runbook when known
  • REPAIR CONSOLE must list actionable repairs before raw activity logs
  • Safety blocks must not be treated as urgent repairs unless paired with an actionable hook failure, approval, or restricted-scope block
  • Total output must be skimmable in under 30 seconds

Exit Protocol

/dashboard does not produce a HANDOFF block. It is a read-only observability tool. After displaying the dashboard, wait for the next user command.

Install via CLI
npx skills add https://github.com/SethGammon/Citadel --skill dashboard
Repository Details
star Stars 612
call_split Forks 55
navigation Branch main
article Path SKILL.md
More from Creator