ha-logs

name: ha-logs description: "Self-service diagnostics — query Hope Agent's local SQLite databases (logs / sessions / background jobs) directly via the `exec` tool to investigate problems, analyze usage, and locate root causes. Trigger on: user reports something broken / failing / slow / stuck / not responding ('X 不工作', 'X 报错', 'X 卡住', '为什么 X 失败', 'why did X fail', 'show me the logs', 'check what happened'); ad-hoc data analysis ('this week's token usage', '最近调用最多的工具', 'how many subagent runs failed', 'tool error rate', 'find sessions where X happened'); verifying a fix ('did the error stop after I changed Y'). Use BEFORE asking the user to paste log snippets — the data is on disk, query it directly. Read-only — SELECT only, never UPDATE/DELETE/INSERT/DROP." version: 1.0.0 author: Hope Agent license: MIT requires: anyBins: [sqlite3, python3]

Hope Agent Logs — Self-Service Diagnostics

Hope Agent persists every log line, every session message, and background job state into local SQLite databases under ~/.hope-agent/. You can query these directly via exec to investigate problems before asking the user. Treat this as your primary evidence source — AGENTS.md explicitly designates the log database as "the agent's first source for self-repair".

Iron rule: read-only

SELECT only. Never UPDATE / DELETE / INSERT / DROP / ATTACH / VACUUM / CREATE / REPLACE.

The DBs are live: write queries can corrupt session state, kill running streams, or wipe history. Open with -readonly (CLI) or ?mode=ro (Python URI) so SQLite enforces it. If you genuinely need to modify state, use the dedicated tools (update_settings, task_update, etc.) or ask the user.

How to query

Use exec to run one of:

`sqlite3` CLI (Linux / macOS, usually preinstalled)

sqlite3 -readonly -cmd ".mode column" -cmd ".headers on" ~/.hope-agent/logs.db \
  "SELECT timestamp, level, category, source, message
     FROM logs
    WHERE level = 'ERROR'
    ORDER BY timestamp DESC
    LIMIT 20;"

Python fallback (Windows or no `sqlite3` on PATH)

python3 - <<'PY'
import sqlite3, os
p = os.path.expanduser("~/.hope-agent/logs.db")
con = sqlite3.connect(f"file:{p}?mode=ro", uri=True)
for r in con.execute("""
    SELECT timestamp, level, category, source, message
      FROM logs
     WHERE level = 'ERROR'
     ORDER BY timestamp DESC
     LIMIT 20
"""):
    print(r)
PY

Schema discovery

sqlite3 -readonly ~/.hope-agent/logs.db ".schema logs"
sqlite3 -readonly ~/.hope-agent/sessions.db ".tables"

Databases

Path	Purpose
`~/.hope-agent/logs.db`	App logs from `app_info!`/`warn!`/`error!`/`debug!` macros
`~/.hope-agent/sessions.db`	Sessions, messages, tasks, subagent runs, learning events, channel conversations
`~/.hope-agent/background_jobs.db`	Unified background job cache (`exec` / `web_search` / `image_generate` plus subagent/group projections)
`~/.hope-agent/recap/recap.db`	Cached recap analysis
`~/.hope-agent/local_model_jobs.db`	Local LLM background jobs (download / preload)

Key schemas

`logs.db` → `logs`

Column	Type	Notes
`id`	INTEGER PK
`timestamp`	TEXT	ISO 8601 UTC, e.g. `2026-05-04T12:34:56.789Z`
`level`	TEXT	`INFO` / `WARN` / `ERROR` / `DEBUG`
`category`	TEXT	Subsystem tag — examples: `chat_engine`, `permission`, `mcp`, `channel`, `compact`, `failover`, `tool`, `provider`, `memory`, `cron`, `subagent`, `plan`, `config`, `skill`
`source`	TEXT	Origin file/function (free-form, agent-set)
`message`	TEXT	Rendered printf-style message
`details`	TEXT	Optional JSON payload
`session_id`	TEXT	Nullable, links to `sessions.db → sessions.id`
`agent_id`	TEXT	Nullable

Indexes: timestamp DESC, level, category, session_id.

`sessions.db` → `sessions`

id, title, agent_id, provider_id, provider_name, model_id, reasoning_effort, created_at, updated_at, context_json, last_read_message_id, is_cron, parent_session_id, incognito, title_source

`sessions.db` → `messages`

id, session_id, role, content, timestamp, attachments_meta, model, tokens_in, tokens_out, reasoning_effort, tool_call_id, tool_name, tool_arguments, tool_result, tool_duration_ms, is_error, ttft_ms, tokens_in_last, tokens_cache_creation, tokens_cache_read, tool_metadata, thinking

role ∈ user / assistant / system / tool. Tool calls land as paired rows: an assistant row with tool_name/tool_arguments, then a tool row with tool_result / is_error / tool_duration_ms.

messages_fts (FTS5 virtual table) provides full-text search over content for user/assistant rows — use it for keyword search:

SELECT m.id, m.session_id, m.role, snippet(messages_fts, 0, '<mark>', '</mark>', '…', 16)
  FROM messages_fts
  JOIN messages m ON m.id = messages_fts.rowid
 WHERE messages_fts MATCH 'timeout OR rate_limit'
 ORDER BY m.timestamp DESC
 LIMIT 20;

`sessions.db` → `subagent_runs`

run_id, parent_session_id, parent_agent_id, child_agent_id, child_session_id, task, status, result, error, depth, model_used, started_at, finished_at, duration_ms, label, attachment_count, input_tokens, output_tokens

status ∈ spawning / running / completed / failed / cancelled.

`sessions.db` → `tasks`

Plan Mode task tracking — session_id + state columns. Read only; writes must go through task_create / task_update tools.

`sessions.db` → `learning_events`

id, ts INTEGER (unix seconds), kind, session_id, ref_id, meta_json — current kind values: skill CRUD, tool_recall_memory hits, MCP tool calls. meta_json is opaque JSON.

`background_jobs.db` → `background_jobs`

job_id, session_id, agent_id, tool_name, tool_call_id, args_json, status, result_preview, result_path, error, created_at INTEGER (unix s), completed_at, injected, origin, approval_origin, incognito, pid, cancel_requested, kind, subagent_run_id, group_id

status includes queued/running/awaiting-approval and terminal states such as completed/failed/cancelled/timed-out/interrupted. Large results spill to result_path on disk; result_preview keeps a short head/tail.

Query cookbook

Pick a template, adapt the time window / filter / limit. Always bound by time and LIMIT — these tables can hold millions of rows.

Recent errors (last 30 min)

SELECT timestamp, category, source, message, details
  FROM logs
 WHERE level IN ('ERROR','WARN')
   AND timestamp >= datetime('now','-30 minutes')
 ORDER BY timestamp DESC
 LIMIT 50;

Errors for one session

SELECT timestamp, level, category, message, details
  FROM logs
 WHERE session_id = ?
 ORDER BY timestamp DESC
 LIMIT 100;

Top error categories (last 7 days)

SELECT category, source, COUNT(*) AS n
  FROM logs
 WHERE level = 'ERROR'
   AND timestamp >= datetime('now','-7 days')
 GROUP BY category, source
 ORDER BY n DESC
 LIMIT 20;

Failover events

SELECT timestamp, source, message, details
  FROM logs
 WHERE category = 'failover'
   AND timestamp >= datetime('now','-1 day')
 ORDER BY timestamp DESC;

Slowest tool calls (last 24h)

SELECT tool_name,
       COUNT(*)              AS calls,
       AVG(tool_duration_ms) AS avg_ms,
       MAX(tool_duration_ms) AS max_ms
  FROM messages
 WHERE role = 'tool'
   AND tool_duration_ms IS NOT NULL
   AND timestamp >= datetime('now','-1 day')
 GROUP BY tool_name
 ORDER BY avg_ms DESC;

Tool failure rate (last 7 days)

SELECT tool_name,
       SUM(is_error) AS errors,
       COUNT(*)      AS total,
       ROUND(100.0 * SUM(is_error) / COUNT(*), 1) AS error_pct
  FROM messages
 WHERE role = 'tool'
   AND timestamp >= datetime('now','-7 days')
 GROUP BY tool_name
HAVING errors > 0
 ORDER BY error_pct DESC;

Token usage by day (last 30 days)

SELECT date(timestamp)         AS day,
       SUM(tokens_in)          AS in_tok,
       SUM(tokens_out)         AS out_tok,
       SUM(tokens_cache_read)  AS cache_read,
       SUM(tokens_cache_creation) AS cache_create
  FROM messages
 WHERE role = 'assistant'
   AND timestamp >= datetime('now','-30 days')
 GROUP BY day
 ORDER BY day DESC;

Subagent runs failing recently

SELECT child_agent_id, label, error, started_at, duration_ms
  FROM subagent_runs
 WHERE status = 'failed'
   AND started_at >= datetime('now','-7 days')
 ORDER BY started_at DESC;

Async jobs stuck / orphaned

SELECT job_id, session_id, kind, tool_name, status, created_at, error
  FROM background_jobs
 WHERE status IN ('queued','running','awaiting_approval')
   AND created_at < strftime('%s','now') - 3600   -- older than 1 hour
 ORDER BY created_at;

Search log messages by keyword

SELECT timestamp, level, category, source, message
  FROM logs
 WHERE message LIKE '%timeout%'
   AND level IN ('ERROR','WARN')
 ORDER BY timestamp DESC
 LIMIT 30;

Most-active sessions in the last 24h

SELECT s.id, s.title, s.agent_id, s.model_id,
       COUNT(m.id) AS msg_count,
       MAX(m.timestamp) AS last_msg
  FROM sessions s
  JOIN messages m ON m.session_id = s.id
 WHERE m.timestamp >= datetime('now','-1 day')
 GROUP BY s.id
 ORDER BY msg_count DESC
 LIMIT 10;

Cross-DB: link a log row back to its conversation

# Step 1 — pick a log
sqlite3 -readonly ~/.hope-agent/logs.db \
  "SELECT id, session_id, message FROM logs WHERE id = 12345;"

# Step 2 — pull session metadata + recent messages
sqlite3 -readonly -cmd ".mode column" -cmd ".headers on" ~/.hope-agent/sessions.db \
  "SELECT id, title, agent_id, model_id FROM sessions WHERE id = '<session_id>';
   SELECT role, timestamp, substr(content,1,200), tool_name, is_error
     FROM messages WHERE session_id = '<session_id>'
     ORDER BY id DESC LIMIT 10;"

(Run as two separate queries — the read-only mode forbids ATTACH, and we don't want to modify state anyway.)

Workflow when the user reports a problem

Don't ask for log paste — query directly. Say "Looking at the logs now" and run a query.
Time-bound the scope. "Just now" → last 30 min. "Today" → last 24 h. "This week" → 7 days.
Start broad, narrow down. First level='ERROR' over the window, then drill into the suspect category with full details.
Cross-reference messages.tool_result for tool-related issues — the log line names the failure, the message row holds the actual payload (often a stack trace or HTTP body).
Report findings as an evidence chain. Quote real timestamps and verbatim message text. Don't paraphrase vaguely.
Combine with systematic-debugging for non-trivial bugs — root-cause discipline beats guessing.
Verify the fix. After changing config / code, re-run the same query to confirm the error stops appearing.

Privacy & safety caveats

Incognito sessions (sessions.incognito = 1) deliberately don't persist user content. If a query returns thin data for an incognito session, just note it — don't treat the gap as a bug.
Secrets in logs: API keys / tokens are redacted at the app_*! macro layer (redact_sensitive in crates/ha-core/src/logging.rs). However messages.tool_arguments / tool_result may contain user-pasted secrets — never echo raw tool arguments back in chat without scanning first.
Local only: these DBs never leave the user's machine. They're private by construction; treat them that way.
Row counts: messages and logs can grow to millions of rows. Always LIMIT and time-bound. Never SELECT * without filters.

When NOT to use this skill

For session message replay with attachments resolved, prefer the sessions_history / session_status tools — they handle media and pagination cleanly.
For listing sessions by simple filter, prefer sessions_list — it already understands agent_id / is_cron / pagination.
For dashboard aggregates (cost trends, heatmap, health score), the GUI Dashboard surface already runs these — query logs only for ad-hoc / unsupported angles.
For mutating state, use the corresponding write tool (update_settings, task_update, task_create, etc.). Never write SQL.

If neither `sqlite3` nor `python3` is available

Tell the user:

I'd query the logs directly but neither sqlite3 nor python3 is on PATH. Could you (a) install sqlite3 (brew install sqlite on macOS, apt install sqlite3 on Debian/Ubuntu), or (b) paste the relevant lines from ~/.hope-agent/logs.db so I can inspect manually?