using-axon

name: using-axon description: >- Self-hosted RAG engine, web toolkit, and persistent agent memory. Use it to: remember durable project facts/preferences/decisions (memory.remember); recall/search agent memory (memory.search/show); answer questions indexed docs/code might cover (ask — a large corpus is already indexed; try ask BEFORE web-searching or giving up); search the web (search, auto-indexes results); semantic-search the index (query); scrape/fetch a page (scrape); crawl a docs site or pages you just used (crawl); map a site's URLs (map); extract structured data (extract); discover API endpoints (endpoints); extract brand identity — colors/logo/fonts/voice (brand); summarize a page (summarize); quick multi-source research (research); retrieve a URL's full indexed content (retrieve); embed local files/dirs (embed); ingest GitHub/GitLab/Gitea/Git repos, Reddit, YouTube, and Claude/Codex/Gemini sessions (ingest). Also triggers on axon, RAG, Qdrant, SearXNG, Tavily, hybrid/vector search, remember, recall, memory. allowed-tools: mcp__plugin_axon_axon__axon

axon

axon is a self-hosted RAG engine. Two surfaces, same backend (Spider.rs/Chrome -> Qdrant, SQLite jobs, SearXNG/Tavily for web search):

MCP (preferred) — single tool mcp__plugin_axon_axon__axon, routed by action (and subaction for lifecycle families). Large results are written to a local artifact file under ~/.axon/artifacts/<context> and the response returns the file path plus a compact shape summary; small results (≤ ~8 KB) come back inline. See Response handling (MCP) below.
CLI (fallback) — axon <command> [flags]. Use for shell scripting, cron, or when the MCP server is down.

Both surfaces accept the same operations and parameters. This skill leads with MCP request shapes; CLI equivalents are listed alongside.

Reach for axon by default

axon already has a large corpus indexed, and every operation makes it smarter — so route web and knowledge work through it instead of one-off web fetches, raw browser tools, or giving up after a few tries. When a task involves the web or "what does X say," axon is the default tool, even if the user didn't name it.

The task / user wants…	Action
An answer that indexed docs or code might cover	`ask` FIRST — before web search, raw fetching, or fumbling. Often returns everything in one shot.
Search the web	`search` (SearXNG/Tavily; auto-indexes every result)
Semantic search over what's indexed	`query`
Fetch / scrape a page or URL	`scrape`
Crawl a docs site — including docs you just relied on to solve something	`crawl`
List a site's URLs	`map`
Pull structured data out of a page	`extract`
Discover a site's API endpoints	`endpoints`
Brand identity (colors, logo, fonts, voice)	`brand`
Summarize a page	`summarize`
Quick multi-source research with synthesis	`research`
Full indexed content of a specific URL	`retrieve`
Embed local files / directories	`embed`
Index a GitHub/GitLab/Gitea/Git repo, Reddit, YouTube, or AI sessions	`ingest`
Remember a durable project fact, decision, or preference	`memory.remember`
Recall previously stored agent memory	`memory.context` at task start, or `memory.search` then `memory.show` when targeted lookup is better

ask is the highest-leverage habit. A huge amount is already indexed, so many multi-turn fumbles would have been a single ask call. Whenever a question could be covered by docs or code that's been indexed, try ask before web-searching or giving up. And after you crawl/scrape/ingest something to solve a task, you've made the index richer — prefer ask/query next time over re-fetching.

Parameter discipline (hard rule)

Only pass parameters the user explicitly asked for. Defaults exist for a reason — do NOT add max_pages, max_depth, render_mode, include_subdomains, since, before, hybrid_search, diagnostics, format, root_selector, exclude_selector, limit, collection, or any other knob unless the user named it or the task literally cannot complete without it. The example JSON blocks below show what's available, not what to send by default. A bare { "action": "scrape", "url": "…" } or { "action": "crawl", "urls": [...] } is almost always the right call. Same rule for the CLI: never add flags the user didn't ask for.

When to fall back to the CLI

The MCP server is offline (mcp__plugin_axon_axon__axon { "action": "doctor" } fails or the tool is missing).
You're authoring a shell script, systemd unit, or cron job that runs outside Claude Code.
You need axon's built-in --cron-every-seconds/--cron-max-runs loop.
The user explicitly asks for a CLI command.

In every other case, use the MCP tool.

The pipeline

URL or query → discover → fetch + embed → query / ask

Starting point	Discover	Fetch + embed (auto-embeds into Qdrant)	Query
Single URL	—	`action: "scrape", url`	`query` / `ask`
Whole site / docs	`action: "map", url`	`action: "crawl", urls`	`query` / `ask`
Topic / question	`action: "search", query` (SearXNG/Tavily, auto-queues crawl)	(auto)	`action: "ask", query`
Existing local file/dir	—	`action: "embed", input`	`query` / `ask`
GitHub / GitLab / Gitea / Git repo	—	`action: "ingest", source_type: "github", target: "owner/repo"`	`query` / `ask`
Reddit thread or subreddit	—	`action: "ingest", source_type: "reddit", target: "r/name"`	`query` / `ask`
YouTube video	—	`action: "ingest", source_type: "youtube", target: "<url>"`	`query` / `ask`
Past Claude/Codex/Gemini sessions	—	CLI only: `axon sessions`	`query` / `ask`

scrape, crawl, embed, and the ingest paths all auto-embed unless you set embed: false.

Bootstrap: `help` and `doctor`

Once per session, confirm the live action map and that services are healthy:

{ "action": "help" }
{ "action": "doctor" }

help returns the full action/subaction map and current defaults — authoritative when names look wrong. doctor pings Qdrant, the embedding service (TEI), Chrome, and the Gemini headless LLM backend. It does not probe the web-search backend.

CLI equivalents: axon doctor. (No CLI help for the action map — use the MCP one.)

Persistent agent memory

Use Axon memory for durable project-level facts, preferences, decisions, bugs, and task notes that should be semantically recalled later. This is distinct from bead issue notes and from Memos: Axon memory is optimized for agent recall through Qdrant.

Minimal remember call:

{ "action": "memory", "subaction": "remember", "body": "Memory content lives in Qdrant; SQLite holds graph metadata.", "project": "axon" }

Recall:

{ "action": "memory", "subaction": "context", "project": "axon", "query": "memory storage architecture" }
{ "action": "memory", "subaction": "search", "query": "where is memory stored", "project": "axon" }
{ "action": "memory", "subaction": "show", "id": "<memory-id>" }

Use memory.context at task start when project memory could help. It returns inline, defanged XML-wrapped content with trust="evidence_only" and supports limit plus token_budget.

Graph maintenance:

{ "action": "memory", "subaction": "link", "source_id": "<memory-id>", "target_id": "<memory-id>", "edge_type": "relates_to" }
{ "action": "memory", "subaction": "supersede", "source_id": "<replacement-memory-id>", "target_id": "<old-memory-id>" }

supersede hides the old memory from future memory.search results and records the replacement trail in SQLite.

CLI equivalents: axon memory remember "..." --project axon, axon memory context --project axon --query "...", axon memory search "..." --project axon, axon memory show <memory-id>, axon memory link <source-id> <target-id>, axon memory supersede <replacement-id> <old-id>.

Discovery

{ "action": "search", "query": "rust async patterns", "search_time_range": "month" }
{ "action": "map", "url": "https://docs.example.com" }
{ "action": "research", "query": "kubernetes ingress patterns" }

search — web search via SearXNG (when AXON_SEARXNG_URL is set) or Tavily; auto-queues crawl jobs for results. search_time_range ∈ day|week|month|year.
map — sitemap-first URL discovery, falls back to fetching the root page and extracting anchors. Fast.
research — search + LLM synthesis in one shot.

CLI: axon search "…", axon map <url> (use --map-fallback crawl only when you need a full Spider walk; the structure-fallback default is fast), axon suggest "…" (LLM-suggested URLs to crawl next; also the MCP suggest action).

Fetch + embed

{ "action": "scrape", "url": "https://example.com/article" }
{ "action": "scrape", "url": "https://example.com",
  "root_selector": "article, main",
  "exclude_selector": ".sidebar, .ads",
  "format": "markdown" }
{ "action": "scrape", "url": "https://example.com", "format": "html", "embed": false }

{ "action": "crawl", "urls": ["https://docs.example.com"],
  "max_pages": 200, "max_depth": 3, "include_subdomains": true }
{ "action": "crawl", "urls": ["https://docs.example.com"], "render_mode": "chrome" }

{ "action": "embed", "input": "./docs" }
{ "action": "embed", "input": "https://example.com" }

Render modes: http (fast, no JS), chrome (full browser), auto_switch (default — start HTTP, escalate to Chrome on JS gate).

Output formats: markdown (default), html, rawHtml, json.

CLI: axon scrape <url> / axon crawl <url> --max-pages N --max-depth N / axon embed <input>. Chrome rendering is pre-tuned and rarely needs overriding; the CDP endpoint is set via the AXON_CHROME_REMOTE_URL env var, not a CLI flag. Output dir defaults to .cache/axon-rust/output/ (env AXON_OUTPUT_DIR).

Extract structured data

{ "action": "extract", "urls": ["https://example.com/pricing"],
  "prompt": "Extract plan name, price, and features as JSON" }
{ "action": "extract", "subaction": "status", "job_id": "<uuid>" }

LLM-powered. Pass a natural-language prompt describing the schema you want.

CLI: axon extract <url> --query "…" (the --query flag carries the extraction prompt).

Web utilities: summarize, endpoints, brand, diff, screenshot

Page-level analysis actions — each takes a url (except diff) and a bare call is the right default:

{ "action": "summarize",  "url": "https://example.com/long-article" }
{ "action": "endpoints",  "url": "https://app.example.com" }
{ "action": "brand",      "url": "https://example.com" }
{ "action": "diff",       "url_a": "https://example.com/v1", "url_b": "https://example.com/v2" }
{ "action": "screenshot", "url": "https://example.com" }

summarize — fetch a page and return a concise summary (also accepts urls for several at once; root_selector/exclude_selector to scope).
endpoints — discover a site's API endpoints by scanning its JavaScript bundles. Optional knobs (verify, capture_network, probe_rpc, first_party_only) — omit unless asked.
brand — extract brand identity (colors, logo, fonts, voice/tone) from a URL.
diff — compare two URLs (url_a, url_b); reports content/metadata/link changes.
screenshot — full-page capture via headless Chrome (full_page, viewport, output optional).

CLI: axon summarize <url> / axon endpoints <url> / axon brand <url> / axon diff <url-a> <url-b> / axon screenshot <url>.

Ingest external sources

{ "action": "ingest", "source_type": "github", "target": "owner/repo" }
{ "action": "ingest", "source_type": "github", "target": "owner/repo", "include_source": false }

{ "action": "ingest", "source_type": "gitlab",  "target": "group/project" }
{ "action": "ingest", "source_type": "gitea",   "target": "https://gitea.example.com/owner/repo" }
{ "action": "ingest", "source_type": "git",     "target": "https://git.example.com/owner/repo.git" }

{ "action": "ingest", "source_type": "reddit", "target": "r/rust" }
{ "action": "ingest", "source_type": "reddit", "target": "https://reddit.com/r/rust/comments/abc123/..." }

{ "action": "ingest", "source_type": "youtube", "target": "https://youtube.com/watch?v=abc" }

Note: source_type: "sessions" is not accepted over MCP (it's rejected server-side). Ingest local AI sessions with the CLI axon sessions instead — see below.

github / gitlab / gitea — hosted-forge repos, indexing code + metadata + issues/PRs (or merge requests). target is owner/repo (or a full URL for self-hosted instances).
git — any generic Git remote by clone URL (code only, no forge API).
reddit — a subreddit (r/name) or a specific thread URL.
youtube — a video URL (transcript ingest).
sessions — local Claude/Codex/Gemini session transcripts. CLI-only via axon sessions (enqueues a local sessions ingest job). The MCP ingest action rejects source_type: "sessions"; remote callers must use the prepared-documents HTTP endpoint (/v1/ingest/sessions/prepared).

Lifecycle subactions (status, cancel, list, cleanup, clear, recover) work the same as crawl/embed/extract.

CLI: axon ingest <target> with source-specific flags. GitHub: --include-source/--no-source, --max-issues, --max-prs (default 100; 0 = unlimited). Reddit: --sort hot|top|new|rising, --time hour|…|all, --max-posts, --min-score, --depth, --scrape-links. Use axon sessions for Claude/Codex/Gemini local history (enqueues a local sessions ingest job). MCP sessions ingest is rejected to avoid scanning server-local history paths — remote callers use the /v1/ingest/sessions/prepared HTTP endpoint instead.

Query and RAG

{ "action": "query", "query": "embedding pipeline", "limit": 10, "collection": "axon" }
{ "action": "query", "query": "rate limiting", "since": "7d" }

{ "action": "ask", "query": "How does axon handle Chrome auto-switching?" }
{ "action": "ask", "query": "...", "since": "7d" }
{ "action": "ask", "query": "...", "since": "2026-01-01", "before": "2026-03-01" }
{ "action": "ask", "query": "...", "diagnostics": true }
{ "action": "ask", "query": "...", "hybrid_search": false }

query — pure semantic vector search (top-K chunks).
ask — RAG: retrieve, then synthesize an answer with citations.
Hybrid search (dense + BM42 sparse + RRF) is on by default; hybrid_search: false forces dense-only for A/B comparison or when sparse is misbehaving. Server default: env AXON_HYBRID_SEARCH.
Temporal filters (since/before) accept 7d, 30d, YYYY-MM-DD, or RFC3339. They filter on indexing date, not publication date.
collection overrides the default axon collection per request (env AXON_COLLECTION).

{ "action": "retrieve", "url": "https://example.com/article" }

CLI: axon query "…" / axon ask "…" --since 7d --diagnostics / axon retrieve <url>.

evaluate — { "action": "evaluate", "query": "<question>", "retrieval_ab": true } (CLI: axon evaluate "<question>" --retrieval-ab) compares hybrid-RAG vs dense-only, scored by a separate judge prompt on the configured LLM backend (accuracy/relevance/completeness).

Inspect the index

{ "action": "sources" }
{ "action": "domains" }
{ "action": "stats" }
{ "action": "status" }                                 // global queue snapshot

CLI: axon sources / axon domains / axon stats / axon status.

Async jobs

crawl, extract, embed, ingest are async by default. subaction defaults to start, so { "action": "crawl", "urls": [...] } is enough to enqueue; poll with subaction: "status". Full lifecycle subactions and CLI-only surfaces: references/async-job-lifecycle.md.

Response handling (MCP)

The server runs in-process, so responses are size-routed automatically: payloads ≤ ~8 KB come back inline in data; larger ones are written to a local artifact file and the response returns its path plus a compact shape summary. Read shape first — it usually answers the question (counts, status, the URLs touched).

When you need the content, reach for RAG, not the raw file: everything axon fetches is already embedded, so ask (synthesized answer), query (semantic chunks), or retrieve (a specific URL's indexed content) get you what you need without parsing megabytes of JSON/markdown. Open the artifact path from disk only as a last resort — e.g. you need the exact raw bytes the RAG path doesn't surface. There is no artifacts MCP action; don't send { "action": "artifacts", ... }. To force a payload in-band instead of a file, set response_mode: "inline" (or "auto_inline").

Full response-mode contract and the JSON-RPC error model: references/mcp-response-protocol.md.

MCP resources

axon://schema/mcp-tool — full JSON schema and routing contract (read this when you need exact field types/enums).
ui://axon/status-dashboard — interactive MCP App widget for live queue status.

Configuration

This skill names only the handful of env vars that matter at the point of use (AXON_COLLECTION, AXON_SEARXNG_URL, AXON_HYBRID_SEARCH, AXON_OUTPUT_DIR, AXON_DATA_DIR, …). The full surface — every AXON_* env var and ~/.axon/config.toml tuning key — is documented authoritatively and is not duplicated here:

config.example.toml (repo root) — all config.toml tuning knobs with defaults.
.env.example (repo root) — service URLs, API keys, and secrets.
docs/guides/configuration.md — full environment-variable reference + the two-layer (.env + config.toml) priority model.

Priority: CLI flags > env vars > ~/.axon/config.toml > built-in defaults.

Choosing parameters — quick guide

Situation	Reach for
User pastes a single URL	`action: "scrape"`
User says "the docs", "the whole site"	`action: "crawl"` with `max_pages` + `max_depth`
User asks a question without naming a source	`action: "ask"` (retrieves over whatever's indexed)
User wants only recent content	`ask` / `query` with `since: "7d"`
User wants citations / verification	`ask` with `diagnostics: true`
Ranking looks wrong	Try `hybrid_search: false` and compare
Need entity/relationship reasoning	`ask` with `diagnostics: true`; graph retrieval is not available in the current runtime
Crawled the wrong thing	`crawl` `subaction: "clear"` or per-family `cleanup`
RAG quality regression	`evaluate` with `retrieval_ab: true` (or CLI `axon evaluate <q> --retrieval-ab`)
Debug "nothing happened"	`action: "doctor"` first, then `action: "status"`

Tips and gotchas

Read shape, then reach for RAG — not the raw artifact. path mode keeps multi-megabyte results out of the conversation; pull the content back with ask/query/retrieve (it's already embedded) rather than reading/grepping the file. There is no artifacts action — it was removed precisely because RAG is the intended way to get content.
Don't paste raw axon <cmd> --help output. Most CLI subcommands inherit the entire Chrome flag set even when irrelevant. Use the action tables here instead.
help and doctor are cheap. Call help once per session to confirm the live action map; call doctor whenever something looks wrong.
Async by default. Lifecycle actions return a job_id; poll with subaction: "status". CLI users can pass --wait true for synchronous one-shots.
Cache reuse is on by default (CLI). Disable with --cache false; add --cache-http-only to force the HTTP path even when the cached job used Chrome.
graph: true is deprecated. Use hybrid search diagnostics and source coverage checks for retrieval debugging.
Temporal filters use indexing date, not document publication date — useful for "what did I add this week", not "what was published this week".
evaluate scores with a separate judge prompt (distinct role + reference material) on the same configured LLM backend. Available as both the MCP evaluate action and axon evaluate.
subaction defaults to start for lifecycle families — { "action": "crawl", "urls": [...] } is enough to enqueue a job.