refresh

name: refresh description: Re-verify the ship-it-skynet coding-agent comparison matrix against primary sources. Fetches official docs for tier-1 CLIs (Claude Code, Codex CLI, Gemini CLI), diffs claims in README.md and docs/*.md against current reality, proposes updates for human approval, bumps "Last verified" dates, and runs the link-check gate. Optional arg scopes to one tool (e.g., "/refresh gemini-cli"). Without arg, refreshes all three tier-1 tools. Use monthly, or whenever you notice a vendor release.

/refresh — re-verify the matrix

This skill keeps ship-it-skynet's comparison matrix honest. It fetches current official docs, diffs them against the claims in this repo, and proposes updates for human approval.

Scope

If the user supplied an argument (e.g., claude-code, codex-cli, gemini-cli), scope the refresh to that single tool. Otherwise refresh all three tier-1 tools.

Tier 2 tools are never refreshed into the matrix — they belong in prose only. If the user tries to add one as a matrix column, push back and point them at docs/*.md.

Canonical sources

These URLs are the single source of truth for "what counts as primary" in this project. If a URL is wrong or outdated, update it in this file — then commit the correction as part of the refresh run.

Claude Code

Confirmed (explicitly linked in the project):

Skills: https://code.claude.com/docs/en/skills
Sub-agents: https://code.claude.com/docs/en/sub-agents

Probable (pattern extension — verify by fetching before trusting):

Overview: https://code.claude.com/docs/en/
Hooks: https://code.claude.com/docs/en/hooks
MCP: https://code.claude.com/docs/en/mcp
Settings / permissions: https://code.claude.com/docs/en/settings
Agent SDK tool reference (authoritative built-in tool list): https://code.claude.com/docs/en/agent-sdk/typescript

Note: Custom slash commands are now part of skills. Fetch the skills page as authoritative for both — there is no separate slash-commands page. Confirmed 2026-04-11: code.claude.com/docs/en/slash-commands now returns the skills page content, so fetching both pages is duplicate work.

Note: The Agent SDK TypeScript reference is the authoritative enumeration of Claude Code's built-in tools (WebSearch, WebFetch, Bash, Read, Edit, etc.) with their full input/output schemas. The settings and permissions pages document permission rules for tools but do not list which tools exist — consulting them alone will miss tools. Any matrix row that makes a claim about a specific built-in tool (web search, file ops, bash, image handling, etc.) must cross-check the SDK reference before being marked 🟡 or ❌. This rule exists because the 2026-04-11 run missed Claude Code's WebSearch tool by checking permissions instead of agent-sdk/typescript, and the error sat in the matrix until a reader falsified it empirically.

Authoritative secondary:

Changelog: https://github.com/anthropics/claude-code (look for CHANGELOG.md or releases)
Latest release: gh api repos/anthropics/claude-code/releases/latest — tag_name is the authoritative version string, published_at is the authoritative publish date (same pattern as Codex and Gemini).

Codex CLI (`@openai/codex`)

Primary source — developers site (preferred; check this first):

Features: https://developers.openai.com/codex/cli/features
Skills: https://developers.openai.com/codex/skills
Slash commands: https://developers.openai.com/codex/cli/slash-commands
Agent approvals & security: https://developers.openai.com/codex/agent-approvals-security
AGENTS.md guide: https://developers.openai.com/codex/guides/agents-md
Hooks: https://developers.openai.com/codex/hooks (bare path — not under /cli/; codex/cli/hooks 404s)
Plugins overview: https://developers.openai.com/codex/plugins
Plugins — Build (authoritative for marketplace semantics): https://developers.openai.com/codex/plugins/build (the /plugins page describes installation only; codex plugin marketplace add argument types, manifest shape, and self-serve catalog rules live on /build. Any matrix claim about Codex marketplace flexibility must check this page — the 2026-05-15 run found Codex plugins flip from 🟡 to ✅ here.)
Subagents: https://developers.openai.com/codex/subagents (concepts) and https://developers.openai.com/codex/cli/subagents if it exists; check both
MCP: https://developers.openai.com/codex/mcp (canonical — replaces the old github.com/openai/codex/blob/main/docs/config.md link, which is now a 15-line redirect stub)
IDE extension: https://developers.openai.com/codex/ide
Config (basic / advanced / reference): https://developers.openai.com/codex/config-basic, /codex/config-advanced (legacy notify hook lives here), /codex/config-reference

Note: github.com/openai/codex/docs/*.md files are mostly one-line redirect stubs pointing at the developers site. Check the developers site first; use the GitHub repo only for README, CHANGELOG, and version signals.

Note (2026-05-15): The developers.openai.com site nav has expanded with new concept pages — Memories, Chronicle, Sandboxing, Auto-review, Subagents, Workflows, Plugins — that didn't exist on prior runs. None of these introduce new matrix rows (they refine concepts already covered), but if a future row addition touches one of those areas, the corresponding /codex/<topic> URL is the canonical source. Sandboxing details still live on the agent-approvals-security page.

GitHub repo (README, changelog, version):

Repo: https://github.com/openai/codex — README and /CHANGELOG.md
Latest release: gh api repos/openai/codex/releases/latest — tag_name is the authoritative version string, published_at is the authoritative publish date. The npm package page is intentionally not listed here: it returns 403 to WebFetch, so the GitHub Release API is the reliable source for version signals.

Antigravity CLI (`agy`) — successor to Gemini CLI, migrated 2026-06-10

Primary sources — the official GitHub repo (preferred; the docs site is not crawlable):

Repo: https://github.com/google-antigravity/antigravity-cli — officialness established by the transition blog linking it directly. README (features/integration/auth) and CHANGELOG.md are the cell-level primary sources; fetch via gh api repos/google-antigravity/antigravity-cli/contents/{path} -H "Accept: application/vnd.github.raw".
Latest release: gh api repos/google-antigravity/antigravity-cli/releases/latest — tag_name / published_at, same pattern as the other tools.
Transition blog: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/ — the only source for "Skills, Hooks, Subagents, Extensions preserved" claims; treat as announcement-grade (⏳ in cells) until repo or docs corroborate.

Note (2026-06-10): antigravity.google is a JS-rendered SPA with a catch-all route — every path returns the same 22 KB shell (HTTP 200) rendering to no visible text, .md suffixes and llms.txt 404, and the sitemap (26 URLs) lists no CLI doc pages. Do not cite antigravity.google/docs/* for cell-level claims until it server-renders; re-probe each refresh (curl --compressed — the server gzips unconditionally). Five matrix cells are ⏳ for exactly this reason: plan mode, agent teams, hooks, web search, checkpoints/rewind.

Gemini CLI (`@google/gemini-cli`) — predecessor, enterprise-only after 2026-06-18

Primary source — geminicli.com/docs/ (preferred; check this first):

Subagents: https://geminicli.com/docs/core/subagents/
Remote agents: https://geminicli.com/docs/core/remote-agents/
Hooks: https://geminicli.com/docs/hooks/ and reference at https://geminicli.com/docs/hooks/reference/
Extensions: https://geminicli.com/docs/extensions/
Plan mode: https://geminicli.com/docs/cli/plan-mode/
Rewind: https://geminicli.com/docs/cli/rewind/
Checkpointing (opt-in shadow git): https://geminicli.com/docs/cli/checkpointing/
Skills: https://geminicli.com/docs/cli/skills/
Custom commands: https://geminicli.com/docs/cli/custom-commands/
GEMINI.md: https://geminicli.com/docs/cli/gemini-md/
Sandbox: https://geminicli.com/docs/cli/sandbox/
Git worktrees: https://geminicli.com/docs/cli/git-worktrees/
Model: https://geminicli.com/docs/cli/model/
Headless: https://geminicli.com/docs/cli/headless/
IDE integration: https://geminicli.com/docs/ide-integration/
Web search: https://geminicli.com/docs/tools/web-search/
MCP servers: https://geminicli.com/docs/tools/mcp-server/ (canonical — replaces older citations of github.com/google-gemini/gemini-cli/blob/main/docs/hooks/reference.md for MCP claims)
Settings: https://geminicli.com/docs/cli/settings/

Note: geminicli.com/docs/ is Gemini CLI's canonical docs site. The GitHub repo's docs/ tree is incomplete — for example, docs/cli/subagents.md does not exist in the repo even though user-definable subagents are a shipped, documented feature on geminicli.com/docs/core/subagents/. Always check the canonical site first. The GitHub repo is still the source of truth for README, CHANGELOG, and version signals.

This rule exists because the 2026-04-19 refresh surfaced a material sourcing failure: the 2026-04-11 run had the Gemini Sub-agents cell marked 🟡 ("built-in only") because it only checked the repo tree, which doesn't document user-defined subagents. The same run had the Checkpoints / rewind cell at 🟡 because it checked only docs/cli/checkpointing.md (the opt-in shadow-git system), missing the separate /rewind + Esc+Esc feature at geminicli.com/docs/cli/rewind/. Any matrix claim about Gemini CLI must cross-check the canonical site before being marked 🟡 or ❌.

GitHub repo (README, changelog, version):

Repo: https://github.com/google-gemini/gemini-cli
Latest release: gh api repos/google-gemini/gemini-cli/releases/latest — tag_name is the authoritative version string, published_at is the authoritative publish date. The npm package page is intentionally not listed here: it returns 403 to WebFetch, so the GitHub Release API is the reliable source for version signals.

Maintenance: The first time this skill runs for a given tool, the agent should verify each URL resolves. Broken URLs → update this file, then continue. This list is expected to drift; that's why it's in the skill, not in the matrix.

Fetching tips

These rules make the fetch phase reliable. They were surfaced on the 2026-04-11 refresh run and apply to every future run.

Prefer `gh api` raw-content over `WebFetch` for GitHub files

For anything under github.com/<owner>/<repo>/blob/..., use:

gh api repos/{owner}/{repo}/contents/{path} -H "Accept: application/vnd.github.raw"

WebFetch against blob URLs redirects to rendered HTML, and the extraction model summarizes the page instead of returning raw markdown. The gh api raw-content header returns clean markdown every time. Use this for every doc file, README, and CHANGELOG hosted on GitHub.

Claude Code docs serve raw markdown — curl the `.md` endpoint, skip `WebFetch` entirely

Discovered 2026-06-10: every code.claude.com/docs/en/<slug> page also serves clean raw markdown at code.claude.com/docs/en/<slug>.md (e.g. https://code.claude.com/docs/en/hooks.md), and https://code.claude.com/docs/llms.txt is a machine-readable index of all pages. This supersedes the old WebFetch + temp-file-grep procedure for Claude Code pages.

Procedure:

curl -sL --compressed "https://code.claude.com/docs/en/<slug>.md" -o /tmp/cc-<slug>.md — fan out the fetches in parallel (one Bash call with a for/&/wait loop works). All 18 tier-1-relevant pages fetched in seconds on 2026-06-10.
grep the local files directly — they're already markdown, no HTML stripping needed.
If a future .md fetch starts returning HTML or 404s, fall back to WebFetch + temp-file grep (the pre-2026-06-10 procedure; see git history of this file) and note the regression in the refresh report.

This also beats WebFetch on fidelity: you get source bytes, not a summarization model's paraphrase, which matters when confirming an exact event name or table row. The old guidance ("three pages trip the WebFetch output budget; narrow prompts don't help, persistence is the expected path") was accurate while it applied — it's preserved in git history if the .md endpoint ever disappears.

For non-GitHub HTML pages, `curl` + `sed` + `grep` beats `WebFetch`

developers.openai.com/codex/* and geminicli.com/docs/* pages run 90 KB–290 KB of HTML. WebFetch reliably trips its output budget on the larger ones, and the summarization model rewrites whatever it does return — fine when you trust the model, painful when you need to confirm a specific event name, matcher value, or table cell verbatim.

The faster pattern is curl -sL <url> -o /tmp/<slug>.html, then strip the chrome and pipe to grep for the keywords you care about. Concretely:

curl -sL https://developers.openai.com/codex/hooks -o /tmp/cx-hooks.html
grep -A 2000 '<article' /tmp/cx-hooks.html \
  | sed -E 's/<[^>]+>/ /g; s/&nbsp;/ /g; s/  +/ /g' \
  | tr -s '\n' \
  | grep -E '^.{15,}'

You can fan out the curl calls in parallel (one Bash tool call per URL in the same message), then grep each cached file independently. The whole tier-1 fetch phase ran in well under a minute on 2026-04-27 this way, and surfaced the verbatim text of the Codex hooks event table — including the PermissionRequest event that was missing from the matrix.

Use this for developers.openai.com/codex/*, geminicli.com/docs/*, and any other non-GitHub HTML doc. Keep gh api for GitHub-hosted markdown (it's already raw), and WebFetch for cases where you genuinely want a model summary rather than the source bytes.

For Gemini CLI, prefer `geminicli.com/docs/` over the GitHub repo

The GitHub repo's docs/ tree is an incomplete subset of Gemini CLI's canonical docs. Shipped, documented features — user-definable subagents, remote agents, the /rewind command — have no corresponding file in github.com/google-gemini/gemini-cli/blob/main/docs/cli/. Check geminicli.com/docs/ first for every Gemini claim. Fall back to the repo only when a page genuinely doesn't exist on the canonical site (rare).

Confirmed on 2026-04-19: the 2026-04-11 run's Gemini Sub-agents cell (🟡) and Checkpoints / rewind cell (🟡) were both sourcing errors, not facts — both flipped to ✅ once the canonical site was checked. Treat a 🟡 or ❌ on a Gemini row as a trigger to re-verify against geminicli.com/docs/ before trusting it.

Process

Run these steps for each target tool. Never batch writes — one proposed edit at a time, each approved before the next.

1. Fetch primary sources

Fetch the URLs from the canonical-sources section above, following the Fetching tips:

For GitHub-hosted files, use gh api repos/{owner}/{repo}/contents/{path} -H "Accept: application/vnd.github.raw".
For long Claude Code docs pages, pass a narrow extraction prompt to WebFetch to stay under the output budget.
For everything else (e.g., developers.openai.com/codex/*), use WebFetch (or the rubber-duck MCP if it's configured).
For version signals on Codex and Gemini CLI, call gh api repos/{owner}/{repo}/releases/latest and read tag_name / published_at.

If a fetch fails:

Note the failure in the refresh report
Try the next URL in the list — don't silently skip
Never substitute a vibes-based claim for a failed fetch

2. Diff against current claims

For each row in README.md's matrix that references this tool, verify the cell against the fetched docs. For each section in docs/skills.md, docs/hooks.md, docs/mcp.md, and docs/glossary.md that references this tool, verify the prose.

3. Categorize each change

Change	Severity	How to handle
⏳ → ✅ / 🟡 / ❌ (first verify)	normal	Expected on first run — fill the cell, add source link
✅ → ✅ with new URL	minor	URL rot — update the link, bump the date
✅ → 🟡 (feature degraded)	major	Flag as regression — document what moved and why in the report
✅ → ❌ (feature removed)	major	Flag prominently — this is the kind of thing people need to know
new feature exists	normal	Propose a new matrix row — but ask before adding it
new tool version / rename	major	Call out in the report header; may require updating other cells

4. Write a report

Produce a markdown summary before touching any files:

## Refresh report — <tool> — <YYYY-MM-DD>

Sources fetched:
- <url 1> ✓
- <url 2> ✓
- <url 3> ✗ (reason)

Proposed changes:

| Row / Section | Old | New | Source |
| ------------- | --- | --- | ------ |
| ...           | ... | ... | ...    |

Regressions flagged: <count, or "none">

5. Propose edits one at a time

Use the Edit tool. For each row:

Show the proposed change (old_string → new_string)
Explain which source backs it
Wait for approval
Move to the next row

Do not stage multiple edits in one batch — the user's trust in this matrix comes from being able to approve or reject each cell independently.

6. Bump "Last verified" dates

On every row / page you touched, even if the cell didn't change. "Verified and still correct as of <date>" is useful signal. Use today's date in YYYY-MM-DD format.

7. Run the gate

Run npm run gate. If check:links catches broken links anywhere, fix them before finishing. Do not hand back a refresh run that leaves the gate red.

8. Post-run self-check

After the gate passes, ask the user to flag any new friction discovered during the run:

Were any canonical URLs wrong, 404-ing, or returning unexpected content?
Did any WebFetch response exceed the output budget and require temp-file persistence?
Were any important doc paths missing from the canonical-sources list?
Did any step in the process (fetch, diff, approval flow, report format) feel painful?

If the user flags anything, offer to open a follow-up task — either as a new milestone in specs/plan.md or as a TODO at the top of this skill file. This closes the feedback loop so future improvements happen systematically, not reactively. Rationale: the 2026-04-11 refresh run surfaced six friction points that sat undocumented until a retrospective caught them. A self-check at the end of each run prevents that pattern from recurring.

Rules

NEVER edit silently. Every proposed change goes through human approval.
NEVER invent sources. If a claim can't be verified from the canonical URLs, leave the cell as-is and note it in the report.
NEVER delete historical claims without explanation. A feature disappearing is a story, not a silent deletion.
NEVER mark unverified cells as ✅. Use ⏳ until you have a primary-source link.
NEVER add Tier 2 tools as matrix columns. They belong in prose only.
NEVER use GitHub footnote syntax ([^name]) in matrix cells. VSCode's default markdown preview doesn't render footnotes, so the raw [^name] marker leaks into the rendered output. Always use reference-style links: [✅][cx-plan] in the cell, with [cx-plan]: url "short description" at the bottom of the file. Reference-style links render correctly in both GitHub and VSCode, and the link title attribute doubles as a hover tooltip for the source description.
NEVER encode exact counts for drift-prone metrics in matrix cells or hover text. Slash-command counts, hook-event counts, plugin counts, and similar "how many of X does this CLI have" numbers drift every release and cost one edit per refresh to maintain. Link to the authoritative source table instead (e.g. "built-in slash commands; user-defined via skills" rather than "31 built-in slash commands"). The actual count is one click away from the linked page. Rationale: the Codex slash-command count drifted 26 → 28 → 31 across three consecutive refreshes (2026-04-11, 2026-04-14, 2026-04-19), making the number itself a maintenance tax with no information value.
ALWAYS cross-check every 🟡 or ❌ Gemini cell against geminicli.com/docs/ before committing. The GitHub repo's docs/ tree is an incomplete subset — at least two Gemini cells (Sub-agents 🟡, Checkpoints / rewind 🟡) were mis-classified on the 2026-04-11 run because the canonical site wasn't checked. Treat a sub-optimal Gemini rating as a trigger to re-verify, not as a settled result. See Fetching tips.
ALWAYS use the canonical-site URL in matrix link refs and deep-dive Docs: lines, not the GitHub repo URL. For Gemini that's geminicli.com/docs/; for Codex that's developers.openai.com/codex/. The skill's own canonical-source rule applies to the matrix link refs themselves, not just the fetch phase. Exceptions: when a feature genuinely has no dedicated canonical-site page (e.g. Gemini image-paste UX as of 2026-05-15), citing the GitHub README is acceptable — but note the gap in the refresh report so future runs catch a new page if it appears. The 2026-05-15 run flipped 11 Gemini and 1 Codex matrix link refs from GitHub URLs to canonical-site URLs after a reader flagged that the rule was applied to fetches but not to citations.
ALWAYS bump the date on every row you verified, changed or not.
ALWAYS run npm run gate at the end. A green gate is part of the deliverable.

Final output

When done, produce a single summary suitable for pasting into a commit message:

refresh: <tools> — <YYYY-MM-DD>

Sources fetched: <n>
Changes proposed: <n>
Changes approved: <m>
Regressions: <count, or "none">
Link check: <pass/fail>

<one-paragraph changelog of what actually moved>

When to run

Monthly — tied to a /schedule cron if one is configured. See specs/plan.md for the cadence decision.
After a vendor release — new major version, blog post, changelog entry.
Before publishing externally — don't ship a stale matrix.
When opening a vendor PR that needs maintainer verification of a vendor-unverified claim.