codex-cli-review

name: codex-cli-review
description: Run a code review through the locally-installed Codex CLI wrapper (`codex exec review`) as an unattended, artifact-producing second opinion. Use when a non-Codex-App caller needs isolated `CODEX_HOME`, wrapper sandboxing, a repository-state guard, `final.md`, stderr, and JSONL diagnostics for a PR, commit, branch diff, or working tree. Do not use for Codex App thread/fork review management; use codex-app-review for that. The wrapper fixes the reviewer to gpt-5.5 with xhigh reasoning; do not substitute cheaper or non-flagship models.

Delegate a code review to the user's locally-installed Codex CLI (codex binary, codex-cli >= 0.130.0). The skill enforces a non-interactive, externally-bounded invocation so that the review can run unattended and Codex does not inherit permissive user config (in this user's case, sandbox_mode = "danger-full-access"). By default the wrapper runs Codex in workspace-write mode with $TMPDIR redirected to the review artifact directory and a before/after repository-state guard; set CODEX_CLI_REVIEW_SANDBOX=read-only for static-only reviews.

This skill is for the Codex CLI wrapper, not Codex App thread management. Use codex-app-review when the user asks Codex App to fork, start, monitor, steer, or reuse another Codex thread for code review.

workspace-write is deliberately a verification mode, not a promise that every test runner can work. On macOS, Codex's sandbox can still block AF_UNIX sockets even inside writable temp directories. The wrapper therefore injects runtime guidance telling Codex to retry tsx-backed checks through the underlying TypeScript entrypoint with node --experimental-transform-types when possible, and to lower confidence when no equivalent check can run.

The commands and event-schema details in this skill were originally verified against codex-cli 0.130.0 and refreshed against codex-cli 0.132.0. If codex --version reports a major-version bump, re-validate the flags before trusting this skill.

When to use

Trigger when the user:

- explicitly asks for /codex-cli-review, codex exec review, or the local Codex CLI wrapper
- wants an unattended Codex review with final.md, stderr, JSONL diagnostics, and a repository-state guard
- asks a non-Codex-App client, such as Claude Code, to obtain a Codex CLI review of a PR, commit, branch diff, or working tree
- wants the project's AGENTS.md review rubric applied through the isolated CLI wrapper

When NOT to use

- The user is asking a question, not requesting a review.
- The user asks Codex App to create, fork, inspect, monitor, or steer another review thread. Use codex-app-review.
- The diff is a single trivial file (<50 lines); the setup overhead exceeds the value.
- A codex-cli-review/run.sh or codex ... exec review process is already reviewing this same worktree (see Pre-flight below).
- You are mid-edit and may modify files while Codex reads them.

Pre-flight

Run these before the first invocation in a session. If any fails, stop and report to the user — do not paper over a missing prerequisite.

which codex                              # confirms install
codex --version                          # record (skill expects >= 0.130.0)
codex login status                       # must say "Logged in"
pgrep -fl 'codex-cli-review/run.sh|codex .*exec review' | grep -v $$ || true

Treat the pgrep output as a conflict only when it appears to be another review-skill run, or another codex exec review, operating on this same worktree. The goal is to avoid two reviewers reading a moving diff or competing for the same review artifacts.

Do not ask the user to kill unrelated Codex processes such as an IDE/app-server daemon, remote-control daemon, or a separate codex --dangerously-bypass-approvals-and-sandbox session. Those use different process roles and may use a different $CODEX_HOME; they are not evidence that this review wrapper is unsafe to start. If the process list is ambiguous, report the matching command line and ask whether it is reviewing this checkout.
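
The version requirement from the pre-flight list can be checked mechanically rather than by eye. The sketch below is a minimal POSIX-shell comparison; the helper name `version_ge` is hypothetical and not part of the wrapper, and the usage comment assumes that `codex --version` prints the version as its last whitespace-separated field (as in `codex-cli 0.130.0`).

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Hypothetical helper, not part of run.sh. Compares up to three purely
# numeric fields; pre-release suffixes are not handled.
version_ge() {
  a=$1
  b=$2
  for i in 1 2 3; do
    x=${a%%.*}
    y=${b%%.*}
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
    # Drop the field just compared; treat missing fields as 0.
    case $a in *.*) a=${a#*.} ;; *) a=0 ;; esac
    case $b in *.*) b=${b#*.} ;; *) b=0 ;; esac
  done
  return 0
}

# Example use (assumes the version is the last field of `codex --version`):
#   installed=$(codex --version | awk '{print $NF}')
#   version_ge "$installed" 0.130.0 || echo "codex-cli too old: $installed" >&2
```

The function uses plain global variables to stay POSIX-compatible, so avoid reusing the names `a`, `b`, `x`, `y`, and `i` around the call.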

CLI semantics that drive the command shape

Two things about codex exec review shape every invocation in this skill:

- The diff-scope flags (--commit, --base, --uncommitted) are mutually exclusive with a custom [PROMPT] argument. You pick exactly one mode:
  - Structured mode: --commit SHA / --base BRANCH / --uncommitted, no prompt. Codex picks the diff itself and runs its built-in review prompt + AGENTS.md. Output is terse but Codex tends to read more files autonomously.
  - Free-form mode: a custom prompt (positional arg or - for stdin), no diff flag. The scope is described in the prompt; Codex reads AGENTS.md and follows your instructions verbatim. Output is structured per the prompt.
- --output-last-message FILE captures the final assistant message (markdown). This is the report. The JSONL event stream is for diagnostics, not for parsing the review.

This skill defaults to free-form mode because the 5-section report requested by the prompt template is what callers actually need to summarize. Structured mode is acceptable when AGENTS.md alone fully encodes the review rubric and a terse "looks fine / needs work" answer is sufficient.

How to invoke

All invocations go through the wrapper script run.sh. It builds a fresh $CODEX_HOME containing a sealed codex-config.toml (gpt-5.5 + xhigh reasoning + fast tier + fast_mode, never-approve, isolated memories) and symlinks to the user's auth.json and installation_id. The user's real ~/.codex/config.toml is never loaded.

The review model is fixed to gpt-5.5 with xhigh reasoning. Do not add prompt or environment overrides that downgrade the reviewer to mini, Spark, or another non-flagship model. If this fixed model is unavailable, stop and report the blocker instead of substituting.

Default mode is verification-capable: the wrapper passes -s workspace-write, redirects $TMPDIR to the review artifact directory, adds that directory as writable, and snapshots the repository before and after the run. If Codex modifies the repository, the wrapper exits non-zero and writes repo-before.txt / repo-after.txt for forensics. Use CODEX_CLI_REVIEW_SANDBOX=read-only only when you want a strict static review and accept that tools needing temp sockets/pipes (for example tsx) may fail.

In free-form mode, run.sh appends a short runtime-guidance block to the prompt. The added guidance is intentionally narrow: if a Node/TypeScript command fails only because tsx cannot bind its temp IPC pipe, Codex should inspect the package script and retry the same non-mutating check with Node's native TypeScript transform when available (for example, node --experimental-transform-types path/to/script.ts --help). If no equivalent fallback exists, the report must say exactly what remains unverified.

Limit of this fallback: it only helps for direct single-file invocations like tsx scripts/foo.ts .... Test suites that internally spawn a tsx child process (for example spawnSync('tsx', ['scripts/cli.ts', ...]) from a Node test runner) will still hit the same EPERM in the spawned child; the fallback does not transparently rewrite those spawn calls. When a test runner depends on spawning tsx, treat the suite as unverified and report it as such, rather than running a partial subset that may hide regressions.
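
The rewrite that the runtime guidance asks Codex to perform for a direct single-file invocation can be sketched as a small shell function. This is an illustration only: `tsx_fallback_cmd` is a hypothetical name, the wrapper does not ship it, and it deliberately refuses anything that is not a plain `tsx FILE ARGS...` command line.

```shell
# tsx_fallback_cmd tsx FILE [ARGS...]: print the Node-native equivalent of a
# direct tsx invocation. Hypothetical helper for illustration only.
# Arguments are joined with single spaces and are not re-quoted.
tsx_fallback_cmd() {
  if [ "${1:-}" != "tsx" ] || [ "$#" -lt 2 ]; then
    echo "not a direct tsx invocation" >&2
    return 1
  fi
  shift
  printf 'node --experimental-transform-types'
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Example:
#   tsx_fallback_cmd tsx scripts/foo.ts --help
#   prints: node --experimental-transform-types scripts/foo.ts --help
```

As the limit above states, this does nothing for test runners that spawn tsx themselves; those suites remain unverified.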

# Free-form mode (default): prompt on stdin.
./resources/skills/codex-cli-review/run.sh < /path/to/prompt.md

# Show wrapper help without requiring Codex auth.
./resources/skills/codex-cli-review/run.sh --help

# Structured mode: pass diff-scope flags directly. No prompt.
./resources/skills/codex-cli-review/run.sh --commit <SHA>
./resources/skills/codex-cli-review/run.sh --base main
./resources/skills/codex-cli-review/run.sh --uncommitted

# Override hard timeout (default 1800s).
CODEX_CLI_REVIEW_ALARM=600 ./resources/skills/codex-cli-review/run.sh < prompt.md

# Static-only mode. Safer, but some verification commands may fail.
CODEX_CLI_REVIEW_SANDBOX=read-only ./resources/skills/codex-cli-review/run.sh < prompt.md

# Keep every artifact (events, codex-home/, repo snapshots, …) for debugging
# even on a successful run. Default is to trim them.
CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1 ./resources/skills/codex-cli-review/run.sh < prompt.md

The wrapper prints a workdir path to stdout. Inside that workdir:

| File | Contents | Kept on clean success? |
| --- | --- | --- |
| `final.md` | Codex's last message; this is the report you summarize back to the user. | yes |
| `stderr.log` | Codex stderr (empty on success). | yes |
| `prompt-user.md` | The caller's original prompt (free-form mode only). | yes |
| `prompt.md` | The exact prompt fed to Codex: caller prompt plus runtime guidance (free-form mode only). | yes |
| `events.jsonl` | Full JSONL event stream for diagnostics. | no (trimmed) |
| `codex-home/` | The throwaway `$CODEX_HOME` used for this run. Inspect to confirm which auth files were symlinked. | no (trimmed) |
| `tmp/` | The runtime `TMPDIR` Codex was given. | no (trimmed) |
| `runtime-guidance.md` | The wrapper's runtime fallback guidance (also embedded in `prompt.md`). | no (trimmed) |
| `repo-before.txt` / `repo-after.txt` | Repository-state snapshots used by the guard. They should match. | no (trimmed) |

By default (CODEX_CLI_REVIEW_KEEP_ARTIFACTS=0), a clean success deletes the "trimmed" rows above and leaves only the report and inputs (a few KB). Any failure path, including a guard-detected repo change, an alarm timeout, or a non-zero Codex exit, always preserves the full workdir for forensics regardless of this setting. Set CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1 to also keep them on a clean success when you need the event stream or the temp CODEX_HOME for debugging.

Exit semantics: the wrapper returns 0 whenever final.md is non-empty — even if perl's SIGALRM fired during Codex's post-turn SQLite WAL flush (which can add up to ~2 minutes to wall-clock time after the review itself has already been written to disk). Exception: if the before/after repository-state guard reports a change, the wrapper exits 1 unconditionally — even when final.md is non-empty — because a modified repository is a higher-priority signal than a completed review, and the full workdir is preserved for forensics. Outside that exception, treat a missing or empty final.md as the only real failure signal.
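
The priority order described above (repository change first, then a non-empty final.md, otherwise failure) can be mirrored by a caller that only has the workdir to look at. The sketch below is a hedged illustration: `review_outcome` is a hypothetical helper, and the wrapper's own exit code remains the authoritative signal. It relies only on the file names listed in the artifact table.

```shell
# review_outcome WORKDIR: print repo-changed, ok, or failed.
# Hypothetical helper; mirrors the priority order of the wrapper's exit rules.
review_outcome() {
  workdir=$1
  before=$workdir/repo-before.txt
  after=$workdir/repo-after.txt
  # Snapshots exist only on failure paths or with CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1.
  if [ -f "$before" ] && [ -f "$after" ] && ! cmp -s "$before" "$after"; then
    echo repo-changed
  elif [ -s "$workdir/final.md" ]; then
    # A non-empty report counts as success even if the alarm fired late.
    echo ok
  else
    echo failed
  fi
}
```

A caller would typically run this only after the wrapper has exited, passing the workdir path that the wrapper printed to stdout.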

What lives in `codex-config.toml`

The sealed config that the wrapper drops into the fresh $CODEX_HOME:

sandbox_mode = "read-only"
approval_policy = "never"

model = "gpt-5.5"
model_reasoning_effort = "xhigh"
service_tier = "fast"

[features]
fast_mode = true

[memories]
generate_memories = false
use_memories = false

Why these specific keys

| Key | Why |
| --- | --- |
| `sandbox_mode = "read-only"` | Conservative config fallback. The wrapper overrides this to `workspace-write` by default so verification commands can create temp pipes/cache, then guards the repository with before/after snapshots. |
| `approval_policy = "never"` | No TTY prompts; unattended runs. |
| `model = "gpt-5.5"` | Fixed flagship review model. Codex CLI's own default is unstable across versions, so the wrapper pins the intended reviewer explicitly. |
| `model_reasoning_effort = "xhigh"` | Fixed extra-high reasoning for review quality. Do not lower for latency unless this skill is intentionally revised. |
| `service_tier = "fast"` | OpenAI back-end API queue priority. Independent of `fast_mode` below. |
| `[features] fast_mode = true` | Codex CLI's client-side Fast Mode feature flag. |
| `[memories] generate_memories = use_memories = false` | An ephemeral, isolated review should not read from or write to the user's memory store. |

Why CODEX_HOME redirection instead of `-c key=value` overrides or a profile

- -c overrides build a flag wall. Eight CLI flags ahead of exec review are unreadable, error-prone, and require TOML quoting (-c key='"value"') for string values.
- Profiles are not enough. Per OpenAI's config reference, [profiles.NAME] blocks can override sandbox_mode and service_tier, but not model, model_reasoning_effort, or approval_policy. Those would still have to go on the CLI.
- --ignore-user-config alone is also not enough. It correctly drops the user's sandbox_mode = "danger-full-access" default, but also silently drops model, model_reasoning_effort, service_tier, and features.fast_mode, and there is no --config-file PATH flag to point at a replacement file (see openai/codex#7971).
- CODEX_HOME redirection wins because the wrapper materializes a fresh $CODEX_HOME with exactly our config and only the auth files we choose to expose. No flag wall, no user-config mutation, and no surface area Codex could surprise us with from the user's setup.
- Wrapper-level sandbox override is intentional. Config stays conservative, but run.sh passes -s workspace-write by default because static read-only mode blocks common test runners that create temp files. The wrapper also passes --add-dir "$WORKDIR" and redirects TMPDIR there, then fails if the repo changed.
- The tsx fallback is prompt-level, not sandbox-level. workspace-write does not currently grant macOS AF_UNIX socket creation to tsx; when the underlying TypeScript entrypoint can run under Node's native transform, that is the preferred non-mutating verification fallback.
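
To make the redirection idea concrete, the following is a minimal sketch of how a throwaway $CODEX_HOME could be assembled. It is not the actual run.sh: `make_codex_home` is a hypothetical name, and writing the sealed settings to `config.toml` inside the new directory is an assumption based on the user's real config living at ~/.codex/config.toml.

```shell
# make_codex_home SRC DEST: build a throwaway CODEX_HOME at DEST, exposing
# only selected auth files from SRC (for example "$HOME/.codex").
# Hypothetical sketch; the real run.sh may differ in file names and details.
# SRC should be an absolute path so the symlinks resolve from anywhere.
make_codex_home() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  # Sealed config: only the keys listed in this document.
  cat > "$dest/config.toml" <<'EOF'
sandbox_mode = "read-only"
approval_policy = "never"

model = "gpt-5.5"
model_reasoning_effort = "xhigh"
service_tier = "fast"

[features]
fast_mode = true

[memories]
generate_memories = false
use_memories = false
EOF
  # Expose only the chosen auth files, by symlink; nothing else is copied.
  for f in auth.json installation_id; do
    if [ -e "$src/$f" ]; then
      ln -s "$src/$f" "$dest/$f"
    fi
  done
}
```

With such a directory in place, the review would then be started with CODEX_HOME pointing at it, so the user's own config is never read.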

Picking the timeout

| Scope | Suggested `alarm` value |
| --- | --- |
| Single small commit (`--commit`-style) | `300` (5 min) |
| Branch diff < 500 lines | `900` (15 min) |
| Branch diff 500–2000 lines | `1800` (30 min) |
| Anything larger | Run the planning pass first; do not run a single review pass beyond 30 min. |
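
The branch-diff rows of the table above can be expressed as a small lookup. This is a sketch under assumptions: `suggest_alarm` is a hypothetical helper, and counting "lines" as added plus deleted lines is one possible reading of the table. The single-small-commit row (300) is not covered, because it is not defined by line count.

```shell
# suggest_alarm LINES: print a CODEX_CLI_REVIEW_ALARM value for a branch diff
# of LINES changed lines, or "plan-first" when a single pass is discouraged.
# Hypothetical helper; thresholds are copied from the table.
suggest_alarm() {
  lines=$1
  if [ "$lines" -lt 500 ]; then
    echo 900
  elif [ "$lines" -le 2000 ]; then
    echo 1800
  else
    echo plan-first
  fi
}

# Example use (assumption: diff size = added + deleted lines against main):
#   n=$(git diff --numstat main...HEAD | awk '{n += $1 + $2} END {print n + 0}')
#   suggest_alarm "$n"
```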

Running it

For anything beyond a single small commit, call the Bash tool with run_in_background: true. The harness will notify you when the wrapper exits — do not poll, do not sleep. While the run is in flight, you may tail -n 30 "$EVENT_LOG" once or twice to check progress; never poll in a loop.

Output handling

The three artifacts:

- $FINAL_MSG (final.md): the assistant's last message, in markdown. Read this. Summarize this. The prompt template enforces a 5-section structure; the report will follow it if Codex stayed on task. Always present after a successful run.
- $EVENT_LOG (events.jsonl): full JSONL event stream. Use only for diagnostics (which files Codex read, which commands it ran). Never paste verbatim to the user. Trimmed on a clean success by default; available while the run is in flight, on any failure path, or when CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1. If a successful run is already over and you need the event log, re-run with CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1, because the prior workdir's events.jsonl is gone.
- $ERR_LOG (stderr.log): empty on success. Non-empty means Codex itself errored (auth, network, schema). Check this first if final.md is missing or truncated. Always present.

The jq queries below assume events.jsonl exists — that is true during the run, on failure, and when CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1. After a default clean-success run, the file is gone and these queries will fail with "No such file or directory"; that is not a Codex bug, that is the trim policy.

Useful jq queries against $EVENT_LOG:

# What commands did Codex run? (the proof that sandbox held)
jq -r 'select(.type=="item.completed" and .item.type=="command_execution") | .item.command' "$EVENT_LOG"

# Was there a final agent message at all?
jq -r 'select(.type=="item.completed" and .item.type=="agent_message") | .item.text' "$EVENT_LOG"

# Token usage (often 0 on ChatGPT-auth — subscription mode does not report)
jq -c 'select(.type=="turn.completed") | .usage' "$EVENT_LOG"

Event-stream schema (verified against codex-cli 0.130.0):

- Top-level keys: type (string), plus payload keys per event type.
- type values seen: thread.started, turn.started, item.started, item.completed, turn.completed.
- item.type values seen: command_execution, agent_message.

When the user is on a ChatGPT subscription (codex login status says "Logged in using ChatGPT"), turn.completed.usage is typically all zeros — token accounting is not exposed via this path. Do not treat zero as "didn't run".

Summarizing back to the user

After the run completes:

- Read $FINAL_MSG.
- Quote or render the report as Codex emitted it; attribute it explicitly as "Codex's review", not yours. The prompt template targets a 5-section structure (Summary / Classification / File-level findings / Risks / Recommendation), but Codex may pick a different shape; relay it verbatim if the content is sound. Re-prompt with a stricter `Output ONLY these section headers in this order` clause only when downstream tooling needs to parse the report mechanically.
- Include the path to $WORKDIR so the user can read the full report. On a default clean success the workdir holds only final.md, stderr.log, prompt-user.md, and prompt.md. The event log is available only on a failure path, while the run is in flight, or when the user re-runs with CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1, so do not promise an events.jsonl you cannot point at.
- Do not blend Codex's conclusions with your own opinions in the same paragraph. If you disagree with Codex, say so in a separate paragraph below.

Two-step pattern for large diffs

For PRs above ~1000 lines of diff or above ~10 files, run a planning pass first (see references/prompt-templates.md → Planning pass):

- Plan pass: Codex lists the files it intends to review and the angles it will cover. No findings. ~2 min.
- Show the plan to the user; let them prune or expand the scope.
- Review pass: re-run with the agreed scope written into the prompt.

This avoids burning a 30-minute slot on a review that focused on the wrong files.

Failure detection

Check these in order; do not retry blindly.

| Symptom | Likely cause | Action |
| --- | --- | --- |
| exit code 142 from perl (SIGALRM), `final.md` empty | Hit the timeout | Switch to two-step pattern; do not just bump the timeout without user agreement. |
| `$ERR_LOG` contains `unauthorized` / `401` | Auth expired | Ask user to run `codex login`. Do not attempt to re-auth on their behalf. |
| `$ERR_LOG` contains `rate limit` / `429` | Quota | Stop. Report the stderr line to the user verbatim. |
| wrapper exits non-zero and reports `codex exited 0 but produced no output` | Codex finished but with no message | Inspect last `item.completed` in `$EVENT_LOG` for clues. Usually a too-vague prompt; tighten and retry. |
| `$FINAL_MSG` ignores the requested sections | Prompt was overridden by AGENTS.md content | Make the prompt's output-format clause more explicit; or run in structured mode and accept terser output. |
| `$FINAL_MSG` says a central check could not run because of `EPERM`, temp pipe/socket creation, or `command_blocked` | Sandbox too strict for the verification command | If the run used `CODEX_CLI_REVIEW_SANDBOX=read-only`, re-run in the default mode. If the default mode still blocks `tsx` temp IPC, retry the underlying TypeScript entrypoint with `node --experimental-transform-types ...` when available. Do not accept a merge recommendation that rests on an unverified central behavior. |
| wrapper exits non-zero and reports `repository state changed during review` | Codex or a verification command modified files | Inspect `repo-before.txt` / `repo-after.txt`, review the working tree, and do not summarize the review as clean until the change is understood. |
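
The two stderr-based rows of the table lend themselves to a first-pass triage before reading the log by hand. The sketch below is a coarse illustration: `classify_stderr` is a hypothetical helper, and the patterns are simple substring matches taken from the table, so a stray `401` or `429` elsewhere in the log would also match.

```shell
# classify_stderr FILE: print missing, clean, auth, quota, or other.
# Hypothetical helper; patterns come from the failure table and are coarse.
classify_stderr() {
  log=$1
  if [ ! -e "$log" ]; then
    echo missing
  elif [ ! -s "$log" ]; then
    echo clean
  elif grep -Eqi 'unauthorized|401' "$log"; then
    # Auth expired: ask the user to run `codex login`.
    echo auth
  elif grep -Eqi 'rate limit|429' "$log"; then
    # Quota: stop and report the stderr line verbatim.
    echo quota
  else
    echo other
  fi
}
```

Anything reported as `other` still needs a human read of stderr.log; the helper only shortcuts the two cases with a fixed action.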

What this skill does NOT do

- Open a PR comment, post to GitHub, or apply Codex's suggestions. The output lives on disk; the user reads it or asks for a summary.
- Run multiple Codex reviews in parallel.
- Re-run after final.md is non-empty unless the user asks. If the review is unsatisfactory, change the prompt and run once more.
- Generalize to other agents (Gemini, Cursor, etc.). Fork this skill if that becomes a need; do not expand in place.
