name: codex-cli-review
description: Run a code review through the locally-installed Codex CLI wrapper (codex exec review) as an unattended, artifact-producing second opinion. Use when a non-Codex-App caller needs isolated CODEX_HOME, wrapper sandboxing, a repository-state guard, final.md, stderr, and JSONL diagnostics for a PR, commit, branch diff, or working tree. Do not use for Codex App thread/fork review management; use codex-app-review for that. The wrapper fixes the reviewer to gpt-5.5 with xhigh reasoning; do not substitute cheaper or non-flagship models.
codex-cli-review
Delegate a code review to the user's locally-installed Codex CLI (codex binary, codex-cli >= 0.130.0). The skill enforces a non-interactive, externally-bounded invocation so the review can run unattended, and so Codex does not inherit permissive user config (in this user's case, sandbox = "danger-full-access"). By default the wrapper runs Codex in workspace-write mode with $TMPDIR redirected to the review artifact directory and a before/after repository-state guard; set CODEX_CLI_REVIEW_SANDBOX=read-only for static-only reviews.
This skill is for the Codex CLI wrapper, not Codex App thread management. Use codex-app-review when the user asks Codex App to fork, start, monitor, steer, or reuse another Codex thread for code review.
workspace-write is deliberately a verification mode, not a promise that every test runner can work. On macOS, Codex's sandbox can still block AF_UNIX sockets even inside writable temp directories. The wrapper therefore injects runtime guidance telling Codex to retry tsx-backed checks through the underlying TypeScript entrypoint with node --experimental-transform-types when possible, and to lower confidence when no equivalent check can run.
The commands and event-schema details in this skill were originally verified against codex-cli 0.130.0 and refreshed against codex-cli 0.132.0. If codex --version reports a major-version bump, re-validate the flags before trusting this skill.
When to use
Trigger when the user:
- explicitly asks for /codex-cli-review, codex exec review, or the local Codex CLI wrapper
- wants an unattended Codex review with final.md, stderr, JSONL diagnostics, and a repository-state guard
- asks a non-Codex-App client, such as Claude Code, to obtain a Codex CLI review of a PR, commit, branch diff, or working tree
- wants the project's AGENTS.md review rubric applied through the isolated CLI wrapper
When NOT to use
- The user is asking a question, not requesting a review.
- The user asks Codex App to create, fork, inspect, monitor, or steer another review thread. Use codex-app-review.
- The diff is a single trivial file (<50 lines); the setup overhead exceeds the value.
- A codex-cli-review/run.sh or codex ... exec review process is already reviewing this same worktree (see Pre-flight below).
- You are mid-edit and may modify files while Codex reads them.
Pre-flight
Run these before the first invocation in a session. If any fails, stop and report to the user — do not paper over a missing prerequisite.
which codex # confirms install
codex --version # record (skill expects >= 0.130.0)
codex login status # must say "Logged in"
pgrep -fl 'codex-cli-review/run.sh|codex .*exec review' | grep -v $$ || true
Treat the pgrep output as a conflict only when it appears to be another review-skill run, or another codex exec review, operating on this same worktree. The goal is to avoid two reviewers reading a moving diff or competing for the same review artifacts.
Do not ask the user to kill unrelated Codex processes such as an IDE/app-server daemon, remote-control daemon, or a separate codex --dangerously-bypass-approvals-and-sandbox session. Those use different process roles and may use a different $CODEX_HOME; they are not evidence that this review wrapper is unsafe to start. If the process list is ambiguous, report the matching command line and ask whether it is reviewing this checkout.
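The version floor in the pre-flight list can be checked mechanically rather than by eye. A minimal sketch, assuming codex --version prints a line whose last word is a plain dotted version such as 0.132.0 (the output format is an assumption here, and version_ge and codex_version_ok are hypothetical helpers, not part of the wrapper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when dotted version $1 >= $2.
# Assumes purely numeric MAJOR.MINOR.PATCH fields (no -beta suffixes).
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)   # split on dots via IFS
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    # 10# forces base 10 so a field like "08" is not read as octal.
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0   # all three fields are equal
}

# Assumption: the version is the last space-separated word of the output line.
codex_version_ok() {
  local line=$1
  version_ge "${line##* }" "0.130.0"
}
```

A caller might run codex_version_ok "$(codex --version)" and stop with a report to the user when it fails. The comparison is numeric per field, so 0.99.0 is correctly treated as older than 0.130.0.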
CLI semantics that drive the command shape
Two things about codex exec review shape every invocation in this skill:
- The diff-scope flags (--commit, --base, --uncommitted) are mutually exclusive with a custom [PROMPT] argument. You pick exactly one mode:
  - Structured mode: --commit SHA / --base BRANCH / --uncommitted, no prompt. Codex picks the diff itself and runs its built-in review prompt + AGENTS.md. Output is terse but Codex tends to read more files autonomously.
  - Free-form mode: a custom prompt (positional arg or - for stdin), no diff flag. The scope is described in the prompt; Codex reads AGENTS.md and follows your instructions verbatim. Output is structured per the prompt.
- --output-last-message FILE captures the final assistant message (markdown). This is the report. The JSONL event stream is for diagnostics, not for parsing the review.
This skill defaults to free-form mode because the structured 5-section report is what callers actually need to summarize. Structured mode is acceptable when AGENTS.md alone fully encodes the review rubric and a terse "looks fine / needs work" answer is sufficient.
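The either/or rule between diff-scope flags and a prompt can be checked before anything is handed to Codex. The following is a sketch only: review_mode is a hypothetical pre-check written for illustration, not the wrapper's real argument parser, and it enforces nothing beyond the flag-versus-prompt exclusivity described above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, not the wrapper's real argument parser.
# Prints "structured" or "free-form"; returns 2 on a flag/prompt mix.
review_mode() {
  local diff_flags=0 prompt_args=0
  while (( $# > 0 )); do
    case "$1" in
      --commit|--base)
        diff_flags=$((diff_flags + 1))
        # Skip the flag's value (SHA or branch) when one is present.
        if (( $# > 1 )); then shift; fi
        ;;
      --uncommitted)
        diff_flags=$((diff_flags + 1))
        ;;
      *)
        # Anything else, including "-" for stdin, counts as a prompt.
        prompt_args=$((prompt_args + 1))
        ;;
    esac
    shift
  done

  if (( diff_flags > 0 && prompt_args > 0 )); then
    echo "error: diff-scope flags cannot be combined with a prompt" >&2
    return 2
  elif (( diff_flags > 0 )); then
    echo structured
  else
    echo free-form   # no diff flag: the prompt arrives as an arg or on stdin
  fi
}
```

With no arguments the function reports free-form, matching the default invocation where the prompt is piped on stdin.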
How to invoke
All invocations go through the wrapper script run.sh. It builds a fresh $CODEX_HOME containing a sealed codex-config.toml (gpt-5.5 + xhigh reasoning + fast tier + fast_mode, never-approve, isolated memories) and symlinks to the user's auth.json and installation_id. The user's real ~/.codex/config.toml is never loaded.
The review model is fixed to gpt-5.5 with xhigh reasoning. Do not add prompt or environment overrides that downgrade the reviewer to mini, Spark, or another non-flagship model. If this fixed model is unavailable, stop and report the blocker instead of substituting.
Default mode is verification-capable: the wrapper passes -s workspace-write, redirects $TMPDIR to the review artifact directory, adds that directory as writable, and snapshots the repository before and after the run. If Codex modifies the repository, the wrapper exits non-zero and writes repo-before.txt / repo-after.txt for forensics. Use CODEX_CLI_REVIEW_SANDBOX=read-only only when you want a strict static review and accept that tools needing temp sockets/pipes (for example tsx) may fail.
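The before/after guard amounts to taking two snapshots and comparing them. The sketch below is a stand-in, not the wrapper's method: it checksums every file outside .git with cksum, whereas the real run.sh may record repository state differently (for example with git commands). snapshot_tree and repo_unchanged are hypothetical names.

```shell
#!/usr/bin/env bash
# Stand-in snapshot: checksum every file under $1 (skipping .git) into $2.
# Assumption: the real wrapper may record repository state differently.
# Keep the output file outside the tree being snapshotted.
snapshot_tree() {
  local dir=$1 out=$2
  ( cd "$dir" && find . -type f ! -path './.git/*' -exec cksum {} + | LC_ALL=C sort ) > "$out"
}

# Succeeds only when the two snapshot files are byte-identical.
repo_unchanged() {
  cmp -s "$1" "$2"
}
```

A caller would take one snapshot before the review and one after, then treat a failing repo_unchanged as the higher-priority signal, as the wrapper does with repo-before.txt and repo-after.txt.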
In free-form mode, run.sh appends a short runtime-guidance block to the prompt. The added guidance is intentionally narrow: if a Node/TypeScript command fails only because tsx cannot bind its temp IPC pipe, Codex should inspect the package script and retry the same non-mutating check with Node's native TypeScript transform when available (for example, node --experimental-transform-types path/to/script.ts --help). If no equivalent fallback exists, the report must say exactly what remains unverified.
Limit of this fallback: it only helps for direct single-file invocations like tsx scripts/foo.ts .... Test suites that internally spawn a tsx child process (for example spawnSync('tsx', ['scripts/cli.ts', ...]) from a Node test runner) will still hit the same EPERM in the spawned child; the fallback does not transparently rewrite those spawn calls. When a test runner depends on spawning tsx, treat the suite as unverified and report it as such, rather than running a partial subset that may hide regressions.
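The scope of the fallback can be made concrete with a small rewrite rule: only a direct tsx call on a single .ts file has a Node-native equivalent. A minimal sketch, assuming a Node version that supports --experimental-transform-types; tsx_fallback is a hypothetical helper, not something the wrapper ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Node-native equivalent of a direct
# "tsx <file>.ts [args...]" invocation.
# Returns 1 when the command is not a direct single-file tsx call.
tsx_fallback() {
  if [ "$#" -lt 2 ] || [ "$1" != "tsx" ]; then
    return 1
  fi
  case "$2" in
    *.ts) ;;          # direct TypeScript entrypoint: rewritable
    *) return 1 ;;    # flags or other forms: no safe rewrite
  esac
  shift   # drop "tsx"
  printf '%s\n' "node --experimental-transform-types $*"
}
```

The helper deliberately refuses anything else, such as npm test, because a test runner that spawns tsx internally is not fixed by rewriting the outer command.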
# Free-form mode (default): prompt on stdin.
./resources/skills/codex-cli-review/run.sh < /path/to/prompt.md
# Show wrapper help without requiring Codex auth.
./resources/skills/codex-cli-review/run.sh --help
# Structured mode: pass diff-scope flags directly. No prompt.
./resources/skills/codex-cli-review/run.sh --commit <SHA>
./resources/skills/codex-cli-review/run.sh --base main
./resources/skills/codex-cli-review/run.sh --uncommitted
# Override hard timeout (default 1800s).
CODEX_CLI_REVIEW_ALARM=600 ./resources/skills/codex-cli-review/run.sh < prompt.md
# Static-only mode. Safer, but some verification commands may fail.
CODEX_CLI_REVIEW_SANDBOX=read-only ./resources/skills/codex-cli-review/run.sh < prompt.md
# Keep every artifact (events, codex-home/, repo snapshots, …) for debugging
# even on a successful run. Default is to trim them.
CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1 ./resources/skills/codex-cli-review/run.sh < prompt.md
The wrapper prints a workdir path to stdout. Inside that workdir:
| File | Contents | Kept on clean success? |
|---|---|---|
| final.md | Codex's last message; this is the report you summarize back to the user. | yes |
| stderr.log | Codex stderr (empty on success). | yes |
| prompt-user.md | The caller's original prompt (free-form mode only). | yes |
| prompt.md | The exact prompt fed to Codex: caller prompt plus runtime guidance (free-form mode only). | yes |
| events.jsonl | Full JSONL event stream for diagnostics. | no (trimmed) |
| codex-home/ | The throwaway $CODEX_HOME used for this run. Inspect to confirm which auth files were symlinked. | no (trimmed) |
| tmp/ | The runtime TMPDIR Codex was given. | no (trimmed) |
| runtime-guidance.md | The wrapper's runtime fallback guidance (also embedded in prompt.md). | no (trimmed) |
| repo-before.txt / repo-after.txt | Repository-state snapshots used by the guard. They should match. | no (trimmed) |
By default (CODEX_CLI_REVIEW_KEEP_ARTIFACTS=0), a clean success deletes the "trimmed" rows above and leaves only the report and inputs (a few KB). Any failure path, including a guard-detected repo change, an alarm timeout, or a non-zero Codex exit, always preserves the full workdir for forensics regardless of this setting. Set CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1 to also keep them on a clean success when you need the event stream or the temp CODEX_HOME for debugging.
Exit semantics: the wrapper returns 0 whenever final.md is non-empty — even if perl's SIGALRM fired during Codex's post-turn SQLite WAL flush (which can add up to ~2 minutes to wall-clock time after the review itself has already been written to disk). Exception: if the before/after repository-state guard reports a change, the wrapper exits 1 unconditionally — even when final.md is non-empty — because a modified repository is a higher-priority signal than a completed review, and the full workdir is preserved for forensics. Outside that exception, treat a missing or empty final.md as the only real failure signal.
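A caller can mirror these exit semantics by inspecting the workdir instead of trusting the exit code alone. This is a sketch under assumptions: review_outcome is a hypothetical helper, and it infers a guard failure from differing snapshot files, which are only on disk on a failure path or with CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1.

```shell
#!/usr/bin/env bash
# Hypothetical caller-side check mirroring the exit semantics above.
# $1 = workdir. Prints one of: repo-changed, ok, failed.
review_outcome() {
  local workdir=$1
  local before="$workdir/repo-before.txt" after="$workdir/repo-after.txt"
  # A repository change outranks a completed review.
  if [ -f "$before" ] && [ -f "$after" ] && ! cmp -s "$before" "$after"; then
    echo repo-changed
  elif [ -s "$workdir/final.md" ]; then
    echo ok        # non-empty report: treat as success
  else
    echo failed    # missing or empty final.md is the real failure signal
  fi
}
```

The order of the checks matters: a non-empty final.md is reported as ok only after the snapshots have been ruled out as differing.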
What lives in codex-config.toml
The sealed config that the wrapper drops into the fresh $CODEX_HOME:
sandbox_mode = "read-only"
approval_policy = "never"
model = "gpt-5.5"
model_reasoning_effort = "xhigh"
service_tier = "fast"
[features]
fast_mode = true
[memories]
generate_memories = false
use_memories = false
Why these specific keys
| Key | Why |
|---|---|
| sandbox_mode = "read-only" | Conservative config fallback. The wrapper overrides this to workspace-write by default so verification commands can create temp pipes/cache, then guards the repository with before/after snapshots. |
| approval_policy = "never" | No TTY prompts; unattended runs. |
| model = "gpt-5.5" | Fixed flagship review model. Codex CLI's own default is unstable across versions, so the wrapper pins the intended reviewer explicitly. |
| model_reasoning_effort = "xhigh" | Fixed extra-high reasoning for review quality. Do not lower for latency unless this skill is intentionally revised. |
| service_tier = "fast" | OpenAI back-end API queue priority. Independent of fast_mode below. |
| [features] fast_mode = true | Codex CLI's client-side Fast Mode feature flag. |
| [memories] generate_memories = use_memories = false | An ephemeral, isolated review should not read from or write to the user's memory store. |
Why CODEX_HOME redirection instead of -c key=value overrides or a profile
- -c overrides build a flag wall. Eight CLI flags ahead of exec review are unreadable, error-prone, and require TOML quoting (-c key='"value"') for string values.
- Profiles are not enough. Per OpenAI's config reference, [profiles.NAME] blocks can override sandbox_mode and service_tier, but not model, model_reasoning_effort, or approval_policy. Those would still have to go on the CLI.
- --ignore-user-config alone is also not enough. It correctly drops the user's sandbox = "danger-full-access" default, but also silently drops model, model_reasoning_effort, service_tier, and features.fast_mode, and there is no --config-file PATH flag to point at a replacement file (see openai/codex#7971).
- CODEX_HOME redirection wins because the wrapper materializes a fresh $CODEX_HOME with exactly our config and only the auth files we choose to expose. No flag wall, no user-config mutation, and no surface area Codex could surprise us with from the user's setup.
- Wrapper-level sandbox override is intentional. Config stays conservative, but run.sh passes -s workspace-write by default because static read-only mode blocks common test runners that create temp files. The wrapper also passes --add-dir "$WORKDIR" and redirects TMPDIR there, then fails if the repo changed.
- The tsx fallback is prompt-level, not sandbox-level. workspace-write does not currently grant macOS AF_UNIX socket creation to tsx; when the underlying TypeScript entrypoint can run under Node's native transform, that is the preferred non-mutating verification fallback.
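The CODEX_HOME redirection described in the list above can be pictured as three steps: create an empty directory, drop in the sealed config, and link only the chosen auth files. The sketch below is illustrative rather than a copy of run.sh; make_codex_home and its parameters are hypothetical, and it assumes Codex reads config.toml from $CODEX_HOME.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of building a throwaway CODEX_HOME.
# $1 = new home directory, $2 = sealed config file, $3 = user's real Codex home.
make_codex_home() {
  local home=$1 sealed_config=$2 user_home=$3
  mkdir -p "$home" || return 1
  # Assumption: Codex reads config.toml from $CODEX_HOME.
  cp "$sealed_config" "$home/config.toml" || return 1
  # Expose only the two auth files named in this skill; nothing else is linked.
  local f
  for f in auth.json installation_id; do
    if [ -e "$user_home/$f" ]; then
      ln -s "$user_home/$f" "$home/$f" || return 1
    fi
  done
}
```

Because the user's own config.toml is never copied or linked, a permissive setting there cannot reach the review run; the caller would then export CODEX_HOME to the new directory before invoking Codex.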
Picking the timeout
| Scope | Suggested alarm value |
|---|---|
| Single small commit (--commit-style) | 300 (5 min) |
| Branch diff < 500 lines | 900 (15 min) |
| Branch diff 500–2000 lines | 1800 (30 min) |
| Anything larger | Run the planning pass first; do not run a single review pass beyond 30 min. |
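The table can be encoded as a small lookup so the alarm value is chosen consistently. A minimal sketch: suggest_alarm is a hypothetical helper, and the scope labels commit and branch are illustrative names, not wrapper options.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the table above.
# $1 = scope ("commit" or "branch"), $2 = diff size in lines (branch only).
suggest_alarm() {
  local scope=$1 lines=${2:-0}
  if [ "$scope" = "commit" ]; then
    echo 300
  elif (( lines < 500 )); then
    echo 900
  elif (( lines <= 2000 )); then
    echo 1800
  else
    echo plan-first   # too large for a single review pass
    return 1
  fi
}
```

The result can feed the documented override, for example CODEX_CLI_REVIEW_ALARM=$(suggest_alarm branch 1200), while a plan-first answer signals that the planning pass should run before any review pass.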
Running it
For anything beyond a single small commit, call the Bash tool with run_in_background: true. The harness will notify you when the wrapper exits — do not poll, do not sleep. While the run is in flight, you may tail -n 30 "$EVENT_LOG" once or twice to check progress; never poll in a loop.
Output handling
The three artifacts:
- $FINAL_MSG (final.md): the assistant's last message, in markdown. Read this. Summarize this. The prompt template enforces a 5-section structure; the report will follow it if Codex stayed on task. Always present after a successful run.
- $EVENT_LOG (events.jsonl): full JSONL event stream. Use only for diagnostics (which files Codex read, which commands it ran). Never paste verbatim to the user. Trimmed on a clean success by default; available while the run is in flight, on any failure path, or when CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1. If a successful run is already over and you need the event log, re-run with CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1; the prior workdir's events.jsonl is gone.
- $ERR_LOG (stderr.log): empty on success. Non-empty means Codex itself errored (auth, network, schema). Check this first if final.md is missing or truncated. Always present.
The jq queries below assume events.jsonl exists — that is true during the run, on failure, and when CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1. After a default clean-success run, the file is gone and these queries will fail with "No such file or directory"; that is not a Codex bug, that is the trim policy.
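The three variables used in this section can be derived from the workdir path, and the trim policy can be checked before any query is attempted. A minimal sketch; artifact_paths and event_log_available are hypothetical helpers, and the file names are the ones listed in the artifact table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the artifact variables used in this section
# from the workdir path printed by the wrapper.
artifact_paths() {
  WORKDIR=$1
  FINAL_MSG="$WORKDIR/final.md"
  EVENT_LOG="$WORKDIR/events.jsonl"
  ERR_LOG="$WORKDIR/stderr.log"
}

# Succeeds only when the event log is still on disk and non-empty.
event_log_available() {
  [ -s "$EVENT_LOG" ]
}
```

Guarding each diagnostic query with event_log_available turns the "No such file or directory" case into an explicit message that the log was trimmed.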
Useful jq queries against $EVENT_LOG:
# What commands did Codex run? (the proof that sandbox held)
jq -r 'select(.type=="item.completed" and .item.type=="command_execution") | .item.command' "$EVENT_LOG"
# Was there a final agent message at all?
jq -r 'select(.type=="item.completed" and .item.type=="agent_message") | .item.text' "$EVENT_LOG"
# Token usage (often 0 on ChatGPT-auth — subscription mode does not report)
jq -c 'select(.type=="turn.completed") | .usage' "$EVENT_LOG"
Event-stream schema (verified against codex-cli 0.130.0):
- Top-level keys: type (string), plus payload keys per event type.
- type values seen: thread.started, turn.started, item.started, item.completed, turn.completed.
- item.type values seen: command_execution, agent_message.
When the user is on a ChatGPT subscription (codex login status says "Logged in using ChatGPT"), turn.completed.usage is typically all zeros — token accounting is not exposed via this path. Do not treat zero as "didn't run".
Summarizing back to the user
After the run completes:
- Read $FINAL_MSG.
- Quote or render the report as Codex emitted it; attribute it explicitly as "Codex's review", not yours. The prompt template targets a 5-section structure (Summary / Classification / File-level findings / Risks / Recommendation), but Codex may pick a different shape; relay it verbatim if the content is sound. Re-prompt with a stricter "Output ONLY these section headers in this order" clause only when downstream tooling needs to parse the report mechanically.
- Include the path to $WORKDIR so the user can read the full report. On a default clean success the workdir holds only final.md, stderr.log, prompt-user.md, and prompt.md. The event log is available only on a failure path, while the run is in flight, or when the user re-runs with CODEX_CLI_REVIEW_KEEP_ARTIFACTS=1; do not promise an events.jsonl you cannot point at.
- Do not blend Codex's conclusions with your own opinions in the same paragraph. If you disagree with Codex, say so in a separate paragraph below.
Two-step pattern for large diffs
For PRs above ~1000 lines of diff or above ~10 files, run a planning pass first (see references/prompt-templates.md → Planning pass):
- Plan pass — Codex lists the files it intends to review and the angles it will cover. No findings. ~2 min.
- Show the plan to the user; let them prune or expand the scope.
- Review pass — re-run with the agreed scope written into the prompt.
This avoids burning a 30-minute slot on a review that focused on the wrong files.
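The size threshold for the two-step pattern can be written as a single predicate. A minimal sketch, treating the approximate limits above (~1000 lines, ~10 files) as hard cutoffs purely for illustration; needs_planning_pass is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical predicate for the two-step pattern.
# $1 = total changed lines in the diff, $2 = number of changed files.
# Succeeds when either approximate threshold is exceeded.
needs_planning_pass() {
  local lines=$1 files=$2
  (( lines > 1000 || files > 10 ))
}
```

The two counts could come from a diff summary such as git diff --numstat, summed by the caller before the check.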
Failure detection
Check these in order; do not retry blindly.
| Symptom | Likely cause | Action |
|---|---|---|
| exit code 142 from perl (SIGALRM), final.md empty | Hit the timeout | Switch to two-step pattern; do not just bump the timeout without user agreement. |
| $ERR_LOG contains unauthorized / 401 | Auth expired | Ask user to run codex login. Do not attempt to re-auth on their behalf. |
| $ERR_LOG contains rate limit / 429 | Quota | Stop. Report the stderr line to the user verbatim. |
| wrapper exits non-zero and reports codex exited 0 but produced no output | Codex finished but with no message | Inspect last item.completed in $EVENT_LOG for clues. Usually a too-vague prompt; tighten and retry. |
| $FINAL_MSG ignores the requested sections | Prompt was overridden by AGENTS.md content | Make the prompt's output-format clause more explicit; or run in structured mode and accept terser output. |
| $FINAL_MSG says a central check could not run because of EPERM, temp pipe/socket creation, or command_blocked | Sandbox too strict for the verification command | If the run used CODEX_CLI_REVIEW_SANDBOX=read-only, re-run in the default mode. If the default mode still blocks tsx temp IPC, retry the underlying TypeScript entrypoint with node --experimental-transform-types ... when available. Do not accept a merge recommendation that rests on an unverified central behavior. |
| wrapper exits non-zero and reports repository state changed during review | Codex or a verification command modified files | Inspect repo-before.txt / repo-after.txt, review the working tree, and do not summarize the review as clean until the change is understood. |
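The first rows of the table can be walked in order by a small triage function. This is a sketch under assumptions: triage is a hypothetical helper, it assumes the wrapper propagates the 142 exit status to its caller, and its grep patterns are crude substring matches that may misfire on unrelated log lines.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper following the table's order.
# $1 = wrapper exit status, $2 = workdir. Prints a short cause label.
triage() {
  local status=$1 workdir=$2
  local err="$workdir/stderr.log" final="$workdir/final.md"
  if [ "$status" -eq 142 ] && [ ! -s "$final" ]; then
    echo timeout
  elif [ -f "$err" ] && grep -Eiq 'unauthorized|401' "$err"; then
    echo auth-expired
  elif [ -f "$err" ] && grep -Eiq 'rate limit|429' "$err"; then
    echo quota
  elif [ ! -s "$final" ]; then
    echo no-output
  else
    echo inspect-report   # a report exists; read it before deciding anything
  fi
}
```

The labels only name a likely cause; the action for each still comes from the table, and none of them justifies a blind retry.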
What this skill does NOT do
- Open a PR comment, post to GitHub, or apply Codex's suggestions. The output lives on disk; the user reads it or asks for a summary.
- Run multiple Codex reviews in parallel.
- Re-run after final.md is non-empty unless the user asks. If the review is unsatisfactory, change the prompt and run once more.
- Generalize to other agents (Gemini, Cursor, etc.). Fork this skill if that becomes a need; do not expand in place.