name: test-changes description: "Use when the user says 'test changes', 'run tests', 'test the build', or after execute-plan completes in a run-phase. Runs the project's build/test/lint suite and returns a filtered report with errors only. Branches on project type: Apple projects (.xcodeproj/.xcworkspace) run xcodebuild from the main session with diff-scoped -only-testing; other projects dispatch the dev-workflow:test-runner sub-agent." model: sonnet
Overview
Two execution paths:
- Apple path (project root contains
.xcodeprojor.xcworkspace): runxcodebuild testfrom the main session, scoped to suites in the recent diff, with simulator pre-flight + crash recovery. Do NOT dispatch a sub-agent. - Other projects: dispatch the
dev-workflow:test-runnersub-agent.
In both cases, write a structured report to .claude/test-reports/test-run-{ts}.md so run-phase parsing stays consistent.
Why "main session only" for Apple
The real constraint is single-process serialization of xcodebuild test, not anything Apple-specific about which process invokes it. Two concurrent xcodebuild test runs break in three ways:
- DerivedData
build.dbSQLite lock: both processes write to the same~/Library/Developer/Xcode/DerivedData/<Project>-<hash>/Build/Intermediates.noindex/XCBuildData/build.db, second writer fails withdatabase is locked. simctlstate is global: one process can shut down the simulator the other is using, mid-test.- CoreSimulator daemon contention: under multi-sim CPU load, the simulator's internal system apps (SpringBoard, MobileCal) can't finish launching within 30 seconds → FRONTBOARD process-launch watchdog SIGKILLs them → entire test run aborts.
The main session is single-threaded, so running xcodebuild test from there naturally serializes. Sub-agents could in principle run tests too — but the agent framework can dispatch sub-agents in parallel, and there's no mutex primitive across them. So the simplest enforceable rule is "main session only". This is an execution-discipline choice, not an Apple platform rule.
Process
Step 1: Detect Project Type
From project root, check for Apple project markers:
if ls *.xcodeproj *.xcworkspace 2>/dev/null | head -1 > /dev/null; then
echo "apple"
else
echo "other"
fi
If apple → Step 2A. Otherwise → Step 2B.
Step 2A: Apple Path (run from main session)
Run all xcodebuild commands directly via Bash from the main session. Never delegate to a sub-agent.
A1. Identify scheme
SCHEME=$(xcodebuild -list -json 2>/dev/null | python3 -c "
import sys, json
try:
d = json.load(sys.stdin)
schemes = (d.get('project') or d.get('workspace') or {}).get('schemes', [])
print(schemes[0] if schemes else '')
except Exception:
pass
")
if [[ -z "$SCHEME" ]]; then
echo "❌ No scheme detected. Configure a shared scheme in Xcode (Product → Scheme → Manage Schemes → check 'Shared')."
# Write minimal FAIL report at A8 and return.
fi
The python block tolerates malformed JSON, missing project/workspace keys, and empty schemes arrays — all return empty $SCHEME instead of a traceback.
A2. Derive -only-testing filters from diff
# Prefer uncommitted changes; if empty, fall back to the most recent commit.
CHANGED=$(git diff --name-only HEAD 2>/dev/null)
[[ -z "$CHANGED" ]] && CHANGED=$(git diff --name-only HEAD~1..HEAD 2>/dev/null)
# Filter to Swift test files: under any *Tests/ directory (root- or nested-), file ending in Tests.swift.
# Matches: LifuelTests/RecordHistoryRowTests.swift (root), LifuelTests/Sub/TimerStateTests.swift (nested),
# apps/X/MyAppTests/AuthTests.swift (deep). Drops non-test files like LifuelTests/Helper.swift.
TEST_FILES=$(echo "$CHANGED" | grep -E '(^|/)[^/]*Tests/.*Tests\.swift$' || true)
For each test file, derive -only-testing:<TestTarget>/<SuiteName>:
- TestTarget: walk up the path from the file; the deepest ancestor directory whose name ends in
Testsis the target (handlesapps/Tests/MyAppTests/Foo.swift→MyAppTests, notTests). - SuiteName: file basename without
.swift(e.g.,RecordHistoryRowTests).
ONLY_TESTING_FLAGS=""
# Use `while read` instead of `for f in` — paths with spaces (legal in Xcode group folders) word-split otherwise.
while IFS= read -r f; do
[[ -z "$f" ]] && continue
# Walk up the path, find the deepest ancestor matching *Tests.
TARGET=""
d=$(dirname "$f")
while [[ "$d" != "." && "$d" != "/" && -n "$d" ]]; do
base=$(basename "$d")
if [[ "$base" == *Tests ]]; then
TARGET="$base"
break
fi
d=$(dirname "$d")
done
[[ -z "$TARGET" ]] && continue
SUITE=$(basename "$f" .swift)
ONLY_TESTING_FLAGS+=" -only-testing:${TARGET}/${SUITE}"
done <<< "$TEST_FILES"
Fallback policy: if ONLY_TESTING_FLAGS is empty, do NOT run the full test suite — full-suite runs are 5–15 minutes on real projects and waste CI time when the diff doesn't touch any test files. Instead, run build only in A4 and report:
No *Tests.swift files in diff — ran build only. Re-invoke with explicit -only-testing targets to run tests.
A3. Simulator pre-flight
BOOTED=$(xcrun simctl list devices booted | grep -oE '[A-F0-9-]{36}')
COUNT=$(echo "$BOOTED" | grep -c .)
if [[ "$COUNT" -gt 1 ]]; then
xcrun simctl shutdown all
sleep 3
xcrun simctl boot "iPhone 16 Pro"
sleep 8
elif [[ "$COUNT" -eq 0 ]]; then
xcrun simctl boot "iPhone 16 Pro"
sleep 8
fi
BOOTED_UDID=$(xcrun simctl list devices booted | grep -oE '[A-F0-9-]{36}' | head -1)
[[ -z "$BOOTED_UDID" ]] && { echo "❌ Could not boot simulator"; exit 1; }
# Capture the actual device name for the report (the booted sim may not be iPhone 16 Pro
# if the user already had a different sim running before this skill ran).
DEVICE_NAME=$(xcrun simctl list devices | grep "$BOOTED_UDID" | head -1 | sed -E 's/^[[:space:]]+//; s/ \([A-F0-9-]+\).*$//')
Note: xcodebuild test shuts down the destination sim on completion (§9), so on the next invocation COUNT == 0 is normal — auto-boot recovers.
A4. Build (always run)
xcodebuild build \
-scheme "$SCHEME" \
-destination "platform=iOS Simulator,id=$BOOTED_UDID" \
-quiet 2>&1 | tee /tmp/test-changes-build.log
BUILD_EXIT=${PIPESTATUS[0]}
If BUILD_EXIT != 0: write report with build failure, skip A5, return.
A5. Test (only if ONLY_TESTING_FLAGS non-empty)
RESULT_BUNDLE="/tmp/test-run-$$.xcresult"
LOG="/tmp/test-changes-test.log"
xcodebuild test \
-scheme "$SCHEME" \
-destination "platform=iOS Simulator,id=$BOOTED_UDID" \
$ONLY_TESTING_FLAGS \
-resultBundlePath "$RESULT_BUNDLE" \
2>&1 | tee "$LOG"
TEST_EXIT=${PIPESTATUS[0]}
A6. Recovery + retry-once
After a non-zero exit, grep the log for crash/hang tokens (these can pair with various exit codes; plain assertion failures must NOT trigger recovery):
NEEDS_RECOVERY=0
TRIGGER_TOKEN=""
if [[ "$TEST_EXIT" -ne 0 ]]; then
TRIGGER_TOKEN=$(grep -oE '0x8BADF00D|IDELaunchiPhoneSimulatorLauncher|FRONTBOARD|RequestDenied|process-launch watchdog' "$LOG" | head -1)
[[ -n "$TRIGGER_TOKEN" ]] && NEEDS_RECOVERY=1
fi
if [[ "$NEEDS_RECOVERY" -eq 1 ]]; then
xcrun simctl shutdown all
killall -9 com.apple.CoreSimulator.CoreSimulatorService 2>/dev/null
sleep 5
xcrun simctl boot "iPhone 16 Pro"
sleep 8
BOOTED_UDID=$(xcrun simctl list devices booted | grep -oE '[A-F0-9-]{36}' | head -1)
DEVICE_NAME=$(xcrun simctl list devices | grep "$BOOTED_UDID" | head -1 | sed -E 's/^[[:space:]]+//; s/ \([A-F0-9-]+\).*$//')
# Retry once with same flags. Use a separate log so A8 can include both runs if needed.
RETRY_LOG="/tmp/test-changes-test-retry.log"
xcodebuild test \
-scheme "$SCHEME" \
-destination "platform=iOS Simulator,id=$BOOTED_UDID" \
$ONLY_TESTING_FLAGS \
-resultBundlePath "${RESULT_BUNDLE}-retry" \
2>&1 | tee "$RETRY_LOG"
TEST_EXIT=${PIPESTATUS[0]}
LOG="$RETRY_LOG" # subsequent filter (A7) operates on the retry's output
fi
If recovery retry also fails with the same tokens: report FAIL — simulator unrecoverable, manual intervention required. Do not retry a second time.
A7. Filter output
Keep only these line patterns in the report. Build phase reads /tmp/test-changes-build.log; test phase reads $LOG (which A6 reassigns to the retry log if recovery ran, so the filter sees the most recent test output):
- Compile errors/warnings:
error:,warning: - XCTest:
Test Case '-[…]' passed|failed - Swift Testing: lines starting with
✓or✗, plus◇ Test run - Suite summary:
Test Suite '…' passed|failed,Executed N tests, with M failures - Crash tokens:
0x8BADF00D,FRONTBOARD,RequestDenied
Drop everything else (compile noise, Copying / Linking lines, etc.). Cap each section at 200 lines; append ... (N more lines truncated) if exceeded.
A8. Write report
Create .claude/test-reports/ if missing. Write .claude/test-reports/test-run-{YYYY-MM-DDTHH-MM-SS}.md:
## Test Report
**Timestamp:** {ISO 8601}
**Project:** {root basename}
**Path:** apple
**Scheme:** {SCHEME}
**Simulator:** {DEVICE_NAME} ({BOOTED_UDID})
**Status:** {PASS | FAIL}
### Build
**Command:** `xcodebuild build -scheme {SCHEME} -destination "...,id={UDID}"`
**Result:** {PASS | FAIL} (exit code {N})
{filtered build output if FAIL}
### Tests
**Command:** `xcodebuild test -scheme {SCHEME} -destination "...,id={UDID}" {ONLY_TESTING_FLAGS}`
**Targets:** {list of -only-testing flags, or "skipped — no test files in diff"}
**Result:** {PASS | FAIL | SKIPPED} (exit code {N})
**Recovery:** {none | applied — {token that triggered}}
{filtered per-test pass/fail lines if FAIL}
### Lint
**Result:** SKIPPED (no lint configured for Apple projects)
A9. Plan-supplied commands — ignored on Apple path
The test-runner agent (non-Apple) extracts verification commands from the plan file. The Apple path ignores plan-supplied commands to keep all xcodebuild test invocations on the established invocation template (id=$BOOTED_UDID + -only-testing + isolated -resultBundlePath). A plan-supplied command using name= instead of id=, or omitting -only-testing, would re-introduce the failure modes Step 2A is designed to prevent. If the plan needs different test targets, surface that in the report and let the user decide.
Step 2B: Other Projects (dispatch sub-agent)
For non-Apple projects, dispatch dev-workflow:test-runner with model: "sonnet" (always pass explicitly):
Run the project's build, test, and lint suite.
Project root: {project root}
Plan file: {plan file path, or "none" if standalone}
The agent detects the project type (Node / Swift Package / Cargo / Go / Python) and writes a report to .claude/test-reports/.
Why pass model: "sonnet" explicitly: if the parent session runs a 1M-context variant (e.g., claude-opus-4-7[1m]), agent dispatch may inherit the 1M context flag and fail with API Error: Extra usage is required for 1M context. The explicit model: "sonnet" parameter (no [1m] suffix) forces standard 200k context and bypasses the billing gate. Standard sonnet context is sufficient for a single test-suite run.
Step 3: Process Results
- Read the report file (Apple path: written in A8; sub-agent path: from agent's return).
- If sub-agent didn't return a path, search
.claude/test-reports/test-run-*.mdand use the most recent.
- If sub-agent didn't return a path, search
- Present summary to user:
- Build: PASS/FAIL
- Tests: X/Y passed (Z failed) — for Apple path, also note
Recovery: appliedif it ran - Lint: PASS/FAIL/SKIPPED
- If any failures: show the filtered errors from the report (already filtered — do not re-filter).
Standalone mode (not within run-phase):
- All pass: "All build, test, and lint checks pass."
- All pass + 改动 > 0:追加 hint「下一步可
/review-execution做 4-lens 深审(correctness / test-coverage / breaking-changes / root-cause-depth)」 - Failures: present errors and suggest fixing in main context.
State Integration
When running within a phase orchestrated by run-phase:
- After report is written, do NOT update
phase_step(orchestrator owns state transitions) - Output the report path for run-phase to read
- Output: "Test run complete. Returning to run-phase."
Completion Criteria
- Project type detected (apple / other)
- Apple path: build + (test or skip-with-reason) executed from main session, recovery applied at most once if triggered, report written
- Other path: test-runner agent dispatched and returned, report read
- Summary presented to user
- When in run-phase context: report path output for orchestrator