name: running-tests
description: >
  Safe test execution patterns for opencode-swarm. Covers when to use the test_runner tool vs shell bun commands, scope safety rules, per-file isolation loops (bash and PowerShell), pre-existing failure verification, CI log reading, and failure classification. Load this skill when you need to run tests, not when you need to write them (see writing-tests for authoring guidance).
Running Tests for opencode-swarm
This skill is about executing tests safely. For writing tests, see writing-tests.
⛔ The One Rule That Prevents Session Kills
Never use test_runner with more than one source file for any discovery scope.
graph and impact each fan out per file through the import tree; convention maps
each source file to a test file by name convention. The union quickly exceeds
MAX_SAFE_TEST_FILES = 50, triggering scope_exceeded, which causes LLMs to
cascade to scope: 'all' and kill the session. All three scopes now reject with
scope_exceeded before fan-out when sourceFiles.length > MAX_SAFE_SOURCE_FILES = 1.
Three-Layer Defense Against Session Blocking
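test_runner has three guards that prevent unbounded fan-out from blocking the session. The first two fire before resolution begins; the third bounds the traversal itself and re-checks the resolved result: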
Layer 1 — Source-file count guard (synchronous, fires before any I/O)
sourceFiles.length > MAX_SAFE_SOURCE_FILES (1) → returns scope_exceeded immediately. Catches the common case of multi-file calls before any filesystem access.
Layer 2 — Pre-resolution fan-out estimate (fast, ~100ms)
estimateFanOut(sourceFiles, workingDir) reads the cached impact map and counts unique test files without spawning subprocesses. If the estimate exceeds MAX_SAFE_TEST_FILES = 50, the call returns scope_exceeded immediately — before any graph traversal begins. Only fires when sourceFiles.length === 1 (Layer 1 has already passed).
Layer 3 — Budget-limited traversal + post-resolution length check
analyzeImpact accepts a budget parameter (MAX_SAFE_TEST_FILES = 50). The traversal stops as soon as it has visited 50 test files and sets budgetExceeded: true. The call site checks this flag and returns scope_exceeded before processing results.
After graph resolution, the final testFiles.length is additionally compared to MAX_SAFE_TEST_FILES. If exceeded, scope_exceeded is returned.
Result: When fan-out exceeds the safe threshold, the session gets outcome: 'scope_exceeded' instead of hanging.
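The three layers above can be sketched as a single guard function. This is a minimal illustration, not the actual test_runner source: the `Resolvers` interface, the `resolveWithGuards` name, and the `layer` field are hypothetical, and the real `estimateFanOut` and `analyzeImpact` signatures may differ. Only the two constants and the `scope_exceeded` outcome come from this skill.

```typescript
// Constants as stated in this skill.
const MAX_SAFE_SOURCE_FILES = 1;
const MAX_SAFE_TEST_FILES = 50;

// Hypothetical result shape; `layer` is added here only to show which guard fired.
type GuardResult =
  | { outcome: "scope_exceeded"; layer: 1 | 2 | 3 }
  | { outcome: "ok"; testFiles: string[] };

// Hypothetical stand-ins for the real resolution helpers.
interface Resolvers {
  estimateFanOut(sourceFiles: string[]): number;
  analyzeImpact(
    sourceFiles: string[],
    budget: number,
  ): { testFiles: string[]; budgetExceeded: boolean };
}

function resolveWithGuards(sourceFiles: string[], r: Resolvers): GuardResult {
  // Layer 1: synchronous source-file count check, before any I/O.
  if (sourceFiles.length > MAX_SAFE_SOURCE_FILES) {
    return { outcome: "scope_exceeded", layer: 1 };
  }
  // Layer 2: cheap fan-out estimate, before any graph traversal.
  if (r.estimateFanOut(sourceFiles) > MAX_SAFE_TEST_FILES) {
    return { outcome: "scope_exceeded", layer: 2 };
  }
  // Layer 3: budget-limited traversal plus a final length check.
  const { testFiles, budgetExceeded } = r.analyzeImpact(sourceFiles, MAX_SAFE_TEST_FILES);
  if (budgetExceeded || testFiles.length > MAX_SAFE_TEST_FILES) {
    return { outcome: "scope_exceeded", layer: 3 };
  }
  return { outcome: "ok", testFiles };
}
```

Ordering the checks from cheapest to most expensive means a multi-file call is rejected without touching the filesystem at all.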
Decision Tree: test_runner tool vs bun shell command
Do you need to run tests?
│
├─ Single test file, targeted validation
│ └─ Either works. Prefer shell: bun --smol test <file> --timeout 30000
│
├─ Multiple files in the same directory (e.g. all agents tests)
│ └─ Shell only — per-file loop. Never test_runner with multiple files.
│
├─ Find tests related to ONE changed source file
│ └─ test_runner is fine: { scope: 'graph', files: ['src/agents/coder.ts'] }
│ (single file → bounded fan-out)
│
├─ Find tests related to MULTIPLE changed source files
│ └─ Shell only — per-file loop over the changed files, or run the whole directory.
│ test_runner with any discovery scope + multiple source files = scope_exceeded
│ (guard fires before fan-out for convention, graph, and impact scopes).
│
└─ Validate the entire repo (pre-push)
└─ Shell only — 5-tier suite from commit-pr skill. Never test_runner scope:'all'.
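The decision tree above can also be written as a small lookup. This is only a sketch for illustration: the `TestRequest` type, the `Runner` labels, and `chooseRunner` are hypothetical names and are not part of opencode-swarm.

```typescript
// Hypothetical request shapes, one per branch of the decision tree.
type TestRequest =
  | { kind: "single_test_file"; file: string }
  | { kind: "test_directory"; dir: string }
  | { kind: "related_to_sources"; sourceFiles: string[] }
  | { kind: "full_repo" };

// Hypothetical labels for the recommended way to run.
type Runner =
  | "shell_single_file" // bun --smol test <file> --timeout 30000
  | "shell_per_file_loop" // one bun process per test file
  | "shell_tiered_suite" // 5-tier suite from the commit-pr skill
  | "test_runner"; // tool call with exactly one source file

function chooseRunner(req: TestRequest): Runner {
  switch (req.kind) {
    case "single_test_file":
      // Either works; the tree prefers the shell command.
      return "shell_single_file";
    case "test_directory":
      return "shell_per_file_loop";
    case "related_to_sources":
      // Only a single source file keeps fan-out bounded.
      return req.sourceFiles.length === 1 ? "test_runner" : "shell_per_file_loop";
    case "full_repo":
      return "shell_tiered_suite";
  }
}
```

Note that test_runner is the answer in exactly one branch: a discovery request for a single changed source file.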
Scope Safety Reference
| Scope | With files: [one] | With files: [many] | Notes |
|---|---|---|---|
| `'convention'` | ✅ Safe | ❌ Rejected (scope_exceeded) | Guard fires before fan-out; direct test file paths exempt |
| `'graph'` | ✅ Safe (capped at 50 via budget) | ❌ Rejected (scope_exceeded) | Two-layer guard: source-file count + fan-out estimate |
| `'impact'` | ✅ Safe (capped at 50 via budget) | ❌ Rejected (scope_exceeded) | Two-layer guard: source-file count + fan-out estimate |
| `'all'` | ❌ Never | ❌ Never | Requires allow_full_suite: true; CI mirror only |
Rule of thumb: Pass exactly one source file to test_runner. For multiple files, use a shell loop.
Per-File Isolation Loops
CI runs agents/tools/services in per-file isolation (one bun --smol process per file).
Reproduce this locally with the following loops.
bash (Linux / macOS)
# Single directory — per-file isolation
for f in tests/unit/agents/*.test.ts; do
bun --smol test "$f" --timeout 30000
done
# Multiple directories
for dir in tests/unit/tools tests/unit/services tests/unit/agents; do
for f in "$dir"/*.test.ts; do
bun --smol test "$f" --timeout 30000
done
done
# Stop on first failure (useful for debugging)
for f in tests/unit/agents/*.test.ts; do
bun --smol test "$f" --timeout 30000 || { echo "FAILED: $f"; break; }
done
PowerShell (Windows)
# Single directory — per-file isolation
Get-ChildItem tests/unit/agents/*.test.ts | ForEach-Object {
bun --smol test $_.FullName --timeout 30000
}
# Multiple directories
@('tests/unit/tools', 'tests/unit/services', 'tests/unit/agents') | ForEach-Object {
Get-ChildItem "$_/*.test.ts" | ForEach-Object {
bun --smol test $_.FullName --timeout 30000
}
}
# Capture output (avoids truncation on large output)
Get-ChildItem tests/unit/agents/*.test.ts | ForEach-Object {
bun --smol test $_.FullName --timeout 30000 2>&1
} | Out-File "$env:TEMP\test_out.txt"
Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 50
Common PowerShell pitfalls:
- `for f in ...; do`: invalid, use `Get-ChildItem | ForEach-Object`
- `Select-String -Last N`: invalid parameter, use `Select-Object -Last N`
- `2>&1 2>&1`: duplicate redirection, causes parse error; use `2>&1` once
- `&&`: not supported in PowerShell 5.1; use `; if ($?) { cmd2 }` instead
- After `bun install --frozen-lockfile --force`, non-elevated Windows shells can hit `EPERM` while reading refreshed `node_modules` entries. Treat that as a host permission/access issue: rerun the same focused Bun command with approved/elevated access before diagnosing it as a code or test failure.
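When neither shell is convenient, the same per-file isolation pattern can be driven from a script. The sketch below is an assumption-laden illustration: `bunArgs`, `runIsolated`, and the injected `RunOne` callback are hypothetical helpers, not part of opencode-swarm. Unlike the loops above, it collects every failing file instead of only printing as it goes.

```typescript
// Hypothetical callback: runs one test file in its own process and returns its exit code.
type RunOne = (file: string) => number;

// Builds the argument list used throughout this skill: bun --smol test <file> --timeout <ms>.
function bunArgs(file: string, timeoutMs: number = 30000): string[] {
  return ["--smol", "test", file, "--timeout", String(timeoutMs)];
}

// Runs each file separately and returns the files whose process exited non-zero.
function runIsolated(
  files: string[],
  runOne: RunOne,
  stopOnFirstFailure: boolean = false,
): string[] {
  const failed: string[] = [];
  for (const file of files) {
    if (runOne(file) !== 0) {
      failed.push(file);
      if (stopOnFirstFailure) break; // mirrors the `|| { ...; break; }` bash variant
    }
  }
  return failed;
}
```

In practice `runOne` could wrap `spawnSync("bun", bunArgs(file), { stdio: "inherit" }).status ?? 1` from `node:child_process`; it is injected here so the loop can be exercised without spawning Bun.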
Batch vs Per-File: Which Directories Need Isolation?
| Directory | Mode | Reason |
|---|---|---|
| `tests/unit/tools/` | Per-file loop | Heavy mock.module usage; cache poisoning risk |
| `tests/unit/services/` | Per-file loop | Same |
| `tests/unit/agents/` | Per-file loop | Same |
| `tests/unit/hooks/` | Per-file loop | Same |
| `tests/unit/cli/` | Batch OK | Fewer mock conflicts |
| `tests/unit/commands/` | Batch OK | Fewer mock conflicts |
| `tests/unit/config/` | Batch OK | Fewer mock conflicts |
| `tests/integration/` | Batch OK | Integration fixtures, not mock-heavy |
| `tests/security/` | Batch OK | Adversarial inputs, no module mocks |
| `tests/smoke/` | Batch OK | Built-package tests |
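The table can be encoded as a simple prefix check. This is a sketch under two assumptions: paths are repo-relative, and any directory not listed as per-file is treated as batch-safe (the table does not say what to do with unlisted directories). `PER_FILE_DIRS` and `needsIsolation` are hypothetical names.

```typescript
// Directories the table marks as "Per-file loop".
const PER_FILE_DIRS: string[] = [
  "tests/unit/tools/",
  "tests/unit/services/",
  "tests/unit/agents/",
  "tests/unit/hooks/",
];

// Returns true when a repo-relative test path belongs to a per-file-isolation directory.
function needsIsolation(testPath: string): boolean {
  // Normalize Windows separators so the same prefixes work on both platforms.
  const normalized = testPath.replace(/\\/g, "/");
  return PER_FILE_DIRS.some((dir) => normalized.startsWith(dir));
}
```

For example, `needsIsolation("tests/unit/agents/coder.test.ts")` is true, while a path under `tests/unit/cli/` is false and can be run as part of a batch.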
Truncated Output Recovery
When bun test output exceeds the bash tool's buffer, it is saved to a file with an ID
like tool_dff778.... This ID format is not accepted by retrieve_summary (which only
reads S1, S2 etc. format IDs). The output is effectively lost.
Prevention — pipe to a file explicitly:
# PowerShell
bun --smol test tests/unit/agents --timeout 60000 2>&1 |
Out-File "$env:TEMP\test_out.txt"
Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 50
# bash
bun --smol test tests/unit/agents --timeout 60000 2>&1 | tee /tmp/test_out.txt
tail -50 /tmp/test_out.txt
To get a clean pass/fail summary only, filter immediately:
# PowerShell — show only summary lines
bun --smol test tests/unit/agents --timeout 60000 2>&1 |
Select-String "pass|fail|error" |
Select-Object -Last 10
# bash
bun --smol test tests/unit/agents --timeout 60000 2>&1 | grep -E "pass|fail|error" | tail -10
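The same summary filter can be applied to output that has already been captured to a file or string. The helper below is a hypothetical sketch (`summaryLines` is not an existing utility). It mirrors the bash pipeline, so matching is case-sensitive like `grep -E`; PowerShell's `Select-String` matches case-insensitively by default and may keep a few more lines.

```typescript
// Keeps only lines mentioning pass, fail, or error, then returns the last `last` of them.
function summaryLines(output: string, last: number = 10): string[] {
  if (last <= 0) return [];
  return output
    .split(/\r?\n/) // handle both LF and CRLF captures
    .filter((line) => /pass|fail|error/.test(line))
    .slice(-last);
}
```

Filtering after capture keeps the full log on disk, so nothing is lost if the summary turns out to be too terse.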
Verifying Pre-Existing Failures
Before documenting a failure as "pre-existing," prove it exists on main without affecting
your working tree. Use a Git worktree — safer than git stash (stash can drop untracked
files, fail on locked files on Windows, and leave you in an inconsistent state).
# bash — create a throwaway checkout of main
git worktree add /tmp/repro-check origin/main
bun --smol test /tmp/repro-check/tests/unit/agents/architect-workflow-security.test.ts --timeout 30000
git worktree remove /tmp/repro-check
# PowerShell — same pattern (use Join-Path for robust separator handling)
git worktree add "$env:TEMP\repro-check" origin/main
$testPath = Join-Path "$env:TEMP\repro-check" "tests\unit\agents\architect-workflow-security.test.ts"
bun --smol test $testPath --timeout 30000
git worktree remove "$env:TEMP\repro-check"
Decision after checking:
- Fails on `main` too → pre-existing. Document under `## Pre-existing failures` in PR body. Continue.
- Fails only on your branch → you introduced it. Fix before pushing.
⚠️ Check your own session history first. Before documenting anything as pre-existing, confirm you did not fix or update this test earlier in the current session. A test you fixed 20 messages ago is not pre-existing — listing it as such in the table or PR body is incorrect and will be caught in review.
Failure Classification
Not all failures are equal. Before deciding what to do, classify the failure:
| Class | Definition | Example | What to do |
|---|---|---|---|
| Stale assertion | Test checks for text/value that was deliberately removed | `expect(prompt).toContain('CONSTRAINT: [what NOT to do]')` (template removed in refactor) | Update the assertion to match current state |
| Soft regression indicator | Test checks a threshold the codebase has since exceeded | `expect(tokenCount).toBeLessThan(35000)` (prompt grew past limit) | Fix the threshold or reduce the prompt; do not just document and ignore |
| Genuine pre-existing | Failure exists on `main` unrelated to any recent change | `full-auto-intercept.test.ts` logger gating issue | Document in PR body; do not fix unless scoped |
| New regression | Failure introduced by your changes | Tests for prompt text you removed without updating tests | Fix before pushing |
Stale assertions and soft regression indicators are actionable — they signal drift between tests and code. Genuine pre-existing failures are not your responsibility to fix in this PR, but they must be documented.
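One way to keep the classification honest is to write it down as data. The mapping below restates the table; the `FailureClass` labels, the `Handling` shape, and `mustFixInThisPr` are hypothetical names chosen for this sketch.

```typescript
type FailureClass =
  | "stale_assertion"
  | "soft_regression"
  | "genuine_pre_existing"
  | "new_regression";

interface Handling {
  fixInThisPr: boolean; // actionable in the current PR
  documentInPrBody: boolean; // must be listed under pre-existing failures
  action: string; // the "What to do" column
}

const HANDLING: Record<FailureClass, Handling> = {
  stale_assertion: {
    fixInThisPr: true,
    documentInPrBody: false,
    action: "Update the assertion to match current state",
  },
  soft_regression: {
    fixInThisPr: true,
    documentInPrBody: false,
    action: "Fix the threshold or reduce the prompt",
  },
  genuine_pre_existing: {
    fixInThisPr: false,
    documentInPrBody: true,
    action: "Document in PR body; do not fix unless scoped",
  },
  new_regression: {
    fixInThisPr: true,
    documentInPrBody: false,
    action: "Fix before pushing",
  },
};

// True for every class except a genuine pre-existing failure.
function mustFixInThisPr(cls: FailureClass): boolean {
  return HANDLING[cls].fixInThisPr;
}
```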
Reading CI Failure Logs
When a CI job fails, the GitHub Actions log shows the exact file:line of the failure.
Do not guess — read the log.
# Get the failing job URL from the PR
gh pr view <number> --json statusCheckRollup --jq '.statusCheckRollup[] | select(.conclusion=="FAILURE") | .detailsUrl'
# Fetch and search the log (if gh CLI available)
gh run view --log <run-id> | grep -E "FAIL|error" | head -20
Or open the detailsUrl directly in a browser / via WebFetch and search for:
- `(fail)`: Bun test failure marker
- `error:`: parse or runtime error
- `at <anonymous>`: stack frame pointing to the test file and line
Once you have tests/unit/agents/some-file.test.ts:354, reproduce locally:
bun --smol test tests/unit/agents/some-file.test.ts --timeout 30000
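Pulling the `file:line` pairs out of a downloaded log can be scripted. The helper below is a hypothetical sketch: `findTestLocations` is not an existing utility, and it assumes test files end in `.test.ts` and that locations appear as `path:line`, as in the example above. Real log lines may be formatted differently.

```typescript
interface FailureLocation {
  file: string;
  line: number;
}

// Scans log text for "<path>.test.ts:<line>" and returns each unique location once.
function findTestLocations(log: string): FailureLocation[] {
  const pattern = /([\w.\/\\-]+\.test\.ts):(\d+)/g;
  const seen: string[] = [];
  const found: FailureLocation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(log)) !== null) {
    const key = match[1] + ":" + match[2];
    if (seen.indexOf(key) !== -1) continue; // skip repeated stack frames
    seen.push(key);
    found.push({ file: match[1], line: Number(match[2]) });
  }
  return found;
}
```

Each returned `file` can be passed straight to the `bun --smol test <file> --timeout 30000` reproduction command.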
Quick Reference: Common Failures and Causes
| Symptom | Likely cause | Fix |
|---|---|---|
| `scope_exceeded` returned from test_runner | Fan-out exceeded 50 test files during graph/impact resolution | Switch to per-file shell loop; reduce changed-files scope |
| Session killed during test_runner | Pre-fix: unbounded fan-out on multiple files | Now returns scope_exceeded instead; no more session kills |
| `mock.module` breaks unrelated tests | Missing spread of real module exports | Add `...realModule` spread |
| Windows tests fail with EBUSY | `mock.restore()` called while child process holds lock | Add `test.skipIf(process.platform === 'win32')` |
| Test output truncated, ID unreadable | Bash tool buffer exceeded | Pipe to `Out-File`/`tee` explicitly |
| `for f in ...; do` parse error | Bash syntax in PowerShell | Use `Get-ChildItem \| ForEach-Object` |
| `Select-String -Last N` error | Invalid PowerShell parameter | Use `Select-Object -Last N` |
| Token budget test failure | Prompt grew past hardcoded threshold | Treat as soft regression; update threshold |
| CONSTRAINT assertion fails after refactor | Test checks for removed format template | Update assertion to match current prompt |
| `package-check` CI failure | package-check validates the npm tarball (npm pack + tarball contents): a source/build/package-manifest problem, not generated-file drift | dist/ is generated and NOT committed; do not stage it. Run `bun run build` locally only when you need the bundle. There is no longer a committed-dist drift check. |