running-tests

Safe test execution patterns for opencode-swarm. Covers when to use the test_runner tool vs shell bun commands, scope safety rules, per-file isolation loops (bash and PowerShell), pre-existing failure verification, CI log reading, and failure classification. Load this skill when you need to run tests — not when you need to write them (see writing-tests for authoring guidance).

By zaxbysauce. Updated 6/5/2026

Running Tests for opencode-swarm

This skill is about executing tests safely. For writing tests, see writing-tests.


⛔ The One Rule That Prevents Session Kills

Never use test_runner with more than one source file for any discovery scope.

The graph and impact scopes each fan out per file through the import tree, and convention maps each source file to a test file by naming convention. With several source files, the union quickly exceeds MAX_SAFE_TEST_FILES = 50 and triggers scope_exceeded, which in turn tends to push LLMs to fall back to scope: 'all' and kill the session. All three scopes now reject with scope_exceeded before fan-out when sourceFiles.length > MAX_SAFE_SOURCE_FILES = 1.


Three-Layer Defense Against Session Blocking

test_runner has three guards that prevent unbounded fan-out from blocking the session. The first two fire before resolution; the third bounds the traversal itself and re-checks the result:

Layer 1 — Source-file count guard (synchronous, fires before any I/O)

sourceFiles.length > MAX_SAFE_SOURCE_FILES (1) → returns scope_exceeded immediately. Catches the common case of multi-file calls before any filesystem access.

Layer 2 — Pre-resolution fan-out estimate (fast, ~100ms)

estimateFanOut(sourceFiles, workingDir) reads the cached impact map and counts unique test files without spawning subprocesses. If the estimate exceeds MAX_SAFE_TEST_FILES = 50, the call returns scope_exceeded immediately — before any graph traversal begins. Only fires when sourceFiles.length === 1 (Layer 1 has already passed).

Layer 3 — Budget-limited traversal + post-resolution length check

analyzeImpact accepts a budget parameter (MAX_SAFE_TEST_FILES = 50). The traversal stops as soon as it has visited 50 test files and sets budgetExceeded: true. The call site checks this flag and returns scope_exceeded before processing results. After graph resolution, the final testFiles.length is additionally compared to MAX_SAFE_TEST_FILES. If exceeded, scope_exceeded is returned.

Result: When fan-out exceeds the safe threshold, the session gets outcome: 'scope_exceeded' instead of hanging.
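
The order of the three guards can be sketched as a single function. This is a minimal illustration, not the tool's actual code: checkScope, the GuardResult type, and the injected estimate and resolve callbacks are hypothetical stand-ins for estimateFanOut and analyzeImpact, whose real signatures are not shown in this skill. Only the two constants and the scope_exceeded outcome come from the text above.

```typescript
// Constants as stated in this skill.
const MAX_SAFE_SOURCE_FILES = 1;
const MAX_SAFE_TEST_FILES = 50;

// Hypothetical result shape, for illustration only.
type GuardResult =
  | { outcome: "ok"; testFiles: string[] }
  | { outcome: "scope_exceeded"; layer: 1 | 2 | 3 };

// Hypothetical stand-in for what the budget-limited traversal reports.
type Resolution = { testFiles: string[]; budgetExceeded: boolean };

function checkScope(
  sourceFiles: string[],
  estimate: (files: string[]) => number,
  resolve: (files: string[], budget: number) => Resolution,
): GuardResult {
  // Layer 1: synchronous count check, before any I/O.
  if (sourceFiles.length > MAX_SAFE_SOURCE_FILES) {
    return { outcome: "scope_exceeded", layer: 1 };
  }
  // Layer 2: cheap estimate before any traversal starts.
  if (estimate(sourceFiles) > MAX_SAFE_TEST_FILES) {
    return { outcome: "scope_exceeded", layer: 2 };
  }
  // Layer 3: traversal stops at the budget; the final length is checked as well.
  const { testFiles, budgetExceeded } = resolve(sourceFiles, MAX_SAFE_TEST_FILES);
  if (budgetExceeded || testFiles.length > MAX_SAFE_TEST_FILES) {
    return { outcome: "scope_exceeded", layer: 3 };
  }
  return { outcome: "ok", testFiles };
}
```

Each layer is cheaper than the one after it, so the common mistake (several source files in one call) is rejected without touching the filesystem.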


Decision Tree: test_runner tool vs bun shell command

Do you need to run tests?
│
├─ Single test file, targeted validation
│   └─ Either works. Prefer shell: bun --smol test <file> --timeout 30000
│
├─ Multiple files in the same directory (e.g. all agents tests)
│   └─ Shell only — per-file loop. Never test_runner with multiple files.
│
├─ Find tests related to ONE changed source file
│   └─ test_runner is fine: { scope: 'graph', files: ['src/agents/coder.ts'] }
│      (single file → bounded fan-out)
│
├─ Find tests related to MULTIPLE changed source files
│   └─ Shell only — per-file loop over the changed files, or run the whole directory.
│      test_runner with any discovery scope + multiple source files = scope_exceeded
│      (guard fires before fan-out for convention, graph, and impact scopes).
│
└─ Validate the entire repo (pre-push)
    └─ Shell only — 5-tier suite from commit-pr skill. Never test_runner scope:'all'.
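
The tree above can also be read as a small lookup function. The sketch below is only an illustration: TestTask, Choice, and chooseRunner are hypothetical names invented here, and the returned strings simply echo the commands and calls quoted in the tree.

```typescript
// Hypothetical task descriptions mirroring the branches of the tree above.
type TestTask =
  | { kind: "single-test-file"; testFile: string }
  | { kind: "directory"; dir: string }
  | { kind: "related-to-sources"; sourceFiles: string[] }
  | { kind: "whole-repo" };

type Choice = { runner: "shell" | "test_runner"; how: string };

function chooseRunner(task: TestTask): Choice {
  switch (task.kind) {
    case "single-test-file":
      // Either runner works; the tree prefers shell.
      return { runner: "shell", how: `bun --smol test ${task.testFile} --timeout 30000` };
    case "directory":
      return { runner: "shell", how: `per-file loop over ${task.dir}` };
    case "related-to-sources":
      // Exactly one source file keeps fan-out bounded.
      return task.sourceFiles.length === 1
        ? { runner: "test_runner", how: `{ scope: 'graph', files: ['${task.sourceFiles[0]}'] }` }
        : { runner: "shell", how: "per-file loop over the changed files" };
    case "whole-repo":
      return { runner: "shell", how: "5-tier suite from the commit-pr skill" };
  }
}
```

Only one branch ever selects test_runner, which is the point of the tree: every other case goes through the shell.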

Scope Safety Reference

| Scope | With files: [one] | With files: [many] | Notes |
| --- | --- | --- | --- |
| 'convention' | ✅ Safe | ❌ Rejected (scope_exceeded) | Guard fires before fan-out; direct test file paths exempt |
| 'graph' | ✅ Safe (capped at 50 via budget) | ❌ Rejected (scope_exceeded) | Two-layer guard: source-file count + fan-out estimate |
| 'impact' | ✅ Safe (capped at 50 via budget) | ❌ Rejected (scope_exceeded) | Two-layer guard: source-file count + fan-out estimate |
| 'all' | ❌ Never | ❌ Never | Requires allow_full_suite: true; CI mirror only |

Rule of thumb: Pass exactly one source file to test_runner. For multiple files, use a shell loop.


Per-File Isolation Loops

CI runs agents/tools/services in per-file isolation (one bun --smol process per file). Reproduce this locally with the following loops.

bash (Linux / macOS)

# Single directory — per-file isolation
for f in tests/unit/agents/*.test.ts; do
  bun --smol test "$f" --timeout 30000
done

# Multiple directories
for dir in tests/unit/tools tests/unit/services tests/unit/agents; do
  for f in "$dir"/*.test.ts; do
    bun --smol test "$f" --timeout 30000
  done
done

# Stop on first failure (useful for debugging)
for f in tests/unit/agents/*.test.ts; do
  bun --smol test "$f" --timeout 30000 || { echo "FAILED: $f"; break; }
done

PowerShell (Windows)

# Single directory — per-file isolation
Get-ChildItem tests/unit/agents/*.test.ts | ForEach-Object {
  bun --smol test $_.FullName --timeout 30000
}

# Multiple directories
@('tests/unit/tools', 'tests/unit/services', 'tests/unit/agents') | ForEach-Object {
  Get-ChildItem "$_/*.test.ts" | ForEach-Object {
    bun --smol test $_.FullName --timeout 30000
  }
}

# Capture output (avoids truncation on large output)
# 2>&1 merges stderr into the pipeline so Out-File records it too
Get-ChildItem tests/unit/agents/*.test.ts | ForEach-Object {
  bun --smol test $_.FullName --timeout 30000 2>&1
} | Out-File "$env:TEMP\test_out.txt"
Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 50

Common PowerShell pitfalls:

  • for f in ...; do — invalid, use Get-ChildItem | ForEach-Object
  • Select-String -Last N — invalid parameter, use Select-Object -Last N
  • 2>&1 2>&1 — duplicate redirection, causes parse error; use 2>&1 once
  • && is not supported in PowerShell 5.1; use cmd1; if ($?) { cmd2 } instead
  • After bun install --frozen-lockfile --force, non-elevated Windows shells can hit EPERM while reading refreshed node_modules entries. Treat that as a host permission/access issue: rerun the same focused Bun command with approved/elevated access before diagnosing it as a code or test failure.

Batch vs Per-File: Which Directories Need Isolation?

| Directory | Mode | Reason |
| --- | --- | --- |
| tests/unit/tools/ | Per-file loop | Heavy mock.module usage; cache poisoning risk |
| tests/unit/services/ | Per-file loop | Same |
| tests/unit/agents/ | Per-file loop | Same |
| tests/unit/hooks/ | Per-file loop | Same |
| tests/unit/cli/ | Batch OK | Fewer mock conflicts |
| tests/unit/commands/ | Batch OK | Fewer mock conflicts |
| tests/unit/config/ | Batch OK | Fewer mock conflicts |
| tests/integration/ | Batch OK | Integration fixtures, not mock-heavy |
| tests/security/ | Batch OK | Adversarial inputs, no module mocks |
| tests/smoke/ | Batch OK | Built-package tests |
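
As a rough illustration of how the table above translates into commands, the sketch below turns a list of test files into shell command strings. planCommands and PER_FILE_DIRS are hypothetical names introduced here, not part of opencode-swarm; the 30000 and 60000 timeouts mirror the per-file and directory-level commands used elsewhere in this skill.

```typescript
// Directories the table above marks as needing per-file isolation.
const PER_FILE_DIRS = [
  "tests/unit/tools/",
  "tests/unit/services/",
  "tests/unit/agents/",
  "tests/unit/hooks/",
];

// Hypothetical helper: one command per file for isolated directories,
// one batch command per remaining directory.
function planCommands(testFiles: string[]): string[] {
  const commands: string[] = [];
  const batchDirs: string[] = [];
  for (const file of testFiles) {
    if (PER_FILE_DIRS.some((dir) => file.startsWith(dir))) {
      commands.push(`bun --smol test ${file} --timeout 30000`);
      continue;
    }
    // Batch mode: remember the containing directory, once.
    const slash = file.lastIndexOf("/");
    const dir = slash === -1 ? "." : file.slice(0, slash);
    if (!batchDirs.includes(dir)) {
      batchDirs.push(dir);
    }
  }
  for (const dir of batchDirs) {
    commands.push(`bun --smol test ${dir} --timeout 60000`);
  }
  return commands;
}
```

The function only builds strings; running them is still a job for the bash or PowerShell loops shown earlier.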

Truncated Output Recovery

When bun test output exceeds the bash tool's buffer, it is saved to a file with an ID like tool_dff778.... retrieve_summary does not accept this ID format (it only reads IDs of the form S1, S2, and so on), so the output is effectively lost.

Prevention — pipe to a file explicitly:

# PowerShell (2>&1 so stderr output is captured as well)
bun --smol test tests/unit/agents --timeout 60000 2>&1 |
  Out-File "$env:TEMP\test_out.txt"
Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 50

# bash
bun --smol test tests/unit/agents --timeout 60000 2>&1 | tee /tmp/test_out.txt
tail -50 /tmp/test_out.txt

To get a clean pass/fail summary only, filter immediately:

# PowerShell: show only summary lines (2>&1 so stderr output is filtered too)
bun --smol test tests/unit/agents --timeout 60000 2>&1 |
  Select-String "pass|fail|error" |
  Select-Object -Last 10

# bash
bun --smol test tests/unit/agents --timeout 60000 2>&1 | grep -E "pass|fail|error" | tail -10

Verifying Pre-Existing Failures

Before documenting a failure as "pre-existing," prove it exists on main without affecting your working tree. Use a Git worktree — safer than git stash (stash can drop untracked files, fail on locked files on Windows, and leave you in an inconsistent state).

# bash: create a throwaway checkout of main
git fetch origin main
git worktree add --detach /tmp/repro-check origin/main
# A fresh worktree has no node_modules, so install there first; otherwise
# missing dependencies can look like a test failure on main
(cd /tmp/repro-check && bun install --frozen-lockfile &&
  bun --smol test tests/unit/agents/architect-workflow-security.test.ts --timeout 30000)
git worktree remove --force /tmp/repro-check

# PowerShell: same pattern, run from inside the worktree
git fetch origin main
git worktree add --detach "$env:TEMP\repro-check" origin/main
Push-Location "$env:TEMP\repro-check"
bun install --frozen-lockfile
bun --smol test tests/unit/agents/architect-workflow-security.test.ts --timeout 30000
Pop-Location
git worktree remove --force "$env:TEMP\repro-check"

Decision after checking:

  • Fails on main too → pre-existing. Document under ## Pre-existing failures in PR body. Continue.
  • Fails only on your branch → you introduced it. Fix before pushing.

⚠️ Check your own session history first. Before documenting anything as pre-existing, confirm you did not fix or update this test earlier in the current session. A test you fixed 20 messages ago is not pre-existing — listing it as such in the table or PR body is incorrect and will be caught in review.
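
The decision and the session-history caveat combine into one ordering of checks, sketched below. Evidence, Verdict, and triageFailure are hypothetical names used only for illustration; the inputs are assumed to come from the worktree run on main, the run on your branch, and your own review of the session.

```typescript
// Hypothetical inputs gathered from the worktree check and session history.
type Evidence = {
  failsOnBranch: boolean;
  failsOnMain: boolean;
  // True if you fixed or updated this test earlier in the current session.
  touchedThisSession: boolean;
};

type Verdict = "not-failing" | "pre-existing" | "introduced" | "not-pre-existing";

function triageFailure(e: Evidence): Verdict {
  if (!e.failsOnBranch) return "not-failing";
  // Session history wins: a test you already touched is never listed as pre-existing.
  if (e.touchedThisSession) return "not-pre-existing";
  // Fails on main too: document it in the PR body and continue.
  if (e.failsOnMain) return "pre-existing";
  // Fails only on your branch: fix before pushing.
  return "introduced";
}
```

Checking session history before the main-branch result is deliberate: a failure can reproduce on main and still be one you already started changing.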


Failure Classification

Not all failures are equal. Before deciding what to do, classify the failure:

| Class | Definition | Example | What to do |
| --- | --- | --- | --- |
| Stale assertion | Test checks for text/value that was deliberately removed | expect(prompt).toContain('CONSTRAINT: [what NOT to do]') (template removed in refactor) | Update the assertion to match current state |
| Soft regression indicator | Test checks a threshold the codebase has since exceeded | expect(tokenCount).toBeLessThan(35000) (prompt grew past limit) | Fix the threshold or reduce the prompt; do not just document and ignore |
| Genuine pre-existing | Failure exists on main unrelated to any recent change | full-auto-intercept.test.ts logger gating issue | Document in PR body; do not fix unless scoped |
| New regression | Failure introduced by your changes | Tests for prompt text you removed without updating tests | Fix before pushing |

Stale assertions and soft regression indicators are actionable — they signal drift between tests and code. Genuine pre-existing failures are not your responsibility to fix in this PR, but they must be documented.


Reading CI Failure Logs

When a CI job fails, the GitHub Actions log shows the exact file:line of the failure. Do not guess — read the log.

# Get the failing job URL from the PR
gh pr view <number> --json statusCheckRollup --jq '.statusCheckRollup[] | select(.conclusion=="FAILURE") | .detailsUrl'

# Fetch the log of the failed steps and search it (if gh CLI available)
# The pattern matches Bun's "(fail)" marker and "error:" lines
gh run view <run-id> --log-failed | grep -E "\(fail\)|error:" | head -20

Or open the detailsUrl directly in a browser / via WebFetch and search for:

  • (fail) — Bun test failure marker
  • error: — parse or runtime error
  • at <anonymous> — stack frame pointing to the test file and line

Once you have tests/unit/agents/some-file.test.ts:354, reproduce locally:

bun --smol test tests/unit/agents/some-file.test.ts --timeout 30000
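
If you have the log text in hand, pulling out the file:line references can be automated. The sketch below is an assumption-laden helper, not part of any tool named here: extractTestLocations and reproCommand are hypothetical, and the regular expression assumes forward-slash paths ending in .test.ts, as in the example path above.

```typescript
// Hypothetical helper types and functions, for illustration only.
type FailureLocation = { file: string; line: number };

function extractTestLocations(log: string): FailureLocation[] {
  // Matches paths ending in .test.ts followed by :line (a trailing :column is ignored).
  const pattern = /([\w./-]+\.test\.ts):(\d+)/g;
  const seen: string[] = [];
  const locations: FailureLocation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(log)) !== null) {
    const key = `${match[1]}:${match[2]}`;
    // Stack traces often repeat the same frame; keep each location once.
    if (!seen.includes(key)) {
      seen.push(key);
      locations.push({ file: match[1], line: Number(match[2]) });
    }
  }
  return locations;
}

// Build the local reproduction command for one location.
function reproCommand(loc: FailureLocation): string {
  return `bun --smol test ${loc.file} --timeout 30000`;
}
```

The line number is kept for navigation only; the reproduction command still runs the whole file, matching the per-file isolation CI uses.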

Quick Reference: Common Failures and Causes

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| scope_exceeded returned from test_runner | Fan-out exceeded 50 test files during graph/impact resolution | Switch to per-file shell loop; reduce changed-files scope |
| Session killed during test_runner | Pre-fix: unbounded fan-out on multiple files | Now returns scope_exceeded instead; no more session kills |
| mock.module breaks unrelated tests | Missing spread of real module exports | Add ...realModule spread |
| Windows tests fail with EBUSY | mock.restore() called while child process holds lock | Add test.skipIf(process.platform === 'win32') |
| Test output truncated, ID unreadable | Bash tool buffer exceeded | Pipe to Out-File/tee explicitly |
| for f in ...; do parse error | Bash syntax in PowerShell | Use Get-ChildItem \| ForEach-Object |
| Select-String -Last N error | Invalid PowerShell parameter | Use Select-Object -Last N |
| Token budget test failure | Prompt grew past hardcoded threshold | Treat as soft regression; update threshold |
| CONSTRAINT assertion fails after refactor | Test checks for removed format template | Update assertion to match current prompt |
| package-check CI failure | package-check validates the npm tarball (npm pack + tarball contents): a source/build/package-manifest problem, not generated-file drift | dist/ is generated and NOT committed, so do not stage it; run bun run build locally only when you need the bundle. There is no longer a committed-dist drift check. |