name: stellar-forge-test-harness-orchestrator
description: Use whenever stellar-forge work mentions tests, coverage, harness, unit tests, contract tests, integration tests, heavy integration, production-grade validation, e2e, smoke, CI, flake, real tooling, live Stellar CLI, Docker, local network, testnet, deterministic, idempotent, resilient, dry-run, generated scaffold drift, JSON contracts, rerun, update, fix, improve, or partial re-run. This skill must orchestrate the specialist test team, tier heavy/live checks clearly, and preserve offline deterministic defaults.
Stellar Forge Test Harness Orchestrator
Purpose
Coordinate repo-wide test and resilience work for stellar-forge, a Rust CLI that generates Stellar workspaces and delegates chain-facing behavior to the official stellar CLI. The orchestrator separates normal deterministic PR checks from heavy local integration labs and explicit live/manual Stellar gates.
Phase 0: Context Check
- Read AGENTS.MD, CLAUDE.MD, CONTRIBUTING.md, README.md, and relevant docs.
- Inspect .claude/agents/, .claude/skills/, current git status, and _workspace/test-harness/ if present.
- Decide execution mode:
  - Initial run: build a wave plan and create _workspace/test-harness/.
  - Partial rerun: use prior artifacts and touch only requested slices.
  - New input: archive old artifacts under _workspace/test-harness/previous/ before regenerating.
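The Phase 0 mode decision can be sketched as a small pure function. This is a minimal sketch, not stellar-forge code: ExecutionMode, Request, and decide_mode are hypothetical names, and it assumes the user's request has already been parsed into either a slice rerun or new input. A rerun request with no prior artifacts falls back to an initial run, since there is nothing to reuse.

```rust
use std::path::Path;

/// Hypothetical model of the three Phase 0 execution modes.
#[derive(Debug, PartialEq, Eq)]
pub enum ExecutionMode {
    /// No prior artifacts: build a wave plan and create the harness dir.
    InitialRun,
    /// Reuse prior artifacts and touch only the named slices.
    PartialRerun(Vec<String>),
    /// Archive prior artifacts under previous/, then regenerate.
    NewInput,
}

/// What the user asked for (assumed to be parsed elsewhere).
pub enum Request<'a> {
    /// Rerun only the named slices, e.g. "json-contracts".
    Rerun(&'a [&'a str]),
    /// Fresh or changed input: regenerate everything.
    New,
}

pub fn decide_mode(harness_dir: &Path, request: &Request) -> ExecutionMode {
    // Without prior artifacts there is nothing to rerun or archive.
    if !harness_dir.is_dir() {
        return ExecutionMode::InitialRun;
    }
    match request {
        Request::Rerun(slices) => {
            ExecutionMode::PartialRerun(slices.iter().map(|s| s.to_string()).collect())
        }
        Request::New => ExecutionMode::NewInput,
    }
}
```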
Execution Mode
Prefer an agent team when at least two specialists are useful. In Claude Code, use the project agent files. In Codex, use multi_agent subagents when explicitly authorized by the user or by a harness request, and keep write scopes disjoint.
Core team:
- rust-cli-test-architect
- json-contract-qa
- generated-workspace-e2e
- idempotence-resilience-auditor
- ci-quality-gate-auditor
- heavy-integration-lab-runner
- generated-stack-stress-qa
- stellar-cli-live-smoke-qa
Use heavy-integration-lab-runner, generated-stack-stress-qa, and stellar-cli-live-smoke-qa when the request asks for real confidence, heavy integration, generated app validation, live tooling, Docker/local network, or testnet checks.
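One way to read the team rules above is as a default core plus an opt-in heavy trio. The sketch is illustrative only: select_team and the trigger phrases are hypothetical, paraphrased from this skill, and a real orchestrator should judge the request rather than substring-match it.

```rust
/// Specialists dispatched for ordinary deterministic work.
pub const CORE: [&str; 5] = [
    "rust-cli-test-architect",
    "json-contract-qa",
    "generated-workspace-e2e",
    "idempotence-resilience-auditor",
    "ci-quality-gate-auditor",
];

/// Specialists added only for heavy, live, or real-tooling requests.
pub const HEAVY: [&str; 3] = [
    "heavy-integration-lab-runner",
    "generated-stack-stress-qa",
    "stellar-cli-live-smoke-qa",
];

/// Illustrative trigger phrases paraphrased from the skill text.
const HEAVY_TRIGGERS: [&str; 8] = [
    "real confidence",
    "heavy integration",
    "production-grade",
    "generated app",
    "live tooling",
    "docker",
    "local network",
    "testnet",
];

/// Returns the core team, plus the heavy trio when any trigger appears.
pub fn select_team(request: &str) -> Vec<&'static str> {
    let lower = request.to_lowercase();
    let mut team = CORE.to_vec();
    if HEAVY_TRIGGERS.iter().any(|t| lower.contains(t)) {
        team.extend_from_slice(&HEAVY);
    }
    team
}
```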
Data Flow
Write intermediate notes under _workspace/test-harness/:
- 00_inventory.md
- 01_gap_matrix.md
- 02_wave_plan.md
- json-contracts.md
- generated-workspace.md
- idempotence.md
- ci-gates.md
- 05_heavy_integration_lab.md
- generated-stack-stress.md
- live-stellar-smoke.md
Keep final user-facing changes in code, tests, docs, or CI. Do not rely on _workspace as the only deliverable.
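Preparing the notes directory, including the Phase 0 archive step for new input, might look like the sketch below. prepare_harness_dir is a hypothetical helper, and it assumes a flat previous/ archive that each new archive overwrites, plus empty placeholder notes. Call it for an initial run or new input, not for a partial rerun, which should reuse the existing notes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Note files named in the Data Flow list.
pub const NOTES: [&str; 10] = [
    "00_inventory.md",
    "01_gap_matrix.md",
    "02_wave_plan.md",
    "json-contracts.md",
    "generated-workspace.md",
    "idempotence.md",
    "ci-gates.md",
    "05_heavy_integration_lab.md",
    "generated-stack-stress.md",
    "live-stellar-smoke.md",
];

/// Moves top-level files in _workspace/test-harness/ into previous/,
/// then writes an empty placeholder for each note. Returns the harness dir.
pub fn prepare_harness_dir(repo_root: &Path) -> io::Result<PathBuf> {
    let harness = repo_root.join("_workspace").join("test-harness");
    let previous = harness.join("previous");
    fs::create_dir_all(&previous)?;

    // Collect first so the directory is not mutated while it is listed.
    let mut existing = Vec::new();
    for entry in fs::read_dir(&harness)? {
        let path = entry?.path();
        if path.is_file() {
            existing.push(path);
        }
    }
    for path in existing {
        if let Some(name) = path.file_name() {
            fs::rename(&path, previous.join(name))?;
        }
    }

    for note in NOTES.iter() {
        fs::write(harness.join(note), "")?;
    }
    Ok(harness)
}
```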
Wave Strategy
- Baseline inventory: map current tests, commands, docs, and CI gates.
- P0 deterministic coverage: unit tests for pure helpers, JSON contract tests, dry-run purity, template matrix, idempotent sync.
- Offline E2E: temp workspaces with a fake stellar binary, fake package managers, HTTP loopback, and no live network.
- Generated output drift: demo and scaffold parity, stale-file policy, and broad doctor/project checks.
- Heavy local integration: real Node/package-manager/browser/SQLite generated-stack checks in temp workspaces, with missing tools recorded as environment gates.
- Real Stellar CLI boundary: plugin discovery, doctor/project validation, release dry-run planning, and command handoff checks using the installed official stellar CLI.
- Optional live/manual gates: Docker/local network, testnet, pubnet, funded-account flows, and long release drills, run only when explicitly tiered.
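Dry-run purity and idempotent sync from the P0 wave can both be checked by snapshotting a temp workspace before and after a run. This is a hedged sketch: snapshot, assert_dry_run_pure, and assert_idempotent are hypothetical helpers, and the closures stand in for invoking the stellar-forge binary (no real flags are assumed). The snapshot ignores empty directories and does not guard against symlink cycles.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Relative path -> file contents for every non-directory entry under a root.
pub type Snapshot = BTreeMap<PathBuf, Vec<u8>>;

/// Walks `root` iteratively; empty directories are not recorded.
pub fn snapshot(root: &Path) -> io::Result<Snapshot> {
    let mut files = Snapshot::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else {
                let rel = path
                    .strip_prefix(root)
                    .expect("read_dir paths stay under root")
                    .to_path_buf();
                let bytes = fs::read(&path)?;
                files.insert(rel, bytes);
            }
        }
    }
    Ok(files)
}

/// Dry-run purity: `run` must leave every file byte-for-byte unchanged.
pub fn assert_dry_run_pure(root: &Path, run: impl FnOnce(&Path)) -> io::Result<()> {
    let before = snapshot(root)?;
    run(root);
    let after = snapshot(root)?;
    assert_eq!(before, after, "dry run modified files under {}", root.display());
    Ok(())
}

/// Idempotence: a second `sync` must produce the same tree as the first.
pub fn assert_idempotent(root: &Path, mut sync: impl FnMut(&Path)) -> io::Result<()> {
    sync(root);
    let once = snapshot(root)?;
    sync(root);
    let twice = snapshot(root)?;
    assert_eq!(once, twice, "second sync changed files under {}", root.display());
    Ok(())
}
```

Comparing full byte contents keyed by relative path catches both stray writes and nondeterministic output such as timestamps or unordered maps.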
Test Tier Contract
Report each tier separately:
- PR-safe offline: deterministic Rust tests, fake binaries, dry runs, temp workspaces, and no live network.
- Heavy local: real generated stack, package manager, browser, SQLite, and broad matrices; still avoids funded live network by default.
- Live/manual: official stellar CLI, Docker/local network, testnet/pubnet, and account-dependent flows with explicit opt-in.
Do not claim "everything works" from only one tier. Say which tier passed, which tier was skipped, and why.
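A per-tier report keeps the three results separate and makes the "everything works" claim mechanical. The type names and labels are assumptions for illustration; a tier that was never recorded renders as "not run" instead of silently passing.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    PrSafeOffline = 0,
    HeavyLocal = 1,
    LiveManual = 2,
}

impl Tier {
    pub const ALL: [Tier; 3] = [Tier::PrSafeOffline, Tier::HeavyLocal, Tier::LiveManual];

    fn label(self) -> &'static str {
        match self {
            Tier::PrSafeOffline => "PR-safe offline",
            Tier::HeavyLocal => "Heavy local",
            Tier::LiveManual => "Live/manual",
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    Passed,
    Failed(String),
    Skipped(String),
}

/// One slot per tier; `None` means the tier was never run.
#[derive(Default)]
pub struct TierReport {
    outcomes: [Option<Outcome>; 3],
}

impl TierReport {
    pub fn record(&mut self, tier: Tier, outcome: Outcome) {
        self.outcomes[tier as usize] = Some(outcome);
    }

    /// True only when every tier ran and passed.
    pub fn everything_passed(&self) -> bool {
        self.outcomes.iter().all(|o| *o == Some(Outcome::Passed))
    }

    /// One line per tier, including skip and failure reasons.
    pub fn render(&self) -> String {
        Tier::ALL
            .iter()
            .map(|&tier| {
                let status = match &self.outcomes[tier as usize] {
                    Some(Outcome::Passed) => "passed".to_string(),
                    Some(Outcome::Failed(why)) => format!("failed: {why}"),
                    Some(Outcome::Skipped(why)) => format!("skipped: {why}"),
                    None => "not run".to_string(),
                };
                format!("{}: {}", tier.label(), status)
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```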
Error Handling
- Reproduce failures with the narrowest command.
- Identify the failing boundary before patching.
- If an external tool is missing, report it as a tooling gate rather than a pass, and keep offline tests honest about what they fake.
- If a heavy or live test fails, classify the boundary first: product code, JSON/report contract, generated artifact, fake/real external command, Node/package manager, browser, SQLite, Docker/local network, Stellar CLI, RPC/network, or account funding.
- If three consecutive fixes fail, stop and reconsider the harness slice rather than layering patches.
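The classification and three-strikes rules can be encoded directly. Boundary mirrors the list above; classify_missing_tool and FixTracker are hypothetical, and the tool-name mapping is an assumption that covers only the tools this skill names.

```rust
/// Failure boundaries from the Error Handling list, in the same order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundary {
    ProductCode,
    JsonReportContract,
    GeneratedArtifact,
    ExternalCommand, // fake or real external command
    NodePackageManager,
    Browser,
    Sqlite,
    DockerLocalNetwork,
    StellarCli,
    RpcNetwork,
    AccountFunding,
}

/// Hypothetical mapping from a missing executable to its boundary, so the
/// gap is reported as a tooling gate instead of a product failure.
pub fn classify_missing_tool(tool: &str) -> Option<Boundary> {
    match tool {
        "stellar" => Some(Boundary::StellarCli),
        "node" | "npm" | "pnpm" | "yarn" => Some(Boundary::NodePackageManager),
        "playwright" => Some(Boundary::Browser),
        "sqlite3" => Some(Boundary::Sqlite),
        "docker" => Some(Boundary::DockerLocalNetwork),
        _ => None,
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    Continue,
    ReconsiderSlice,
}

/// Counts consecutive failed fixes; three in a row means stop patching.
#[derive(Debug, Default)]
pub struct FixTracker {
    consecutive_failures: u32,
}

impl FixTracker {
    pub const LIMIT: u32 = 3;

    pub fn record(&mut self, fix_worked: bool) -> NextStep {
        if fix_worked {
            // A working fix resets the streak.
            self.consecutive_failures = 0;
            return NextStep::Continue;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= Self::LIMIT {
            NextStep::ReconsiderSlice
        } else {
            NextStep::Continue
        }
    }
}
```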
Completion Criteria
Before claiming done, run the narrowest relevant tests plus cargo fmt --all. For shared command, model, runtime, or template changes, prefer:
rtk cargo clippy --locked --workspace --all-targets --all-features -- -D warnings
rtk cargo test --locked
If a full gate is too slow or blocked by missing external tooling, say exactly what ran and what remains.
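To report exactly what ran and what remains, a gate runner can tell a failing command apart from a missing one. This sketch uses only std::process::Command; run_gate, GateResult, and summarize are hypothetical names. A spawn error of kind NotFound is reported as blocked (a tooling gate), never as a pass.

```rust
use std::io::ErrorKind;
use std::process::Command;

#[derive(Debug, PartialEq, Eq)]
pub enum GateResult {
    Passed,
    /// Ran but exited non-zero; `None` means it was killed by a signal.
    Failed(Option<i32>),
    /// Could not start, e.g. the tool is not installed.
    Blocked(String),
}

/// Runs one gate command and classifies the result.
pub fn run_gate(program: &str, args: &[&str]) -> GateResult {
    match Command::new(program).args(args).status() {
        Ok(status) if status.success() => GateResult::Passed,
        Ok(status) => GateResult::Failed(status.code()),
        Err(e) if e.kind() == ErrorKind::NotFound => {
            GateResult::Blocked(format!("{program} not found on PATH"))
        }
        Err(e) => GateResult::Blocked(format!("{program} could not start: {e}")),
    }
}

/// Splits results into what ran and what remains, one line per gate.
pub fn summarize(results: &[(&str, GateResult)]) -> String {
    let mut lines = Vec::new();
    for (name, result) in results {
        let line = match result {
            GateResult::Passed => format!("ran, passed: {name}"),
            GateResult::Failed(code) => format!("ran, failed ({code:?}): {name}"),
            GateResult::Blocked(why) => format!("remains ({why}): {name}"),
        };
        lines.push(line);
    }
    lines.join("\n")
}
```

For example, run_gate("rtk", &["cargo", "test", "--locked"]) returns Blocked on a machine without rtk, and summarize then lists that gate under what remains.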
Test Scenarios
Normal flow: user asks to improve test coverage. The orchestrator inventories tests, picks one deterministic wave, dispatches specialists, integrates patches, and verifies with focused cargo commands.
Heavy flow: user asks for production-grade validation. The orchestrator probes tools, runs PR-safe gates, expands into heavy local generated-stack tests, then records live-only Stellar/Docker/testnet gates separately.
Error flow: a test fails because stellar, Node, Docker, Playwright, or sqlite3 is unavailable. The orchestrator classifies the dependency and either fakes it for offline PR coverage or moves it to a live/manual gate.
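Faking a missing tool for offline PR coverage usually means putting a stub executable first on PATH. The sketch is Unix-only (it relies on std::os::unix::fs::PermissionsExt and /bin/sh); install_fake_binary, path_with_fake_first, and run_with_fakes are hypothetical helpers, and the stub's canned text is not real stellar CLI output.

```rust
use std::env;
use std::ffi::OsString;
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};
use std::process::{Command, Output};

/// Writes an executable /bin/sh stub named `name` into `bin_dir` that prints
/// `stdout` and exits 0, ignoring its arguments.
pub fn install_fake_binary(bin_dir: &Path, name: &str, stdout: &str) -> io::Result<PathBuf> {
    fs::create_dir_all(bin_dir)?;
    let path = bin_dir.join(name);
    // Single-quote the canned text so the shell does not expand it.
    let quoted = stdout.replace('\'', "'\\''");
    let script = format!("#!/bin/sh\nprintf '%s\\n' '{quoted}'\n");
    fs::write(&path, script)?;
    fs::set_permissions(&path, fs::Permissions::from_mode(0o755))?;
    Ok(path)
}

/// Builds a PATH value with `bin_dir` first so stubs shadow real tools.
pub fn path_with_fake_first(bin_dir: &Path) -> OsString {
    let mut dirs = vec![bin_dir.to_path_buf()];
    if let Some(existing) = env::var_os("PATH") {
        dirs.extend(env::split_paths(&existing));
    }
    env::join_paths(dirs).expect("PATH entries must not contain the separator")
}

/// Runs `program` with the stub directory prepended to the child's PATH.
pub fn run_with_fakes(bin_dir: &Path, program: &str, args: &[&str]) -> io::Result<Output> {
    Command::new(program)
        .args(args)
        .env("PATH", path_with_fake_first(bin_dir))
        .output()
}
```

Setting PATH on the Command rather than on the test process keeps parallel tests from interfering; on Unix the child's PATH is the one searched for a bare program name.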