testbox - SKILL.md Agent Skill

name: testbox
description: Run CI commands against local changes in a real Semaphore environment. Use when the user wants to test code before pushing, run tests in CI env, or iterate fast on fixes, e.g. "run tests", "test before pushing", "try this in CI", "does this pass", "test my changes", "run the build", "check if tests pass", "spin up CI environment".
user-invocable: false

Testbox: run CI commands against local changes in a real Semaphore environment

Testbox creates a Semaphore CI job you can run commands against via SSH. Same machine type, OS image, secrets, and cache as your real pipelines. Zero new infrastructure.

Workflow

1. Warm up (once per session)

sem-ai testbox warmup --project my-app

Returns a testbox ID and SSH connection info. The VM is ready when the command returns.

Options:

--machine f1-standard-4       # bigger machine (default: f1-standard-2)
--os-image ubuntu2404          # different OS (default: ubuntu2204)
--duration 45m                 # longer session (default: 30m)
--idle-timeout 10m             # stop if no commands for N minutes (default: 30m)
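
The options can be combined in a single warmup call. The sketch below wraps that call in a function using only the flags and values listed above. The function name `warmup_big` and the `SEM_AI` override variable are hypothetical, added so the wrapper can be dry-run with a stub instead of the real CLI.

```shell
# SEM_AI is a hypothetical override; it defaults to the real CLI name.
SEM_AI="${SEM_AI:-sem-ai}"

# Warm up a testbox on a larger machine with a longer session
# and a shorter idle timeout. $1 = project name.
warmup_big() {
  "$SEM_AI" testbox warmup \
    --project "$1" \
    --machine f1-standard-4 \
    --os-image ubuntu2404 \
    --duration 45m \
    --idle-timeout 10m
}
```

Calling `warmup_big my-app` runs the warmup with all four options; setting `SEM_AI=echo` first prints the arguments instead of executing them.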

2. Run commands (fast — rsync + SSH)

sem-ai testbox run --id <testbox-id> "go test ./..."
sem-ai testbox run --id <testbox-id> "make build"
sem-ai testbox run --id <testbox-id> "npm test"

Each run first syncs only the files that changed (compared by rsync checksum), then executes the command. After the first sync, the sync step of later runs takes 1-3 seconds.
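
Because each run is cheap after the first sync, several commands can be chained against the same testbox. The following is a minimal sketch, assuming that `sem-ai testbox run` exits non-zero when the remote command fails (the text above does not confirm this). The function name `run_all` and the `SEM_AI` override are hypothetical.

```shell
# SEM_AI is a hypothetical override; it defaults to the real CLI name.
SEM_AI="${SEM_AI:-sem-ai}"

# Run each remaining argument as one command in the same testbox, in order.
# ASSUMPTION: `testbox run` returns a non-zero status if the command fails.
# $1 = testbox ID, $2.. = commands to run.
run_all() {
  id="$1"
  shift
  for cmd in "$@"; do
    # Stop at the first failing command.
    "$SEM_AI" testbox run --id "$id" "$cmd" || return 1
  done
}
```

For example, `run_all "$TESTBOX" "make build" "go test ./..."` would build first and only run the tests if the build succeeded.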

3. Interactive SSH (optional)

sem-ai testbox ssh --id <testbox-id>

4. Stop when done

sem-ai testbox stop --id <testbox-id>

Or let it auto-expire after the duration/idle timeout.

Best practices for agents

Warm up immediately when starting a coding task that involves CI. Don't wait until tests need to run.
Reuse the testbox ID across multiple run commands. Don't create a new testbox per test run.
Route tests through testbox, not locally. The CI environment has the correct dependencies, services, and secrets.
After tests pass in testbox, push the code. CI will confirm via the real pipeline.
Stop the testbox when the task is complete to avoid unnecessary billing.
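
One way to make the last practice hard to forget is a shell `trap` that stops the testbox when the work ends, including on failure. This is a sketch rather than a prescribed pattern; `with_testbox_cleanup` and the `SEM_AI` override are hypothetical names.

```shell
# SEM_AI is a hypothetical override; it defaults to the real CLI name.
SEM_AI="${SEM_AI:-sem-ai}"

# Run a local command in a subshell and stop the testbox when that
# subshell exits, whether the command succeeded or failed.
# $1 = testbox ID, $2.. = command to run.
with_testbox_cleanup() (
  id="$1"
  shift
  # The EXIT trap fires when this subshell ends.
  trap '"$SEM_AI" testbox stop --id "$id"' EXIT
  "$@"
  # Preserve the command's exit status for the caller.
  exit "$?"
)
```

A call such as `with_testbox_cleanup "$TESTBOX" ./iterate.sh` (where `./iterate.sh` is a placeholder for whatever drives the test runs) returns the script's exit status after the testbox has been stopped.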

Typical agent loop

# Start of task
TESTBOX=$(sem-ai testbox warmup | jq -r '.testbox_id')   # --project auto-detected from origin (pass it only to override)

# Iterate on code
# ... make changes ...
sem-ai testbox run --id "$TESTBOX" "go test ./..."
# ... fix failures ...
sem-ai testbox run --id "$TESTBOX" "go test ./..."
# tests pass!

# After tests pass, push and verify in real CI — load the watch-after-push skill
# (it finds the run for your exact commit_sha and watches it).
git push

# Clean up
sem-ai testbox stop --id "$TESTBOX"
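
The loop above captures the ID with `jq -r`, which prints the literal string `null` when the field is missing, so a failed warmup could leave `TESTBOX` empty or set to `null`. The check below is a small sketch that guards against that, assuming warmup emits JSON with a `testbox_id` field as the loop implies. The function name `require_testbox_id` is hypothetical.

```shell
# Print the ID back if it looks usable; otherwise report an error and fail.
# $1 = value captured from `jq -r '.testbox_id'`.
require_testbox_id() {
  id="$1"
  if [ -z "$id" ] || [ "$id" = "null" ]; then
    echo "testbox warmup did not return an ID" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

It could be used as `TESTBOX=$(require_testbox_id "$(sem-ai testbox warmup | jq -r '.testbox_id')") || exit 1`, so later `run` and `stop` calls never receive a blank ID.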