---
name: skillshare-cli-e2e-test
description: >-
  Run isolated E2E tests in devcontainer from ai_docs/tests runbooks.
  Use this skill whenever the user asks to: run an E2E test, execute a test
  runbook, validate a feature end-to-end, create a new runbook, or test CLI
  behavior in isolation. If you need to run a multi-step CLI validation
  sequence (init → install → sync → verify), this is the skill — it handles
  ssenv isolation, flag verification, and structured reporting. Prefer this
  over ad-hoc docker exec sequences for any test that follows a runbook or
  needs reproducible isolation.
argument-hint: "[runbook-name | new]"
metadata:
  targets: [claude, universal]
---
Run isolated E2E tests in devcontainer. $ARGUMENTS specifies runbook name or "new".

## Flow

### Phase 0: Environment Check

1. Confirm devcontainer is running and get container ID:
   ```bash
   CONTAINER=$(docker compose -f .devcontainer/docker-compose.yml ps -q skillshare-devcontainer)
   ```
   - If empty → prompt user:
     ```bash
     docker compose -f .devcontainer/docker-compose.yml up -d
     ```
   - Ensure `CONTAINER` is set for all subsequent `docker exec` calls.
2. Confirm Linux binary is available:
   ```bash
   docker exec $CONTAINER bash -c \
     '/workspace/.devcontainer/ensure-skillshare-linux-binary.sh && ss version'
   ```
3. Confirm mdproof is installed:
   ```bash
   docker exec $CONTAINER /workspace/.devcontainer/ensure-mdproof.sh
   ```
   This auto-installs from the GitHub release, or falls back to `/workspace/bin/mdproof` (local dev binary).
4. Check for lessons learned from previous runs (the `/workspace` path exists inside the container):
   ```bash
   docker exec $CONTAINER bash -c \
     'test -f /workspace/.mdproof/lessons-learned.md && cat /workspace/.mdproof/lessons-learned.md'
   ```
   If the file exists, read it before writing or debugging runbooks — it contains known gotchas and assertion patterns.

### Phase 1: Detect Scope

1. Preview all available runbooks via the container:
   ```bash
   docker exec $CONTAINER mdproof --dry-run --report json /workspace/ai_docs/tests/
   ```
   This returns JSON with every runbook's steps, commands, and expected assertions — no manual markdown parsing needed. Use this to understand what each runbook covers.
2. Identify recent changes (unstaged + recent commits):
   ```bash
   git diff --name-only HEAD~3
   ```
3. Match changes to relevant runbooks (compare changed file paths against step commands in the JSON output).
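
A quick first pass at the matching described above can be scripted. This sketch assumes each CLI command lives in `cmd/skillshare/<command>.go` (the layout the flag-validation step later in this skill relies on) and that runbooks invoke it as `ss <command>`; `runbooks_for_changes` is a hypothetical helper, not part of the repo, and it greps the runbook Markdown directly instead of the dry-run JSON. Changes under `internal/` do not map to a single command, so they are skipped here and still need a manual look.

```bash
# Hypothetical helper: list runbooks that mention `ss <command>` for each
# changed cmd/skillshare/<command>.go read from stdin (one path per line).
# Usage (repo root): git diff --name-only HEAD~3 | runbooks_for_changes ai_docs/tests
runbooks_for_changes() {
  local runbook_dir="$1" path cmd
  while IFS= read -r path; do
    case "$path" in
      cmd/skillshare/*_test.go) continue ;;   # tests don't map to a command
      cmd/skillshare/*.go)
        cmd=$(basename "$path" .go)
        # -l: print file names only; -w: whole-word match, so "ss install"
        # does not match inside "ss installer"; -F: fixed string, no regex.
        grep -lwF -- "ss $cmd" "$runbook_dir"/*.md 2>/dev/null
        ;;
    esac
  done | sort -u
}
```

The result is a shortlist, not a verdict: a runbook can exercise a command indirectly (for example through `ssenv create --init`) without ever spelling out `ss <command>`.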

### Phase 2: Select Tests

Prompt user (via AskUserQuestion):
- Option A: Run existing runbook (list all available + mark those related to recent changes)
- Option B: Auto-generate new test script based on recent changes
- Option C: If $ARGUMENTS specifies a runbook, skip to Phase 3

### Phase 3: Prepare & Execute

Running existing runbook:

1. Create an isolated environment with auto-initialization:
   ```bash
   ENV_NAME="e2e-$(date +%Y%m%d-%H%M%S)"
   # Use --init to automatically run 'ss init -g' with all targets
   docker exec $CONTAINER ssenv create "$ENV_NAME" --init
   ```
2. Execute the entire runbook via mdproof inside the container:
   ```bash
   docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
     ssenv enter "$ENV_NAME" -- \
     mdproof --report json \
     /workspace/ai_docs/tests/<runbook_file>.md
   ```
   mdproof executes each step (`bash -c <command>`) in the ssenv-isolated HOME, then returns structured JSON:
   ```jsonc
   {
     "version": "1",
     "runbook": "<runbook_file>.md",
     "duration_ms": 12345,
     "summary": { "total": 7, "passed": 5, "failed": 1, "skipped": 1 },
     "steps": [
       {
         "step": { "number": 1, "title": "...", "command": "...", "expected": ["..."] },
         "status": "passed", // "passed" | "failed" | "skipped"
         "exit_code": 0,
         "stdout": "...",
         "stderr": "..."
       }
     ]
   }
   ```
3. Analyze the JSON output:
   - All passed → proceed to Phase 4
   - Any failed → filter for failures only (the full JSON can be too large for terminal output). Pipe the step 2 command straight into `jq`; avoid `2>&1`, which can mix non-JSON stderr text into the stream:
     ```bash
     docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
       ssenv enter "$ENV_NAME" -- \
       mdproof --report json /workspace/ai_docs/tests/<runbook_file>.md \
       | jq '{
           summary: .summary,
           failed: [.steps[] | select(.status == "failed") | {
             step: .step.number,
             title: .step.title,
             exit_code: .exit_code,
             failed_assertions: [.assertions[]? | select(.matched == false) | .pattern],
             stderr: (.stderr // "" | .[0:200])
           }]
         }'
     ```
   - Skipped steps (executor=`manual`) → these need manual verification; run them individually:
     ```bash
     docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
       ssenv enter "$ENV_NAME" -- <command from step.command>
     ```
4. For failed steps, debug individually with a manual `docker exec`:
   ```bash
   docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
     ssenv enter "$ENV_NAME" -- bash -c '<failed step command>'
   ```
   - Prefer `--json` + `jq` for assertions — see the `--json` Quick Reference below

Generating new runbook:

1. Read `git diff HEAD~3` to find changed files in `cmd/skillshare/` or `internal/`
2. Read changed files to understand new/modified functionality
3. Validate all CLI flags before writing — for every `ss <command> <flag>` in the runbook:
   - Grep `cmd/skillshare/<command>.go` for the exact flag string (e.g. `"--force"`)
   - Run `ss <command> --help` inside container if needed
   - Common mistakes to avoid:
     - `uninstall --yes` → wrong, use `--force` / `-f`
     - `init --target <name>` → wrong, `init` has no `--target` flag
     - `init -p` has a completely separate flag set from global `init` — only supports `--targets`, `--discover`, `--select`, `--mode`, `--dry-run`. Global-only flags like `--no-copy`, `--no-skill`, `--no-git`, `--all-targets`, `--force` do NOT exist in project mode
     - Audit custom rules: disable by rule ID (e.g. `prompt-injection-0`, `prompt-injection-1`), NOT pattern name (e.g. `prompt-injection`). Rule IDs are in `internal/audit/rules.yaml`
4. Generate new runbook to `ai_docs/tests/<slug>_runbook.md`, following existing conventions:
   - YAML-free, pure Markdown
   - Has Scope, Environment, Steps (each with bash + Expected), Pass Criteria
   - Use `jq:` assertions in Expected blocks for JSON commands — e.g. `- jq: .extras | length == 1`. This is a native mdproof assertion type, NOT a bash `jq` pipe
   - Use `--json` + `jq -e` in bash for inline verification within multi-command steps
   - Config idempotency — never bare `cat >> config.yaml`; always prepend `sed -i '/^section:/,$d'` to remove existing section first, or use CLI commands (`ss extras init`, `ss extras remove --force`) that handle duplicates
   - Check `ai_docs/tests/runbook.json` for project-level config (build, setup, teardown, step_setup, timeout) that affects all runbooks
   - Check `.mdproof/lessons-learned.md` for known assertion patterns and gotchas
5. Run the runbook quality checklist (see below) before executing
6. Then execute the new runbook (same flow as above)
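
The flag check above can be scripted as a first pass. This is a minimal sketch assuming flags appear in `cmd/skillshare/<command>.go` as quoted string literals such as `"--force"` (the form the grep step looks for); `check_runbook_flags` is a hypothetical helper. It treats the first word after `ss` as the command, so subcommands like `ss extras remove` are checked against `extras.go`, and it ignores short flags, so `ss <command> --help` inside the container stays the final check.

```bash
# Hypothetical helper: report every `ss <command> ... --flag` in a runbook whose
# flag string is not found as a quoted literal in cmd/skillshare/<command>.go.
# Usage: check_runbook_flags <runbook.md> [repo_root]
check_runbook_flags() {
  local runbook="$1" repo_root="${2:-.}"
  local status=0 segment cmd flag src
  while IFS= read -r segment; do
    segment="${segment#*ss }"   # drop the optional leading char and the "ss " prefix
    cmd="${segment%% *}"        # first word after "ss" is treated as the command
    src="$repo_root/cmd/skillshare/$cmd.go"
    # Only long flags (--name) are checked; short flags like -f are ignored.
    for flag in $(grep -oE -- '--[a-z][a-z-]*' <<<"$segment"); do
      if [ ! -f "$src" ] || ! grep -qF -- "\"$flag\"" "$src"; then
        echo "UNVERIFIED: ss $cmd $flag"
        status=1
      fi
    done
  # Each match runs from "ss <command>" up to the next |, ;, & or backtick.
  done < <(grep -oE '(^|[^a-zA-Z0-9_-])ss [a-z-]+[^|;&`]*' "$runbook")
  return "$status"
}
```

Run it from the repo root, e.g. `check_runbook_flags ai_docs/tests/<slug>_runbook.md`; every `UNVERIFIED:` line is a flag to confirm by hand before executing.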

### Phase 4: Cleanup & Report

1. Ask user before cleanup (via AskUserQuestion):
   - Option A: Delete ssenv environment now
   - Option B: Keep for manual debugging (print env name for later `ssenv delete`)
2. If user chose Option A:
   ```bash
   docker exec $CONTAINER ssenv delete "$ENV_NAME" --force
   ```
3. Output summary (derived from the runbook JSON output):
   ```
   ── E2E Test Report ──

   Runbook:  {runbook name}
   Env:      {ENV_NAME}
   Duration: {duration_ms}ms

   Step 1: {title}  PASS
   Step 2: {title}  PASS
   Step 3: {title}  FAIL  ← exit_code={N}, stderr: {error detail}
   ...

   Result: {passed}/{total} passed ({skipped} skipped)
   ```
   All values come directly from mdproof's JSON output: `runbook`, `duration_ms`, `summary.passed`, `summary.total`, `summary.skipped`, `steps[].step.title`, `steps[].status`, `steps[].exit_code`, `steps[].stderr`.
4. If any FAIL → distinguish between runbook bug vs real bug:
   - Runbook bug: wrong flag, wrong file path, stale assertion → fix runbook, re-run step
   - Real bug: CLI misbehavior → analyze cause, provide fix suggestions
5. Retrospective — ask user (via AskUserQuestion):

   > Did you encounter any friction during this test run that the skill or runbook could handle better?

   - Option A: Yes, improve e2e skill — review test friction (wrong flags, stale assertions, missing checklist items, unclear instructions), then update SKILL.md and/or runbooks
   - Option B: Yes, but only fix the runbook — fix the specific runbook without changing the skill itself
   - Option C: No, skip

   Improvement targets:
   - SKILL.md: add new checklist items, common-mistake examples, or rule clarifications learned from this run
   - Runbooks: fix stale assertions (e.g. `config.yaml` → `registry.yaml`), wrong flags, outdated paths
   - Both: when a systemic issue (e.g. a refactor changed file locations) affects both the skill's guidance and existing runbooks

## Runbook Quality Checklist

Before executing a newly generated runbook, verify:

- All CLI flags exist — every `ss <cmd> --flag` was grep-verified against source
- `--init` interaction — if runbook has `ss init`, account for `ssenv create --init` already initializing (add `--force` to re-init, or skip init step)
- `--init` creates default extras — `ssenv create --init` creates a `rules` extra by default. Runbooks that assume an empty extras list must add cleanup first: `ss extras remove rules --force -g 2>/dev/null || true` + `rm -rf ~/.claude/rules`
- Correct confirmation flags — `uninstall` uses `--force` (not `--yes`); `init` re-run needs no flag (just fails gracefully)
- Skill data in registry.yaml — assertions about installed skills check `registry.yaml`, NOT `config.yaml`; `config.yaml` should never contain `skills:`
- File existence timing — `registry.yaml` is only created after first install/reconcile, not on `ss init`
- Project mode paths — project commands use `.skillshare/` not `~/.config/skillshare/`
- Project init flags — `init -p` only supports `--targets`, `--discover`, `--select`, `--mode`, `--dry-run`; global-only flags (`--no-copy`, `--no-skill`, `--no-git`, `--all-targets`, `--force`) are not available
- Audit rule IDs — custom rules in `audit-rules.yaml` use rule IDs (e.g. `prompt-injection-0`), not pattern names (e.g. `prompt-injection`). Verify IDs against `internal/audit/rules.yaml`
- Use `--json` for assertions — if the command supports `--json`, use it with `jq` instead of grepping human-readable output. Text output changes between versions; JSON structure is stable
- Expected = actual substrings, NOT descriptions — the runbook assertion engine does case-insensitive substring matching. Write `- Installed` or `- cangjie-docs-navigator`, NOT `- Install completes without error` or `- Output contains at least one skill`. Negation: use the `Not <substring>` prefix (e.g. `- Not cangjie-docs-navigator`)
- Skill name ≠ repo name — after `ss install <repo>`, the actual skill name may differ from the repo name (e.g. repo `cangjie-docs-mcp` → skill `cangjie-docs-navigator`). Always verify the installed skill name via `ss list` before writing uninstall/check steps
- `/tmp/` cleanup — ssenv only isolates `$HOME`; `/tmp/` is shared across runs. Any step using `/tmp/<path>` must start with `rm -rf /tmp/<path>` to avoid stale state from previous runs
- `echo > symlink` writes through — `echo "content" > path` where `path` is a symlink writes to the symlink's target; it does NOT replace the symlink with a real file. To create a local (non-managed) file at a symlinked path, either use a different filename, or `rm` the symlink first and then `echo`
- `cat >>` is not idempotent — appending to config files (`cat >> config.yaml`) duplicates sections on re-run. Prefer CLI commands (`ss extras init`, which validates duplicates; `ss extras remove --force`) or full file replacement. If you must append, first remove the existing section with `sed -i '/^section_key:/,$d'`; that range deletes from the key through end of file, so it is only safe when the section is the last one in the file
- Extras source path layout — extras use `~/.config/skillshare/extras/<name>/` (not the legacy flat path `~/.config/skillshare/<name>/`). Symlink assertions must include `extras/` in the path regex (e.g. `regex: skillshare/extras/rules/tdd\.md`)
- Prefer `jq:` over `python3 -c` — for JSON output validation, use mdproof's native `jq:` assertion type (e.g. `- jq: .extras | length == 1`) instead of piping to `python3 -c`. It's one line vs 10, and mdproof handles failure reporting automatically
- Check lessons-learned — read `.mdproof/lessons-learned.md` before writing new runbooks for known gotchas and proven assertion patterns
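
The `cat >>` item can be tried out on a throwaway file before it goes into a runbook. A minimal sketch, assuming GNU `sed` (as in the Linux devcontainer); `append_section` and the `example_section` key are hypothetical placeholders, not part of the skillshare config schema. It removes any previous copy with the `sed` range from the checklist, then appends the new body, so running it twice leaves exactly one section.

```bash
# Hypothetical helper: replace a trailing top-level YAML section idempotently.
# $1 = config file, $2 = top-level key; stdin = new section text (with key line).
append_section() {
  local file="$1" key="$2"
  # GNU sed: delete from the "<key>:" line through end of file (no-op if absent).
  sed -i "/^${key}:/,\$d" "$file"
  cat >> "$file"
}

# Usage: printf 'example_section:\n  - name: a\n' | append_section config.yaml example_section
```

Because the range runs to end of file, any sections after the key would also be deleted; when that is possible, the CLI commands are the safer choice.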

## Runbook Assertion Types

mdproof supports 6 assertion types under `Expected:` blocks. Use the most specific type for each check:

| Type | Syntax | When to use | Example |
|---|---|---|---|
| Substring | plain text | Simple output check | `- hello world` |
| Negated | `Not` / `Should NOT` prefix | Verify absence | `- Not FAIL` |
| Exit code | `exit_code: N` | Every step should have this | `- exit_code: 0` |
| Regex | `regex:` prefix | Pattern matching | `- regex: v\d+\.\d+` |
| jq | `jq:` prefix | JSON output (preferred) | `- jq: .extras \| length == 1` |
| Snapshot | `snapshot:` prefix | Stable output comparison | `- snapshot: api-response` |
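
To make the Substring and Negated rows concrete, here is a rough model of how a plain or negated Expected line is judged against step output, based only on the behavior this skill describes (case-insensitive substring match; `Not` / `Should NOT` for absence). `assert_expected` is a hypothetical function for local reasoning, not mdproof's implementation, and it does not model the `exit_code:`, `regex:`, `jq:`, or `snapshot:` types.

```bash
# Hypothetical model of plain and negated Expected lines; NOT mdproof's code.
# $1 = captured step output, $2 = one Expected line, e.g. "- Installed".
assert_expected() {
  local output="$1" line="${2#- }" needle   # strip the "- " list marker
  case "$line" in
    "Not "*)        needle="${line#Not }" ;;
    "Should NOT "*) needle="${line#Should NOT }" ;;
    *)  # Plain line: pass if it appears anywhere, ignoring case.
        grep -qiF -- "$line" <<<"$output"
        return ;;
  esac
  # Negated line: pass only if the substring is absent (case-insensitive).
  ! grep -qiF -- "$needle" <<<"$output"
}
```

Under this model, `- Install completes without error` fails against output like `Installed skill cangjie-docs-navigator` because the sentence never appears verbatim, which is why the checklist asks for real substrings rather than descriptions.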

`jq:` best practices:

```
# Simple field check
- jq: .name == "rules"

# Array length
- jq: .extras | length == 3

# Sorted array comparison
- jq: [.extras[].name] | sort | . == ["a","b","c"]

# Null/missing field (omitempty)
- jq: .extras == null

# Nested access
- jq: .[0].targets[0].status == "synced"

# Boolean
- jq: .source_exists == true
```

## Rules

- Always execute inside devcontainer — use `docker exec`, never run CLI on host
- Always use `ssenv` for HOME isolation — don't pollute container default HOME
- Always create fresh ssenv environments — never reuse an environment from a previous run; stale config/state causes confusing cascade failures (e.g. duplicate YAML keys, "already exists" errors)
- ssenv only isolates `$HOME` — `/tmp/`, `/var/`, and other system paths are shared across all environments. Runbook steps using `/tmp/` must include `rm -rf` cleanup at the start
- Verify every step — never skip Expected checks
- Don't abort on failure — record FAIL, continue to next step, summarize at end
- Ask before cleanup — Phase 4 must prompt user before deleting ssenv environment
- `ss` = `skillshare` — same binary in runbooks
- `~` = ssenv-isolated HOME — `ssenv enter` auto-sets `HOME`
- Use `--init` — simplify setup by using `ssenv create <name> --init`
- `--init` already runs init — the env is pre-initialized; runbook steps calling `ss init` again will fail unless the step explicitly resets state first
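
Because ssenv leaves `/tmp/` shared, a step that builds a scratch directory should recreate it rather than assume it is empty. A minimal sketch of that pattern; `fresh_dir` is a hypothetical helper, and `/tmp/test-project` is the path used by the project-mode container template. Since mdproof runs each step with `bash -c`, a runbook step would normally inline the same `rm -rf` + `mkdir -p`, as shown in the closing comment.

```bash
# Hypothetical helper: recreate a scratch directory so state left by an
# earlier run (same /tmp path, different ssenv environment) cannot leak in.
fresh_dir() {
  local dir="$1"
  case "$dir" in
    *..*) echo "refusing to remove $dir" >&2; return 1 ;;  # no path traversal
    /tmp/?*) ;;                                           # only wipe under /tmp/
    *) echo "refusing to remove $dir" >&2; return 1 ;;
  esac
  rm -rf -- "$dir" && mkdir -p -- "$dir"
}

# Inline equivalent at the start of a runbook step:
#   rm -rf /tmp/test-project && mkdir -p /tmp/test-project && cd /tmp/test-project
```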

## ssenv Quick Reference

| Command | Purpose |
|---|---|
| `sshelp` | Show shortcuts and usage |
| `ssls` | List isolated environments |
| `ssnew <name>` | Create + enter isolated shell (interactive) |
| `ssuse <name>` | Enter existing isolated shell (interactive) |
| `ssback` | Leave isolated context |
| `ssenv enter <name> -- <cmd>` | Run single command in isolation (automation) |

- For interactive debugging: `ssnew <env>` then `exit` when done
- For deterministic automation: prefer `ssenv enter <env> -- <command>` one-liners
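
For reasoning about what leaks between environments, a simplified model of `ssenv enter <env> -- <cmd>` helps. `with_home` is a hypothetical stand-in that only swaps `HOME` for a per-environment directory; the real ssenv also manages its environment list, shell shortcuts, and `--init`, none of which is modeled here. Consistent with the rules above, files under `~` stay private to one environment while `/tmp/` is visible to all of them.

```bash
# Hypothetical stand-in for `ssenv enter <env> -- <cmd>`: only HOME changes.
# $1 = per-environment home directory; remaining args = command to run.
with_home() {
  local env_home="$1"
  shift
  mkdir -p -- "$env_home"
  HOME="$env_home" "$@"
}

# Example: ~ resolves inside the env dir, while /tmp stays shared.
#   with_home /tmp/envs/e2e-a bash -c 'echo "$HOME"; touch /tmp/shared-marker'
```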

## Test Command Policy

When running Go tests inside devcontainer (not via runbook):

```bash
# ssenv changes HOME, so always cd to /workspace first for Go test commands
cd /workspace
go build -o bin/skillshare ./cmd/skillshare
SKILLSHARE_TEST_BINARY="$PWD/bin/skillshare" go test ./tests/integration -count=1
go test ./...
```

Always run in devcontainer unless there is a documented exception.

Note: `ssenv enter` changes `HOME`, which may affect Go module resolution — always `cd /workspace` before running `go test` or `go build`.

## `--json` Quick Reference

Most commands support `--json` for structured output, making assertions more reliable than text matching.

| Command | `--json` | Notes |
|---|---|---|
| `ss status` | `--json` | Skills, targets, sync status |
| `ss list` | `--json` / `-j` | All skills with metadata |
| `ss target list` | `--json` | Configured targets |
| `ss install <src>` | `--json` | Implies `--force --all` (skip prompts) |
| `ss uninstall <name>` | `--json` | Implies `--force` (skip prompts) |
| `ss collect <path>` | `--json` | Implies `--force` (skip prompts) |
| `ss check` | `--json` | Update availability per repo |
| `ss update` | `--json` | Update results per skill |
| `ss diff` | `--json` | Per-file diff details |
| `ss sync` | `--json` | Sync stats per target |
| `ss audit` | `--format json` | Also accepts `--json` (deprecated alias) |
| `ss log` | `--json` | Raw JSONL (one object per line) |

Key behaviors:

- `--json` that implies `--force`/`--all` skips interactive prompts — safe for automation
- Output goes to stdout only (progress/spinners suppressed)
- `audit` prefers `--format json`; `--json` still works but is the deprecated form
- `log --json` outputs JSONL (newline-delimited), not a JSON array

### Assertion Patterns with jq

```bash
# Count installed skills
ss list --json | jq 'length'

# Check a specific skill exists
ss list --json | jq -e '.[] | select(.name == "my-skill")'

# Verify target is configured
ss target list --json | jq -e '.[] | select(.name == "claude")'

# Assert no critical audit findings
ss audit --format json | jq -e '.summary.critical == 0'

# Check update availability
ss check --json | jq -e '.tracked_repos | length > 0'

# Verify sync succeeded (zero errors)
ss sync --json | jq -e '.errors == 0'

# Install and verify result
ss install https://github.com/user/repo --json | jq -e '.skills | length > 0'
```

When a `jq -e` expression fails (exit code 1 = last output was `false` or `null`, 4 = no output at all, e.g. a `select` that matched nothing), the step FAILs — no ambiguous text matching needed.

## Container Command Templates

```bash
# Single command
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- ss status

# JSON assertion (preferred for verification)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss list --json | jq -e ".[] | select(.name == \"my-skill\")"
'

# Multi-line compound command (use bash -c) — global mode flags
# (an env created with `ssenv create --init` is already initialized; add --force to re-init)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss init --no-copy --all-targets --no-git --no-skill
  ss status
'

# Project mode init (different flag set!); /tmp is shared, so recreate the dir first
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '
  rm -rf /tmp/test-project && mkdir -p /tmp/test-project
  cd /tmp/test-project && ss init -p --targets claude
'

# Check files (HOME is set to isolated path by ssenv)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cat ~/.config/skillshare/config.yaml
'

# With environment variables
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  TARGET=~/.claude/skills
  ls -la "$TARGET"
'

# Go tests (must cd /workspace because ssenv changes HOME)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cd /workspace
  go test ./internal/install -run TestParseSource -count=1
'
```

## Relationship with /mdproof Skill

This skill (`/cli-e2e-test`) and the `/mdproof` skill are complementary, not competing:

| Concern | `/cli-e2e-test` | `/mdproof` |
|---|---|---|
| Scope | Skillshare project-specific E2E | General-purpose runbook authoring |
| Infrastructure | Devcontainer, ssenv, binary build | None — format and assertions only |
| Config | `ai_docs/tests/runbook.json` (build, setup, teardown) | Assertion types, snapshot, coverage |
| Lessons | Checklist items, CLI flag gotchas | `.mdproof/lessons-learned.md` |
| When | Running or debugging a test | Writing or improving a runbook |

### How they work together

- Writing a new runbook → invoke `/mdproof` first for format guidance (assertion types, `jq:` patterns, snapshot usage), then `/cli-e2e-test` to execute it in isolation
- Improving existing runbooks → invoke `/mdproof` for assertion quality review (`python3` → `jq:`, idempotency), then `/cli-e2e-test` to verify changes pass
- Debugging failures → `/cli-e2e-test` Phase 3 step 4 handles manual `docker exec`; `/mdproof` lessons-learned captures recurring patterns
- After a test run → `/mdproof` Self-Learning section guides recording discoveries to `.mdproof/lessons-learned.md`

### Rule of thumb

- Need to run tests or debug in devcontainer? → `/cli-e2e-test`
- Need to write assertions or improve runbook quality? → `/mdproof`
- User says "run extras E2E" → `/cli-e2e-test`
- User says "improve runbook assertions" → `/mdproof` then `/cli-e2e-test` to verify