skillshare-cli-e2e-test

name: skillshare-cli-e2e-test description: >- Run isolated E2E tests in devcontainer from ai_docs/tests runbooks. Use this skill whenever the user asks to: run an E2E test, execute a test runbook, validate a feature end-to-end, create a new runbook, or test CLI behavior in isolation. If you need to run a multi-step CLI validation sequence (init → install → sync → verify), this is the skill — it handles ssenv isolation, flag verification, and structured reporting. Prefer this over ad-hoc docker exec sequences for any test that follows a runbook or needs reproducible isolation. argument-hint: "[runbook-name | new]" metadata: targets: [claude, universal]

Run isolated E2E tests in devcontainer. $ARGUMENTS specifies runbook name or "new".

Flow

Phase 0: Environment Check

Confirm devcontainer is running and get container ID:
```
CONTAINER=$(docker compose -f .devcontainer/docker-compose.yml ps -q skillshare-devcontainer)
```
- If empty → prompt user: docker compose -f .devcontainer/docker-compose.yml up -d
- Ensure CONTAINER is set for all subsequent docker exec calls.

Confirm Linux binary is available:

docker exec $CONTAINER bash -c \
  '/workspace/.devcontainer/ensure-skillshare-linux-binary.sh && ss version'

Confirm mdproof is installed:
```
docker exec $CONTAINER /workspace/.devcontainer/ensure-mdproof.sh
```
This auto-installs from GitHub release, or falls back to /workspace/bin/mdproof (local dev binary).
Check for lessons learned from previous runs:
```
test -f /workspace/.mdproof/lessons-learned.md && cat /workspace/.mdproof/lessons-learned.md
```
If the file exists, read it before writing or debugging runbooks — it contains known gotchas and assertion patterns.

Phase 1: Detect Scope

Preview all available runbooks via the container:
```
docker exec $CONTAINER mdproof --dry-run --report json /workspace/ai_docs/tests/
```
This returns JSON with every runbook's steps, commands, and expected assertions — no manual markdown parsing needed. Use this to understand what each runbook covers.
Identify recent changes (unstaged + recent commits):
```
git diff --name-only HEAD~3
```
Match changes to relevant runbooks (compare changed file paths against step commands in the JSON output).

Phase 2: Select Tests

Prompt user (via AskUserQuestion):

Option A: Run existing runbook (list all available + mark those related to recent changes)
Option B: Auto-generate new test script based on recent changes
Option C: If $ARGUMENTS specifies a runbook, skip to Phase 3

Phase 3: Prepare & Execute

Running existing runbook:

Create isolated environment with auto-initialization:

ENV_NAME="e2e-$(date +%Y%m%d-%H%M%S)"

# Use --init to automatically run 'ss init -g' with all targets
docker exec $CONTAINER ssenv create "$ENV_NAME" --init

Execute the entire runbook via mdproof inside the container:

docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- \
  mdproof --report json \
  /workspace/ai_docs/tests/<runbook_file>.md

mdproof executes each step (bash -c <command>) in the ssenv-isolated HOME, then returns structured JSON:

{
  "version": "1",
  "runbook": "<runbook_file>.md",
  "duration_ms": 12345,
  "summary": { "total": 7, "passed": 5, "failed": 1, "skipped": 1 },
  "steps": [
    {
      "step": { "number": 1, "title": "...", "command": "...", "expected": ["..."] },
      "status": "passed",    // "passed" | "failed" | "skipped"
      "exit_code": 0,
      "stdout": "...",
      "stderr": "..."
    }
  ]
}

Analyze the JSON output:

All passed → proceed to Phase 4

Any failed → filter for failures only (full JSON can be too large for terminal output):

mdproof --report json runbook.md 2>&1 | jq '{
  summary: .summary,
  failed: [.steps[] | select(.status == "failed") | {
    step: .step.number, title: .step.title,
    exit_code: .exit_code,
    failed_assertions: [.assertions[]? | select(.matched == false) | .pattern],
    stderr: (.stderr // "" | .[0:200])
  }]
}'

Skipped steps (executor=manual) → these need manual verification, run them individually:

docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- <command from step.command>

For failed steps, debug individually using manual docker exec (same as before):
```
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '<failed step command>'
```
- Prefer --json + jq for assertions — see the JSON Reference below

Generating new runbook:

Read git diff HEAD~3 to find changed files in cmd/skillshare/ or internal/
Read changed files to understand new/modified functionality
Validate all CLI flags before writing — for every ss <command> <flag> in the runbook:
- Grep cmd/skillshare/<command>.go for the exact flag string (e.g. "--force")
- Run ss <command> --help inside container if needed
- Common mistakes to avoid:
  - uninstall --yes → wrong, use --force / -f
  - init --target <name> → wrong, init has no --target flag
  - init -p has a completely separate flag set from global init — only supports --targets, --discover, --select, --mode, --dry-run. Global-only flags like --no-copy, --no-skill, --no-git, --all-targets, --force do NOT exist in project mode
  - Audit custom rules: disable by rule ID (e.g. prompt-injection-0, prompt-injection-1), NOT pattern name (e.g. prompt-injection). Rule IDs are in internal/audit/rules.yaml
Generate new runbook to ai_docs/tests/<slug>_runbook.md, following existing conventions:
- YAML-free, pure Markdown
- Has Scope, Environment, Steps (each with bash + Expected), Pass Criteria
- Use jq: assertions in Expected blocks for JSON commands — e.g. - jq: .extras | length == 1. This is a native mdproof assertion type, NOT a bash jq pipe
- Use --json + jq -e in bash for inline verification within multi-command steps
- Config idempotency — never bare cat >> config.yaml; always prepend sed -i '/^section:/,$d' to remove existing section first, or use CLI commands (ss extras init, ss extras remove --force) that handle duplicates
- Check ai_docs/tests/runbook.json for project-level config (build, setup, teardown, step_setup, timeout) that affects all runbooks
- Check .mdproof/lessons-learned.md for known assertion patterns and gotchas
Run the runbook quality checklist (see below) before executing
Then execute the new runbook (same flow as above)

Phase 4: Cleanup & Report

Ask user before cleanup (via AskUserQuestion):
- Option A: Delete ssenv environment now
- Option B: Keep for manual debugging (print env name for later ssenv delete)

If user chose Option A:

docker exec $CONTAINER ssenv delete "$ENV_NAME" --force

Output summary (derived from the runbook JSON output):

── E2E Test Report ──

Runbook:  {runbook name}
Env:      {ENV_NAME}
Duration: {duration_ms}ms

Step 1: {title}  PASS
Step 2: {title}  PASS
Step 3: {title}  FAIL ← exit_code={N}, stderr: {error detail}
...

Result: {passed}/{total} passed ({skipped} skipped)

All values come directly from mdproof's JSON output — summary.passed, summary.total, steps[].step.title, steps[].status.

If any FAIL → distinguish between runbook bug vs real bug:
- Runbook bug: wrong flag, wrong file path, stale assertion → fix runbook, re-run step
- Real bug: CLI misbehavior → analyze cause, provide fix suggestions
Retrospective — ask user (via AskUserQuestion):

Did you encounter any friction during this test run that the skill or runbook could handle better?
- Option A: Yes, improve e2e skill — review test friction (wrong flags, stale assertions, missing checklist items, unclear instructions), then update SKILL.md and/or runbooks
- Option B: Yes, but only fix the runbook — fix the specific runbook without changing the skill itself
- Option C: No, skip
Improvement targets:
- SKILL.md: add new checklist items, common-mistake examples, or rule clarifications learned from this run
- Runbooks: fix stale assertions (e.g. config.yaml → registry.yaml), wrong flags, outdated paths
- Both: when a systemic issue (e.g. a refactor changed file locations) affects both the skill's guidance and existing runbooks

Runbook Quality Checklist

Before executing a newly generated runbook, verify:

All CLI flags exist — every ss <cmd> --flag was grep-verified against source
--init interaction — if runbook has ss init, account for ssenv create --init already initializing (add --force to re-init, or skip init step)
--init creates default extras — ssenv create --init creates a rules extra by default. Runbooks that assume an empty extras list must add cleanup first: ss extras remove rules --force -g 2>/dev/null || true + rm -rf ~/.claude/rules
Correct confirmation flags — uninstall uses --force (not --yes); init re-run needs no flag (just fails gracefully)
Skill data in registry.yaml — assertions about installed skills check registry.yaml, NOT config.yaml; config.yaml should never contain skills:
File existence timing — registry.yaml is only created after first install/reconcile, not on ss init
Project mode paths — project commands use .skillshare/ not ~/.config/skillshare/
Project init flags — init -p only supports --targets, --discover, --select, --mode, --dry-run; global-only flags (--no-copy, --no-skill, --no-git, --all-targets, --force) are not available
Audit rule IDs — custom rules in audit-rules.yaml use rule IDs (e.g. prompt-injection-0), not pattern names (e.g. prompt-injection). Verify IDs against internal/audit/rules.yaml
Use --json for assertions — if the command supports --json, use it with jq instead of grepping human-readable output. Text output changes between versions; JSON structure is stable
Expected = actual substrings, NOT descriptions — the runbook assertion engine does case-insensitive substring matching. Write - Installed or - cangjie-docs-navigator, NOT - Install completes without error or - Output contains at least one skill. Negation: use Not <substring> prefix (e.g. - Not cangjie-docs-navigator)
Skill name ≠ repo name — after ss install <repo>, the actual skill name may differ from the repo name (e.g. repo cangjie-docs-mcp → skill cangjie-docs-navigator). Always verify the installed skill name via ss list before writing uninstall/check steps
/tmp/ cleanup — ssenv only isolates $HOME; /tmp/ is shared across runs. Any step using /tmp/<path> must start with rm -rf /tmp/<path> to avoid stale state from previous runs
echo > symlink writes through — echo "content" > path where path is a symlink writes to the symlink's target, it does NOT replace the symlink with a real file. To create a local (non-managed) file at a symlinked path: either use a different filename, or rm the symlink first then echo
cat >> is not idempotent — appending to config files (cat >> config.yaml) will duplicate sections on re-run. Prefer ss extras init (which validates duplicates) or full file replacement over cat >> when possible
Extras source path layout — extras use ~/.config/skillshare/extras/<name>/ (not the legacy flat path ~/.config/skillshare/<name>/). Symlink assertions must include extras/ in the path regex (e.g. regex: skillshare/extras/rules/tdd\.md)
Prefer jq: over python3 -c — for JSON output validation, use mdproof's native jq: assertion type (e.g. - jq: .extras | length == 1) instead of piping to python3 -c. It's one line vs 10, and mdproof handles failure reporting automatically
Config append idempotency — when appending YAML sections with cat >>, always prepend sed -i '/^section_key:/,$d' to remove existing section. Or prefer CLI commands (ss extras init, ss extras remove --force) over manual config editing
Check lessons-learned — read .mdproof/lessons-learned.md before writing new runbooks for known gotchas and proven assertion patterns

Runbook Assertion Types

mdproof supports 6 assertion types under Expected: blocks. Use the most specific type for each check:

Type	Syntax	When to use	Example
Substring	plain text	Simple output check	`- hello world`
Negated	`Not`/`Should NOT` prefix	Verify absence	`- Not FAIL`
Exit code	`exit_code: N`	Every step should have this	`- exit_code: 0`
Regex	`regex:` prefix	Pattern matching	`- regex: v\d+\.\d+`
jq	`jq:` prefix	JSON output (preferred)	`- jq: .extras \| length == 1`
Snapshot	`snapshot:` prefix	Stable output comparison	`- snapshot: api-response`

jq: best practices:

# Simple field check
- jq: .name == "rules"

# Array length
- jq: .extras | length == 3

# Sorted array comparison
- jq: [.extras[].name] | sort | . == ["a","b","c"]

# Null/missing field (omitempty)
- jq: .extras == null

# Nested access
- jq: .[0].targets[0].status == "synced"

# Boolean
- jq: .source_exists == true

Rules

Always execute inside devcontainer — use docker exec, never run CLI on host
Always use ssenv for HOME isolation — don't pollute container default HOME
Always create fresh ssenv environments — never reuse an environment from a previous run; stale config/state causes confusing cascade failures (e.g. duplicate YAML keys, "already exists" errors)
ssenv only isolates $HOME — /tmp/, /var/, and other system paths are shared across all environments. Runbook steps using /tmp/ must include rm -rf cleanup at the start
Verify every step — never skip Expected checks
Don't abort on failure — record FAIL, continue to next step, summarize at end
Ask before cleanup — Phase 4 must prompt user before deleting ssenv environment
ss = skillshare — same binary in runbooks
~ = ssenv-isolated HOME — ssenv enter auto-sets HOME
Use --init — simplify setup by using ssenv create <name> --init
--init already runs init — the env is pre-initialized; runbook steps calling ss init again will fail unless the step explicitly resets state first

ssenv Quick Reference

Command	Purpose
`sshelp`	Show shortcuts and usage
`ssls`	List isolated environments
`ssnew <name>`	Create + enter isolated shell (interactive)
`ssuse <name>`	Enter existing isolated shell (interactive)
`ssback`	Leave isolated context
`ssenv enter <name> -- <cmd>`	Run single command in isolation (automation)

For interactive debugging: ssnew <env> then exit when done
For deterministic automation: prefer ssenv enter <env> -- <command> one-liners

Test Command Policy

When running Go tests inside devcontainer (not via runbook):

# ssenv changes HOME, so always cd to /workspace first for Go test commands
cd /workspace
go build -o bin/skillshare ./cmd/skillshare
SKILLSHARE_TEST_BINARY="$PWD/bin/skillshare" go test ./tests/integration -count=1
go test ./...

Always run in devcontainer unless there is a documented exception. Note: ssenv enter changes HOME, which may affect Go module resolution — always cd /workspace before running go test or go build.

`--json` Quick Reference

Most commands support --json for structured output, making assertions more reliable than text matching.

Command	`--json`	Notes
`ss status`	`--json`	Skills, targets, sync status
`ss list`	`--json` / `-j`	All skills with metadata
`ss target list`	`--json`	Configured targets
`ss install <src>`	`--json`	Implies `--force --all` (skip prompts)
`ss uninstall <name>`	`--json`	Implies `--force` (skip prompts)
`ss collect <path>`	`--json`	Implies `--force` (skip prompts)
`ss check`	`--json`	Update availability per repo
`ss update`	`--json`	Update results per skill
`ss diff`	`--json`	Per-file diff details
`ss sync`	`--json`	Sync stats per target
`ss audit`	`--format json`	Also accepts `--json` (deprecated alias)
`ss log`	`--json`	Raw JSONL (one object per line)

Key behaviors:

--json that implies --force / --all skips interactive prompts — safe for automation
Output goes to stdout only (progress/spinners suppressed)
audit prefers --format json; --json still works but is the deprecated form
log --json outputs JSONL (newline-delimited), not a JSON array

Assertion Patterns with `jq`

# Count installed skills
ss list --json | jq 'length'

# Check a specific skill exists
ss list --json | jq -e '.[] | select(.name == "my-skill")'

# Verify target is configured
ss target list --json | jq -e '.[] | select(.name == "claude")'

# Assert no critical audit findings
ss audit --format json | jq -e '.summary.critical == 0'

# Check update availability
ss check --json | jq -e '.tracked_repos | length > 0'

# Verify sync succeeded (zero errors)
ss sync --json | jq -e '.errors == 0'

# Install and verify result
ss install https://github.com/user/repo --json | jq -e '.skills | length > 0'

When a jq -e expression fails (exit code 1 = false, 5 = no output), the step FAILs — no ambiguous text matching needed.

Container Command Templates

# Single command
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- ss status

# JSON assertion (preferred for verification)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss list --json | jq -e ".[] | select(.name == \"my-skill\")"
'

# Multi-line compound command (use bash -c) — global mode flags
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss init --no-copy --all-targets --no-git --no-skill
  ss status
'

# Project mode init (different flag set!)
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '
  cd /tmp/test-project && ss init -p --targets claude
'

# Check files (HOME is set to isolated path by ssenv)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cat ~/.config/skillshare/config.yaml
'

# With environment variables
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  TARGET=~/.claude/skills
  ls -la "$TARGET"
'

# Go tests (must cd /workspace because ssenv changes HOME)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cd /workspace
  go test ./internal/install -run TestParseSource -count=1
'

Relationship with `/mdproof` Skill

This skill (/cli-e2e-test) and the /mdproof skill are complementary, not competing:

Concern	`/cli-e2e-test`	`/mdproof`
Scope	Skillshare project-specific E2E	General-purpose runbook authoring
Infrastructure	Devcontainer, ssenv, binary build	None — format and assertions only
Config	`ai_docs/tests/runbook.json` (build, setup, teardown)	Assertion types, snapshot, coverage
Lessons	Checklist items, CLI flag gotchas	`.mdproof/lessons-learned.md`
When	Running or debugging a test	Writing or improving a runbook

How they work together

Writing a new runbook → invoke /mdproof first for format guidance (assertion types, jq: patterns, snapshot usage), then /cli-e2e-test to execute it in isolation
Improving existing runbooks → invoke /mdproof for assertion quality review (python3 → jq:, idempotency), then /cli-e2e-test to verify changes pass
Debugging failures → /cli-e2e-test Phase 3 step 4 handles manual docker exec; /mdproof lessons-learned captures recurring patterns
After a test run → /mdproof Self-Learning section guides recording discoveries to .mdproof/lessons-learned.md

Rule of thumb

Need to run tests or debug in devcontainer? → /cli-e2e-test
Need to write assertions or improve runbook quality? → /mdproof
User says "run extras E2E" → /cli-e2e-test
User says "improve runbook assertions" → /mdproof then /cli-e2e-test to verify