plugin-tests

star 1

Static-analysis test harness for Claude Code plugins. Triggers on "test plugin", "validate plugin", "check skill resolution", "run plugin tests", "lint plugin", "verify manifest", "namesake collision", "MCP registration check". Runs Python stdlib pytest scripts that catch namesake collisions, manifest drift, MCP misregistration, trigger-phrase coverage gaps, bridge pre-flight gaps, agent-surface drift, and cache-prune regressions. Routed as build-loop's 4th orchestrator mode (Build / Optimize / Research / Test).

tyroneross By tyroneross schedule Updated 6/4/2026

name: plugin-tests description: Static-analysis test harness for Claude Code plugins. Triggers on "test plugin", "validate plugin", "check skill resolution", "run plugin tests", "lint plugin", "verify manifest", "namesake collision", "MCP registration check". Runs Python stdlib pytest scripts that catch namesake collisions, manifest drift, MCP misregistration, trigger-phrase coverage gaps, bridge pre-flight gaps, agent-surface drift, and cache-prune regressions. Routed as build-loop's 4th orchestrator mode (Build / Optimize / Research / Test). version: 0.1.0 user-invocable: false

Plugin Tests — Static-Analysis Harness

A pytest-stdlib test suite that validates a Claude Code plugin's structure, manifest, MCP wiring, skill descriptions, and bridge preflights. Designed to catch the bug classes that have actually shipped through build-loop's history (namesake collisions, manifest version drift, missing frontmatter on commands, bare MCP server names).

This is the static tier — pure text/JSON validation, zero runtime dependencies, runs in under a second. Runtime tests (live MCP round-trip, real bridge delegation) are out of scope here and live in the orchestrator's Phase 4 Review-B Validate.

When this skill fires

  • User direct/build-loop:test [test-name] or any of: "test plugin", "validate plugin", "lint plugin", "verify manifest", "check skill resolution", "run plugin tests"
  • Orchestrator intent routing — TEST mode (4th mode alongside Build / Optimize / Research) — agents/build-orchestrator.md §Intent Routing
  • Orchestrator auto-dispatch in Phase 4 Review-B Validate (always on, since v0.7.3; advisory-not-blocking since v0.7.4) — when Phase 3 Execute's diff touches any plugin metadata path: *.claude-plugin/*.json, commands/*.md, skills/*/SKILL.md, agents/*.md, .mcp.json, hooks/hooks.json, or any path referenced by mcpServers. Findings are written to .build-loop/state.json.pluginTests and surfaced in the Review-F Report — they do NOT block the build or auto-route to Iterate. Static-analysis is heuristic (regex over markdown frontmatter); the user decides whether each finding is a real defect or a false positive.
  • Pre-publish gate when bumping a plugin version (the human-driven equivalent of the auto-dispatch — run /build-loop:test --strict before pushing)

What's tested

Script What it catches Reference defects
scripts/test_skill_resolution.py Namesake collisions (commands/X.md + skills/X/SKILL.md), frontmatter name: drift from dir name build-loop:build-loop collision (shipped through 6 versions undetected before 0.4.1)
scripts/test_plugin_manifest.py Required manifest fields, version sync between plugin.json and marketplace.json, MCP path resolution, every command has frontmatter, skill name uniqueness 0.4.0/0.3.2 marketplace drift; missing frontmatter on commands/debugger-detail.md
scripts/test_mcp_registration.py .mcp.json shape, referenced binaries exist, server-name hygiene (warns on bare names that collide across plugins) Historical debugger bare-name collision risk; Build Loop now skips this test because it does not ship an MCP server
scripts/test_trigger_phrases.py Curated (skill, phrase) coverage in skill description: fields Trigger-phrase gaps after the multi-provider auth audit
scripts/test_bridge_preflight.py Every *-bridge/SKILL.md has an availability/absence check + user-invocable: false api-registry-bridge missing user-invocable: false; bridges that hard-fail when their target plugin isn't installed
scripts/test_agent_surface_policy.py Codex public wrappers, Claude user-invocable flags, and host-neutral Cursor/AGENTS policy stay aligned Helper skills crowding Codex/ChatGPT # picker while Claude internals still need the full tree
scripts/test_prune_plugin_cache.py Host-aware stale cache pruning for Claude Code and Codex Old Build Loop versions staying visible after a new version is installed
scripts/test_prune_codex_plugin_cache.py Backward-compatible Codex-only cache pruning wrapper Existing codex:prune-cache workflow breaking after host-aware pruning was added

How to run

From the plugin repo root:

# Run all 8
for t in scripts/test_skill_resolution.py scripts/test_plugin_manifest.py \
         scripts/test_mcp_registration.py scripts/test_trigger_phrases.py \
         scripts/test_bridge_preflight.py scripts/test_agent_surface_policy.py \
         scripts/test_prune_plugin_cache.py scripts/test_prune_codex_plugin_cache.py; do
  echo "=== $t ==="
  python3 "$t"
done

# Or via the slash command
/build-loop:test

Each script is independent — running one without the others is fine. Each follows the build-loop pytest-stdlib convention from scripts/test_plan_verify.py (subprocess-against-sibling-script + tempfile fixtures + unittest.TestCase).

Exit codes

  • 0 — all tests pass
  • 1 — at least one test failed (advisory in Review-B; hard CI gate when invoked via /build-loop:test --strict or run directly in CI)
  • 2 — runner error (test script itself crashed)

The orchestrator records findings under .build-loop/state.json.pluginTests and surfaces a one-line summary in the Review-F Report. It does not gate the build on plugin-tests results. Why: static-analysis findings are heuristic (regex/pattern matches over markdown frontmatter) and false positives are common — 3 hit on initial rollout (legitimate user-invocable bridge, filesystem-based pre-flight, PRD-absence prose). A blocking gate that flaps gets disabled, which destroys the value. Surfacing as advisory keeps the signal alive while letting the user judge edge cases.

For pre-publish checks, run with --strict to make the CLI exit non-zero on any finding (without involving the orchestrator).

Adding new tests

When a new bug class ships through, add a test that would have caught it. Pattern:

  1. Write scripts/test_<bug_class>.py matching the existing pattern (stdlib only, unittest.TestCase, subprocess against any helper script)
  2. Add a row to the table in §"What's tested" above
  3. Add the script to the runner in §"How to run"
  4. Add a (skill, phrase) to EXPECTED_TRIGGERS in test_trigger_phrases.py if the new test covers a class that should be discoverable by user phrasing

Don't migrate to pytest, vitest, or Playwright. The stdlib pattern keeps the harness portable, zero-install, and CI-friendly. (See the testing survey at the head of KNOWN-ISSUES.md 2026-05-02 entry — IBR's vitest, example-app's Jest+Playwright, prompt-test-lab's Playwright stratification all have their place; for plugin metadata validation specifically, stdlib Python wins.)

What this skill does NOT do

  • Runtime testing (live MCP calls, actual Skill() invocation) — that's Review-B Validate's job, executed by the orchestrator with the live runtime
  • UI testing — for plugins that build UIs, use the orchestrator's build-loop-owned UI validation route (ui-validator, project-native tests, screenshots/static scanner). Dispatch IBR only when the user explicitly asks for IBR / Interface Built Right / .ibr-test.json.
  • Performance / Lighthouse — separate concern, not a plugin metadata issue
  • Cross-plugin integration — that's the bridge skills' job at runtime

Cross-references

  • agents/build-orchestrator.md §Intent Routing — TEST mode classification
  • commands/test.md — slash-command surface (/build-loop:test)
  • scripts/collision_scan.py — the static detector that test_skill_resolution.py wraps
  • KNOWN-ISSUES.md 2026-05-02 entry — testing survey across 13 projects that informed this design
  • agents/ui-validator.md and skills/build-loop/phases/ui-validation.md — default UI validation route. IBR's .ibr-test.json remains an explicit-only auxiliary format.

History

  • 2026-05-02 — initial release with 5 static-analysis scripts. Caught 2 real defects on first run (debugger-detail frontmatter, api-registry-bridge user-invocable). Pattern borrowed from build-loop's existing scripts/test_plan_verify.py + survey of test setups across 13 user projects.
Install via CLI
npx skills add https://github.com/tyroneross/build-loop --skill plugin-tests
Repository Details
star Stars 1
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator