test-docs - SKILL.md Agent Skill

name: test-docs description: > Test ToolHive documentation by executing the steps in tutorials and guides against a real environment, verifying that examples are correct and ToolHive has not regressed. Use when the user asks to test, validate, or verify documentation such as: "test the vault integration tutorial", "verify the K8s quickstart", "check the CLI install docs", "test toolhive vault integration in kubernetes", or any request to run through a doc's instructions to confirm they work. Supports both Kubernetes-based docs (tutorials, K8s guides) and CLI-based docs.

Test Docs

Test ToolHive documentation by running each step against a live environment, reporting pass/fail with evidence, and recommending fixes for failures.

Workflow

Find the documentation - launch an Explore agent to locate the relevant doc file
Read and parse - extract testable steps (bash code blocks, YAML manifests, expected outputs)
Check prerequisites - run scripts/check-prereqs.sh; confirm the thv version with the user
Prepare the environment - create a dedicated test namespace unless the doc specifies otherwise
Execute steps - run each step sequentially, capture output
Report results - pass/fail per step with evidence
Clean up - delete all resources created during testing
Recommend fixes - classify failures as doc issues or ToolHive bugs

Step 1: Find the documentation

If the user provides a file path to the doc, use it directly. Assume you are running in the repository that contains the file.

Otherwise, use the Task tool with subagent_type=Explore to search the docs directory for the page matching the user's request. Documentation lives under:

docs/toolhive/tutorials/ - step-by-step tutorials
docs/toolhive/guides-cli/ - CLI how-to guides
docs/toolhive/guides-k8s/ - Kubernetes how-to guides
docs/toolhive/guides-mcp/ - MCP server guides
docs/toolhive/guides-vmcp/ - Virtual MCP Server guides
docs/toolhive/guides-registry/ - registry guides

All doc files use the .mdx extension. Search by keywords from the user's request. If multiple docs match, ask the user which one to test.

Step 2: Read and parse the doc

Read the full doc file. Extract an ordered list of testable steps:

Bash code blocks (\``bash`) - commands to execute
YAML code blocks with title="*.yaml" - manifests to apply with kubectl apply
Expected outputs - text code blocks (\``text`) that follow a command, or prose describing expected behavior ("You should see...")

Build a test plan: a numbered list of steps, each with:

The command(s) to run
What constitutes a pass (expected output, exit code 0, resource created)
Any dependencies on previous steps (variables, resources)

Present the test plan to the user for approval before executing.

Handling placeholders

Docs often contain placeholder values like ghp_your_github_token_here or your-org. Before executing, scan for obvious placeholders (ALL_CAPS patterns, "your-*", "example-*", "replace-*") and ask the user for real values. If a step requires a secret that the user cannot provide, mark it as skipped with a note explaining why.

Step 3: Check prerequisites

Run the prerequisites checker:

bash <skill-path>/scripts/check-prereqs.sh

Where <skill-path> is the absolute path to this skill's directory.

If the script reports errors, inform the user and stop. For K8s docs, the script checks for a kind cluster named "toolhive". For CLI docs, confirm with the user that the installed thv version matches what the doc expects.

Step 4: Prepare the environment

For Kubernetes docs, ask the user about existing infrastructure before deploying anything:

Operator CRDs and operator: if the doc includes steps to install CRDs or the ToolHive operator, ask the user whether these are already installed. If so, skip those installation steps and proceed to the doc-specific content. Use AskUserQuestion with options like "Already installed", "Install fresh", "Reinstall (upgrade)".
Namespaces: create a dedicated namespace test-docs-<timestamp> (e.g., test-docs-1706886400) unless the doc explicitly uses a specific namespace (like toolhive-system). If the doc requires toolhive-system, use it but track all resources created for cleanup.

For CLI docs:

Use a temporary directory for any files created
Note the state of running MCP servers before testing (thv list)
Use unique server names: when running thv run <server>, always use --name <server>-test to avoid conflicts with existing servers. The default name is derived from the registry entry, which may already be running. Example: thv run --name osv-test osv
Verify registry configuration: if a registry server isn't found, confirm with the user that they're using the default registry (not a custom registry configuration)

Step 5: Execute steps

Run each step sequentially. For each step:

Print the step number and a short description
Run the command(s)
Capture stdout, stderr, and exit code
Compare against expected output or success criteria
Record pass/fail with evidence

Execution rules

Wait for readiness: when a step deploys resources, wait for them (e.g., kubectl wait --for=condition=ready) before proceeding. If the doc includes a wait command, use it. Otherwise add a reasonable wait (up to 120s for pods).
Variable propagation: some steps set shell variables used by later steps (e.g., VAULT_POD=$(kubectl get pods ...)). Maintain these across steps by running in the same shell session.
Skip non-testable steps: informational code blocks (showing file contents, expected output) are not commands to run. Use them as verification targets instead.
Timeout: if any command hangs for more than 5 minutes, kill it and mark the step as failed with a timeout note.

Known pitfalls

Server name conflicts: thv run <server> uses the registry entry name by default, which may already exist. Always use thv run --name <server>-test <server> to create a uniquely-named instance. This avoids "workload with name already exists" errors.
Port-forward needs time: when testing via kubectl port-forward, wait at least 5 seconds before issuing curl. Start the port-forward in the background, sleep 5, then curl. Kill the port-forward PID after the check.
MCPServer status field: the MCPServer CRD uses .status.phase (not .status.state) for the running state. When polling for readiness, use: kubectl get mcpservers <name> -n <ns> -o jsonpath='{.status.phase}' or simply poll kubectl get mcpservers and check the STATUS column.
MCPServer readiness polling: after kubectl apply for an MCPServer, it may take 30-60 seconds to reach Running. Poll with 5-second intervals up to 120 seconds before declaring a timeout.

Step 6: Report results

After all steps complete, produce a summary table:

## Test Results: <doc title>

| Step | Description          | Result | Notes                    |
|------|----------------------|--------|--------------------------|
| 1    | Install Vault        | PASS   |                          |
| 2    | Configure auth       | PASS   |                          |
| 3    | Store secrets        | PASS   |                          |
| 4    | Deploy MCPServer     | FAIL   | Pod CrashLoopBackOff    |
| 5    | Verify integration   | SKIP   | Depends on step 4       |

For each FAIL, include:

The command that failed
Actual output vs expected output
A classification:
- Doc issue: the command or example in the doc is wrong or outdated (e.g., wrong image tag, deprecated flag, missing step)
- ToolHive bug: the doc appears correct but ToolHive behaves unexpectedly (e.g., crash, wrong output from a correct command)
- Environment issue: missing prerequisite, network problem, or transient error
A specific recommendation (e.g., "Update image tag from v0.7.0 to v0.8.2" or "File a bug: MCPServer pods fail to start when Vault annotations are present")

Step 7: Clean up

Always clean up after testing:

Delete test namespaces: kubectl delete namespace <test-namespace>
Remove Helm releases installed during testing
Delete any temporary files or directories
If resources were created in toolhive-system, delete them individually by name

If cleanup fails, report what could not be cleaned up so the user can handle it manually.

Example session

User: "test the vault integration tutorial"

Explore agent finds docs/toolhive/tutorials/vault-integration.mdx
Parse the doc: 5 major steps with 12 bash commands and 1 YAML manifest
Run check-prereqs.sh - kind cluster "toolhive" exists, thv v0.8.2
Ask user: "The doc uses a placeholder GitHub token ghp_your_github_token_here. Provide a real token or skip step 2?"
Present test plan, user approves
Execute steps 1-5, report results
Clean up: delete vault namespace, remove MCPServer resource
Summary: 4/5 PASS, 1 FAIL with recommendation