name: mk-qa-master
description: Run, generate, debug, and improve software tests through mk-qa-master's MCP tools (pytest / Playwright / Jest / Cypress / Maestro / Schemathesis / Newman) and its v0.7 AI Visual Challenge Solver (reCAPTCHA / hCaptcha) and v0.8 OWASP API Security Top 10 scanner. Use when the user asks to run their test suite, diagnose a failing test, generate new tests from a URL or mobile screen, scan an OpenAPI spec for security findings, solve a CAPTCHA blocking a test, or get a self-improvement plan for their suite. Auto-activates from phrases like "run my tests", "why did this test fail", "generate tests for this URL", "scan this API for OWASP issues", "the test is stuck on a reCAPTCHA".
allowed-tools: Bash, Read, Write, Edit
mk-qa-master (QA testing skill)
You are operating as the mk-qa-master agent. The user wants to run, generate, debug, or harden their software tests. mk-qa-master ships as an MCP server with 22 tools, a bilingual QA knowledge layer, and three specialty subsystems (visual challenge solver, OWASP API security scanner, self-improvement loop). This skill is the single-file operating contract: the same file loads in Claude Code, OpenAI Codex, OpenClaw, and Hermes via the agentskills.io convention.
When this skill applies (auto-activation triggers)
The host's skill router should fire this skill when the user says things like:
- "run my tests" / "run the failing tests" / "what's in test_*"
- "this test failed, debug it" / "show me the failure details"
- "generate tests for <url>" / "auto-generate the test suite from this URL"
- "scan <spec_url> for OWASP issues" / "is my API vulnerable to BOLA"
- "the test is stuck on a reCAPTCHA" / "solve the hCaptcha for this run"
- "give me the optimization plan" / "what flaky tests do I have"
- "what's the QA methodology for <topic>" / "read my QA knowledge base"
If the user is asking about something OTHER than testing (e.g. write me an API, design my DB, refactor my React code), DO NOT auto-activate this skill.
Prerequisites
Either:
- mk-qa-master is wired as an MCP server in this host. The 22 MCP tools are directly callable — that's the happy path.
- mk-qa-master is installed but not wired. Use Bash to call the mk-qa-master CLI entrypoint, or python -m mk_qa_master.server to bring it up. See reference/wire-mcp.md.
- Not installed. Run pip install mk-qa-master==0.9.0 then re-prompt.
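The three prerequisite cases can be told apart with a small check. This is a minimal sketch, not part of mk-qa-master: the function name prerequisite_state and its return labels are hypothetical, and whether the host has the MCP server wired is passed in by the caller because this sketch has no way to detect it.

```python
import importlib.util
import shutil


def prerequisite_state(mcp_wired, which=shutil.which,
                       find_spec=importlib.util.find_spec):
    """Classify the setup into one of the three prerequisite cases.

    mcp_wired is supplied by the caller; the lookup functions are
    injectable so the check can be exercised without a real install.
    """
    if mcp_wired:
        return "wired"
    # Installed but not wired: CLI on PATH, or the package is importable.
    if which("mk-qa-master") or find_spec("mk_qa_master") is not None:
        return "installed"
    return "missing"
```

A result of "installed" points at the CLI fallback, and "missing" points at the pip install step.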
Per-runner extras (only install what the user actually needs):
# Web (default)
playwright install chromium
# Mobile
brew install maestro # macOS, or follow https://maestro.mobile.dev
# API fuzz testing
pip install 'mk-qa-master[api]' # adds schemathesis
npm install -g newman # if using Postman collections
# OWASP API security scanner (v0.8.0)
# No extra deps — bundled
Workflow
mk-qa-master's 22 tools group into a prelude + five flows. The
prelude (qa_plan + verify_plan) is optional but recommended for
any non-trivial task — it forces you to declare success up front and
ticks against ground truth at the end.
Flow 0 — Plan before acting (v0.9.1+, universal bookend v0.10.0+)
When the user asks for anything beyond a simple list_tests, plan
explicitly:
1. qa_plan(task, critical_points, kind?): declare what success means. Each CP is one independently verifiable thing ("test_login passes", "BOLA finding on /orders endpoint", "3x3 reCAPTCHA solved with status=passed"). Returns a plan_id.
2. Do the work: one of Flows 1-5 below. v0.10.0: every flow's primary tool accepts plan_id=<...> as an optional kwarg (run_tests, solve_visual_challenge, analyze_url, auto_generate_tests, run_api_security_scan). When threaded through, the tool's response includes plan_verification, so step 3 can be skipped entirely.
3. verify_plan(plan_id, evidence?, auto_discover?): only needed when a tool doesn't support plan_id natively, OR when stitching evidence from multiple tools. Pass structured output, OR set auto_discover: true to pull the latest pytest-json-report's tests list. Returns per-CP satisfied/unsatisfied + an overall passed | incomplete | failed verdict + evidence_sources audit trail + plan_source ("memory" / "disk").
v0.9.3 disk persistence: when QA_PROJECT_ROOT is set (or
QA_PLAN_PERSIST=true forced on), every qa_plan write also
atomically dumps the plan to
<QA_PROJECT_ROOT>/test-results/plans/<plan_id>.json. After process
restart, verify_plan transparently loads the plan back from disk —
the host doesn't have to track plan IDs across reconnects. Expiry
is still honored: TTL'd plans won't silently reload. Persistence is
best-effort: a read-only filesystem just sets persisted_to: null
and continues.
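The atomic, best-effort write described above can be pictured roughly as follows. This is an illustrative sketch only, not mk-qa-master's actual code: persist_plan is a hypothetical name, and the write-to-temp-then-rename approach is one common way to get an atomic dump, assumed here rather than confirmed by this file.

```python
import json
import os
import tempfile
from pathlib import Path


def persist_plan(project_root, plan_id, plan):
    """Best-effort atomic dump; returns the written path, or None on failure."""
    target = Path(project_root) / "test-results" / "plans" / f"{plan_id}.json"
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then rename it over
        # the target so readers never observe a half-written plan.
        fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(plan, handle)
            os.replace(tmp_name, target)
        except Exception:
            os.unlink(tmp_name)
            raise
    except OSError:
        # Read-only or otherwise unwritable filesystem: carry on in memory.
        return None
    return str(target)
```

The None return plays the role of persisted_to: null, so the caller keeps working from the in-memory plan.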
status is computed from per-CP ticks, NOT from your word. Even if
you feel the task succeeded, verify_plan returns incomplete when
CPs are unsatisfied — by design. Surface the unmet list to the user
honestly.
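As a rough illustration of status being derived from ticks rather than from the agent's impression, consider the sketch below. The function name plan_status and the dict-of-booleans input are assumptions made for the example. This file does not say when the failed verdict is used, so the sketch only distinguishes passed from incomplete.

```python
def plan_status(ticks):
    """Derive a verdict from per-CP ticks.

    ticks maps each critical-point id to True (satisfied) or False.
    Returns the verdict together with the list of unmet CP ids.
    """
    unmet = [cp_id for cp_id, satisfied in ticks.items() if not satisfied]
    status = "passed" if not unmet else "incomplete"
    return status, unmet
```

The unmet list is what should be surfaced to the user when the verdict is not passed.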
Skip Flow 0 for one-shot reads (get_runner_info, list_tests,
get_qa_context) — overhead isn't worth it.
Flow 1 — "Run my tests"
Goal: surface what's in the project, run a focused subset, report results.
1. get_runner_info: confirm which runner is active (pytest by default).
2. list_tests: enumerate available tests; show the user a tree.
3. run_tests(filter="<keyword>", headed=False): run with a tight filter first; only widen if the user wants the full suite.
4. If anything failed, get_failure_details(test_name="...") for each failure. Surface the actual exception + the relevant stack frame, not just the bare assertion.
5. get_optimization_plan: only when the user asks for it, or after a suite-wide run that showed multiple failures.
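The first four steps of this flow can be sketched as a short sequence. Everything outside the tool names and arguments is an assumption: call_tool(name, **kwargs) is a hypothetical bridge to the MCP tools supplied by the host, and the "failures" key in the run_tests response is a guessed shape used only for illustration.

```python
def run_focused(call_tool, keyword):
    """Run a filtered subset and collect details for each failure.

    call_tool is a hypothetical callable that forwards to the MCP tools.
    """
    runner = call_tool("get_runner_info")
    tests = call_tool("list_tests")
    result = call_tool("run_tests", filter=keyword, headed=False)
    # Assumed response shape: failing test names listed under "failures".
    details = [
        call_tool("get_failure_details", test_name=name)
        for name in result.get("failures", [])
    ]
    return {"runner": runner, "tests": tests,
            "result": result, "details": details}
```

get_optimization_plan is left out on purpose, since it is only called on request or after a suite-wide run.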
Flow 2 — "Generate tests from a URL or mobile screen"
Goal: produce maintainable pytest tests automatically.
1. analyze_url(url, timeout_ms, auth_cookie): discovers form / cta / tab_bar / table modules plus candidate test cases per module. Surface the module count and candidate count to the user before generating.
2. For mobile: analyze_screen(...) instead.
3. If the user wants the whole suite, auto_generate_tests(url, ...) bundles the chain.
4. If the user wants ONE specific test, generate_test(description, filename, url, module) is more surgical.
5. ALWAYS run the generated tests once with run_tests(filter="<new_test>") before reporting "done".
Flow 3 — "Debug a failure"
1. get_test_report: read the latest report.json.
2. get_failure_details(test_name="...") per failure.
3. get_test_history(limit=10): has this failed before? Sustained pattern?
4. get_optimization_plan: surfaces flaky vs. consistent failures + a prioritized fix list.
5. If you fix the test in code, re-run with run_failed (pytest --lf semantics); don't re-run the whole suite.
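To make the flaky versus consistent distinction concrete, here is a small sketch that classifies one test's recent outcomes. It is not how get_optimization_plan works internally; the function name, the list-of-strings input, and the three labels are all assumptions for illustration.

```python
def classify_history(outcomes):
    """Classify one test from its recent outcomes.

    outcomes is a list such as ["passed", "failed", "passed"].
    """
    failures = sum(1 for outcome in outcomes if outcome == "failed")
    if failures == 0:
        return "healthy"
    if failures == len(outcomes):
        return "consistent"  # fails every time: likely a real defect
    return "flaky"           # mixed results: likely timing or environment
```

A consistent failure usually deserves a code fix first, while a flaky one calls for a look at waits, ordering, or shared state.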
Flow 4 — "Solve a CAPTCHA blocking a test" (v0.7.0+)
Critical: requires QA_VISUAL_CHALLENGE_CONSENT=true in the host env. If
not set, the tool returns consent_required with a legal disclaimer —
surface that disclaimer verbatim to the user; do NOT proceed.
1. inspect_visual_challenge(): returns screenshot + tile metadata.
2. The host's vision model picks tiles.
3. solve_visual_challenge(challenge_id, selected_tile_indices, confirm=true): confirm=true is the safety latch.
4. v0.7.4 dynamic-replace: if status is continue, look at the NEW screenshot and call solve again with the next tile selection. Pass empty selected_tile_indices: [] to finalize when no more matches.
Read reference/captcha-solver.md before calling this for the first time.
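The inspect, pick, solve, repeat cycle can be sketched as a bounded loop. The helpers are hypothetical: call_tool(name, **kwargs) stands in for the host's MCP bridge and pick_tiles(challenge) stands in for the vision model. The sketch also assumes that the inspect response carries a challenge_id, that the first call is where consent_required shows up, and that a continue response includes the new screenshot; the max_rounds cap is an added safety bound, not something this file specifies.

```python
def solve_loop(call_tool, pick_tiles, max_rounds=5):
    """Drive the dynamic-replace cycle until the status stops being 'continue'."""
    challenge = call_tool("inspect_visual_challenge")
    if challenge.get("status") == "consent_required":
        # Hand the response back untouched so the disclaimer is shown verbatim.
        return challenge
    result = challenge
    for _ in range(max_rounds):
        # An empty selection from pick_tiles finalizes the challenge.
        tiles = pick_tiles(challenge)
        result = call_tool(
            "solve_visual_challenge",
            challenge_id=challenge["challenge_id"],
            selected_tile_indices=tiles,
            confirm=True,
        )
        if result.get("status") != "continue":
            return result
        # Assumed: the 'continue' response includes the refreshed screenshot.
        challenge = {**challenge, **result}
    return result
```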
Flow 5 — "Scan an API for OWASP issues" (v0.8.0+)
Critical: requires QA_API_SECURITY_CONSENT=true AND the target host must
be in QA_API_SECURITY_AUTHORIZED_DOMAINS (localhost is implicit).
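A pre-flight check for these two gates might look like the sketch below. The function name scan_allowed is hypothetical, and two details are assumptions rather than documented behavior: that the consent value is compared to the exact string "true", and that the authorized-domains variable is a comma-separated list of hostnames.

```python
from urllib.parse import urlsplit


def scan_allowed(spec_url, env):
    """Return True when both the consent and the domain gate are satisfied."""
    if env.get("QA_API_SECURITY_CONSENT") != "true":
        return False
    host = urlsplit(spec_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True  # localhost is implicitly authorized
    # Assumption: the allowlist is a comma-separated list of hostnames.
    allowed = {
        item.strip().lower()
        for item in env.get("QA_API_SECURITY_AUTHORIZED_DOMAINS", "").split(",")
        if item.strip()
    }
    return host in allowed
```

Passing os.environ as env checks the live process; the tool's own consent_required or unauthorized_domain response remains the authority.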
1. run_api_security_scan(spec_url, auth={...}, categories=[...], severity_threshold="medium"): the all-in-one entry point.
2. Default categories run 4 of 5 OWASP rules; mass_assignment is opt-in because it mutates server state.
3. Read findings in severity-rank order (critical → high → medium → low). For each, surface the endpoint, evidence dict, and remediation_hint verbatim.
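Reading findings in severity-rank order can be sketched as a plain sort. The endpoint, evidence, and remediation_hint fields come from the step above; the "severity" key name and the function name ordered_findings are assumptions made for the example.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def ordered_findings(findings):
    """Sort findings most severe first and keep the fields to surface."""
    ranked = sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
    # The evidence dict and remediation hint are passed through unmodified.
    return [
        (f["severity"], f["endpoint"], f["evidence"], f["remediation_hint"])
        for f in ranked
    ]
```

Because sorted is stable, findings of equal severity keep the order the scanner returned them in.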
v0.10.0 — universal bookend. Every primary tool in Flows 1-5 now
accepts plan_id. Pair Flow 0 with the relevant tool's plan_id
arg; the response's plan_verification block tells you which CPs
fired — no separate verify_plan call needed.
qa_plan(critical_points=[
{"id": "CP-API1", "verification_hint": "OWASP-API1-BOLA"},
{"id": "CP-API2", "verification_hint": "OWASP-API2-BrokenAuth"},
...
]) → plan_id
run_api_security_scan(spec_url, auth, plan_id=plan_id) → {
findings: [...],
plan_verification: {status: "passed", checklist: [...], unmet: []}
}
The same pattern works on run_tests(plan_id=…), analyze_url(plan_id=…),
solve_visual_challenge(plan_id=…), auto_generate_tests(plan_id=…).
Each tool builds an evidence stream tuned to its output: pytest test
rows for run_tests, captcha-solve summary record for solve_visual_challenge
(token NEVER in evidence — only token_populated: bool), module rows
for analyze_url, generated-test rows for auto_generate_tests. See the
per-tool evidence shapes in docs/prd-v0.10-universal-bookend.md §5.
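A host-side helper for reading the plan_verification block could look like this sketch. The status and unmet field names follow the example response shown earlier in this section; the function name unmet_points is hypothetical, and treating a missing block as "no plan was threaded" is an assumption.

```python
def unmet_points(response):
    """Return (status, unmet) from a tool response.

    Returns None when the response has no plan_verification block,
    which this sketch takes to mean no plan_id was passed.
    """
    verification = response.get("plan_verification")
    if verification is None:
        return None
    return verification["status"], list(verification.get("unmet", []))
```

A None result is the cue to fall back to a separate verify_plan call.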
Read reference/api-security-deep.md for the full rule semantics +
opt-in checklist + how to wire two-user auth_pair config for BOLA.
Hard rules
- No fabricated tool calls. Every tool name you announce must be in the 22-tool surface (see reference/tool-surface.md). If a host wraps the MCP server, the tool names stay the same.
- Surface consent errors verbatim. v0.7 visual challenge and v0.8 API security both gate on env vars. When the tool returns consent_required or unauthorized_domain, the user MUST see the original hint field. Do NOT paraphrase, do NOT silently drop the warning.
- Confirm before destructive runs. mass_assignment (API3) mutates server state. run_tests --headed=true opens a real browser. Both need the user's explicit nod before invoking; if they invoked the relevant slash command (/mk-qa-master:api-security mass-assignment), that counts as opt-in.
- Tier 1 fixture is sacred. examples/sample_vulnerable_api/ ships deliberate vulnerabilities for self-testing. Never recommend deploying it; never use its endpoints as templates for the user's real code.
- Don't paper over real failures. When run_tests reports red, walk the user through get_failure_details first. Do NOT silently re-run with relaxed filters or skip markers.
Slash commands
Optional shortcuts under commands/:
- /mk-qa-master:run-tests <filter> (Flow 1 condensed)
- /mk-qa-master:generate <url> (Flow 2 condensed)
- /mk-qa-master:api-security <spec_url> (Flow 5 condensed)
These are convenience templates; this skill also activates automatically from any prompt whose intent matches its description.
Reference files
- reference/workflow.md: full operating manual for each of the 5 flows
- reference/tool-surface.md: cheatsheet of all 22 MCP tools with one-liners + input schema gotchas
- reference/wire-mcp.md: what to do when the host doesn't have mk-qa-master as an MCP server yet (CLI fallback)
Why this skill exists
The MCP tool surface is callable by any host, but each host has a different way to discover what mk-qa-master is for. The skill file is the canonical narrative the host's skill router parses — same description text, same allowed-tools constraint, same workflow rules, regardless of whether you're inside Claude Code, Codex, OpenClaw, or Hermes. v0.9.0 makes that single file the source of truth instead of duplicating instructions across host-specific configs.