name: mcp-builder
description: "Builds production MCP servers via 4-phase methodology: research, implement, test, evaluate. Triggers: build MCP, new MCP, MCP integration, MCP server scaffold."
effort: high
disable-model-invocation: true
argument-hint: "[service name or API description]"
allowed-tools: Read, Write, Edit, Bash, Grep, Glob
MCP Builder
$ARGUMENTS
Build a production-grade MCP server following Anthropic's 4-phase methodology.
When to Use
- Wrapping a third-party REST API as MCP tools
- Exposing an internal database or service to Claude
- Creating reusable integrations for the team
- Migrating a custom tool into the MCP ecosystem
For MCP protocol theory, see the mcp-patterns knowledge skill (auto-loaded).
4-Phase Workflow
Phase 1 — Research & Planning
- Read the target API's documentation (OpenAPI spec, README, changelog).
- Identify the 5-15 most useful operations. Prefer workflow-oriented tools over a 1:1 mirror of the API.
- Decide transport: `stdio` for local dev tools, `streamable-http` for remote/shared.
- Decide language: TypeScript recommended (best SDK), Python acceptable (`mcp` package).
- List required secrets (API keys, tokens) and their env var names.
Output: PLAN.md with tool list, transport choice, auth model.
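The secrets listed in PLAN.md can be checked once at startup. A minimal sketch, assuming a hypothetical `loadSecrets` helper and illustrative env var names (`GITHUB_TOKEN`, `GITHUB_API_URL`). The environment is passed in as a plain object so the function stays testable:

```typescript
// Hypothetical helper: reads required secrets from an env-like object.
// Missing names are reported; secret values never appear in the error.
type Env = Record<string, string | undefined>;

function loadSecrets<K extends string>(
  env: Env,
  required: readonly K[],
): Record<K, string> {
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const secrets = {} as Record<K, string>;
  for (const name of required) {
    secrets[name] = env[name] as string;
  }
  return secrets;
}
```

In a server entry point this would typically be called with `process.env` before any tool is registered, so a misconfigured install fails fast with a message that names the variable.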
Phase 2 — Implementation
Scaffold:
my-mcp/
├── package.json # or pyproject.toml
├── src/
│ ├── server.ts # entry point
│ ├── client.ts # API client (axios/httpx)
│ ├── tools/ # one file per tool
│ ├── schemas.ts # Zod/Pydantic schemas
│ └── errors.ts # typed errors
├── .env.example
└── README.md
Per tool:
- Input/output schemas (Zod for TS, Pydantic for Python)
- Clear `name` with service prefix (e.g. `github_create_issue`)
- Description starts with a verb, mentions trigger keywords
- Annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`
- Pagination support via `cursor` or `page` parameters
- Focused responses: filter noise, don't dump raw API payloads
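A tool that follows the list above might be declared as below. This is a dependency-free sketch, not the MCP SDK's registration API: the `ToolDefinition` type, the `github_list_issues` tool, and the `RawIssue` shape are hypothetical, and a real server would pass equivalent fields to its SDK. Only the annotation names come from the list above.

```typescript
// Hypothetical shapes for illustration; not the MCP SDK's own types.
interface ToolAnnotations {
  readOnlyHint: boolean;
  destructiveHint: boolean;
  idempotentHint: boolean;
  openWorldHint: boolean;
}

interface ToolDefinition {
  name: string; // service prefix, then verb and noun
  description: string; // starts with a verb, mentions trigger keywords
  annotations: ToolAnnotations;
}

const listIssuesTool: ToolDefinition = {
  name: "github_list_issues",
  description:
    "List open issues in a repository, optionally filtered by label. " +
    "Use when the user asks about bugs, tickets, or open work.",
  annotations: {
    readOnlyHint: true, // only reads data
    destructiveHint: false, // deletes nothing
    idempotentHint: true, // repeating the call changes nothing
    openWorldHint: true, // talks to an external service
  },
};

// Assumed shape of an upstream payload; real APIs return many more fields.
interface RawIssue {
  id: number;
  number: number;
  title: string;
  state: string;
  html_url: string;
  [extra: string]: unknown;
}

interface IssueSummary {
  number: number;
  title: string;
  state: string;
  url: string;
}

// Focused response: keep the few fields an agent needs, plus the cursor.
function toFocusedResponse(issues: RawIssue[], nextCursor: string | null) {
  const items: IssueSummary[] = issues.map((i) => ({
    number: i.number,
    title: i.title,
    state: i.state,
    url: i.html_url,
  }));
  return { items, nextCursor };
}
```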
Phase 3 — Review & Testing
- TypeScript: `npm run typecheck && npm run lint && npm test`
- Python: `ruff check . && mypy --strict src/ && pytest`
- MCP Inspector dry-run: `npx @modelcontextprotocol/inspector node dist/server.js`
- Verify each tool's schema validates a real request and rejects malformed input.
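The schema check in the last step can be automated. The sketch below hand-rolls a strict validator only to stay dependency-free; in a real server the Zod or Pydantic schema would play this role. The `CreateIssueInput` shape and `parseCreateIssueInput` function are hypothetical.

```typescript
// Hypothetical input shape for a create-issue tool.
interface CreateIssueInput {
  repo: string;
  title: string;
  labels: string[];
}

const ALLOWED_KEYS = ["repo", "title", "labels"];

// Strict parse: rejects non-objects, unknown keys, and wrong types.
function parseCreateIssueInput(raw: unknown): CreateIssueInput {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("input must be an object");
  }
  const obj = raw as Record<string, unknown>;
  for (const key of Object.keys(obj)) {
    if (ALLOWED_KEYS.indexOf(key) === -1) {
      throw new Error(`unknown field: ${key}`);
    }
  }
  const repo = obj.repo;
  const title = obj.title;
  const labels = obj.labels === undefined ? [] : obj.labels; // labels is optional
  if (typeof repo !== "string" || repo.length === 0) {
    throw new Error("repo must be a non-empty string");
  }
  if (typeof title !== "string" || title.length === 0) {
    throw new Error("title must be a non-empty string");
  }
  if (!Array.isArray(labels) || !labels.every((l) => typeof l === "string")) {
    throw new Error("labels must be an array of strings");
  }
  return { repo, title, labels: labels as string[] };
}
```

A test suite would then feed it one realistic request and a handful of malformed ones (extra field, missing field, wrong type) and assert that only the first is accepted.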
Phase 4 — Evaluation
Write 10 realistic end-user questions that an LLM should be able to answer using your server. Run them through Claude with the server attached. Grade: did the model call the right tool? Did the response give enough to answer? Fix the description, schema, or response format of any tool that failed.
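The first grading question (did the model call the right tool?) can be scored mechanically. A minimal sketch, assuming each eval run is recorded as the expected tool name plus the tool names observed in the transcript; the `EvalCase` shape and `gradeEvals` helper are hypothetical. Judging whether the response was sufficient still needs a human or a separate grader.

```typescript
// Hypothetical record of one eval run.
interface EvalCase {
  question: string;
  expectedTool: string;
  calledTools: string[]; // tool names observed in the transcript
}

interface EvalReport {
  passed: number;
  total: number;
  failures: string[]; // questions where the expected tool was never called
}

function gradeEvals(cases: EvalCase[]): EvalReport {
  const failures = cases
    .filter((c) => c.calledTools.indexOf(c.expectedTool) === -1)
    .map((c) => c.question);
  return {
    passed: cases.length - failures.length,
    total: cases.length,
    failures,
  };
}
```

Each entry in `failures` points at a tool whose description, schema, or response format is a candidate for revision.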
Example eval questions for a github-mcp:
- "What issues are open on repo X with label
bug?" - "Create an issue titled Y in repo Z"
- "Who has the most commits this month in repo X?"
Tool Design Checklist
- Name has a service prefix followed by a verb (e.g. `github_create_issue`)
- Description mentions when to use it and includes trigger keywords
- Input schema is strict, no free-form `object` with `additionalProperties: true`
- Output is focused: essential fields only, with pagination cursor if applicable
- Error responses are actionable ("API returned 403: check the `GITHUB_TOKEN` env var")
- Annotations set correctly (readonly/destructive/idempotent)
- No secrets logged or echoed in errors
- Rate limiting respects the upstream API
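The two error-related items in the checklist can be handled by a single formatter. A sketch, assuming a hypothetical `formatUpstreamError` helper that receives the HTTP status, the upstream message, the name of the relevant env var, and the secret values to scrub:

```typescript
// Replace every occurrence of each secret value with a placeholder.
function redact(text: string, secrets: string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length > 0) {
      out = out.split(secret).join("[REDACTED]");
    }
  }
  return out;
}

// Map common statuses to a hint the model (or user) can act on.
function formatUpstreamError(
  status: number,
  message: string,
  envVarName: string,
  secrets: string[],
): string {
  let hint: string;
  if (status === 401 || status === 403) {
    hint = `check the ${envVarName} env var and its scopes`;
  } else if (status === 404) {
    hint = "check that the resource identifier exists";
  } else if (status === 429) {
    hint = "rate limited, retry later";
  } else {
    hint = "upstream error, retry or report";
  }
  // Redact last so secrets echoed by the upstream message are also scrubbed.
  return redact(`API returned ${status}: ${message}. Hint: ${hint}`, secrets);
}
```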
Transport Cheat Sheet
| Scenario | Transport |
|---|---|
| Local dev tool, 1 user | stdio |
| Remote server, multiple users | streamable-http with SSE |
| Internal company tool, auth required | streamable-http + OAuth proxy |
| Embedded in IDE/editor | stdio spawned by editor |
Registration Cheat Sheet
Local Claude Code (.mcp.json):
{
  "mcpServers": {
    "my-mcp": {
      "command": "node",
      "args": ["dist/server.js"],
      "env": { "API_KEY": "${MY_API_KEY}" }
    }
  }
}
Global Claude Code (user-scope):
claude mcp add my-mcp --scope user -- node /path/to/server.js
Claude Desktop: same JSON, placed in ~/Library/Application Support/Claude/claude_desktop_config.json (macOS).
Common Pitfalls
| Mistake | Fix |
|---|---|
| 1:1 API mirror with 80 tools | Pick 10 workflow-oriented tools |
| `description: "wrapper for /users endpoint"` | `description: "Find users by email, role, or team. Use when the user mentions employees, staff, or access"` |
| Dumping raw JSON responses | Filter to 3-5 fields the agent actually needs |
| Logging API keys on error | Redact all env vars in error formatters |
| `exit 1` on transient errors | Retry with exponential backoff, surface final error |
| Stdout pollution (MCP stdio) | All logs go to stderr, stdout is JSON-RPC only |
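The retry fix from the table can be sketched as follows. The `withRetry` helper, its default attempt count, and its default base delay are assumptions for illustration; tune them to the upstream API's rate limits. The `sleep` parameter is injectable so tests do not have to wait.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// `isTransient` decides which errors are worth retrying.
async function withRetry<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts: number = 4,
  baseDelayMs: number = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      const isLastAttempt = attempt === maxAttempts - 1;
      if (!isTransient(err) || isLastAttempt) {
        break;
      }
      // Delay doubles each time: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
  // Surface the final error to the caller instead of exiting the process.
  throw lastError;
}
```

Non-transient errors (for example a 400 or 403) break out immediately, so the caller sees them without waiting through the backoff schedule.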
Rules
- MUST pick 5-15 workflow-oriented tools, not a 1:1 API mirror. The model routes by task, not by endpoint.
- MUST use strict input schemas (Zod for TS, Pydantic for Python). `additionalProperties: true` lets the model invent fields and drift.
- MUST set correct tool annotations: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`. The host uses these for safety UIs and auto-approval policies.
- NEVER expose an MCP server on a public network without auth. MCP clients default to trusting the transport, so anyone who can reach an unauthenticated endpoint can call the tools directly.
- NEVER log API keys, tokens, or env vars in error messages. A verbose error thrown at the model becomes a stored credential in the conversation.
- CRITICAL: with `stdio` transport, all logs go to stderr. Any stdout write that is not a JSON-RPC message breaks the client.
- MANDATORY: every server ships with a README documenting env vars, required scopes, rate limits, and a minimal invocation example.
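The stdio rule is easiest to keep when the server has exactly one logging path. A minimal sketch of such a logger, assuming Node.js, where `console.error` writes to stderr; the `createLogger` name and the JSON line format are illustrative choices, and the sink is injectable so it can be tested:

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type LogFields = Record<string, unknown>;

// Hypothetical logger: every line goes to the sink, which defaults to stderr.
// stdout stays reserved for JSON-RPC messages.
function createLogger(
  sink: (line: string) => void = (line) => console.error(line),
) {
  const log = (level: LogLevel, message: string, fields: LogFields = {}) => {
    sink(JSON.stringify({ level, message, ...fields }));
  };
  return {
    debug: (message: string, fields?: LogFields) => log("debug", message, fields),
    info: (message: string, fields?: LogFields) => log("info", message, fields),
    warn: (message: string, fields?: LogFields) => log("warn", message, fields),
    error: (message: string, fields?: LogFields) => log("error", message, fields),
  };
}
```

Tool handlers would then call `logger.info(...)` rather than `console.log(...)`, which keeps stray output off the protocol stream.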
Gotchas
- `stdio` transport sends the server's stdout directly to the client as protocol frames. A stray `print()` or `console.log()` crashes the client with a parse error and no clear diagnostic. Route all logs through a logger that writes to stderr.
- MCP tool descriptions are the main signal the LLM sees when routing. `description: "calls POST /api/v2/tickets"` tells the model nothing about intent. Describe when to use the tool, not what it does at the HTTP level.
- Annotations (`readOnlyHint`, etc.) are optional in the spec but some hosts (Claude Desktop, Cursor) gate auto-approval on them. Missing `destructiveHint: true` on a delete tool may cause the client to run it silently.
- `streamable-http` with SSE requires the server to handle client reconnects with a `Last-Event-ID` header. Many quick-start templates skip this and drop events on flaky networks.
- Pagination cursors must be opaque from the client's perspective but stable across retries. A timestamp cursor that advances on every poll fails if the client retries the same cursor after a transient error.
- Claude Desktop caches server capabilities on first connection. After changing tool schemas, users must explicitly reload the server (quit + reopen or remove/re-add the server) — simply restarting the server process is not enough.
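The pagination gotcha above can be made concrete. In this sketch the cursor encodes the id of the last item returned rather than a timestamp, so retrying the same cursor returns the same page. The `CursorState` shape and the helper names are hypothetical, items are assumed to be sorted by ascending id, `pageSize` is assumed to be at least 1, and `btoa`/`atob` are assumed available as globals (as in recent Node.js versions).

```typescript
// Hypothetical cursor payload: position is tied to data, not to request time.
interface CursorState {
  lastId: number;
}

function encodeCursor(state: CursorState): string {
  return btoa(JSON.stringify(state)); // opaque to the client
}

function decodeCursor(cursor: string): CursorState {
  const parsed: unknown = JSON.parse(atob(cursor));
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as CursorState).lastId !== "number"
  ) {
    throw new Error("invalid cursor");
  }
  return { lastId: (parsed as CursorState).lastId };
}

// Returns one page of items with id greater than the cursor position.
function listPage(
  items: { id: number }[],
  cursor: string | null,
  pageSize: number,
) {
  const after = cursor === null ? -Infinity : decodeCursor(cursor).lastId;
  const page = items.filter((i) => i.id > after).slice(0, pageSize);
  const last = page.length > 0 ? page[page.length - 1].id : after;
  const hasMore = items.some((i) => i.id > last);
  return {
    items: page,
    nextCursor: page.length > 0 && hasMore ? encodeCursor({ lastId: last }) : null,
  };
}
```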
When NOT to Use
- For in-toolkit skills (slash commands, knowledge docs), use `/skill-creator`
- For agents inside ai-toolkit, use `/agent-creator`
- For plugin packs bundling multiple agents/skills, use `/plugin-creator`
- For protocol-level MCP theory and transport trade-offs, use `/mcp-patterns` (knowledge skill)
- For conformance/integration testing of an MCP server, delegate to the `mcp-testing-engineer` agent
Related
- `mcp-patterns`: protocol reference (auto-loaded knowledge skill)
- `mcp-specialist` agent: for deep MCP design questions
- `mcp-testing-engineer` agent: for protocol conformance testing
- https://modelcontextprotocol.io/
- https://github.com/anthropics/skills/tree/main/skills/mcp-builder