
name: benchmark-report
description: Generate a benchmark report from stress test results (registration, API performance, search concurrency). Reads JSON result files and produces a markdown report suitable for docs/benchmarks/.
license: Apache-2.0
metadata:
  author: mcp-gateway-registry
  version: "1.0"

Benchmark Report Skill

Generate a markdown benchmark report from stress test result files. The report documents the registry's performance characteristics under load, including registration throughput, API latency, and semantic search concurrency scaling.

Prerequisites

Stress test results must exist in tests/stress/results/<backend>/size-<N>/
Required files: registration.json, api_perf.json, search_concurrency.json
Each JSON file should include registry_info (a snapshot of the deployment configuration), which feeds the report's Deployment Configuration section
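
The prerequisites above can be checked before running the generator. The following is a minimal sketch, not part of the skill itself: the function name check_results_dir is hypothetical, and it assumes registry_info is a top-level key in each JSON file, which the source does not confirm.

```python
import json
from pathlib import Path

# File names taken from the prerequisites list.
REQUIRED_FILES = ("registration.json", "api_perf.json", "search_concurrency.json")


def check_results_dir(results_dir):
    """Return (missing, lacking_info): required files that are absent, and
    files that are present but have no usable registry_info entry."""
    root = Path(results_dir)
    missing, lacking_info = [], []
    for name in REQUIRED_FILES:
        path = root / name
        if not path.is_file():
            missing.append(name)
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Unparseable files cannot supply registry_info either.
            lacking_info.append(name)
            continue
        # Assumption: registry_info sits at the top level of the JSON object.
        if not isinstance(data, dict) or "registry_info" not in data:
            lacking_info.append(name)
    return missing, lacking_info
```

Calling check_results_dir("tests/stress/results/documentdb/size-100") returns two lists, which can be reported to the user before the generator runs.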

Input

/benchmark-report [RESULTS_DIR] [OUTPUT_PATH]

RESULTS_DIR - Path to the results directory (default: tests/stress/results/documentdb/size-100)
OUTPUT_PATH - Where to write the report (default: docs/benchmarks/benchmark-report.md)
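
The two defaults above map naturally onto command-line options. As a rough sketch only, assuming the generator script parses its options with argparse and uses the --results-dir and --output flags shown in the workflow command, the defaults could be wired up like this (parse_args and the constant names are hypothetical):

```python
import argparse

# Defaults as documented in the Input section.
DEFAULT_RESULTS_DIR = "tests/stress/results/documentdb/size-100"
DEFAULT_OUTPUT_PATH = "docs/benchmarks/benchmark-report.md"


def parse_args(argv=None):
    """Parse generator options, falling back to the documented defaults."""
    parser = argparse.ArgumentParser(
        description="Generate a markdown benchmark report from stress test results."
    )
    parser.add_argument(
        "--results-dir",
        default=DEFAULT_RESULTS_DIR,
        help="Path to the results directory",
    )
    parser.add_argument(
        "--output",
        default=DEFAULT_OUTPUT_PATH,
        help="Where to write the report",
    )
    return parser.parse_args(argv)
```

With no arguments, parse_args([]) yields the documented defaults; either flag can be overridden independently.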

Workflow

Step 1: Run the Report Generator

/usr/bin/python3 .claude/skills/benchmark-report/generate_benchmark_report.py \
  --results-dir tests/stress/results/documentdb/size-100 \
  --output docs/benchmarks/benchmark-report.md

The script reads all three JSON files, extracts the key metrics, and produces a structured markdown report.

Step 2: Review and Present

After generation:

Display the executive summary in the conversation
Tell the user the output path
Note any missing data (e.g., if one of the JSON files is absent)

Report Structure

The generated report includes:

Deployment Configuration - from registry_info (version, cloud, compute, storage, auth, embeddings, corpus size)
Registration Throughput - from registration.json (success rates, latency percentiles per entity type)
API Latency (Serial) - from api_perf.json (list endpoints and semantic search at k=5/10/50)
Search Concurrency Scaling - from search_concurrency.json (latency and throughput at concurrency 1/10/100)
Scaling Observations - interpretation of how latency degrades under load
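
One simple way to express how latency degrades under load is as a ratio against the lowest concurrency level. The sketch below illustrates that idea only; it is not taken from the generator script. The function name degradation_factors and the input shape (a mapping from concurrency level to a latency figure such as a p50 in milliseconds) are assumptions, since the layout of search_concurrency.json is not described here.

```python
def degradation_factors(latency_by_concurrency):
    """Map each concurrency level to its latency relative to the lowest level.

    latency_by_concurrency: dict of {concurrency_level: latency}, all in the
    same unit. A factor of 2.0 means latency doubled versus the baseline.
    """
    if not latency_by_concurrency:
        return {}
    baseline_level = min(latency_by_concurrency)
    baseline = latency_by_concurrency[baseline_level]
    if baseline <= 0:
        raise ValueError("baseline latency must be positive")
    return {
        level: latency / baseline
        for level, latency in sorted(latency_by_concurrency.items())
    }


# Placeholder numbers for illustration only, not measured results.
example = {1: 100.0, 10: 250.0, 100: 1200.0}
print(degradation_factors(example))  # {1: 1.0, 10: 2.5, 100: 12.0}
```

A factor that grows more slowly than the concurrency level suggests the service absorbs part of the extra load; a factor that tracks or exceeds it suggests requests are queuing.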

Output

docs/benchmarks/
  benchmark-report.md    # The generated report (committed to repo)