name: rubydex-mcp-benchmark description: Use when benchmarking rubydex MCP against plain Grep/Glob for Ruby code tasks. Guides A/B comparison setup, candidate selection, and metrics collection.
Rubydex MCP Benchmark
Compare rubydex MCP-assisted development vs standard Grep/Glob for Ruby code tasks.
Prerequisites
Ensure the rubydex_mcp binary is installed:
# From the rubydex repo root
cargo install --path rust/rubydex-mcp
This places rubydex_mcp in ~/.cargo/bin/. Verify: rubydex_mcp --help
Step 1: Find a Good Benchmark Candidate
Use rubydex MCP tools to find a class or module suitable for testing:
- Run
codebase_statsto get an overview of the target project - Run
search_declarationswith kind "Class" or "Module" to list candidates - For each candidate, run
find_constant_referencesand count references - Pick a target with 10-30 references across 4+ files that has test coverage
Good candidates are:
- Self-contained (not deeply entangled with the whole codebase)
- Referenced in both lib and test directories
- Have a dedicated test file
Step 2: Set Up the Experiment
Create two git worktrees for the target Ruby project:
cd /path/to/target-project
git worktree add ../project-with-mcp -b bench-with-mcp HEAD
git worktree add ../project-without-mcp -b bench-without-mcp HEAD
Add .mcp.json to the with-mcp worktree only:
cat > ../project-with-mcp/.mcp.json << 'EOF'
{
"mcpServers": {
"rubydex": {
"command": "${HOME}/.cargo/bin/rubydex_mcp"
}
}
}
EOF
Step 3: Generate Commands for the Developer
Pick a testing prompt based on the scenario. Keep prompts vague and natural - like a lazy developer would type. Don't specify full namespaces, file paths, or implementation details. Let the agent figure it out. This tests the MCP advantage fairly since the MCP agent can resolve ambiguity semantically while the non-MCP agent has to grep around.
Guidelines for writing prompts:
- Use short class names, not fully qualified (
ContextnotIRB::Context) - Don't specify file locations or paths
- Don't tell the agent how to do it, just what to do
- Always end with how to run tests
Examples:
Rename:
Rename Locale to LocaleManager in IRB. All existing tests must continue to pass. Run tests with: bundle exec rake test
Extract module:
Extract all history-related methods from Context into a new module Context::HistoryManager. All existing tests must continue to pass. Run tests with: bundle exec rake test
Then generate two ready-to-copy commands and present them to the developer:
With MCP (run in terminal 1):
cd /path/to/project-with-mcp && claude "<testing prompt>" --dangerously-skip-permissions
Without MCP (run in terminal 2):
cd /path/to/project-without-mcp && claude "<testing prompt>" --dangerously-skip-permissions
Tell the developer to copy and run these commands in separate terminals simultaneously.
Step 4: Compare Results
After both sessions complete, compare:
| Metric | With MCP | Without MCP |
|---|---|---|
| Duration | (from session) | (from session) |
| Tokens | (from session stats) | (from session stats) |
| Files changed | git diff --stat in with-mcp worktree |
git diff --stat in without-mcp worktree |
| Tests | pass/fail | pass/fail |
Run git diff --stat in each worktree to compare the changes made.
Step 5: Cleanup
cd /path/to/target-project
git worktree remove ../project-with-mcp
git worktree remove ../project-without-mcp
git branch -D bench-with-mcp bench-without-mcp
Reference: Our Benchmark Results
| Scenario | Metric | With MCP | Without MCP | MCP Advantage |
|---|---|---|---|---|
| Rename (IRB::Locale, 16 refs) | Tokens | 22k | 31.8k | 30% fewer |
| Rename (IRB::Locale, 16 refs) | Time | 2m 9s | 2m 44s | 25% faster |
| Extract module (IRB::Context) | Tokens | 33.2k | 63.0k | 47% fewer |
| Extract module (IRB::Context) | Time | 1m 49s | 2m 39s | 31% faster |