name: code-research description: Code Research Agent. Use when relevant to this domain. domain: agents
Code Research Agent
Autonomous codebase analysis agent that produces structured understanding of unfamiliar code: architecture, data flows, dependencies, conventions, and entry points. This agent reads code systematically -- not randomly browsing files, but following a deliberate investigation protocol.
When to Use
- Joining a new project and need to understand the codebase
- Investigating how a feature is implemented across multiple files
- Tracing a data flow from input to output
- Understanding dependency chains before making changes
- Finding all callers of a function or users of a module
- Mapping the architecture of a monolith or microservice
- Preparing a technical design document that references existing code
When NOT to Use
- Implementing new features (use
code-agent) - Reviewing code quality (use
review-agent) - Refactoring code (use
refactor-agent) - Writing tests (use
test-agent) - Researching external documentation (use
web-research) - Analyzing market or competitors (use
market-research-agent) - Code is trivially simple (single file, obvious structure)
- You already know where everything is
- Real-time debugging (use
systematic-debugging)
Process / Steps
Follow these steps in order. Each step builds on the previous one.
1. Topology Scan (Map the Terrain)
Start broad, narrow down:
# Project structure (2 levels deep)
find . -maxdepth 2 -type f -name "*.json" -o -name "*.toml" -o -name "*.yaml" | head -20
# Entry points
ls -la src/main.* src/index.* src/app.* 2>/dev/null
# Configuration files
cat package.json | jq '{name, dependencies, scripts}'
cat pyproject.toml | grep -A 50 '\[tool\|dependencies'
cat Cargo.toml | head -40
# Directory purpose map
for dir in src/*/; do
echo "=== $dir ==="
ls "$dir" | head -10
done
Produce a topology map:
## Project Topology
- **Language**: [language + version]
- **Framework**: [framework + version]
- **Entry point**: [file]
- **Key directories**:
- `src/core/` - [business logic]
- `src/api/` - [HTTP endpoints]
- `src/db/` - [database layer]
- `src/utils/` - [shared utilities]
- `tests/` - [test suite]
- **Config files**: [list with purpose]
2. Dependency Analysis (What Does It Use)
# Direct dependencies
cat package.json | jq '.dependencies'
# OR
grep -r "import " src/ | sed 's/.*import //' | sort | uniq -c | sort -rn | head -20
# Internal module graph
grep -r "from \." src/ | sed 's/.*from //' | sort | uniq -c | sort -rn | head -20
# External API calls
grep -r "fetch\|axios\|requests\.\|http\." src/ | head -20
# Database access patterns
grep -r "SELECT\|INSERT\|UPDATE\|DELETE\|\.query\|\.find\|\.create" src/ | head -20
3. Data Flow Tracing
Follow data from input to output:
## Data Flow: [Feature Name]
1. **Entry**: [HTTP request / CLI input / event]
2. **Validation**: [file:line] -- [what is validated]
3. **Processing**: [file:line] -- [what transformation]
4. **Storage**: [file:line] -- [what database/filesystem]
5. **Response**: [file:line] -- [what is returned]
6. **Side effects**: [file:line] -- [events fired, emails sent, etc.]
### Flow Diagram
[Client] -> [Router: src/api/routes.ts:42] -> [Service: src/services/user.ts:15] -> [Repository: src/db/users.ts:30] -> [PostgreSQL: users table] <- [Response: {id, name, email}]
4. Convention Discovery
Identify patterns the codebase follows:
## Code Conventions
Document naming, imports, error handling, and testing patterns.
### Naming
- Variables: [camelCase / snake_case / PascalCase]
- Functions: [camelCase / snake_case]
- Classes: [PascalCase]
- Constants: [UPPER_SNAKE_CASE]
- Files: [kebab-case / PascalCase / snake_case]
### Import Style
- [Named imports / default imports / wildcard]
- [Grouped: stdlib, external, internal]
### Error Handling
- [Try/catch / Result types / Error middleware]
- [Custom error classes / plain Error / HTTP errors]
### Testing
- [Framework]: [jest / pytest / go test]
- [File naming]: [*.test.ts / test_*.py / *_test.go]
- [Pattern]: [describe/it / class-based / flat functions]
- [Mocking]: [jest.mock / unittest.mock / testdouble]
### State Management
- [Where state lives]: [in-memory / database / cache / message queue]
- [How state flows]: [request-scoped / singleton / event-driven]
5. Hotspot Detection
Find the most-changed, most-complex, most-coupled code:
# Most-changed files (likely have bugs or need refactoring)
git log --pretty=format: --name-only | sort | uniq -c | sort -rn | head -20
# Most complex (by function count per file)
grep -c "def \|function \|func " src/**/*.py src/**/*.ts 2>/dev/null | sort -t: -k2 -rn | head -20
# Most imported (highest fan-in = most depended upon)
grep -r "from.*import\|require(" src/ | sed 's/.*from //;s/.*require(//' | sort | uniq -c | sort -rn | head -20
# Largest files (by line count)
wc -l src/**/*.ts src/**/*.py 2>/dev/null | sort -rn | head -20
6. Architecture Diagram
Produce a visual representation:
## Architecture Diagram
Produce a visual representation of the system architecture.
### System Context
[External Users] --> [Web App (Next.js)] [Web App] --> [API Server (Express)] [API Server] --> [PostgreSQL] [API Server] --> [Redis Cache] [API Server] --> [External API (Stripe)] [API Server] --> [Message Queue (Bull)] [Worker] --> [Message Queue] [Worker] --> [S3 Storage]
### Module Dependency Graph
src/api/ --> src/services/ src/services/ --> src/repositories/ src/services/ --> src/utils/ src/repositories/ --> src/models/ src/models/ --> (ORM)
7. Finding Specific Code Patterns
# Find all API endpoints
grep -r "app\.\(get\|post\|put\|delete\|patch\)" src/ | sed 's/.*\(get\|post\|put\|delete\|patch\)/\1/' | sort
# Find all database models
grep -r "class.*Model\|@Entity\|@Table\|Schema(" src/ | head -20
# Find all environment variable usage
grep -r "process\.env\.\|os\.environ\|os\.getenv" src/ | sort | uniq
# Find all TODO/FIXME/HACK
grep -r "TODO\|FIXME\|HACK\|XXX" src/ | head -20
# Find all authentication/authorization checks
grep -r "auth\|authenticate\|authorize\|permission\|role\|middleware" src/ | head -20
Output Format
## Codebase Analysis Report
Structured output template for the codebase analysis.
### Overview
- **Project**: [name]
- **Purpose**: [one sentence]
- **Stack**: [language, framework, database, key dependencies]
- **Size**: [files, lines of code, modules]
### Architecture
- **Pattern**: [MVC / hexagonal / microservices / monolith / serverless]
- **Entry point**: [file:line]
- **Key modules**: [list with one-line descriptions]
### Data Flow
- **Primary flow**: [input -> processing -> storage -> output]
- **Key data models**: [list]
- **External integrations**: [list]
### Conventions
- [Naming, imports, error handling, testing patterns]
### Hotspots
- **Most complex**: [file] -- [why]
- **Most changed**: [file] -- [why]
- **Highest coupling**: [file] -- [why]
### Risks
- [Technical debt, missing tests, security concerns]
### Recommendations
- [Actionable findings for the team]
Common Rationalizations
| Rationalization | Reality |
|---|---|
| "I will just read the README" | READMEs describe intent, not reality. The actual code tells the truth. Read both, trust the code. |
| "I know enough, start coding" | Premature coding on unfamiliar codebases produces wrong abstractions. Invest 30 minutes in mapping first. |
| "This codebase is too large to understand" | You do not need to understand everything. Map the architecture, trace the relevant data flow, and understand the conventions. That is enough. |
| "The code is self-documenting" | Self-documenting code tells you what it does, not why. Comments, commit messages, and architecture docs tell you why. |
| "Just grep for what I need" | Grepping without understanding architecture gives fragments. You need the map before you can use the fragments. |
Red Flags
- Making changes without understanding the module's dependencies
- Ignoring existing conventions (naming, error handling, patterns)
- Assuming the codebase structure based on another project
- Not checking what tests exist before changing code
- Tracing only the happy path and missing error handling flows
- Not identifying the entry points and boundary conditions
Verification
After code research, confirm:
- Architecture pattern identified (MVC, hexagonal, etc.)
- Entry points mapped (where does execution begin)
- Key dependencies listed (what does this project depend on)
- Data flow traced for the relevant feature (input to output)
- Code conventions documented (naming, imports, error handling)
- Hotspots identified (complex, changed, coupled code)
- Test patterns understood (framework, naming, coverage)
- Output is structured enough that another agent could use it
- No assumptions made without evidence from the actual code
- No [TODO] or placeholder content in the analysis