code-research

star 3

Code Research Agent. Use when relevant to this domain.

oyi77 By oyi77 schedule Updated 6/8/2026

name: code-research description: Code Research Agent. Use when relevant to this domain. domain: agents

Code Research Agent

Autonomous codebase analysis agent that produces structured understanding of unfamiliar code: architecture, data flows, dependencies, conventions, and entry points. This agent reads code systematically -- not randomly browsing files, but following a deliberate investigation protocol.

When to Use

  • Joining a new project and need to understand the codebase
  • Investigating how a feature is implemented across multiple files
  • Tracing a data flow from input to output
  • Understanding dependency chains before making changes
  • Finding all callers of a function or users of a module
  • Mapping the architecture of a monolith or microservice
  • Preparing a technical design document that references existing code

When NOT to Use

  • Implementing new features (use code-agent)
  • Reviewing code quality (use review-agent)
  • Refactoring code (use refactor-agent)
  • Writing tests (use test-agent)
  • Researching external documentation (use web-research)
  • Analyzing market or competitors (use market-research-agent)
  • Code is trivially simple (single file, obvious structure)
  • You already know where everything is
  • Real-time debugging (use systematic-debugging)

Process / Steps

Follow these steps in order. Each step builds on the previous one.

1. Topology Scan (Map the Terrain)

Start broad, narrow down:

# Project structure (2 levels deep)
find . -maxdepth 2 -type f -name "*.json" -o -name "*.toml" -o -name "*.yaml" | head -20

# Entry points
ls -la src/main.* src/index.* src/app.* 2>/dev/null

# Configuration files
cat package.json | jq '{name, dependencies, scripts}'
cat pyproject.toml | grep -A 50 '\[tool\|dependencies'
cat Cargo.toml | head -40

# Directory purpose map
for dir in src/*/; do
    echo "=== $dir ==="
    ls "$dir" | head -10
done

Produce a topology map:

## Project Topology
- **Language**: [language + version]
- **Framework**: [framework + version]
- **Entry point**: [file]
- **Key directories**:
  - `src/core/` - [business logic]
  - `src/api/` - [HTTP endpoints]
  - `src/db/` - [database layer]
  - `src/utils/` - [shared utilities]
  - `tests/` - [test suite]
- **Config files**: [list with purpose]

2. Dependency Analysis (What Does It Use)

# Direct dependencies
cat package.json | jq '.dependencies'
# OR
grep -r "import " src/ | sed 's/.*import //' | sort | uniq -c | sort -rn | head -20

# Internal module graph
grep -r "from \." src/ | sed 's/.*from //' | sort | uniq -c | sort -rn | head -20

# External API calls
grep -r "fetch\|axios\|requests\.\|http\." src/ | head -20

# Database access patterns
grep -r "SELECT\|INSERT\|UPDATE\|DELETE\|\.query\|\.find\|\.create" src/ | head -20

3. Data Flow Tracing

Follow data from input to output:

## Data Flow: [Feature Name]
1. **Entry**: [HTTP request / CLI input / event]
2. **Validation**: [file:line] -- [what is validated]
3. **Processing**: [file:line] -- [what transformation]
4. **Storage**: [file:line] -- [what database/filesystem]
5. **Response**: [file:line] -- [what is returned]
6. **Side effects**: [file:line] -- [events fired, emails sent, etc.]

### Flow Diagram

[Client] -> [Router: src/api/routes.ts:42] -> [Service: src/services/user.ts:15] -> [Repository: src/db/users.ts:30] -> [PostgreSQL: users table] <- [Response: {id, name, email}]


4. Convention Discovery

Identify patterns the codebase follows:

## Code Conventions

Document naming, imports, error handling, and testing patterns.

### Naming
- Variables: [camelCase / snake_case / PascalCase]
- Functions: [camelCase / snake_case]
- Classes: [PascalCase]
- Constants: [UPPER_SNAKE_CASE]
- Files: [kebab-case / PascalCase / snake_case]

### Import Style
- [Named imports / default imports / wildcard]
- [Grouped: stdlib, external, internal]

### Error Handling
- [Try/catch / Result types / Error middleware]
- [Custom error classes / plain Error / HTTP errors]

### Testing
- [Framework]: [jest / pytest / go test]
- [File naming]: [*.test.ts / test_*.py / *_test.go]
- [Pattern]: [describe/it / class-based / flat functions]
- [Mocking]: [jest.mock / unittest.mock / testdouble]

### State Management
- [Where state lives]: [in-memory / database / cache / message queue]
- [How state flows]: [request-scoped / singleton / event-driven]

5. Hotspot Detection

Find the most-changed, most-complex, most-coupled code:

# Most-changed files (likely have bugs or need refactoring)
git log --pretty=format: --name-only | sort | uniq -c | sort -rn | head -20

# Most complex (by function count per file)
grep -c "def \|function \|func " src/**/*.py src/**/*.ts 2>/dev/null | sort -t: -k2 -rn | head -20

# Most imported (highest fan-in = most depended upon)
grep -r "from.*import\|require(" src/ | sed 's/.*from //;s/.*require(//' | sort | uniq -c | sort -rn | head -20

# Largest files (by line count)
wc -l src/**/*.ts src/**/*.py 2>/dev/null | sort -rn | head -20

6. Architecture Diagram

Produce a visual representation:

## Architecture Diagram

Produce a visual representation of the system architecture.


### System Context

[External Users] --> [Web App (Next.js)] [Web App] --> [API Server (Express)] [API Server] --> [PostgreSQL] [API Server] --> [Redis Cache] [API Server] --> [External API (Stripe)] [API Server] --> [Message Queue (Bull)] [Worker] --> [Message Queue] [Worker] --> [S3 Storage]


### Module Dependency Graph

src/api/ --> src/services/ src/services/ --> src/repositories/ src/services/ --> src/utils/ src/repositories/ --> src/models/ src/models/ --> (ORM)


7. Finding Specific Code Patterns

# Find all API endpoints
grep -r "app\.\(get\|post\|put\|delete\|patch\)" src/ | sed 's/.*\(get\|post\|put\|delete\|patch\)/\1/' | sort

# Find all database models
grep -r "class.*Model\|@Entity\|@Table\|Schema(" src/ | head -20

# Find all environment variable usage
grep -r "process\.env\.\|os\.environ\|os\.getenv" src/ | sort | uniq

# Find all TODO/FIXME/HACK
grep -r "TODO\|FIXME\|HACK\|XXX" src/ | head -20

# Find all authentication/authorization checks
grep -r "auth\|authenticate\|authorize\|permission\|role\|middleware" src/ | head -20

Output Format

## Codebase Analysis Report

Structured output template for the codebase analysis.


### Overview
- **Project**: [name]
- **Purpose**: [one sentence]
- **Stack**: [language, framework, database, key dependencies]
- **Size**: [files, lines of code, modules]

### Architecture
- **Pattern**: [MVC / hexagonal / microservices / monolith / serverless]
- **Entry point**: [file:line]
- **Key modules**: [list with one-line descriptions]

### Data Flow
- **Primary flow**: [input -> processing -> storage -> output]
- **Key data models**: [list]
- **External integrations**: [list]

### Conventions
- [Naming, imports, error handling, testing patterns]

### Hotspots
- **Most complex**: [file] -- [why]
- **Most changed**: [file] -- [why]
- **Highest coupling**: [file] -- [why]

### Risks
- [Technical debt, missing tests, security concerns]

### Recommendations
- [Actionable findings for the team]

Common Rationalizations

Rationalization Reality
"I will just read the README" READMEs describe intent, not reality. The actual code tells the truth. Read both, trust the code.
"I know enough, start coding" Premature coding on unfamiliar codebases produces wrong abstractions. Invest 30 minutes in mapping first.
"This codebase is too large to understand" You do not need to understand everything. Map the architecture, trace the relevant data flow, and understand the conventions. That is enough.
"The code is self-documenting" Self-documenting code tells you what it does, not why. Comments, commit messages, and architecture docs tell you why.
"Just grep for what I need" Grepping without understanding architecture gives fragments. You need the map before you can use the fragments.

Red Flags

  • Making changes without understanding the module's dependencies
  • Ignoring existing conventions (naming, error handling, patterns)
  • Assuming the codebase structure based on another project
  • Not checking what tests exist before changing code
  • Tracing only the happy path and missing error handling flows
  • Not identifying the entry points and boundary conditions

Verification

After code research, confirm:

  • Architecture pattern identified (MVC, hexagonal, etc.)
  • Entry points mapped (where does execution begin)
  • Key dependencies listed (what does this project depend on)
  • Data flow traced for the relevant feature (input to output)
  • Code conventions documented (naming, imports, error handling)
  • Hotspots identified (complex, changed, coupled code)
  • Test patterns understood (framework, naming, coverage)
  • Output is structured enough that another agent could use it
  • No assumptions made without evidence from the actual code
  • No [TODO] or placeholder content in the analysis
Install via CLI
npx skills add https://github.com/oyi77/1ai-skills --skill code-research
Repository Details
star Stars 3
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator