module-discover - SKILL.md Agent Skill

name: module-discover
description: Discover and document a dependency library or submodule. Analyzes all uses within the codebase, divides the library into logical modules, identifies which modules are used, and generates LLM-consumable API documentation for each module. Use when the user wants to understand a library dependency, map its modules, or generate API reference docs for a submodule.

Library Module Discovery & Documentation

You are analyzing a dependency library or submodule used by this project. Your goal is to produce a structured, LLM-consumable module map and API reference that helps agents quickly understand what the library provides and how this project uses it.

Before Starting

Ask the user:

Which library/submodule? (e.g., cucascade, duckdb, cudf, rmm, spdlog, or a path to any dependency)
Where is the library source? If it's a submodule, it's already in the repo. If it's an installed dependency, ask for the include path (e.g., $CONDA_PREFIX/include/cudf or $LIBCUDF_ENV_PREFIX/include/rmm).
Output location? Default: .claude/skills/module-discover/docs/<library_name>/

If the user gives just a name (e.g., "cudf"), try these locations in order:

Submodule in repo root: ./<name>/
Conda prefix: $CONDA_PREFIX/include/<name>/ or $LIBCUDF_ENV_PREFIX/include/<name>/
System include: /usr/include/<name>/ or /usr/local/include/<name>/
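
The lookup order above can be sketched as a small shell function. This is a minimal sketch, not part of the skill itself: the name `resolve_library_path` is hypothetical, and it assumes the candidate locations are exactly the ones listed above.

```shell
# Hypothetical helper: print the first existing location for a library name,
# following the lookup order listed above. Returns non-zero if none exists.
resolve_library_path() {
  name="$1"
  for candidate in \
    "./$name" \
    "${CONDA_PREFIX:+$CONDA_PREFIX/include/$name}" \
    "${LIBCUDF_ENV_PREFIX:+$LIBCUDF_ENV_PREFIX/include/$name}" \
    "/usr/include/$name" \
    "/usr/local/include/$name"
  do
    # An unset or empty prefix variable yields an empty candidate; skip it.
    if [ -n "$candidate" ] && [ -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

For example, `resolve_library_path rmm || echo "ask the user for the include path"` falls back to asking the user when no candidate directory exists.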

Workflow

Phase 1: Discover Our Usage

Find every place in our codebase (under src/, test/, CMakeLists.txt, *.cmake) that references the target library.

Step 1a: Find includes

# Find all #include directives referencing the library
grep -rn '#include.*<LIBRARY_NAME/' src/ test/ --include='*.hpp' --include='*.cpp' --include='*.cu' --include='*.cuh'
grep -rn '#include.*"LIBRARY_NAME/' src/ test/ --include='*.hpp' --include='*.cpp' --include='*.cu' --include='*.cuh'

Step 1b: Find API calls

Search for namespace-qualified calls and type references:

# For namespaced libraries (e.g., cudf::, rmm::, cucascade::)
grep -rn 'NAMESPACE::' src/ test/ --include='*.hpp' --include='*.cpp' --include='*.cu' --include='*.cuh'
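
To turn the raw matches into a list of distinct symbols, the same search can be piped through `sort` and `uniq`. The following is a rough sketch; the function name `list_namespace_symbols` is hypothetical, and the regular expression only approximates C++ identifiers (it ignores templates, aliases, and `using` declarations).

```shell
# Hypothetical helper: list distinct NAMESPACE::symbol references with counts,
# most frequent first. $1 is the namespace (e.g. cudf); the remaining
# arguments are the directories to search.
list_namespace_symbols() {
  ns="$1"
  shift
  # -o prints only the matched text, -h drops file names,
  # -w rejects matches that are part of a longer identifier.
  grep -rhowE \
    --include='*.hpp' --include='*.cpp' --include='*.cu' --include='*.cuh' \
    "${ns}::[A-Za-z_][A-Za-z0-9_]*(::[A-Za-z_][A-Za-z0-9_]*)*" "$@" \
    | sort | uniq -c | sort -rn
}
```

Calling `list_namespace_symbols cudf src/ test/` prints one line per symbol, prefixed by how often it appears.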

Step 1c: Find CMake references

grep -rn 'LIBRARY_NAME' CMakeLists.txt third_party/*.cmake extension_config.cmake

Step 1d: Compile usage inventory

Create a list of:

Every header we include from the library
Every function/method we call
Every type we use (classes, enums, typedefs)
Every macro we reference

Group these by source file to understand usage patterns.
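
The header part of this inventory can be grouped by source file with a short pipeline. This is a sketch under assumptions: the function name `list_includes_by_file` and the `file: header` output format are illustrative, and it assumes file paths contain no colons.

```shell
# Hypothetical helper: print "source_file: header" pairs, sorted by source
# file, for every #include of the given library. $1 is the library name;
# the remaining arguments are the directories to search.
list_includes_by_file() {
  lib="$1"
  shift
  grep -rHnE \
    --include='*.hpp' --include='*.cpp' --include='*.cu' --include='*.cuh' \
    "#include[[:space:]]*[<\"]${lib}/" "$@" \
    | sed -E 's/^([^:]+):[0-9]+:.*#include[[:space:]]*[<"]([^>"]+)[>"].*$/\1: \2/' \
    | sort -u
}
```

Because the output is sorted, all headers used by one source file appear on adjacent lines. Functions, types, and macros still need the namespace search from Step 1b.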

Phase 2: Map the Library's Module Structure

Explore the library's source/headers to identify logical modules.

Step 2a: Survey top-level structure

# List top-level directories and key header files
ls -la LIBRARY_PATH/
ls -la LIBRARY_PATH/include/ 2>/dev/null
find LIBRARY_PATH/include/ -maxdepth 2 -type d 2>/dev/null

Step 2b: Identify modules

A "module" is a logical grouping of related functionality. Look for:

Top-level subdirectories under include/ (strongest signal)
Namespace subdivisions (e.g., cudf::io, cudf::strings, rmm::mr)
Separate header groups that serve a distinct purpose
README or docs that describe the library's architecture

Guidelines for module count:

Aim for 3-8 modules for most libraries
A very large library (cudf, duckdb) may have 10-12
A small library (spdlog, abseil component) may have 2-4
Each module should represent a coherent unit of functionality
Prefer fewer, broader modules over many granular ones

Step 2c: Classify modules

For each module, determine:

USED: Our codebase includes headers from or calls APIs in this module
UNUSED: No references found in our codebase
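
A first-pass classification can be automated when modules map to top-level subdirectories of the include root. The sketch below assumes that layout; the function name `classify_modules` is hypothetical. It only looks at `#include` lines, so headers that sit directly in the include root and modules reached only through namespace-qualified calls still need a manual check.

```shell
# Hypothetical helper: mark each top-level directory under the library's
# include root as USED or UNUSED. $1 = library name, $2 = include root
# (e.g. LIBRARY_PATH/include/LIBRARY_NAME); the remaining arguments are
# our source directories.
classify_modules() {
  lib="$1"
  root="$2"
  shift 2
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob stays literal if there are no subdirectories
    module=$(basename "$dir")
    if grep -rqE \
         --include='*.hpp' --include='*.cpp' --include='*.cu' --include='*.cuh' \
         "#include[[:space:]]*[<\"]${lib}/${module}/" "$@"; then
      printf '%s\tUSED\n' "$module"
    else
      printf '%s\tUNUSED\n' "$module"
    fi
  done
}
```

Treat an UNUSED result as a candidate only, and confirm it against the Phase 1 inventory before writing it into the module map.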

Phase 3: Document Used Modules (Deep)

For each USED module, produce detailed API documentation.

Step 3a: List all public headers

# Group the -name tests so that -type f applies to both patterns
find LIBRARY_PATH/include/MODULE_PATH -type f \( -name '*.hpp' -o -name '*.h' \) | sort

Step 3b: Extract API surface

For each public header that we use (or that's closely related to what we use), extract:

Classes/Structs: Name, brief purpose, key public methods with signatures
Free functions: Signature and brief description
Enums: Values and meaning
Type aliases: What they resolve to
Constants/Macros: Name and value/purpose

Focus on:

APIs we actually call (highest priority — include usage examples from our code)
APIs in the same headers we include (medium priority — we might need them)
Other public APIs in the module (lower priority — available but unused)

Step 3c: Document usage patterns

For each used API, find 1-2 representative call sites in our codebase showing how we use it. Include file path and line number.
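
One way to collect such call sites in `file:line` form is sketched below. The function name `find_call_sites` is hypothetical. The match is a fixed-string search, so it also hits comments and string literals; read each reported site before citing it.

```shell
# Hypothetical helper: print up to N call sites of a symbol as "file:line".
# $1 = symbol (matched as a fixed string), $2 = maximum number of sites;
# the remaining arguments are the directories to search.
find_call_sites() {
  symbol="$1"
  limit="$2"
  shift 2
  grep -rHnF \
    --include='*.hpp' --include='*.cpp' --include='*.cu' --include='*.cuh' \
    -- "$symbol" "$@" \
    | sort -t: -k1,1 -k2,2n \
    | head -n "$limit" \
    | cut -d: -f1,2
}
```

For example, `find_call_sites 'cudf::io::read_parquet' 2 src/ test/` prints at most two locations, ordered by file name and then line number.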

Phase 4: Document Unused Modules (Light)

For each UNUSED module, produce a brief summary:

Module name and path
2-3 sentence description of what it provides
Key classes/functions (names only, no signatures)
Potential relevance to our project (if any)

Phase 5: Generate Output

Write the documentation in the output directory with this structure:

docs/<library_name>/
  README.md           — Overview: library purpose, version, module map, usage summary
  modules/
    <module_name>.md  — Per-module documentation (deep for USED, light for UNUSED)
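
The layout above can be scaffolded before any content is written. This is a minimal sketch with a hypothetical function name, `scaffold_docs`; it creates empty files only where none exist, so rerunning it does not overwrite existing documentation.

```shell
# Hypothetical helper: create the output layout shown above.
# $1 = output directory (e.g. docs/<library_name>); the remaining arguments
# are module names. Existing files are left untouched.
scaffold_docs() {
  out="$1"
  shift
  mkdir -p "$out/modules"
  [ -e "$out/README.md" ] || : > "$out/README.md"
  for module in "$@"; do
    [ -e "$out/modules/$module.md" ] || : > "$out/modules/$module.md"
  done
}
```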

Output Format

README.md Template

# <Library Name> — Module Reference

**Version**: <version or git commit>
**Location**: <path to library source>
**Namespace**: <primary namespace>

## Module Map

| Module | Status | Description | Key APIs Used |
|--------|--------|-------------|---------------|
| <name> | USED   | <one-line>  | <top 3 APIs>  |
| <name> | UNUSED | <one-line>  | —             |

## Our Usage Summary

We use <N> of <M> modules. Primary integration points:
- <bullet summary of how we use the library>

## Files That Reference This Library

| Source File | Modules Used | Key APIs |
|-------------|-------------|----------|
| <path>      | <modules>   | <APIs>   |

Per-Module Documentation (USED — Deep)

# <Module Name>

**Status**: USED
**Path**: <path within library>
**Headers we include**: <list>

## Summary
<2-3 sentences: what this module does and how we use it>

## API Reference

### <Class/Function Name>

**Header**: `<header_path>`
**Signature**:
```cpp
<full signature>
```

**Description**: <what it does, key parameters, return value>

**Our usage**:
- `<our_file.cpp>:<line>` — <brief context of how we call it>

### <Next API...>

## APIs Available but Not Used

These APIs exist in this module but are not currently called by our codebase:

| API | Header | Brief Description |
|-----|--------|-------------------|
| <name> | <header> | <one-line> |

Per-Module Documentation (UNUSED — Light)

# <Module Name>

**Status**: UNUSED
**Path**: <path within library>

## Summary
<2-3 sentences: what this module provides>

## Key APIs
- `<ClassName>` — <one-line>
- `<function_name>()` — <one-line>

## Potential Relevance
<1-2 sentences on whether/how this could be useful to our project, or "Not applicable to our use case">

Important Guidelines

Be thorough in Phase 1. Missing a usage means misclassifying a module as UNUSED.
Read headers, don't guess. Extract actual signatures from the source code.
Include line references. Every usage example should include file:line.
Keep unused module docs brief. Don't spend time documenting APIs we don't use in detail.
Use consistent formatting. LLMs parse markdown tables and code blocks reliably.
Note version-specific APIs. If you notice the library version matters (e.g., API changed between versions), flag it.
Parallelize where possible. Phase 1 searches and Phase 3 header reads can be parallelized across modules.
For very large libraries (100+ headers in a module), focus deep documentation on the headers we actually include, and list the rest in a summary table.
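
Whether a module crosses the 100-header mark can be checked with a quick count. The sketch below uses a hypothetical function name, `count_headers`, and counts the same `*.hpp` and `*.h` patterns as Step 3a.

```shell
# Hypothetical helper: count the header files under a module directory.
# $1 = module path (e.g. LIBRARY_PATH/include/MODULE_PATH).
count_headers() {
  find "$1" -type f \( -name '*.hpp' -o -name '*.h' \) | wc -l | tr -d ' '
}
```

A check such as `[ "$(count_headers LIBRARY_PATH/include/MODULE_PATH)" -ge 100 ]` can then decide between full documentation and the summary-table approach.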

Updating Existing Documentation

If documentation already exists for a library in the output directory:

Read the existing README.md to understand what was previously documented
Re-run Phase 1 to detect any new or removed usages
Update only the modules/files that changed
Update the README.md module map and usage summary
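
Detecting new or removed usages amounts to comparing two inventories. The sketch below assumes the previous run's header list was saved to a file with one header per line, which this skill does not itself specify; the function name `diff_inventory` is hypothetical.

```shell
# Hypothetical helper: compare two header inventories (one header per line)
# and report which entries were added or removed.
# $1 = previous inventory file, $2 = current inventory file.
diff_inventory() {
  old="$1"
  new="$2"
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort -u "$old" > "$old_sorted"
  sort -u "$new" > "$new_sorted"
  # comm needs sorted input: -13 keeps lines only in the second file,
  # -23 keeps lines only in the first file.
  comm -13 "$old_sorted" "$new_sorted" | sed 's/^/ADDED /'
  comm -23 "$old_sorted" "$new_sorted" | sed 's/^/REMOVED /'
  rm -f "$old_sorted" "$new_sorted"
}
```

Only the modules that own an ADDED or REMOVED header need to be regenerated; the rest of the documentation can stay as it is.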

Example Invocations

User: "Document the cucascade submodule"
→ Library: cucascade, Path: ./cucascade/, Output: .claude/skills/module-discover/docs/cucascade/

User: "Map out how we use cudf"
→ Library: cudf, Path: $CONDA_PREFIX/include/cudf/ (or $LIBCUDF_ENV_PREFIX/include/cudf/), Output: .claude/skills/module-discover/docs/cudf/

User: "What parts of rmm do we use?"
→ Library: rmm, Path: $CONDA_PREFIX/include/rmm/ (or $LIBCUDF_ENV_PREFIX/include/rmm/), Output: .claude/skills/module-discover/docs/rmm/

User: "Analyze spdlog dependency"
→ Library: spdlog, Path: (find via FetchContent in CMake), Output: .claude/skills/module-discover/docs/spdlog/