name: cartographer
description: Analyze a pull request diff and group code churn into semantically meaningful major features based on code intent. Prioritize why the changes exist, how they are implemented, and what they change. Use when asked to break a PR into components, explain major change boundaries, quantify churn with line counts, and describe motivation/implementation/impact of each major PR component.
Cartographer
Overview
Produce an intent-first decomposition of a PR plus local working tree changes. Treat "major features" as behaviorally coherent change components, not just folders or file types, and explain why each component exists, how it is implemented, and what it changes.
Always include:
- Committed branch diff churn.
- Uncommitted local churn (staged, unstaged, and untracked files).
- A persisted markdown report written to `CARTOGRAPH.MD` at the repository root.
Workflow
- Establish analysis scope.
- Resolve the base/head context first and use merge-base as the committed diff base.
- Default committed scope setup:
TARGET_BRANCH=<target-branch> # e.g. latest
HEAD_REF=HEAD
BASE_REF=$(git merge-base "$TARGET_BRANCH" "$HEAD_REF")
- Hitchhiker commit detection (CRITICAL): Before proceeding, check whether the branch contains commits from other PRs that were separately merged to the target branch (common on un-rebased branches with cherry-picks or branch merges). This is especially likely when `git log TARGET..HEAD` shows commits with PR numbers (e.g., `#12345`) authored by different people.
# Count commits on branch
ALL_COMMITS=$(git rev-list --count "$TARGET_BRANCH..HEAD")
# Get PR commits from GitHub (if a PR exists)
BRANCH_NAME=$(git branch --show-current)
PR_SHAS=$(gh pr list --head "$BRANCH_NAME" --json commits --jq '.[0].commits[].oid' 2>/dev/null)
# Count non-empty lines; grep -c already prints 0 on no match, so only its exit status is masked
PR_COMMIT_COUNT=$(printf '%s\n' "$PR_SHAS" | grep -c . || true)
# If PR exists and commit counts differ, there are hitchhiker commits
if [ -n "$PR_SHAS" ] && [ "$PR_COMMIT_COUNT" -lt "$ALL_COMMITS" ]; then
  echo "HITCHHIKER COMMITS DETECTED: $ALL_COMMITS on branch, $PR_COMMIT_COUNT in PR"
  # Scope analysis to ONLY files touched by PR commits
  PR_FILES=$(for sha in $PR_SHAS; do git diff-tree --no-commit-id --name-only -r "$sha"; done | sort -u)
  # Use TARGET_BRANCH..HEAD scoped to PR files only (simulates post-rebase diff)
  # This is now your authoritative diff, NOT merge-base..HEAD
fi
When hitchhiker commits are detected, use `git diff "$TARGET_BRANCH..HEAD" -- <PR_FILES>` as the committed diff base instead of the full `BASE_REF..HEAD` diff. State this scoping in the output.
When NO hitchhiker commits are detected (or no GitHub PR exists), use `BASE_REF` as the committed diff base for all committed churn commands.
- Pull full file-level and hunk-level data for both committed branch diff and local working tree changes.
Use commands that preserve both totals and detail:
# Committed branch range (if hitchhikers were detected, use "$TARGET_BRANCH..HEAD" and append -- <PR_FILES>)
git diff --numstat "$BASE_REF..$HEAD_REF" [-- <PR_FILES if scoped>]
git diff --name-status "$BASE_REF..$HEAD_REF" [-- <PR_FILES if scoped>]
git diff "$BASE_REF..$HEAD_REF" [-- <PR_FILES if scoped>]
# Local changes (staged + unstaged)
git diff --numstat --cached
git diff --name-status --cached
git diff --cached
git diff --numstat
git diff --name-status
git diff
# Untracked files
git ls-files --others --exclude-standard
- Build one combined churn dataset:
- committed (scoped to PR-only files if hitchhikers detected, otherwise full `BASE_REF..HEAD_REF`)
- staged (`--cached`)
- unstaged (working tree)
- untracked file additions (treat as `+N/-0`, where `N` is line count)
- Reconcile totals across this full combined scope.
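One way the combined churn dataset could be totalled is sketched below. The helper names `sum_numstat` and `untracked_as_numstat` are hypothetical, not part of the skill; the sketch assumes tab-separated `--numstat` rows on stdin and skips binary files, which git reports as `-` in both count columns.

```shell
# Sum "added<TAB>deleted<TAB>path" rows (git diff --numstat format) from stdin.
# Prints "additions deletions total". Binary rows ("-<TAB>-<TAB>path") are skipped.
sum_numstat() {
  awk -F'\t' '
    $1 != "-" { adds += $1; dels += $2 }
    END { printf "%d %d %d\n", adds, dels, adds + dels }
  '
}

# Turn untracked paths (one per line on stdin) into numstat-style rows,
# treating each file as +N/-0 where N is its newline count from wc -l.
untracked_as_numstat() {
  while IFS= read -r path; do
    [ -f "$path" ] || continue
    lines=$(wc -l < "$path" | tr -d ' ')
    printf '%s\t0\t%s\n' "$lines" "$path"
  done
}

# Demo with literal sample rows (hypothetical paths); prints "13 1 14".
printf '3\t1\tsrc/a.c\n-\t-\tlogo.png\n10\t0\tdocs/b.md\n' | sum_numstat
```

Inside a repository, the four sources could be piped together, for example `{ git diff --numstat "$BASE_REF..$HEAD_REF"; git diff --numstat --cached; git diff --numstat; git ls-files --others --exclude-standard | untracked_as_numstat; } | sum_numstat`. A file with both staged and unstaged edits contributes rows from each source, which matches the combined scope described above.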
- Infer intent and implementation signals from the diff.
- Map each changed area to probable intent using:
- Domain language in identifiers and comments.
- Implementation shape (algorithms, data flow, control-flow changes, memory/perf tactics).
- Data model/schema changes.
- Test additions/updates that reveal expected behavior.
- Migration or config updates that indicate rollout/infrastructure intent.
- Separate mechanical churn (renames, formatting, generated files) from behavioral churn.
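As a rough first pass at separating mechanical from behavioral churn, exact renames can be flagged from `--name-status` output, where git marks a rename with identical content as `R100`. The function name `classify_name_status` and the `mechanical`/`review` labels are assumptions for illustration; formatting-only edits and generated files are not detected here and still need hunk inspection.

```shell
# Read "STATUS<TAB>path[<TAB>newpath]" rows (git diff --name-status format)
# from stdin and label each one. Only exact renames (R100) are treated as
# mechanical; everything else is labelled "review" for manual inspection.
classify_name_status() {
  awk -F'\t' '
    $1 == "R100" { print "mechanical\t" $2 " -> " $3; next }
    $1 ~ /^R/    { print "review\t" $2 " -> " $3; next }
                 { print "review\t" $2 }
  '
}

# Demo with literal sample rows (hypothetical paths).
printf 'R100\told/util.py\tnew/util.py\nR087\ta.py\tb.py\nM\tcore.py\n' | classify_name_status
```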
- Build candidate feature clusters.
- Start from behavior themes, then attach files/hunks.
- Split a single file across multiple features when hunks represent different intents.
- Merge clusters only when they ship one coherent behavior/outcome.
- Keep explicitly cross-cutting work separate (`refactor`, `plumbing`, `rename`, `test infra`) unless tightly bound to one feature.
- Quantify churn per cluster.
- Compute per-feature churn from combined `--numstat` totals; include additions, deletions, and combined total.
- If a file is split across features, estimate hunk-level allocation and state this assumption.
- Keep totals internally consistent:
- Sum(feature totals) + sum(cross-cutting/unassigned) = total combined churn (committed + local).
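When a file must be split across features, per-hunk counts give a starting point for the allocation. The sketch below is one possible approach, not a prescribed tool: `hunk_counts` is a hypothetical helper that reads unified diff text (as produced by `git diff`) on stdin and prints one row per hunk.

```shell
# Print "file<TAB>hunk-range<TAB>+adds<TAB>-dels" for every hunk in a unified diff.
# File header lines (---/+++) are only recognised before the first hunk of a
# file, so a deleted line such as "-- comment" (shown as "--- comment") is
# still counted as a deletion.
hunk_counts() {
  awk '
    function emit() { if (hunk != "") printf "%s\t%s\t+%d\t-%d\n", file, hunk, adds, dels }
    /^diff --git /            { emit(); hunk = ""; next }
    hunk == "" && /^--- /     { old = substr($0, 5); sub(/^a\//, "", old); next }
    hunk == "" && /^\+\+\+ /  { file = substr($0, 5); sub(/^b\//, "", file)
                                if (file == "/dev/null") file = old; next }
    /^@@ /                    { emit(); hunk = $2 " " $3; adds = 0; dels = 0; next }
    hunk != "" && /^\+/       { adds++ }
    hunk != "" && /^-/        { dels++ }
    END                       { emit() }
  '
}

# Demo on a literal two-hunk diff (hypothetical file).
hunk_counts <<'EOF'
diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,3 +1,4 @@
 import os
+import sys
 x = 1
-y = 2
+y = 3
@@ -10 +11,2 @@ def f():
     pass
+    return 1
EOF
```

In a repository this could be fed with `git diff "$BASE_REF..$HEAD_REF" | hunk_counts`. The per-hunk rows for a file sum to its `--numstat` line, so any allocation built from them stays reconcilable with the combined total.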
- Explain motivation, implementation, and impact for each major feature.
- Motivation: Why this component exists (problem pressure, product intent, technical constraint).
- Implementation: How the change works technically (approach, major mechanisms, notable tradeoffs).
- Impact: What behavior, architecture, operations, or user flow changes because of it.
- Cite concrete evidence from the diff, not speculation.
- Declare boundaries and dependencies.
- Define what is in scope vs out of scope for each feature cluster.
- Identify dependencies between features and sequencing risk.
- Mention files/hunks or interfaces only when needed to clarify implementation details.
- Persist output.
- Write the full final analysis to `CARTOGRAPH.MD` in the repository root.
- The file content must match the final reported analysis exactly (same sections and totals).
- Overwrite existing `CARTOGRAPH.MD` if present.
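A minimal sketch of the persistence step follows, assuming the finished report is available on stdin. `write_report` is a hypothetical helper; the demo writes into a temporary directory so it runs outside a repository.

```shell
# Write stdin to CARTOGRAPH.MD under the directory given as $1,
# overwriting any existing file of that name.
write_report() {
  cat > "$1/CARTOGRAPH.MD"
}

# Demo in a throwaway directory standing in for the repository root.
demo_root=$(mktemp -d)
printf '## PR Summary\n...\n' | write_report "$demo_root"
cat "$demo_root/CARTOGRAPH.MD"
```

In a real run the first argument would typically be `"$(git rev-parse --show-toplevel)"`, which resolves the repository root regardless of the current subdirectory.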
Grouping Rules
- Group by intent over file location.
- Prefer fewer major components (typically 2-7) unless the PR is genuinely broad.
- Avoid collapsing unrelated refactors into feature work.
- Keep a dedicated bucket for "supporting/enablement" changes when they do not deliver user-facing behavior directly.
- Mark uncertain mappings explicitly instead of overconfident labeling.
Output Contract
Always return the following structure:
- PR Summary
- One paragraph explaining the high-level intent and why this PR exists.
- Churn Totals
- `Total additions`, `Total deletions`, `Total changed lines` for combined scope.
- Optional: `Files changed` count.
- `Scope`: explicitly state whether totals include uncommitted local changes.
- Major Feature Components
- For each component, include:
- `Component name`
- `Intent boundary`: what belongs here and what does not.
- `Line count`: additions, deletions, total.
- `Motivation`: why this component was needed.
- `Implementation`: how the component is built (approach/mechanisms/tradeoffs).
- `Impact`: behavior/system/user-facing effect.
- Optional: concise `Evidence` (representative files/hunks) if needed to support claims.
- Optional: `Key interfaces touched` only when interface changes are material to understanding impact.
- Cross-Cutting or Non-Feature Churn
- Refactors, renames, formatting, generated files, test harness work, or infra-only edits.
- Include line counts and rationale for why this is not a major feature.
- Dependency Map
- Short list of component-to-component dependencies and coupling risk.
- Ambiguities and Assumptions
- Explicitly list uncertain mappings and any line-count allocation assumptions.
- Verification Check
- Provide reconciliation line: `Component totals + cross-cutting + unassigned = combined total` and whether it balances.
- Artifact
- Confirm `CARTOGRAPH.MD` was written.
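The Verification Check reconciliation can be done mechanically. The sketch below assumes the totals are already known as integers; `check_balance` is a hypothetical helper and the numbers in the demo are made up for illustration.

```shell
# $1 is the combined total; the remaining arguments are the component,
# cross-cutting, and unassigned totals. Prints the status and returns
# non-zero when the sum does not match the combined total.
check_balance() {
  combined=$1
  shift
  sum=0
  for n in "$@"; do
    sum=$((sum + n))
  done
  if [ "$sum" -eq "$combined" ]; then
    echo "balanced ($sum = $combined)"
  else
    echo "not balanced ($sum != $combined, off by $((combined - sum)))"
    return 1
  fi
}

# Demo: two components (90, 35) plus cross-cutting (15) against a combined total of 140.
check_balance 140 90 35 15
```

A non-zero "off by" value usually points at a hunk that was allocated twice or left out of every bucket, which is worth listing under Ambiguities and Assumptions.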
Quality Bar
- Use evidence from actual diff content, not path-name guesswork.
- Explain motivation, implementation, and impact for every major component; never omit any.
- Include line counts for every component and every non-feature bucket.
- Preserve uncertainty explicitly when intent is mixed or unclear.
- Keep feature boundaries decision-useful: another engineer should understand why it was built, how it works, and what it changes.
- De-emphasize exhaustive file/interface inventories; include them only as supporting evidence.
Response Template
## PR Summary
...
## Churn Totals
- Additions: ...
- Deletions: ...
- Changed lines: ...
- Files changed: ...
- Scope: committed diff only | committed diff + uncommitted local changes
## Major Feature Components
### 1) <Component Name>
- Intent boundary: ...
- Line count: +... / -... (total ...)
- Motivation: ...
- Implementation: ...
- Impact: ...
- Evidence (optional): ...
- Key interfaces touched (optional): ...
### 2) <Component Name>
...
## Cross-Cutting or Non-Feature Churn
- <Bucket>: +... / -... (total ...)
- Why non-feature: ...
## Dependency Map
- <Component A> -> <Component B>: ...
## Ambiguities and Assumptions
- ...
## Verification Check
- Component totals + cross-cutting + unassigned = combined total
- Status: balanced | not balanced
## Artifact
- Wrote `CARTOGRAPH.MD`: yes | no