---
name: cartographer
description: Analyze a pull request diff (plus uncommitted local changes) and group code churn into semantically meaningful major features based on code intent. Prioritize why the changes exist, how they are implemented, and what they change. Use when asked to break a PR into components, explain major change boundaries, quantify churn with line counts, and describe the motivation, implementation, and impact of each major PR component.
---

# cartographer

## Overview

Produce an intent-first decomposition of a PR plus local working tree changes. Treat "major features" as behaviorally coherent change components, not just folders or file types, and explain why each component exists, how it is implemented, and what it changes.

Always include:

- Committed branch diff churn.
- Uncommitted local churn (staged, unstaged, and untracked files).
- A persisted markdown report written to `CARTOGRAPH.MD` at the repository root.

## Workflow

### Establish analysis scope

Resolve the base and head refs first, and use their merge-base as the committed diff base. Default committed scope setup:

```bash
TARGET_BRANCH=latest   # placeholder: set to the PR's target branch
HEAD_REF=HEAD
BASE_REF=$(git merge-base "$TARGET_BRANCH" "$HEAD_REF")
```

**Hitchhiker commit detection (CRITICAL):** Before proceeding, check whether the branch contains commits from other PRs that were separately merged into the target branch (common on un-rebased branches that picked up cherry-picks or merges of other branches). This is especially likely when `git log "$TARGET_BRANCH..HEAD"` shows commits carrying PR numbers (e.g., `#12345`) authored by different people.

```bash
# Count commits on the branch that are not on the target branch
ALL_COMMITS=$(git rev-list --count "$TARGET_BRANCH..HEAD")

# Get PR commits from GitHub (empty if no PR exists or gh is unavailable)
BRANCH_NAME=$(git branch --show-current)
PR_SHAS=$(gh pr list --head "$BRANCH_NAME" --json commits --jq '.[0].commits[].oid' 2>/dev/null)
# grep -c prints "0" and exits 1 on empty input; "|| true" keeps that single "0"
PR_COMMIT_COUNT=$(printf '%s\n' "$PR_SHAS" | grep -c . || true)

PR_FILES=""
# If a PR exists and the branch has more commits than the PR, there are hitchhiker commits
if [ -n "$PR_SHAS" ] && [ "$PR_COMMIT_COUNT" -lt "$ALL_COMMITS" ]; then
  echo "HITCHHIKER COMMITS DETECTED: $ALL_COMMITS on branch, $PR_COMMIT_COUNT in PR"
  # Scope analysis to ONLY files touched by PR commits
  PR_FILES=$(for sha in $PR_SHAS; do git diff-tree --no-commit-id --name-only -r "$sha"; done | sort -u)
  # The authoritative committed diff is now "$TARGET_BRANCH..HEAD" restricted to
  # PR_FILES (simulates the post-rebase diff), NOT merge-base..HEAD
fi
```
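When `gh` is unavailable or the branch has no PR on GitHub, the `(#12345)` signal described above can still be scanned locally. A minimal sketch, assuming hitchhikers arrive as squash-merged commits whose subjects keep GitHub's default `Title (#number)` form; `pr_tagged_commits` is a hypothetical helper name:

```bash
# Print commits whose subject ends in a "(#<number>)" PR reference, the default
# subject shape of squash-merged GitHub PRs. Expects tab-separated lines of
# short SHA, author, and subject, as produced by:
#   git log --format='%h%x09%an%x09%s' "$TARGET_BRANCH..HEAD"
pr_tagged_commits() {
  awk -F'\t' '$3 ~ /\(#[0-9]+\)$/ { print }'
}

# Usage:
#   git log --format='%h%x09%an%x09%s' "$TARGET_BRANCH..HEAD" | pr_tagged_commits
```

Several tagged commits from different authors point to hitchhikers, but this is only a heuristic: commits that genuinely belong to the PR can also reference PR numbers.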

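- When hitchhiker commits are detected, use `git diff "$TARGET_BRANCH..HEAD" -- <PR_FILES>` as the committed diff instead of the full `BASE_REF..HEAD` diff, and state this scoping in the output. This only approximates a post-rebase diff: it compares the target tip directly with `HEAD`, so target-branch edits to the same files that the branch never picked up appear as reversions; record that under Ambiguities and Assumptions.
- When no hitchhiker commits are detected (or no GitHub PR exists), use `BASE_REF` (the merge-base) as the committed diff base for all committed churn commands.

### Pull full file-level and hunk-level data

Cover both the committed branch diff and local working-tree changes, using commands that preserve both totals and detail: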

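```bash
# Committed branch range. With hitchhikers detected (PR_FILES non-empty), diff the
# target tip against HEAD restricted to PR files; otherwise diff from the merge-base.
DIFF_BASE="$BASE_REF"
PATHSPEC=()
if [ -n "$PR_FILES" ]; then
  DIFF_BASE="$TARGET_BRANCH"
  PATHSPEC=(--)
  while IFS= read -r f; do PATHSPEC+=("$f"); done <<< "$PR_FILES"
fi
git diff --numstat "$DIFF_BASE..$HEAD_REF" "${PATHSPEC[@]}"
git diff --name-status "$DIFF_BASE..$HEAD_REF" "${PATHSPEC[@]}"
git diff "$DIFF_BASE..$HEAD_REF" "${PATHSPEC[@]}"

# Local changes: staged (index vs HEAD)
git diff --numstat --cached
git diff --name-status --cached
git diff --cached
# Local changes: unstaged (working tree vs index)
git diff --numstat
git diff --name-status
git diff

# Untracked files
git ls-files --others --exclude-standard
```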

Build one combined churn dataset from:

- committed (scoped to PR-only files if hitchhikers were detected, otherwise the full `BASE_REF..HEAD_REF` range)
- staged (`--cached`)
- unstaged (working tree)
- untracked file additions (treat as +N/-0, where N is the file's line count)

Reconcile totals across this full combined scope.
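One way to build and total the combined dataset is to concatenate `--numstat` rows from every layer and sum them. A sketch, assuming the hypothetical helper names below and paths without embedded newlines; untracked text files become `N<TAB>0<TAB>path` rows, and binary files reuse git's `-` convention and count as zero lines. Because the layers are summed, a line edited both in a commit and again locally is counted once per layer.

```bash
# Emit numstat-style rows ("added<TAB>deleted<TAB>path") for untracked files read
# one path per line from stdin, e.g. from `git ls-files --others --exclude-standard`.
untracked_numstat() {
  while IFS= read -r f; do
    if [ ! -s "$f" ]; then
      printf '0\t0\t%s\n' "$f"                       # empty file: no lines added
    elif grep -Iq '' "$f"; then
      # awk counts a final line even without a trailing newline
      printf '%s\t0\t%s\n' "$(awk 'END { print NR }' "$f")" "$f"
    else
      printf '%s\t%s\t%s\n' - - "$f"                 # binary: mirror git's "-"
    fi
  done
}

# Sum numstat rows from any number of layers. Binary rows ("-") count as 0 lines;
# a path that appears in several layers is counted once in the files total.
sum_numstat() {
  awk -F'\t' '
    NF < 3 { next }
    { if ($1 != "-") a += $1; if ($2 != "-") d += $2; files[$3] = 1 }
    END {
      n = 0
      for (f in files) n++
      printf "additions=%d deletions=%d total=%d files=%d\n", a, d, a + d, n
    }'
}

# Usage (committed range shown without hitchhiker scoping):
#   { git diff --numstat "$BASE_REF..$HEAD_REF"
#     git diff --numstat --cached
#     git diff --numstat
#     git ls-files --others --exclude-standard | untracked_numstat
#   } | sum_numstat
```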

### Infer intent and implementation signals from the diff

Map each changed area to probable intent using:

- Domain language in identifiers and comments.
- Implementation shape (algorithms, data flow, control-flow changes, memory/perf tactics).
- Data model/schema changes.
- Test additions/updates that reveal expected behavior.
- Migration or config updates that indicate rollout/infrastructure intent.

Separate mechanical churn (renames, formatting, generated files) from behavioral churn.
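Two cheap signals for mechanical churn are exact renames and files marked as generated. A sketch, assuming rename data comes from `git diff -M --name-status` and that the repository marks generated files with the `linguist-generated` attribute (a GitHub Linguist convention that not every repository uses); both helper names are hypothetical:

```bash
# From `git diff -M --name-status ...` output, print the new path of every
# rename with 100% similarity (the file moved but its content did not change).
pure_renames() {
  awk -F'\t' '$1 == "R100" { print $3 }'
}

# From `git check-attr` output ("<path>: linguist-generated: <value>"),
# print the paths whose attribute is set or true.
generated_paths() {
  awk -F': ' '$3 == "set" || $3 == "true" { print $1 }'
}

# Usage:
#   git diff -M --name-status "$BASE_REF..$HEAD_REF" | pure_renames
#   git diff --name-only "$BASE_REF..$HEAD_REF" \
#     | git check-attr --stdin linguist-generated | generated_paths
```

Renames below 100% similarity (for example `R087`) combine a move with edits, so their remaining hunks still need intent review.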

### Build candidate feature clusters

- Start from behavior themes, then attach files/hunks.
- Split a single file across multiple features when hunks represent different intents.
- Merge clusters only when they ship one coherent behavior/outcome.
- Keep explicitly cross-cutting work separate (refactor, plumbing, rename, test infra) unless tightly bound to one feature.

### Quantify churn per cluster

- Compute per-feature churn from the combined `--numstat` totals; include additions, deletions, and combined total.
- If a file is split across features, estimate hunk-level allocation and state this assumption.
- Keep totals internally consistent: sum(feature totals) + sum(cross-cutting/unassigned) = total combined churn (committed + local).
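When a file is split across features, per-hunk counts give a starting point for the allocation. With `-U0` a hunk has no context lines, so in each `@@ -a,b +c,d @@` header `b` and `d` are exactly the deleted and added lines of that hunk (an omitted count means 1). A sketch with a hypothetical helper name; `path/to/file` is a placeholder:

```bash
# Print "<new-start><TAB>+<added><TAB>-<deleted>" for every hunk in a
# unified diff produced with zero context lines (git diff -U0).
hunk_counts() {
  awk '/^@@ / {
    oldr = substr($2, 2)                # "-a,b" -> "a,b"
    newr = substr($3, 2)                # "+c,d" -> "c,d"
    nold = split(oldr, o, ",")
    nnew = split(newr, n, ",")
    del = (nold > 1) ? o[2] : 1         # omitted count means 1 line
    add = (nnew > 1) ? n[2] : 1
    printf "%s\t+%d\t-%d\n", n[1], add, del
  }'
}

# Usage:
#   git diff -U0 "$BASE_REF..$HEAD_REF" -- path/to/file | hunk_counts
```

Each hunk can then be assigned to one feature, and the per-feature sums replace a whole-file guess.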

### Explain motivation, implementation, and impact for each major feature

- Motivation: Why this component exists (problem pressure, product intent, technical constraint).
- Implementation: How the change works technically (approach, major mechanisms, notable tradeoffs).
- Impact: What behavior, architecture, operations, or user flow changes because of it.
- Cite concrete evidence from the diff, not speculation.

### Declare boundaries and dependencies

- Define what is in scope vs out of scope for each feature cluster.
- Identify dependencies between features and sequencing risk.
- Mention files/hunks or interfaces only when needed to clarify implementation details.

### Persist output

- Write the full final analysis to `CARTOGRAPH.MD` in the repository root.
- The file content must match the final reported analysis exactly (same sections and totals).
- Overwrite existing `CARTOGRAPH.MD` if present.
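A minimal way to persist the report at the repository root, assuming the final analysis is piped in on stdin; `write_cartograph` is a hypothetical helper, and its optional argument exists only so the root can be overridden:

```bash
# Write stdin to CARTOGRAPH.MD at the repository root, replacing any existing file.
# The root defaults to `git rev-parse --show-toplevel`; pass a directory to override it.
write_cartograph() {
  local root
  root="${1:-$(git rev-parse --show-toplevel)}" || return 1
  cat > "$root/CARTOGRAPH.MD"
  printf 'Wrote %s\n' "$root/CARTOGRAPH.MD" >&2
}

# Usage:
#   printf '%s\n' "$REPORT" | write_cartograph
```

Piping the same text that is returned in the response (here a hypothetical `REPORT` variable) keeps the file and the reported analysis identical.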

## Grouping Rules

- Group by intent over file location.
- Prefer fewer major components (typically 2-7) unless the PR is genuinely broad.
- Avoid collapsing unrelated refactors into feature work.
- Keep a dedicated bucket for "supporting/enablement" changes when they do not deliver user-facing behavior directly.
- Mark uncertain mappings explicitly instead of overconfident labeling.

## Output Contract

Always return the following structure:

### PR Summary

One paragraph explaining the high-level intent and why this PR exists.

### Churn Totals

- Total additions, total deletions, and total changed lines (additions + deletions) for the combined scope.
- Optional: files changed count.
- Scope: explicitly state whether totals include uncommitted local changes.

### Major Feature Components

For each component, include:

- Component name.
- Intent boundary: what belongs here and what does not.
- Line count: additions, deletions, total.
- Motivation: why this component was needed.
- Implementation: how the component is built (approach/mechanisms/tradeoffs).
- Impact: behavior/system/user-facing effect.
- Optional: concise evidence (representative files/hunks) if needed to support claims.
- Optional: key interfaces touched, only when interface changes are material to understanding impact.

### Cross-Cutting or Non-Feature Churn

Refactors, renames, formatting, generated files, test harness work, or infra-only edits. Include line counts and the rationale for why each is not a major feature.

### Dependency Map

A short list of component-to-component dependencies and coupling risk.

### Ambiguities and Assumptions

Explicitly list uncertain mappings and any line-count allocation assumptions.

### Verification Check

Provide a reconciliation line (component totals + cross-cutting + unassigned = combined total) and state whether it balances.
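The reconciliation line can be computed rather than checked by eye. A sketch, assuming bucket rows are written as `name<TAB>added<TAB>deleted` (one row per component, per cross-cutting bucket, and for the unassigned bucket) and that the combined additions and deletions come from the churn totals; `reconcile` is a hypothetical helper:

```bash
# Compare the sum of per-bucket line counts (stdin) with the combined totals.
#   $1 = combined additions, $2 = combined deletions
reconcile() {
  awk -F'\t' -v ta="$1" -v td="$2" '
    NF >= 3 { a += $2; d += $3 }
    END {
      status = (a + 0 == ta + 0 && d + 0 == td + 0) ? "balanced" : "not balanced"
      printf "buckets +%d/-%d (total %d) vs combined +%d/-%d (total %d): %s\n",
        a, d, a + d, ta, td, ta + td, status
    }'
}
```

Comparing additions and deletions separately, not only their sum, catches an allocation that moves lines between the two columns.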

### Artifact

Confirm `CARTOGRAPH.MD` was written.

## Quality Bar

- Use evidence from actual diff content, not path-name guesswork.
- Explain motivation, implementation, and impact for every major component; never omit any.
- Include line counts for every component and every non-feature bucket.
- Preserve uncertainty explicitly when intent is mixed or unclear.
- Keep feature boundaries decision-useful: another engineer should understand why it was built, how it works, and what it changes.
- De-emphasize exhaustive file/interface inventories; include them only as supporting evidence.

## Response Template

```markdown
## PR Summary
...

## Churn Totals
- Additions: ...
- Deletions: ...
- Changed lines: ...
- Files changed: ...
- Scope: committed diff only | committed diff + uncommitted local changes

## Major Feature Components
### 1) <Component Name>
- Intent boundary: ...
- Line count: +... / -... (total ...)
- Motivation: ...
- Implementation: ...
- Impact: ...
- Evidence (optional): ...
- Key interfaces touched (optional): ...

### 2) <Component Name>
...

## Cross-Cutting or Non-Feature Churn
- <Bucket>: +... / -... (total ...)
- Why non-feature: ...

## Dependency Map
- <Component A> -> <Component B>: ...

## Ambiguities and Assumptions
- ...

## Verification Check
- Component totals + cross-cutting + unassigned = combined total
- Status: balanced | not balanced

## Artifact
- Wrote `CARTOGRAPH.MD`: yes | no
```