name: architecture-review description: Perform a thorough architecture review of a .NET project examining structure, maintainability, clarity, robustness, and goal achievement. Use when the user asks to review, audit, or examine a codebase. disable-model-invocation: true user-invocable: true argument-hint: [project-path]
Architecture Review
Perform a comprehensive architecture review of the project at $ARGUMENTS (or the current working directory if no argument provided).
Review Dimensions
Evaluate the codebase across these five dimensions:
- Structure - Project organization, layer separation, dependency direction, naming conventions
- Maintainability - DRY, consistency, ease of modification, test coverage, convention enforcement
- Clarity - Readability, documentation, dead code, misleading patterns, surprising behavior
- Robustness - Error handling, validation, edge cases, security, data integrity
- Goal Achievement - Does the project accomplish what it sets out to do? Is it fit for purpose?
Approach
Phase 1: Parallel Exploration
Launch up to 5 Explore agents in parallel, each focused on a different area:
- Project Structure agent: Solution file, all .csproj files, Directory.Build.props, Program.cs, DI wiring, configuration files
- Domain Layer agent: All entities, value objects, enums, domain events, base classes, validation patterns
- Application Layer agent: Commands, queries, handlers, validators, DTOs, mapping, pipeline behaviors, error handling
- Infrastructure Layer agent: DbContext, entity configurations, repository implementations, migrations, external integrations
- API & Tests agent: Endpoints/controllers, middleware, filters, ALL test files (unit, integration, convention), test infrastructure
Each agent should READ EVERY FILE in its area — do not skim.
Phase 2: Cross-Cutting Analysis
After exploration, analyze the codebase for issues across all dimensions. Focus on:
- Exception → HTTP status mapping: Trace every exception type thrown by handlers through the middleware/global handler to verify correct HTTP status codes
- Convention test coverage: Do the convention tests actually catch what they claim to? Check for gaps in assembly scanning scope
- Validation consistency: Compare domain guards vs application validators — look for drift or missing coverage
- Data access patterns: Verify CQRS boundaries, check for accidental cross-reads (EF in queries, Dapper in commands)
- Dead code: Unused attributes, unreachable branches, ghost database columns, no-op methods
- Logging consistency: Duplicate logging across layers, missing logs for important paths
- Security: Injection vectors, authentication gaps, sensitive data exposure
Phase 3: Findings Document
Produce findings using this template for each issue:
### N. [SHORT TITLE]
**Severity: High|Medium|Low** | Files: `file.cs:line`, `other.cs:line`
[1-3 sentence description of the problem and its impact]
**Fix**: [Concrete, actionable fix — not vague advice]
Severity definitions:
- High: Bugs, incorrect behavior, security issues, data integrity risks
- Medium: Inconsistencies, test gaps, maintainability hazards that will cause issues as the project grows
- Low: Clarity improvements, polish, minor optimizations
Phase 4: Summary
End with:
- A summary table:
| # | Finding | Severity | Category | - An overall assessment (2-3 sentences)
- A recommended fix order (highest-impact first)
- Verification steps (build, test, manual checks)
Comprehensive Multi-Agent Mode (for "thorough"/"comprehensive"/"audit everything" requests)
When the user wants an exhaustive audit (and especially when multi-agent orchestration is opted in), scale Phase 1–2 into a find → dedup → adversarially verify workflow via the Workflow tool instead of a single pass. This codebase is mature and hardened, with a long history of dismissing plausible "HIGH/CRITICAL" findings as false positives (the current self-assessed score lives in docs/ARCHITECTURE_REVIEW.md — never trust a number embedded in skill text), so the dominant failure mode is plausible-but-wrong findings, not missed bugs. Structure the workflow to refute, not to accumulate:
- Find — fan out one finder per subsystem. The eight that map to this repo: security/auth (gateway assertion, scopes, MFA, owner-only policy), CQRS/domain, data-access/persistence, eventing/outbox/Service Bus, payload-capture/PII, build/CI/reproducibility, concurrency/correctness, and convention-test rigor. Tell each finder the bar is high, to ground every finding in
file:linewith quoted evidence, and that zero findings is an acceptable answer. - Dedup — merge by file + normalized title in plain code (a barrier is correct here).
- Verify — for each candidate, spawn 3 independent skeptics with different lenses (code-truth: re-read the path and its callers; defense-in-depth: is it already covered by a guard/validator/constraint/convention; exploitability: is it actually reachable). Each is instructed to refute; keep a finding only on a 2-of-3 majority.
Prior dismissed false positives that recur — do not re-raise without new evidence: the cancel-path "stock double-restore" (blocked by the state machine), Money.Subtract negative escape (routes through Create()), the AppHost packages.lock.json exemption (intentional — Aspire host-RID packages), and "convention tests assert presence not behaviour" (most have already been hardened to IL/SQL behaviour checks).
Verifying Fixes (do this before marking any finding resolved)
A convention/regression test that always passes is worthless. For every fix, prove the test fails on the regression it guards:
- Inject the exact regression the test should catch (e.g. change
Guid.CreateVersion7()→Guid.NewGuid(), or point a dependency check at a type nothing injects). - Run the test — confirm it fails with the intended message.
- Revert the regression and confirm green.
Guard against vacuous passes: any test that filters a discovered set (handlers, aggregates, endpoints) must Assert.NotEmpty on that set, so a renamed suffix or moved assembly can't make it silently pass.
Stale-build caveat: restoring a mutated source file via mv/cp of a backup can give it an older mtime than the regression-compiled DLL, so MSBuild skips the rebuild and a test looks like it's still failing. After reverting, touch the source (or build with --no-incremental) before re-running. Don't mistake a stale-DLL failure for a real one.
Shared Artifact: docs/ARCHITECTURE_REVIEW.md
Read it before starting (it carries the open findings and current score; the dated history and dismissed false positives live in docs/reviews/) and update it after: mark findings resolved with the fix + the regression test added, record dismissed false positives so they aren't re-raised, and adjust the score conservatively. It is the sync point across concurrent agent sessions. Keep CLAUDE.md/AGENTS.md and .claude/skills↔.agents/skills in sync per the repo's drift rule.
Rules
- DO NOT penalize for absent features (outbox, distributed systems, event sourcing, etc.) unless the project claims to support them
- DO NOT recommend adding libraries/frameworks unless there's a concrete problem they solve
- DO recommend removing dead code and unnecessary complexity
- ALWAYS trace exception flows end-to-end — this is the #1 source of bugs in .NET APIs
- ALWAYS check that convention tests actually scan the assemblies they should
- Prefer fixing the root cause over adding workarounds (e.g., fix the global handler rather than adding try-catches everywhere)
Learnings from Past Reviews
These are real issues found in production .NET projects. Check for each of these specifically:
Exception Mapping Gaps
InvalidOperationExceptionis commonly thrown for business rule violations (e.g., "cannot delete entity with dependents", "insufficient stock") but often falls through to the default 500 handler. Map it to 409 Conflict.- Endpoints that catch exceptions locally create inconsistency — some endpoints catch, some don't. Fix the global handler and remove all endpoint-level try-catches.
Convention Test Blind Spots
- Convention tests that scan only one assembly (e.g.,
ApiAssembly) may miss violations in other assemblies (e.g.,DomainAssembly). Always verify the scan scope matches the intent. - Rename tests to reflect their actual scope (e.g.,
ApiTypes_MustNotResolveCurrentTimeViaDateTimenotTypes_MustNotResolve...).
Dead Code Patterns
- DataAnnotation attributes (
[Required],[StringLength],[Range]) on DTOs that use a customIValidator<T>framework — these are never evaluated but mislead readers. Remove them AND their correspondingTryValidateObject()unit tests. - Ghost database columns: when a migration drops some columns from a table, check if ALL dead columns were dropped. Dapper queries may silently reference leftover columns via fallbacks.
Duplicate Logging
- Endpoints and handlers both logging the same operation creates noise. Handlers are the authoritative layer — remove logging from endpoints. Exception: endpoint-level null-check logs (e.g., query returns null, endpoint returns 404) are fine since the handler doesn't log these.
Value Object Completeness
- Value objects that override
Equals(object)andGetHashCode()should also implementIEquatable<T>to avoid boxing in LINQ operations.
Test Impact of Fixes
- Changing exception-to-status-code mapping (e.g.,
InvalidOperationExceptionfrom 500 to 409) will break integration tests that assert on HTTP status codes. Search forHttpStatusCode.BadRequestandHttpStatusCode.InternalServerErrorin test files and update assertions. - Removing DataAnnotation attributes from DTOs will break any tests using
Validator.TryValidateObject(). These tests should be deleted, not fixed — they test dead validation.
Presence-vs-Behaviour Convention Gaps (recurring finding class)
- A convention test that asserts a marker/type/interface is present often does not prove the behaviour happens. Real gaps found this way: injecting
IOwnerOnlyPolicywithout invoking it; injectingICacheInvalidatorwithout calling it; declaring aGuidId without minting it viaGuid.CreateVersion7(); enforcing only the negative CQRS rule (command handlers must not useIDbConnection) without the positive one (they MUST depend onApplicationDbContext). - The fix pattern is IL/SQL scanning: reuse the existing
ContainsCallToMethod(il, module, name)helper to scan constructor/method IL (including async state machines) for the required call, and the SQL-literal regex checks for owner-scoped Dapper predicates. Prefer behaviour checks over name/source-text scans.
Verify Before Trusting the Score
- A high score with "no open findings" is a prompt to verify against code, not to trust prior claims. Each rerun should re-read the actual paths. Most candidate findings on a mature codebase are false positives — the value is in the adversarial verification that filters them, and in the small number of real presence-vs-behaviour gaps that survive.