evaluate-against-architecture-principles

name: evaluate-against-architecture-principles
description: Use when user references architecture principles, at start of fresh conversation with design work, before creating any requirements/design/implementation documents, or when reviewing for compliance - grounds problem framing and solution making in selected architecture principle set
argument-hint: effect-js | ai-coding | object-oriented

Evaluate Against Architecture Principles

%% Last Modified: 04/10/26 10:08:21 %%

Overview

%% Last Modified: 04/10/26 10:08:21 %%

Grounding in architecture principles during creation prevents violations before they happen. Systematic evaluation during review catches what slipped through.

This skill serves two purposes:

Before creating: Read principles to ground problem framing and solution design
When reviewing: Systematically evaluate documents against all principle categories for compliance verification

Both prevent technical debt — grounding prevents violations during creation, evaluation catches them during review.

Principle Set Resolution

%% Last Modified: 04/10/26 10:08:21 %%

The skill evaluates against one of several principle sets. Which set to use is resolved in this order:

Step 1: Check `$ARGUMENTS[0]`

%% Last Modified: 04/10/26 10:08:21 %%

If the caller provided a principle set argument ($0), use it directly. No inference needed.

Example invocation: /evaluate-against-architecture-principles effect-js

Step 2: Infer from files being evaluated (when `$0` is absent)

%% Last Modified: 04/10/26 10:44:57 %%

Examine the files passed in the prompt or plan. The file types determine which principle set applies:

| File signature | Principle set | Rationale |
|----------------|---------------|-----------|
| `.ts`/`.js` files with Effect imports (`import { Effect`, `import { Schema`, `@effect/`) | `effect-js` | Functional programming principles govern Effect code |
| `SKILL.md`, agent configs (`agents/*.md`), hook scripts, rules files | `ai-coding` | Agentic workflow principles govern AI harness artifacts |
| `.ts`/`.js`/`.py` without Effect imports, deterministic scripts, utilities | `object-oriented` | OO/structural principles govern standard code |

Detection order:

Check file names/paths first — SKILL.md, agents/, hooks/ → ai-coding immediately
For code files, scan imports — Effect imports → effect-js
Everything else → object-oriented

Multiple files with mixed types: When the prompt includes files spanning different principle sets (e.g., a SKILL.md and an Effect service), group files by detected principle set and run a separate evaluation for each group. Each group gets its own sub-agent dispatch with the matching principle set. Report results per group.

Single file or uniform types: All files map to one principle set — proceed with that set.
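
The detection order and grouping rule above can be sketched in Python. This is a minimal illustration, not the skill's implementation: the function names are hypothetical, the import markers are copied from the file-signature table, and rules files (which the table lists but the detection order does not) are not handled.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Import markers copied from the file-signature table.
EFFECT_MARKERS = ("import { Effect", "import { Schema", "@effect/")


def detect_principle_set(path: str, source: str = "") -> str:
    """Apply the detection order: path check, then import scan, then default."""
    p = PurePosixPath(path)
    parent_dirs = p.parts[:-1]
    # 1. File names/paths first: SKILL.md, agents/, hooks/ -> ai-coding.
    if p.name == "SKILL.md" or "agents" in parent_dirs or "hooks" in parent_dirs:
        return "ai-coding"
    # 2. Code files: scan the source text for Effect imports.
    if p.suffix in {".ts", ".js"} and any(m in source for m in EFFECT_MARKERS):
        return "effect-js"
    # 3. Everything else.
    return "object-oriented"


def group_by_principle_set(files: dict[str, str]) -> dict[str, list[str]]:
    """Group {path: source} entries so each group gets its own evaluation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, source in files.items():
        groups[detect_principle_set(path, source)].append(path)
    return dict(groups)
```

A result with more than one key corresponds to the mixed-types case, where each group is dispatched to its own sub-agent.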

Step 3: Halt-and-confirm gate (D12)

%% Last Modified: 04/10/26 10:45:42 %%

When `$0` is absent and inference fires, you MUST halt for user confirmation before proceeding.

Emit this message and STOP:

Detected file types: <list of files with detected types>. Using <set> principles (inferred from file type). Override: specify set as argument. Proceed? [Y/override]

For mixed file types, list each group:

Detected mixed file types:

Effect code (Service.ts, Layer.ts) → effect-js

AI harness (SKILL.md) → ai-coding

Will run separate evaluations per group. Override or proceed? [Y/override]

Do NOT dispatch the evaluation sub-agent until the user confirms or provides an override.

For multiple files with mixed types, the halt message shows the full grouping:

Detected mixed file types. Proposed evaluation groups:

effect-js: PiiHashService.ts (Effect imports), AnonymizePipeline.ts (Effect imports)

ai-coding: SKILL.md (skill file)

Override any group? Proceed? [Y/override]

This is a Type 1 decision — wrong principle set means a wasted evaluation cycle that produces misleading results. One-key approval keeps it low-friction. The gate only fires on inference, never when $0 is explicit.
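
As a rough sketch of the gate, the Python below builds the halt message from a grouping and decides whether the gate fires. The helper names `should_halt` and `halt_message` are hypothetical, and the message wording is taken from the templates above.

```python
from typing import Optional


def should_halt(explicit_set: Optional[str]) -> bool:
    """The gate fires only on inference, never when the set is explicit."""
    return explicit_set is None


def halt_message(groups: dict[str, list[str]]) -> str:
    """Build the confirmation prompt for one group or for mixed groups."""
    if len(groups) == 1:
        principle_set, files = next(iter(groups.items()))
        return (
            f"Detected file types: {', '.join(files)}. "
            f"Using {principle_set} principles (inferred from file type). "
            "Override: specify set as argument. Proceed? [Y/override]"
        )
    lines = ["Detected mixed file types. Proposed evaluation groups:"]
    for principle_set, files in groups.items():
        lines.append(f"{principle_set}: {', '.join(files)}")
    lines.append("Override any group? Proceed? [Y/override]")
    return "\n".join(lines)
```

The caller is expected to stop after emitting the message and dispatch nothing until the user answers.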

When to Use

%% Last Modified: 04/10/26 10:08:21 %%

Use BEFORE creating to ground thinking in principles:

User explicitly references principles — "using our architecture principles", "following the principles"
Fresh conversation with design work — Empty context and user requests design, requirements, or architecture work
Any design-type work — Starting requirements documents, framing problems, planning implementation, brainstorming architecture

Use when reviewing for compliance verification:

Completed requirements documents (PRDs, user stories)
Finished design documents
Written implementation plans
Code review requests for architectural patterns

Workflows

%% Last Modified: 04/10/26 10:08:21 %%

Workflow: Before Creating

%% Last Modified: 04/29/26 22:25:24 %%

When starting to create a new document:

Resolve principle set — Follow Principle Set Resolution above to determine $0
Locate principles files — The principle set is composed of TWO files:
- Core (paradigm-agnostic, design-phase): architecture-principles/core/DESIGN-PRINCIPLES.md — required for ALL principle sets
- Paradigm-specific (impl-phase): Run Glob architecture-principles/$0/*.md. If Glob returns empty (symlink directories are not traversed by Glob), fall back to Bash: find architecture-principles/$0/ -name "*.md" -type f. Use first result.
Load headings — Run jact ast <core-path> | jq '.headings' AND jact ast <paradigm-path> | jq '.headings' to see all category structure (union of both files)
Read principles — Run jact extract file <core-path> AND jact extract file <paradigm-path> to load full content of both
Create grounded — Write with both files' principles actively in mind. The core file governs design-phase decisions; the paradigm file specializes for implementation-phase choices.

Purpose: Ground thinking in principles BEFORE structure locks in. Violations that become embedded assumptions cost hours to unwind.

Why two files: Paradigm-agnostic principles (Modular Design, Data-First, MVP, etc.) live once in core/DESIGN-PRINCIPLES.md and compose with paradigm-specific deltas. This eliminates duplication across paradigms and separates design-phase concerns from implementation-phase idioms. Loading explicitly (rather than relying on link-following hooks) makes the dependency visible and robust against hook drift.
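
The locate step can be approximated outside the harness as follows. This is a sketch under assumptions: Python's `glob` is not the harness Glob tool and may already follow symlinked directories, so the fallback here only mirrors the shape of the `find` command; sorting the matches to make "first result" deterministic is also an assumption, and `locate_paradigm_file` is a hypothetical name.

```python
import glob
import os
from typing import Optional

# Core file path as given in the workflow; required for every principle set.
CORE_PATH = "architecture-principles/core/DESIGN-PRINCIPLES.md"


def locate_paradigm_file(principle_set: str, root: str = ".") -> Optional[str]:
    """Return the first *.md file for the set, or None if none exists."""
    base = os.path.join(root, "architecture-principles", principle_set)
    matches = sorted(glob.glob(os.path.join(base, "*.md")))
    if not matches:
        # Fallback in the spirit of `find <dir> -name "*.md" -type f`,
        # walking the directory tree and following symlinks.
        for dirpath, _dirnames, filenames in os.walk(base, followlinks=True):
            matches.extend(
                os.path.join(dirpath, name)
                for name in filenames
                if name.endswith(".md")
            )
        matches.sort()
    return matches[0] if matches else None
```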

Workflow: When Reviewing

%% Last Modified: 04/29/26 22:25:24 %%

When evaluating an existing document:

Resolve principle set — Follow Principle Set Resolution above to determine $0
Locate principles files — Two files (see Workflow: Before Creating step 2):
- Core: architecture-principles/core/DESIGN-PRINCIPLES.md
- Paradigm: architecture-principles/$0/*.md via Glob (or find fallback)
Discover categories — Run jact ast <core-path> | jq '.headings' AND jact ast <paradigm-path> | jq '.headings' to extract the union of ## headings across both files (Phase 1: headings only, full files come later)
Read the document — Read the file being evaluated (the PostToolUse:Read hook auto-extracts linked content)
Dispatch evaluation sub-agent — Follow Sub-Agent Evaluation Dispatch below. The sub-agent receives BOTH principle file paths.
Integrate results — Link the sub-agent's eval report in your synthesis output

Purpose: Catch violations systematically before they reach production.

Graduated Loading (Token Economy)

%% Last Modified: 04/10/26 10:08:21 %%

Load principles in two phases to avoid wasting context on content you may not need:

Phase 1 — Headings only (category discovery):

jact ast <principles-path> | jq '.headings'

Extract ## headings. Each heading becomes a checklist category. Skip structural headings (e.g., "Act I — Foundations") that have no bullet-point principles beneath them.

Phase 2 — Full content (evaluation time):

jact extract file <principles-path>

Load the complete principles file only when you need principle details to evaluate against. The sub-agent loads this, not the primary context.

Why two phases: The heading list tells you what categories exist (5-15 depending on the set). The full file can be 200+ lines. Loading everything upfront wastes tokens when you only need the category list for planning and dispatch.
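
To illustrate what Phase 1 is meant to produce, here is a small Python stand-in that reads `##` headings straight from markdown text and skips structural headings with no bullets beneath them. It does not call or reproduce `jact`; `discover_categories` is a hypothetical helper, and it ignores edge cases such as headings inside fenced code blocks.

```python
import re

H2 = re.compile(r"^##\s+(.*\S)\s*$")   # exactly two '#', then the title
BULLET = re.compile(r"^\s*[-*]\s+\S")  # '-' or '*' list items


def discover_categories(markdown: str) -> list[str]:
    """Return ## headings that have at least one bullet beneath them."""
    categories: list[str] = []
    current = None
    has_bullets = False
    for line in markdown.splitlines():
        heading = H2.match(line)
        if heading:
            # Close out the previous heading before starting a new one.
            if current is not None and has_bullets:
                categories.append(current)
            current, has_bullets = heading.group(1), False
        elif current is not None and BULLET.match(line):
            has_bullets = True
    if current is not None and has_bullets:
        categories.append(current)
    return categories
```

Each returned heading would become one checklist category and one TodoWrite item.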

Evaluation Checklist

%% Last Modified: 04/10/26 10:08:21 %%

MANDATORY: Use TodoWrite to create todos for EACH discovered category.

TodoWrite ensures:

You don't skip categories under pressure
User can track evaluation progress
No category gets "mentally checked" but not actually evaluated

Phase 1: Extract Context

%% Last Modified: 04/10/26 10:08:21 %%

Resolve principle set and locate both principles files (core and paradigm-specific)
Run jact ast <principles-path> | jq '.headings' on each file to discover ## categories
Read the document being evaluated (Read tool; hook auto-extracts linked content)
Identify document type (Requirements/Design/Implementation)

Phase 2: Dynamic Category Evaluation

%% Last Modified: 04/10/26 10:08:21 %%

For each ## heading discovered in Phase 1, create a todo item. Mark each as:

"✅ Compliant" — document satisfies this category's principles
"❌ Violates" — document contradicts or ignores principles in this category
"➖ Not mentioned" — document is silent on this category

IMPORTANT: "➖ Not mentioned" means potential gap, not "compliant by default."

If the document lacks detail on a principle, that's a finding
Recommend adding detail in your recommendations section
Silence is not compliance

Phase 3: Structured Output

%% Last Modified: 04/10/26 10:08:21 %%

Provide evaluation in this format. Categories come from Phase 1 discovery — never hardcode them:

## Architecture Evaluation: <Document Name>

*Last Modified: 04/10/26 10:08:21*

### Citation Context

*Last Modified: 04/10/26 10:08:21*

- Extracted X linked documents
- [List key linked files]

### Principle Compliance

*Last Modified: 04/10/26 10:08:21*

| Category | Status | Details |
|----------|--------|---------|
| <Category from principles> | ✅/❌/➖ | Specific finding with line references |
| <Category from principles> | ✅/❌/➖ | Specific finding with line references |
| ... | ... | ... |

### Critical Issues (Severity: High)

*Last Modified: 04/10/26 10:08:21*

1. [Issue with specific principle citation and line numbers]
2. [Issue with specific principle citation and line numbers]

### Recommendations

*Last Modified: 04/10/26 10:08:21*

1. [Specific change with principle reference]
2. [Specific change with principle reference]

### Verdict

*Last Modified: 04/10/26 10:08:21*

- [ ] Ready to proceed
- [X] Requires revision

Phase 4: Findings Prioritization (Cost-Based, Not Scope-Based)

%% Last Modified: 04/17/26 06:14:38 %%

MANDATORY when evaluating design documents. Optional but recommended for other types.

Classification rule: Classify findings by actual implementation cost, never by perceived scope disruption.

After producing the structured evaluation output, filter findings:

Cross-reference against the document's own risk assessment — already-mitigated findings get lower priority
Estimate fix cost in minutes — "How long does this fix actually take?"
Apply the 30-minute rule:
- Fix Now (default bucket): Any finding that can be fixed in < 30 minutes of implementation. This includes: adding a column, writing BECAUSE clauses, renaming a file, adding a sentence, annotating a schema field, codifying a named anti-pattern.
- Architectural Rework Required (exception bucket): ONLY findings that require all three of: (a) ≥30 minutes of implementation, (b) a named rework scope (new subsystem, schema migration, multi-file restructure), and (c) an explicit trigger condition for revisiting.

**NEVER place trivial fixes in "Architectural Rework Required" to avoid scope expansion.** BECAUSE tech debt accepted in evaluation reports becomes stale context for future agents: every deferred "post-MVP" item pollutes the context of any agent that reads the doc later. The purpose of evaluation is to improve the baseline, not to generate a deferral list.

Categorize and present filtered findings:

### Prioritized Findings

#### Fix Now

1. [Finding with principle citation] — [Cost estimate in minutes] — [What to change]

#### Architectural Rework Required

1. [Finding] — [Cost: ≥30 min] — [Named rework scope] — [Trigger condition to revisit]

#### Already Mitigated

1. [Finding] — Addressed by [risk mitigation from document]

Self-check before emitting "Architectural Rework Required":

Is the cost genuinely ≥30 minutes of implementation? (Not "feels like scope creep")
Have I named the rework scope concretely? (Not "general refactor")
Have I named the trigger that will bring it back? (Not "eventually")

If any checkbox fails → finding belongs in Fix Now.

Lead with "Fix Now" items. "Architectural Rework Required" items must include the three-item justification inline. Note "Already Mitigated" as confirmation, not action items.

Named anti-pattern this phase guards against: post-mvp-deferral-for-trivial-fixes — deferring sub-30-min fixes creates tech debt that pollutes future agent context.
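
The 30-minute rule and the three-part self-check reduce to a small decision function. The sketch below is one way to encode it; the `Finding` fields and the `bucket` function are hypothetical names, and the cost estimate is assumed to be supplied by the evaluator.

```python
from dataclasses import dataclass
from typing import Optional

FIX_NOW_LIMIT_MINUTES = 30


@dataclass
class Finding:
    summary: str
    cost_minutes: int
    rework_scope: Optional[str] = None     # e.g. "schema migration"
    revisit_trigger: Optional[str] = None  # explicit condition for revisiting
    mitigated_by: Optional[str] = None     # risk mitigation named in the doc


def bucket(finding: Finding) -> str:
    """Classify by implementation cost, never by perceived scope disruption."""
    if finding.mitigated_by:
        return "Already Mitigated"
    needs_rework = (
        finding.cost_minutes >= FIX_NOW_LIMIT_MINUTES
        and bool(finding.rework_scope)
        and bool(finding.revisit_trigger)
    )
    # Any failed check sends the finding to the default bucket.
    return "Architectural Rework Required" if needs_rework else "Fix Now"
```

Note that a finding costing 30 minutes or more still lands in Fix Now when the rework scope or the trigger is missing, which matches the self-check.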

Document Hygiene Cross-Cutting Checks

%% Last Modified: 04/17/26 06:43:11 %%

MANDATORY for any markdown design/requirements/eval document review, regardless of principle set. These checks run in addition to the principle-set evaluation — they catch Obsidian-rendering and citation-integrity defects that principle sets do not cover.

The sub-agent MUST emit a Document Hygiene subsection in the eval report listing each check's status.

H1 — Evidence-tag references use block-anchor links

%% Last Modified: 04/17/26 06:43:11 %%

Every in-doc reference to an evidence tag ID (OBS-NNN, H-NNN, A-NNN, C-NNN, D-NNN, Q-NNN, O-NNN, G-NNN, F-ID) defined elsewhere in the same document MUST be rendered as a markdown link to a block anchor:

Definition site: trailing ^ID anchor on the defining line
```
[C-001: local-first testing only] [^S-C1] ^C-001
```
Reference site: escaped-bracket link to the anchor
```
OUT OF SCOPE [\[C-001\]](#^C-001)
```

Detection: Grep the doc for [A-Z]+-\d{3} (or OBS-\d+, C-\d+, etc.) outside their definition lines. Any bare occurrence not wrapped in [\[ID\]](#^ID) is a violation.

Fix: Add ^ID to the defining line; rewrite reference as [\[ID\]](#^ID). For removed/superseded IDs, strikethrough (~~C-005~~) — never link to a missing anchor.

Rationale: Obsidian renders block-anchor links as hover previews. Bare references force the reader (and future LLM sessions) to scroll/grep to verify the citation, and citation drift goes undetected until full re-read.

Linked learning: in-doc-references-to-evidence-tags-need-block-anchor-links-for-obsidian-preview
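
The H1 detection step can be sketched with Python's `re` module. This is a heuristic illustration using the three-digit pattern from the Detection note; `bare_references` is a hypothetical helper, and it treats any line ending in a `^ID` anchor as a definition site.

```python
import re

TAG = re.compile(r"\b[A-Z]+-\d{3}\b")
# A compliant reference: [\[ID\]](#^ID), with the same ID in both places.
LINKED = re.compile(r"\[\\\[([A-Z]+-\d{3})\\\]\]\(#\^\1\)")
# A definition site: the line ends with a ^ID block anchor.
ANCHOR = re.compile(r"\^[A-Z]+-\d{3}\s*$")
# Removed or superseded IDs are struck through, not linked.
STRUCK = re.compile(r"~~[A-Z]+-\d{3}~~")


def bare_references(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, ID) for each reference not wrapped as a link."""
    violations = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if ANCHOR.search(line):
            continue  # skip the defining line
        remainder = STRUCK.sub("", LINKED.sub("", line))
        for match in TAG.finditer(remainder):
            violations.append((lineno, match.group(0)))
    return violations
```

The returned line numbers and IDs map directly onto the Violations column of the hygiene table.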

H2 — Acronym table cells use header-anchor links

%% Last Modified: 04/17/26 06:43:11 %%

Any table cell containing a defined-elsewhere acronym (NS1, BI, JTBD) MUST link to the percent-encoded header anchor ([NS1](#NS1%20—%20...)).

Linked learning: markdown-table-acronym-columns-use-internal-links-for-obsidian-preview

H3 — File references use markdown link syntax

%% Last Modified: 04/17/26 06:43:11 %%

Every file path mentioned in prose or evidence content MUST be a clickable markdown link, not a bare backtick path.

✅ [`filename.md`](/absolute/path/to/filename.md)
❌ `/absolute/path/to/filename.md`

Linked learning: use-markdown-links-not-bare-paths-for-file-references-in-documents
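
A similar heuristic can flag H3 violations. The sketch below assumes a "bare path" is a backtick span that contains a slash and a file extension and is not the label of a markdown link; bare filenames without a slash are not caught, and `bare_file_paths` is a hypothetical name.

```python
import re

# Backtick span with a '/' and an extension, not preceded by '[' and not
# followed by '](' (which would make it the label of a markdown link).
BARE_PATH = re.compile(r"(?<!\[)`([^`\s]*/[^`\s]*\.[A-Za-z0-9]+)`(?!\]\()")


def bare_file_paths(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, path) for each backticked path that is not a link."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in BARE_PATH.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```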

Hygiene output format

%% Last Modified: 04/17/26 06:43:11 %%

Append to the sub-agent's eval report:

### Document Hygiene

| Check | Status | Violations |
|-------|--------|------------|
| H1 — Evidence-tag references use block-anchor links | ✅/❌ | <line refs + offending IDs, or "none"> |
| H2 — Acronym table cells use header-anchor links | ✅/❌/➖ | <line refs, or "none"/"no acronym tables"> |
| H3 — File references use markdown link syntax | ✅/❌ | <line refs, or "none"> |

Hygiene violations are always Fix Now (sub-30-min mechanical edits) — they MUST be reclassified into the Phase 4 "Fix Now" bucket.

Sub-Agent Evaluation Dispatch

%% Last Modified: 04/10/26 10:08:21 %%

Evaluation runs in a sub-agent that writes a persistent report file. Primary context stays lean.

Why a sub-agent

%% Last Modified: 04/10/26 10:08:21 %%

Evaluating every category from a principle set (5-15 categories) generates substantial output. Keeping that inline bloats primary context and crowds out synthesis thinking. The sub-agent writes a structured report to disk; primary context links it.

Dispatch instructions

%% Last Modified: 04/17/26 06:44:22 %%

Launch an Agent with this structure:

Role identity: Give the sub-agent a specific reviewer title drawn from the principle set:
- effect-js → "Architecture compliance reviewer — functional programming systems"
- ai-coding → "Architecture compliance reviewer — agentic workflow systems"
- object-oriented → "Architecture compliance reviewer — object-oriented systems"
Domain vocabulary block: Extract 15-30 key terms from the principle set's ## headings and include them in the sub-agent prompt. This vocabulary routes the model's attention to the right domain. A sub-agent without role identity and vocabulary produces generalist behavior, not specialized review.
Sub-agent prompt must include:
- Both principles file paths, core and paradigm-specific (for jact extract file)
- The document path being evaluated
- The discovered category list from Phase 1
- The structured output format from Phase 3
- The findings prioritization instructions from Phase 4
- The Document Hygiene Cross-Cutting Checks (H1, H2, H3) — sub-agent MUST grep for evidence-tag references, acronym table cells, and bare file paths, and emit the Hygiene table in the report
- Instruction to write the complete eval report to a .md file
Output path: If $ARGUMENTS includes an output path (second argument), the sub-agent writes the eval report there. Otherwise, write it alongside the evaluated file with suffix -eval-report.md.
Evaluation Metadata section: The sub-agent MUST include this section at the top of every eval report, before the compliance table. This captures the process decisions so they're verifiable from the output file alone:

## Evaluation Metadata

*Last Modified: 04/10/26 10:46:19*

- **Reviewer Role**: <role identity from step 1, e.g., "Architecture compliance reviewer — functional programming systems">
- **Domain Vocabulary**: <comma-separated list of 15-30 terms extracted from principle headings>
- **Principle Set**: <which set was used, e.g., `effect-js`>
- **Principle Set Resolution**: <how the set was determined — explicit `$0`, or inferred from file type with detection details>
- **Graduated Loading**: Phase 1 headings discovered: <count> categories via `jact ast`. Phase 2 full extract: <line count> lines loaded for evaluation.
- **Output Path**: <where the eval report was written>
- **Files Evaluated**: <list of files that were evaluated>
- **TodoWrite Categories**: <list of category todos created>

This metadata is not optional — it's the only way to verify that graduated loading, role identity, domain vocabulary, and TodoWrite were actually used. Without it, those behaviors are invisible in the output.
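
The output-path rule from the dispatch instructions can be sketched as below. It assumes the arguments are positional (principle set first, output path second) and reads "suffix -eval-report.md" as replacing the evaluated file's extension; both are interpretations, and `eval_report_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Sequence


def eval_report_path(evaluated_file: str, arguments: Sequence[str] = ()) -> str:
    """Use the second argument if given, else write next to the evaluated file."""
    if len(arguments) >= 2 and arguments[1]:
        return arguments[1]
    path = PurePosixPath(evaluated_file)
    # design.md -> design-eval-report.md in the same directory.
    return str(path.with_name(f"{path.stem}-eval-report.md"))
```

The resolved value is what the sub-agent would record under Output Path in the Evaluation Metadata section.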

After sub-agent completes

%% Last Modified: 04/10/26 10:08:21 %%

Primary context receives the eval report file path. Integrate it:

[Evaluation Report](<path-to-eval-report>)

Read the report file to extract key findings for synthesis. Do not reproduce the full report inline — link it.

Document Type Guidance

%% Last Modified: 04/10/26 10:08:21 %%

Focus on categories most relevant to your document type. Use your judgment based on the principle set's category names.

Requirements documents: Emphasize scope, data model, and safety categories
Design documents: Emphasize modularity, data structures, interfaces, and file organization categories
Implementation plans: Emphasize file organization, naming, and safety categories

Still evaluate all discovered categories. Mark "➖ Not mentioned" for categories the document is silent on, and when a category genuinely doesn't apply, say why in the output rather than skipping it.

Common Rationalizations

%% Last Modified: 04/10/26 10:08:21 %%

| Excuse | Reality |
|--------|---------|
| "User said 'principles' but meant generally" | User mentioned principles = trigger to read the principles file. No exceptions. BECAUSE the word "principles" is the strongest signal of intent, and ignoring it means the agent substitutes its own judgment for the user's explicit request. |
| "Fresh conversation, I can start directly" | Empty context = read principles first to ground thinking. BECAUSE without principles loaded, the agent generates from training priors which drift from project-specific standards over a conversation. |
| "User wants quick design, no time for principles" | Fresh conversation with design work = mandatory principles read. BECAUSE 5 minutes loading principles prevents days of rework when violations are discovered late. The cost asymmetry is 100:1. |
| "I'll check principles during review" | Grounding happens during creation, not after. BECAUSE first drafts lock in structure, and violations become embedded assumptions that cost hours to unwind. Review catches symptoms; grounding prevents root causes. |
| "I already know the principles" | Reading refreshes all categories and prevents blind spots under pressure. BECAUSE LLM attention degrades across long conversations, and categories not recently loaded get dropped from active consideration. |
| "Just need to start writing quickly" | 5 minutes reading principles prevents days of rework. Start grounded. BECAUSE the first structural decisions in a document constrain all downstream choices. Wrong framing propagates. |
| "Will evaluate after first draft" | First draft locks in structure. BECAUSE reviewers and downstream consumers treat draft structure as load-bearing, and refactoring structure is 10x more expensive than getting it right initially. |
| "Document labeled 'MVP' = only check MVP category" | "MVP" label is a WARNING SIGN to check scope violations, not an excuse to skip categories. BECAUSE MVP documents still define data models, interfaces, and naming patterns that outlive the MVP phase. |
| "Not enough detail to evaluate X" | Mark "➖ Not mentioned" and recommend adding detail. BECAUSE silence on a principle category is a finding, not a pass. Missing detail becomes implicit decisions made by implementers without guidance. |
| "Time pressure, focus on big issues" | Systematic check takes 5-10 minutes and prevents weeks of rework. BECAUSE "big issues" are only identifiable after checking all categories. The issues you skip are the ones that surprise you in production. |
| "This principle doesn't apply" | Check anyway. Explain why it doesn't apply in output. BECAUSE documenting non-applicability proves you considered it, and future readers need that reasoning when the context changes. |
| "Already followed best practices" | Best practices = these principles. Check systematically. BECAUSE "best practices" without a specific checklist means "whatever I remember right now," which varies by fatigue and context length. |
| "I already read the file, no need for linked context" | The Read hook extracts linked content automatically, so review it. BECAUSE linked documents contain constraints and decisions that change interpretation of the primary document. Ignoring them produces evaluations against incomplete context. |

Red Flags — STOP and Re-evaluate

%% Last Modified: 04/10/26 10:08:21 %%

You're rationalizing if you think:

"I'll just check the relevant principles"
"This doc is too simple for full evaluation"
"I don't need to review the linked context"
"Some categories obviously don't apply"

All of these mean: Follow the checklist systematically.

Real-World Impact

%% Last Modified: 04/10/26 10:08:21 %%

Without systematic evaluation:

Scope creep (many features labeled "minimal")
Scattered validation logic requiring refactor
Component god objects violating single responsibility
Missing safety checks causing data loss

With systematic evaluation:

Violations caught in design phase (2-hour fix vs. 2-week refactor in production)
Clear principle citations enable quick fixes
Complete coverage prevents "obvious in hindsight" issues