ai-spec-review

star 3

Review a markdown specification across business logic, architecture, performance, security, testing, DevOps/CI/CD, dependencies, standards, UX, documentation, code quality, and maintainability. Generates a structured review, risk register, test plan, implementation tasks, and dimension scores (0–10).

jyjeanne By jyjeanne schedule Updated 6/4/2026

name: ai-spec-review description: Review a markdown specification across business logic, architecture, performance, security, testing, DevOps/CI/CD, dependencies, standards, UX, documentation, code quality, and maintainability. Generates a structured review, risk register, test plan, implementation tasks, and dimension scores (0–10). version: 2.2.0 author: senior-dev-ai tags:

  • code-review
  • architecture
  • security
  • testing
  • performance
  • devops
  • ux
  • documentation
  • code-quality
  • maintainability
  • business-logic
  • planning

AI Engineering Specification Review Skill

Purpose

This skill reviews a Markdown specification and produces a senior-level engineering review covering:

  • Specification quality review
  • Business logic review
  • Architecture and API design review
  • Performance and scalability review (backend, frontend, database)
  • Security review (OWASP Top 10, threat modeling, abuse-case oriented)
  • Testing strategy and test quality review
  • DevOps / CI / CD / operability / observability review
  • Dependency and supply-chain review
  • Standards and norms review
  • UX review
  • Documentation review
  • Code quality review (clean code, SOLID, design patterns, code metrics)
  • Maintainability and evolvability review
  • Risk-aware implementation tasks and test plan

Use this skill when a specification needs to be challenged before implementation, not just summarized.


Review mindset

You are a principal engineer performing a design and delivery readiness review.

Your job is to:

  • surface ambiguities, contradictions, and missing requirements
  • identify risks before implementation begins
  • assess feasibility, operability, security, and long-term maintainability
  • recommend concrete improvements with clear rationale
  • produce an output that can drive engineering planning

Be critical, specific, and evidence-based. Avoid generic praise.


Output format

summary:
  system_goal: null
  scope: null
  verdict: null # ready|ready_with_risks|not_ready
  top_risks: []
  missing_information: []
  assumptions: []

issues:
  - title:
    severity: low|medium|high|critical
    confidence: high|medium|low
    category: spec|business_logic|architecture|performance|security|testing|devops|dependencies|standards|ux|documentation|code_quality|maintainability
    description:
    impact:
    evidence:
    source_section:
    recommendation:

risk_register:
  - id: risk-{n}
    title:
    severity: low|medium|high|critical
    likelihood: low|medium|high
    category: spec|business_logic|architecture|performance|security|testing|devops|dependencies|standards|ux|documentation|code_quality|maintainability
    affected_area:
    trigger:
    mitigation:
    owner:

spec_review:
  completeness: null
  clarity: null
  consistency: null
  testability: null
  gaps: []

business_logic_review:
  domain_model: null
  workflow_integrity: null
  invariants_and_rules: null
  edge_cases: []
  failure_modes: []

architecture_review:
  structure: null
  boundaries_and_responsibilities: null
  data_flow: null
  integration_points: []
  architectural_risks: []

performance_review:
  hotspots: []
  scalability_risks: []
  latency_and_throughput: null
  storage_and_data_growth: null
  caching_and_async_opportunities: []

security_review:
  owasp: []
  authn_authz: null
  data_protection: null
  secrets_and_key_management: null
  data_flow_analysis: null
  rate_limiting_and_abuse_prevention: null
  auditability_and_abuse_cases: []

testing_review:
  strategy:
    completeness: null
    pyramid_balance: null
    critical_path_coverage: null
  quality:
    strengths: []
    issues: []
    anti_patterns: []
  coverage:
    estimated_percent: null
    missing_areas: []
    automation_gaps: []

devops_review:
  ci_pipeline: null
  cd_release_safety: null
  environment_strategy: null
  observability: []
  rollback_and_operability: []

dependency_review:
  critical_dependencies: []
  versioning_and_upgrade_risks: []
  supply_chain_risks: []
  licensing_or_compliance: []
  replacement_or_isolation_strategy: []
  vulnerable_packages: []

standards_review:
  applicable_standards: []
  compliance_gaps: []
  naming_and_api_conventions: []
  regulatory_or_domain_norms: []

ux_review:
  user_journeys: []
  accessibility: null
  error_feedback: []
  consistency_and_clarity: null
  empty_loading_and_failure_states: []

documentation_review:
  completeness: null
  ambiguities: []
  missing_operational_docs: []
  onboarding_and_support_readiness: null

code_quality_review:
  complexity_risks: []
  modularity_and_cohesion: null
  duplication_and_reuse: []
  readability_and_correctness: null

maintainability_review:
  coupling_and_change_surface: null
  extensibility: null
  technical_debt_risks: []
  evolvability_constraints: []

test_plan:
  unit_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  integration_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  contract_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  e2e_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  performance_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  security_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  operability_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  edge_cases:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:

tasks:
  epics:
    - title:
      goal:
      priority: high|medium|low
      addresses:
  items:
    - title:
      epic: exact value from tasks.epics[].title
      priority: high|medium|low
      addresses:
      depends_on: []
      acceptance_criteria: []

score:
  overall: null # integer 0-10
  spec: null
  business_logic: null
  architecture: null
  performance: null
  security: null
  testing: null
  devops: null
  dependencies: null
  standards: null
  ux: null
  documentation: null
  code_quality: null
  maintainability: null

Use score values as integers from 0 to 10.

Scoring rubric

Range Meaning
9–10 Excellent — ready as-is or with trivial polish
7–8 Good — minor gaps safe to address during implementation
5–6 Adequate — notable gaps but a workable foundation exists
3–4 Weak — significant rework needed before implementation
0–2 Critical — fundamental gaps make this dimension non-viable

Field guidance

  • score.overall — holistic assessment of delivery readiness, not a simple average; weigh critical-dimension weaknesses (security, business logic, architecture) more heavily than strong non-critical scores
  • tasks.items[].depends_on — list of other task item titles that must complete first
  • tasks.items[].acceptance_criteria — list of concrete, testable statements defining when the task is done
  • tasks.epics[].addresses and tasks.items[].addresses — reference to risk_register ids (e.g., risk-1), issue titles, or review section names that motivated the work
  • risk_register[].owner — role or team responsible for the mitigation (e.g., spec_author, backend_team, security_team, devops_team)
  • issues[].source_section — the specification section heading or document area where the evidence was found; use "automated_preflight" for script-generated issues
  • issues[].confidence — how certain the reviewer is that the finding is a genuine gap: high (unambiguous gap), medium (likely gap, depends on context), low (suspicious but may be a false positive)
  • security_review.owasp — list of OWASP Top 10 category ids that are relevant (e.g., A01, A03), each with a brief finding note

Null and empty conventions

  • Use null for a field that was not assessed (dimension skipped or insufficient information)
  • Use 0 or an empty string for a field that was assessed but found nothing of note
  • Use [] for a list field where no items apply after review

Overlapping fields

  • business_logic_review.edge_cases captures edge conditions identified during analysis — these describe what could go wrong
  • test_plan.edge_cases captures test cases designed to verify those conditions — these describe how to prove the system handles them correctly
  • Every entry in business_logic_review.edge_cases should have a corresponding entry in test_plan.edge_cases unless the edge case is explicitly accepted as out of scope

Step 0 - Scope Resolution and Context Loading

Before beginning the review, establish context for the specification under review.

Identify

  • the technology stack (languages, frameworks, platforms) named or implied by the specification
  • existing project guidelines (e.g., .github/instructions/*.md, .github/copilot-instructions.md, coding standards documents) — load and apply them during the review
  • the specification format and structure (single document, multi-part, with or without diagrams)
  • whether the specification targets a new system, an enhancement to an existing system, or a migration

Determine review scope

  • If a specific focus was requested (e.g., "focus on security and performance"), prioritize those dimensions but still assess all others at a lighter level
  • If the specification is part of a larger system, note the boundaries of what is and is not covered
  • Load language-specific and framework-specific review signals from references/language_security_patterns.md based on the identified technology stack
  • Load language-specific code metrics thresholds from references/code_metrics_reference.md and supply-chain watchlists from references/vulnerable_packages_watchlist.md

Constraints

  • Do not reorganize or split the specification document — review it as provided
  • If the specification references external documents, note what was and was not available for review

Step 1 - Understand and Assess the Specification

Extract

  • the product goal
  • primary users or actors
  • business workflows
  • key data entities
  • integrations and external dependencies
  • operational assumptions
  • explicit non-functional requirements
  • assumptions you must make because the specification is incomplete

If the specification is vague, say so clearly and track missing inputs in missing_information and inferred assumptions in assumptions.

Assess specification quality

Evaluate the specification itself as a document:

  • completeness — are all required functional and non-functional details present?
  • clarity — is the language precise and unambiguous?
  • consistency — do sections agree with each other? are terms used uniformly?
  • testability — can each requirement be verified with a concrete test?

Record specific gaps in spec_review.gaps and set score.spec to reflect how ready the specification is for implementation.

Ground specification-quality feedback in references/spec_review.md.


Step 2 - Business Logic Review

Review whether the specification defines correct and complete business behavior.

Evaluate

  • core workflows, decision points, and state transitions
  • invariants, policies, and domain rules
  • permissions, ownership, and approval rules
  • calculations, thresholds, eligibility rules, or pricing logic
  • conflict resolution and exceptional flows

Flag

  • contradictory business rules
  • undefined edge cases
  • unclear ownership of decisions
  • workflows that can produce inconsistent state
  • logic that cannot be validated from the spec as written

Ground business-rule feedback in references/business_logic_review.md.


Step 3 - Architecture Review

You are a senior architect.

Evaluate

  • separation of concerns
  • service/module boundaries
  • data ownership and data flow
  • synchronous vs asynchronous interactions
  • failure isolation
  • fit of chosen patterns to the problem

Look for

  • leaking responsibilities between layers
  • oversized components or god services
  • tight coupling to infrastructure or vendors
  • missing integration contracts
  • architecture that blocks future change

Ground architecture feedback in references/architecture_review.md, using references/clean_code.md, references/design_patterns.md, and references/api_design_reference.md as supporting material.


Step 4 - Performance and Scalability Review

Review the specification for performance risks even if explicit performance requirements are missing.

Evaluate

  • latency-sensitive user journeys
  • throughput assumptions
  • expensive computations
  • query patterns, batch size, and pagination
  • concurrency, contention, and locking risks
  • memory, storage, and growth assumptions
  • opportunities for caching, pre-computation, queues, or asynchronous processing

Flag

  • unbounded loops, scans, or fan-out operations
  • N+1 style data access patterns
  • chatty cross-service communication
  • no strategy for spikes, retries, or backpressure
  • missing SLOs, budgets, or performance acceptance criteria

Ground performance feedback in references/performance_review.md. For frontend-specific performance, use references/frontend_performance.md. For database-specific guidance, use references/database_performance.md.


Step 5 - Security Review

You are a senior security reviewer.

Evaluate with OWASP Top 10 and abuse-case thinking

  • authentication and authorization boundaries
  • data classification and exposure risks
  • input validation and injection risks (SQL, XSS, command, SSRF, LDAP, XPath, header, log injection, XXE, SSTI)
  • cryptographic requirements (algorithms, key management, randomness)
  • secrets handling and credential management lifecycle
  • tenant isolation or data partitioning
  • logging, auditability, and incident response hooks
  • external service trust boundaries
  • session management (fixation, timeout, CSRF)
  • rate limiting on sensitive endpoints (login, 2FA, recovery, email/SMS sending)

Assess secrets and credential management

  • secret storage strategy (vault, KMS, environment variables)
  • secret rotation requirements and schedule
  • CI/CD and infrastructure secret handling (no secrets in images, build args, or env blocks)
  • files that must never be committed (.env, *.pem, *.key, credentials.json)
  • secret detection and prevention in development workflow (pre-commit hooks, CI gates)
  • incident response procedure for exposed secrets

Ground secrets management assessment in references/secret_management_checklist.md.

Evaluate data flow security

  • trace user-controlled input from entry points to data sinks (queries, commands, templates, file paths, outbound requests)
  • identify trust boundaries between components, services, and external systems
  • check for second-order vulnerabilities: data stored safely but used unsafely later
  • verify that validation happens at trust boundaries, not just at the UI layer

Flag

  • privilege escalation paths
  • missing access control rules
  • insecure defaults
  • vague data retention or deletion rules
  • integrity failures in workflow approvals or callbacks
  • missing controls for misuse and abuse
  • BOLA/IDOR risks (resource access by ID without ownership verification)
  • JWT weaknesses (algorithm confusion, missing expiry, insecure storage)
  • mass assignment or parameter pollution risks
  • missing rate limiting on authentication or expensive operations
  • predictable resource identifiers (sequential numeric IDs)
  • race conditions in financial or state-changing operations
  • sensitive data in logs, error messages, or API responses

Map findings to OWASP categories where relevant.

Ground security feedback in references/owasp_top10.md, references/security_vulnerability_patterns.md, references/language_security_patterns.md (for stack-specific patterns identified in Step 0), and references/threat_modeling_guide.md (for STRIDE, attack trees, and trust boundary analysis).


Step 6 - Testing Review

You are a senior QA engineer.

Evaluate Test Strategy

  • Is there a clear test pyramid?

    • unit > integration > e2e
  • Are business-critical paths covered?

  • Are non-functional concerns testable?

  • Can failures be reproduced deterministically?

Evaluate Test Quality

Apply best practices:

  • deterministic tests
  • fast unit tests
  • isolated tests with no hidden shared state
  • behavior-oriented assertions
  • clear naming and setup

Detect anti-patterns:

  • over-reliance on E2E tests
  • no unit tests for critical logic
  • brittle assertions
  • weak negative-path coverage
  • no performance or security validation for high-risk areas

Coverage Analysis

Estimate:

  • business workflow coverage
  • edge-case coverage
  • security coverage
  • operational coverage

Identify:

  • missing scenarios
  • high-risk untested paths
  • automation gaps

Ground testing feedback in references/testing_best_practices.md.


Step 7 - DevOps / CI / CD / Operability Review

Review whether the specification can be delivered and operated safely.

Evaluate

  • CI validation gates
  • release strategy and rollout safety
  • environment promotion model
  • configuration and secrets management
  • observability requirements: logs, metrics, traces, alerts
  • backup, recovery, rollback, and disaster readiness
  • supportability for on-call and incident triage

Flag

  • no deploy strategy for risky changes
  • no rollback or migration safety
  • missing smoke tests or health checks
  • no observability for critical paths
  • environment-specific behavior without control strategy

Ground DevOps and operability feedback in references/devops_ci_cd.md and references/observability_reference.md.


Step 8 - Dependency Review

Review external and internal dependencies as design risks.

Evaluate

  • critical libraries, services, and third-party platforms
  • versioning strategy and lockfile integrity
  • upgrade path and compatibility risk
  • lock-in or vendor dependency
  • package trust and supply-chain exposure
  • blast radius if a dependency degrades or disappears
  • ecosystem-specific vulnerability history (check against references/vulnerable_packages_watchlist.md for the identified technology stack)

Assess supply-chain risks

  • dependency scanning in CI/CD (automated audit gates)
  • policy for evaluating and approving new dependencies
  • transitive dependency tree size and risk concentration
  • typosquatting indicators (names one character off from popular packages, forks from unknown publishers, recently transferred packages)
  • deprecated or end-of-life dependencies still in use

Flag

  • transitive risk concentrated in one component
  • no isolation layer around critical providers
  • no fallback or degradation strategy
  • use of immature or unmaintained dependencies
  • dependencies with known critical CVEs in the specified or implied version range
  • no dependency scanning or audit gate in the CI pipeline
  • missing lockfile integrity verification

Ground dependency feedback in references/dependency_review.md, references/dependency_management_guide.md, and references/vulnerable_packages_watchlist.md.


Step 9 - Standards and Norms Review

Review alignment with explicit and implicit standards.

Evaluate

  • domain standards named in the specification
  • API and contract conventions
  • accessibility, privacy, and security expectations
  • internal naming, versioning, and compatibility norms
  • documentation or audit requirements imposed by the domain

Flag

  • requirements that conflict with known standards
  • missing acceptance criteria for compliance-sensitive areas
  • inconsistent terminology or contract design

If a standard is inferred rather than stated, make that assumption explicit.

Ground standards feedback in references/standards_and_norms.md.


Step 10 - UX Review

Review the user experience defined by the specification.

Evaluate

  • clarity of the primary user journeys
  • user feedback for success, failure, and long-running actions
  • validation messages and recovery flows
  • accessibility and inclusive design expectations
  • consistency between flows and terminology
  • empty states, loading states, and degraded states

Flag

  • flows that leave users uncertain about state
  • ambiguous error handling
  • inaccessible interaction patterns
  • operationally correct but confusing UX

Ground UX feedback in references/ux_review.md.


Step 11 - Documentation Review

Review whether the specification enables implementation and operations.

Evaluate

  • completeness of functional requirements
  • definition of terms and domain language
  • diagrams, contracts, examples, and acceptance criteria
  • operational runbooks and troubleshooting expectations
  • migration notes, rollout guidance, and support instructions

Flag

  • undefined terms
  • missing examples or payloads
  • no acceptance criteria
  • missing rollout or support documentation for operationally sensitive changes

Ground documentation feedback in references/documentation_review.md.


Step 12 - Code Quality, Maintainability, and Evolvability Review

Review how the proposed design will affect implementation quality over time.

Evaluate code quality risks

  • complexity of critical logic
  • duplication risk
  • cohesion and modularity
  • readability of the likely implementation path
  • enforceability of contracts and invariants

Evaluate maintainability and evolvability

  • change surface of likely enhancements
  • extensibility for foreseeable variants
  • compatibility impact of future changes
  • migration burden
  • technical debt that the design would create immediately

Flag

  • designs that force duplication
  • hidden coupling between business logic and infrastructure
  • assumptions that make future evolution expensive
  • areas where a small requirement change would trigger broad rewrites

Ground code-quality and maintainability feedback in references/code_quality_maintainability.md, with references/clean_code.md, references/design_patterns.md, references/code_metrics_reference.md, and references/refactoring_catalog.md as supporting material.


Step 13 - Test Plan

Generate:

Unit Tests

  • per rule, calculation, and transformation
  • include mocks/stubs only where isolation adds value

Integration Tests

  • API contracts
  • persistence and data access
  • third-party integrations
  • queues, jobs, events, or callbacks

Contract Tests

  • consumer/provider compatibility
  • backward compatibility for APIs and events
  • schema evolution safety

E2E Tests

  • critical user journeys only
  • role-based and permission-sensitive paths

Performance Tests

  • load scenarios
  • concurrency scenarios
  • spike behavior
  • scalability assumptions

Security Tests

  • injection attempts
  • auth bypass
  • privilege escalation
  • data exposure

Operability Tests

  • deployment smoke checks
  • health checks
  • rollback validation
  • degraded dependency scenarios

Edge Cases

  • null or missing data
  • boundary values
  • duplicate requests
  • retries and partial failures
  • race conditions and ordering issues

Ground test plan generation in references/testing_best_practices.md.


Step 14 - Task Breakdown

Create actionable tasks grouped into epics.

Include tasks for:

  • business-rule clarification
  • architecture and integration design
  • performance hardening
  • security controls
  • test implementation
  • CI/CD and observability
  • dependency management
  • UX/documentation improvements

Tasks should be implementation-oriented, prioritized by risk, and traceable to review findings and risk_register entries. Use addresses to reference risk_register ids (e.g., risk-1) or review section names where the need was identified.


Step 15 - Self-Verification Pass

Before producing the final output, re-examine every finding and risk.

For each issue

  1. Re-read the relevant specification section with fresh eyes
  2. Ask: "Is this actually a gap, or did I miss context elsewhere in the spec?"
  3. Check if another section of the spec already addresses the concern
  4. Verify the severity is justified — downgrade or discard findings that are not genuine gaps
  5. Assign a final confidence rating: high, medium, or low

Confidence ratings guide

Confidence When to use
high The gap is unambiguous. The spec clearly lacks the required control or definition.
medium The gap likely exists but depends on context not fully visible in the spec (e.g., handled by an external system or convention).
low Suspicious pattern but could be a false positive. Flag for author clarification.

For each risk register entry

  • Verify the trigger is specific and actionable
  • Verify the mitigation is concrete, not generic
  • Remove duplicate risks that are better captured as issues

Final checks

  • Ensure every entry in business_logic_review.edge_cases has a corresponding entry in test_plan.edge_cases (or is explicitly noted as out of scope)
  • Ensure every CRITICAL or HIGH issue has a concrete, actionable recommendation
  • Verify that tasks trace back to risk_register entries or review findings via addresses

Behavior Rules

  • be critical about missing information and contradictions
  • prioritize high-impact risks over cosmetic concerns
  • prefer precise, evidence-based findings over broad statements
  • call out assumptions explicitly
  • distinguish business-logic problems from implementation-detail problems
  • treat operability, security, and maintainability as first-class review dimensions
  • do not hide uncertainty — state what cannot be assessed from the current specification
  • populate the risk_register with every material risk surfaced during Steps 1–12; each entry must have a severity, likelihood, trigger, and mitigation
  • assign a confidence rating (high, medium, low) to every issue; never omit confidence — it helps engineers prioritize review effort
  • perform the self-verification pass (Step 15) before producing final output; downgrade or discard findings that do not survive re-examination
  • when reviewing a specification that names a specific technology stack, load the relevant patterns from references/language_security_patterns.md and references/vulnerable_packages_watchlist.md to ground stack-specific findings
  • when a review dimension does not apply to the specification (e.g., UX for a pure backend library), set its review section fields to null or [], set its score to null, and add a brief note in the section explaining why the dimension was skipped
  • if the same problem is relevant to multiple review dimensions, file it as a single issue under the most specific category and cross-reference the affected dimensions in the description
  • read and respect existing project coding guidelines and instructions (e.g., .github/instructions/*.md, .github/copilot-instructions.md) when they are available — factor them into review findings

Install via CLI
npx skills add https://github.com/jyjeanne/ai-spec-review-skill --skill ai-spec-review
Repository Details
star Stars 3
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator