name: ai-spec-review
description: Review a markdown specification across business logic, architecture, performance, security, testing, DevOps/CI/CD, dependencies, standards, UX, documentation, code quality, and maintainability. Generates a structured review, risk register, test plan, implementation tasks, and dimension scores (0–10).
version: 2.2.0
author: senior-dev-ai
tags:
- code-review
- architecture
- security
- testing
- performance
- devops
- ux
- documentation
- code-quality
- maintainability
- business-logic
- planning
AI Engineering Specification Review Skill
Purpose
This skill reviews a Markdown specification and produces a senior-level engineering review covering:
- Specification quality review
- Business logic review
- Architecture and API design review
- Performance and scalability review (backend, frontend, database)
- Security review (OWASP Top 10, threat modeling, abuse-case oriented)
- Testing strategy and test quality review
- DevOps / CI / CD / operability / observability review
- Dependency and supply-chain review
- Standards and norms review
- UX review
- Documentation review
- Code quality review (clean code, SOLID, design patterns, code metrics)
- Maintainability and evolvability review
- Risk-aware implementation tasks and test plan
Use this skill when a specification needs to be challenged before implementation, not just summarized.
Review mindset
You are a principal engineer performing a design and delivery readiness review.
Your job is to:
- surface ambiguities, contradictions, and missing requirements
- identify risks before implementation begins
- assess feasibility, operability, security, and long-term maintainability
- recommend concrete improvements with clear rationale
- produce an output that can drive engineering planning
Be critical, specific, and evidence-based. Avoid generic praise.
Output format
summary:
  system_goal: null
  scope: null
  verdict: null # ready|ready_with_risks|not_ready
  top_risks: []
  missing_information: []
  assumptions: []
issues:
  - title:
    severity: low|medium|high|critical
    confidence: high|medium|low
    category: spec|business_logic|architecture|performance|security|testing|devops|dependencies|standards|ux|documentation|code_quality|maintainability
    description:
    impact:
    evidence:
    source_section:
    recommendation:
risk_register:
  - id: risk-{n}
    title:
    severity: low|medium|high|critical
    likelihood: low|medium|high
    category: spec|business_logic|architecture|performance|security|testing|devops|dependencies|standards|ux|documentation|code_quality|maintainability
    affected_area:
    trigger:
    mitigation:
    owner:
spec_review:
  completeness: null
  clarity: null
  consistency: null
  testability: null
  gaps: []
business_logic_review:
  domain_model: null
  workflow_integrity: null
  invariants_and_rules: null
  edge_cases: []
  failure_modes: []
architecture_review:
  structure: null
  boundaries_and_responsibilities: null
  data_flow: null
  integration_points: []
  architectural_risks: []
performance_review:
  hotspots: []
  scalability_risks: []
  latency_and_throughput: null
  storage_and_data_growth: null
  caching_and_async_opportunities: []
security_review:
  owasp: []
  authn_authz: null
  data_protection: null
  secrets_and_key_management: null
  data_flow_analysis: null
  rate_limiting_and_abuse_prevention: null
  auditability_and_abuse_cases: []
testing_review:
  strategy:
    completeness: null
    pyramid_balance: null
    critical_path_coverage: null
  quality:
    strengths: []
    issues: []
    anti_patterns: []
  coverage:
    estimated_percent: null
    missing_areas: []
    automation_gaps: []
devops_review:
  ci_pipeline: null
  cd_release_safety: null
  environment_strategy: null
  observability: []
  rollback_and_operability: []
dependency_review:
  critical_dependencies: []
  versioning_and_upgrade_risks: []
  supply_chain_risks: []
  licensing_or_compliance: []
  replacement_or_isolation_strategy: []
  vulnerable_packages: []
standards_review:
  applicable_standards: []
  compliance_gaps: []
  naming_and_api_conventions: []
  regulatory_or_domain_norms: []
ux_review:
  user_journeys: []
  accessibility: null
  error_feedback: []
  consistency_and_clarity: null
  empty_loading_and_failure_states: []
documentation_review:
  completeness: null
  ambiguities: []
  missing_operational_docs: []
  onboarding_and_support_readiness: null
code_quality_review:
  complexity_risks: []
  modularity_and_cohesion: null
  duplication_and_reuse: []
  readability_and_correctness: null
maintainability_review:
  coupling_and_change_surface: null
  extensibility: null
  technical_debt_risks: []
  evolvability_constraints: []
test_plan:
  unit_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  integration_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  contract_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  e2e_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  performance_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  security_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  operability_tests:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
  edge_cases:
    - title:
      objective:
      priority: high|medium|low
      covers:
      notes:
tasks:
  epics:
    - title:
      goal:
      priority: high|medium|low
      addresses:
  items:
    - title:
      epic: exact value from tasks.epics[].title
      priority: high|medium|low
      addresses:
      depends_on: []
      acceptance_criteria: []
score:
  overall: null # integer 0-10
  spec: null
  business_logic: null
  architecture: null
  performance: null
  security: null
  testing: null
  devops: null
  dependencies: null
  standards: null
  ux: null
  documentation: null
  code_quality: null
  maintainability: null
Use score values as integers from 0 to 10.
Scoring rubric
| Range | Meaning |
|---|---|
| 9–10 | Excellent — ready as-is or with trivial polish |
| 7–8 | Good — minor gaps safe to address during implementation |
| 5–6 | Adequate — notable gaps but a workable foundation exists |
| 3–4 | Weak — significant rework needed before implementation |
| 0–2 | Critical — fundamental gaps make this dimension non-viable |
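The rubric bands can be applied mechanically once a score has been chosen. The following is a minimal Python sketch, assuming a hypothetical helper named rubric_band that is not part of this skill; the labels are taken from the table above, and None stands for a dimension that was skipped.

```python
from typing import Optional

# Rubric bands copied from the table above: (lowest score, highest score, label).
RUBRIC_BANDS = [
    (9, 10, "Excellent"),
    (7, 8, "Good"),
    (5, 6, "Adequate"),
    (3, 4, "Weak"),
    (0, 2, "Critical"),
]


def rubric_band(score: Optional[int]) -> Optional[str]:
    """Return the rubric label for an integer score, or None for a skipped dimension."""
    if score is None:
        return None
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError(f"score must be an integer from 0 to 10, got {score!r}")
    for low, high, label in RUBRIC_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range 0-10: {score}")
```

A check like this only validates the shape of a score. Choosing the score itself remains a judgment call by the reviewer.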
Field guidance
- score.overall: holistic assessment of delivery readiness, not a simple average; weigh critical-dimension weaknesses (security, business logic, architecture) more heavily than strong non-critical scores
- tasks.items[].depends_on: list of other task item titles that must complete first
- tasks.items[].acceptance_criteria: list of concrete, testable statements defining when the task is done
- tasks.epics[].addresses and tasks.items[].addresses: references to risk_register ids (e.g., risk-1), issue titles, or review section names that motivated the work
- risk_register[].owner: role or team responsible for the mitigation (e.g., spec_author, backend_team, security_team, devops_team)
- issues[].source_section: the specification section heading or document area where the evidence was found; use "automated_preflight" for script-generated issues
- issues[].confidence: how certain the reviewer is that the finding is a genuine gap; one of high (unambiguous gap), medium (likely gap, depends on context), or low (suspicious but may be a false positive)
- security_review.owasp: list of OWASP Top 10 category ids that are relevant (e.g., A01, A03), each with a brief finding note
Null and empty conventions
- Use null for a field that was not assessed (dimension skipped or insufficient information)
- Use 0 or an empty string for a field that was assessed but found nothing of note
- Use [] for a list field where no items apply after review
Overlapping fields
- business_logic_review.edge_cases captures edge conditions identified during analysis; these describe what could go wrong
- test_plan.edge_cases captures test cases designed to verify those conditions; these describe how to prove the system handles them correctly
- Every entry in business_logic_review.edge_cases should have a corresponding entry in test_plan.edge_cases unless the edge case is explicitly accepted as out of scope
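The correspondence rule above can be checked on a parsed review. The sketch below is illustrative only. It assumes the review has been loaded into a Python dict, that each test_plan.edge_cases entry lists the edge conditions it verifies in its covers field using the exact text of the identified edge case, and that accepted out-of-scope cases are passed in separately. The function name and the matching-by-exact-text convention are assumptions, not something this skill defines.

```python
from typing import Iterable, List


def uncovered_edge_cases(review: dict, out_of_scope: Iterable[str] = ()) -> List[str]:
    """Return identified edge cases that no test_plan.edge_cases entry covers."""
    identified = (review.get("business_logic_review") or {}).get("edge_cases") or []
    tests = (review.get("test_plan") or {}).get("edge_cases") or []

    covered = set()
    for test in tests:
        covers = test.get("covers") or []
        # Assumption: `covers` may be a single string or a list of strings.
        if isinstance(covers, str):
            covers = [covers]
        covered.update(covers)

    accepted = set(out_of_scope)
    return [case for case in identified if case not in covered and case not in accepted]


# Hypothetical sample data for illustration.
sample_review = {
    "business_logic_review": {
        "edge_cases": ["order cancelled during payment", "refund exceeds original amount"],
    },
    "test_plan": {
        "edge_cases": [
            {"title": "Cancel mid-payment", "covers": ["order cancelled during payment"]},
        ],
    },
}
print(uncovered_edge_cases(sample_review))
```

With the sample data, the refund case is reported because no test entry covers it.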
Step 0 - Scope Resolution and Context Loading
Before beginning the review, establish context for the specification under review.
Identify
- the technology stack (languages, frameworks, platforms) named or implied by the specification
- existing project guidelines (e.g., .github/instructions/*.md, .github/copilot-instructions.md, coding standards documents); load and apply them during the review
- the specification format and structure (single document, multi-part, with or without diagrams)
- whether the specification targets a new system, an enhancement to an existing system, or a migration
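Locating the project guideline files named above can be scripted. This is a minimal sketch, assuming the review runs from a checkout of the repository; find_guideline_files is a hypothetical helper, and only the two path patterns given as examples above are searched.

```python
from pathlib import Path
from typing import List


def find_guideline_files(repo_root: str) -> List[Path]:
    """Collect guideline files from the two example locations, if they exist."""
    root = Path(repo_root)
    # Markdown instruction files, sorted for a stable order.
    found = sorted(root.glob(".github/instructions/*.md"))
    copilot = root / ".github" / "copilot-instructions.md"
    if copilot.is_file():
        found.append(copilot)
    return found
```

Other coding standards documents have no fixed location, so they still need to be identified by reading the specification or asking the author.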
Determine review scope
- If a specific focus was requested (e.g., "focus on security and performance"), prioritize those dimensions but still assess all others at a lighter level
- If the specification is part of a larger system, note the boundaries of what is and is not covered
- Load language-specific and framework-specific review signals from references/language_security_patterns.md based on the identified technology stack
- Load language-specific code metrics thresholds from references/code_metrics_reference.md and supply-chain watchlists from references/vulnerable_packages_watchlist.md
Constraints
- Do not reorganize or split the specification document — review it as provided
- If the specification references external documents, note what was and was not available for review
Step 1 - Understand and Assess the Specification
Extract
- the product goal
- primary users or actors
- business workflows
- key data entities
- integrations and external dependencies
- operational assumptions
- explicit non-functional requirements
- assumptions you must make because the specification is incomplete
If the specification is vague, say so clearly and track missing inputs in missing_information and inferred assumptions in assumptions.
Assess specification quality
Evaluate the specification itself as a document:
- completeness — are all required functional and non-functional details present?
- clarity — is the language precise and unambiguous?
- consistency — do sections agree with each other? are terms used uniformly?
- testability — can each requirement be verified with a concrete test?
Record specific gaps in spec_review.gaps and set score.spec to reflect how ready the specification is for implementation.
Ground specification-quality feedback in references/spec_review.md.
Step 2 - Business Logic Review
Review whether the specification defines correct and complete business behavior.
Evaluate
- core workflows, decision points, and state transitions
- invariants, policies, and domain rules
- permissions, ownership, and approval rules
- calculations, thresholds, eligibility rules, or pricing logic
- conflict resolution and exceptional flows
Flag
- contradictory business rules
- undefined edge cases
- unclear ownership of decisions
- workflows that can produce inconsistent state
- logic that cannot be validated from the spec as written
Ground business-rule feedback in references/business_logic_review.md.
Step 3 - Architecture Review
You are a senior architect.
Evaluate
- separation of concerns
- service/module boundaries
- data ownership and data flow
- synchronous vs asynchronous interactions
- failure isolation
- fit of chosen patterns to the problem
Look for
- leaking responsibilities between layers
- oversized components or god services
- tight coupling to infrastructure or vendors
- missing integration contracts
- architecture that blocks future change
Ground architecture feedback in references/architecture_review.md, using references/clean_code.md, references/design_patterns.md, and references/api_design_reference.md as supporting material.
Step 4 - Performance and Scalability Review
Review the specification for performance risks even if explicit performance requirements are missing.
Evaluate
- latency-sensitive user journeys
- throughput assumptions
- expensive computations
- query patterns, batch size, and pagination
- concurrency, contention, and locking risks
- memory, storage, and growth assumptions
- opportunities for caching, pre-computation, queues, or asynchronous processing
Flag
- unbounded loops, scans, or fan-out operations
- N+1 style data access patterns
- chatty cross-service communication
- no strategy for spikes, retries, or backpressure
- missing SLOs, budgets, or performance acceptance criteria
Ground performance feedback in references/performance_review.md. For frontend-specific performance, use references/frontend_performance.md. For database-specific guidance, use references/database_performance.md.
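To make the N+1 flag concrete, the sketch below contrasts per-item lookups with one batched lookup. AuthorStore is a hypothetical in-memory stand-in for a database or remote service, and round_trips counts how many calls a real backend would receive. None of these names come from a specification under review.

```python
from typing import Dict, Iterable, List


class AuthorStore:
    """Hypothetical data source that counts round trips."""

    def __init__(self, authors: Dict[int, str]) -> None:
        self._authors = dict(authors)
        self.round_trips = 0

    def get_one(self, author_id: int) -> str:
        self.round_trips += 1
        return self._authors[author_id]

    def get_many(self, author_ids: Iterable[int]) -> Dict[int, str]:
        self.round_trips += 1
        return {author_id: self._authors[author_id] for author_id in set(author_ids)}


def author_names_n_plus_one(store: AuthorStore, posts: List[dict]) -> List[str]:
    # One lookup per post: round trips grow linearly with the number of posts.
    return [store.get_one(post["author_id"]) for post in posts]


def author_names_batched(store: AuthorStore, posts: List[dict]) -> List[str]:
    # One lookup for all posts, then an in-memory join.
    authors = store.get_many(post["author_id"] for post in posts)
    return [authors[post["author_id"]] for post in posts]
```

When a specification describes list views that enrich each row from another entity, ask which of these two shapes the design implies and whether a batch contract exists.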
Step 5 - Security Review
You are a senior security reviewer.
Evaluate with OWASP Top 10 and abuse-case thinking
- authentication and authorization boundaries
- data classification and exposure risks
- input validation and injection risks (SQL, XSS, command, SSRF, LDAP, XPath, header, log injection, XXE, SSTI)
- cryptographic requirements (algorithms, key management, randomness)
- secrets handling and credential management lifecycle
- tenant isolation or data partitioning
- logging, auditability, and incident response hooks
- external service trust boundaries
- session management (fixation, timeout, CSRF)
- rate limiting on sensitive endpoints (login, 2FA, recovery, email/SMS sending)
Assess secrets and credential management
- secret storage strategy (vault, KMS, environment variables)
- secret rotation requirements and schedule
- CI/CD and infrastructure secret handling (no secrets in images, build args, or env blocks)
- files that must never be committed (.env, *.pem, *.key, credentials.json)
- secret detection and prevention in development workflow (pre-commit hooks, CI gates)
- incident response procedure for exposed secrets
Ground secrets management assessment in references/secret_management_checklist.md.
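A CI gate or pre-commit hook for the never-commit list can be very small. The following sketch assumes the hook receives the list of staged file paths; forbidden_files is a hypothetical helper, and the patterns are exactly the four examples listed above, so variants such as .env.local would need their own pattern.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List

# The four example patterns from the checklist above.
FORBIDDEN_PATTERNS = (".env", "*.pem", "*.key", "credentials.json")


def forbidden_files(paths: Iterable[str]) -> List[str]:
    """Return the paths whose file name matches a never-commit pattern."""
    flagged = []
    for path in paths:
        name = PurePosixPath(path).name
        # fnmatchcase is case-sensitive on every platform, which keeps results stable.
        if any(fnmatchcase(name, pattern) for pattern in FORBIDDEN_PATTERNS):
            flagged.append(path)
    return flagged
```

File-name matching complements content-based secret scanning and does not replace it, since a secret can sit in any file.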
Evaluate data flow security
- trace user-controlled input from entry points to data sinks (queries, commands, templates, file paths, outbound requests)
- identify trust boundaries between components, services, and external systems
- check for second-order vulnerabilities: data stored safely but used unsafely later
- verify that validation happens at trust boundaries, not just at the UI layer
Flag
- privilege escalation paths
- missing access control rules
- insecure defaults
- vague data retention or deletion rules
- integrity failures in workflow approvals or callbacks
- missing controls for misuse and abuse
- BOLA/IDOR risks (resource access by ID without ownership verification)
- JWT weaknesses (algorithm confusion, missing expiry, insecure storage)
- mass assignment or parameter pollution risks
- missing rate limiting on authentication or expensive operations
- predictable resource identifiers (sequential numeric IDs)
- race conditions in financial or state-changing operations
- sensitive data in logs, error messages, or API responses
Map findings to OWASP categories where relevant.
Ground security feedback in references/owasp_top10.md, references/security_vulnerability_patterns.md, references/language_security_patterns.md (for stack-specific patterns identified in Step 0), and references/threat_modeling_guide.md (for STRIDE, attack trees, and trust boundary analysis).
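As an illustration of the BOLA/IDOR flag, the sketch below shows the ownership check a specification should require whenever a resource is fetched by ID. The invoice dictionary, the owner_id field, and the NotFound exception are hypothetical names chosen for the example.

```python
from typing import Dict


class NotFound(Exception):
    """Raised for both missing and unauthorized resources, so callers cannot probe which IDs exist."""


def get_invoice(invoices: Dict[str, dict], invoice_id: str, requester_id: str) -> dict:
    """Return the invoice only if the requester owns it."""
    invoice = invoices.get(invoice_id)
    # Authenticating the caller is not enough: ownership is verified per resource.
    if invoice is None or invoice["owner_id"] != requester_id:
        raise NotFound(invoice_id)
    return invoice
```

If the specification defines endpoints such as "get X by id" without stating who may read which X, record it as a missing access control rule.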
Step 6 - Testing Review
You are a senior QA engineer.
Evaluate Test Strategy
Is there a clear test pyramid?
- unit > integration > e2e
Are business-critical paths covered?
Are non-functional concerns testable?
Can failures be reproduced deterministically?
Evaluate Test Quality
Apply best practices:
- deterministic tests
- fast unit tests
- isolated tests with no hidden shared state
- behavior-oriented assertions
- clear naming and setup
Detect anti-patterns:
- over-reliance on E2E tests
- no unit tests for critical logic
- brittle assertions
- weak negative-path coverage
- no performance or security validation for high-risk areas
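One common source of non-deterministic tests is a hidden dependency on the system clock. The sketch below shows the usual remedy, passing the clock in as a parameter, under the assumption of a hypothetical token-expiry rule; is_expired and utc_now are illustrative names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def is_expired(
    issued_at: datetime,
    ttl: timedelta,
    now: Callable[[], datetime] = utc_now,
) -> bool:
    """A token is expired once its full time-to-live has elapsed."""
    return now() - issued_at >= ttl


def test_token_expires_exactly_at_ttl() -> None:
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)

    # The test controls time, so the result never depends on when it runs.
    def fixed_clock() -> datetime:
        return issued + timedelta(minutes=30)

    assert is_expired(issued, timedelta(minutes=30), now=fixed_clock)
    assert not is_expired(issued, timedelta(minutes=31), now=fixed_clock)
```

The same pattern applies to randomness, network calls, and other ambient state: if the specification leaves no seam for injecting them, flag the requirement as hard to test deterministically.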
Coverage Analysis
Estimate:
- business workflow coverage
- edge-case coverage
- security coverage
- operational coverage
Identify:
- missing scenarios
- high-risk untested paths
- automation gaps
Ground testing feedback in references/testing_best_practices.md.
Step 7 - DevOps / CI / CD / Operability Review
Review whether the specification can be delivered and operated safely.
Evaluate
- CI validation gates
- release strategy and rollout safety
- environment promotion model
- configuration and secrets management
- observability requirements: logs, metrics, traces, alerts
- backup, recovery, rollback, and disaster readiness
- supportability for on-call and incident triage
Flag
- no deploy strategy for risky changes
- no rollback or migration safety
- missing smoke tests or health checks
- no observability for critical paths
- environment-specific behavior without control strategy
Ground DevOps and operability feedback in references/devops_ci_cd.md and references/observability_reference.md.
Step 8 - Dependency Review
Review external and internal dependencies as design risks.
Evaluate
- critical libraries, services, and third-party platforms
- versioning strategy and lockfile integrity
- upgrade path and compatibility risk
- lock-in or vendor dependency
- package trust and supply-chain exposure
- blast radius if a dependency degrades or disappears
- ecosystem-specific vulnerability history (check against references/vulnerable_packages_watchlist.md for the identified technology stack)
Assess supply-chain risks
- dependency scanning in CI/CD (automated audit gates)
- policy for evaluating and approving new dependencies
- transitive dependency tree size and risk concentration
- typosquatting indicators (names one character off from popular packages, forks from unknown publishers, recently transferred packages)
- deprecated or end-of-life dependencies still in use
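The "one character off" typosquatting indicator can be approximated with an edit-distance check. This is a rough sketch: the list of popular package names is a hypothetical input that the reviewer would have to supply, transposed letters are not detected, and a match is only a prompt for manual inspection.

```python
from typing import Iterable, List, Tuple


def is_one_edit_away(a: str, b: str) -> bool:
    """True if a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter (or equal-length) string
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: advance both strings
        j += 1      # insertion into a: advance only the longer string
    # An unmatched trailing character in b counts as one more edit.
    edits += len(b) - j
    return edits == 1


def suspicious_names(declared: Iterable[str], popular: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each declared dependency with the popular name it nearly matches."""
    popular_set = set(popular)
    hits = []
    for name in declared:
        if name in popular_set:
            continue
        for known in sorted(popular_set):
            if is_one_edit_away(name, known):
                hits.append((name, known))
    return hits
```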
Flag
- transitive risk concentrated in one component
- no isolation layer around critical providers
- no fallback or degradation strategy
- use of immature or unmaintained dependencies
- dependencies with known critical CVEs in the specified or implied version range
- no dependency scanning or audit gate in the CI pipeline
- missing lockfile integrity verification
Ground dependency feedback in references/dependency_review.md, references/dependency_management_guide.md, and references/vulnerable_packages_watchlist.md.
Step 9 - Standards and Norms Review
Review alignment with explicit and implicit standards.
Evaluate
- domain standards named in the specification
- API and contract conventions
- accessibility, privacy, and security expectations
- internal naming, versioning, and compatibility norms
- documentation or audit requirements imposed by the domain
Flag
- requirements that conflict with known standards
- missing acceptance criteria for compliance-sensitive areas
- inconsistent terminology or contract design
If a standard is inferred rather than stated, make that assumption explicit.
Ground standards feedback in references/standards_and_norms.md.
Step 10 - UX Review
Review the user experience defined by the specification.
Evaluate
- clarity of the primary user journeys
- user feedback for success, failure, and long-running actions
- validation messages and recovery flows
- accessibility and inclusive design expectations
- consistency between flows and terminology
- empty states, loading states, and degraded states
Flag
- flows that leave users uncertain about state
- ambiguous error handling
- inaccessible interaction patterns
- operationally correct but confusing UX
Ground UX feedback in references/ux_review.md.
Step 11 - Documentation Review
Review whether the specification enables implementation and operations.
Evaluate
- completeness of functional requirements
- definition of terms and domain language
- diagrams, contracts, examples, and acceptance criteria
- operational runbooks and troubleshooting expectations
- migration notes, rollout guidance, and support instructions
Flag
- undefined terms
- missing examples or payloads
- no acceptance criteria
- missing rollout or support documentation for operationally sensitive changes
Ground documentation feedback in references/documentation_review.md.
Step 12 - Code Quality, Maintainability, and Evolvability Review
Review how the proposed design will affect implementation quality over time.
Evaluate code quality risks
- complexity of critical logic
- duplication risk
- cohesion and modularity
- readability of the likely implementation path
- enforceability of contracts and invariants
Evaluate maintainability and evolvability
- change surface of likely enhancements
- extensibility for foreseeable variants
- compatibility impact of future changes
- migration burden
- technical debt that the design would create immediately
Flag
- designs that force duplication
- hidden coupling between business logic and infrastructure
- assumptions that make future evolution expensive
- areas where a small requirement change would trigger broad rewrites
Ground code-quality and maintainability feedback in references/code_quality_maintainability.md, with references/clean_code.md, references/design_patterns.md, references/code_metrics_reference.md, and references/refactoring_catalog.md as supporting material.
Step 13 - Test Plan
Generate:
Unit Tests
- per rule, calculation, and transformation
- include mocks/stubs only where isolation adds value
Integration Tests
- API contracts
- persistence and data access
- third-party integrations
- queues, jobs, events, or callbacks
Contract Tests
- consumer/provider compatibility
- backward compatibility for APIs and events
- schema evolution safety
E2E Tests
- critical user journeys only
- role-based and permission-sensitive paths
Performance Tests
- load scenarios
- concurrency scenarios
- spike behavior
- scalability assumptions
Security Tests
- injection attempts
- auth bypass
- privilege escalation
- data exposure
Operability Tests
- deployment smoke checks
- health checks
- rollback validation
- degraded dependency scenarios
Edge Cases
- null or missing data
- boundary values
- duplicate requests
- retries and partial failures
- race conditions and ordering issues
Ground test plan generation in references/testing_best_practices.md.
Step 14 - Task Breakdown
Create actionable tasks grouped into epics.
Include tasks for:
- business-rule clarification
- architecture and integration design
- performance hardening
- security controls
- test implementation
- CI/CD and observability
- dependency management
- UX/documentation improvements
Tasks should be implementation-oriented, prioritized by risk, and traceable to review findings and risk_register entries. Use addresses to reference risk_register ids (e.g., risk-1) or review section names where the need was identified.
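Because depends_on lists task titles, the task list can be validated and ordered automatically. The sketch below assumes tasks.items has been parsed into a list of dicts and uses the standard-library graphlib module (Python 3.9 or later); execution_order is a hypothetical helper name.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(items: List[dict]) -> List[str]:
    """Return task titles in an order that satisfies every depends_on entry.

    Raises ValueError for a dependency on an unknown title. A dependency cycle
    raises graphlib.CycleError, which is a subclass of ValueError.
    """
    titles = {item["title"] for item in items}
    graph: Dict[str, Set[str]] = {}
    for item in items:
        deps = item.get("depends_on") or []
        unknown = [dep for dep in deps if dep not in titles]
        if unknown:
            raise ValueError(f"{item['title']!r} depends on unknown tasks: {unknown}")
        graph[item["title"]] = set(deps)
    return list(TopologicalSorter(graph).static_order())
```

Running a check like this before publishing the review catches misspelled titles and circular dependencies, both of which make a task breakdown hard to plan from.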
Step 15 - Self-Verification Pass
Before producing the final output, re-examine every finding and risk.
For each issue
- Re-read the relevant specification section with fresh eyes
- Ask: "Is this actually a gap, or did I miss context elsewhere in the spec?"
- Check if another section of the spec already addresses the concern
- Verify the severity is justified — downgrade or discard findings that are not genuine gaps
- Assign a final confidence rating: high, medium, or low
Confidence ratings guide
| Confidence | When to use |
|---|---|
| high | The gap is unambiguous. The spec clearly lacks the required control or definition. |
| medium | The gap likely exists but depends on context not fully visible in the spec (e.g., handled by an external system or convention). |
| low | Suspicious pattern but could be a false positive. Flag for author clarification. |
For each risk register entry
- Verify the trigger is specific and actionable
- Verify the mitigation is concrete, not generic
- Remove duplicate risks that are better captured as issues
Final checks
- Ensure every entry in business_logic_review.edge_cases has a corresponding entry in test_plan.edge_cases (or is explicitly noted as out of scope)
- Ensure every CRITICAL or HIGH issue has a concrete, actionable recommendation
- Verify that tasks trace back to risk_register entries or review findings via addresses
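Two of the final checks lend themselves to automation: recommendations on critical or high issues, and traceability of tasks through addresses. The sketch below assumes the review is available as a parsed dict and that risk ids follow the risk-{n} pattern from the output format. final_check_problems is a hypothetical helper; it cannot judge whether a recommendation is actually concrete, only whether one is present.

```python
import re
from typing import List

RISK_ID = re.compile(r"risk-\d+")


def final_check_problems(review: dict) -> List[str]:
    """List mechanical violations of the final checks."""
    problems = []

    for issue in review.get("issues") or []:
        severity = str(issue.get("severity") or "").lower()
        recommendation = str(issue.get("recommendation") or "").strip()
        if severity in {"critical", "high"} and not recommendation:
            problems.append(f"issue {issue.get('title')!r} is {severity} but has no recommendation")

    known_risks = {risk.get("id") for risk in review.get("risk_register") or []}
    tasks = review.get("tasks") or {}
    for group in ("epics", "items"):
        for entry in tasks.get(group) or []:
            addresses = entry.get("addresses") or []
            # Assumption: `addresses` may be a single string or a list of strings.
            if isinstance(addresses, str):
                addresses = [addresses]
            if not addresses:
                problems.append(f"tasks.{group} entry {entry.get('title')!r} has no addresses")
            for ref in addresses:
                # Only risk ids are verified; issue titles and section names pass through.
                if RISK_ID.fullmatch(ref) and ref not in known_risks:
                    problems.append(f"tasks.{group} entry {entry.get('title')!r} references unknown {ref}")
    return problems
```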
Behavior Rules
- be critical about missing information and contradictions
- prioritize high-impact risks over cosmetic concerns
- prefer precise, evidence-based findings over broad statements
- call out assumptions explicitly
- distinguish business-logic problems from implementation-detail problems
- treat operability, security, and maintainability as first-class review dimensions
- do not hide uncertainty — state what cannot be assessed from the current specification
- populate the risk_register with every material risk surfaced during Steps 1–12; each entry must have a severity, likelihood, trigger, and mitigation
- assign a confidence rating (high, medium, low) to every issue; never omit confidence, because it helps engineers prioritize review effort
- perform the self-verification pass (Step 15) before producing final output; downgrade or discard findings that do not survive re-examination
- when reviewing a specification that names a specific technology stack, load the relevant patterns from references/language_security_patterns.md and references/vulnerable_packages_watchlist.md to ground stack-specific findings
- when a review dimension does not apply to the specification (e.g., UX for a pure backend library), set its review section fields to null or [], set its score to null, and add a brief note in the section explaining why the dimension was skipped
- if the same problem is relevant to multiple review dimensions, file it as a single issue under the most specific category and cross-reference the affected dimensions in the description
- read and respect existing project coding guidelines and instructions (e.g., .github/instructions/*.md, .github/copilot-instructions.md) when they are available; factor them into review findings