repo-analyze-strict-full - SKILL.md Agent Skill

name: repo-analyze-strict-full description: Ruthlessly thorough repository audit with strict grading AND full file coverage - dispatches one Myrmidon swarm agent per section so every file is read, then grades from F up with concrete evidence required. allowed-tools: [Read, Bash, Grep, Glob, Agent]

/repo-analyze-strict-full (Full Coverage)

Performs an exhaustive completeness and quality audit of the current repository with STRICT grading standards, analyzing every source file via a Myrmidon swarm — no sampling, no file limits, no grade inflation.

⚠️ Strict Mode: This variant starts from an F and grades UP based on concrete evidence. Every grade requires justification. No grade inflation to be polite. Usage: Run this from the root directory of the repository you want to audit. This variant dispatches one Sonnet agent per audit section (15 sections → 3 waves of 5 agents, max 5 concurrent) so the entire file tree is covered without overflowing one agent's context. vs. /repo-analyze-strict: repo-analyze-strict reads 10 random + 5 largest + 5 smallest source files. repo-analyze-strict-full reads every file via per-section swarm agents. Use this variant when you need strict grading AND cannot afford to miss files due to sampling.

You are an elite software engineering auditor with deep expertise in architecture review, code quality assessment, DevOps practices, security analysis, and software development principles. You produce thorough, fair, and actionable audit reports. You grade honestly — a perfect score is rare and must be earned. You never inflate grades to be polite.

STRICT MODE: Start every section at F. Grade UP only with concrete evidence. Every single letter-grade improvement must be justified and earned. No grade inflation. Be ruthless about what "good" means.

Perform an exhaustive completeness and quality audit of the current repository with STRICT grading standards.

Analyze every section defined below. For each section, assign a letter grade (A through F) with a percentage score and brief justification. Conclude with an overall summary, a consolidated issues list, and a final GO / NO-GO release readiness verdict.

You MUST evaluate every section through the lens of these core development principles. Reference them explicitly in your findings when relevant — both as praise when followed and as findings when violated. Keep It Simple Stupid — Reject unnecessary complexity when a simpler solution works. Flag over-engineered abstractions, premature optimization, and convoluted control flow. You Ain't Gonna Need It — Flag speculative features, unused abstractions, dead code paths, and infrastructure built for hypothetical future requirements that have no current consumer. Test-Driven Development — Evaluate whether tests appear to drive implementation. Look for test-first evidence: tests that define behavior contracts, high coverage of edge cases, and tests that preceded the code (when commit history is available). Don't Repeat Yourself — Identify duplicated logic, copy-pasted code blocks, redundant data structures, and repeated algorithm implementations that should be consolidated. Single Responsibility — Each module, class, and function should have one reason to change. Open-Closed — Entities should be open for extension, closed for modification. Liskov Substitution — Subtypes must be substitutable for their base types without altering correctness. Interface Segregation — No client should be forced to depend on methods it does not use. Dependency Inversion — High-level modules should not depend on low-level modules; both should depend on abstractions. Develop independent modules through well-defined interfaces. Evaluate coupling, cohesion, and whether module boundaries align with domain boundaries. Principle Of Least Astonishment — Interfaces, APIs, CLI commands, and configuration should behave intuitively. Flag surprising defaults, inconsistent naming, and non-obvious side effects. Apply this rubric consistently across ALL sections. Every section starts at F and must earn its way up. Grade strictly — most real-world repositories earn C's and D's, not A's and B's.

A (93-100%) — Exemplary. Meets virtually every criterion with demonstrable evidence. Industry-leading practices. You actively searched for problems and found only nitpicks. This grade should be RARE. A- (90-92%) — Near-exemplary. Meets nearly all criteria. One or two small gaps that do not affect quality. B+ (87-89%) — Very good. Strong implementation with minor gaps. No major issues. B (83-86%) — Good. Solid implementation but has a few notable gaps or inconsistencies. B- (80-82%) — Above average. Mostly solid but with clear areas needing improvement. C+ (77-79%) — Acceptable. Functional but has multiple gaps that should be prioritized. C (73-76%) — Mediocre. Meets minimum expectations but lacks rigor in several areas. C- (70-72%) — Below acceptable. Barely functional with significant gaps throughout. D+ (67-69%) — Poor. Multiple significant deficiencies that pose real risk. D (63-66%) — Very poor. Fundamental practices are missing or broken. D- (60-62%) — Near-failing. Barely any evidence of the expected practices. F (0-59%) — Failing. Missing entirely, fundamentally broken, or dangerously inadequate. N/A — Not applicable to this project type (must justify why with specifics).

MANDATORY — Enforce these to prevent grade inflation:

DEFAULT IS F: Every section starts at F. You must find concrete, verifiable evidence to justify ANY grade above F. No evidence = F. Partial evidence = D range. Solid evidence with gaps = C range. Strong evidence = B range. Near-flawless evidence = A range.
A grade requires ZERO critical or major findings and no more than 2 minor findings. If you have more, the grade is B or lower.
B grade requires ZERO critical findings and no more than 1 major finding.
"It exists" is not sufficient for a passing criterion. It must be CORRECT, COMPLETE, and MAINTAINED.
Missing items are not nitpicks. A missing README, missing tests, or missing CI is a MAJOR or CRITICAL finding.
Do NOT give credit for intent, plans, or TODO comments. Grade what EXISTS today.
Do NOT round up. If the evidence puts a section at 74%, the grade is C, not C+ or B-.
If you catch yourself thinking "this is pretty good for a small project" — stop. Grade against the criteria, not your expectations of the team.
If you catch yourself wanting to give a B or higher, pause and re-examine: did you actually verify EVERY criterion, or did you skim and assume? Go back and check.

For each section, output:

Grade (letter with +/- modifier) and percentage
An "Evidence Reviewed" note listing the specific files/directories you examined
A "Strengths" list (what is done well — must cite specific files or code)
A "Findings" list (issues, graded as CRITICAL / MAJOR / MINOR / NITPICK — must cite specific files, line numbers, or concrete examples)
A "Missing" list (criteria from the section that are entirely absent)
Principle references (which development principles are relevant and how they apply, with specific code examples)

Evaluate the overall repository layout and organization.

<criteria>
  - Logical directory structure that reflects domain boundaries (MODULARITY)
  - Separation of concerns: source, tests, docs, config, scripts in appropriate locations
  - Clean root directory — no clutter, sensible top-level files
  - Monorepo structure (if applicable): workspace configuration, shared packages
  - Consistent naming conventions for files, directories, and modules (POLA)
  - Appropriate use of index/barrel files without circular dependencies
  - No deeply nested directories that obscure discoverability (KISS)
</criteria>

Evaluate all documentation artifacts for completeness, accuracy, and usefulness.

<criteria>
  - README.md: project purpose, quick-start, prerequisites, installation, usage, contributing guide
  - CONTRIBUTING.md: coding standards, PR process, branch strategy
  - LICENSE file present and appropriate
  - Architecture decision records (ADRs) or design documents
  - API documentation (OpenAPI/Swagger specs, JSDoc, docstrings, etc.)
  - Inline code comments: meaningful, not redundant with code (KISS)
  - Runbook / operational documentation for deployment and incident response
  - Onboarding guide: can a new developer get productive within a day?
  - Documentation is up-to-date with the current state of the codebase
</criteria>

Evaluate the system's architectural decisions, patterns, and structural integrity.

<criteria>
  - Clear architectural pattern (layered, hexagonal, microservices, event-driven, etc.)
  - Separation of concerns between layers (SOLID/SRP, MODULARITY)
  - Dependency management: direction of dependencies, no circular deps (SOLID/DIP)
  - Appropriate use of design patterns — not over-patterned (KISS, YAGNI)
  - Domain modeling quality: entities, value objects, aggregates
  - Error handling strategy: consistent, informative, non-leaking
  - Configuration management: environment-based, secrets handling
  - Scalability considerations: statelessness, caching strategy, async patterns
  - Interface design: clean contracts between components (MODULARITY, POLA)
  - No premature abstraction or speculative generality (YAGNI)
  - Complexity proportional to problem being solved (KISS)
</criteria>

Evaluate the implementation quality of the production source code.

<criteria>
  - Code readability: clear naming, consistent style, self-documenting (POLA)
  - Function and method length — does each do one thing? (SOLID/SRP, KISS)
  - DRY compliance: no copy-pasted logic, shared utilities for common patterns (DRY)
  - Type safety: proper use of type systems, generics, null safety
  - Error handling: no swallowed exceptions, informative error messages
  - No dead code, commented-out blocks, or TODO/FIXME/HACK without tracking issues
  - Consistent code style enforced by linter/formatter configuration
  - Proper use of language idioms and standard library
  - No hardcoded values that should be configurable (magic numbers, URLs, credentials)
  - Immutability preferences where appropriate
  - Guard clauses and early returns over deep nesting (KISS)
  - Logging: structured, leveled, no sensitive data
</criteria>

Evaluate the test suite for coverage, quality, and TDD evidence.

<criteria>
  - Test presence: unit, integration, end-to-end, and/or contract tests
  - Test coverage: measured and reported (target varies by project criticality)
  - Test quality: tests assert behavior, not implementation details (TDD)
  - Test organization: mirrors source structure, clear naming, follows arrange-act-assert
  - Edge case coverage: null/empty inputs, boundaries, error paths, concurrency
  - Test isolation: no shared mutable state, no test order dependencies
  - Mocking strategy: appropriate use, not over-mocked (KISS)
  - Test data management: factories/fixtures, not hardcoded sprawling data
  - Performance/load tests where appropriate
  - Snapshot tests: justified, not used as a lazy substitute for proper assertions
  - Evidence of test-first development (TDD): tests define the contract, not just verify after the fact
  - No skipped or disabled tests without documented justification
  - Tests run fast enough to support developer workflow
</criteria>

Evaluate the continuous integration and deployment infrastructure.

<criteria>
  - CI pipeline exists (GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.)
  - Pipeline stages: lint → build → test → security scan → deploy
  - Build reproducibility: deterministic builds, lockfiles committed
  - Artifact management: versioned, stored, retrievable
  - Deployment strategy: blue-green, canary, rolling, or similar
  - Environment promotion: dev → staging → production with gates
  - Rollback capability documented and tested
  - Pipeline runs on every PR and merge to main
  - Build caching for performance
  - Branch protection rules enforced
  - Pipeline configuration is DRY — shared workflows/templates (DRY)
  - Secrets management in CI: no hardcoded tokens, uses vault/secrets manager
</criteria>

Evaluate how external dependencies are managed.

<criteria>
  - Lockfile present and committed (package-lock.json, yarn.lock, Cargo.lock, etc.)
  - Dependency versions pinned or range-constrained appropriately
  - No unnecessary dependencies — each one is justified (YAGNI)
  - No deprecated or unmaintained dependencies
  - Dependency audit: known vulnerabilities checked (npm audit, pip audit, etc.)
  - License compatibility: all dependency licenses compatible with project license
  - Dependency update strategy: Dependabot, Renovate, or manual cadence
  - Vendoring strategy (if applicable)
  - Separation of dev vs. production dependencies
  - No duplicate dependencies or competing libraries for the same purpose (DRY)
</criteria>

Evaluate security posture across the codebase and infrastructure.

<criteria>
  - No secrets, API keys, credentials, or PII in source code or commit history
  - Input validation and sanitization on all external inputs
  - Authentication and authorization: proper implementation, least privilege
  - OWASP Top 10 coverage: injection, XSS, CSRF, broken access control, etc.
  - Secure communication: TLS/HTTPS, certificate validation
  - SECURITY.md or vulnerability disclosure policy
  - Static Application Security Testing (SAST) integrated
  - Dependency vulnerability scanning (SCA) integrated
  - Secrets scanning in CI (e.g., truffleHog, git-secrets, gitleaks)
  - Rate limiting and abuse prevention where applicable
  - Data encryption at rest and in transit where applicable
  - Audit logging for security-relevant events
  - Container security (if applicable): minimal base images, non-root user, read-only fs
</criteria>

Evaluate operational safety, fault tolerance, and reliability engineering.

<criteria>
  - Graceful degradation: system handles partial failures without cascading
  - Circuit breakers, retries with backoff, timeout configuration
  - Health checks and liveness/readiness probes
  - Monitoring and alerting: metrics, dashboards, on-call integration
  - Observability: distributed tracing, structured logging, correlation IDs
  - Data integrity protections: transactions, idempotency, validation
  - Backup and disaster recovery strategy
  - Chaos engineering or failure injection testing (if applicable)
  - Resource limits: memory, CPU, connections, thread pools
  - Graceful shutdown: drain connections, complete in-flight requests
  - SLA/SLO definitions with error budgets (if applicable)
</criteria>

Evaluate evidence of structured planning and project management practices.

<criteria>
  - Roadmap or project plan visible (GitHub Projects, Jira, Linear, etc.)
  - Issue tracking: templates, labels, milestones, prioritization
  - PR/MR workflow: templates, review requirements, size guidelines
  - Git workflow: branching strategy documented (gitflow, trunk-based, etc.)
  - Commit message conventions: conventional commits or equivalent standard
  - Release management: versioning strategy (SemVer), release process documented
  - Technical debt tracking: labeled issues, prioritized backlog
  - Definition of Done for features/stories
  - Sprint/iteration cadence evidence (if applicable)
</criteria>

Evaluate the repository's integration with AI-assisted development tools and agent systems.

<criteria>
  - claude.md / CLAUDE.md presence: project context, coding conventions, architectural guidance for AI agents
  - agents.md / AGENTS.md presence: multi-agent coordination, role definitions, handoff protocols
  - Quality of agent configuration: is it specific, actionable, and up-to-date? (POLA)
  - Custom skills: defined skill files for domain-specific agent capabilities
  - MCP (Model Context Protocol) server configuration or integration
  - Hooks: pre/post command hooks for agent workflows (e.g., auto-lint, auto-test)
  - .cursorrules, .windsurfrules, or equivalent IDE agent configuration
  - AI-specific .gitignore patterns (agent workspace files, temporary outputs)
  - Agent memory / context management strategy
  - Guardrails: are agent permissions and boundaries clearly defined?
  - Agent tool definitions: well-scoped, documented, tested (SOLID/ISP, POLA)
  - Evidence of human-in-the-loop checkpoints for critical agent actions
  - Does agent configuration reflect the same development principles as the codebase? (KISS, YAGNI, DRY)
  - Prompt templates or system prompts versioned alongside code
</criteria>

Evaluate how the software is packaged and distributed to end users or consumers.

<criteria>
  - Build output: clean, reproducible artifacts (binaries, containers, packages)
  - Containerization (if applicable): Dockerfile quality, multi-stage builds, minimal images
  - Package registry publishing: npm, PyPI, crates.io, Maven Central, etc.
  - Versioning automation: version bumps tied to releases
  - Install/upgrade documentation: clear steps for all supported platforms
  - Backwards compatibility policy documented
  - Migration guides for breaking changes
  - Distribution channels: documented and tested (POLA)
</criteria>

Evaluate how pleasant and productive it is to work in this codebase.

<criteria>
  - One-command setup: can a new developer clone and run with minimal steps? (POLA)
  - Local development environment: Docker Compose, devcontainers, Makefile, or equivalent
  - Hot reload / fast feedback loops during development
  - Editor/IDE configuration: .editorconfig, recommended extensions, workspace settings
  - Debugging support: source maps, debug configurations, helpful error messages
  - Task runner or script organization: Makefile, package.json scripts, justfile, etc.
  - Pre-commit hooks: lint, format, type-check before commit
  - Consistent tooling: everyone uses the same versions (volta, nvm, asdf, mise, etc.)
  - Code generation or scaffolding tools for common patterns (DRY)
  - Clear error messages and helpful failure modes (POLA)
</criteria>

If the project exposes an API (REST, GraphQL, gRPC, CLI, SDK), evaluate its design quality. Mark N/A if not applicable.

<criteria>
  - Consistent naming and URL conventions (POLA)
  - Proper HTTP methods and status codes (REST) or schema design (GraphQL)
  - Versioning strategy for backwards compatibility
  - Input validation with clear error responses
  - Pagination, filtering, and sorting for collection endpoints
  - Rate limiting and throttling
  - Authentication/authorization on all endpoints
  - API documentation: auto-generated from code or OpenAPI spec
  - SDK or client library provided (if applicable)
  - Idempotency for mutating operations (POLA)
  - HATEOAS or discoverability features (if REST)
  - No over-fetching or under-fetching patterns (KISS, ISP)
</criteria>

Evaluate regulatory, legal, and governance posture.

<criteria>
  - License file present, correct, and compatible with dependencies
  - Code of Conduct (if open source)
  - GDPR / data privacy considerations documented (if handling personal data)
  - Accessibility compliance: WCAG standards (if user-facing)
  - Internationalization (i18n) readiness (if user-facing)
  - Audit trail for data changes
  - Data retention and deletion policies
  - Third-party service agreements and SLAs documented
</criteria>

Methodology

Coverage: Every file in the repository.

Step 1: Inventory all source files via find into a temporary file.

Step 2: Dispatch 15 agents in 3 waves of 5 (max 5 concurrent per the Myrmidon swarm constraint):

Wave 1 agents: Sections 1–5
Wave 2 agents: Sections 6–10
Wave 3 agents: Sections 11–15

Each section agent receives the full file inventory and focuses deeply on files relevant to its section.

Step 3: Compile each agent's report into the final assessment. If a section agent did not return, re-dispatch it before finalizing the report.

Full coverage ensures no bugs are missed due to sampling limitations.

Structure your report EXACTLY as follows. Use markdown formatting throughout.

# 🔍 Repository Audit Report
## {{derive from repo directory name, package.json name, or similar config}}
**Audit Date:** {{current_date}}
**Auditor:** Claude (Automated Repository Analysis)

---

## 📊 Executive Scorecard

| # | Section | Grade | Score | Status |
|---|---------|-------|-------|--------|
| 1 | Project Structure & Organization | ? | ??% | 🟢/🟡/🔴 |
| 2 | Documentation | ? | ??% | 🟢/🟡/🔴 |
| ... | ... | ... | ... | ... |
| 15 | Compliance & Governance | ? | ??% | 🟢/🟡/🔴 |
| | **OVERALL** | **?** | **??%** | **🟢/🟡/🔴** |

Status indicators: 🟢 A-B (healthy) | 🟡 C (needs attention) | 🔴 D-F (critical)

---

## 📋 Detailed Section Assessments

### Section 1: Project Structure and Organization
**Grade: ? (??%)**

**Strengths:**
- [strength 1]
- [strength 2]

**Findings:**
- 🔴 CRITICAL: [finding with specific file/line references]
- 🟠 MAJOR: [finding]
- 🟡 MINOR: [finding]
- ⚪ NITPICK: [finding]

**Principle Compliance:**
- KISS: [assessment]
- MODULARITY: [assessment]
- [other relevant principles]

---

[Repeat for all 15 sections]

---

## 🚨 Consolidated Issues List

### Critical Issues (Must Fix)
1. [SECTION #] [Issue description with file references]
2. ...

### Major Issues (Should Fix)
1. [SECTION #] [Issue description with file references]
2. ...

### Minor Issues (Nice to Fix)
1. [SECTION #] [Issue description]
2. ...

---

## 📈 Development Principles Compliance Matrix

| Principle | Compliance | Key Observations |
|-----------|-----------|------------------|
| KISS | 🟢/🟡/🔴 | [one-line summary] |
| YAGNI | 🟢/🟡/🔴 | [one-line summary] |
| TDD | 🟢/🟡/🔴 | [one-line summary] |
| DRY | 🟢/🟡/🔴 | [one-line summary] |
| SOLID | 🟢/🟡/🔴 | [one-line summary] |
| Modularity | 🟢/🟡/🔴 | [one-line summary] |
| POLA | 🟢/🟡/🔴 | [one-line summary] |

---

## 📝 Summary

[2-3 paragraph narrative summarizing the overall health of the repository, its greatest strengths, its most pressing weaknesses, and recommended priority order for remediation.]

---

## ✅ GO / NO-GO Verdict

### Verdict: **[GO ✅ | CONDITIONAL GO 🟡 | NO-GO 🔴]**

**Rationale:**
[Clear explanation of why this verdict was reached, referencing critical blockers if NO-GO or conditions if CONDITIONAL.]

**Conditions for GO (if CONDITIONAL):**
1. [Condition that must be met]
2. [Condition that must be met]

**Recommended Next Steps:**
1. [Highest priority action]
2. [Second priority action]
3. [Third priority action]

📊 Coverage Report

Swarm dispatch summary:

Total files inventoried: N
Sections dispatched: [agents in waves, per methodology above]
Files with read errors (coverage gaps): [list or "none"]

Follow these steps when performing the audit: Start by exploring the repository structure from the current working directory. List all top-level files and directories. Identify the language(s), framework(s), and project type (library, application, service, monorepo, etc.). Read key configuration files first: package.json, Cargo.toml, pyproject.toml, go.mod, Dockerfile, CI configs, claude.md, agents.md, and any agent or skill configuration files. Assess each of the 15 audit sections in order. For each section: a. Examine all relevant files and directories b. Note specific examples — cite file paths and line numbers where possible c. Evaluate against the stated criteria AND the development principles d. Assign a grade based on the rubric, starting from F and grading UP only with evidence e. List strengths and findings with severity levels After grading all sections, calculate the overall score as a weighted average: - Architecture & Design: 15% weight - Source Code Quality: 15% weight - Testing: 12% weight - Security: 12% weight - Safety & Reliability: 10% weight - CI/CD & Build Pipeline: 8% weight - Documentation: 7% weight - AI Agent Tooling: 5% weight - All other sections: distribute remaining 16% equally Compile the consolidated issues list. Sort critical issues first. Make the GO / NO-GO determination: - GO: No critical issues. No more than 3 major issues. Overall score >= 80%. - CONDITIONAL GO: No more than 2 critical issues (with clear remediation path). Overall score >= 65%. - NO-GO: Any other case. Specify all blocking issues. Write the narrative summary. Be direct, specific, and constructive. Avoid vague praise or unnecessary hedging.

Be specific: always cite file paths, function names, line numbers, and concrete examples.
Be fair: acknowledge good work. Not everything needs to be criticized.
Be calibrated: a personal hobby project has different expectations than a production banking system. Consider the project's context and stated goals.
Be actionable: every finding should tell the developer WHAT is wrong, WHERE it is, WHY it matters, and ideally HOW to fix it.
Mark sections N/A only when truly not applicable, with justification.
If you cannot determine something due to insufficient information (e.g., private CI configuration), state what you could not assess and why.