name: 09-qa
description: >
  Post-build quality assurance. Runs full lint, format, typecheck, security, dependency, data-staging, Modal workspace, and cross-file checks on the entire codebase. Writes docs/qa-report.md with blocking vs advisory findings. Runs in parallel with 10-e2e; results consumed by 11-verify-impl. Advisory remediation is a separate user-requested pass (spawn parallel agents), not part of the default 09 run.
09 — QA Checks
Final quality assurance pass on the complete codebase after the build is done.
Preamble: pipeline-preamble.md — shared conventions for stages 00–17.
Sessions: sessions-reference.md — requires active_session unless waived; reports under docs/sessions/{id}/reports/.
Cross-cutting: considerations.md, connectivity-gates.md.
State agent: workflow-state-manager — mandatory read/update.
Connectivity (stage 09)
Full-repo QA must report connectivity status in docs/qa-report.md:
- Blocking: H0c (test_cors_policy.py) and H0i (tests/integration) in Python test agent
- Staging: H4–H5 when VECINITA_STAGING_*_FRONTEND_URL set; else advisory with QA-ID pointing to connectivity-gates
Do not mark QA PASS if H0c fails. See connectivity-gates §Stage 09.
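The gate rules above can be sketched as a small classifier. This is only an illustration: the function names and the returned keys are hypothetical, and treating a single matching URL as enough to run H4–H5 is an assumption, since connectivity-gates.md is the authority on that.

```python
import fnmatch
from collections.abc import Mapping

# Pattern taken from the text above; helper names below are hypothetical.
STAGING_FRONTEND_PATTERN = "VECINITA_STAGING_*_FRONTEND_URL"


def staging_frontend_urls(env: Mapping[str, str]) -> dict[str, str]:
    """Return non-empty env vars whose names match the staging frontend pattern."""
    return {
        name: value
        for name, value in env.items()
        if value and fnmatch.fnmatchcase(name, STAGING_FRONTEND_PATTERN)
    }


def connectivity_summary(
    h0c_passed: bool, h0i_passed: bool, env: Mapping[str, str]
) -> dict[str, str]:
    """Classify the stage 09 connectivity gates for docs/qa-report.md."""
    staging = staging_frontend_urls(env)
    return {
        "H0c": "pass" if h0c_passed else "fail (blocking)",
        "H0i": "pass" if h0i_passed else "fail (blocking)",
        # Assumption: one matching URL is enough to run the live H4-H5 checks.
        "H4-H5": "run live checks" if staging else "advisory",
        # QA must not be marked PASS when a blocking gate fails.
        "qa_pass_allowed": "yes" if (h0c_passed and h0i_passed) else "no",
    }
```

Passing `os.environ` as `env` would classify the current shell; passing a plain dict keeps the logic easy to exercise without touching real staging variables.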
When to Use
- After 07-build completes (or when execution plan shows build substantially done): the "final exam" for anything that slipped past milestone 08-verify-build
- Runs in parallel with 10-e2e when both are scheduled
- Results collected by 11-verify-impl for user sign-off
Difference from 08-verify-build
| Aspect | 08-verify-build | 09-qa |
|---|---|---|
| When | During build, at milestones | After build completes |
| Scope | Changed files / milestone | Entire codebase |
| Auto-correct | Yes (lint/format) | Report only — no fixes |
| Blocking | Non-blocking for auto-fix | Async — summary for 11 |
| Extra checks | No | Cross-file, data staging, Modal workspace, frontends |
| Output | docs/verification-report.md (optional) | docs/qa-report.md (required) |
Prerequisites
- Build gate: Execution plan tasks for the active phase are completed (or user explicitly requests QA mid-build; note partial scope in the report).
- docs/execution-plan.md §Tech Stack Summary: tool commands.
- workflow-state.yaml §template: conformance checks (Vecinita: api+worker).
- Baseline expectation: 08-verify-build recently PASS (re-run full checks anyway; do not assume).
Session management
Per sessions-reference.md §10 and workflow-state-agent-protocol.md.
- Agent read_context must return active_session (or blocking deviation).
- Current stage must appear in active_session.routing_plan unless user amends plan.
- Write stage reports to active_session.artifacts_dir/reports/ when this stage produces a report.
- On completion: update routing-plan entry status; mirror project.stages.{key} via agent update.
- 00-context exempt from active_session requirement (session opener).
Report: reports/qa-report.md.
State management
Agent protocol: workflow-state-agent-protocol.md.
Stage key: stages.09-qa.
Invoke workflow-state-manager read_context before any other action; update after each
substep. Do not edit workflow-state.yaml directly.
Update §stages.09-qa:
09-qa:
  status: completed
  started_at: "<ISO date>"
  completed_at: "<ISO date>"
  report: docs/qa-report.md
  overall: pass | fail | pass_with_advisories
Append docs/qa-report.md to §artifacts when created.
Delta / feature-addition mode
- Scope QA report to affected Fn, apps, and journeys in the active evolve cycle.
- Do not re-audit entire codebase unless user requests full 09 pass.
Workflow
Phase 1 — Configuration
Read from disk (do not guess commands):
| Source | Use for |
|---|---|
| docs/execution-plan.md §Tech Stack Summary | ruff, basedpyright, pytest, pip-audit, vitest |
| docs/typing-policy.md | No Any/any (ADR-018) |
| .github/workflows/ci.yml | CI parity (exact paths, ignore files, frontend matrix) |
| docs/data-staging-state.md | D1–D7 asset status |
| workflow-state.yaml §template | Layout / Modal / deploy pattern |
| infra/modal/README.md | Modal app names, workspace, volumes |
Vecinita default commands (repo root, uv sync --group dev):
uv run ruff check apps packages tests
uv run ruff format --check apps packages tests
uv run basedpyright apps packages tests
uv run pytest tests/unit tests/integration tests/privacy tests/e2e tests/smoke tests/eval
uv run pip-audit # with audit/pip-audit-ignore.txt if present
bash scripts/check_secrets.sh
bash scripts/check_modal_no_database_url.sh
bash scripts/check_openapi_specs.sh
Frontends (each app dir):
cd apps/chat-rag-frontend && npm ci && npm run lint && npm test -- --run
cd apps/data-management-frontend && npm ci && npm run lint && npm test -- --run
Coverage gate (CI parity — run from repo root): make test-unit-coverage enforces the
95% frontend branch gate (and Python unit coverage); for a frontend-only change use
cd apps/<app> && npm run test:coverage (no DB needed). Plain vitest run does not
enforce coverage, so QA must run this to match CI's coverage job — a separate required CI
job distinct from the frontend matrix.
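To make the threshold concrete, the comparison behind a 95% branch gate can be sketched as below. This is not how `make test-unit-coverage` is implemented. It assumes a coverage report in the Istanbul `json-summary` shape (a `total.branches.pct` field), and the function names are hypothetical.

```python
import json
from typing import Optional

BRANCH_THRESHOLD = 95.0  # the frontend branch gate quoted above


def branch_pct(summary_json: str) -> Optional[float]:
    """Extract total branch coverage (percent) from a json-summary style report."""
    summary = json.loads(summary_json)
    pct = summary.get("total", {}).get("branches", {}).get("pct")
    # Some reporters emit a non-numeric placeholder when nothing was measured.
    if isinstance(pct, bool) or not isinstance(pct, (int, float)):
        return None
    return float(pct)


def branch_gate_passes(summary_json: str, threshold: float = BRANCH_THRESHOLD) -> bool:
    """Return True only when branch coverage is known and meets the threshold."""
    pct = branch_pct(summary_json)
    return pct is not None and pct >= threshold
```

A missing or non-numeric percentage is treated as a failure here, so an empty coverage run cannot silently pass the gate.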
Phase 2 — Run QA Checks
Prefer one message with parallel shell/tool work (or Task subagents) for independent checks. Do not block the whole QA on optional live Modal/staging unless env vars are set.
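Where the independent checks are driven from a script rather than from parallel tool calls, the fan-out could look like the sketch below. The runner functions are hypothetical; the command lists are copied from Phase 1 and should still be read from disk in a real run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_check(name: str, argv: list[str]) -> tuple[str, bool, str]:
    """Run one check; return (name, passed, combined output) without raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    except FileNotFoundError as exc:
        # A missing tool is reported as a failed check, not a crash.
        return name, False, f"command not found: {exc}"
    return name, proc.returncode == 0, proc.stdout + proc.stderr


def run_checks_parallel(checks: dict[str, list[str]]) -> dict[str, tuple[bool, str]]:
    """Run independent checks concurrently and collect their results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = [pool.submit(run_check, name, argv) for name, argv in checks.items()]
        return {name: (ok, out) for name, ok, out in (f.result() for f in futures)}


# Commands copied from Phase 1; defined here only as data, nothing is executed.
VECINITA_CHECKS = {
    "lint": ["uv", "run", "ruff", "check", "apps", "packages", "tests"],
    "format": ["uv", "run", "ruff", "format", "--check", "apps", "packages", "tests"],
    "typecheck": ["uv", "run", "basedpyright", "apps", "packages", "tests"],
}
```

Calling `run_checks_parallel(VECINITA_CHECKS)` from the repo root would return a pass flag and the verbatim tool output per check, which is the form the report needs for failures.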
Agent 1 — Linter (full codebase)
- uv run ruff check apps packages tests
- Optional: uv run ruff check --select F401,F841 for unused imports/vars
- Return: total issues by rule; PASS if zero
Agent 2 — Formatter
- uv run ruff format --check apps packages tests
- Return: count of files needing format; PASS if zero
Agent 3 — Typechecker
- uv run basedpyright apps packages tests
- Return: error/warning count; PASS if zero errors
Agent 4 — Test suite (Python + frontend)
Python: same pytest paths as CI (see Phase 1).
Frontend: Vitest + ESLint per app in CI matrix.
Return:
- Python: passed / failed / skipped (list skipped reasons if env-gated)
- Frontend: per-app pass/fail
- Coverage gate (make test-unit-coverage): pass/fail vs the 95% FE branch threshold
- Note non-blocking warnings (e.g. Pydantic/LlamaIndex validate_default)
PASS if zero failures and the coverage gate passes (skips are OK when documented).
Agent 5 — Security
Run four layers; classify each finding as blocking or advisory:
| Layer | Command / tool | Blocking when |
|---|---|---|
| CVEs | uv run pip-audit (+ ignore file) | High/critical on PyPI-resolved deps |
| Current tree secrets | bash scripts/check_secrets.sh | Any match in apps/, packages/, tests/, infra/, openapi/ |
| Working-tree gitleaks | gitleaks detect --no-git --config .gitleaks.toml | Any leak in current files (if gitleaks installed) |
| Dangerous patterns | ripgrep pickle.loads, eval(, exec( in apps/ + packages/ | Any match in app code |
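When ripgrep is unavailable, the dangerous-pattern layer can be approximated in Python. The sketch below is a stand-in, not the project's check: it uses a word-boundary regex, so it is slightly stricter than a literal search (for example, `ast.literal_eval(` is not flagged), and hits such as `model.eval()` still need manual review. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches calls to pickle.loads(, eval( and exec( that start at a word boundary.
DANGEROUS = re.compile(r"\b(pickle\.loads|eval|exec)\(")


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_call) for each dangerous call in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DANGEROUS.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(roots: list[Path]) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under the given roots; keep only files with hits."""
    findings = {}
    for root in roots:
        for path in sorted(root.rglob("*.py")):
            hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                findings[str(path)] = hits
    return findings
```

For the scope in the table, `scan_tree([Path("apps"), Path("packages")])` would be the call; any non-empty result counts as blocking.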
Git history (advisory only):
- gitleaks detect on full history may report dozens of hits in deleted legacy paths.
- Split results: count hits in current working tree vs history-only.
- Do not recommend git filter-repo unless user asks or real live credentials were committed.
- Document resolution in report → point to docs/security/gitleaks-resolution.md if it exists.
Expected non-blocking: workspace packages vecinita-* skipped by pip-audit (not on PyPI).
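The tree vs history split can be approximated from a gitleaks JSON report, as sketched below. It assumes each finding carries a `File` key (as in gitleaks v8 JSON reports) and uses "does the file still exist" as the test, which is only a heuristic: a file that still exists may have leaked only in an older revision. The `--no-git` scan remains the authoritative working-tree check. All function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def load_findings(report_path: Path) -> list[dict]:
    """Load a gitleaks JSON report (a list of finding objects)."""
    return json.loads(report_path.read_text(encoding="utf-8"))


def tree_predicate(repo_root: Path) -> Callable[[str], bool]:
    """Build a predicate that is True when a reported path exists in the working tree."""
    return lambda rel_path: (repo_root / rel_path).is_file()


def split_findings(
    findings: list[dict], in_tree: Callable[[str], bool]
) -> tuple[list[dict], list[dict]]:
    """Partition findings into (current tree, history-only) by file presence."""
    tree: list[dict] = []
    history: list[dict] = []
    for finding in findings:
        (tree if in_tree(finding["File"]) else history).append(finding)
    return tree, history
```

The two list lengths map onto the `[N] secrets (tree)` and `[N] history (advisory)` fields of the summary block.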
Agent 6 — Cross-file analysis
| Check | How | Severity |
|---|---|---|
| Unused imports | Ruff F401/F841 | Blocking if >0 |
| Circular deps | Workspace package import graph (apps/*, packages/*) | Blocking if cycles |
| Dead code | Optional (vulture); if not run, note SKIPPED | Advisory |
| Public docstrings | AST scan: public defs/classes without docstring in apps/, packages/ | Advisory (style) |
| Naming | Project rules / ruff N rules if enabled | Advisory |
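For the circular-dependency row, once the workspace import graph has been built (package name mapped to the set of workspace packages it imports), cycle detection is a depth-first search. The sketch below covers only that step; how the graph is extracted is left open, and the package names in the usage note are hypothetical.

```python
from typing import Optional


def find_cycle(graph: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one import cycle as a node list (first node repeated at the end), or None."""
    visiting: set[str] = set()   # nodes on the current DFS path
    done: set[str] = set()       # nodes fully explored with no cycle found
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        visiting.add(node)
        path.append(node)
        for dep in sorted(graph.get(node, ())):
            if dep in visiting:
                # Back edge: the cycle is the path from dep to here, closed on dep.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in sorted(graph):
        if start not in done:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"pkg_a": {"pkg_b"}, "pkg_b": {"pkg_a"}})` returns `["pkg_a", "pkg_b", "pkg_a"]`, which can be pasted into the report as the blocking finding.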
Agent 7 — Dependency health
- pip list --outdated: flag LlamaIndex/pinned stacks per docs/dependency-inventory.md
- Heuristic import vs pyproject.toml per workspace member; ignore __future__, workspace packages
- Return: outdated count (advisory if pins intentional), missing (blocking if real)
Agent 8 — Template & platform conformance (Vecinita api+worker)
Read template-registry.md and verify:
| Criterion | Vecinita expectation |
|---|---|
| Layout | apps/*, packages/*, tests/, openapi/, infra/ |
| Modal isolation | import modal only under infra/modal/ |
| Modal workspace | Deploy scripts use vecinita profile (see scripts/modal_ensure_workspace.sh) |
| Deploy URLs | https://vecinita--vecinita-* — not fontface-- or other workspace prefixes |
| DO backends | No DATABASE_URL in Modal worker paths (scripts/check_modal_no_database_url.sh) |
| OpenAPI | openapi/*.yaml present and parse (scripts/check_openapi_specs.sh) |
| CI | .github/workflows/ci.yml matches Phase 1 commands |
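The Modal isolation criterion lends itself to a mechanical check. The sketch below flags any file outside `infra/modal/` that imports `modal`; it takes a repo-relative POSIX path and the file's source, and the function names are hypothetical. It does not catch dynamic imports such as `importlib.import_module("modal")`.

```python
import ast
from pathlib import PurePosixPath

ALLOWED_ROOT = "infra/modal"  # from the Modal isolation row above


def imports_modal(source: str) -> bool:
    """Return True if the source imports the modal package or a submodule of it."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            modules = [node.module or ""]
        else:
            continue
        if any(m == "modal" or m.startswith("modal.") for m in modules):
            return True
    return False


def violates_modal_isolation(rel_path: str, source: str) -> bool:
    """Return True when a file outside infra/modal/ imports modal."""
    allowed = PurePosixPath(rel_path).is_relative_to(ALLOWED_ROOT)
    return imports_modal(source) and not allowed
```

The path test is component-based, so a sibling directory such as `infra/modal_extra/` is not treated as allowed.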
Modal live checks (advisory unless env set):
- If VECINITA_MODAL_EMBED_URL / VECINITA_MODAL_LLM_URL set: GET /health, embed dim 384
- Else: read docs/data-staging-state.md for D6/D7 status
Agent 9 — Data staging & deploy readiness (Vecinita)
Read docs/data-staging-state.md and docs/execution-plan.md Phase 4 gate:
| Asset / gate | Report |
|---|---|
| D1–D5 fixtures/migrations | Must be verified for full data QA PASS |
| D6 FastEmbed / D7 Qwen | verified \| staged_procedure \| pending |
| Phase 4 H1–H3 live | Advisory if VECINITA_STAGING_CHAT_URL unset — point to docs/staging-runbook.md, scripts/deploy/staging_smoke.sh |
| H0c CORS / connectivity | Blocking: pytest tests/unit/test_cors_policy.py must pass in CI |
| H4–H5 live | Advisory if staging frontend URLs unset — scripts/deploy/verify_connectivity.sh, connectivity-gates.md |
| Modal secret vecinita-data-management | Note if data_management_app deploy requires it |
Phase 3 — Compile results
Overall status:
- PASS — all blocking checks green; advisories may remain
- FAIL — any blocking check failed
- pass_with_advisories: use in workflow-state.yaml when blocking PASS but advisories need 11-verify-impl review
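The status rules above reduce to a short function. In this sketch the `CheckResult` type is hypothetical; the returned strings match the `overall` values used in workflow-state.yaml.

```python
from typing import NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool
    blocking: bool  # False means the check is advisory


def overall_status(results: list[CheckResult]) -> str:
    """Map individual check results to pass, fail, or pass_with_advisories."""
    if any(r.blocking and not r.passed for r in results):
        return "fail"
    if any(not r.blocking and not r.passed for r in results):
        return "pass_with_advisories"
    return "pass"
```

A failed advisory check never turns the run into a failure; it only moves the result from `pass` to `pass_with_advisories` so that 11-verify-impl reviews it.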
Short summary block (always include in report header):
QA Results:
Lint: [PASS/FAIL] — [N] issues
Format: [PASS/FAIL] — [N] files
Typecheck: [PASS/FAIL] — [N] errors
Tests (Python): [PASS/FAIL] — [N] passed, [N] skipped, [N] failed
Tests (FE): [PASS/FAIL] — [N] passed per app
Coverage gate: [PASS/FAIL] — FE branch ≥95% (make test-unit-coverage)
Security: [PASS/FAIL] — [N] CVEs; [N] secrets (tree); [N] history (advisory)
Cross-file: [N] unused imports; [N] cycles; [N] docstrings missing (advisory)
Dependencies: [N] outdated (advisory); [N] missing
Template: [PASS/FAIL]
Data / Modal: D6 [status]; D7 [status]; workspace [vecinita|WRONG]
Assign QA-00N IDs to advisories in §Findings for 11-verify-impl (see docs/qa-report.md examples).
Phase 4 — Write report
Write docs/qa-report.md with:
- Executive summary table (blocking vs advisory)
- Commands run (copy-paste reproducible)
- Per-check details (verbatim tool output for failures)
- Findings for 11-verify-impl table (ID, severity, finding, suggested action)
- Data integrity / Modal workspace subsection when relevant
- Phase / execution-plan alignment (deferred gates called out explicitly)
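The Findings table for 11-verify-impl can be generated rather than typed by hand. This sketch assumes three-digit, sequentially numbered IDs (QA-001, QA-002, ...) and hypothetical dictionary keys; existing IDs in docs/qa-report.md take precedence if the project numbers them differently.

```python
def render_findings_table(findings: list[dict[str, str]]) -> str:
    """Render findings as a pipe table with sequential QA-00N identifiers."""
    lines = [
        "| ID | Severity | Finding | Suggested action |",
        "|---|---|---|---|",
    ]
    for number, finding in enumerate(findings, start=1):
        cells = [
            f"QA-{number:03d}",
            finding["severity"],
            finding["finding"],
            finding["action"],
        ]
        # Escape literal pipes so a finding cannot break the table layout.
        escaped = [cell.replace("|", "\\|") for cell in cells]
        lines.append("| " + " | ".join(escaped) + " |")
    return "\n".join(lines)
```

The output uses the same pipe-table style as the rest of this document, so it can be written into the report without further formatting.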
Do not auto-fix code, rewrite git history, or deploy to Modal in the default 09 run.
Advisory remediation (separate invocation)
09-qa is assessment only. When the user asks to address advisories after a QA report, use a follow-up pass (not a re-run of 09):
- Read docs/qa-report.md §Findings.
- Spawn parallel Task agents (one concern per agent), e.g.:
  - D6/D7: scripts/stage_modal_weights.sh, scripts/modal_ensure_workspace.sh, update docs/data-staging-state.md; enforce vecinita Modal profile; stop mistaken deploys on other workspaces via scripts/teardown_fontface_vecinita.sh
  - Secrets history: .gitleaks.toml, scripts/check_secrets.sh, CI --no-git, docs/security/gitleaks-resolution.md (no history rewrite unless user requests)
  - Docstrings: public symbols in apps/ + packages/; verify ruff/basedpyright (no Any)
  - Phase 4 staging: H2 in staging_smoke.sh, docs/staging-runbook.md, skip-safe tests/smoke/test_staging_health.py
- Re-run blocking checks only for touched areas; update docs/qa-report.md or add a short QA remediation note with date.
- Modal workspace rule: all Vecinita deploys → profile vecinita; URLs must use vecinita-- prefix; document in infra/modal/env.example.
Do not bundle unrelated fixes into one agent.
Output rules
- Report only in the default 09 run — no auto-fix, no commits unless user asks.
- Full codebase scope — not just diff since last milestone.
- Distinguish blocking vs advisory — especially security (tree vs history) and data staging.
- No AskQuestion in 09: surface choices in docs/qa-report.md for 11-verify-impl.
- Async-safe: report is self-contained; include exact commands and env prerequisites.
- CI parity — if local PASS but CI differs, note branch/workflow mismatch as advisory.
Common Vecinita advisories (reference)
Use as checklist when writing findings; not all apply every run.
| ID pattern | Typical finding | Typical remediation (user-requested) |
|---|---|---|
| QA-003 | D6/D7 pending on Modal volumes | stage_modal_weights.sh; verify embed 384-dim; vecinita workspace |
| QA-004 | Outdated transitive deps (LlamaIndex pins) | ADR + pip-audit after intentional bump |
| QA-005 | gitleaks hits in git history only | .gitleaks.toml + CI --no-git; no filter-repo |
| QA-002 | Public symbols without docstrings | Doc pass in apps/ + packages/ |
| QA-006 | Phase 4 H1–H3 deferred (no staging URLs) | staging-runbook.md, env-gated smoke tests |
| QA-007 | Modal apps on wrong workspace (fontface--) | modal_ensure_workspace.sh, teardown script, redeploy |
Handoff to 11-verify-impl
11-verify-impl should:
- Walk user through blocking FAIL items first
- Present advisories with approve / defer / fix-now (fix-now → advisory remediation pass above)
- Cross-check docs/qa-report.md against docs/feature-list.md and acceptance criteria
- Not re-run full 09 unless codebase changed materially since report date