name: banker-preflip-validation
description: >
Read-only pre-flag-flip gate for the banker Q&A workflow shipped dormant in PR #178.
Walks the operator through the documented pre-flip checklist before flipping
BANKER_QA_OUTPUT=true in production. Tier 1 (offline, ~30s): gold-fixture parse-back
via node scripts/run-bankerqa-isolated.mjs --dry (no API, no cost). Tier 2 (static,
~1m): bash scripts/g4-readiness.sh with Check B asserting all 5 banker-qa alert rules
point at real, registry-wired metrics. Tier 3 (live, billable): ONE full non-Cardinal
banker session under WRAPPED_SUBAGENTS=true + BANKER_QA_OUTPUT=true — assert dispatch
emits mcp__subagents__run_banker_*, the 9 banker artifacts land, the alert metrics get
live series on /metrics, and frontend banker-mode renders. Emits a GO/NO-GO verdict.
Never flips the flag itself; references docs/runbooks/g4-rollback-playbook.md for rollback.
Triggers: "banker preflip", "validate banker mode", "pre-flag-flip gate", "preflip
banker", "is banker mode ready to flip", "/banker-preflip-validation".
Banker Pre-Flip Validation
A thin operator-guidance gate for the BANKER_QA_OUTPUT=true production flip.
PR #178 shipped the banker Q&A workflow dormant (flag default off). The documented pre-flip gate is: one full non-Cardinal banker session passes AND the G4 alerts are real before flipping. The underlying scripts already exist — this skill wraps them into a single falsifiable checklist and a GO/NO-GO verdict.
This skill is read-only. It runs validation scripts and probes /metrics. It does
NOT edit flags.env, redeploy, or flip the flag. The operator flips the flag manually
(via /deploy + g4-operator-enable-disable.md § A) only after all tiers pass.
Configuration
Project Directory
/Users/ej/Super-Legal/super-legal-mcp-refactored — run all script commands from here
(both scripts resolve REPO_ROOT relative to themselves, so cwd inside the repo is fine).
Base URL Resolution (Tier 3 only)
- If --url <url> is given, use that.
- Else cat scripts/.staging-ip → http://<ip>:3001.
- Else http://localhost:3001.
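The resolution order above can be sketched as a small shell function. This is a minimal illustration, not the skill's actual implementation: the function name resolve_base_url and the STAGING_IP_FILE override are hypothetical and exist only to make the sketch testable. It also assumes scripts/.staging-ip holds a bare IP address.

```shell
# Hypothetical helper: resolve the Tier 3 base URL in the documented order.
# STAGING_IP_FILE is an assumed override, defaulting to scripts/.staging-ip.
resolve_base_url() {
  local ip_file="${STAGING_IP_FILE:-scripts/.staging-ip}"
  local ip

  # 1. An explicit --url <url> wins.
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--url" ] && [ -n "${2:-}" ]; then
      printf '%s\n' "$2"
      return 0
    fi
    shift
  done

  # 2. Otherwise use the recorded staging IP, if the file exists and is non-empty.
  if [ -s "$ip_file" ]; then
    ip="$(tr -d '[:space:]' < "$ip_file")"
    if [ -n "$ip" ]; then
      printf 'http://%s:3001\n' "$ip"
      return 0
    fi
  fi

  # 3. Fall back to a local server.
  printf 'http://localhost:3001\n'
}
```

Called as resolve_base_url "$@" from the repo root, it prints exactly one URL, which can then be used for the /metrics probe in Tier 3.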
Invocation
- /banker-preflip-validation: run Tier 1 + Tier 2 (offline + static), print Tier 3 plan.
- /banker-preflip-validation --tier 1: gold-fixture parse-back only.
- /banker-preflip-validation --tier 2: G4 readiness + alert-metric reality check only.
- /banker-preflip-validation --tier 3 --url http://<ip>:3001: drive + verify the live session.
Tier 3 is billable (1–2 Opus 4.8 agent calls + one full banker session) and requires a
staging deployment with WRAPPED_SUBAGENTS=true already on. Run it deliberately, not on a loop.
Tier 1 — Gold-Fixture Parse-Back (offline, ~30s, no API)
Proves the validation path end-to-end against the Cardinal gold banker-question-answers.md
without spending a token. This is the cheapest falsifiable signal that the parser/validator
contract is intact.
node scripts/run-bankerqa-isolated.mjs --dry
| Check | Pass criteria |
|---|---|
| Gold fixture parses clean | Script logs PASS — gold fixture is parser-clean and exits 0 |
| Validator stats non-empty | stats block shows the expected Q-block count (from specialist-coverage-state.json) |
FAIL (exit 1) here means the validator itself regressed (not a model issue) — investigate
src/utils/knowledgeGraph/bankerQaValidator.js before anything else. Do not proceed.
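One way to turn the two pass criteria into a single check is sketched below. The helper name tier1_check is hypothetical, and matching on the literal string PASS is an assumption based on the log line quoted in the table. The command to run is passed as arguments so the same logic can be exercised against a stub.

```shell
# Hypothetical helper: run the given command, then require exit 0 AND a PASS log line.
tier1_check() {
  local log status
  log="$("$@" 2>&1)"
  status=$?
  if [ "$status" -eq 0 ] && printf '%s\n' "$log" | grep -q 'PASS'; then
    echo "TIER1 PASS"
  else
    # A failure here points at the validator, not the model.
    echo "TIER1 FAIL (exit $status): inspect src/utils/knowledgeGraph/bankerQaValidator.js"
    return 1
  fi
}

# Usage, from the repo root:
#   tier1_check node scripts/run-bankerqa-isolated.mjs --dry
```

Requiring both signals guards against a run that exits 0 without ever reaching the parse-back step.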
Tier 2 — G4 Readiness + Alert-Metric Reality (static, ~1m)
Runs the full G4 operational-readiness gate and, critically, asserts that the banker-qa alerts reference real metrics. This closes the false-green case where an alert points at a metric that exists nowhere: Prometheus shows "no data" forever and the alert never fires.
bash scripts/g4-readiness.sh --static-only
# or, with a live staging target so Check B also probes /metrics:
BASE_URL=http://<ip>:3001 bash scripts/g4-readiness.sh
| Check | Pass criteria |
|---|---|
| Overall G4 verdict | Script exits 0 (no FAIL lines). SKIPs that require a staging shell are acceptable at this tier. |
| Check B — 5 alerts defined | All 5 alert rules present in prometheus/alerts-banker-qa.yml: BankerQAWriterFailure, BankerIntakeAnalystFailure, BankerQACoverageFail, BankerCertifierReject, BankerKGPhase1bLatency |
| Check B — alert metrics wired in registry | Every metric the 5 alerts reference is declared in src/utils/sdkMetrics.js (script prints metric wired in registry: <m> per metric). The 4 base metrics are claude_gate_check_results_total, claude_qa_dimension_score, claude_kg_phase_duration_ms, claude_kg_circuit_breaker_state. Any metric NOT in registry (dead alert) line = NO-GO. |
| Check B — alert metrics live (if BASE_URL set) | Each metric either emits on /metrics or is reported as not-yet-emitted (a SKIP, acceptable pre-session — it will populate in Tier 3). |
| Rollback playbook present | docs/runbooks/g4-rollback-playbook.md § A/B/C all pass. |
NO-GO if the script exits non-zero, or if any alert is missing, or any alert metric is absent from the registry (a "dead alert" can never fire — that defeats the entire monitoring gate).
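The NO-GO conditions for Check B can be cross-checked independently of g4-readiness.sh. The sketch below is illustrative only: check_alert_rules and scan_readiness_log are hypothetical helpers, the first assumes the usual Prometheus rule layout in which each rule is declared on an "alert: <Name>" line, and the second assumes dead alerts are reported with the phrase "NOT in registry" as described in the table.

```shell
# Hypothetical helper: confirm all 5 banker-qa alert rules are declared in a rules file.
check_alert_rules() {
  local rules_file="$1" missing=0 name
  for name in BankerQAWriterFailure BankerIntakeAnalystFailure \
              BankerQACoverageFail BankerCertifierReject BankerKGPhase1bLatency; do
    if ! grep -Eq "^[[:space:]]*-?[[:space:]]*alert:[[:space:]]*${name}[[:space:]]*$" "$rules_file"; then
      echo "missing alert: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: read g4-readiness.sh output on stdin and flag dead alerts.
scan_readiness_log() {
  if grep -q 'NOT in registry'; then
    echo "NO-GO: dead alert"
    return 1
  fi
  echo "no dead alerts"
}

# Usage, from the repo root:
#   check_alert_rules prometheus/alerts-banker-qa.yml
#   bash scripts/g4-readiness.sh --static-only | scan_readiness_log
```

The script's own exit code remains the authoritative signal; these helpers only make the two "dead alert" conditions visible on their own.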
Tier 3 — Live Non-Cardinal Banker Session (billable)
The keystone check. Drive ONE full banker session on a non-Cardinal deal (Cardinal is the gold fixture — using it would be circular; use a different deal so coverage/extraction is genuinely exercised) on staging with both flags on:
WRAPPED_SUBAGENTS=true
BANKER_QA_OUTPUT=true
Then assert all four of the following. Any miss = NO-GO.
| # | Assertion | How to verify | Pass criteria |
|---|---|---|---|
| T3.1 | Dispatch fires the banker agents | hook_audit_log rows / SSE for the session show tool_name LIKE 'mcp__subagents__run_banker_%' | ≥1 invocation each of mcp__subagents__run_banker_intake_analyst and mcp__subagents__run_banker_qa_writer (and ..._banker_specialist_coverage_validator if the deal triggers coverage validation) |
| T3.2 | The 9 banker artifacts are produced | List the session dir under reports/<session>/ | All 9 present: banker-deal-context.json, banker-prohibited-assumptions.json, banker-intake-state.json, banker-questions-presented.md, banker-question-answers.md, banker-qa-state.json, banker-qa-metadata.json, banker-qa.md, plus specialist-coverage-state.json. banker-question-answers.md must pass the parse-back validator (re-run Tier 1's validator against the new file, not the gold). |
| T3.3 | Alert metrics now have live series | curl -s $URL/metrics \| grep -E '^(claude_gate_check_results_total\|claude_qa_dimension_score\|claude_kg_phase_duration_ms\|claude_kg_circuit_breaker_state)' | All 4 base metrics emit at least one series after the session (e.g. claude_qa_dimension_score{dimension="dim_13..."}, claude_kg_phase_duration_ms{phase="KG-Phase1b"...}). Confirms the alerts will actually evaluate against real data. |
| T3.4 | Frontend banker-mode renders | Load the session in the dashboard (test/react-frontend/) | Banker Q&A view renders the questions/answers/confidence without console errors; the banker artifacts hydrate. |
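For T3.2, the artifact list lends itself to a mechanical presence check. The sketch below is a rough aid under assumptions: check_banker_artifacts is a hypothetical helper, and it assumes all 9 files sit directly in the session directory rather than in subfolders. It checks presence only and does not replace the parse-back validation of banker-question-answers.md.

```shell
# Hypothetical helper: report any of the 9 banker artifacts missing from a session dir.
check_banker_artifacts() {
  local session_dir="$1" missing=0 f
  for f in banker-deal-context.json banker-prohibited-assumptions.json \
           banker-intake-state.json banker-questions-presented.md \
           banker-question-answers.md banker-qa-state.json \
           banker-qa-metadata.json banker-qa.md specialist-coverage-state.json; do
    # Assumes a flat layout: each artifact is a regular file directly under the dir.
    if [ ! -f "$session_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from the repo root:
#   check_banker_artifacts "reports/<session>"
```

A non-zero return with any "missing:" line maps to the "Tier 3 artifact missing" NO-GO case.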
See references/preflip-checklist.md for the per-assertion query/probe commands and the
exact 9-artifact + dispatch-tool reference.
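The T3.3 probe reports which lines matched, but not which of the 4 base metrics is still silent. A per-metric variant might look like the sketch below. check_metric_series is a hypothetical helper, and it assumes the Prometheus text exposition format, in which a series line starts with the metric name (optionally with a _bucket, _sum, or _count suffix) followed by labels or a value.

```shell
# Hypothetical helper: read /metrics text on stdin and require one series per base metric.
check_metric_series() {
  local body missing=0 m
  body="$(cat)"
  for m in claude_gate_check_results_total claude_qa_dimension_score \
           claude_kg_phase_duration_ms claude_kg_circuit_breaker_state; do
    # "# HELP" and "# TYPE" lines start with "#", so they never count as a series.
    if ! printf '%s\n' "$body" | grep -Eq "^${m}(_bucket|_sum|_count)?[{ ]"; then
      echo "no live series: $m"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   curl -s "$URL/metrics" | check_metric_series
```

Each "no live series:" line corresponds to the "registered but not emitted" case, which is an emission bug rather than a readiness state.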
GO / NO-GO Gate
GO — flip BANKER_QA_OUTPUT=true in prod only when all three tiers pass:
- ✅ Tier 1: --dry exits 0 (validator path proven).
- ✅ Tier 2: g4-readiness.sh exits 0; all 5 alerts defined; all 4 alert metrics wired in the registry (no dead alerts).
- ✅ Tier 3: dispatch emits mcp__subagents__run_banker_*; 9 artifacts produced + parser-clean; 4 alert metrics have live /metrics series; frontend banker-mode renders.
NO-GO on any failure. Do not flip. Common blockers:
| Symptom | Verdict | Action |
|---|---|---|
| Tier 1 FAIL | NO-GO | Validator regression — fix bankerQaValidator.js, do not touch the flag |
| Tier 2 "dead alert" line | NO-GO | Wire the metric in src/utils/sdkMetrics.js (alerts that can't fire defeat the gate) |
| Tier 3 artifact missing | NO-GO | Writer/intake contract gap — diagnose via session-diagnostics <session> |
| Tier 3 metric has no live series | NO-GO | Metric registered but not emitted on the banker path — emission bug, not a flag-readiness state |
| Tier 3 frontend render error | NO-GO | Frontend banker-mode hydration bug |
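The gate reduces to a conjunction of the three tier results: GO only if every tier passed, NO-GO otherwise. A minimal sketch, assuming each tier result has already been reduced to the word "pass" or some other status; preflip_verdict and its argument convention are hypothetical.

```shell
# Hypothetical helper: combine three tier results into a GO / NO-GO verdict.
# Any value other than "pass" (fail, skipped, or missing) blocks the flip.
preflip_verdict() {
  local t1="${1:-skipped}" t2="${2:-skipped}" t3="${3:-skipped}"
  local verdict=GO tier=0 result
  for result in "$t1" "$t2" "$t3"; do
    tier=$((tier + 1))
    if [ "$result" != "pass" ]; then
      echo "Tier $tier: $result"
      verdict=NO-GO
    fi
  done
  echo "$verdict"
  [ "$verdict" = "GO" ]
}

# Usage:
#   preflip_verdict pass pass fail   # prints "Tier 3: fail" then "NO-GO"
```

Treating a skipped or missing tier as NO-GO keeps the default safe: the verdict can only be GO when all three tiers were actually run and passed.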
When the flip is later made by the operator, the enable sequence is in
docs/runbooks/g4-operator-enable-disable.md § A. If anything degrades after the flip, roll
back per docs/runbooks/g4-rollback-playbook.md:
- § A — soft-disable (flip flag off + redeploy; orphan banker data is safe to leave).
- § B — hard-rollback (DB + GCS WORM constraints).
- § C — orphan-data behavior post-flag-off.
Read-Only Guarantee
This skill runs run-bankerqa-isolated.mjs --dry, g4-readiness.sh, curl /metrics, and
read-only DB/SSE inspection. It does not edit flags.env, redeploy, run DML, or flip
BANKER_QA_OUTPUT. Tier 3 drives one real billable banker session on staging — that is the
only side effect (a scratch staging session + 1–2 Opus 4.8 calls), and it touches staging only.
Pre-flight
which node # required for Tier 1
which bash # required for Tier 2
which curl # required for Tier 3 /metrics probe
which jq # G4 baselines branch check + /metrics parsing
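The four lookups above can be folded into one check that names whatever is missing. A small sketch; missing_tools is a hypothetical helper, and it uses the shell built-in command -v instead of which so that the result does not depend on an external binary.

```shell
# Hypothetical helper: print each required tool that is not on PATH.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   missing_tools node bash curl jq || echo "install the tools listed above before running any tier"
```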