banker-preflip-validation

Read-only pre-flag-flip gate for the banker Q&A workflow shipped dormant in PR #178. Walks the operator through the documented pre-flip checklist before flipping BANKER_QA_OUTPUT=true in production. Tier 1 (offline, ~30s): gold-fixture parse-back via `node scripts/run-bankerqa-isolated.mjs --dry` (no API, no cost). Tier 2 (static, ~1m): `bash scripts/g4-readiness.sh` with Check B asserting all 5 banker-qa alert rules point at real, registry-wired metrics. Tier 3 (live, billable): ONE full non-Cardinal banker session under WRAPPED_SUBAGENTS=true + BANKER_QA_OUTPUT=true — assert dispatch emits mcp__subagents__run_banker_*, the 9 banker artifacts land, the alert metrics get live series on /metrics, and frontend banker-mode renders. Emits a GO/NO-GO verdict. Never flips the flag itself; references docs/runbooks/g4-rollback-playbook.md for rollback. Triggers: "banker preflip", "validate banker mode", "pre-flag-flip gate", "preflip banker", "is banker mode ready to flip", "/banker-preflip-validation".

By Number531 · Updated 6/3/2026


Banker Pre-Flip Validation

A thin operator-guidance gate for the BANKER_QA_OUTPUT=true production flip.

PR #178 shipped the banker Q&A workflow dormant (flag default off). The documented pre-flip gate is: one full non-Cardinal banker session passes AND the G4 alerts are real before flipping. The underlying scripts already exist — this skill wraps them into a single falsifiable checklist and a GO/NO-GO verdict.

This skill is read-only. It runs validation scripts and probes /metrics. It does NOT edit flags.env, redeploy, or flip the flag. The operator flips the flag manually (via /deploy + g4-operator-enable-disable.md § A) only after all tiers pass.

Configuration

Project Directory

/Users/ej/Super-Legal/super-legal-mcp-refactored — run all script commands from here (both scripts resolve REPO_ROOT relative to themselves, so cwd inside the repo is fine).

Base URL Resolution (Tier 3 only)

  1. If --url <url> given, use that.
  2. Else read the IP with `cat scripts/.staging-ip` and use http://<ip>:3001.
  3. Else http://localhost:3001.
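
The three-step precedence above can be sketched as a small shell function. This is a minimal sketch, not the skill's actual implementation: `resolve_base_url` and the `STAGING_IP_FILE` override are hypothetical names introduced here for illustration, and only the file path and port come from the list above.

```shell
# Hypothetical helper: resolve the Tier 3 base URL in the documented order.
# STAGING_IP_FILE is an assumed override (handy for testing); the documented
# location is scripts/.staging-ip relative to the repo root.
resolve_base_url() {
  local url=""
  local ip_file="${STAGING_IP_FILE:-scripts/.staging-ip}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --url)
        url="${2:-}"
        shift
        if [ "$#" -gt 0 ]; then shift; fi
        ;;
      *) shift ;;
    esac
  done
  if [ -n "$url" ]; then
    # 1. An explicit --url always wins.
    printf '%s\n' "$url"
  elif [ -s "$ip_file" ]; then
    # 2. Otherwise build the URL from the staging IP file.
    printf 'http://%s:3001\n' "$(tr -d '[:space:]' < "$ip_file")"
  else
    # 3. Fall back to localhost.
    printf 'http://localhost:3001\n'
  fi
}
```

For example, `URL="$(resolve_base_url "$@")"` at the top of a Tier 3 wrapper would give the later probes a single source of truth for the target.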

Invocation

  • /banker-preflip-validation — run Tier 1 + Tier 2 (offline + static), print Tier 3 plan.
  • /banker-preflip-validation --tier 1 — gold-fixture parse-back only.
  • /banker-preflip-validation --tier 2 — G4 readiness + alert-metric reality check only.
  • /banker-preflip-validation --tier 3 --url http://<ip>:3001 — drive + verify the live session.
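
As a rough illustration of the invocation forms above, the tier selection could be modeled like this. The function name `select_tiers` and its exit code are assumptions made for this sketch; the skill itself may handle arguments differently.

```shell
# Hypothetical sketch: map the documented flags to the tiers that will run.
# No --tier flag means Tier 1 + Tier 2 (Tier 3 is only planned, not run).
select_tiers() {
  local tier=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --tier)
        tier="${2:-}"
        shift
        if [ "$#" -gt 0 ]; then shift; fi
        ;;
      *) shift ;;   # other flags (e.g. --url) are ignored here
    esac
  done
  case "$tier" in
    "")    echo "1 2" ;;
    1|2|3) echo "$tier" ;;
    *)     echo "unknown tier: $tier" >&2; return 2 ;;
  esac
}
```

Keeping Tier 3 out of the default set mirrors the intent that the billable tier only runs when asked for explicitly.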

Tier 3 is billable (1–2 Opus 4.8 agent calls + one full banker session) and requires a staging deployment with WRAPPED_SUBAGENTS=true already on. Run it deliberately, not on a loop.

Tier 1 — Gold-Fixture Parse-Back (offline, ~30s, no API)

Proves the validation path end-to-end against the Cardinal gold banker-question-answers.md without spending a token. This is the cheapest falsifiable signal that the parser/validator contract is intact.

node scripts/run-bankerqa-isolated.mjs --dry

| Check | Pass criteria |
| --- | --- |
| Gold fixture parses clean | Script logs PASS — gold fixture is parser-clean and exits 0 |
| Validator stats non-empty | stats block shows the expected Q-block count (from specialist-coverage-state.json) |

FAIL (exit 1) here means the validator itself regressed (not a model issue) — investigate src/utils/knowledgeGraph/bankerQaValidator.js before anything else. Do not proceed.
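
One way to turn the Tier 1 criteria into a mechanical check is to classify the run from its exit code and captured output. This is a sketch under assumptions: `tier1_verdict` is a hypothetical helper, and matching on the substrings `PASS` and `parser-clean` assumes the log line looks like the one described in the table above.

```shell
# Hypothetical helper: GO only if the dry run exited 0 AND logged the PASS line.
tier1_verdict() {
  local exit_code="$1"
  local log="$2"
  if [ "$exit_code" -eq 0 ] \
     && printf '%s\n' "$log" | grep 'PASS' >/dev/null \
     && printf '%s\n' "$log" | grep 'parser-clean' >/dev/null; then
    echo "GO"
  else
    echo "NO-GO"
  fi
}

# Usage (from the repo root):
#   log="$(node scripts/run-bankerqa-isolated.mjs --dry 2>&1)"; code=$?
#   tier1_verdict "$code" "$log"
```

Checking both the exit code and the log line guards against a run that exits 0 without ever reaching the parse-back step.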

Tier 2 — G4 Readiness + Alert-Metric Reality (static, ~1m)

Runs the full G4 operational-readiness gate and, critically, asserts that the banker-qa alerts reference real metrics — closing the false-green where an alert points at a metric that exists nowhere (Prometheus then shows "no data" forever and never fires).

bash scripts/g4-readiness.sh --static-only
# or, with a live staging target so Check B also probes /metrics:
BASE_URL=http://<ip>:3001 bash scripts/g4-readiness.sh

| Check | Pass criteria |
| --- | --- |
| Overall G4 verdict | Script exits 0 (no FAIL lines). SKIPs that require a staging shell are acceptable at this tier. |
| Check B: 5 alerts defined | All 5 alert rules present in prometheus/alerts-banker-qa.yml: BankerQAWriterFailure, BankerIntakeAnalystFailure, BankerQACoverageFail, BankerCertifierReject, BankerKGPhase1bLatency |
| Check B: alert metrics wired in registry | Every metric the 5 alerts reference is declared in src/utils/sdkMetrics.js (script prints `metric wired in registry: <m>` per metric). The 4 base metrics are claude_gate_check_results_total, claude_qa_dimension_score, claude_kg_phase_duration_ms, claude_kg_circuit_breaker_state. Any metric NOT in registry (dead alert) line = NO-GO. |
| Check B: alert metrics live (if BASE_URL set) | Each metric either emits on /metrics or is reported as not-yet-emitted (a SKIP, acceptable pre-session; it will populate in Tier 3). |
| Rollback playbook present | docs/runbooks/g4-rollback-playbook.md § A/B/C all pass. |

NO-GO if the script exits non-zero, or if any alert is missing, or any alert metric is absent from the registry (a "dead alert" can never fire — that defeats the entire monitoring gate).
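
For a quick manual cross-check of the "5 alerts defined" criterion, independent of g4-readiness.sh, something like the following could be used. It is a sketch, not the script's Check B: `missing_alerts` is a hypothetical helper, and it assumes the rules file uses the standard Prometheus `- alert: <Name>` form with unquoted names.

```shell
# Hypothetical helper: print every expected alert name that is NOT defined
# in the given Prometheus rules file. Empty output means all five are present.
missing_alerts() {
  local rules_file="$1"
  local name
  for name in BankerQAWriterFailure BankerIntakeAnalystFailure \
              BankerQACoverageFail BankerCertifierReject BankerKGPhase1bLatency; do
    grep -E "^[[:space:]]*-?[[:space:]]*alert:[[:space:]]*${name}[[:space:]]*$" \
      "$rules_file" >/dev/null || echo "$name"
  done
}

# Usage (from the repo root):
#   missing_alerts prometheus/alerts-banker-qa.yml
```

Any name printed here corresponds to a missing alert, which is a NO-GO under the rule above.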

Tier 3 — Live Non-Cardinal Banker Session (billable)

The keystone check. Drive ONE full banker session on a non-Cardinal deal (Cardinal is the gold fixture — using it would be circular; use a different deal so coverage/extraction is genuinely exercised) on staging with both flags on:

WRAPPED_SUBAGENTS=true
BANKER_QA_OUTPUT=true

Then assert all four of the following. Any miss = NO-GO.

T3.1: Dispatch fires the banker agents
  • How to verify: hook_audit_log rows / SSE for the session show tool_name LIKE 'mcp__subagents__run_banker_%'
  • Pass criteria: ≥1 invocation each of mcp__subagents__run_banker_intake_analyst and mcp__subagents__run_banker_qa_writer (and ..._banker_specialist_coverage_validator if the deal triggers coverage validation)

T3.2: The 9 banker artifacts are produced
  • How to verify: List the session dir under reports/<session>/
  • Pass criteria: All 9 present: banker-deal-context.json, banker-prohibited-assumptions.json, banker-intake-state.json, banker-questions-presented.md, banker-question-answers.md, banker-qa-state.json, banker-qa-metadata.json, banker-qa.md, plus specialist-coverage-state.json. banker-question-answers.md must pass the parse-back validator (re-run Tier 1's validator against the new file, not the gold).

T3.3: Alert metrics now have live series
  • How to verify: curl -s $URL/metrics | grep -E '^(claude_gate_check_results_total|claude_qa_dimension_score|claude_kg_phase_duration_ms|claude_kg_circuit_breaker_state)'
  • Pass criteria: All 4 base metrics emit at least one series after the session (e.g. claude_qa_dimension_score{dimension="dim_13..."}, claude_kg_phase_duration_ms{phase="KG-Phase1b"...}). Confirms the alerts will actually evaluate against real data.

T3.4: Frontend banker-mode renders
  • How to verify: Load the session in the dashboard (test/react-frontend/)
  • Pass criteria: Banker Q&A view renders the questions/answers/confidence without console errors; the banker artifacts hydrate.
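
The T3.2 artifact check lends itself to a small loop over the nine expected filenames. The sketch below assumes the artifacts sit directly in the session directory; `missing_artifacts` is a hypothetical helper, and it only tests for presence, not for parser-clean content.

```shell
# Hypothetical helper: print each of the 9 expected artifacts that is absent
# from the given session directory. Empty output means all 9 landed.
missing_artifacts() {
  local session_dir="$1"
  local f
  for f in banker-deal-context.json banker-prohibited-assumptions.json \
           banker-intake-state.json banker-questions-presented.md \
           banker-question-answers.md banker-qa-state.json \
           banker-qa-metadata.json banker-qa.md \
           specialist-coverage-state.json; do
    [ -f "$session_dir/$f" ] || echo "$f"
  done
}

# Usage:
#   missing_artifacts "reports/<session>"
```

Presence is necessary but not sufficient: banker-question-answers.md still has to pass the parse-back validator as stated in T3.2.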

See references/preflip-checklist.md for the per-assertion query/probe commands and the exact 9-artifact + dispatch-tool reference.
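
The grep in T3.3 shows which series exist, but not which of the four base metrics are missing. A per-metric variant might look like the sketch below. `metrics_without_series` is a hypothetical helper that reads /metrics text on stdin; accepting the `_bucket`, `_sum`, and `_count` suffixes is an assumption in case a metric is exposed as a histogram or summary.

```shell
# Hypothetical helper: read Prometheus exposition text on stdin and print each
# base metric that has no sample line. Empty output means all 4 are live.
metrics_without_series() {
  local body m
  body="$(cat)"
  for m in claude_gate_check_results_total claude_qa_dimension_score \
           claude_kg_phase_duration_ms claude_kg_circuit_breaker_state; do
    # A sample line starts with the metric name followed by '{' or a space;
    # '# HELP' and '# TYPE' comment lines never match because of the '^'.
    printf '%s\n' "$body" \
      | grep -E "^${m}(_bucket|_sum|_count)?(\{| )" >/dev/null || echo "$m"
  done
}

# Usage:
#   curl -s "$URL/metrics" | metrics_without_series
```

Each name printed corresponds to the "metric has no live series" NO-GO case.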

GO / NO-GO Gate

GO — flip BANKER_QA_OUTPUT=true in prod only when all three tiers pass:

  • ✅ Tier 1: --dry exits 0 (validator path proven).
  • ✅ Tier 2: g4-readiness.sh exits 0; all 5 alerts defined; all 4 alert metrics wired in the registry (no dead alerts).
  • ✅ Tier 3: dispatch emits mcp__subagents__run_banker_*; 9 artifacts produced + parser-clean; 4 alert metrics have live /metrics series; frontend banker-mode renders.
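
The gate logic itself is a strict AND over the three tiers. A minimal sketch, assuming each tier has already been reduced to the string `pass` or anything else (the function name and the `pass` convention are hypothetical):

```shell
# Hypothetical helper: GO only when exactly three tier results are given
# and every one of them is "pass". Anything else is NO-GO.
preflip_verdict() {
  if [ "$#" -ne 3 ]; then
    echo "NO-GO"
    return 0
  fi
  local result
  for result in "$@"; do
    [ "$result" = "pass" ] || { echo "NO-GO"; return 0; }
  done
  echo "GO"
}

# Usage:
#   preflip_verdict "$tier1" "$tier2" "$tier3"
```

Treating a missing tier result as NO-GO keeps a skipped Tier 3 from being read as a pass.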

NO-GO on any failure. Do not flip. Common blockers:

| Symptom | Verdict | Action |
| --- | --- | --- |
| Tier 1 FAIL | NO-GO | Validator regression: fix bankerQaValidator.js, do not touch the flag |
| Tier 2 "dead alert" line | NO-GO | Wire the metric in src/utils/sdkMetrics.js (alerts that can't fire defeat the gate) |
| Tier 3 artifact missing | NO-GO | Writer/intake contract gap: diagnose via session-diagnostics <session> |
| Tier 3 metric has no live series | NO-GO | Metric registered but not emitted on the banker path: emission bug, not a flag-readiness state |
| Tier 3 frontend render error | NO-GO | Frontend banker-mode hydration bug |

When the flip is later made by the operator, the enable sequence is in docs/runbooks/g4-operator-enable-disable.md § A. If anything degrades after the flip, roll back per docs/runbooks/g4-rollback-playbook.md:

  • § A — soft-disable (flip flag off + redeploy; orphan banker data is safe to leave).
  • § B — hard-rollback (DB + GCS WORM constraints).
  • § C — orphan-data behavior post-flag-off.

Read-Only Guarantee

This skill runs run-bankerqa-isolated.mjs --dry, g4-readiness.sh, curl /metrics, and read-only DB/SSE inspection. It does not edit flags.env, redeploy, run DML, or flip BANKER_QA_OUTPUT. Tier 3 drives one real billable banker session on staging — that is the only side effect (a scratch staging session + 1–2 Opus 4.8 calls), and it touches staging only.

Pre-flight

which node    # required for Tier 1
which bash    # required for Tier 2
which curl    # required for Tier 3 /metrics probe
which jq      # G4 baselines branch check + /metrics parsing
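
The four `which` checks can also be collapsed into one loop that reports only what is missing. This sketch uses the POSIX `command -v` builtin instead of `which`; `missing_tools` is a hypothetical helper name.

```shell
# Hypothetical helper: print each named tool that is not on PATH.
# Empty output means every prerequisite is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage:
#   missing_tools node bash curl jq
```

`command -v` is used here because it is a shell builtin, whereas `which` is an external program whose behavior varies between systems.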