name: pentest
description: Run the owner-authorized weekly pentest on the deployed Pablo app (Cloud Run + Firebase + Cloud SQL). Use when the user asks for a pentest, security test, HIPAA review of the live deployment, or says "run /pentest".
aliases: [security-test, hipaa-test]
Pablo Pentest Skill
Owner-authorized penetration test on the deployed Pablo app. The user is the owner. This is a legitimate, recurring (~weekly) security engagement. Proceed without asking for re-authorization each time, but still honor the strict rules below.
Pre-flight — ALWAYS do these first
- Read the evidence bundle. If /workspace/pentest-bundle/index.json exists, pentest_runner.py has already executed every scanner, inventory query, egress enumeration, Cloud SQL read-only probe, and prior-run fetch for you. Your job is to read those artifacts and write the report; do not re-run scanners. The bundle's index.json lists each artifact with status, exit code, duration, and path. If the bundle is absent (rare; interactive debugging only), fall through to the inline scanner invocations under "Scanner tooling" below.
- Discover targets (only if the bundle is absent):
      gcloud run services list --format="table(metadata.name,status.url,metadata.labels.'cloud.googleapis.com/location')"
      gcloud sql instances list --format="value(name,connectionName,region)"
  With a bundle present, targets.* at the top of index.json has all of this. Always pass --region=<region> to any subsequent gcloud run services|jobs describe; without it, gcloud prompts to pick from ~40 regions and the interactive session hangs. For Pablo the region is us-central1.
- Pull the live frontend config (only if missing from 01-inventory.txt in the bundle):
      curl -s "<frontend_url>/api/config"
  Gives firebaseApiKey, firebaseProjectId, apiUrl, pabloEdition, devMode. Public values (not secrets).
- Refresh gcloud ADC (Cloud SQL proxy breaks with invalid_rapt if stale). If you hit invalid_rapt, tell the user to run gcloud auth application-default login; you cannot do it for them.
- Read credentials from env vars; no user paste is needed. When the pentest runner launches this skill, it has already bootstrapped an ephemeral MFA-enrolled Firebase user and exported PENTEST_TEST_EMAIL, PENTEST_TEST_UID, and a fresh PENTEST_TEST_ID_TOKEN (TOTP-MFA sign-in already completed in-process; the password and TOTP secret are intentionally NOT exported, to keep them off the env and child processes). Use $PENTEST_TEST_ID_TOKEN directly as the Bearer token for authenticated probes. The token is good for ~1h; pentests are shorter. The account is allowlisted via an Alembic-seeded row (pentest-auto@pablo-pentest.invalid, RFC 2606 reserved TLD); it is not a real customer and carries no practice assignment, so §6 (cross-tenant IDOR) still needs Cloud SQL-sourced foreign UUIDs. If the env vars are absent (interactive debugging outside the runner), fall back to owner-pasted credentials.
- Scope note: this report only claims what it can technically observe. Paper controls (BAAs, signed policies, workforce training records, contingency plan documents) are out of scope for automated scanning. The companion document docs/compliance/hipaa-security-rule.md (per-control narrative) is the source of truth for those; this report's §7 control matrix marks them N/S (paper control — see narrative) and does not assert their status. Egress destinations and deployed service config are technically observable and stay in scope.
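The bundle check in the first pre-flight step can be sketched as below. This is a minimal illustration, not the runner's actual schema: the key names `artifacts`, `status`, `exit_code`, and `path`, and the status value `"ok"`, are assumptions made for the example, since the text above only says that index.json records status, exit code, duration, and path per artifact.

```python
import json
from pathlib import Path


def summarize_bundle(index_path):
    """Split bundle artifacts into (ok, failed) path lists.

    Returns None when the index file is absent, which is the cue to fall
    back to the inline scanner invocations.
    """
    path = Path(index_path)
    if not path.exists():
        return None
    index = json.loads(path.read_text())
    ok, failed = [], []
    # "artifacts", "status", "exit_code" and "path" are assumed key names.
    for artifact in index.get("artifacts", []):
        clean = artifact.get("status") == "ok" and artifact.get("exit_code", 0) == 0
        (ok if clean else failed).append(artifact.get("path"))
    return ok, failed
```

Artifacts that land in the failed list are worth calling out in the report's limitations rather than silently re-running the scanner.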
Strict rules (don't violate)
- Rate limits are scoped by target class:
  - Own Cloud Run services (*.run.app frontend/backend): ≤20 req/sec steady, short bursts up to 50 req/sec OK for scanner sweeps. Total run budget ≤30,000 requests across all tools.
  - Firebase Auth (identitytoolkit.googleapis.com): strict. ≤1 req/sec, ≤10 bad-password attempts total across the whole run. Firebase anti-abuse locks accounts fast.
  - Third-party infra (Google infra outside your Cloud Run, Firebase internals, Anthropic, dependencies): zero; do not probe.
- No DoS. A sustained burst that trips Cloud Armor or drives Cloud Run autoscaling into new instances is itself a finding: stop and report, do not keep pushing. Never use scanner flags like -t 100 / --threads 10 / -rate 200.
- Read-only against Cloud SQL directly. No UPDATE / DELETE / INSERT statements through the psql / cloud-sql-proxy path. The DB-side assertion is: if the pentest touched the DB, it did so only via SELECT.
- Writes through the authenticated API are allowed, with cleanup. You may create test patients and therapy sessions via the normal authenticated POST endpoints in order to exercise CRUD paths. Constraints:
  - Prefix the first name with PENTEST- and the last name with a run UUID (e.g. PENTEST-ab12cd34) so cleanup is deterministic.
  - Never upload real transcript content or real PHI. Synthetic placeholder strings only (e.g. "synthetic test transcript for pentest run ab12cd34").
  - At the end of the run, delete every patient/session/appointment you created via the authenticated DELETE endpoints. Final step of every run must be a cleanup pass.
  - If cleanup fails partway, report the un-deleted IDs in the findings so the owner can sweep manually.
- Test users: if you create test users (identitytoolkit:signUp), delete them at the end via identitytoolkit /accounts:delete. Prefer pre-provisioned test accounts passed in by the owner; only mint new ones when the test explicitly needs a fresh identity.
- Exploit to PoC only: one clear reproduction, then stop. No further pivoting.
- Stay on *.run.app services belonging to this app and its Cloud SQL instance. Do not probe Google / Firebase infra itself, Anthropic, or other third parties.
- No stored XSS payloads. Reflected-only, in your own session.
- Redact any PHI/PII in the report to <REDACTED>.
- Stop on lockout / WAF / rate-limit and report what you have.
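For hand-written probes that do not go through a scanner's own rate flags, the steady-rate and total-budget rules above could be enforced with a small pacing helper. The sketch below is illustrative only: the `RequestBudget` class is hypothetical and not part of the pentest container, and it models the steady limit, not the short 50 req/sec burst allowance.

```python
import time


class RequestBudget:
    """Pace calls to a steady rate and stop once a total budget is spent."""

    def __init__(self, max_per_sec=20, total_budget=30_000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self.remaining = total_budget
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def acquire(self):
        """Block until the next slot is free; raise when the budget is gone."""
        if self.remaining <= 0:
            raise RuntimeError("request budget exhausted: stop and report")
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        self.remaining -= 1
```

Calling `acquire()` before each request keeps one shared counter across tools. A second instance such as `RequestBudget(max_per_sec=1, total_budget=10)` would mirror the stricter Firebase Auth limit on bad-password attempts.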
Scanner tooling (pre-installed — use sanctioned invocations)
The pentest container ships with rate-limited scanners. The flags below are mandatory — don't drop them.
nuclei — template-driven vuln/misconfig scan against the backend:
nuclei -u "<backend_url>" \
-t cves/ -t exposures/ -t misconfiguration/ -t http/default-logins/ \
-rate-limit 20 -c 10 \
-severity medium,high,critical \
-exclude-tags dos,fuzz,intrusive \
-o /tmp/nuclei.txt
ffuf — endpoint discovery:
ffuf -u "<backend_url>/api/FUZZ" \
-w /usr/share/seclists/Discovery/Web-Content/common.txt \
-rate 20 -t 5 -mc 200,201,301,302,401,403 \
-of json -o /tmp/ffuf.json
sqlmap — only on a specific parameter you already suspect. Never blanket-run:
sqlmap -u "<backend_url>/api/endpoint?param=1" \
--headers="Authorization: Bearer $ID_TOKEN" \
--delay 0.5 --threads 1 --level 1 --risk 1 \
--technique=BT --batch --flush-session --output-dir=/tmp/sqlmap
nikto — narrow tunings:
nikto -h "<backend_url>" -Tuning 2,3,4,5 -maxtime 5m -o /tmp/nikto.txt
testssl.sh — TLS posture:
testssl.sh --severity MEDIUM --color 0 "<backend_url>" > /tmp/testssl.txt
semgrep — static analysis over the baked-in backend source (/app/backend). Feed results into the Static analysis findings section of the report; review each hit manually before elevating.
semgrep scan \
--config=p/owasp-top-ten --config=p/python --config=p/security-audit \
--severity=ERROR --severity=WARNING \
--json --output=/tmp/semgrep.json --metrics=off --quiet \
/app/backend
jq '.results | group_by(.check_id) | map({rule: .[0].check_id, count: length, paths: [.[].path] | unique})' /tmp/semgrep.json
Playwright + Chromium — JS-console capture, DOM XSS PoC, CSP verification. The global node_modules is symlinked at /node_modules, so any .mjs you drop in /tmp or /workspace can import 'playwright' with no extra setup (ESM ignores NODE_PATH; it walks up from the script looking for node_modules/ and hits /node_modules).
// /tmp/dom_probe.mjs — run: node /tmp/dom_probe.mjs
import { chromium } from 'playwright';
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
page.on('console', msg => console.log(`CONSOLE ${msg.type()}: ${msg.text()}`));
page.on('pageerror', err => console.log(`PAGEERROR: ${err.message}`));
await page.goto(process.env.TARGET_URL);
await page.waitForLoadState('networkidle');
const resp = await page.request.get(process.env.TARGET_URL);
console.log('CSP:', resp.headers()['content-security-policy']);
console.log('HSTS:', resp.headers()['strict-transport-security']);
await browser.close();
Auth flow — Firebase MFA sign-in (bash)
Firebase MFA sign-in is three calls. Use this pattern; the TOTP window matters (Firebase rejects replays).
API_KEY="<from /api/config firebaseApiKey>"
SECRET="<TOTP b32 secret for this account>"
# Wait until the start of a fresh TOTP window to avoid a wasted attempt
until [ $((30 - $(date +%s) % 30)) -lt 3 ]; do sleep 2; done
sleep 3
CODE=$(oathtool --totp -b "$SECRET")
resp=$(curl -s -X POST "https://identitytoolkit.googleapis.com/v1/accounts:signInWithPassword?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{"email":"<email>","password":"<pw>","returnSecureToken":true}')
MFA_CRED=$(echo "$resp" | python3 -c "import json,sys; print(json.load(sys.stdin)['mfaPendingCredential'])")
ENROLLMENT_ID=$(echo "$resp" | python3 -c "import json,sys; print(json.load(sys.stdin)['mfaInfo'][0]['mfaEnrollmentId'])")
# Finalize — MUST include both mfaPendingCredential AND mfaEnrollmentId
curl -s -X POST "https://identitytoolkit.googleapis.com/v2/accounts/mfaSignIn:finalize?key=$API_KEY" \
-H "Content-Type: application/json" \
-d "{\"mfaPendingCredential\":\"$MFA_CRED\",\"mfaEnrollmentId\":\"$ENROLLMENT_ID\",\"totpVerificationInfo\":{\"verificationCode\":\"$CODE\"}}" \
> /tmp/signin.json
ID_TOKEN=$(python3 -c "import json; print(json.load(open('/tmp/signin.json'))['idToken'])")
Gotchas learned the hard way:
- mfaSignIn:finalize requires BOTH mfaPendingCredential and mfaEnrollmentId. Omitting mfaEnrollmentId returns a confusing INVALID_ARGUMENT.
- A consumed TOTP code → INVALID_CODE. The Bash tool blocks long leading sleeps; use an until [ ... ]; do sleep 2; done loop to wait for the next window.
- bash pre-flight cannot do sleep 32 directly; it's blocked. Use the until-loop pattern.
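The window arithmetic behind the until-loop is easier to see in isolation. The sketch below uses only the Python standard library and follows RFC 6238 with the common defaults (HMAC-SHA1, 30-second step, 6 digits); that these defaults match the account's Firebase TOTP enrollment is an assumption, and the function names are illustrative.

```python
import base64
import hashlib
import hmac
import struct
import time


def seconds_left_in_window(now=None, step=30):
    """Seconds until the current TOTP window rolls over (1..step)."""
    now = time.time() if now is None else now
    return step - (int(now) % step)


def totp(secret_b32, now=None, step=30, digits=6):
    """RFC 6238 TOTP (HMAC-SHA1) for a base32-encoded secret."""
    now = time.time() if now is None else now
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    counter = struct.pack(">Q", int(now) // step)
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

A code generated with only a second or two left in its window is likely to expire or collide with an already-consumed code before the finalize call lands, which is why the bash pattern waits for the rollover first.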
Secrets lookup (GCP)
Secret Manager in the Pablo GCP project is authoritative. Names observed:
pablo-database-url, pablo-db-password, AUTH_SECRET, JWT_SECRET_KEY, GOOGLE_CLIENT_ID/SECRET, AUTH_COOKIE_SIGNATURE_KEY. Project id is whatever is in /api/config.firebaseProjectId (e.g., pablohealth-oss).
gcloud secrets list --project=<project>
gcloud secrets versions access latest --secret=pablo-database-url --project=<project>
# parse: postgresql://pablo:<pw>@/pablo?host=/cloudsql/<conn>
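Parsing the secret by hand is error-prone when the password contains URL-escaped characters. A minimal sketch with the standard library follows, assuming the secret has the socket-style shape shown in the comment above; the function name and the returned keys are illustrative.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_database_url(url):
    """Split a postgresql:// URL that points at a /cloudsql/ socket dir."""
    parts = urlsplit(url)
    socket_dir = parse_qs(parts.query).get("host", [""])[0]
    prefix = "/cloudsql/"
    connection = socket_dir[len(prefix):] if socket_dir.startswith(prefix) else socket_dir
    return {
        "user": parts.username,
        # Passwords are percent-encoded inside URLs, so decode before use.
        "password": unquote(parts.password) if parts.password else None,
        "dbname": parts.path.lstrip("/"),
        "connection_name": connection,
    }
```

The returned password should go straight into an environment variable for psql and never into logs or the report.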
Cloud SQL direct access
# Use a high port — the user may have another cloud-sql-proxy running on 5433
cloud-sql-proxy --port 15433 <connection> > /tmp/p.log 2>&1 &
# wait until "ready for new connections" appears in /tmp/p.log
DB_PW=$(gcloud secrets versions access latest --secret=pablo-db-password --project=<project>)
PGPASSWORD="$DB_PW" psql "host=127.0.0.1 port=15433 user=pablo dbname=pablo sslmode=disable" -c "\dt"
Always run read-only at the DB level. SELECT ... LIMIT 5. Queries to prioritize:
- \dt: confirm audit_logs exists and has recent rows (SELECT count(*) FROM audit_logs WHERE timestamp > now() - interval '1 day';).
  - Disambiguate row count == 0 with the closed-loop result before flagging. collect_closed_loop_audit (artifact 40-closed-loop-audit.txt / summary findings_count, highest_severity) actively writes one row in the same run and asserts it appears via /api/users/me/audit-log. Decision matrix:
    - closed-loop ok / NONE + row count 0 → INFORMATIONAL (idle period; pipeline proven alive). Goes in §6a as a positive control, not §6.
    - closed-loop ok / NONE + row count >0 → pass; nothing to flag.
    - closed-loop error/skipped + row count 0 → HIGH (pipeline unverified AND historical data empty). Real § 164.312(b) failure.
    - closed-loop error/skipped + row count >0 → MEDIUM (write-path unverified this run, but historical data shows activity).
  - Never flag audit_logs empty as HIGH on closed-loop-green evidence; that's the false-positive class documented in 2026-05-03_pentest_report.md PABLO-002.
- Confirm audit schema is PHI-free: \d+ audit_logs should NOT show user_email, user_name, patient_name columns. If it does, that's a regression.
- RLS policies: SELECT schemaname, tablename, policyname FROM pg_policies; SHOW row_security; and per-connection SHOW app.current_user_id;
- Scan for unexpectedly plaintext PHI columns (e.g., \d+ patients → SSN, DOB storage).
Kill the proxy with pkill -f "cloud-sql-proxy --port 15433" when done.
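The audit_logs decision matrix above reduces to a two-input lookup. The sketch below encodes it as a function; the argument names are illustrative, and treating any closed-loop result other than ok / NONE as "not green" is an assumption, since the matrix only names error and skipped.

```python
def classify_audit_rows(closed_loop_status, closed_loop_severity, row_count):
    """Map the closed-loop result plus the 24h row count to a severity."""
    green = closed_loop_status == "ok" and closed_loop_severity == "NONE"
    if green:
        # Pipeline proven alive this run; an empty window is just an idle period.
        return "INFORMATIONAL" if row_count == 0 else "PASS"
    # Write path unverified this run; history decides how bad it is.
    return "HIGH" if row_count == 0 else "MEDIUM"
```

An INFORMATIONAL result belongs in §6a as a positive control, while HIGH and MEDIUM go to §6.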
Vulnerability exception log (read every run)
docs/pentest/VULNERABILITY_EXCEPTIONS.md is the operator's documented-risk register for advisories that aren't patched within the standard SLA — required by § 164.308(a)(1)(ii)(B). Treat it as first-class evidence: it both suppresses known-and-accepted dep-scan hits in §6 and is itself a thing that can rot.
Find the file. Path varies by how the runner mounted the repo — try in order, stop at the first hit:
EXC_FILE=""
for p in /workspace/pentest-bundle/repo/docs/pentest/VULNERABILITY_EXCEPTIONS.md \
/workspace/repo/docs/pentest/VULNERABILITY_EXCEPTIONS.md \
/app/docs/pentest/VULNERABILITY_EXCEPTIONS.md \
docs/pentest/VULNERABILITY_EXCEPTIONS.md; do
[ -f "$p" ] && EXC_FILE="$p" && break
done
If no path resolves, that's a MEDIUM finding in §6 — the operator cannot demonstrate documented risk decisions for unpatched advisories. Do NOT silently treat absence as "no exceptions."
Per open entry (under ## Open), parse ### <advisory-id>, Severity, Status, Revisit by, and the entry's age. Age sources, in order:
- An explicit Raised: field, if present (preferred; push the operator to add one when missing).
- Git first-add date for the entry heading (fallback). The pickaxe search (-S) lists the commits that change how often the heading occurs, newest first, so the last line is the commit that introduced it:
      REPO_ROOT="$(git -C "$(dirname "$EXC_FILE")" rev-parse --show-toplevel)"
      git -C "$REPO_ROOT" log --format=%aI -S"### CVE-2026-3219" -- "$EXC_FILE" | tail -1
- The file's first-add date as the conservative upper bound if neither works.
Compute age_days = today - raised. Today is date -u +%Y-%m-%d.
Cross-reference into §6. For any §9 dep-scan hit (pip-audit, trivy, osv-scanner) whose advisory ID is listed under ## Open, do not promote it on dep-scan severity alone — record it as INFO with Status: documented-exception (see §9 Documented exceptions → <advisory-id>). That's the entire purpose of the log.
Staleness rule — HIGH finding when triggered. Any open entry with age_days > 30 ships in §6:
- ID: PABLO-EXC-STALE-<advisory-id>
- Title: Vulnerability exception stale (>30 days) — refresh or remediate: <advisory-id>
- Severity: HIGH (§ 164.308(a)(1)(ii)(B): documented risk decisions must be re-reviewed; >30 days without update means the decision is unverified against current upstream state)
- Description: Exception was raised days ago. Either upstream now ships a fix (close the entry), the situation has changed (refresh Why not patched, Compensating control, and Revisit by with current rationale), or the risk acceptance is silently aging; none of these is acceptable to OCR.
- Remediation: Patch the dep if a fix exists, OR update the entry with current rationale and a new Revisit by. Bumping Revisit by alone without re-justifying does not clear this finding; the staleness check is on Raised: / git-add date, not on Revisit by. To reset the clock, add or update a Raised: field to today and document what was re-verified.
- Owner: copy the entry's Owner (default Kurt Niemi).
If age_days ≤ 30 for every open entry → §6a positive control row: "Exception log fresh — N open, oldest
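The age computation and the 30-day threshold can be sketched as follows. The dictionary layout of the returned finding is an assumption for illustration; only the ID pattern, the HIGH severity, and the strict greater-than-30 comparison come from the rule above.

```python
from datetime import date

STALE_AFTER_DAYS = 30


def exception_age_days(raised, today):
    """age_days = today - raised, in whole days."""
    return (today - raised).days


def staleness_finding(advisory_id, raised, today):
    """Return a HIGH finding stub for a stale entry, or None when fresh."""
    age = exception_age_days(raised, today)
    if age <= STALE_AFTER_DAYS:
        return None
    return {
        "id": f"PABLO-EXC-STALE-{advisory_id}",
        "severity": "HIGH",
        "age_days": age,
    }
```

An entry that is exactly 30 days old is still fresh under this rule; day 31 is the first day it ships as a finding.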
Checklist (run in order — keep it updated as findings stabilize)
1. Recon: TLS (openssl s_client -tls1_1 should fail with alert 70), cert chain, security headers on frontend + backend, /api/config, /api/health, /docs / /openapi.json (404 in prod is good), /api/ext/auth/seed-admin (404 expected; referenced but not implemented), robots.txt. Run testssl.sh for full TLS config coverage.
1b. Scripted sweep: run nuclei and ffuf against the backend URL (sanctioned invocations above). Review findings manually; fold HIGH/CRITICAL into the report. Skip nikto / sqlmap unless the sweep surfaces a lead.
1c. Static analysis + dependency scan: semgrep (code), pip-audit (Python deps), trivy image against the deployed backend tag, osv-scanner --recursive /app/backend, gitleaks detect --source /app/backend --no-git. Feeds §8 and §9 of the report. Any gitleaks hit = automatic CRITICAL finding in §6.
1d. Vulnerability exception log review: locate docs/pentest/VULNERABILITY_EXCEPTIONS.md (see "Vulnerability exception log" section above), parse every entry under ## Open, compute age_days per entry, cross-reference advisory IDs against the §1c dep-scan output, and apply the staleness rule (>30 days → HIGH §6 finding). Missing file → MEDIUM finding.
2. Cloud Run IAM: gcloud run services get-iam-policy <svc>. Both backend and frontend are allUsers → roles/run.invoker by design; auth is at app layer. Verify with unauth curl → expect 401 on /api/{users/me, patients, sessions, admin/users} and 403 "Service auth required" on /api/ext/auth/*.
3. CORS: preflight with Origin: https://evil.com must not be reflected. Legit origin = the frontend's Cloud Run URL; check allow_credentials: true is only paired with that.
4. Auth flow: decode JWT (alg=RS256, aud=<projectId>, firebase.sign_in_second_factor=totp). Tamper to alg=none → 401. Swap X-Tenant-ID header → ignored (tenant from JWT).
5. Unauth surface: /api/ext/auth/check-allowlist and check-status require Bearer. Recurring HIGH check: ext_auth.py:_verify_blocking_function_token must pass audience=<backend URL> to google.oauth2.id_token.verify_token. If the audience= kwarg is still missing, file it again.
6. Cross-tenant IDOR (A vs B, separate practice schemas): do this autonomously, do NOT ask the user to create accounts. The runner has already provisioned MFA-enrolled ephemeral users for you on this run: tokens in $PENTEST_TEST_ID_TOKEN_A / $PENTEST_TEST_ID_TOKEN_B (emails in $PENTEST_TEST_EMAIL_A / _B, uids in $PENTEST_TEST_UID_A / _B). A and B sit in separate pentest tenants; this exercises the Postgres schema boundary, which is the outermost layer of the patient-access model. Exercise both:
   - With $PENTEST_TEST_ID_TOKEN_A, POST one patient + one session through the authenticated endpoints (name prefix PENTEST-<run-uuid>) so there is a known-real A-owned record. Capture the returned UUIDs. The session endpoint emits a note when the upload completes; capture the note id from the session response (or from GET /api/patients/<id>/notes) into $PENTEST_TEST_NOTE_ID_A.
   - With $PENTEST_TEST_ID_TOKEN_B, run the probes below against A's UUIDs. Expect 404 from the repo layer (NOT 403; 403 leaks existence). A 200 is a cross-tenant BOLA = CRITICAL + Breach candidate § 164.402. A 403 is HIGH (info leak via presence signal).
     - GET /api/patients/<A's-patient-uuid>
     - GET /api/sessions/<A's-session-uuid>
     - GET /api/notes/<A's-note-uuid>: the SOAP/note endpoint moved off /api/soap-notes/... in pa-0nx (notes/sessions split). Live shape is /api/notes/{note_id} (see backend/app/routes/notes.py).
     - PATCH /api/notes/<A's-note-uuid> with body {"content_edited": {"plan": "pentest-<run-uuid>"}}: write-side IDOR.
     - POST /api/notes/<A's-note-uuid>/finalize with body {"quality_rating": 5}: finalize-IDOR (would flip the note's finalized_at if it succeeded).
     - POST /api/notes/<A's-note-uuid>/submit-export (no body): export-IDOR (would queue another tenant's PHI for outbound export).
     - GET /api/appointments/<A's-appt-uuid> (if A created one). Note: GET /api/appointments list-by-range stays on the calendar-owner shape (the "my calendar" view), so the list endpoint is not patient-scoped; only the single-resource read is. A 200 on the patient-scoped probe MAY be acceptable if B somehow has a grant on A's patient, but in this scenario B is in a different tenant entirely so even the schema lookup must miss; treat 200 here as CRITICAL.
   - Repeat write-side probes: PUT / DELETE on the same foreign UUIDs with $PENTEST_TEST_ID_TOKEN_B → expect 404. A 200/204 means a write IDOR: CRITICAL + Breach candidate.
   - Probe X-Tenant-ID header swap on the same endpoints; the header must be ignored (tenant is taken from JWT).

   Fallback: if A/B tokens are missing (identity bootstrap failed), note explicitly in §12 that "cross-tenant IDOR was not exercised this run: identity bootstrap returned no credentials". Do NOT fabricate results and do NOT attempt to mint new Firebase accounts inline; the runner owns that lifecycle.
6b. Same-tenant cross-clinician IDOR (A vs C, same schema, no patient_clinicians grant): PR #170 unified every patient-scoped table (patients, notes, therapy_sessions, appointments) around the has_patient_access(patient_id, user_id) SQL function backed by the patient_clinicians table (see migrations 777b846ab944_patient_clinicians_table_and_access_* and 9dea1edf7fe0_drop_patients_user_id_in_favor_of_* in backend/alembic/versions/). The schema boundary doesn't help here: A and C share a Postgres schema, but C has no grant row on A's patient. This is the boundary the #170 IDOR was actually on; the cross-tenant block above is the outer-defense check, this is the load-bearing one. The runner provisions a third user for this scenario: token in $PENTEST_TEST_ID_TOKEN_C (email $PENTEST_TEST_EMAIL_C, uid $PENTEST_TEST_UID_C).
   - Reuse the same A-owned UUIDs from §6 ($PENTEST_TEST_NOTE_ID_A, A's patient + session UUIDs). Do not re-create resources for this block.
   - With $PENTEST_TEST_ID_TOKEN_C, run the same probe set as §6 against A's UUIDs:
     - GET /api/patients/<A's-patient-uuid>
     - GET /api/sessions/<A's-session-uuid>
     - GET /api/notes/<A's-note-uuid>
     - PATCH /api/notes/<A's-note-uuid> with body {"content_edited": {"plan": "pentest-<run-uuid>"}}
     - POST /api/notes/<A's-note-uuid>/finalize with body {"quality_rating": 5}
     - POST /api/notes/<A's-note-uuid>/submit-export (no body)
   - Expect 404 on every probe (consistent with the existence-non-leak rule). A 200 / 200-with-edit / 201-with-finalize / 200-with-queued-export is CRITICAL + Breach candidate § 164.402, the same severity tier as a cross-tenant breach, because the schema isolation did not save us and has_patient_access is the only remaining wall. Phrase the §6 entry as "same-tenant cross-clinician BOLA via has_patient_access bypass" and link back to PR #170 as the introducing change.
   - Skip /api/appointments/<A's-appt-uuid> for the §6b probe set: appointments deliberately keep a calendar-owner read shape on top of the patient-scoped grant (the "my calendar" view), so a same-tenant clinician may see the appointment summary if they hold any patient grant. That won't happen in this scenario (C has zero grants), but the rubric is that the patient probe is sufficient evidence and the appointment probe is noisy.

   Fallback: if $PENTEST_TEST_ID_TOKEN_C is missing but A/B are present (the bootstrap is mid-rollout, or this skill is being driven against an older runner that pre-dates _C), note explicitly in §12 that "same-tenant cross-clinician IDOR was not exercised this run: _C credentials absent". Do NOT silently fold this back into the §6 cross-tenant numbers; they're different scenarios with different defense-in-depth assertions.
7. Privilege escalation — /api/admin/* with clinician token → ADMIN_REQUIRED. JWT tamper (won't work, Firebase verifies sig).
8. Injection (targeted) — 1–2 probes each on reflected XSS, SQLi (boolean+time on obvious params), SSRF on iCal feed_url (hostname allowlist + scheme check; don't bother with 169.254.169.254 — it's blocked by hostname).
9. Rate limiting — 5 bad passwords on the real account via Firebase signInWithPassword (Firebase handles lockout; don't exceed 10). Skip if prior run already tripped lockout.
10. Upload DoS — Content-Length: 2000000000 + tiny body header-only probe (reject on header = good). Recurring MEDIUM check: sessions.py:upload_audio should read in bounded chunks, not await file.read() before size check.
11. MFA integrity — POST /api/users/me/mfa-enrolled as a fresh non-MFA sign-up. Recurring MEDIUM: returns a new mfa_enrolled_at without verifying Firebase-side enrollment. Doesn't grant access (JWT claim still gates), but poisons compliance metrics.
12. Signup hygiene — identitytoolkit /accounts:signUp with @example.invalid should fail in a locked-down deployment. If it succeeds, restrict_signups=false — flag it.
13. Cloud SQL direct — see section above. audit_logs table existence is the key HIPAA check. The Cloud SQL collector also asserts two integrity properties on audit_logs: the app role holds no UPDATE/DELETE (append-only, §164.312(c)(1)) and no foreign key cascades a patient delete into audit rows (6-year retention, §164.530(j)). Findings surface in 30-cloud-sql.txt.
14. Cloud configuration depth — the runner pre-computes deterministic posture checks; read these artifacts and fold their findings into §6 / positive results into §6a / evidence into §7. Do not re-run the gcloud queries.
- 50-sa-iam.txt — Cloud Run runtime service-account roles. Target: no roles/owner/roles/editor, no unjustified *.admin, not the shared default compute SA. (§164.308(a)(4))
- 51-cloud-sql-posture.txt — Cloud SQL TLS mode, public-IP/private-IP exposure, CMEK custody. Target: TLS-only, no open authorized network, Private IP (or documented proxy-only). Google-managed keys are recorded as INFO (addressable). (§164.312(e), §164.312(a)(2)(iv))
- 52-audit-log-config.txt — Cloud Audit Logs DATA_READ for Secret Manager + Cloud SQL. (§164.312(b), operator side)
- 53-secret-rotation.txt — newest enabled version age per secret; flags any past the max-age threshold. (§164.308(a)(5)(ii)(D))
- 54-wif-scope.txt — Workload Identity Federation trust conditions. Target: repo-pinned (assertion.repository ==), not org-only. (supply-chain trust)
- 55-ci-workflow-audit.txt — pull_request_target workflows that check out untrusted PR head (present only when the .github tree is in scope).
- 56-image-provenance.txt — deployed image build-provenance presence (best-effort; skips cleanly when unavailable).
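The status-code rubric shared by checklist items 6 and 6b can be captured in a small helper so every probe is scored the same way. This is a sketch under stated assumptions: the function names are hypothetical, and the REVIEW bucket for codes the checklist does not mention (401, 405, 5xx) is an addition for illustration.

```python
def foreign_probes(patient_id, session_id, note_id):
    """(method, path) pairs for the IDOR probe set against A's records."""
    return [
        ("GET", f"/api/patients/{patient_id}"),
        ("GET", f"/api/sessions/{session_id}"),
        ("GET", f"/api/notes/{note_id}"),
        ("PATCH", f"/api/notes/{note_id}"),
        ("POST", f"/api/notes/{note_id}/finalize"),
        ("POST", f"/api/notes/{note_id}/submit-export"),
    ]


def classify_idor_probe(status_code):
    """Score one foreign-UUID probe by its HTTP status code."""
    if status_code == 404:
        return "PASS"      # existence not leaked, access denied
    if status_code == 403:
        return "HIGH"      # presence signal leaks that the record exists
    if 200 <= status_code < 300:
        return "CRITICAL"  # BOLA: Breach candidate under § 164.402
    return "REVIEW"        # not covered by the rubric; inspect manually
```

Any CRITICAL result ends that line of testing after one clear reproduction, in line with the PoC-only rule.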
Deliverable — HIPAA-grade report format
Target audience: HHS OCR auditors, Pablo's owner/operator, and an external qualified pentester using this report as their scoping input. Every run produces the full artifact — there is no "lite" mode.
Never skip a finding. Every issue observed — INFO through CRITICAL — lands in §6 with a full write-up. Severity is how findings are ordered and highlighted, not a gate for inclusion. A "clean" run still enumerates the INFO-level observations it considered. §2 highlights CRITICAL/HIGH for the reader; nothing is dropped because it looked minor. If you considered an issue and decided it was a non-issue, that belongs in §6a (Positive controls and items tested clean) with the reasoning, not silently omitted.
Positive findings are first-class. §6a is mandatory — it captures the controls that passed and the things the assessor considered and dismissed, with the same evidence rigor as a finding (what was tested, what the expected behavior was, what was observed). This is what a Covered Entity shows an auditor to demonstrate that the technical safeguards are actually working, not just written down, and that the assessor looked at the full surface.
Put this disclaimer on every report cover: this is an automated self-assessment driven by an LLM, not an independent qualified third-party pentest. For full HIPAA §164.308(a)(8) defensibility (2024 NPRM anticipated to finalize in 2026), Pablo should still engage an independent qualified pentester at least annually. This weekly artifact complements that engagement and does not replace it; it is meant to surface issues between formal engagements and to give the external tester a scoped starting point.
Report sections, in order (use these exact headings):
1. Cover & scope
Covered entity (Pablo Health, LLC); report ID (PABLO-PENTEST-<date>-<run UUID>); reporting period (since prior run in GCS); tester identity (CLI + model); authorization statement; in-scope systems (Cloud Run × 2, Cloud SQL, Firebase, GCS compliance bucket, Secret Manager, backend source); out-of-scope (GCP/Firebase/Vertex infra, third-party source repos, physical, social engineering, wireless); run metadata (project ID, URLs, connection name, region).
2. Executive summary
3–5 bullets, business-risk language. Lead with severity totals and trend vs prior run.
3. Asset inventory & data flow
From gcloud run services list, gcloud sql instances list, gcloud secrets list, gsutil ls, /api/config. Table: Asset | Type | ePHI touched | Region | Encryption at rest | Encryption in transit. Text data-flow.
Observed external egress destinations — enumerate from code + config, then reason about each destination's BAA coverage. List:
- Deployed service env vars that route inference: gcloud run services describe pablo-backend --region=us-central1 --format='value(spec.template.spec.containers[0].env)' → record CLAUDE_CODE_USE_VERTEX, GOOGLE_GENAI_USE_VERTEXAI, ANTHROPIC_VERTEX_PROJECT_ID, GOOGLE_CLOUD_LOCATION values.
- All external hostnames the backend can reach: grep -rEoh "https?://[a-zA-Z0-9.-]+" /app/backend | sort -u
For each destination the backend can send request bodies (prompts, transcripts, patient fields) to, determine whether it is covered by a Business Associate Agreement — Vertex AI under the project's Google Cloud BAA is covered; api.anthropic.com, api.openai.com, generativelanguage.googleapis.com (public Gemini direct) are not covered by the Google Cloud BAA. Apply the severity rubric below: ePHI traversing a non-BAA destination is a § 164.504(e) permitted-use violation and a § 164.402 Breach candidate — score per the rubric, do not pre-commit to a specific tier here.
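A first-pass triage of the grep output could look like the sketch below. The non-covered host list comes from the paragraph above; the regional Vertex AI hostname pattern and the needs-review default are assumptions for illustration, and the result is a starting point for reasoning rather than a verdict on BAA coverage.

```python
import re

# Destinations named above as not covered by the Google Cloud BAA.
NON_BAA_HOSTS = {
    "api.anthropic.com",
    "api.openai.com",
    "generativelanguage.googleapis.com",
}


def extract_hosts(text):
    """Unique hostnames from http(s) URLs in source text, sorted."""
    found = re.findall(r"https?://([a-zA-Z0-9.-]+)", text)
    return sorted({host.lower().rstrip(".") for host in found})


def baa_status(host):
    """Rough coverage label for one egress hostname."""
    if host in NON_BAA_HOSTS:
        return "not-covered"
    # Assumed Vertex AI endpoint shapes (global and <region>- prefixed).
    if host == "aiplatform.googleapis.com" or host.endswith("-aiplatform.googleapis.com"):
        return "covered"
    return "needs-review"
```

A not-covered host only becomes a finding once the code path shows request bodies with prompts, transcripts, or patient fields actually going to it.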
4. Threat model (STRIDE-lite, Pablo-specific)
Actors: unauth internet; authenticated clinician cross-tenant; authenticated clinician horizontal BOLA within tenant; insider with GCP IAM; compromised dependency; compromised subprocessor. Crown jewels: patient records, session transcripts, audit_logs, Firebase/GCP credentials, AUTH_SECRET/JWT_SECRET_KEY. Map each actor to attack paths and to the findings in §6.
5. Methodology & frameworks
OWASP WSTG v4.2, OWASP API Security Top 10 (2023), OWASP ASVS 4.0 Level 2, PTES Technical Guidelines, NIST SP 800-115, HIPAA Security Rule §164.308/.310/.312/.314 (2024 NPRM). List tools from the scanner tooling section that were invoked with which flags.
6. Findings
Table: ID | Title | Severity | CVSS 3.1 vector | CWE | §164 control | Asset | Status. One subsection per finding with Description / Evidence (redacted) / Reproduction / Business impact / Remediation / Owner (default: CODEOWNERS or Kurt Niemi) / Target resolution (CRITICAL ≤7d, HIGH ≤30d, MEDIUM ≤90d, LOW ≤180d) / Status (new | carry-over-from:<prior report ID> | resolved-this-run | regression).
Severity rubric — HIPAA overlay applies. Raw CVSS underweights PHI impact: a "medium" CVSS CVE that enables PHI access is a § 164.402 Breach and ships as CRITICAL. Use:
effective_severity = max(cvss_tier, phi_impact_tier)
where phi_impact_tier is:
- CRITICAL: any unauthenticated PHI read; cross-tenant or horizontal PHI access by an authenticated user; PHI integrity compromise (write/delete across tenant boundary); PHI egress to infrastructure not covered by a Business Associate Agreement (§ 164.504(e); the disclosure path itself is an impermissible use regardless of whether a Breach has occurred yet); any condition that would meet the § 164.402 "acquisition, access, use, or disclosure of PHI in a manner not permitted" definition of a reportable Breach.
- HIGH: authenticated privilege escalation; auth-bypass that could lead to PHI without a second bug; PHI availability loss > 24h; any secret in git (gitleaks hit); missing/non-firing audit logs on PHI routes (§ 164.312(b) gap).
- MEDIUM: info disclosure without direct PHI nexus; DoS vector; missing defense-in-depth on a PHI-adjacent path.
- LOW: defense-in-depth gap with no PHI nexus.
- INFO: observation / posture note.
"PHI-adjacent" means the path touches metadata, signals, or fields that would appear in a HIPAA compliance report, audit trail, or Breach risk-assessment, even when the underlying PHI is not itself exposed. Example: /api/users/me/mfa-enrolled poisons the mfa_enrolled_at field that feeds § 164.308(a)(5) workforce-MFA attestations — PHI-adjacent, MEDIUM. Counter-example: missing HSTS on a static-asset CDN with no session tokens — no PHI nexus, LOW.
Baseline examples (the raw CVSS column is what a pure-technical scorer would emit; effective is what ships):
| Finding | CVSS tier | PHI tier | Effective | Note |
|---|---|---|---|---|
| Unauth `GET /api/patients/<id>` returns 200 | High | Critical | Critical | § 164.402 Breach — reportable |
| Authed clinician B reads tenant A's patient | High | Critical | Critical | Cross-tenant BOLA = Breach |
| Stored XSS in session notes | Medium | Critical | Critical | PHI integrity + exfiltration path |
| `audit_logs` silent gap > 24h on PHI route, closed-loop ALSO failed | Medium | High | High | § 164.312(b) enforcement failure |
| `audit_logs` empty over 24h, closed-loop GREEN this run | Info | None | Informational | Idle-period evidence, not a finding (see §Cloud SQL guidance) |
| Stale npm dep, CVSS 7.5, no PHI reachability | High | None | High | No overlay needed |
| HSTS header missing | Low | None | Low | Defense-in-depth only |
| Backend routes inference to `api.anthropic.com` (not Vertex); transcripts contain PHI | Low (config) | Critical | Critical | PHI disclosed to non-BAA subprocessor = § 164.504(e) impermissible use + § 164.402 Breach candidate — `Breach candidate: § 164.402` |
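The overlay rule and the baseline rows above can be sketched in a few lines. This is an illustration only: the `Tier` enum and its numeric ordering are assumptions chosen so that `max` behaves as the rubric describes, with `NONE` standing for "no PHI nexus".

```python
from enum import IntEnum


class Tier(IntEnum):
    """Severity tiers ordered so that a larger value is more severe."""
    NONE = 0
    INFO = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


def effective_severity(cvss_tier: Tier, phi_impact_tier: Tier) -> Tier:
    """Apply the HIPAA overlay: the more severe of the two tiers ships."""
    return max(cvss_tier, phi_impact_tier)
```

With this ordering the stored-XSS row evaluates as `effective_severity(Tier.MEDIUM, Tier.CRITICAL)`, which yields `Tier.CRITICAL`, and the stale-dependency row stays at `Tier.HIGH` because its PHI tier is `NONE`.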
Compute and emit a CVSS 3.1 base vector for every finding (e.g., AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:N) alongside the effective HIPAA-overlay severity, so auditors can see both scores and the reasoning.
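To make the vector-to-score step reproducible, here is a sketch of the CVSS 3.1 base-score calculation. The metric weights and equations follow the published CVSS v3.1 specification; the function names (`base_score`, `cvss_tier`) are illustrative and not part of the skill. Temporal and environmental metrics are ignored.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS 3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:N."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


def cvss_tier(score: float) -> str:
    """Map a base score to the CVSS 3.1 qualitative rating."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

The example vector in the paragraph above scores 10.0 under these equations, so in that case the CVSS tier alone is already Critical and the overlay changes nothing.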
Breach-candidate flag. Any finding scored CRITICAL because of a PHI nexus (not because CVSS alone was Critical) must carry a Breach candidate: § 164.402 tag in its subsection header, with a sentence naming the specific disclosure path. This is what separates a HIPAA-aware report from a generic CVE dump, and it's the line an OCR reviewer looks for first.
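One way to apply the flag mechanically is sketched below. It assumes the reading that any finding whose PHI impact tier is CRITICAL carries the tag, while a finding that is Critical on CVSS alone does not. The header layout produced by `finding_header` is hypothetical; only the tag text comes from this document.

```python
BREACH_TAG = "Breach candidate: § 164.402"


def is_breach_candidate(phi_impact_tier: str) -> bool:
    """True when the PHI nexus itself puts the finding at CRITICAL."""
    return phi_impact_tier.upper() == "CRITICAL"


def finding_header(finding_id: str, title: str, phi_impact_tier: str) -> str:
    """Build a subsection header, appending the tag for Breach candidates."""
    header = f"{finding_id}: {title}"
    if is_breach_candidate(phi_impact_tier):
        header += f" [{BREACH_TAG}]"
    return header
```

The sentence naming the specific disclosure path still has to be written by hand; the helper only decides whether the tag is present.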
Methodology reference: OWASP Risk Rating Methodology (business-impact multiplier), HITRUST CSF Risk Analysis Guide, HHS OCR "Guidance on Risk Analysis Requirements" under § 164.308(a)(1)(ii)(A). The max(cvss_tier, phi_impact_tier) formula is the common Coalfire/Clearwater pattern for HIPAA-aware pentest reports.
6a. Positive controls and items tested clean
Single evidence-rigor table — no parallel bullet list elsewhere. Columns: Control | What was tested | Expected behavior | Observed evidence | §164 control mapped. Draw from the checklist (§1–§13) — at minimum cover: TLS ≥1.2 enforcement, security headers (HSTS / CSP / X-Content-Type-Options), CORS origin allowlist, unauth 401 on PHI routes, JWT alg=none rejection, X-Tenant-ID header ignored, cross-tenant 404-not-403, admin-route ADMIN_REQUIRED enforcement, Cloud Run egress env vars pointing to Vertex (not public Gemini/Anthropic), audit_logs table present with recent rows, audit schema PHI-free, CI supply-chain scans passing on the deployed tag. Every row cites specific evidence (status code, header value, row count, screenshot-equivalent text output).
Also include controls the model considered and dismissed as non-issues — e.g. "/api/ext/auth/seed-admin returns 404 as expected, not a live backdoor despite setup-solo.sh reference." This is what shows the auditor the technical safeguards are working and the assessor looked at the full surface, not merely what was written down. Dismissed scanner false positives belong in §8 / §9 with the tool context, not here.
7. HIPAA Security Rule control matrix
Copy the table under HIPAA control matrix below into this section, filling Status / Evidence / Gap from this run's observations. Every row required; N/A rows need justification. Every Partial or Fail must link to a finding in §6.
8. Static analysis findings (semgrep)
Only confirmed-real hits, grouped by rule ID with file:line. Dismissed false positives stay here as a Dismissed: subsection with the reason — not duplicated in §6a or Appendix B.
9. Dependency & supply-chain scan
CI already runs pip-audit, trivy, and npm audit on every PR + weekly (see .github/workflows/ci.yml + security.yml). The pentest adds what CI can't: (a) scanning what's currently deployed (may lag main), (b) self-contained evidence inline in the report (auditors shouldn't need to cross-reference the GitHub Security tab). Flag any divergence between "CI clean" and "deployed tag vulnerable" as its own finding.
Subsections:
- Python deps — `pip-audit`: `Package | Installed | Vulnerable | Fixed | CVE | CVSS`
- Deployed container image — `trivy image <deployed tag>` where the tag comes from `gcloud run services describe pablo-backend --region=us-central1 --format='value(spec.template.spec.containers[0].image)'`
- Multi-ecosystem — `osv-scanner --recursive /app/backend`
- Secrets in git — `gitleaks detect --source /app/backend --no-git` (any hit = CRITICAL finding in §6)
- Documented exceptions — table of every entry under `## Open` in `docs/pentest/VULNERABILITY_EXCEPTIONS.md`. Columns: `Advisory | Package | Severity | Status | Raised | Age (days) | Revisit by | Stale (>30d)?`. Note the file path actually used (or "file not found" with the paths searched). For each row whose advisory ID also appears in `pip-audit` / `trivy` / `osv-scanner` output above, add a `Cross-ref:` line under the row pointing at that scanner's finding ID — and downgrade the §6 entry for that advisory to INFO with `Status: documented-exception`. Stale rows (Age > 30) get a separate HIGH §6 finding per the staleness rule.
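The `Age (days)` and `Stale (>30d)?` columns for documented exceptions can be derived from the `Raised` date. A minimal sketch, assuming `Raised` has already been parsed into a `date`; `exception_age` is an illustrative helper name:

```python
from datetime import date
from typing import Tuple

STALE_AFTER_DAYS = 30  # threshold from the "Stale (>30d)?" column


def exception_age(raised: date, today: date) -> Tuple[int, bool]:
    """Return (age in days, stale flag) for one documented exception."""
    age = (today - raised).days
    return age, age > STALE_AFTER_DAYS
```

An exception that is exactly 30 days old is not yet stale under this reading; it becomes stale on day 31.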
10. Prior-run carry-over & trends
Fetch the newest .md before today from gs://<COMPLIANCE_REPORT_BUCKET>/pentest/. Diff into: Resolved since last run / Persisting (with "consecutive runs open" counter) / Regressions (elevate to HIGH minimum) / New this run. If no prior report exists, mark this as the baseline.
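The four-way diff can be expressed as set operations over finding IDs. The sketch below assumes that finding IDs are stable across runs and that a regression means an ID the prior report marked resolved which is open again; both are assumptions, as are the names `diff_runs` and `prior_counts`.

```python
from typing import Dict, Set


def diff_runs(
    prior_open: Set[str],
    prior_resolved: Set[str],
    current_open: Set[str],
    prior_counts: Dict[str, int],
) -> dict:
    """Classify findings relative to the prior report.

    prior_counts maps a finding ID to its "consecutive runs open" counter
    from the prior report; a missing entry is treated as 1.
    """
    persisting = prior_open & current_open
    return {
        "resolved": sorted(prior_open - current_open),
        "persisting": {fid: prior_counts.get(fid, 1) + 1 for fid in sorted(persisting)},
        "regressions": sorted(prior_resolved & current_open),
        "new": sorted(current_open - prior_open - prior_resolved),
    }
```

On a baseline run all three prior inputs are empty, so every current finding lands under `new`.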
11. Endpoint coverage matrix
Group by auth profile, not by individual route. Columns: Auth profile | Tenant-scoped | Route count | Representative routes | Tests run | Result. One row per (auth_requirement, tenant_scope) combination — e.g. one row covering all require_mfa + tenant-scoped PHI routes, one for require_mfa + admin-only, one for get_current_user_no_mfa, one for public. List 2–3 representative routes per row; the exhaustive enumeration goes in Appendix B under endpoints.txt.
Any route that doesn't fit a known profile (missing Depends(), explicit public=true tag, unusual composition) gets its own row. Every unusual row needs either a positive test result or an explicit "deferred to external pentester" reason — never "ran out of time."
Enumerate the full set for the appendix:
grep -rnE "@router\.(get|post|put|delete|patch)\(" /app/backend/app/routes/ > /tmp/endpoints.txt
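To feed the route-count column, the grep output can be summarized per file. This sketch assumes the usual `path:line:content` layout that `grep -rn` produces; `summarize_endpoints` is an illustrative name. It counts decorators only and does not infer the auth profile, which still requires reading each route's dependencies.

```python
import re
from collections import Counter

# Matches lines such as "routes/patients.py:12:@router.get(" from grep -rn output.
LINE_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<lineno>\d+):\s*@router\.(?P<method>get|post|put|delete|patch)\("
)


def summarize_endpoints(grep_output: str) -> Counter:
    """Count route decorators per source file; lines that do not match are skipped."""
    counts = Counter()
    for line in grep_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match["file"]] += 1
    return counts
```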
12. Automated assessment scope boundaries
Describe what is outside the boundary of this automated engagement and why, framed as scope constraints rather than gaps. The audience is auditors establishing what human-led testing should cover next; frame each entry as "requires human judgment / multi-session context / physical access / social engineering" rather than "we didn't test this." Entries belong here when they fall outside what any automated tool can assert — business-logic depth, multi-step stateful workflows, novel zero-days, prompt-injection depth, physical controls, social engineering. Do not list items that were simply skipped due to time; those are findings or fallback notes in the relevant section.
Static analysis exclusions (semgrep): The following paths are excluded from semgrep via --exclude flags (rationale mirrored in /.semgrepignore). A human reviewer or independent pentester should audit these paths directly:
| Excluded path | Rule(s) suppressed | Rationale |
|---|---|---|
| `alembic/` | `avoid-sqlalchemy-text` | Migration DDL; `text()` calls are operator-authored SQL, never reachable from web requests |
| `tests_integration/`, `tests/` | `avoid-sqlalchemy-text` | Test fixture DDL; not on the production attack surface |
| `app/db/__init__.py`, `app/db/provisioning.py` | `avoid-sqlalchemy-text` | Tenant schema management DDL (`CREATE SCHEMA`, `SET search_path`, `pg_advisory_lock`, RLS policy setup); all `text()` arguments are system-generated, never from user input |
| `app/jobs/pentest_*.py` | `dynamic-urllib-use-detected` | Self-assessment tooling; dynamic outbound calls target IAM-gated GCP admin APIs and operator-configured webhook URLs |
| `app/jobs/hipaa_log_review.py` | `dynamic-urllib-use-detected` | Audit log reviewer; urllib targets an operator-configured webhook (admin-only runtime config) |
Inline # nosemgrep annotations suppress individual false-positive hits where exclusion would be too broad (e.g. auth/service.py unverified-JWT routing helper, logger calls in auth handlers).
13. Prioritized remediation roadmap
Ordered list grouped by severity. Columns: Finding ID | Severity | Effort (S/M/L) | Target date | Owner | Retest by. The Retest by column defaults to "next scheduled run" — only override for CRITICAL items that need sooner re-verification (e.g. "next run + manual curl confirmation within 7 days"). The §15 retest-plan section was merged here; if a finding needs a bespoke retest procedure beyond "run this skill again," describe it inline in that finding's §6 subsection under a Retest procedure: line.
14. Appendices
The pentest_runner.py wrapper uploads every raw scanner artifact to gs://<COMPLIANCE_REPORT_BUCKET>/pentest/<run-uuid>/raw/ with retention lock. The report inlines findings-level output (the specific lines that drove a conclusion) and links to the GCS object for full raw dumps. Inline blocks are capped at ~50 lines each — anything longer, link out.
- A: Commands executed — chronological list of every shell command/API call the run made (redact tokens and any PHI). One command per line. Audit defense: an OCR reviewer must be able to reconstruct what actually ran. This one stays fully inline — it's short and load-bearing.
- B: Scanner invocations & findings — one subsection per tool (`nuclei`, `ffuf`, `semgrep`, `pip-audit`, `trivy`, `osv-scanner`, `gitleaks`, `testssl.sh`, `nikto`/`sqlmap` if invoked, plus the `endpoints.txt` enumeration from §11). For each: exact invocation, exit code, summary line counts (total / high / medium / low / info), inline excerpts of the specific lines that became §6 findings, and a `Raw output: gs://.../raw/<tool>.txt` link. Dismissed false positives live here with their dismissal reason — do not duplicate them in §6a.
- C: Cloud SQL query log — every `SELECT` run against the DB and the row counts returned (values redacted). Stays fully inline.
- D: SBOM — link to `gs://.../raw/sbom.cyclonedx.json`; inline a one-line summary (component count, critical CVE count). If the image scan is unavailable, note that and link to `pip-list.txt` instead.
- E: Attestation block — tester identity (CLI + model), run UUID, ISO timestamp, SHA256 of the report body (excluding this block). Unsigned (automated run); needs human countersign before the operator uses it as input to the annual §164.314(a) written verification to Covered Entities.
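The appendix E hash can be computed over everything that precedes the attestation block. The sketch assumes the block starts at a recognizable marker string; the default marker below is hypothetical and should match whatever heading the report actually emits.

```python
import hashlib

# Hypothetical marker; adjust to the exact heading text used for appendix E.
ATTESTATION_MARKER = "E: Attestation block"


def report_body_sha256(report_md: str, marker: str = ATTESTATION_MARKER) -> str:
    """Hash the report body, excluding the attestation block and anything after it."""
    body, _, _ = report_md.partition(marker)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Because the hash covers only the text before the marker, the digest can be written into the attestation block without changing its own input.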
Output handling: emit the complete markdown report — through all appendices — to stdout. Do NOT gsutil cp or otherwise upload to GCS yourself; the calling runner (pentest_runner.py) captures stdout and uploads to the retention-locked compliance bucket. Uploading from inside the skill creates duplicate objects with inconsistent metadata. The final thing you emit must be the closing of appendix E; do not append a trailing "uploaded to gs://…" line.
Length & tone: body 2000–3000 words; appendices on top of that. Inline evidence for anything ≤50 lines; link to the GCS raw-artifact bucket for longer dumps (see §14). Audit-ready neutral tone. Every "pass" claim cites observable evidence.
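The inline-versus-link rule can be sketched as a small helper. The 50-line cap comes from the text above; `render_evidence` and the exact wording of the link line are illustrative choices modeled on the `Raw output:` convention from appendix B.

```python
INLINE_CAP = 50  # maximum lines of evidence to inline


def render_evidence(text: str, gcs_uri: str, cap: int = INLINE_CAP) -> str:
    """Return the evidence inline if it fits under the cap, else a link line."""
    line_count = len(text.splitlines())
    if line_count <= cap:
        return text
    return f"Raw output: {gcs_uri} ({line_count} lines)"
```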
HIPAA control matrix (copy into §7 of every report)
Administrative safeguards (§164.308)
| Control | Requirement | Status | Evidence | Gap |
|---|---|---|---|---|
| §164.308(a)(1)(ii)(A) | Risk analysis — accurate, thorough, documented; reviewed ≥12mo | |||
| §164.308(a)(1)(ii)(B) | Risk management — reduce risks to reasonable level | |||
| §164.308(a)(1)(ii)(D) | Information system activity review — logs, access reports, incident reports | |||
| §164.308(a)(3)(i) | Workforce authorization / supervision | |||
| §164.308(a)(3)(ii)(C) | Termination procedures — revoke access on departure | |||
| §164.308(a)(4)(ii)(B) | Access authorization — granted per role | |||
| §164.308(a)(4)(ii)(C) | Access establishment & modification | |||
| §164.308(a)(5)(ii)(C) | Log-in monitoring — detect anomalies | |||
| §164.308(a)(5)(ii)(D) | Password management (NPRM: MFA required) | |||
| §164.308(a)(6)(ii) | Security incident response & reporting | |||
| §164.308(a)(7)(ii)(A) | Data backup plan — tested | |||
| §164.308(a)(7)(ii)(B) | Disaster recovery plan — restore ≤72h (NPRM) | |||
| §164.308(a)(7)(ii)(D) | Contingency plan testing ≥12mo | |||
| §164.308(a)(8) | Technical evaluation (this report) — annual pentest + biannual vuln scan (NPRM) | |||
Physical safeguards (§164.310) — out of scope for this automated scan (cloud-inherited or workforce-level). Operator tracks separately.
Technical safeguards (§164.312) — the heart of this pentest.
| Control | Requirement | Status | Evidence | Gap |
|---|---|---|---|---|
| §164.312(a)(1) | Unique user identification | |||
| §164.312(a)(2)(ii) | Emergency access procedure | |||
| §164.312(a)(2)(iii) | Automatic logoff / session timeout | |||
| §164.312(a)(2)(iv) | Encryption / decryption of ePHI at rest (NPRM: required) | |||
| §164.312(b) | Audit controls — record & examine activity | |||
| §164.312(c)(1) | Integrity — protect ePHI from improper alteration/destruction | |||
| §164.312(c)(2) | Mechanism to authenticate ePHI (NPRM) | |||
| §164.312(d) | Person or entity authentication (NPRM: MFA required) | |||
| §164.312(e)(1) | Transmission security | |||
| §164.312(e)(2)(i) | Integrity controls in transit | |||
| §164.312(e)(2)(ii) | Encryption in transit (NPRM: required) | |||
Organizational / BA contracts (§164.314) — paper controls, out of scope for this automated scan. Operator tracks separately.
Administrative (§164.308) non-technical rows — rows like workforce authorization, termination procedures, risk analysis documentation are paper controls. The scanner only fills rows where it has direct technical evidence (e.g. §164.308(a)(5)(ii)(D) MFA via JWT claim inspection, §164.308(a)(1)(ii)(D) via audit_logs freshness). Mark the rest N/S (out of automated scope) — do not mark them Pass/Fail from assumption.
Status values: Pass / Partial / Fail / N/S (out of automated scope). Every Partial / Fail must link to a finding in §6. N/S rows do not require evidence — they're tracked outside this report.
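These status rules lend themselves to a mechanical check before the report is emitted. The sketch below assumes finding IDs look like `F-003` and that the link to §6 appears in the Gap cell; neither convention is fixed by this document, and `check_row` is an illustrative name.

```python
import re
from typing import List

VALID_STATUSES = {"Pass", "Partial", "Fail", "N/S"}
# Hypothetical finding-ID shape; adjust to the ID scheme used in §6.
FINDING_ID = re.compile(r"\bF-\d+\b")


def check_row(control: str, status: str, evidence: str, gap: str) -> List[str]:
    """Return a list of rule violations for one control-matrix row."""
    problems = []
    if status not in VALID_STATUSES:
        problems.append(f"{control}: unknown status {status!r}")
    elif status in {"Partial", "Fail"} and not FINDING_ID.search(gap):
        problems.append(f"{control}: {status} row must link to a §6 finding")
    elif status == "Pass" and not evidence.strip():
        problems.append(f"{control}: Pass row needs observable evidence")
    return problems
```

N/S rows pass with empty Evidence and Gap cells, matching the rule that they are tracked outside the report.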
Known recurring findings (verify each run — patches may land between runs)
Migration target: any row whose re-test is a one-line grep belongs in make lint (as a custom semgrep rule or a tiny pytest) rather than in a weekly pentest. When a regression check drops into the pentest twice in a row and has a mechanical verifier, file a task to promote it to CI and remove the row here. The pentest should focus on things a static check can't catch (live config, cross-tenant behavior, deployed-image vs source drift).
| Findings seen in past runs | Re-test |
|---|---|
| `ext_auth.py` `verify_token` missing `audience=` | grep `ext_auth.py` for `verify_token(` — confirm `audience=` kwarg present |
| `AuditService()` instantiated without DB → `logger.info` only | grep `get_audit_service` and `_persist` — confirm a Postgres write path |
| `sessions.py:upload_audio` reads before size check | grep `upload_audio` for `await .*\.read()` — confirm chunked reads |
| `/api/users/me/mfa-enrolled` trusts client | grep `mfa-enrolled` for Firebase Admin MFA verification |
| `restrict_signups=false` on this deployment | `gcloud run services describe pablo-backend --region=us-central1` env vars + live signUp probe |
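As an example of promoting a grep-style re-test to CI, the `audience=` check in the first row could become a small AST-based test. This is a sketch, not the project's actual lint rule: `calls_missing_kwarg` is a hypothetical helper, and it matches on the called function's name only, so an unrelated function with the same name would also be checked.

```python
import ast
from typing import List


def calls_missing_kwarg(source: str, func_name: str, kwarg: str) -> List[int]:
    """Return line numbers of calls to func_name that do not pass kwarg by keyword."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both plain names (verify_token) and attributes (auth.verify_token).
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == func_name and all(k.arg != kwarg for k in node.keywords):
            missing.append(node.lineno)
    return sorted(missing)
```

A pytest wrapper would read the source file and assert that `calls_missing_kwarg(source, "verify_token", "audience")` returns an empty list. Calls that pass the argument through `**kwargs` are reported as missing, which is the conservative choice for a security check.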
Cleanup checklist (do before writing the report)
- Delete any test users: `curl -X POST "identitytoolkit /accounts:delete" -d '{"idToken":"<tok>"}'` (sign in to get a fresh idToken first).
- `pkill -f "cloud-sql-proxy --port 15433"` (or whatever port you used).
- Confirm no UPDATE/DELETE/INSERT was issued (review your psql commands).
- Redact any PHI names/emails/DOBs from evidence strings in the report.
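Part of the redaction step can be automated as a first pass. The sketch below covers only emails and two common date layouts; the patterns and placeholder strings are assumptions. Names cannot be caught reliably by a regex, so evidence strings still need a manual read before the report is emitted.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# ISO dates (1980-02-03) and slash dates (2/3/1980); other layouts are not covered.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")


def redact(evidence: str) -> str:
    """Replace emails and date-like strings with fixed placeholders."""
    evidence = EMAIL_RE.sub("[REDACTED-EMAIL]", evidence)
    return DATE_RE.sub("[REDACTED-DATE]", evidence)
```

The date pattern also redacts non-DOB dates that appear in these layouts, which errs on the side of over-redaction.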