ai-safety-auditor

name: ai-safety-auditor
description: Validates clinical AI outputs for safety, bias, and hallucination risks before delivery to end-users or clinicians.
keywords:
  - ai-safety
  - compliance
  - bias-detection
  - hallucination-check
  - clinical-validation
measurable_outcome: Identifies 100% of critical safety violations and potential PHI leakage in generated clinical notes.
license: MIT
metadata:
  author: Biomedical AI Team
  version: "1.0.0"
compatibility:
  - system: Python 3.10+
allowed-tools:
  - run_shell_command
  - read_file

AI Safety Auditor

The AI Safety Auditor is a critical "human-in-the-loop" simulator and automated guardrail system. It intercepts outputs from other clinical agents and checks that they meet medical safety standards, do not expose Protected Health Information (PHI) where it is not permitted, and are free from harmful hallucinations.

When to Use This Skill

As a final check before any clinical agent output is shown to a user.
To audit historical logs of agent interactions for compliance.
To detect potential bias in diagnosis or treatment recommendations.
To verify that citations in a generated report actually exist (hallucination check).

Core Capabilities

PHI Scrubbing Verification: Verifies that no identifiers have leaked into non-secure outputs.
Hallucination Detection: Cross-references generated claims against trusted knowledge bases.
Bias Scanning: Checks for demographic or socioeconomic bias in clinical reasoning.
Contraindication Check: Verifies treatment recommendations against patient allergies/conditions.
On-Premises Open-Source Clinical LLM Deployment Risk Review: Evaluates distilled reasoning models proposed for local clinical diagnosis by requiring diagnostic performance validation, local infrastructure constraint review, privacy-control verification, and mandatory clinician oversight before clinical use.
On-Premises Reasoning Model Deployment Audit: Evaluates distilled open-source reasoning models proposed for clinical diagnosis by checking model provenance, calibration drift risk, local privacy controls, benchmark representativeness, and required human oversight before clinical use.
Clinical Practice Guideline LLM Assistance Audit: Evaluates LLM-supported guideline development for evidence traceability, conflict-of-interest awareness, consensus workflow support, versioned citations, and red-team checks for omitted harms or overconfident recommendations.
Real-Time Guideline Development Evaluation Workflow: Audits LLM assistance during clinical practice guideline development with real-time evidence checking, conflict-of-interest and citation verification, consensus-panel handoff, and documented human oversight gates before recommendations enter practice guidance.
Guarded Test-Time Knowledge Acquisition Audit: Checks medical decision support outputs for retrieve-before-answer evidence acquisition when current clinical knowledge is needed, verifies source authority and freshness before use, records acquired evidence and provenance in audit logs, handles contradictions with explicit uncertainty or escalation, compares pre/post retrieval decisions, and refuses or escalates when retrieved evidence is insufficient.
LLM-Assisted Guideline Drafting Evaluation: Checks clinical practice guideline drafts for evidence traceability, conflict handling, recommendation grading, human panel review, and real-time discrepancy logging during guideline development.
Fine-Grained Medical Q&A Dataset Evaluation: Audits domain-specific medical Q&A evaluation sets with domain stratification, omission and harm taxonomies, calibration checks, uncertainty labeling, and trustworthy-answer scoring beyond simple accuracy.
Clinical Guideline Contribution Review: Evaluates LLM contributions to clinical practice guideline development by requiring recommendation-level evidence traceability, real-time expert reviewer workflows, omission and harm checks, explicit source grading, consensus handling for disagreements, and mandatory human expert oversight before adoption.
Real-Time Guideline Recommendation Evaluation: Reviews LLM-supported guideline development for real-time evidence traceability, conflict-of-interest review, recommendation grading checks, citation audit, and panel-level human signoff before recommendations are finalized.
Domain-Specific Medical QA Stratified Error Analysis: Classifies answer correctness, omissions, unsafe reasoning, evidence support, and dataset leakage in medical QA datasets, requiring stratified error analysis before trusting medical LLM answers.
Clinical Practice Guideline Development Audit Pattern: Limits LLM use to draft synthesis and checking, tracks source provenance, compares draft recommendations against guideline-panel decisions, and flags unverifiable evidence, omitted harms, conflicts, and strength-of-recommendation mismatches.
Fine-Grained Trustworthy Medical QA Dataset Audit: Evaluates domain-specific Q&A datasets with item-level error labels, omission and risk-of-harm scoring, dataset provenance checks, specialty stratification, and regression benchmarks for trustworthy medical language model evaluation.
Medical Decision Support Test-Time Knowledge Acquisition Check: Requires retrieval provenance for acquired clinical facts, separates retrieved facts from model priors, evaluates answer changes after retrieval, and flags unsupported or outdated clinical claims.
On-Premises Open-Source Clinical LLM Deployment Checklist: Requires local diagnostic benchmark design, calibration and harm review, PHI containment, model provenance documentation, regression gates, and clinician override requirements before production use.
LLM-Assisted Clinical Practice Guideline Acceptance Checklist: Evaluates guideline-development use of LLMs by checking evidence synthesis traceability, recommendation drafting provenance, conflict detection, audit trails, and documented human panel sign-off before guideline text is accepted.
Diagnosis-Focused On-Premises Open-Source LLM Deployment Checklist: Compares distilled reasoning models against closed baselines before diagnostic use, requires local specialty validation, documents governance and latency tradeoffs, and classifies autonomous diagnostic use as high-risk with mandatory clinician oversight.
Clinical Guideline-Development LLM Evaluation Mode: Requires source traceability, conflict-of-interest awareness, recommendation grading checks, omission review, and human panel adjudication before any LLM-assisted guideline language is accepted.
Fine-Grained Domain-Specific Medical Q&A Benchmark Review: Evaluates trustworthy medical language model benchmarks with item-level correctness checks, omission labeling, harm-potential assessment, domain stratification, rubric reliability review, and dataset documentation before using Q&A results as evidence of model safety.
Test-Time Knowledge Acquisition Clinical Safety Pattern: Requires retrieval of current vetted references before medical reasoning, logs source provenance, compares pre/post-retrieval answer changes, and escalates uncertainty when retrieved evidence conflicts.
Distilled Open-Source Reasoning Model Clinical Deployment Gate: Evaluates on-premises distilled reasoning models by documenting local privacy benefits and the risk of degraded diagnostic reliability, designing benchmarks against closed frontier models, checking calibration, setting human review thresholds, and requiring deployment governance before clinical use.
Fine-Grained Domain-Specific Medical Q&A Regression Audit: Labels each Q&A item with error category, omission status, hallucination status, and specialty stratum, then tracks regressions across model or provider changes before treating domain-specific medical Q&A performance as evidence of trustworthy medical language model behavior.
Medical Decision Support Test-Time Evidence Safeguards: Triggers external evidence retrieval for current, high-risk, guideline-dependent, medication, diagnostic, contraindication, or uncertain clinical claims; verifies retrieved sources for clinical authority, provenance, date/version freshness, and patient-context relevance; handles conflicts by documenting uncertainty and escalating to clinician review; refuses or escalates when evidence is missing, stale, unverifiable, or insufficient for safe decision support.
Fine-Grained Domain-Specific Medical Q&A Evaluation Audit: Evaluates domain-specific medical Q&A items with separate labels for correctness, omission, unsupported inference, evidence grounding, and harm risk; audits datasets for domain coverage, question ambiguity, and trusted-answer provenance before using results to support trustworthy medical language model claims.
Controlled Test-Time Knowledge Acquisition for Medical Decision Support: Requires acquired facts to come only from approved clinical retrieval sources, verifies citations and source dates before use, detects stale or superseded sources, documents contradictions with uncertainty or clinician escalation, and treats retrieved facts as support for clinician judgment rather than a substitute for it.
On-Premises Distilled Clinical LLM Diagnostic Deployment Check: Compares open-source and distilled reasoning models against clinician-reviewed diagnostic cases, tracks model provenance and distillation limits, requires calibration, hallucination, and refusal testing, and documents when local deployment must be gated behind human review.
Trustworthy Domain-Specific Medical Q&A Predeployment Review: Requires fine-grained Q&A evaluation before deployment with separate checks for answer correctness, omissions, harmfulness, dataset stratification, and prompt sensitivity in domain-specific medical language models.
Domain-Specific Medical Q&A Dataset Trustworthiness Check: Reviews medical Q&A evaluation datasets for fine-grained scoring, omission tracking, harm-risk labeling, and trusted-answer provenance before using them to support trustworthy medical LLM claims.
Inference-Time Knowledge Acquisition Safety Pattern: Retrieves clinical evidence during inference, requires source-grounded assertions, flags unsupported diagnostic claims, logs acquired knowledge and provenance, and routes uncertain cases to clinician review.
Current-Evidence Test-Time Knowledge Acquisition Gate: Requires retrieval of current evidence before clinical questions are answered, attaches source provenance to supported claims, detects conflicting sources, enforces freshness checks, and routes unresolved or high-risk recommendations to clinician review.
Test-Time Knowledge Acquisition Medical Decision Support Review: Requires source retrieval provenance, freshness and authority scoring, contradiction handling, and safety review before newly acquired knowledge is used in clinical recommendations.
Test-Time Knowledge Acquisition Safeguards for Medical Decision Support: Audits source selection, retrieval quality, stale evidence detection, citation verification, abstention triggers, and clinician-facing uncertainty disclosure before acquired knowledge is used in medical decision support.
Test-Time Knowledge Acquisition Recommendation Change Audit: Requires explicit source retrieval for LLM medical decision support, source quality grading, conflict handling across retrieved sources, uncertainty reporting, and clinician override or sign-off whenever acquired knowledge changes recommendations.
Knowledge-Augmented Clinical Decision Support Gate: Requires source vetting, retrieval provenance, conflict detection, hallucination checks, and evaluation against no-retrieval baselines before allowing test-time knowledge acquisition to support clinical recommendations.
Clinical Decision Support Test-Time Knowledge Acquisition Traceability Check: Requires retrieved evidence provenance, separates acquired clinical facts from model priors, verifies source recency and clinical authority, and flags any case where added knowledge changes risk assessment or recommendations.
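PHI Scrubbing Verification can be illustrated with a minimal pattern-based scan. This is a sketch under stated assumptions: the pattern set, the `PhiFinding` type, and `find_phi` are hypothetical, cover only a few US-style identifier formats, and would miss names, addresses, and free-text dates, so a production check would likely need a dedicated de-identification model on top.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately incomplete patterns for a few identifier formats.
# Names, addresses, and dates are NOT covered by this sketch.
PHI_PATTERNS: dict[str, re.Pattern[str]] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "mrn": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


@dataclass(frozen=True)
class PhiFinding:
    kind: str   # which pattern matched
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


def find_phi(text: str) -> list[PhiFinding]:
    """Return every span matching a known identifier pattern, sorted by position."""
    findings = [
        PhiFinding(kind, m.start(), m.end())
        for kind, pattern in PHI_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)
```

Any non-empty result for an output bound for a non-secure channel would map to a Reject verdict.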
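For the citation side of Hallucination Detection, a first-pass check is whether each cited identifier exists in a trusted index at all. The sketch assumes citations appear as PubMed IDs written like `PMID: 12345` and that `known_pmids` is a locally maintained set of verified IDs; both the format and `unverified_citations` are illustrative assumptions. Existence alone does not show that the source supports the claim, so this catches fabricated references only.

```python
import re

# Assumed citation format: "PMID: 12345", "PMID 12345", or "PMID:12345".
PMID_RE = re.compile(r"PMID:?\s*(\d{1,8})")


def unverified_citations(report: str, known_pmids: set[str]) -> list[str]:
    """Return cited PMIDs absent from the trusted index, in order of first appearance."""
    missing: list[str] = []
    for pmid in PMID_RE.findall(report):
        if pmid not in known_pmids and pmid not in missing:
            missing.append(pmid)
    return missing
```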
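The Contraindication Check can be sketched as a lookup from documented allergy classes to drugs that should trigger review. The `ALLERGY_CROSS_REACTIVITY` table and `allergy_conflicts` are hypothetical and far from complete; they are not clinical guidance, and a real deployment would query a curated drug-allergy knowledge base instead.

```python
# Illustrative, incomplete mapping from allergy class to drugs that should trigger review.
# Not clinical guidance: a real check would query a curated drug-allergy knowledge base.
ALLERGY_CROSS_REACTIVITY: dict[str, set[str]] = {
    "penicillin": {"amoxicillin", "ampicillin", "penicillin v"},
    "sulfonamide": {"sulfamethoxazole"},
}


def allergy_conflicts(recommended: list[str], allergies: list[str]) -> list[tuple[str, str]]:
    """Return (drug, allergy) pairs where a recommended drug falls in a documented allergy class."""
    conflicts: list[tuple[str, str]] = []
    for allergy in allergies:
        # Unknown allergy classes map to an empty set and produce no conflict.
        blocked = ALLERGY_CROSS_REACTIVITY.get(allergy.strip().lower(), set())
        for drug in recommended:
            if drug.strip().lower() in blocked:
                conflicts.append((drug, allergy))
    return conflicts
```

As written, an allergy class missing from the table produces no flag at all; a safer variant would route unmapped allergies to Flag for Review rather than passing them silently.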
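Several of the test-time knowledge acquisition capabilities share one decision rule: use retrieved evidence only if it comes from an approved, current source; escalate on contradiction or when retrieval changes the recommendation; refuse when nothing usable remains. A minimal sketch of that rule follows. The source names, the three-year freshness window, the per-source `supports` flag (assumed to come from an upstream claim-verification step), and `answer_changed` (assumed to come from comparing pre- and post-retrieval answers) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Decision(Enum):
    USE = "use"
    ESCALATE = "escalate"  # route to clinician review
    REFUSE = "refuse"


@dataclass(frozen=True)
class Evidence:
    source: str      # retrieval source name (hypothetical identifiers below)
    published: date  # publication or last-revision date
    supports: bool   # True if the source supports the claim, False if it contradicts it


# Hypothetical policy values; a deployment would set these per claim type.
APPROVED_SOURCES = {"guideline_db", "drug_label_db"}
MAX_AGE_DAYS = 3 * 365


def evidence_gate(evidence: list[Evidence], today: date, answer_changed: bool) -> Decision:
    # Keep only evidence from approved sources that falls within the freshness window.
    usable = [
        e for e in evidence
        if e.source in APPROVED_SOURCES and (today - e.published).days <= MAX_AGE_DAYS
    ]
    if not usable:
        return Decision.REFUSE  # evidence is missing, stale, or from unapproved sources
    if any(not e.supports for e in usable):
        return Decision.ESCALATE  # contradiction: document uncertainty, send to clinician
    if answer_changed:
        return Decision.ESCALATE  # retrieval changed the recommendation: require sign-off
    return Decision.USE
```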
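The fine-grained Q&A audits above ask for per-specialty error labels rather than a single accuracy number, plus regression tracking across model or provider changes. The following sketch assumes each item has already been labeled by a reviewer; `LabeledItem`, `stratified_report`, and `regressions` are illustrative names, and the three labels are a simplified subset of the omission and harm taxonomies described.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledItem:
    specialty: str
    correct: bool
    omission: bool  # clinically necessary information was left out
    harmful: bool   # the answer could plausibly cause harm if acted on


def stratified_report(items: list[LabeledItem]) -> dict[str, dict[str, float]]:
    """Per-specialty item count, accuracy, omission rate, and harm rate."""
    groups: dict[str, list[LabeledItem]] = defaultdict(list)
    for item in items:
        groups[item.specialty].append(item)
    report: dict[str, dict[str, float]] = {}
    for specialty, group in sorted(groups.items()):
        n = len(group)
        report[specialty] = {
            "n": n,
            "accuracy": sum(i.correct for i in group) / n,
            "omission_rate": sum(i.omission for i in group) / n,
            "harm_rate": sum(i.harmful for i in group) / n,
        }
    return report


def regressions(
    baseline: dict[str, dict[str, float]],
    candidate: dict[str, dict[str, float]],
    tol: float = 0.0,
) -> list[str]:
    """Specialties where accuracy fell, harm rate rose by more than `tol`, or the stratum vanished."""
    flagged: list[str] = []
    for specialty, old in baseline.items():
        new = candidate.get(specialty)
        if new is None:
            flagged.append(specialty)  # stratum missing from the new evaluation run
        elif new["accuracy"] < old["accuracy"] - tol or new["harm_rate"] > old["harm_rate"] + tol:
            flagged.append(specialty)
    return flagged
```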
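Calibration checks recur throughout the deployment and Q&A audits. One common metric is expected calibration error (ECE) with equal-width confidence bins: the bin-size-weighted average gap between mean confidence and accuracy. The sketch assumes the model exposes a per-answer confidence in [0, 1]; the ten-bin default is a widespread convention, not a requirement stated by this skill.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins of (bin size / N) * |mean confidence - accuracy|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Bin index from confidence; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / total * abs(mean_conf - accuracy)
    return ece
```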

Workflow

Intercept: Receive candidate response from a Clinical Agent.
Scan: Run parallel safety checks (PHI, Bias, Factuality).
Verdict: Pass, Flag for Review, or Reject.
Feedback: Provide specific reasons for rejection to the generating agent.
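The four workflow steps can be sketched as a small aggregator that runs every check in parallel, lets the strictest verdict win, and returns all reasons as feedback for the generating agent. `Verdict`, `CheckResult`, and `audit` are hypothetical names; the real PHI, bias, and factuality scans would be passed in as plain callables.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class Verdict(IntEnum):
    # Ordered by severity so max() picks the strictest outcome.
    PASS = 0
    FLAG = 1  # Flag for Review
    REJECT = 2


@dataclass(frozen=True)
class CheckResult:
    name: str
    verdict: Verdict
    reasons: list[str] = field(default_factory=list)


Check = Callable[[str], CheckResult]


def audit(response: str, checks: list[Check]) -> tuple[Verdict, list[str]]:
    """Scan an intercepted candidate response with every check; return verdict and feedback."""
    if not checks:
        raise ValueError("at least one check is required")
    with ThreadPoolExecutor() as pool:
        # Scan: run all checks concurrently; map() keeps results in input order.
        results = list(pool.map(lambda check: check(response), checks))
    # Verdict: the strictest individual verdict decides the outcome.
    verdict = max(r.verdict for r in results)
    # Feedback: prefix each reason with the check that produced it.
    reasons = [f"{r.name}: {reason}" for r in results for reason in r.reasons]
    return verdict, reasons
```

Taking the maximum means a single Reject overrides any number of passes, and refusing to run with an empty check list avoids a vacuous Pass.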

Example Usage

User: "Audit this generated discharge summary for safety."

Agent Action:

python3 Skills/Clinical/Safety/AI_Safety_Auditor/audit_output.py --input discharge_summary.txt --checks "all"
