---
name: oracle-safety-guardian
description: Classify oracle inputs/outputs into risk levels and return an allow, rewrite, or refuse policy with concrete constraints. Use before and after specialist-agent generation, especially for finance, medical, legal, violence, self-harm, or fear-marketing risks.
---
Oracle Safety Guardian
Overview
Perform two-stage safety governance for oracle content: pre-check user input and post-check generated output.
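The two-stage flow can be pictured as a wrapper around the specialist agent. The sketch below is illustrative only: `guarded_answer`, the `guardian(mode, content)` callable, and `REFUSAL_MESSAGE` are hypothetical names, and it short-circuits only on `refuse` (rewrite handling is left out for brevity).

```python
from typing import Callable

# Hypothetical: a guardian check takes (mode, content) and returns "allow", "rewrite", or "refuse".
Guardian = Callable[[str, str], str]

# Hypothetical placeholder for the safe alternative guidance returned on refusal.
REFUSAL_MESSAGE = "I can't help with that, but here is a safer direction."


def guarded_answer(query: str, guardian: Guardian, generate: Callable[[str], str]) -> str:
    """Run the specialist agent between a pre-check and a post-check."""
    # Stage 1 (pre): check the user input before any generation happens.
    if guardian("pre", query) == "refuse":
        return REFUSAL_MESSAGE
    answer = generate(query)
    # Stage 2 (post): check the generated output before it reaches the user.
    if guardian("post", answer) == "refuse":
        return REFUSAL_MESSAGE
    return answer
```

Checking both stages matters because a harmless query can still yield an unsafe answer, so the post-check is not redundant with the pre-check.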
Input Contract
- mode: pre or post
- content: user query (pre) or generated answer (post)
- context: optional (profile summary, intent, tool trace)
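A minimal sketch of the input contract as a validated record, assuming the caller passes `context` as a plain dictionary; the class name `GuardianInput` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = {"pre", "post"}


@dataclass(frozen=True)
class GuardianInput:
    mode: str                       # "pre" (user query) or "post" (generated answer)
    content: str                    # the text to classify
    context: Optional[dict] = None  # optional: profile summary, intent, tool trace

    def __post_init__(self) -> None:
        # Reject malformed requests before any classification runs.
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be 'pre' or 'post', got {self.mode!r}")
        if not self.content:
            raise ValueError("content must be non-empty")
```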
Workflow
- Classify risk using references/risk-grading.md.
- Return a decision: allow, rewrite, or refuse.
- If rewrite, provide strict rewrite constraints.
- If refuse, provide safe alternative guidance.
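The decision step can be sketched as a lookup from risk level to decision, with each non-allow outcome carrying its follow-up. The level-to-decision mapping and the constraint and guidance strings below are assumptions for illustration only; the authoritative grading lives in references/risk-grading.md.

```python
# ASSUMPTION: illustrative mapping only; real thresholds come from references/risk-grading.md.
HYPOTHETICAL_DECISION_BY_LEVEL = {
    "S0": "allow",
    "S1": "allow",
    "S2": "rewrite",
    "S3": "rewrite",
    "S4": "refuse",
}


def decide(risk_level: str) -> dict:
    """Map a risk level to a decision plus the follow-up that decision requires."""
    if risk_level not in HYPOTHETICAL_DECISION_BY_LEVEL:
        raise ValueError(f"unknown risk level {risk_level!r}")
    decision = HYPOTHETICAL_DECISION_BY_LEVEL[risk_level]
    result = {"decision": decision}
    if decision == "rewrite":
        # A rewrite must come with strict constraints (hypothetical examples).
        result["constraints"] = [
            "remove prohibited instructions",
            "add a disclaimer",
        ]
    elif decision == "refuse":
        # A refusal must come with safe alternative guidance (hypothetical text).
        result["safe_alternative"] = "suggest consulting a qualified professional"
    return result
```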
Output Contract
Return a structured policy:
- risk_level: S0/S1/S2/S3/S4
- decision: allow/rewrite/refuse
- reasons: short list
- constraints: list of mandatory constraints
- disclaimer_level: none/light/strong
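One way to enforce the output contract is a validated record that serializes to a plain dictionary. This is a sketch, not a prescribed schema: the class name `Policy` is hypothetical, and the check that a rewrite carries at least one constraint is an assumption derived from the Workflow step "If rewrite, provide strict rewrite constraints."

```python
from dataclasses import asdict, dataclass, field
from typing import List

RISK_LEVELS = ("S0", "S1", "S2", "S3", "S4")
DECISIONS = ("allow", "rewrite", "refuse")
DISCLAIMER_LEVELS = ("none", "light", "strong")


@dataclass
class Policy:
    risk_level: str
    decision: str
    reasons: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    disclaimer_level: str = "none"

    def __post_init__(self) -> None:
        # Enumerated fields must use the values listed in the output contract.
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"bad risk_level {self.risk_level!r}")
        if self.decision not in DECISIONS:
            raise ValueError(f"bad decision {self.decision!r}")
        if self.disclaimer_level not in DISCLAIMER_LEVELS:
            raise ValueError(f"bad disclaimer_level {self.disclaimer_level!r}")
        # Assumed consistency rule: a rewrite is useless without constraints.
        if self.decision == "rewrite" and not self.constraints:
            raise ValueError("a rewrite decision must carry at least one constraint")
```

Calling `asdict(policy)` yields a dictionary with exactly the five contract keys, ready to serialize as JSON.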
Mandatory Rules
- Never output direct investment buy/sell instructions.
- Never output medical diagnosis or treatment plan.
- Refuse illegal, violent, or self-harm instructions.
- Block fear-marketing and paid-disaster-relief narratives.
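The mandatory rules behave like hard overrides that can only tighten a graded decision, never relax it. The sketch below assumes an upstream classifier emits topic flags; the flag names and the choice to treat the "never output" rules as rewrite constraints (and fear-marketing and paid-disaster-relief as refusals) are hypothetical, not defined by this skill.

```python
from typing import List, Set, Tuple

# Hypothetical flags that force an outright refusal.
HARD_REFUSE_FLAGS = {
    "illegal_instructions",
    "violent_instructions",
    "self_harm_instructions",
    "fear_marketing",
    "paid_disaster_relief",
}

# Hypothetical flags that force a constraint on what may be output.
NEVER_OUTPUT = {
    "investment_instruction": "no direct buy/sell instructions",
    "medical_diagnosis": "no medical diagnosis or treatment plan",
}


def apply_mandatory_rules(
    decision: str, constraints: List[str], flags: Set[str]
) -> Tuple[str, List[str]]:
    """Tighten a graded decision according to the mandatory rules."""
    if flags & HARD_REFUSE_FLAGS:
        return "refuse", list(constraints)
    merged = list(constraints)
    for flag, rule in NEVER_OUTPUT.items():
        if flag in flags and rule not in merged:
            merged.append(rule)
    # An "allow" that picked up new constraints becomes a "rewrite";
    # "rewrite" and "refuse" are already at least as strict.
    if decision == "allow" and len(merged) > len(constraints):
        decision = "rewrite"
    return decision, merged
```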
References
- Read references/risk-grading.md before making the final decision.