---
name: review-question
description: >
  Generate, validate, and execute F36-grounded persona conversation questions.
  Phase 1: Generate domain questions tied to F36 datalake categories, validate
  via 6 gates (persona alignment, F36 grounding, data answerability,
  naturalness, system leakage, difficulty). Phase 2: Execute full conversation
  threads where Margaret/Jennifer ask Brandon, with inline metrics per turn,
  saved to /memory and markdown for human review. Phase 3: Build deterministic
  evidence cases that prove a question is answerable before any LLM touches it.
triggers:
  - "review question"
  - "generate questions"
  - "validate questions"
  - "persona conversation"
  - "f36 grounded questions"
  - "margaret ask brandon"
  - "jennifer ask brandon"
  - "conversation thread"
  - "evidence case"
  - "build evidence case"
  - "is this question answerable"
allowed-tools:
  - Bash
  - Read
  - Write
  - Glob
  - Grep
provides:
  - review-question
composes:
  - memory
  - scillm
  - taxonomy
  - review-conversation
  - ops-f36-plant
  - task-monitor
  - create-figure
  - lean4-prove
  - assistant
metadata:
  short-description: "F36-grounded persona question generation + conversation execution + evidence case builder"
  version: "1.1.0"
  taxonomy:
    - persona
    - f36
    - qa
    - conversation
    - evidence-case
---
STOP. READ THIS ENTIRE SKILL.MD BEFORE CALLING ANY ENDPOINT.
# Review Question
Generate F36-grounded questions for Margaret Chen and Jennifer Cheung to ask Brandon Bailey, validate them through 6 quality gates, then execute full conversation threads with inline metrics.
## Quick Start
```shell
# Generate 12 questions for Margaret
./run.sh generate --persona margaret --count 12 --output questions.json

# Validate questions (6 gates)
./run.sh validate questions.json --output review.md

# Execute conversations (Margaret asks Brandon)
./run.sh converse questions.json --output conversations/

# Full pipeline: generate + validate + converse
./run.sh run --persona margaret --count 12 --output conversations/

# Build evidence case (deterministic pre-validation)
./run.sh evidence-case -q "What countermeasures protect SV-MA-3 from CWE-287?" -o case.md

# Sanity check
./run.sh sanity
```
## Commands
| Command | Description |
|---|---|
| `generate` | Generate F36-grounded questions for a persona |
| `validate` | Validate questions through 6 quality gates |
| `converse` | Execute conversation threads (persona asks Brandon) |
| `run` | Full pipeline: generate + validate + converse |
| `evidence-case` | Build deterministic evidence case (5 gates, all via /memory) |
| `review` | Show conversation results (delegates to /review-conversation) |
| `sanity` | Basic validation |
## Evidence Case Gates (Pre-Validation)
A deterministic, database-driven check that proves a question is answerable BEFORE any persona touches it. All DB access goes through /memory.
| Gate | What It Checks | Source |
|---|---|---|
| 1. Extract entities | Parse control IDs, CWEs, frameworks from question | /memory intent (IntentMapper cascade) |
| 2. Verify existence | Each entity exists in `sparta_controls` | /memory count |
| 3. Check relationships | Graph paths between entity pairs (1-2 hop) | /memory count + /memory trace |
| 4. Decompose | Connected components → sub-questions if disjoint | Pure algorithm (BFS) |
| 5. Formalize + QRAs | Count QRAs, avg grounding, lean4 check | Aggregated from Gate 2 + /lean4-prove |
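Gate 4 is described only as a pure BFS over connected components. The sketch below shows that step in isolation. It assumes the entity pairs linked in Gate 3 arrive as plain-text edges, one pair per line, and that an entity with no links arrives as a single token. The `components` function name, this input format, and the IDs `CM-EXAMPLE` and `ISOLATED-EXAMPLE` are hypothetical. Each output line is one component, so more than one line means the question would split into that many sub-questions.

```shell
# Hypothetical sketch of Gate 4: group entities into connected components
# with breadth-first search. Input on stdin: one edge "A B" per line (a pair
# linked in Gate 3), or a single token for an entity with no links.
# Output: one component per line, members in BFS visiting order.
components() {
  awk '
    {
      # Record each entity in first-seen order so the output is deterministic.
      for (i = 1; i <= NF; i++)
        if (!($i in seen)) { seen[$i] = 1; order[++n] = $i }
      # Undirected adjacency list, stored as a space-separated string.
      if (NF == 2) {
        adj[$1] = adj[$1] " " $2
        adj[$2] = adj[$2] " " $1
      }
    }
    END {
      for (k = 1; k <= n; k++) {
        start = order[k]
        if (start in visited) continue
        # BFS queue kept as an array with head and tail indices.
        head = 1; tail = 1; queue[1] = start; visited[start] = 1
        comp = ""
        while (head <= tail) {
          node = queue[head++]
          if (comp == "") comp = node; else comp = comp " " node
          m = split(adj[node], nbrs, " ")
          for (j = 1; j <= m; j++)
            if (!(nbrs[j] in visited)) { visited[nbrs[j]] = 1; queue[++tail] = nbrs[j] }
        }
        print comp
      }
    }'
}

# CM-EXAMPLE and ISOLATED-EXAMPLE are made-up IDs for illustration.
printf 'SV-MA-3 CWE-287\nCWE-287 CM-EXAMPLE\nISOLATED-EXAMPLE\n' | components
# prints:
# SV-MA-3 CWE-287 CM-EXAMPLE
# ISOLATED-EXAMPLE
```

awk is used here because POSIX `sh` has no arrays. Any language with a queue and a visited set gives the same components.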
Classifications:
- `ANSWERABLE`: entities exist, are connected, and have QRAs
- `INVALID_IDS`: an entity was not found in the SPARTA catalog
- `NO_COVERAGE`: entities exist but have zero QRAs
- `NEEDS_CLARIFICATION`: no entities or keywords could be extracted
- `DECOMPOSE` / `PARTIALLY_ANSWERABLE`: entities fall into multiple disconnected components
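The classifications can be read as a single ordered decision over the gate results. The sketch below is written under assumptions. The precedence order (no entities, then unknown IDs, then zero coverage, then connectivity) is not stated in the source. The function name and its four numeric arguments are hypothetical. The sketch prints only `DECOMPOSE` because this section does not say how that label differs from `PARTIALLY_ANSWERABLE`.

```shell
# Hypothetical sketch: map gate results to an evidence-case label.
# Arguments (assumed interface, not the skill's real one):
#   $1  entities extracted in Gate 1 (entities + keywords)
#   $2  entities missing from the SPARTA catalog (Gate 2)
#   $3  total QRAs attached to the entities (Gate 5)
#   $4  connected components found in Gate 4
classify_case() {
  entities=$1 missing=$2 qras=$3 comps=$4
  if [ "$entities" -eq 0 ]; then
    echo NEEDS_CLARIFICATION
  elif [ "$missing" -gt 0 ]; then
    echo INVALID_IDS
  elif [ "$qras" -eq 0 ]; then
    echo NO_COVERAGE
  elif [ "$comps" -gt 1 ]; then
    echo DECOMPOSE
  else
    echo ANSWERABLE
  fi
}

classify_case 2 0 14 1   # prints ANSWERABLE
classify_case 3 0 9 2    # prints DECOMPOSE
```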
## Validation Gates (Phase 1)
| Gate | What It Checks |
|---|---|
| Persona alignment | Question matches persona's domain expertise |
| F36 grounding | Question relates to F36 datalake categories |
| Data answerability | Answer exists in ArangoDB (SPARTA controls + QRAs + datalake) |
| Naturalness | Sounds like a real engineer asking a colleague |
| System leakage | No mention of QRAs, AQL, control IDs, internal concepts |
| Difficulty classification | Correctly labeled easy/medium/hard |
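The system-leakage gate lends itself to a cheap deterministic pre-check before any LLM judgment. The sketch below is hypothetical. The `check_leakage` name and the pattern list are assumptions. The `SV-xx-n` form only mirrors the control ID in the Quick Start example, and a real gate would likely need a broader list of internal terms.

```shell
# Hypothetical sketch of a system-leakage pre-check: flag a persona question
# that names pipeline internals. The pattern list is an assumption.
LEAK_PATTERN='QRAs?|AQL|ArangoDB|sparta_controls|SV-[A-Z]+-[0-9]+'

# Prints FAIL if any pattern appears as a whole word (case-insensitive),
# otherwise PASS.
check_leakage() {
  if printf '%s\n' "$1" | grep -Eiwq "$LEAK_PATTERN"; then
    echo FAIL
  else
    echo PASS
  fi
}

check_leakage 'How do you keep the ground station login from being bypassed?'   # prints PASS
check_leakage 'Which QRAs cover SV-MA-3?'                                        # prints FAIL
```

`-w` keeps short terms like `AQL` from matching inside ordinary words.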
## Conversation Execution (Phase 2)
Each question becomes a full conversation thread:
- Persona asks Brandon the question
- Brandon responds via /memory recall + AQL
- Inline metrics captured per turn (grounding, substance, citations)
- If the persona is unsatisfied, they ask a follow-up (max 3 turns total)
- Final composite score + grade
Output: Markdown thread + JSONL (compatible with /review-conversation)
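The turn loop above can be sketched as follows, with each turn appended as one JSONL record. The sketch makes several assumptions. `ask_brandon` and `persona_satisfied` are hypothetical stubs standing in for the /memory recall and /scillm calls, whose interfaces are not shown here. The JSON field names are illustrative, not the schema /review-conversation actually reads. The metric values are placeholders.

```shell
# Hypothetical sketch of the Phase 2 turn loop: ask, answer, record one JSONL
# line per turn, and ask a follow-up only while the persona is unsatisfied,
# capped at MAX_TURNS.
MAX_TURNS=3

# Escapes backslashes and double quotes only (enough for this sketch).
json_escape() { printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'; }

# Stubs: the real skill answers via /memory recall + AQL and judges via LLM.
ask_brandon() { echo "stub answer to: $1"; }
persona_satisfied() { [ "$1" -ge 2 ]; }   # pretend: satisfied from turn 2 on

run_thread() {
  question=$1 out=$2
  turn=1
  : > "$out"
  while [ "$turn" -le "$MAX_TURNS" ]; do
    answer=$(ask_brandon "$question")
    # Placeholder metrics; real values come from grounding/substance/citation scoring.
    printf '{"turn":%d,"question":"%s","answer":"%s","grounding":null,"substance":null,"citations":0}\n' \
      "$turn" "$(json_escape "$question")" "$(json_escape "$answer")" >> "$out"
    if persona_satisfied "$turn"; then break; fi
    question="Follow-up: can you be more specific?"
    turn=$((turn + 1))
  done
}

thread=$(mktemp)
run_thread 'How is operator login protected?' "$thread"
cat "$thread"
rm -f "$thread"
```

With the stub above the thread stops after two turns. A persona that is never satisfied produces exactly three records.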
## Anti-Silo
- LLM calls via /scillm (never direct OpenAI)
- ArangoDB queries via the /memory recall pipeline
- Taxonomy tagging via /taxonomy
- Persona profiles from YAML manifests (not hardcoded)
- F36 datalake context from /learn-datalake + /ops-f36-plant
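One way to keep the "never direct OpenAI" rule honest is a grep-based lint over the skill's source. The following is a hypothetical sketch. The function name is illustrative, and the import patterns assume Python-style imports, which this document does not confirm.

```shell
# Hypothetical lint: list files under a directory that import the OpenAI
# client directly instead of going through /scillm.
# Returns 0 if clean, 1 on a violation, 2 if grep itself failed.
check_no_direct_openai() {
  hits=$(grep -rEl '^[[:space:]]*(import openai|from openai )' "$1")
  case $? in
    0) printf '%s\n' "$hits"
       echo "anti-silo violation: route LLM calls through /scillm" >&2
       return 1 ;;
    1) return 0 ;;                      # grep found nothing: rule respected
    *) echo "grep failed on $1" >&2
       return 2 ;;
  esac
}
```

Usage: `check_no_direct_openai ./src` in a pre-commit hook or CI step, where `./src` is a placeholder path.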
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `ARANGO_URL` | `http://localhost:8529` | ArangoDB connection |
| `ARANGO_DB` | `memory` | Database name |
| `CHUTES_API_KEY` | (required for LLM) | Chutes API key |
| `F36_DATALAKE_PATH` | `/mnt/storage12tb/f36_datalake` | F36 datalake root |
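A shell wrapper such as `run.sh` could resolve these variables as shown below. This is a sketch. The defaults come from the table above. The fail-fast `require_llm_key` helper is a hypothetical addition, not documented behaviour.

```shell
# Apply the documented defaults only when the caller has not set a value.
: "${ARANGO_URL:=http://localhost:8529}"
: "${ARANGO_DB:=memory}"
: "${F36_DATALAKE_PATH:=/mnt/storage12tb/f36_datalake}"
export ARANGO_URL ARANGO_DB F36_DATALAKE_PATH

# Hypothetical guard for LLM-backed commands: CHUTES_API_KEY has no default.
require_llm_key() {
  if [ -z "${CHUTES_API_KEY:-}" ]; then
    echo "CHUTES_API_KEY is required for LLM-backed commands" >&2
    return 1
  fi
}
```

`${VAR:=default}` assigns only when the variable is unset or empty, so exported overrides such as `ARANGO_DB=other ./run.sh sanity` still win.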