name: scenario-test description: Use when you need to validate code changes in Azure before committing, especially for infrastructure, auth, job orchestration, or integration points that cannot be tested locally
AI-Driven Scenario Testing
Analyze code changes, identify which user flows exercise those code paths, generate minimal test scripts, and execute end-to-end scenario tests against production infrastructure with isolated test data.
Core principle: Test code changes with realistic user scenarios at minimal cost by reusing existing prod infrastructure and isolating test data through naming conventions.
When to Use
- Infrastructure changes (Bicep, env vars, managed identity)
- Auth/credential changes that only work inside Azure
- Job orchestration logic (controller, failover)
- Integration points (LLM, email, blob storage)
- Any change that cannot be validated with local unit tests
When NOT to Use
- Pure unit-testable changes — Use
pytest tests/unitinstead - Documentation-only changes — README, comments, docs/
- Frontend-only changes — Use local dev server, Playwright
- Changes already covered by CI — If pushing to trigger CI anyway
- Local config changes — .env, local scripts
Invocation
- Explicit:
/scenario-test - Natural: "test my changes", "verify this fix", "run scenario test"
- With scope:
/scenario-test HEAD~2..HEADor/scenario-test "fixed managed identity auth"
The Process
digraph scenario_test {
rankdir=TB;
"Analyze code changes" [shape=box];
"Trace affected code paths" [shape=box];
"Identify minimal user flow" [shape=box];
"Determine success criteria" [shape=box];
"Generate test script" [shape=box];
"CHECKPOINT: Show script, get approval" [shape=diamond style=filled fillcolor=yellow];
"User approves?" [shape=diamond];
"Execute setup phase" [shape=box];
"CHECKPOINT: Ready to trigger?" [shape=diamond style=filled fillcolor=yellow];
"Execute and poll for completion" [shape=box];
"Verify success criteria" [shape=box];
"Report PASS/FAIL" [shape=box];
"Cleanup test data (always)" [shape=box style=filled fillcolor=lightgreen];
"Abort" [shape=box];
"Analyze code changes" -> "Trace affected code paths";
"Trace affected code paths" -> "Identify minimal user flow";
"Identify minimal user flow" -> "Determine success criteria";
"Determine success criteria" -> "Generate test script";
"Generate test script" -> "CHECKPOINT: Show script, get approval";
"CHECKPOINT: Show script, get approval" -> "User approves?";
"User approves?" -> "Execute setup phase" [label="yes"];
"User approves?" -> "Abort" [label="no"];
"Execute setup phase" -> "CHECKPOINT: Ready to trigger?";
"CHECKPOINT: Ready to trigger?" -> "Execute and poll for completion" [label="yes"];
"CHECKPOINT: Ready to trigger?" -> "Cleanup test data (always)" [label="abort"];
"Execute and poll for completion" -> "Verify success criteria";
"Verify success criteria" -> "Report PASS/FAIL";
"Report PASS/FAIL" -> "Cleanup test data (always)";
}
Phase 1: Analysis
Step 1: Identify Changed Components
Read the git diff to understand what changed:
git diff HEAD~1..HEAD --name-only
git diff HEAD~1..HEAD
Map changed files to components:
src/providers/*.py→ Provider classessrc/jobs/*.py→ Job logic (controller, processor, poller)src/services/*.py→ Business logicsrc/api/*.py→ API endpointsinfra/*.bicep→ Infrastructure configuration
Step 2: Trace Call Graph Upstream
For each changed component, trace who calls it:
Example: ContainerAppsProvider.start_job() changed
← called by: controller.py:trigger_processor_job()
← called by: controller.py:main()
← triggered by: Scheduled job OR manual trigger
← condition: pending_count >= 10 OR oldest_pending >= 7 days
Step 3: Trace Call Graph Downstream
What happens after the changed code runs:
Example: start_job() succeeds
→ processor job starts
→ claims deliveries, transcribes, summarizes, emails
Step 4: Identify Minimal Trigger Point
Determine what's needed to exercise the code path:
To exercise this code path, we need:
- Pending deliveries in DB (to meet batch threshold)
- Controller job to run
We do NOT need:
- Real RSS feeds (deliveries inserted directly)
- Real audio files (if testing auth, not processing)
- Real email recipients (test user domain)
Step 5: Determine Success Criteria
Select appropriate criteria based on what's being tested:
| Change Category | Success Criteria | Rationale |
|---|---|---|
| Auth/Identity | Job execution exists, no auth errors | Auth happens at job start |
| Job orchestration | Correct job triggered, execution exists | Testing job selection logic |
| Transcription | Transcript blob created, non-empty | Need actual processing |
| Summarization | Summary blob created, content reasonable | Need LLM output |
| Email delivery | Delivery marked sent, no errors |
Full end-to-end |
| API endpoints | HTTP 200, correct response | Direct API test |
Phase 2: Script Generation
Generate a Python test script using the helper library. Save to scripts/scenario_testing/generated/scenario_test_{description}_{uuid}.py.
Script Template
#!/usr/bin/env python3
"""Scenario test: {description}
Auto-generated by scenario-test skill
Generated: {timestamp}
Cleanup: Automatic on completion/error
Code change tested:
{file}: {change_summary}
Success criteria:
{criteria}
"""
import sys
import uuid
sys.path.insert(0, ".")
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from scripts.scenario_testing import (
# Constants
CONTROLLER_JOB,
# Test data
cleanup_test_data,
create_test_channel,
create_test_deliveries,
create_test_subscription,
create_test_user,
# Azure ops
deploy_infra,
get_job_logs,
trigger_job,
wait_for_job_completion,
# Verification
check_logs_for_errors,
job_execution_exists,
# Reporting
report_failure,
report_success,
)
from src.config import get_settings
TEST_ID = "{scenario}-{uuid}"
def get_db_session():
settings = get_settings()
engine = create_engine(settings.database_url)
Session = sessionmaker(bind=engine)
return Session()
def setup(db):
"""Insert test data into prod DB."""
# Generated based on scenario requirements
pass
def execute():
"""Trigger the flow and wait for completion."""
# Generated based on what needs to be triggered
pass
def verify(execution_name: str) -> bool:
"""Check success criteria."""
# Generated based on success criteria
pass
def cleanup(db):
"""Remove all test data."""
cleanup_test_data(db, TEST_ID)
def main():
db = get_db_session()
try:
setup(db)
success, execution_name = execute()
if not success:
report_failure(f"Execution failed: {execution_name}")
sys.exit(1)
passed = verify(execution_name)
if passed:
report_success("Scenario test passed")
sys.exit(0)
else:
logs = get_job_logs(CONTROLLER_JOB, execution_name)
report_failure("Scenario test failed", logs)
sys.exit(1)
finally:
cleanup(db)
db.close()
if __name__ == "__main__":
main()
Phase 3: Checkpoint 1 - Script Approval
Present to user:
- Summary of what changed
- Code paths affected
- Generated test script
- What will be written to prod DB
- Success criteria
Ask: "Ready to execute this test? [yes/modify/abort]"
Phase 4: Setup Execution
If approved:
- Deploy infrastructure changes if any:
deploy_infra("infra/main.bicep", "infra/parameters.prod.json") - Insert test data using helper functions
- Verify setup complete
Phase 5: Checkpoint 2 - Trigger Approval
Present:
- Setup complete confirmation
- What job/API will be triggered
- Expected behavior
Ask: "Ready to trigger the flow? [yes/abort]"
If abort: Run cleanup immediately.
Phase 6: Execution
- Trigger the job or API
- Poll for completion (no artificial timeout)
- Collect logs from relevant components
- Report real-time status updates
Phase 7: Verification
Check success criteria:
- Parse logs for error patterns
- Verify expected state changes
- Report PASS or FAIL with evidence
Phase 8: Cleanup (Always Runs)
cleanup_test_data(db, TEST_ID)
Report: "Cleanup complete. 0 test records remaining."
Test Data Conventions
Naming Patterns
| Entity | Pattern | Example |
|---|---|---|
| Users | test-{scenario}-{uuid}@podsum-test.local |
test-auth-a1b2c3@podsum-test.local |
| Channels | TEST-{scenario}-{uuid} |
TEST-auth-a1b2c3 |
| Auth tokens | Linked to test users only | — |
| Subscriptions | Linked to test users/channels | — |
| Episodes | Linked to test channels | — |
| Deliveries | Linked to test subscriptions | — |
Isolation Guarantees
- Email safety: Test users use
@podsum-test.localdomain which ACS rejects - Query safety: All test data queries filter by test prefixes
- Cascade cleanup: Deleting test users cascades to related records
Helper Library Reference
Import from scripts.scenario_testing:
Test Data Creation
create_test_user(db, test_id) -> User
create_test_channel(db, test_id) -> Channel
create_test_subscription(db, user, channel) -> Subscription
create_test_episode(db, channel, test_id) -> Episode
create_test_deliveries(db, subscription, count=10) -> list[Delivery]
Azure Operations
deploy_infra(bicep_path, parameters_path) -> bool
trigger_job(job_name) -> tuple[bool, str]
wait_for_job_completion(job_name, execution_name) -> JobStatus
get_job_logs(job_name, execution_name) -> str
Verification
job_execution_exists(job_name, execution_name) -> bool
check_logs_for_errors(logs, patterns=None) -> list[str]
check_blob_exists(container, path) -> bool
check_delivery_status(db, delivery_id) -> str | None
Cleanup & Reporting
cleanup_test_data(db, test_id) -> dict[str, int]
report_success(message)
report_failure(message, logs=None)
Common Scenarios
Managed Identity Fix
Changed: infra/main.bicep (AZURE_CLIENT_ID for controller)
Affects: DefaultAzureCredential in ContainerAppsProvider
Code path: controller → start_job() → Azure SDK auth
Trigger: Batch threshold met, controller runs
Success: Processor job execution exists, no auth errors
Batch Threshold Change
Changed: src/jobs/controller.py (threshold logic)
Affects: When batch processing triggers
Code path: controller.main() → check_pending() → trigger_processor()
Trigger: Insert N pending deliveries, run controller
Success: Processor triggered (or not) per new threshold
Email Template Change
Changed: src/services/delivery.py (email content)
Affects: Email body sent to users
Code path: processor → generate_email() → send_email()
Trigger: Full processing flow with test delivery
Success: Delivery marked sent, email content matches template
Red Flags
| Mistake | Consequence | Prevention |
|---|---|---|
| Test data without unique ID | Collision with parallel runs | Always include UUID in test IDs |
| Forgetting cleanup on error | Orphaned test data in prod | finally: block always runs cleanup |
| Running against wrong environment | Test data in wrong place | Skill validates resource group |
| Wrong success criteria | Test passes but doesn't validate fix | Explain choice, user can override |
| Not deploying infra first | Testing old config | Deploy infra before inserting test data |