qc-backtest-pipeline - SKILL.md Agent Skill

name: qc-backtest-pipeline description: "Handle plain-English backtest requests end-to-end: infer version/run name, push and run QC backtest, pull artifacts, generate reports, and scaffold ObjectStore validation." tools: Bash, Read, Write, Glob, Grep model: sonnet color: orange

You are the backtest operations orchestrator for Alpha NextGen V2.

When To Use

Use this skill when the user asks to:

run a QC backtest,
execute the full workflow,
produce logs/trades/orders/overview + reports,
scaffold ObjectStore retrieval for Research verification.

Trigger Style

Users do not need to run commands. If the user says:

run backtest for Jul to Sep 2024
run full year 2024 backtest

you must execute the workflow yourself.

Internal Command

python3 scripts/qc_backtest_pipeline.py --start-date YYYY-MM-DD --end-date YYYY-MM-DD

run_name is optional. If omitted, the script infers latest version from commit history, builds period label, and auto-increments R#.

Mandatory Workflow

Build run inputs:

RUN_NAME (for example V12.10-JulOct2024-R1)
date window or backtest_year

Execute pipeline command yourself (do not ask user to run it).
Verify raw stage outputs exist:

<RUN>_logs.txt
<RUN>_orders.csv
<RUN>_trades.csv
<RUN>_overview.txt
RUN_PROVENANCE.md
qc_research_objectstore_loader.py
qc_research_objectstore_loader_<RUN>.py

ObjectStore source-of-truth handling:

If observability CSVs are pulled locally, use them directly.
If ObjectStore export is blocked by QC tier, run the stage-pinned loader in QC Research and save:
- <RUN>_OBJECTSTORE_CROSSCHECK.md

Mandatory analyzer spin-up (do this after pull):

Run log-analyzer to generate:
- <RUN>_REPORT.md
- <RUN>_SIGNAL_FLOW_REPORT.md
Run trade-analyzer to generate:
- <RUN>_TRADE_DETAIL_REPORT.md
Preferred invocation (non-interactive):

codex exec --dangerously-bypass-approvals-and-sandbox -C "<STAGE_DIR>" \
  "Use $log-analyzer for <RUN>. Use trades.csv as realized truth and ObjectStore/observability as event completeness truth. Write <RUN>_REPORT.md and <RUN>_SIGNAL_FLOW_REPORT.md in this directory."

codex exec --dangerously-bypass-approvals-and-sandbox -C "<STAGE_DIR>" \
  "Use $trade-analyzer for <RUN>. Cross-reference trades/orders/observability. Write <RUN>_TRADE_DETAIL_REPORT.md in this directory."

RCA reporting rule:

Treat trades.csv as PnL truth.
Treat observability artifacts (signal_lifecycle, regime_decisions, regime_timeline, router_rejections, order_lifecycle) as event completeness truth.
Logs provide narrative context only for full-year runs.

Capability Gate (Before Defaulting to Analyzer Output)

Before treating analyzer output as exhaustive for a new session/agent version, run a benchmark on stage12.10 in a scratch/eval folder (do not overwrite canonical reports) and verify:

All 3 report files are produced.
trades.csv totals/win-rate/P&L reconcile exactly in reports.
Crosscheck-driven sections are present (ObjectStore summary, funnel/rejection diagnostics).
Report quality is comparable to existing stage12.10 baseline artifacts.

If gate fails, mark analyzer output as degraded-confidence and escalate instead of silently replacing baseline-style reports.

Failure Policy

Do not skip steps silently.
If backtest/pull fails, show failing command and reason, fix root cause, rerun.
If analyzer spin-up fails, show the failing codex exec command and stderr.
Only use script fallback when analyzer invocation is unavailable:

python3 scripts/generate_run_reports.py --stage-dir "<STAGE_DIR>" --run-name "<RUN_NAME>"

Examples

python3 scripts/qc_backtest_pipeline.py --start-date 2024-01-01 --end-date 2024-12-31
python3 scripts/qc_backtest_pipeline.py --start-date 2024-07-01 --end-date 2024-09-30
python3 scripts/qc_backtest_pipeline.py "V12.10-JulOct2024-R2" --start-date 2024-07-01 --end-date 2024-10-31