quant-backtest - SKILL.md Agent Skill

name: quant-backtest
description: Institutional-grade Python backtesting framework builder for Codex. Use this skill whenever the user mentions backtesting, quant strategy, alpha model, trading system, signal engine, portfolio backtest, walk-forward optimization, strategy performance, Sharpe ratio calculation, look-ahead bias, transaction cost modeling, or building any systematic trading infrastructure. Also trigger on mentions of vectorized signals, position sizing, mark-to-market, risk metrics (VaR, Sortino, Calmar), regime filters, or factor attribution. If the user says "backtest this idea" or "test my trading strategy", use this skill.

Quant Backtesting Framework

Build modular, institutional-grade Python backtesting systems. Keep the architecture strategy-agnostic and enforce rigorous data handling, transaction cost modeling, and performance attribution.

Purpose

Use this skill to:

design a reusable Python backtesting framework
implement or extend strategy modules
add risk, attribution, or transaction cost logic
debug look-ahead bias and data leakage
scaffold walk-forward and out-of-sample validation workflows

Inputs to Gather

Collect as many of these as are available from the user or repository:

target asset class or instruments
bar frequency and date range
data source or storage format
strategy hypothesis and signal logic
position sizing method
transaction cost assumptions
benchmark or factor model
desired outputs such as charts, metrics, reports, or tests

If important inputs are missing, choose reasonable defaults, state them explicitly, and proceed.

Operating Principles

Prevent look-ahead bias structurally.
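Use log returns internally for compounding math over time; convert to simple returns before weighting across assets, because log returns do not add across a portfolio.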
Prefer vectorized pandas and numpy implementations.
Treat transaction costs as mandatory, not optional.
Separate in-sample fitting from out-of-sample evaluation.
Keep modules loosely coupled and easy to swap.
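
The principle of separating in-sample fitting from out-of-sample evaluation can be sketched as a rolling walk-forward splitter. This is a minimal illustration: the name walk_forward_splits and its train_size and test_size parameters are assumptions, not part of any existing module.

```python
import pandas as pd


def walk_forward_splits(index: pd.DatetimeIndex, train_size: int, test_size: int):
    """Yield (train_index, test_index) pairs that never overlap in time.

    Hypothetical helper: window sizes are counted in bars, and each test
    window starts immediately after its training window ends.
    """
    start = 0
    while start + train_size + test_size <= len(index):
        train = index[start : start + train_size]
        test = index[start + train_size : start + train_size + test_size]
        yield train, test
        start += test_size  # roll forward by one test window


dates = pd.bdate_range("2020-01-01", periods=10)
splits = list(walk_forward_splits(dates, train_size=4, test_size=2))
```

Parameters would be fitted on each train window only and evaluated on the matching test window, so no test bar ever influences the fit that is scored on it.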

Execution Plan

1) Inspect the repo or task shape

Find whether the user wants a fresh framework, a strategy added, a bug fixed, or metrics extended.
Identify existing file layout, coding style, and tests.
If the repo already has conventions, follow them.

2) Map the work to this module layout

Use or adapt this structure:

backtest/
├── __init__.py
├── data.py
├── signals.py
├── positions.py
├── tca.py
├── risk.py
├── regime.py
├── strategy.py
├── dashboard.py
└── engine.py

3) Build or update the core modules

Data layer

Define a DataProvider abstract base class.
Default backend should support DuckDB + Parquet.
Preserve timezone-aware timestamps where possible.
Forward-fill gaps in sparse data (never backfill, which would leak future values), then drop leading invalid rows.
Standardize outputs to a consistent tabular format.

Signal engine

Use a SignalGenerator base class.
Put raw predictive logic in generate_raw().
Apply .shift(1) in generate() as a non-negotiable invariant.
Keep signals vectorized and timestamp aligned.

Position and accounting engine

Track positions, cash, gross exposure, net exposure, leverage, and mark-to-market equity.
Support at least equal-weight, fully deployed, and volatility-aware sizing.
Ensure position application timing is consistent with shifted signals.
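
As one possible reading of volatility-aware, fully deployed sizing, the sketch below scales each signal by the inverse of trailing volatility and normalizes gross exposure to 1. The function name inverse_vol_weights, the 20-bar window, and the sample data are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def inverse_vol_weights(signal: pd.DataFrame, returns: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Weights proportional to signal / trailing volatility, with gross exposure 1.

    Assumes `signal` has already been shifted by the signal engine.
    """
    # Lag the volatility estimate so sizing at bar t uses returns through t-1 only.
    vol = returns.rolling(window).std().shift(1).replace(0.0, np.nan)
    raw = signal / vol
    gross = raw.abs().sum(axis=1)
    # Rows without a valid estimate stay flat instead of producing NaN weights.
    return raw.div(gross.where(gross > 0), axis=0).fillna(0.0)


# Illustrative data: three assets with random returns and a constant long signal.
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(60, 3)), columns=["A", "B", "C"])
signal = pd.DataFrame(1.0, index=returns.index, columns=returns.columns)
weights = inverse_vol_weights(signal, returns)
```

The first 20 rows are flat because the lagged volatility estimate is not yet available; after that, absolute weights sum to 1 on every bar.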

Transaction cost analysis

Model commissions, slippage, and regulatory fees where relevant.
Base cost calculations on turnover or notional traded.
Report both gross and net results, but emphasize net results.
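
A turnover-based cost model along these lines might look like the following sketch. The single cost_bps parameter is a simplifying assumption that lumps commission and slippage into one all-in rate; apply_costs is a hypothetical name.

```python
import pandas as pd


def apply_costs(weights: pd.DataFrame, asset_returns: pd.DataFrame, cost_bps: float = 10.0):
    """Return (gross, net) per-bar simple strategy returns.

    `cost_bps` is an assumed all-in cost charged on notional traded,
    expressed in basis points of portfolio equity.
    """
    gross = (weights * asset_returns).sum(axis=1)
    # Trades are position deltas; the first bar trades from a flat book.
    turnover = weights.diff().fillna(weights).abs().sum(axis=1)
    costs = turnover * cost_bps / 1e4
    return gross, gross - costs


weights = pd.DataFrame({"A": [0.5, 0.5, 0.0], "B": [0.5, 0.0, 0.0]})
asset_returns = pd.DataFrame({"A": [0.01, 0.02, -0.01], "B": [0.0, 0.01, 0.03]})
gross, net = apply_costs(weights, asset_returns)
```

In this toy example the book is fully built on the first bar (turnover 1.0) and half of it is traded on each later bar, so net returns trail gross returns by 10, 5, and 5 basis points respectively.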

Risk and metrics

Compute at least:

total return
CAGR
annualized volatility
Sharpe
Sortino
max drawdown
Calmar
VaR at 95% and 99% confidence, both historical and parametric

Where feasible, also include:

expected shortfall
beta / alpha attribution
factor regression
turnover
hit rate
exposure statistics
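
A subset of the required metrics can be sketched as below. The assumptions are daily bars (252 periods per year), a zero risk-free rate, simple per-bar net returns as input, and a normal distribution for the parametric VaR; risk_metrics is a hypothetical name. The 99% figures follow the same pattern with 0.01 in place of 0.05.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd

PERIODS_PER_YEAR = 252  # assumption: daily bars


def risk_metrics(returns: pd.Series, periods: int = PERIODS_PER_YEAR) -> dict:
    """Core metrics from simple per-bar net returns (risk-free rate assumed 0)."""
    equity = (1.0 + returns).cumprod()
    years = len(returns) / periods
    cagr = equity.iloc[-1] ** (1.0 / years) - 1.0
    ann_vol = returns.std() * np.sqrt(periods)
    # Downside deviation: root mean square of negative returns only.
    downside = np.sqrt((returns.clip(upper=0.0) ** 2).mean()) * np.sqrt(periods)
    # Treat starting equity of 1.0 as the first peak.
    peak = equity.cummax().clip(lower=1.0)
    max_dd = (equity / peak - 1.0).min()
    ann_mean = returns.mean() * periods
    z_05 = NormalDist().inv_cdf(0.05)  # about -1.645
    return {
        "total_return": equity.iloc[-1] - 1.0,
        "cagr": cagr,
        "ann_vol": ann_vol,
        "sharpe": ann_mean / ann_vol if ann_vol > 0 else np.nan,
        "sortino": ann_mean / downside if downside > 0 else np.nan,
        "max_drawdown": max_dd,
        "calmar": cagr / abs(max_dd) if max_dd < 0 else np.nan,
        "var_95_hist": -returns.quantile(0.05),
        "var_95_param": -(returns.mean() + z_05 * returns.std()),
    }


sample_returns = pd.Series([0.01, -0.02, 0.03, -0.01])
metrics = risk_metrics(sample_returns)
```

VaR is reported here as a positive loss number per bar; whichever sign convention is chosen should be documented alongside the formula.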

Regime filter

Keep regime logic separate from alpha logic.
Support simple threshold filters and optional probabilistic models such as HMMs.
Shift regime masks when needed so they do not leak future information.
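
A simple threshold filter with the shift applied could be sketched as follows. The moving-average rule and the name trend_regime_mask are illustrative assumptions; any boolean rule can be lagged the same way.

```python
import pandas as pd


def trend_regime_mask(prices: pd.Series, window: int = 200) -> pd.Series:
    """True when the previous bar closed above its moving average."""
    above = prices > prices.rolling(window).mean()
    # Shift so the mask at bar t depends only on data through t-1.
    return above.shift(1, fill_value=False)


prices = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0, 5.0])
mask = trend_regime_mask(prices, window=2)
```

Multiplying position weights by this mask turns the strategy off in the excluded regime without touching the alpha logic.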

Strategy abstraction

Create a base Strategy class that composes:
- a signal generator
- a position manager
- an optional regime filter

Orchestrator

Build a backtest engine that wires together:
1. data fetch
2. signal generation
3. sizing
4. cost application
5. equity curve construction
6. metric computation
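
The six steps above can be condensed into a single function for illustration. This is a deliberately small sketch, not the modular engine itself: run_backtest, the equal-gross sizing rule, the all-in cost_bps rate, and the 252-bar annualization are all assumptions.

```python
import numpy as np
import pandas as pd


def run_backtest(prices: pd.DataFrame, raw_signal: pd.DataFrame, cost_bps: float = 10.0) -> dict:
    """Minimal pipeline: data -> signal -> sizing -> costs -> equity -> metrics."""
    # 1. Data: simple returns from bar t-1 to bar t.
    asset_returns = prices.pct_change().fillna(0.0)
    # 2. Signal: lag one bar so bar t positions use information through t-1.
    signal = raw_signal.shift(1).fillna(0.0)
    # 3. Sizing: normalize to gross exposure 1 whenever any signal is active.
    gross_exposure = signal.abs().sum(axis=1)
    weights = signal.div(gross_exposure.where(gross_exposure > 0), axis=0).fillna(0.0)
    # 4. Costs: charged on turnover, computed from position deltas.
    turnover = weights.diff().fillna(weights).abs().sum(axis=1)
    net = (weights * asset_returns).sum(axis=1) - turnover * cost_bps / 1e4
    # 5. Equity curve from net returns.
    equity = (1.0 + net).cumprod()
    # 6. Metrics on net realized returns.
    sharpe = net.mean() / net.std() * np.sqrt(252) if net.std() > 0 else np.nan
    return {"weights": weights, "returns": net, "equity": equity, "metrics": {"sharpe": sharpe}}


prices = pd.DataFrame({"A": [100.0, 101.0, 102.0, 103.0], "B": [50.0, 49.0, 48.0, 47.0]})
raw_signal = pd.DataFrame({"A": [1.0] * 4, "B": [0.0] * 4})
result = run_backtest(prices, raw_signal)
```

In the real framework each numbered step would live in its own module so that sizing, cost, and metric implementations can be swapped independently.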

4) Validate correctness

Before finalizing:

confirm signals are shifted
confirm returns and positions are aligned
confirm transaction costs are applied
confirm metrics are computed on realized net strategy returns
confirm tests or sanity checks cover edge cases

5) Deliver useful outputs

Depending on the task, provide:

code changes
a concise architecture summary
example usage
tests
assumptions and next steps
notes on limitations or future extension points

Reference Implementation Notes

Data layer skeleton

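from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class DataProvider(ABC):
    """Strategy-agnostic source of price data."""

    @abstractmethod
    def fetch(self, symbols: list[str], start: str, end: str) -> pd.DataFrame:
        """Return a DataFrame of prices for the requested symbols and date range."""

    def log_returns(self, prices: pd.DataFrame) -> pd.DataFrame:
        # The first row is NaN because there is no prior price.
        return np.log(prices / prices.shift(1))

    def clean(self, df: pd.DataFrame) -> pd.DataFrame:
        # Forward-fill only (no backfill), then drop rows where every column is still missing.
        return df.ffill().dropna(how="all")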
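
To illustrate why log returns suit compounding math, the short example below shows that they add across time and converts them back to simple returns. The price values are made up for the illustration.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({"SPY": [100.0, 102.0, 99.0, 105.0]})
log_ret = np.log(prices / prices.shift(1)).dropna()

# Log returns add across time: their sum is the log of the total price ratio.
total_simple = np.expm1(log_ret.sum())

# Convert back to simple returns before weighting across assets,
# since portfolio returns are weighted sums of simple returns.
simple_ret = np.expm1(log_ret)
```

Here total_simple["SPY"] recovers the 5% move from 100 to 105 without multiplying the individual bars together.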

Signal safety skeleton

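from abc import ABC, abstractmethod

import pandas as pd


class SignalGenerator(ABC):
    @abstractmethod
    def generate_raw(self, data: pd.DataFrame) -> pd.DataFrame:
        """Raw predictive logic; may use data up to and including each bar."""

    def generate(self, data: pd.DataFrame) -> pd.DataFrame:
        raw = self.generate_raw(data)
        # Lag by one bar so a signal computed at bar t is first tradable at bar t+1.
        return raw.shift(1)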

Core invariant checklist

never override the shifted execution invariant without a very explicit reason
never present uncosted performance as the primary result
never treat in-sample performance as expected live performance
never mix raw prices and return series without clear alignment

Task-Specific Playbooks

A. Build from scratch

Create the module layout.
Implement base abstractions first.
Add one example strategy end to end.
Add a fee model and risk metrics.
Add a minimal dashboard or report output.
Add tests for alignment, look-ahead protection, and metric sanity.

B. Add a strategy

Inspect the current base strategy and signal interfaces.
Implement new signal logic in generate_raw().
Reuse existing sizing, cost, and metrics machinery.
Add tests showing no future leakage.
Include a usage example with realistic parameters.
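
As an example of this playbook, the sketch below adds a moving-average crossover by implementing only generate_raw(). The MovingAverageCross class and its fast and slow parameters are hypothetical; the base class is repeated from the signal safety skeleton so the block runs on its own.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class SignalGenerator(ABC):
    """Repeated from the signal safety skeleton for self-containment."""

    @abstractmethod
    def generate_raw(self, data: pd.DataFrame) -> pd.DataFrame: ...

    def generate(self, data: pd.DataFrame) -> pd.DataFrame:
        return self.generate_raw(data).shift(1)


class MovingAverageCross(SignalGenerator):
    """Hypothetical strategy: +1 when the fast average is above the slow one, -1 when below."""

    def __init__(self, fast: int = 2, slow: int = 4):
        self.fast, self.slow = fast, slow

    def generate_raw(self, data: pd.DataFrame) -> pd.DataFrame:
        fast = data.rolling(self.fast).mean()
        slow = data.rolling(self.slow).mean()
        # Stay flat (0) until both windows are populated.
        return np.sign(fast - slow).fillna(0.0)


prices = pd.DataFrame({"SPY": [100.0, 101.0, 102.0, 103.0, 104.0, 103.0]})
signal = MovingAverageCross().generate(prices)
```

The subclass never calls shift itself; the one-bar lag is inherited from generate(), which is what keeps new strategies safe by default.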

C. Add risk metrics

Extend the metrics engine rather than scattering calculations.
Keep annualization assumptions centralized.
Document formulas and units.
Add regression or benchmark alignment checks when attribution is involved.
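
For attribution, a single-factor regression with an explicit alignment step might be sketched like this. The name alpha_beta, the 252-bar annualization, and the synthetic return series are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def alpha_beta(strategy: pd.Series, benchmark: pd.Series, periods: int = 252):
    """OLS of strategy returns on benchmark returns; returns (annualized alpha, beta)."""
    # Align on shared timestamps and drop gaps before regressing.
    aligned = pd.concat([strategy, benchmark], axis=1, join="inner").dropna()
    y = aligned.iloc[:, 0].to_numpy()
    x = aligned.iloc[:, 1].to_numpy()
    beta, intercept = np.polyfit(x, y, 1)  # slope first, then intercept
    return intercept * periods, beta


benchmark = pd.Series([0.01, -0.02, 0.015, 0.003, -0.007])
strategy = 0.5 * benchmark + 0.0001  # synthetic: beta 0.5, per-bar alpha 1 bp
alpha, beta = alpha_beta(strategy, benchmark)
```

Passing periods in as a parameter is one way to keep the annualization assumption in a single place rather than hard-coding it in each metric.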

D. Debug look-ahead bias

Check whether signals are shifted.
Check whether rolling windows include the current bar improperly.
Check whether positions and returns are multiplied on the right dates.
Check whether regime filters or benchmark series leak future data.
Add explicit tests that fail when shift protection is removed.
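
One way to write a test that fails when shift protection is removed is to tamper with current and future prices and check that earlier signals do not move. The helper names momentum_signal and leaks_future below are hypothetical.

```python
import numpy as np
import pandas as pd


def momentum_signal(prices: pd.Series, lookback: int = 3, shift: int = 1) -> pd.Series:
    """Hypothetical signal: sign of the trailing return, lagged by `shift` bars."""
    return np.sign(prices.pct_change(lookback)).shift(shift)


def leaks_future(signal_fn, prices: pd.Series, cut: int) -> bool:
    """True if changing prices from bar `cut` onward alters signals up to bar `cut`."""
    tampered = prices.copy()
    tampered.iloc[cut:] = tampered.iloc[cut:] * 10.0  # perturb current and future bars
    before = signal_fn(prices).iloc[: cut + 1]
    after = signal_fn(tampered).iloc[: cut + 1]
    return not before.equals(after)


prices = pd.Series([10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0])
safe = leaks_future(momentum_signal, prices, cut=5)
unsafe = leaks_future(lambda p: momentum_signal(p, shift=0), prices, cut=5)
```

With the one-bar shift the signal at bar 5 depends only on prices through bar 4, so it is unaffected; with shift=0 the same bar reacts to the tampered price and the check flags it.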

E. Add transaction cost modeling

Identify the portfolio turnover source.
Compute trades from position deltas, not level positions.
Apply cost assumptions consistently by asset class.
Report performance degradation from gross to net.

Output Style

When using this skill, prefer:

production-leaning code over pseudo-code
minimal but clear comments
explicit assumptions
concise summaries of what changed and why
tests when files are modified

Example Quick Start

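# Assumes the classes below have been implemented in the backtest/ modules and imported.
provider = DuckDBParquetProvider("./data")
strategy = MyStrategy("example")
# 0.001 of notional traded equals 10 bps; slippage adds 5 bps.
engine = BacktestEngine(provider, FeeSchedule(pct_of_notional=0.001, slippage_bps=5))
results = engine.run(strategy, ["SPY"], "2010-01-01", "2024-01-01")
print(results["metrics"])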

Final Checks

Before finishing, verify:

the framework remains strategy-agnostic
data, signal, execution, accounting, and metrics are modular
performance metrics are based on net realized returns
the result is easy for a future engineer to extend