strategy-report - SKILL.md Agent Skill

name: strategy-report
description: Write a structured trading-strategy assessment, indicator explainer, or model/workflow guide and save it to the repo's docs folder. Use when the user says "write up the final assessment as a report", "write up a detailed summary", "write a detailed summary so I have multiple options", "explain why I'd use one over the other", "assess strategy X vs Y", "document this so we can reproduce it", "write a highly detailed document with all results", "make this my workflow guide", or wants a strategy/indicator/model comparison written to docs. Trading-strategy specific; distinct from the general-purpose tech-writer skill.

Strategy Report

The user repeatedly asks for written assessments — comparing strategy or model variants, explaining how/why an indicator works, recommending one option over another, turning a batch of backtest results into a go-forward playbook — saved to a docs/ folder. The recurring ask: "write up a very detailed summary so I have multiple options. Explain why I would use one over the other ... save to docs folder".

When to use this vs. tech-writer

Use this for trading strategy / indicator / ML-model write-ups. Use tech-writer for general project documentation, READMEs, and API docs.

Core principle: assemble, don't regenerate

A strategy report is a writing and synthesis task, not a compute task. The results almost always already exist — sweep .md files, walk-forward report .md files, saved JSON reports, the metrics printed earlier in the conversation. Build the report from those artifacts. Do not re-run backtests, sweeps, or training to "get the numbers" — that wastes time, and re-running an analysis on an out-of-sample window that has already been used for selection quietly contaminates it (every fresh look at the same OOS data adds a hidden degree of selection freedom). If a genuinely needed number is missing, ask the user whether to source it rather than silently regenerating. If you catch yourself launching a backtest to populate a table, stop — the number is probably already in an artifact you can cite.
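As a minimal sketch of assembling from saved artifacts, the snippet below gathers metrics from JSON reports already on disk and keeps the source path with each one. The function name `collect_saved_metrics`, the `*.json` layout, and the assumption that each report is a JSON object are illustrative only; the repo's actual report schema may differ.

```python
import json
from pathlib import Path


def collect_saved_metrics(artifact_dir):
    """Load every saved JSON report under artifact_dir without re-running anything.

    Returns a list of (absolute_path, metrics_dict) pairs so each number
    can be cited next to the file it came from.
    """
    collected = []
    for path in sorted(Path(artifact_dir).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Skip unreadable files; the remedy is to ask, not to regenerate.
            continue
        if isinstance(data, dict):
            collected.append((str(path.resolve()), data))
    return collected
```

Anything this pass does not find is a candidate for the "ask the user" step rather than a reason to launch a new run.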

Report structure

Save to ~/Personal/Trading/trading-research/docs/research/ (or the current trading repo's docs/), named YYYY-MM-DD_<topic>_report.md. Lead with the recommendation; a reader who stops after section 1 should still know what to do. Sections scale with the topic — a single-indicator explainer may merge some, a multi-model guide may split them — but cover all six concerns.
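The naming rule can be sketched as a small helper. The function name `report_path` and the slug convention (lower-case, non-alphanumeric runs collapsed to hyphens) are assumptions; only the `YYYY-MM-DD_<topic>_report.md` pattern comes from the rule above.

```python
import re
from datetime import date
from pathlib import Path


def report_path(topic, docs_dir, on=None):
    """Build <docs_dir>/YYYY-MM-DD_<topic>_report.md for the given date."""
    day = on or date.today()
    # Assumed convention: lower-case the topic, collapse other characters to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return Path(docs_dir).expanduser() / f"{day.isoformat()}_{slug}_report.md"
```

For example, `report_path("ES session models", "docs/research", date(2025, 1, 31))` yields a file named `2025-01-31_es-session-models_report.md`.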

1. Bottom line / recommendation. The verdict first: what to deploy, what to use when, in a small table if there are several options. Everything below justifies this section.
2. Mechanism. How and why each indicator/strategy/model works. Include the math and the pipeline (features, labels, exits, training windows).
3. Variants & when to use each / when to switch. Explicit "use X when…, use Y when…" guidance. If the variants are not mutually exclusive (e.g. session models that run at different times, or a model retrained on a schedule), say when to switch between them and on what trigger: time of day, regime, account type. Segment recommendations by use case when it matters (prop-firm vs personal account often want different parameters).
4. Backtest evidence. The full stat block per the repo's reporting rules: trades, WR, PnL, avg win/loss, PF, Max DD, Ret/DD, Calmar (+ Sharpe/Sortino if the harness emits them), date range, data path. Never abbreviate to a metric subset. See "Metric provenance" below: every number must say which run produced it.
5. Caveats & data integrity. The honest-limitations section. What the numbers do not tell you; which results are upper bounds and why; fill-model assumptions; reused OOS windows; borrowed/assumed parameters. A report without this section reads as a sales pitch, not an assessment.
6. Reproduction. Exact commands, configs, blob/data paths (absolute, because worktree-relative paths break when the data lives in the main repo), and the execution backend (Metal vs Python). The user has explicitly complained that past results lacked reproduction context, so always include it.
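One way to enforce "never abbreviate to a metric subset" is a completeness check on each stat block before it goes into the report. The sketch below is illustrative: the key names in `REQUIRED_METRICS` are hypothetical and would need mapping to whatever the harness actually emits.

```python
# Hypothetical key names for the full stat block; adjust to the harness output.
REQUIRED_METRICS = (
    "trades", "win_rate", "pnl", "avg_win", "avg_loss",
    "profit_factor", "max_dd", "ret_dd", "calmar",
    "date_range", "data_path",
)


def missing_metrics(stat_block):
    """Return the required fields that are absent or None, in checklist order."""
    return [key for key in REQUIRED_METRICS if stat_block.get(key) is None]
```

A non-empty result means the number should be found in an existing artifact, or raised with the user, before the table is written.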

Metric provenance — label every number

The same strategy reports wildly different P&L depending on how it was measured, and conflating those numbers is the most damaging mistake a report can make. Always state, next to each figure or as a table caption:

Which run produced it — which model, which parameter set, which threshold.
Which window — train window vs out-of-sample window, and the dates.
What kind of test — a walk-forward (retrain-each-period, validates that the method generalises) reports very different totals than a single fixed-config backtest (sizes the deployed model). Both are legitimate; presenting one as the other is not. If a report shows a walk-forward total and a deploy-threshold total side by side, explain explicitly that they answer different questions.
Optimism flags — if the fill model is known to overstate results (e.g. an OHLCV-only backtest on a thin-liquidity session under-charges slippage and gap-through), mark those results as an upper bound and say so in the Caveats section. Profit factors riding on tiny loss denominators (few losers) are unstable — discount them and say why.
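The labels above can travel with the number itself. The following is a sketch under assumptions, not an existing helper in the repo: `LabelledFigure` and its field names are invented here to show one way of keeping run, window, test kind, and optimism flag attached to every figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledFigure:
    name: str        # metric name, e.g. "PF"
    value: str       # formatted value, copied exactly from the artifact
    run: str         # model / parameter set / threshold
    window: str      # "train" or "OOS", plus the dates
    test_kind: str   # "walk-forward" or "fixed-config"
    artifact: str    # path to the file the number came from
    upper_bound: bool = False  # True when the fill model is known to be optimistic

    def caption(self):
        """Render the figure with its full provenance for a table caption."""
        flag = " [upper bound]" if self.upper_bound else ""
        return (f"{self.name} = {self.value}{flag} "
                f"(run: {self.run}; window: {self.window}; "
                f"test: {self.test_kind}; source: {self.artifact})")
```

Because every field except `upper_bound` is required, a figure cannot be constructed without saying where it came from.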

Recommendations — back every claim with a number

A recommendation that isn't traceable to data is an opinion. When you write "deploy X", the same sentence or the next should cite the metrics that make X win — "PF floor 5.80 vs 4.76, worst-month DD −$1,450 vs −$1,708". When asked for "multiple options", give real tradeoffs: the runner-up, the condition under which it would win, and what you'd give up by choosing it. Rank explicitly (1st / 2nd / provisional) and state what would change a ranking — e.g. "Asian moves from provisional to deploy once a slippage stress test clears."
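A ranked-options table can refuse rows that carry no evidence. This is only an illustrative sketch; the function name `ranking_table` and the dictionary keys are assumptions, and the output format is a plain Markdown table for the report.

```python
def ranking_table(options):
    """Render ranked options as a Markdown table, one row per option.

    Each option is a dict with "rank", "name", "evidence" and
    "would_change" keys; empty evidence is rejected.
    """
    rows = [
        "| Rank | Option | Evidence | What would change the ranking |",
        "| --- | --- | --- | --- |",
    ]
    for opt in options:
        if not opt["evidence"].strip():
            raise ValueError(f"{opt['name']}: recommendation has no cited metric")
        rows.append(
            f"| {opt['rank']} | {opt['name']} | {opt['evidence']} "
            f"| {opt['would_change']} |"
        )
    return "\n".join(rows)
```

The evidence cell is where phrases like "PF floor 5.80 vs 4.76" belong, and the last column records the trigger that would promote or demote an option.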

Notes

Cite where result files live (artifact paths) next to every number, so a reader can verify and re-run.
ES is the primary instrument; call out separately if NQ results are included.
If the user frames the ask as "my guide going forward" or "my workflow", add a workflow / next-steps section: the repeatable process and the prioritised open experiments, not just a snapshot of current results.
If the user's question depends on data the backtest can't see (live spread, depth of book, real fills), note which questions an available live API could answer and how — don't silently leave the gap.