name: strategy-report
description: Write a structured trading-strategy assessment, indicator explainer, or model/workflow guide and save it to the repo's docs folder. Use when the user says "write up the final assessment as a report", "write up a detailed summary", "write a detailed summary so I have multiple options", "explain why I'd use one over the other", "assess strategy X vs Y", "document this so we can reproduce it", "write a highly detailed document with all results", "make this my workflow guide", or wants a strategy/indicator/model comparison written to docs. Trading-strategy specific; distinct from the general-purpose tech-writer skill.
Strategy Report
The user repeatedly asks for written assessments — comparing strategy or model
variants, explaining how/why an indicator works, recommending one option over
another, turning a batch of backtest results into a go-forward playbook — saved
to a docs/ folder. The recurring ask: "write up a very detailed summary so I
have multiple options. Explain why I would use one over the other ... save to
docs folder".
When to use this vs. tech-writer
Use this for trading strategy / indicator / ML-model write-ups. Use
tech-writer for general project documentation, READMEs, and API docs.
Core principle: assemble, don't regenerate
A strategy report is a writing and synthesis task, not a compute task. The
results almost always already exist — sweep .md files, walk-forward report
.md files, saved JSON reports, the metrics printed earlier in the
conversation. Build the report from those artifacts. Do not re-run
backtests, sweeps, or training to "get the numbers" — that wastes time, and
re-running an analysis on an out-of-sample window that has already been used
for selection quietly contaminates it (every fresh look at the same OOS data
adds a hidden degree of selection freedom). If a genuinely needed number is
missing, ask the user whether to source it rather than silently regenerating.
If you catch yourself launching a backtest to populate a table, stop — the
number is probably already in an artifact you can cite.
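As a minimal sketch of "assemble, don't regenerate", the lookup below reads metrics from saved JSON reports and lists whatever is still missing, so the gap can be raised with the user instead of triggering a re-run. The function name `collect_metrics`, the flat one-object-per-file JSON layout, and the metric keys are all assumptions made for illustration; the repo's real report schema may differ.

```python
import json
from pathlib import Path


def collect_metrics(artifact_dir, needed):
    """Look up each needed metric in saved JSON reports; never recompute.

    Returns (found, missing), where found maps metric -> (value, source path).
    """
    found = {}
    for path in sorted(Path(artifact_dir).glob("*.json")):
        data = json.loads(path.read_text())
        if not isinstance(data, dict):
            continue  # assumed schema: one flat JSON object per report
        for key in needed:
            if key in data and key not in found:
                # Keep the artifact path so the report can cite its source.
                found[key] = (data[key], str(path))
    missing = [key for key in needed if key not in found]
    return found, missing
```

Anything returned in `missing` is a question for the user, not a reason to launch a backtest.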
Report structure
Save to ~/Personal/Trading/trading-research/docs/research/ (or the current
trading repo's docs/), named YYYY-MM-DD_<topic>_report.md. Lead with the
recommendation; a reader who stops after section 1 should still know what to
do. Sections scale with the topic — a single-indicator explainer may merge
some, a multi-model guide may split them — but cover all six concerns.
- Bottom line / recommendation: the verdict first (what to deploy, what to use when), in a small table if there are several options. Everything below justifies this section.
- Mechanism: how and why each indicator/strategy/model works. Include the math and the pipeline (features, labels, exits, training windows).
- Variants & when to use each / when to switch: explicit "use X when…, use Y when…" guidance. If the variants are not mutually exclusive (e.g. session models that run at different times, or a model retrained on a schedule), say when to switch between them and on what trigger (time of day, regime, account type). Segment recommendations by use case when it matters (prop-firm and personal accounts often want different parameters).
- Backtest evidence: the full stat block required by the repo's reporting rules, namely trades, WR, PnL, avg win/loss, PF, Max DD, Ret/DD, Calmar (+ Sharpe/Sortino if the harness emits them), date range, data path. Never abbreviate to a metric subset. See "Metric provenance" below; every number must say which run produced it.
- Caveats & data integrity: the honest-limitations section. What the numbers do not tell you; which results are upper bounds and why; fill-model assumptions; reused OOS windows; borrowed/assumed parameters. A report without this section reads as a sales pitch, not an assessment.
- Reproduction: exact commands, configs, blob/data paths (absolute, because worktree-relative paths break when the data lives in the main repo), and the execution backend (Metal vs Python). The user has explicitly complained that past results lacked reproduction context, so always include it.
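Two of the mechanical rules above, the file name pattern and the "never abbreviate" stat block, can be checked before a report is saved. The sketch below is illustrative only: the field keys in `REQUIRED_FIELDS`, the helper names, and the lowercase hyphenated topic slug are assumptions, not names taken from the repo's reporting rules.

```python
from datetime import date

# Hypothetical keys for the required stat block; Sharpe/Sortino stay optional.
REQUIRED_FIELDS = (
    "trades", "win_rate", "pnl", "avg_win", "avg_loss", "profit_factor",
    "max_dd", "ret_dd", "calmar", "date_range", "data_path",
)


def missing_stat_fields(block):
    """Return the required fields that are absent or None in a stat block."""
    return [field for field in REQUIRED_FIELDS if block.get(field) is None]


def report_filename(topic, on):
    """Build YYYY-MM-DD_<topic>_report.md (slug style is an assumption)."""
    slug = "-".join(topic.lower().split())
    return f"{on.isoformat()}_{slug}_report.md"
```

For example, `report_filename("strategy X vs Y", date(2025, 1, 15))` gives `2025-01-15_strategy-x-vs-y_report.md`, and a non-empty result from `missing_stat_fields` means the evidence table is not ready to publish.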
Metric provenance — label every number
The same strategy reports wildly different P&L depending on how it was measured, and conflating those numbers is the most damaging mistake a report can make. Always state, next to each figure or as a table caption:
- Which run produced it — which model, which parameter set, which threshold.
- Which window — train window vs out-of-sample window, and the dates.
- What kind of test — a walk-forward (retrain-each-period, validates that the method generalises) reports very different totals than a single fixed-config backtest (sizes the deployed model). Both are legitimate; presenting one as the other is not. If a report shows a walk-forward total and a deploy-threshold total side by side, explain explicitly that they answer different questions.
- Optimism flags — if the fill model is known to overstate results (e.g. an OHLCV-only backtest on a thin-liquidity session under-charges slippage and gap-through), mark those results as an upper bound and say so in the Caveats section. Profit factors riding on tiny loss denominators (few losers) are unstable — discount them and say why.
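One way to keep those four labels attached to a figure is to carry them in a small record and render the caption from it. The sketch below also shows why a profit factor resting on a few losers is fragile. The class name, field names, and trade values are made up for illustration and do not come from any real run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledMetric:
    name: str
    value: float
    run: str            # model / parameter set / threshold
    window: str         # train vs out-of-sample, with dates
    test_kind: str      # "walk-forward" or "fixed-config"
    upper_bound: bool = False  # True when the fill model is known to be optimistic

    def caption(self):
        flag = " [upper bound]" if self.upper_bound else ""
        return (f"{self.name} = {self.value:g} "
                f"({self.run}; {self.window}; {self.test_kind}){flag}")


def profit_factor(pnls):
    """Gross profit divided by gross loss; infinite when there are no losers."""
    gross_win = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_win / gross_loss


# Illustrative trades only: ten winners of 500 and a single loser of 250.
few_losers = [500.0] * 10 + [-250.0]
pf_before = profit_factor(few_losers)             # 5000 / 250 = 20.0
pf_after = profit_factor(few_losers + [-250.0])   # 5000 / 500 = 10.0
```

A single extra losing trade halves the profit factor in this toy case, which is the reason to discount PF figures that rest on a handful of losers and to say so next to the number.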
Recommendations — back every claim with a number
A recommendation that isn't traceable to data is an opinion. When you write "deploy X", the same sentence or the next should cite the metrics that make X win — "PF floor 5.80 vs 4.76, worst-month DD −$1,450 vs −$1,708". When asked for "multiple options", give real tradeoffs: the runner-up, the condition under which it would win, and what you'd give up by choosing it. Rank explicitly (1st / 2nd / provisional) and state what would change a ranking — e.g. "Asian moves from provisional to deploy once a slippage stress test clears."
Notes
- Cite where result files live (artifact paths) next to every number, so a reader can verify and re-run.
- ES is the primary instrument; call out separately if NQ results are included.
- If the user frames the ask as "my guide going forward" or "my workflow", add a workflow / next-steps section: the repeatable process and the prioritised open experiments, not just a snapshot of current results.
- If the user's question depends on data the backtest can't see (live spread, depth of book, real fills), note which questions an available live API could answer and how — don't silently leave the gap.