evolution-engine - SKILL.md Agent Skill

name: evolution-engine
description: Domain knowledge for the Evolution Engine: LLM-powered autonomous strategy discovery from raw OHLCV data. Covers the generate-backtest-select-evolve loop, vectorized backtesting, out-of-sample validation, and strategy graduation. Use when discovering trading patterns, running backtests, evolving strategies, or reviewing evolution logs. Triggers on "evolve", "discover patterns", "backtest", "evolution", "strategy generation", "candidate strategy".

Evolution Engine

Overview

The Evolution Engine autonomously discovers trading strategies from raw price data. It uses LLM-powered pattern generation combined with vectorized backtesting to evolve, test, and graduate viable trading rules — without manual rule writing.

This is not parameter optimization on a known strategy. It's open-ended strategy discovery: the LLM proposes novel entry/exit logic, the engine validates it against real data, and natural selection eliminates the losers.

How It Works

The Evolution Loop

OHLCV Data → LLM Generation → Vectorized Backtest → Selection → Mutation → Repeat
                                                          ↓
                                                    Out-of-Sample Validation
                                                          ↓
                                                    Graduated Strategies

Step-by-Step

1. Data Fetch: Pull OHLCV candles from the Binance public API (no key needed).
2. Generate: The LLM analyzes price patterns and proposes N candidate strategies (entry/exit rules, position sizing, stop loss).
3. Backtest: Each candidate is backtested with vectorized numpy operations (no loop per candle) for speed.
4. Score: Candidates are scored by Sharpe ratio, win rate, max drawdown, and total return.
5. Select: The top K candidates survive. The rest are eliminated and logged to the graveyard.
6. Mutate: The LLM takes the survivors and generates variations (parameter tweaks, rule modifications).
7. Repeat: Run steps 3-6 for the configured number of generations.
8. Validate: Final survivors are tested on held-out out-of-sample (OOS) data.
9. Graduate: Strategies that pass OOS validation are marked as graduated.
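
The loop above can be sketched in a few lines of plain Python. This is a minimal sketch, not the engine's implementation: `generate`, `mutate`, and `score` are hypothetical callables standing in for the LLM calls and the backtester, and a candidate is reduced to a dict of numbers so the example runs without any API key.

```python
import random
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, float]  # hypothetical: a strategy reduced to named numbers


def evolve(
    generate: Callable[[int], List[Candidate]],
    mutate: Callable[[Candidate], Candidate],
    score: Callable[[Candidate], float],
    generations: int = 3,
    population: int = 10,
    top_k: int = 3,
) -> Tuple[List[Candidate], List[dict]]:
    """Generate once, then score, select, and mutate for each generation.

    Assumes population >= top_k >= 1.
    """
    candidates = generate(population)                       # Generate
    survivors: List[Candidate] = []
    graveyard: List[dict] = []
    for gen in range(generations):                          # Repeat
        ranked = sorted(candidates, key=score, reverse=True)  # Backtest + Score
        survivors, losers = ranked[:top_k], ranked[top_k:]    # Select
        graveyard += [
            {"generation": gen, "candidate": c, "score": score(c)} for c in losers
        ]
        if gen < generations - 1:                           # Mutate (skip after last round)
            children = [
                mutate(survivors[i % len(survivors)])
                for i in range(population - len(survivors))
            ]
            candidates = survivors + children
    return survivors, graveyard


# Toy stand-ins so the sketch is runnable: the "best" threshold is 0.5.
rng = random.Random(0)


def toy_generate(n: int) -> List[Candidate]:
    return [{"threshold": rng.uniform(0.0, 1.0)} for _ in range(n)]


def toy_mutate(c: Candidate) -> Candidate:
    return {"threshold": c["threshold"] + rng.uniform(-0.05, 0.05)}


def toy_score(c: Candidate) -> float:
    return -(c["threshold"] - 0.5) ** 2


survivors, graveyard = evolve(toy_generate, toy_mutate, toy_score)
```

With the defaults (3 generations, 10 candidates, top 3 kept), each generation sends 7 candidates to the graveyard. Survivors are carried over unchanged into the next generation, so the best score never gets worse between rounds. Out-of-sample validation would run on `survivors` after the loop returns.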

Key Design Decisions

LLM generates rules, not parameters. The engine doesn't optimize MACD(12,26,9) → MACD(14,28,10). It discovers entirely new rule combinations.
Vectorized backtesting. No candle-by-candle loops. Numpy vectorized operations can make backtests roughly 100x faster than event-driven simulators.
OOS validation is mandatory. In-sample performance means nothing. Only OOS-validated strategies graduate.
Graveyard is data. Failed strategies are logged with failure reasons. This prevents re-discovering the same dead ends.
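
To make the vectorized-backtesting decision concrete, here is a minimal sketch assuming a long/flat strategy expressed as a 0/1 signal array. The moving-average crossover rule, the `fee` value, and the function names are illustrative assumptions, not the engine's own rules or API.

```python
import numpy as np


def sma(x: np.ndarray, window: int) -> np.ndarray:
    """Simple moving average; the first window-1 entries are NaN."""
    out = np.full(x.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def backtest_long_flat(close: np.ndarray, signal: np.ndarray, fee: float = 0.001) -> np.ndarray:
    """Per-bar strategy returns for a 0/1 signal, with no per-candle Python loop."""
    bar_ret = np.zeros_like(close, dtype=float)
    bar_ret[1:] = close[1:] / close[:-1] - 1.0
    # Shift by one bar: a signal computed on bar t's close is only traded from bar t+1.
    # Skipping this shift is the classic look-ahead bug in vectorized backtests.
    position = np.zeros_like(signal, dtype=float)
    position[1:] = signal[:-1]
    # Charge the fee on every change in position (entry or exit).
    turnover = np.abs(np.diff(position, prepend=0.0))
    return position * bar_ret - turnover * fee


# Illustrative prices only, not real market data.
close = np.array([100.0, 101.0, 103.0, 102.0, 104.0, 107.0, 105.0, 108.0])
with np.errstate(invalid="ignore"):  # NaN warm-up bars compare as False
    signal = (sma(close, 2) > sma(close, 4)).astype(float)
strategy_returns = backtest_long_flat(close, signal)
```

Every step is a whole-array operation, so the cost of a backtest grows with the number of numpy calls rather than the number of candles times Python interpreter overhead.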

MCP Tools

Tool	Purpose
`evolution_fetch_market_data`	Fetch OHLCV data from Binance for a symbol/timeframe/period
`evolution_discover_patterns`	LLM-powered pattern discovery — generates N candidate strategies
`evolution_run_backtest`	Backtest a single candidate — returns Sharpe, win rate, drawdown
`evolution_evolve_strategy`	Full evolution loop: generate → backtest → select → mutate × N generations
`evolution_get_log`	History of evolution runs: graduated strategies, graveyard, metrics

Backtest Metrics

Every backtest reports the metrics below. The thresholds are the suggested bar for graduation:

Metric	Graduation Threshold
Sharpe Ratio	> 1.0 (OOS)
Win Rate	> 40%
Max Drawdown	< 25%
Number of Trades	> 30 (statistical significance)
Profit Factor	> 1.2

These thresholds are guidelines. Context matters — a Sharpe of 0.9 with 500 trades may be more reliable than 2.5 with 15 trades.
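
As a sketch of how these metrics could be computed and checked, the example below takes per-bar strategy returns and per-trade returns as inputs. The function names, the dict layout, and the annualization factor (`24 * 365` bars per year, which assumes 1h candles on a market that trades around the clock) are assumptions for illustration, not the engine's actual output format.

```python
import numpy as np


def summarize(bar_returns, trade_returns, periods_per_year: int = 24 * 365) -> dict:
    """Compute the graduation metrics from per-bar and per-trade returns."""
    bar = np.asarray(bar_returns, dtype=float)
    trades = np.asarray(trade_returns, dtype=float)

    std = bar.std(ddof=1) if bar.size > 1 else 0.0
    sharpe = float(bar.mean() / std * np.sqrt(periods_per_year)) if std > 0 else 0.0

    # Equity curve starts at 1.0 so a loss on the very first bar counts as drawdown.
    equity = np.concatenate(([1.0], np.cumprod(1.0 + bar)))
    max_drawdown = float(np.max(1.0 - equity / np.maximum.accumulate(equity)))

    gains = float(trades[trades > 0].sum())
    losses = float(-trades[trades < 0].sum())
    return {
        "sharpe": sharpe,
        "win_rate": float((trades > 0).mean()) if trades.size else 0.0,
        "max_drawdown": max_drawdown,
        "num_trades": int(trades.size),
        # No losing trades gives an infinite profit factor; the trade-count check guards that case.
        "profit_factor": gains / losses if losses > 0 else float("inf"),
    }


def passes_graduation(m: dict) -> bool:
    """Apply the thresholds from the table; Sharpe should come from OOS data."""
    return (
        m["sharpe"] > 1.0
        and m["win_rate"] > 0.40
        and m["max_drawdown"] < 0.25
        and m["num_trades"] > 30
        and m["profit_factor"] > 1.2
    )


# Illustrative numbers only.
metrics = summarize([0.1, -0.5, 0.2], [0.02, -0.01, 0.03, -0.01])
```

In this toy input the trade count (4) and the 50% drawdown both fail the check, whatever the other metrics say. A hard pass/fail like this is only a first filter, since the thresholds are guidelines rather than rules.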

Best Practices

Before Running Evolution

Choose the right timeframe. 1h and 4h produce the most tradeable strategies. 1m is noise. 1d may not have enough data points.
Use enough data. 90 days minimum for 1h data. 180 days for 4h. Less data = more overfitting risk.
Start small. 3 generations × 10 candidates is a good starting point. Don't jump to 10 × 50.
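
The data minimums translate into candle counts as follows. This is a small sketch with hypothetical helper names; the 30% out-of-sample share is an assumed default, not a value taken from the engine.

```python
TIMEFRAME_MINUTES = {"1m": 1, "1h": 60, "4h": 240, "1d": 1440}


def candle_count(days: int, timeframe: str) -> int:
    """Number of candles in `days` of continuous trading at the given timeframe."""
    return days * 24 * 60 // TIMEFRAME_MINUTES[timeframe]


def chronological_split(n_candles: int, oos_percent: int = 30):
    """Split by time, never randomly: the most recent candles are held out."""
    cut = n_candles * (100 - oos_percent) // 100
    return slice(0, cut), slice(cut, n_candles)


n_1h = candle_count(90, "1h")    # 2160 candles
n_4h = candle_count(180, "4h")   # 1080 candles
n_1d = candle_count(90, "1d")    # 90 candles
in_sample, out_of_sample = chronological_split(n_1h)
```

Ninety days of 1d candles is only 90 data points, which illustrates why the daily timeframe may not provide enough data. Holding out the final segment rather than a random sample keeps future information out of the in-sample period.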

During Evolution

Don't interrupt. Each generation builds on the previous. Stopping mid-run wastes compute.
Monitor the graveyard. If 90% of candidates fail on the same metric (e.g., max drawdown), the symbol/timeframe may not be suitable.
Watch for convergence. If surviving strategies across generations look increasingly similar, the engine has found a local optimum.
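
One way to put a number on convergence is the average pairwise correlation between the survivors' per-bar return series. The sketch below assumes those series are available as rows of a 2D array. The function names and the 0.9 cutoff are illustrative assumptions, not part of the engine.

```python
import numpy as np


def mean_pairwise_correlation(series: np.ndarray) -> float:
    """Average correlation over all distinct pairs of rows (one row per strategy).

    Needs at least two rows, and no row may be constant (its correlation is undefined).
    """
    corr = np.corrcoef(series)
    upper = corr[np.triu_indices_from(corr, k=1)]  # each pair once, diagonal excluded
    return float(np.mean(upper))


def looks_converged(series: np.ndarray, cutoff: float = 0.9) -> bool:
    """Flag a generation whose survivors behave almost identically."""
    return mean_pairwise_correlation(series) >= cutoff


# Toy rows: the first two move together, the third moves opposite to both.
mixed = np.array([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]])
clones = np.array([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]])
```

Tracking this value per generation shows whether diversity is collapsing. A steady climb toward 1.0 suggests that further generations are refining one idea rather than exploring new ones.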

After Evolution

Never deploy without OOS validation. In-sample results are marketing, not science.
Paper trade first. Even OOS-validated strategies should be paper traded for 2-4 weeks.
Check regime sensitivity. A strategy discovered in a trending market may fail in ranging conditions. Test across multiple market regimes.
Log everything. Use `evolution_get_log` to review what was tried, what failed, and why.
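
A regime check can be sketched by labelling each bar as trending or ranging and comparing the strategy's average return in each bucket. The example uses Kaufman's efficiency ratio (net price move divided by the total path length over a window) as the regime label. That choice, the 0.5 threshold, and the function names are assumptions for illustration; the engine may define regimes differently.

```python
import numpy as np


def efficiency_ratio(close, window: int) -> np.ndarray:
    """|net move| / sum of |bar moves| over `window` bars; NaN during warm-up."""
    close = np.asarray(close, dtype=float)
    er = np.full(close.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(np.abs(np.diff(close)))))
    net = np.abs(close[window:] - close[:-window])
    path = csum[window:] - csum[:-window]
    # A completely flat window has path == 0; treat it as ranging (ratio 0).
    er[window:] = np.divide(net, path, out=np.zeros_like(net), where=path > 0)
    return er


def mean_return_by_regime(strategy_returns, er: np.ndarray, threshold: float = 0.5) -> dict:
    """Average per-bar return in trending vs ranging bars."""
    r = np.asarray(strategy_returns, dtype=float)
    with np.errstate(invalid="ignore"):
        trending = er >= threshold  # NaN compares False, so warm-up bars
        ranging = er < threshold    # fall into neither bucket
    return {
        "trending": float(r[trending].mean()) if trending.any() else None,
        "ranging": float(r[ranging].mean()) if ranging.any() else None,
    }


# Illustrative series: a steady climb followed by a choppy range.
close = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 5.0, 4.0, 5.0]
er = efficiency_ratio(close, window=2)
returns = [0.0, 0.0, 0.01, 0.01, 0.01, -0.02, -0.02, -0.02, -0.02]
by_regime = mean_return_by_regime(returns, er)
```

A large gap between the two buckets, as in this toy series, is a sign that the strategy depends on one regime and should be tested on data that contains more of the other.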

When NOT to Use Evolution

Not for parameter optimization. If you already have a strategy and just want to tune parameters, use a traditional optimizer.
Not for HFT. The engine works on candle data, not tick data. Sub-minute strategies need different infrastructure.
Not as a replacement for domain knowledge. Evolution discovers patterns, but you still need to understand why a pattern works before risking real money.

Common Mistakes

Mistake	Why It's Bad	Fix
Too few data points	Strategies overfit to noise	Use 90+ days for 1h, 180+ for 4h
Skipping OOS validation	In-sample Sharpe of 3.0 means nothing	Always validate on held-out data
Too many generations	Overfitting through excessive selection pressure	3-5 generations is usually sufficient
Deploying immediately	No buffer for regime changes	Paper trade 2-4 weeks first
Ignoring the graveyard	Re-discovering dead strategies wastes compute	Review `evolution_get_log` before new runs
Using correlated symbols	BTCUSDT and ETHUSDT strategies overlap heavily	Test on uncorrelated markets
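
To show how the graveyard can keep a new run from repeating dead ends, the sketch below fingerprints each strategy and drops candidates that already failed. The strategy dict layout and the `"strategy"` key on graveyard entries are hypothetical; the real log returned by `evolution_get_log` may be structured differently.

```python
import hashlib
import json
from typing import List


def fingerprint(strategy: dict) -> str:
    """Stable hash of a strategy's rules, independent of key order."""
    canonical = json.dumps(strategy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_known_failures(candidates: List[dict], graveyard: List[dict]) -> List[dict]:
    """Drop candidates whose exact rules are already in the graveyard."""
    dead = {fingerprint(entry["strategy"]) for entry in graveyard}
    return [c for c in candidates if fingerprint(c) not in dead]


# Hypothetical rule format, for illustration only.
failed = {"entry": "rsi_14 < 30", "exit": "rsi_14 > 55", "stop_loss_pct": 2.0}
same_rules_reordered = {"stop_loss_pct": 2.0, "exit": "rsi_14 > 55", "entry": "rsi_14 < 30"}
fresh = {"entry": "close > sma_50", "exit": "close < sma_20", "stop_loss_pct": 3.0}

graveyard = [{"strategy": failed, "failure_reason": "max drawdown above threshold"}]
remaining = filter_known_failures([same_rules_reordered, fresh], graveyard)
```

This only catches exact repeats. Near-duplicates, such as the same rules with a slightly different stop loss, would need a similarity measure instead of a hash.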

Requirements

ANTHROPIC_API_KEY — Required for LLM-powered pattern discovery
Binance public API — Used for OHLCV data (no API key needed)
Python with numpy — For vectorized backtesting
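
For reference, fetching candles from the Binance public REST API needs only the standard library. This is a minimal sketch of the data-fetch step, not the code behind `evolution_fetch_market_data`. It assumes the public spot klines endpoint (`/api/v3/klines`), whose rows begin with open time, open, high, low, close, and volume, with prices sent as strings. The helper names are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.binance.com/api/v3/klines"


def klines_url(symbol: str, interval: str, limit: int = 500) -> str:
    """Build the request URL for one page of candles."""
    return BASE_URL + "?" + urlencode({"symbol": symbol, "interval": interval, "limit": limit})


def parse_klines(rows: List[list]) -> List[dict]:
    """Keep the OHLCV fields and convert the string prices to floats."""
    return [
        {
            "open_time_ms": int(r[0]),
            "open": float(r[1]),
            "high": float(r[2]),
            "low": float(r[3]),
            "close": float(r[4]),
            "volume": float(r[5]),
        }
        for r in rows
    ]


def fetch_klines(symbol: str, interval: str, limit: int = 500, timeout: float = 10.0) -> List[dict]:
    """Perform the HTTP request (needs network access; no API key)."""
    with urlopen(klines_url(symbol, interval, limit), timeout=timeout) as resp:
        return parse_klines(json.load(resp))


# Made-up row in the klines layout, used to exercise the parser offline.
sample = [[1700000000000, "100.0", "105.5", "99.0", "104.25", "12.5",
           1700003599999, "1300.0", 42, "6.0", "625.0", "0"]]
candles = parse_klines(sample)
```

The endpoint caps how many candles one request returns, so a 90-day 1h history has to be paged, for example by advancing the `startTime` query parameter between requests.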