quantbox-autoresearch

name: quantbox-autoresearch description: Use when the user wants to continuously improve a strategy via LLM-driven research loops. Triggers — "improve this strategy", "tune over time", "autoresearch", "search the parameter space", "make this better automatically", "continuous improvement". default_layer: L4 escalation_rules:

to: L5 when: "running as a scheduled cron job (use quantbox autoresearch tick)" status: stub requires_quantbox_min: "0.3.0"

Status: stub. The AutoResearchDriver is not yet implemented. This skill describes the target shape so the LLM-facing API is locked in alongside the architecture. See docs/architecture/autoresearch.md and DEC-0003 for the design.

Pick the layer

Task shape	Use
"One-off improvement loop, run for a few hours"	L4 — `quantbox autoresearch run -c <config>`
"Continuous daily/weekly improvement, budget per tick"	L5 — `quantbox autoresearch tick -c <config>` from cron
"Test the loop wiring before committing budget"	`quantbox autoresearch run --dry-run --max-trials 1 -c <config>`

Default: L4 with a tight first-run budget (~50 trials, ≤$30 LLM cost). Escalate to L5 (cron) only after you've validated the loop config.

What this skill produces

A YAML config (and the supporting research/clients/{client}/ scaffolding) that defines:

Baseline strategy to vary from
Tunable parameters (search space)
Budget (trials, wall clock, LLM cost, compute cost)
Goal + statistical gates (walk-forward, deflated Sharpe, drawdown)
Proposer (Optuna, LLM, or hybrid)
Memory paths (EXPERIMENTS.jsonl, findings.md)
Delivery (Discord channel, candidate threshold)

It does NOT produce a new plugin. The loop tunes existing plugins; it doesn't author them. For new plugin code, see quantbox-strategy-author.

Fast path — typical config

autoresearch:
  baseline: configs/clients/X/strategy.yaml
  search_space:
    params.lookback_days: { range: [30, 252], step: 5 }
    params.vol_target:    { range: [0.08, 0.25], step: 0.01 }

  budget:
    max_trials: 50
    max_wall_clock: 4h
    max_llm_cost_usd: 25
    max_compute_cost_usd: 5

  goal:
    optimize: sharpe_oos
    direction: maximize
    constraints:
      max_drawdown: -0.20
      min_trades_per_year: 12
      deflated_sharpe_min: 0.3

  proposer:
    name: proposer.hybrid.v1
    params:
      llm: anthropic.claude-sonnet-4
      search: optuna.tpe
      diversity_injection: 0.1

  evaluation:
    walk_forward:
      splits: 5
      embargo_days: 5
    bootstrap_ci: 0.95

  memory:
    experiments_jsonl: research/clients/X/EXPERIMENTS.jsonl
    experiments_md:    research/clients/X/EXPERIMENTS.md
    findings_md:       research/clients/X/findings.md
    cap_recent_for_proposer: 20

  termination:
    converged_tolerance: 0.01
    consecutive_no_improve: 5

  delivery:
    discord_channel: "#client-X"
    promotion_candidate_min_lift: 0.15
    auto_summary_every_n_trials: 10

Validate, dry-run, then run:

quantbox autoresearch validate -c configs/clients/X/autoresearch.yaml
quantbox autoresearch run --dry-run --max-trials 1 -c configs/clients/X/autoresearch.yaml
quantbox autoresearch run -c configs/clients/X/autoresearch.yaml

For continuous mode (cron):

quantbox autoresearch tick -c configs/clients/X/autoresearch.yaml --max-trials 5

Defaults the skill enforces

These are mandatory. Refuse to produce a config without them:

Field	Default	Why
`evaluation.walk_forward.splits`	≥ 5	One in-sample number is meaningless
`evaluation.walk_forward.embargo_days`	≥ 5	Prevent train/test leakage
`evaluation.bootstrap_ci`	0.95	Quantify metric uncertainty
`goal.constraints.deflated_sharpe_min`	required	Adjust for multiple-testing inflation
`goal.constraints.max_drawdown`	required	Hard cap on tail risk
`budget.max_trials`	required	Bound the loop
`budget.max_wall_clock`	required	Bound time
`budget.max_llm_cost_usd`	required	Bound LLM spend

If the user asks to skip any of these "for speed," refuse and explain why.

When to escalate to authoring

If the user wants to vary something the existing plugins can't express (e.g., "try a regime-detection layer"), this skill is the wrong tool. Hand off to quantbox-strategy-author to scaffold the new capability first; then come back here to tune it.

The split: this skill tunes; quantbox-strategy-author creates.

Pitfalls to warn about

In-sample p-hacking — without the mandatory gates, the loop will reliably find spurious alpha. Refuse to skip them.
Cost runaway — LLM proposers can rack up bills. The budget is a hard cap, but the user should also know roughly what 50 trials costs (~$30).
Concurrent loops on same memory — one config = one client = one EXPERIMENTS.jsonl. Concurrent writes corrupt state.
Auto-promotion — the loop will NEVER auto-promote to production. The skill must not suggest it. Best the loop produces is a research-status candidate.
findings.md as gospel — it's LLM-summarized prose, advisory only. The truth is in EXPERIMENTS.jsonl.

Result delivery

When the loop exits, it produces three artifacts:

Best candidate at research/clients/{client}/candidates/{trial_id}.yaml — a normal quantbox config you can run independently.
Summary report at research/clients/{client}/autoresearch-report-{date}.md — human-readable.
Discord post in the configured channel — metrics, chart, and a "Promote?" prompt.

The skill's job ends here. Promotion is human-driven via /promote-lock (existing flow). See quantbox-promote.

quantbox-autoresearch