gene-bench-experience-control - SKILL.md Agent Skill

name: gene-bench-experience-control
description: "Strategy Gene methodology for experience-driven test-time control in LLM agents. A compact, control-oriented experience representation (~230 tokens, +3.0pp) outperforms a documentation-heavy Skill (~2500 tokens, -1.1pp). Core principle: encode experience as a control signal, not documentation. Includes the GEP protocol for gene evolution, AVOID directive patterns, and selective experience accumulation. Trigger: experience reuse, test-time control, skill representation, agent memory, experience evolution, strategy gene, GEP, prompt engineering for agents."

Strategy Gene: Control-Oriented Experience Representation

Paper: "From Procedural Skills to Strategy Genes" (arXiv:2604.15097, Wang et al., Tsinghua/EvoMap, 2026-04)

Core Insight

Experience reuse effectiveness depends on representation form, not content quantity. A compact Gene (~230 tokens, +3.0pp) beats a full Skill (~2500 tokens, -1.1pp). The key shift: from documentation-oriented to control-oriented experience objects.

Gene Structure

g = (m, u, pi, alpha, c, v) -- all fields compact, control-facing:

m (signals_match): Keywords/trigger cues for task matching (2-5 terms)
u (summary): One-sentence compact description of the strategy
pi (strategy): Short ordered strategy steps, including explicit AVOID directives
alpha (avoid cues): Failure-aware warnings -- what NOT to do
c (constraints): Optional execution constraints (time, API limits)
v (validation): Optional executable checks or validation hooks
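As an illustration of the tuple above, the fields can be held in a small typed container. This is a minimal sketch, not the paper's implementation: the class name `Gene`, the attribute names, and the `problems` helper are hypothetical, and the only rules checked are the ones stated in the field list (2-5 signal terms, a non-empty summary, at least one strategy step).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gene:
    """Hypothetical container mirroring g = (m, u, pi, alpha, c, v)."""
    signals_match: List[str]                         # m: 2-5 trigger keywords
    summary: str                                     # u: one-sentence description
    strategy: List[str]                              # pi: short ordered steps
    avoid: List[str] = field(default_factory=list)   # alpha: failure warnings
    constraints: Optional[str] = None                # c: optional constraints
    validation: Optional[str] = None                 # v: optional check or hook

    def problems(self) -> List[str]:
        """Return schema violations; an empty list means no issue was found."""
        issues = []
        if not 2 <= len(self.signals_match) <= 5:
            issues.append("signals_match should hold 2-5 terms")
        if not self.summary.strip():
            issues.append("summary is empty")
        if not self.strategy:
            issues.append("strategy needs at least one step")
        return issues


# Illustrative gene; the content is made up for the example.
example = Gene(
    signals_match=["csv", "parse", "encoding"],
    summary="Parse CSV input defensively before any analysis.",
    strategy=["Detect the encoding", "Read with an explicit delimiter"],
    avoid=["Assuming UTF-8 without checking"],
)
print(example.problems())
```

Keeping the optional fields as `None` by default matches the description of c and v as optional.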

Gene Template

signals: [keyword1, keyword2, ...]
summary: One sentence describing what this gene controls
strategy:
  1. Step one (concrete action)
  2. Step two
  AVOID:
  - Common failure mode A
  - Common failure mode B
constraints: (optional)
validation: (optional)
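The template above can be filled programmatically. The following is a rough sketch that assumes a gene is stored as a plain dict with the keys `signals`, `summary`, `strategy`, `avoid`, `constraints`, and `validation`; the function name `render_gene` and the dict layout are hypothetical, chosen only to mirror the template's layout.

```python
def render_gene(gene: dict) -> str:
    """Render a gene dict into the compact template text."""
    lines = [
        "signals: [" + ", ".join(gene["signals"]) + "]",
        "summary: " + gene["summary"],
        "strategy:",
    ]
    # Numbered, ordered steps.
    for i, step in enumerate(gene["strategy"], start=1):
        lines.append(f"  {i}. {step}")
    # AVOID cues sit inside the strategy block, as in the template.
    if gene.get("avoid"):
        lines.append("  AVOID:")
        lines.extend(f"  - {cue}" for cue in gene["avoid"])
    # Optional fields are emitted only when present.
    for key in ("constraints", "validation"):
        if gene.get(key):
            lines.append(f"{key}: {gene[key]}")
    return "\n".join(lines)


# Illustrative content, not taken from the paper.
demo = {
    "signals": ["csv", "parse"],
    "summary": "Parse CSV input defensively.",
    "strategy": ["Detect the encoding", "Read with an explicit delimiter"],
    "avoid": ["Assuming UTF-8 without checking"],
}
print(render_gene(demo))
```

Omitting empty optional fields keeps the rendered gene as short as the content allows.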

Key Design Principles

1. Control-Oriented, Not Documentation-Oriented

DO: Focus on actionable steps and failure warnings
AVOID: Overview sections, background context, API reference dumps, example code blocks
Rule of thumb: If removing a section does not hurt control, remove it

2. AVOID Directives Are High-Value

Failure warnings alone outperform strategy+failure bundles (+4.6pp vs +0.7pp). Distill failures into compact AVOID cues, not verbose logs.

3. Structure Matters Beyond Content

The same content scores 54.0% in structured form versus 50.5% as flattened prose, a 3.5pp gap. Use an explicit schema, not flowing text.

4. Selective Compression Over Additive Growth

Appending raw failure history to a working Gene dilutes it (-2.0pp)
Best practice: compress failures into focused AVOID warnings, replace rather than accumulate
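One way to picture "replace rather than accumulate" is a bounded AVOID list that is rebuilt on each update. This is a sketch under stated assumptions: the cap of three cues, the function names, and the toy `first_sentence` distiller are all hypothetical. In practice the distillation step would more plausibly be done by a model or a human editor.

```python
from typing import Callable, List


def update_avoid_cues(
    current: List[str],
    raw_failures: List[str],
    distill: Callable[[str], str],
    max_cues: int = 3,  # assumed cap, not a value from the source
) -> List[str]:
    """Rebuild the AVOID list: newest distilled cues first, deduplicated, capped."""
    merged: List[str] = []
    candidates = [distill(log) for log in reversed(raw_failures)] + current
    for cue in candidates:
        cue = cue.strip()
        # Case-insensitive dedup so a repeated failure does not grow the list.
        if cue and cue.lower() not in (c.lower() for c in merged):
            merged.append(cue)
    # Older cues fall off the end instead of accumulating.
    return merged[:max_cues]


def first_sentence(log: str) -> str:
    """Toy distiller (assumption): keep only the first sentence of a log."""
    return log.split(".")[0]


cues = update_avoid_cues(
    current=["Hard-coding the delimiter"],
    raw_failures=[
        "Assuming UTF-8 input. Traceback follows",
        "Hard-coding the delimiter. Stack dump",
    ],
    distill=first_sentence,
)
print(cues)
```

The raw logs never reach the gene; only their distilled cues do, and the cap keeps the gene from growing with each failure.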

5. Single Targeted Gene Over Multiple Composed Genes

Naive multi-gene composition collapses performance (from 54.0% to 44.9%). Select the ONE most relevant gene per inference context. If multiple genes apply, prioritize by specificity.
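A minimal sketch of single-gene selection follows. The source does not define "specificity", so the ranking here is an assumption: genes are ranked first by how many of their signals appear in the task text, then by the fraction of their signals that matched. The function name `select_gene` and the dict layout are hypothetical.

```python
from typing import List, Optional


def select_gene(task: str, genes: List[dict]) -> Optional[dict]:
    """Return the single best-matching gene, or None if no signal matches."""
    text = task.lower()
    best, best_key = None, (0, 0.0)
    for gene in genes:
        signals = gene["signals"]
        hits = sum(1 for s in signals if s.lower() in text)
        if hits == 0:
            continue  # gene does not apply to this task
        # Assumed specificity proxy: match count, then matched fraction.
        key = (hits, hits / len(signals))
        if key > best_key:
            best, best_key = gene, key
    return best


# Illustrative genes, made up for the example.
genes = [
    {"name": "generic-io", "signals": ["file", "read", "write", "path"]},
    {"name": "csv-parse", "signals": ["csv", "encoding"]},
]
chosen = select_gene("Read a CSV file with unknown encoding", genes)
print(chosen["name"] if chosen else None)
```

Returning exactly one gene, or none, avoids the naive composition case by construction.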

6. Outdated Framing Still Useful

A gene with a stale algorithm but correct problem framing (56.6%) still beats no guidance. Preserve problem-structure insights even when specific solutions evolve.

GEP Protocol (Gene Evolution Protocol)

Object hierarchy:

Gene: Atomic control unit (the reusable experience chunk)
Capsule: Validated task-level execution path with audit trail
Event: Immutable evolution log (intent, mutations, outcome)

Evolution loop: trial -> validation -> solidification

New experience enters as candidate mutation
Validated via execution checkpoint pass rate
If improved: solidify into canonical Gene form
If degraded: discard, log failure as AVOID cue
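The loop above can be sketched as a single step function. This is an illustration under assumptions, not the GEP reference implementation: `evolve_step`, the dict-based gene layout, and the caller-supplied `pass_rate` function are hypothetical, and a candidate that merely ties the canonical gene is treated as not improved, which the source does not specify.

```python
from typing import Callable, NamedTuple, Tuple


class Event(NamedTuple):
    """Immutable log record with the fields named in the object hierarchy."""
    intent: str
    mutations: str
    outcome: str


def evolve_step(
    canonical: dict,
    candidate: dict,
    pass_rate: Callable[[dict], float],
    intent: str,
    mutations: str,
    failure_cue: str,
) -> Tuple[dict, Event]:
    """Run one trial -> validation -> solidification step."""
    # Validation: compare checkpoint pass rates of candidate and canonical.
    if pass_rate(candidate) > pass_rate(canonical):
        return candidate, Event(intent, mutations, "solidified")
    # Not improved: discard the mutation, keep the lesson as an AVOID cue.
    kept = dict(canonical)
    kept["avoid"] = list(canonical.get("avoid", [])) + [failure_cue]
    return kept, Event(intent, mutations, "discarded")
```

A `NamedTuple` is used for the event because its fields cannot be reassigned after creation, which fits the description of the log as immutable.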

When Writing Skills/Genes for Hermes

Apply these principles to Hermes skill creation:

Keep the description field focused on control triggers, not general background
Include explicit Pitfalls / AVOID sections with concrete failure modes
Prefer structured lists over prose explanations
When updating skills, compress rather than append -- replace stale content with distilled warnings
One skill per control domain; avoid loading multiple overlapping skills simultaneously

Pitfalls

DO NOT expand a compact gene into a fuller document expecting better performance
DO NOT naively combine multiple genes/skills in one inference context
DO NOT append raw failure logs -- distill into AVOID warnings
DO NOT assume token count alone determines effectiveness -- organization matters
DO NOT add overview/introduction sections -- they hurt Pro models (-4.7pp)