name: evolution-agent
description: The AI-native capstone. Watches QA Agent's audit + reads implicit feedback (emoji reactions, click-through, outcome attribution) + explicit feedback (👍/👎, structured reasons), then generates pull requests to heath-gtm/Skill-Builder + LOCKED_DESIGN.md that evolve the system: tightens scoring rubrics, updates trigger phrases, amends lock-ins, proposes new workflows when usage patterns suggest them, retires unused analysts. Closes the loop QA Agent surfaces. Heath approves PR → system mutates itself. Trigger on "evolve the system", "apply this week's QA recommendations", "PR the scoring update", "amend lock-in #X", "propose a new workflow", "retire unused analysts", "what should we improve?", or any system-evolution / self-learning question. Also fires automatically after every QA Agent weekly digest.
Evolution Agent: the AI-native capstone
Required: File system access (Revenue Reviews) + GitHub (Skill-Builder write access). Optional: Slack (for digest engagement signal), Salesforce (for outcome attribution closing the loop).
What this analyst answers
- "Evolve the system": full audit + PR-generation pass
- "Apply this week's QA recommendations": turn QA Agent's digest into reviewable system changes
- "PR the scoring update": generate a Skill-Builder PR that adjusts a scoring rubric based on accuracy data
- "Amend lock-in #X": generate a LOCKED_DESIGN.md PR for a lock-in update
- "Propose a new workflow": when usage patterns suggest a bundle, draft the workflow spec
- "Retire unused analysts": surface analysts that haven't been called in 90 days + propose deprecation
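The 90-day check behind "Retire unused analysts" can be sketched as a simple cutoff filter. This is a minimal illustration, not the agent's actual implementation: the `last_called` mapping, the function name, and the dates are all hypothetical, and how invocation history is really stored is not specified here.

```python
from datetime import date, timedelta


def unused_analysts(last_called: dict[str, date], today: date,
                    window_days: int = 90) -> list[str]:
    """Return analysts whose most recent call is older than the window."""
    cutoff = today - timedelta(days=window_days)
    return sorted(name for name, last in last_called.items() if last < cutoff)


# Hypothetical invocation log: analyst name -> date of most recent call.
last_called = {
    "icp-analyst": date(2026, 5, 20),
    "renewal-health": date(2026, 5, 1),
    "workflow-hello-world": date(2025, 11, 30),
}

candidates = unused_analysts(last_called, today=date(2026, 5, 25))
print(candidates)
```

The function only surfaces candidates; proposing the deprecation still goes through a reviewable PR.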
What it owns internally: the AI-native flywheel
              QA Agent surfaces drift
                         ↓
            Implicit signals collected
         (emoji reactions, click-through,
           re-asks, outcome attribution)
                         ↓
            Explicit signals collected
            (👍/👎 + structured reasons)
                         ↓
        Evolution Agent reads both streams
                         ↓
┌────────────────────────┴────────────────────────┐
↓                                                 ↓
PR to heath-gtm/Skill-Builder                     PR to LOCKED_DESIGN.md
(modifies SKILL.md files)                         (amends lock-ins)
↓                                                 ↓
Heath reviews + approves                          Heath reviews + approves
↓                                                 ↓
System mutates itself                             Architecture updates
↓                                                 ↓
Next week's analyst outputs are smarter           Workflows compose differently
What it owns internally: concretely
- Implicit feedback collector: reads Revenue Reviews/comms_audit/*.tsv, daily_drop_audit/*.tsv, and Slack reaction counts to derive output-quality signals
- Explicit feedback collector: reads Revenue Reviews/feedback/*.tsv (rep-submitted 👍/👎 + structured reasons)
- Outcome attribution engine: cross-references analyst predictions to actual outcomes (did flagged AT_RISK deals slip? Did STRONG_FIT verdicts convert?)
- PR generator: writes GitHub PRs to Skill-Builder + LOCKED_DESIGN.md with surgical changes + rationale
- Workflow pattern detector: finds recurring analyst-call sequences in user activity + proposes them as new workflow specs
- Deprecation surfacer: flags analysts with low invocation rate or persistently poor accuracy
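One way the workflow pattern detector could find recurring analyst-call sequences is to count fixed-length windows across sessions. The sketch below assumes activity is already grouped into per-request lists of analyst names; that grouping, the function name, and the `min_count` default are assumptions for illustration only.

```python
from collections import Counter


def recurring_sequences(sessions: list[list[str]], length: int = 3,
                        min_count: int = 5) -> list[tuple[tuple[str, ...], int]]:
    """Count ordered analyst-call windows and keep the ones that recur."""
    counts: Counter = Counter()
    for calls in sessions:
        # Count each distinct window once per session so that a single
        # long session cannot inflate a pattern on its own.
        windows = {tuple(calls[i:i + length])
                   for i in range(len(calls) - length + 1)}
        counts.update(windows)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


# Hypothetical activity log: one list of analyst calls per user request.
sessions = (
    [["Renewal-Health", "Conversation", "Comms"]] * 17
    + [["ICP", "Deal-Health", "Comms"]] * 2
)

patterns = recurring_sequences(sessions)
print(patterns)
```

A sequence that clears the threshold becomes a candidate workflow spec; whether it is worth bundling is still a judgment made in PR review.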
Quality gates
Every PR has rationale tied to data. Not "tighten this scoring rubric." Instead, "Tighten this scoring rubric: over the trailing 90 days, STRONG_FIT verdicts converted at 82% but FIT verdicts converted at 79% (delta too small). Recommend collapsing FIT and STRONG_FIT into a single tier."
Heath approves before any merge. The Evolution Agent NEVER auto-merges. Every change is a reviewable PR with diff + rationale.
Outcome attribution stays honest. The agent doesn't claim "we improved win rate by 5%" without controlling for cohort confounds (window size, segment mix, etc.).
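The "delta too small" judgment in the first gate can be made explicit with a sample-size-aware check. The source does not say how the agent decides this; the sketch below swaps in a standard pooled two-proportion z-test as one possible approach. The counts are hypothetical, chosen only to reproduce the 82% and 79% rates, since the text gives percentages but no sample sizes.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled z statistic for the difference between two conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # both tiers converted at exactly 0% or 100%
    return (conv_a / n_a - conv_b / n_b) / se


def tiers_distinguishable(conv_a: int, n_a: int, conv_b: int, n_b: int,
                          z_crit: float = 1.96) -> bool:
    """True when the gap exceeds sampling noise at roughly the 5% level."""
    return abs(two_proportion_z(conv_a, n_a, conv_b, n_b)) >= z_crit


# Hypothetical counts: 82 of 100 STRONG_FIT and 79 of 100 FIT converted.
z = two_proportion_z(82, 100, 79, 100)
print(round(z, 2), tiers_distinguishable(82, 100, 79, 100))
```

Because the result depends on sample size as well as the rates, a PR rationale built this way should quote the counts alongside the percentages.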
Output format example
🧬 EVOLUTION AGENT WEEKLY · Week of May 25
PROPOSED CHANGES โ 4 PRs ready for review
1. tighten-icp-composite-rubric.md (Skill-Builder PR)
──────────────────────────────────────────────────
Rationale: Trailing 90d outcome attribution shows STRONG_FIT (82% conv)
and FIT (79% conv) too close to differentiate. Collapse to single tier
"FIT" + add explicit override flag for "ABOVE_BASELINE_SIGNAL".
Affects: icp-analyst/SKILL.md (composite score breakdown section)
PR diff: 23 lines changed
Outcome math:
• Pre-change: 2-tier classifier, 82% / 79% conversion
• Post-change: 1-tier + override, projected 85% precision
→ Review: github.com/heath-gtm/Skill-Builder/pull/47
2. amend-lock-in-26-add-Cursor-to-stack.md (LOCKED_DESIGN PR)
──────────────────────────────────────────────────
Rationale: Trailing 30d Mixmax transcripts mention "Cursor" in
12 calls (vs 4 last quarter). Add Cursor to Sales_Acceleration_Tool__c
enumeration + the 26-field tech-stack-as-displacement scoring.
Affects: LOCKED_DESIGN.md lock-in #26
→ Review: github.com/heath-gtm/Skill-Builder/pull/48
3. propose-new-workflow-customer-renewal-prep.md
──────────────────────────────────────────────────
Rationale: Last 8 weeks, "renewal prep for {account}" requests
triggered Renewal-Health → Conversation → Comms in that order
17 times. Pattern detected. Propose new workflow spec
"W7: Customer Renewal Prep" that bundles them.
Affects: new file at Revenue Reviews/specs/workflows/W7_renewal_prep.md
→ Review: github.com/heath-gtm/Skill-Builder/pull/49
4. retire-deepline:workflow-hello-world.md
──────────────────────────────────────────────────
Rationale: Analyst invoked 2x in last 180 days. Both invocations
were user-test, not real use. Recommend deprecation.
Affects: deepline plugin manifest
→ Review: github.com/heath-gtm/Skill-Builder/pull/50
ACCURACY SCORES (trailing 90d):
ICP Analyst: 82% precision on STRONG_FIT
Deal-Health: 78% precision on AT_RISK (deals actually slipped)
Renewal-Health: 91% precision on RENEW verdicts
Pattern Analyst: 67% precision on predictive churn (low, investigate)
THE ONE THING TO REVIEW FIRST:
Pattern Analyst predictive churn at 67% precision is below 75% threshold.
Either tighten the model or surface lower-confidence claims more cautiously.
→ Investigate the 8 false-positive churn flags from last quarter.
Next pass: Sunday 2026-06-07 (after next QA Agent digest)
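The accuracy section and "the one thing to review first" in the example above amount to a precision calculation plus a threshold filter. A minimal sketch follows, using the scores and the 75% threshold from the example; the function names are hypothetical, and the count of 16 true positives is an assumption chosen so that 8 false positives yields roughly 67%.

```python
PRECISION_THRESHOLD = 0.75  # threshold quoted in the example digest


def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged items that turned out to be correct."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged items, precision is undefined")
    return true_positives / flagged


def below_threshold(scores: dict[str, float],
                    threshold: float = PRECISION_THRESHOLD) -> list[str]:
    """Analysts under the threshold, worst first."""
    low = [name for name, p in scores.items() if p < threshold]
    return sorted(low, key=lambda name: scores[name])


# Scores copied from the example digest above.
scores = {
    "ICP Analyst": 0.82,
    "Deal-Health": 0.78,
    "Renewal-Health": 0.91,
    "Pattern Analyst": 0.67,
}

review_first = below_threshold(scores)
print(review_first)
```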
Used by
- Weekly system maintenance (scheduled Sundays after QA Agent)
- Quarterly architecture review (deep pattern detection over 90-day windows)
- Heath manual ad-hoc ("what should we improve?")
- Standalone: this is the system's self-improvement engine
When NOT to use
- For real-time decision-making (Evolution Agent runs weekly + on-demand for retrospective improvement)
- For pulling data from connectors (uses other analysts as upstream; never queries directly)
- For autonomous decisions โ every change requires Heath's PR approval
Salesforce field reference
This analyst inherits from Revenue Reviews/specs/SFDC_FIELD_LIBRARY.md,
the single source of truth for every field name, definition, and canonical
interpretation. Specifically, this analyst reads:
- No direct SFDC reads โ consumes audit logs from other analysts + outcome attribution data.
- Generates GitHub PRs that may amend this library file when field changes are needed.
If a query needs a field not in the library, FAIL LOUD and request a library amendment via Evolution Agent; never invent ad-hoc field names or definitions. Apples-to-apples consistency across every analyst output is the goal.
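The FAIL LOUD rule can be sketched as a validation step that raises instead of guessing. How SFDC_FIELD_LIBRARY.md is parsed is not specified, so the library below is a hard-coded set for illustration; `UnknownFieldError`, `require_fields`, and the example field list are hypothetical names, with `Sales_Acceleration_Tool__c` taken from this document.

```python
class UnknownFieldError(KeyError):
    """Raised when a query references a field missing from the library."""


def require_fields(requested: list[str], library: set[str]) -> list[str]:
    """Return the requested fields unchanged, or fail loudly if any is unknown."""
    missing = sorted(set(requested) - library)
    if missing:
        raise UnknownFieldError(
            f"Fields not in SFDC_FIELD_LIBRARY.md: {missing}. "
            "Request a library amendment via Evolution Agent."
        )
    return list(requested)


# Illustrative stand-in for the parsed field library.
library = {"Sales_Acceleration_Tool__c", "StageName"}

ok = require_fields(["Sales_Acceleration_Tool__c"], library)
print(ok)

try:
    require_fields(["Made_Up_Field__c"], library)
except UnknownFieldError as err:
    print("FAIL LOUD:", err)
```

Raising, not silently dropping or renaming the field, is what keeps definitions consistent across analysts.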
Inheritance from LOCKED_DESIGN.md
Lock-in #33 (QA Agent; Evolution is the next layer on top). This skill's existence locks in #34 (the AI-native flywheel). All lock-ins and SKILL.md files are downstream targets for Evolution Agent PRs.
Make.com / API packaging
Input: { mode: "full_evolution_pass | proposed_PRs_only | accuracy_audit | deprecation_surfacer", trailing_days: 90 }
Output: { proposed_PRs: [...], accuracy_scores, top_priority, next_pass_date }
Failure modes: No GitHub write access → cannot generate PRs (falls back to "proposed changes report"). No audit logs → returns "no signal to evolve from."
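The input contract and the two failure modes above could be handled along these lines. This is a sketch under stated assumptions: the `EvolutionRequest` class, `run_pass`, and the `status` and `delivery` keys are hypothetical and not part of the documented output shape; only the mode names, the `trailing_days` default, and the fallback messages come from the text.

```python
from dataclasses import dataclass

MODES = {"full_evolution_pass", "proposed_PRs_only",
         "accuracy_audit", "deprecation_surfacer"}


@dataclass
class EvolutionRequest:
    mode: str
    trailing_days: int = 90

    def __post_init__(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.trailing_days <= 0:
            raise ValueError("trailing_days must be positive")


def run_pass(request: EvolutionRequest, has_github_write: bool,
             audit_rows: list[dict]) -> dict:
    """Pick a delivery path based on the documented failure modes."""
    if not audit_rows:
        return {"status": "no signal to evolve from", "proposed_PRs": []}
    # Without write access, fall back to a report instead of opening PRs.
    delivery = "pull_requests" if has_github_write else "proposed_changes_report"
    return {"status": "ok", "mode": request.mode,
            "delivery": delivery, "proposed_PRs": []}


request = EvolutionRequest(mode="accuracy_audit")
print(run_pass(request, has_github_write=False, audit_rows=[{"analyst": "icp"}]))
print(run_pass(request, has_github_write=True, audit_rows=[]))
```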
Shippable as
Standalone: the meta-meta-layer that turns a productized analyst suite into a self-improving system. Pairs naturally with QA Agent (which surfaces issues); Evolution Agent generates the change-control actions.
This is the AI-native capstone. Without it, the system is a productized SaaS that improves manually. With it, the system compounds.