creative-testing-framework - SKILL.md Agent Skill

name: creative-testing-framework
description: Use this skill when the user asks Claude to design a creative testing plan, structure a hook/angle/format test, or decide what to test next. Encodes the principle of isolating one variable per test, batch sizes for statistical confidence, and the iteration cycle that compounds learnings.

Creative Testing Framework

Purpose: Design tests that produce clean signal, not noise. One variable per test. Batch sizes that justify decisions.

THE THREE LEVERS (from Ecom Ops Architect)

Per Jake's Ecom Ops role definition, creative testing consolidates into:

Hook performance — pattern interrupts, curiosity, pain-first
Message clarity — tight problem/solution, proof, objection handling
CTA calibration — soft vs direct, calibrated to funnel temperature

A single test isolates ONE of these three. Never test hook + message + CTA in one ad set — the data won't tell you which moved the needle.
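The one-lever rule can be expressed as a quick check before launch. This is a minimal sketch, assuming each ad variant is described by a dict with the hypothetical keys `hook`, `message`, and `cta`; the function names and data shape are illustrative and do not come from any ad platform API.

```python
LEVERS = ("hook", "message", "cta")  # the three levers named above


def varied_levers(variants):
    """Return the levers whose value differs across the given variants."""
    return [
        lever
        for lever in LEVERS
        if len({variant[lever] for variant in variants}) > 1
    ]


def is_clean_test(variants):
    """A test is clean when exactly one lever varies across its variants."""
    return len(varied_levers(variants)) == 1


# Hypothetical example: only the hook changes, so the test is clean.
hook_test = [
    {"hook": "pain-first", "message": "problem/solution", "cta": "soft"},
    {"hook": "curiosity", "message": "problem/solution", "cta": "soft"},
    {"hook": "pattern interrupt", "message": "problem/solution", "cta": "soft"},
]

# Hypothetical example: hook and CTA both change, so results are unattributable.
mixed_test = [
    {"hook": "pain-first", "message": "problem/solution", "cta": "soft"},
    {"hook": "curiosity", "message": "problem/solution", "cta": "direct"},
]
```

Calling `varied_levers(mixed_test)` names the levers that changed together, which is a useful prompt for splitting the plan into two separate tests.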

TEST DESIGN PATTERNS

Pattern 1 — Hook Test (most common)

Hold message, format, CTA constant
Vary the hook (3-5 variants)
Run as ads within a single ad set
Outcome: which hook attribute (pattern interrupt vs pain-first vs proof) wins

Pattern 2 — Angle Test

Hold hook style constant
Vary the underlying angle (different avatar/pain/promise)
Run as separate ad sets
Outcome: which avatar or angle represents the biggest market

Pattern 3 — Format Test

Hold message constant
Vary format (static vs video vs UGC vs slideshow)
Run as separate ad sets OR as ads within a single ad set
Outcome: which format the audience responds to (informs creative production pipeline)

Pattern 4 — Offer Test

Hold creative constant
Vary the offer (bonus stack, guarantee terms, urgency mechanism)
Requires distinct landing pages or PDP variants
Outcome: which offer mechanic moves CVR

Pattern 5 — Awareness Level Test (Eugene Schwartz)

Hold avatar constant
Vary the awareness-level lead (problem-aware → solution-aware → product-aware → most-aware)
Run as separate ad sets
Outcome: which awareness stage the cold audience is actually at
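The five patterns share one shape: a variable that changes, things held constant, and a campaign structure. The sketch below captures that shape as data so a plan can be looked up by pattern. The `DesignPattern` class, the dictionary keys, and the `describe` helper are hypothetical names chosen for illustration; the field values are taken from the patterns above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPattern:
    name: str
    varied: str            # the single variable the test changes
    held_constant: tuple   # what stays fixed across variants
    structure: str         # how the test is laid out in the ad account


PATTERNS = {
    "hook": DesignPattern(
        "Hook Test", "hook", ("message", "format", "CTA"),
        "ads within a single ad set"),
    "angle": DesignPattern(
        "Angle Test", "angle", ("hook style",),
        "separate ad sets"),
    "format": DesignPattern(
        "Format Test", "format", ("message",),
        "separate ad sets or ads within a single ad set"),
    "offer": DesignPattern(
        "Offer Test", "offer", ("creative",),
        "distinct landing pages or PDP variants"),
    "awareness": DesignPattern(
        "Awareness Level Test", "awareness-level lead", ("avatar",),
        "separate ad sets"),
}


def describe(key):
    """One-line summary of a pattern, suitable for a test plan header."""
    p = PATTERNS[key]
    held = ", ".join(p.held_constant)
    return f"{p.name}: vary {p.varied}; hold {held}; run as {p.structure}"
```

For example, `describe("hook")` produces a one-line summary that can seed the "Variable being isolated" and "Held constant" sections of a plan.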

BATCH SIZE LOGIC

For statistical confidence on conversion events:

| Spend per ad set | Conversions to expect | Confidence level |
| --- | --- | --- |
| <$100 | Single digits | Anecdotal: kill obvious failures only |
| $100-300 | 5-15 | Directional: kill clear losers, hold middle |
| $300-500 | 15-30 | Confident on extremes: kill bottom, scale top with caveats |
| $500+ | 30+ | Statistical confidence on most decisions |

[VERIFY — these thresholds assume DTC product ~$50-150 AOV. Adjust for high-ticket or low-ticket.]
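The table can be read as a simple lookup on conversion count. The sketch below is one way to encode it, assuming the lower bound of each conversion range is inclusive (the table's ranges overlap at 15 and 30). The `expected_conversions` helper and its `cpa` parameter are hypothetical; it simply divides spend by an assumed cost per acquisition.

```python
def expected_conversions(spend, cpa):
    """Rough conversion count for a spend level at an assumed CPA."""
    if cpa <= 0:
        raise ValueError("cpa must be positive")
    return spend / cpa


def confidence_tier(conversions):
    """Map a conversion count to the confidence tiers in the table above.

    Assumption: lower bounds are inclusive, so exactly 15 conversions
    counts as "confident on extremes" rather than "directional".
    """
    if conversions >= 30:
        return "statistical"
    if conversions >= 15:
        return "confident on extremes"
    if conversions >= 5:
        return "directional"
    return "anecdotal"
```

With an assumed $20 CPA, `confidence_tier(expected_conversions(300, 20))` lands in the "confident on extremes" tier, which matches the $300-500 row.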

Rule: Don't make scaling decisions on <15 conversions. The variance is too high.
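To see why low conversion counts are noisy, compare the uncertainty around the same observed conversion rate at two sample sizes. The sketch below uses the Wilson score interval, a standard interval for a binomial proportion; it is offered as an illustration and is not a method prescribed by this skill. The click counts in the usage note are made up for the example.

```python
import math

MIN_CONVERSIONS_TO_SCALE = 15  # from the rule above


def can_scale(conversions):
    """Apply the rule: no scaling decisions below 15 conversions."""
    return conversions >= MIN_CONVERSIONS_TO_SCALE


def wilson_interval(conversions, clicks, z=1.96):
    """Approximate 95% Wilson score interval for a conversion rate."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    p = conversions / clicks
    denom = 1 + z**2 / clicks
    center = (p + z**2 / (2 * clicks)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / clicks + z**2 / (4 * clicks**2)) / denom
    )
    return center - half_width, center + half_width


# Same 2% observed rate, ten times the data (hypothetical numbers).
low_sample = wilson_interval(5, 250)
high_sample = wilson_interval(50, 2500)
```

The interval for 5 conversions is several times wider than the one for 50, even though the observed rate is identical. That spread is the variance the rule guards against.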

ITERATION CYCLE (compound learnings)

Test Round 1 → identify winner → Test Round 2 builds on Round 1's winner

Round 1: Test 3 hooks (A, B, C)
  Winner: Hook B
  Insight: Pain-first hooks outperform curiosity for this avatar

Round 2: Build on Hook B's pattern
  Test 3 new pain-first hooks (B-variant-1, B-variant-2, B-variant-3)
  Winner: B-variant-2
  Insight: Specific pain ("morning hand stiffness") beats general pain ("hand pain")

Round 3: Build on B-variant-2's insight
  Test 3 specific-pain hooks across new pain segments
  Winner: ...

After 5-6 rounds, you have a hook bank of 10-15 validated patterns instead of 5-6 random tests.
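The round-over-round structure above can be tracked with a small record type. This is a sketch under stated assumptions: `Round` and `HookBank` are hypothetical names, and the bank stores one insight per round, which is a simplification of however many patterns a real round validates.

```python
from dataclasses import dataclass, field


@dataclass
class Round:
    number: int
    variants: list
    winner: str
    insight: str


@dataclass
class HookBank:
    rounds: list = field(default_factory=list)

    def record(self, variants, winner, insight):
        """Append a finished round; the winner must be one of its variants."""
        if winner not in variants:
            raise ValueError(f"{winner!r} was not one of the tested variants")
        finished = Round(len(self.rounds) + 1, list(variants), winner, insight)
        self.rounds.append(finished)
        return finished

    def current_base(self):
        """The latest winner, which the next round builds on."""
        return self.rounds[-1].winner if self.rounds else None

    def insights(self):
        """All insights so far, oldest first."""
        return [r.insight for r in self.rounds]


# Replaying the first two rounds from the example above.
bank = HookBank()
bank.record(["A", "B", "C"], "B",
            "Pain-first hooks outperform curiosity for this avatar")
bank.record(["B-variant-1", "B-variant-2", "B-variant-3"], "B-variant-2",
            "Specific pain beats general pain")
```

Because each round stores the winner it produced, `bank.current_base()` always names the variant the next round should build on.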

This is how the brand brain compounds — each test answer goes into dtc-second-brain/raw/ads/ and outputs/, then gets pulled into wiki/hooks-that-work.md on next compile.

ANTI-PATTERNS TO AVOID

| Anti-pattern | Why it's wrong |
| --- | --- |
| Testing 8 hooks in one ad set | No single hook gets enough budget to reach statistical confidence |
| Testing hook + format + CTA simultaneously | Can't attribute which variable moved the needle |
| Killing an ad set at $40 spend with 1 purchase | Sample size too small; variance dominates |
| Running the same creative for 30+ days without rotation | Creative fatigue kills CTR; rotate before the curve drops |
| Scaling on Day 3 because hook rate looks good | Hook rate predicts attention, not purchase |
| Testing "different audiences" with the same creative | You're testing audiences, not creative; a separate ad set per audience is needed |
| Ignoring Meta's auto-optimization | Intervening manually before Meta has run for 3-5 days denies the algo room to find pockets |
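Several of these anti-patterns can be caught mechanically before a plan ships. The sketch below checks three of them. It assumes budget splits evenly across variants (Meta does not guarantee this inside an ad set), and it uses a $100 per-variant floor taken from the lowest row of the batch size table and a 3-day minimum taken from the last row above. `lint_plan` and its parameters are hypothetical names.

```python
def lint_plan(variants, varied, ad_set_budget, days_before_review,
              min_spend_per_variant=100.0, min_days=3):
    """Return a list of warnings for a proposed test; empty means no findings.

    Assumptions: even budget split across variants, and thresholds
    borrowed from the tables in this document.
    """
    if not variants:
        raise ValueError("a plan needs at least one variant")

    warnings = []

    # Anti-pattern: testing several variables at once.
    if len(varied) != 1:
        warnings.append(
            f"isolate exactly one variable, found {len(varied)}")

    # Anti-pattern: too many variants for the budget.
    per_variant = ad_set_budget / len(variants)
    if per_variant < min_spend_per_variant:
        warnings.append(
            f"${per_variant:.2f} per variant is below "
            f"${min_spend_per_variant:.2f}")

    # Anti-pattern: intervening before the algorithm has had time to run.
    if days_before_review < min_days:
        warnings.append(
            f"first review on day {days_before_review}, "
            f"wait at least {min_days} days")

    return warnings


# Hypothetical plan: 8 hooks sharing a $300 ad set.
crowded = lint_plan([f"hook-{i}" for i in range(8)], ["hook"], 300, 4)
```

An empty list does not prove a plan is sound; it only means none of these three checks fired.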

INTEGRATION WITH OTHER MODES

| Mode | Handoff to / from this skill |
| --- | --- |
| `creative-strategy` | Provides avatar + angles + necessary beliefs → feed test design |
| `static-ad-generator` | Generates the creative variants for hook tests |
| `dtc-second-brain` | Receives test results in `raw/ads/`, `raw/performance/` |
| `campaign-launcher` | Executes the test plan you design here |
| `post-launch-analysis` | Reads results back; you redesign the next round |

OUTPUT FORMAT

Write test plan to plans/test-round-<N>-<date>.md:

# Test Round <N> — <Date>

## Hypothesis
[What you're testing and why]

## Variable being isolated
[Hook / Angle / Format / Offer / Awareness Level]

## Held constant
- [Things NOT varied in this test]

## Variants
1. [Variant A — description]
2. [Variant B — description]
3. [Variant C — description]

## Success criteria
- Primary metric: [hook rate / CTR / CPA]
- Threshold: [specific number]
- Decision rule: [what wins, what kills]

## Sample size plan
- Budget per ad set: $X
- Expected conversions: Y
- Duration: Z days

## What this test rules out
[If variant X wins, we'll know it's NOT [hypothesis Y]]

## Next round depends on
[How this test's outcome shapes the next test]
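The template above can be filled programmatically. The following sketch builds the file path and renders the sections in template order. It assumes `<date>` is written in ISO format (YYYY-MM-DD), which the template does not specify, and `plan_path` and `render_plan` are hypothetical helper names.

```python
from datetime import date
from pathlib import Path

# Section headings in the order the template lists them.
SECTION_ORDER = [
    "Hypothesis",
    "Variable being isolated",
    "Held constant",
    "Variants",
    "Success criteria",
    "Sample size plan",
    "What this test rules out",
    "Next round depends on",
]


def plan_path(round_number, day, root="plans"):
    """Path for a plan file; ISO date format is an assumption."""
    return Path(root) / f"test-round-{round_number}-{day.isoformat()}.md"


def render_plan(round_number, day, sections):
    """Render the plan as Markdown; every template section is required."""
    missing = [title for title in SECTION_ORDER if title not in sections]
    if missing:
        raise ValueError(f"missing sections: {missing}")

    # \u2014 is the dash used in the template's title line.
    lines = [f"# Test Round {round_number} \u2014 {day.isoformat()}", ""]
    for title in SECTION_ORDER:
        lines += [f"## {title}", sections[title], ""]
    return "\n".join(lines).rstrip() + "\n"
```

A caller would typically write the result with `plan_path(n, day).write_text(render_plan(n, day, sections))` after making sure the `plans/` directory exists.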