name: result-analysis description: Statistically analyze collected results, verify reproducibility, and synthesize findings version: 1.0.0 category: experiment-execution type: strategy sops:
- statistical-testing
- reproducibility-verification
- execution-synthesis tactics:
- result-validation-loop
dependencies:
sops:
- execution-synthesis
- reproducibility-verification
- statistical-testing tactics:
- result-validation-loop
Strategy: Result Analysis
Key Question: What do the results tell us?
Methodology
Three-layer analysis combining frequentist, resampling, and Bayesian approaches:
- Statistical Testing — Bootstrap CI, Permutation tests, Bayesian ROPE judgment
- Effect Size Calculation — Cohen's d, Cliff's delta, or domain-appropriate measure
- Reproducibility Verification — Re-run with different seeds, compare distributions
- Synthesis — Integrate findings into actionable conclusions
Execution Flow
[Collected results from experiment-running]
→ statistical-testing (bootstrap/permutation/Bayesian)
→ effect size calculation
→ reproducibility-verification (re-run, compare)
→ execution-synthesis (comprehensive report)
→ OUTPUT: validated findings with confidence levels
Budget Gate
| Step | Max Budget | Output |
|---|---|---|
| Statistical testing | 8% | Test results with p-values/CIs |
| Reproducibility | 8% | Re-run comparison |
| Synthesis | 4% | Final report |
Key Decisions
- Test selection:
- Known distribution → parametric (t-test, ANOVA)
- Unknown/non-normal → bootstrap CI or permutation test
- Need practical significance → Bayesian ROPE
- Reproducibility threshold: Results must agree within 1 SE across re-runs
- Effect size interpretation:
- Small: d < 0.2 (may not be practically significant)
- Medium: 0.2 ≤ d < 0.8 (likely meaningful)
- Large: d ≥ 0.8 (strong effect)
- ROPE (Region of Practical Equivalence): Define before testing, not after
Integration with Knowledge System
Results feed back into:
- Wiki vault (claims with evidence)
- Future experiment design (what worked, what didn't)
- North star progress tracking
Available Tactics
Optional, no fixed order; the final leaf is always a sop.
| Tactic | When to use |
|---|---|
| result-validation-loop | Validate results through statistical testing, ROPE judgment, reproducibility re-runs, and final synthesis |
Available SOPs
Optional, no fixed order; the final leaf is always a sop.
| SOP | When to use |
|---|---|
| execution-synthesis | Synthesize complete execution report from all results, tests, and reproducibility data |
| reproducibility-verification | Verify result reproducibility via re-runs with different seeds and ICC comparison |
| statistical-testing | Execute statistical tests — bootstrap, permutation, Bayesian ROPE — on experiment results |