name: idea-scoring description: Aggregates all dimension scores into a final idea score (0–100) and issues a verdict. Implements a multiplicative-floor algorithm with Riskiest Assumption Test (RAT). The final output of every validation workflow.
Skill: idea-scoring
Purpose
Produce a single, defensible verdict on an idea by aggregating all available dimension scores using a multiplicative-floor algorithm — a single catastrophic weakness kills the score, just like it kills a real startup. Includes a Riskiest Assumption Test (RAT) to convert the verdict into a concrete next action.
Input
- Idea slug
- One or more of the following (uses whatever is available):
memory/ideas/<slug>/idea.mdmemory/ideas/<slug>/desire_scores.jsonmemory/ideas/<slug>/competitors.jsonmemory/ideas/<slug>/pricing.jsonmemory/ideas/<slug>/cac.jsonmemory/ideas/<slug>/market_size.jsonmemory/ideas/<slug>/distribution.jsonmemory/ideas/<slug>/retention.jsonmemory/ideas/<slug>/complexity.jsonmemory/ideas/<slug>/weighted_signals.json
- Optional:
memory/user_profile.md(for founder-market fit)
Minimum Viable Input
At least 3 of 7 dimensions must have source data. If fewer are available, refuse to score and list what's missing. Two dimensions are mandatory — Demand and Distribution. Without evidence of a real problem and a path to reach users, scoring is meaningless.
Scoring Dimensions
| Dimension | Weight | Source | What it measures |
|---|---|---|---|
| Demand | 20% | desire_scores.json + weighted_signals.json + idea.md |
Real human desire + validated market signals |
| Competition | 10% | competitors.json |
Positioning gaps and defensibility |
| Monetization | 20% | pricing.json + cac.json + market_size.json |
Unit economics viability (LTV:CAC, WTP, market size) |
| Distribution | 20% | distribution.json |
Organic reach, paid viability, founder edge |
| Retention | 15% | retention.json |
Habit formation, churn risk, usage frequency |
| Founder-Market Fit | 15% | user_profile.md + domain overlap with idea |
Builder's edge, domain expertise, distribution advantage |
Dimension Score Mapping
Each dimension maps source data to a 0–100 sub-score using the rubrics below. When source data uses qualitative labels, apply these conversions.
Demand (0–100)
| Condition | Score range |
|---|---|
desire_strength_label = "strong" AND trend_velocity = "rising-fast" |
80–100 |
desire_strength_label = "strong" OR trend_velocity = "rising" |
60–79 |
desire_strength_label = "moderate" AND some signal validation |
40–59 |
desire_strength_label = "weak" OR trend_velocity = "declining" |
15–39 |
| No signal data, speculation only | 0–14 |
Adjust within range: +10 if cross-platform resonance confirmed, +5 if monetization_validated is true in idea.md.
Competition (0–100)
Higher = more favorable competitive landscape (counterintuitive — think of it as "opportunity score").
| Condition | Score range |
|---|---|
market_saturation = "low", clear positioning gaps, no dominant incumbent |
75–100 |
market_saturation = "medium", 1–2 positioning gaps identified |
50–74 |
market_saturation = "high" but differentiation opportunities exist |
25–49 |
market_saturation = "high", no differentiation, dominant incumbents |
0–24 |
Adjust: +10 if top competitor complaints reveal an unserved pain point. -15 if a FAANG-class player owns the category.
Monetization (0–100)
| Condition | Score range |
|---|---|
| LTV:CAC ≥ 3:1 on at least 2 channels, WTP target ≥ $5/mo, viable SOM | 80–100 |
| LTV:CAC ≥ 3:1 on 1 channel, WTP target ≥ $3/mo | 60–79 |
| LTV:CAC ≥ 2:1, WTP target $1–$3/mo, marginal unit economics | 35–59 |
| LTV:CAC < 2:1 OR viability_verdict = "not-viable" | 10–34 |
| No pricing data or CAC data | 0–9 (flag as missing) |
Adjust: +10 if freemium_conversion_estimate > 5%. +5 if market_size_verdict = "large".
Distribution (0–100)
| Condition | Score range |
|---|---|
distribution_verdict = "strong", viral_loop_exists = true |
80–100 |
distribution_verdict = "strong" OR (organic_reach = "high" + creator_economy_fit = "high") |
60–79 |
distribution_verdict = "moderate", at least one viable organic channel |
40–59 |
distribution_verdict = "weak", paid-only path |
15–39 |
| No viable channel identified | 0–14 |
Adjust: +10 if founder has existing audience or distribution edge (from user_profile.md).
Retention (0–100)
| Condition | Score range |
|---|---|
retention_verdict = "sticky", D30 ≥ 20%, habit_formation_score ≥ 4 |
80–100 |
retention_verdict = "sticky" OR D30 ≥ 15% |
60–79 |
retention_verdict = "moderate", D30 ≥ 8% |
40–59 |
retention_verdict = "disposable" OR D30 < 8% |
15–39 |
churn_risk = "high" AND no habit loop |
0–14 |
Adjust: +5 if natural_usage_frequency is daily. -10 if weekly-or-less with no external trigger.
Founder-Market Fit (0–100)
| Condition | Score range |
|---|---|
Strong domain match in strong_domains, distribution advantages align, builder/growth tier |
75–100 |
| Moderate domain overlap OR builder tier with adjacent experience | 50–74 |
| Beginner tier but high motivation and time commitment (≥ 20 hrs/wk) | 30–49 |
| No domain overlap, beginner tier, low time commitment | 0–29 |
If user_profile.md is unavailable, default to 50 (neutral) and flag as missing.
Scoring Algorithm
Step 1 — Compute dimension sub-scores
Apply the mapping rubrics above. Record each as d_i (0–100).
Step 2 — Apply floor penalty (the "Killer Dimension" rule)
Any dimension scoring below 25 is a potential startup killer. Apply this penalty:
floor_penalty = 1.0
for each dimension d_i:
if d_i < 25:
floor_penalty *= (d_i / 25)
This multiplicative penalty means a single catastrophic weakness (score 0–10) can halve or destroy the final score, regardless of how strong other dimensions are. This reflects startup reality: brilliant distribution cannot save a product nobody wants.
Step 3 — Compute weighted base score
weights = {
demand: 0.20,
competition: 0.10,
monetization: 0.20,
distribution: 0.20,
retention: 0.15,
founder_market_fit: 0.15
}
base_score = sum(d_i * w_i for each dimension)
Step 4 — Apply floor penalty and missing-input discount
missing_discount = available_dimensions / total_dimensions
adjusted_score = base_score * floor_penalty * missing_discount
final_score = round(clamp(adjusted_score, 0, 100))
The missing-input discount ensures that ideas scored on only 3 of 7 dimensions can never reach the "pursue" tier without completing more analysis.
Step 5 — Determine confidence level
| Available dimensions | Confidence |
|---|---|
| 6–7 of 7 | high |
| 4–5 of 7 | medium |
| 3 of 7 (minimum) | low |
Step 6 — Issue verdict
| Score | Verdict | Meaning |
|---|---|---|
| 75–100 | pursue | Strong across dimensions. Build an MVP. |
| 55–74 | test | Promising but unproven. Run the RAT experiment first. |
| 35–54 | pivot | Structural weakness. Use pivot-engine to explore alternatives. |
| 0–34 | drop | Fatal flaws. Move to next idea. |
Riskiest Assumption Test (RAT)
Every idea rests on assumptions. The RAT identifies the single assumption that, if wrong, kills the idea — and designs the cheapest possible experiment to test it before building anything.
RAT Identification Process
List all assumptions embedded in the idea (drawn from dimension scores and source data):
- Demand: "People actually have this problem and will seek a solution"
- Monetization: "Users will pay $X/mo for this"
- Distribution: "We can reach users via [channel] at acceptable cost"
- Retention: "Users will come back [frequency]"
- Competition: "Our differentiator matters to users"
- Founder fit: "I can build this with my current skills/resources" (using user_profile.md if available)
Rank by two axes (each 1–5):
- Criticality: If wrong, how dead is the idea? (5 = instant kill)
- Uncertainty: How little evidence do we have? (5 = pure speculation)
RAT = assumption with highest (criticality × uncertainty). Ties broken by criticality.
RAT Experiment Design
For the identified RAT, design an experiment following these constraints:
| Constraint | Requirement |
|---|---|
| Time to run | ≤ 2 weeks |
| Cost to run | ≤ $100 (indie budget) |
| Signal type | Behavioral (what people DO, not what they SAY) |
| Sample size | Minimum credible: 30 responses or 100 landing page visitors |
Experiment types by assumption category
| Assumption category | Experiment template |
|---|---|
| Demand exists | Landing page with email capture. Pass: ≥ 10% signup rate from ≥ 100 targeted visitors. |
| WTP is real | Landing page with price shown + "buy" button (payment step). Pass: ≥ 3% click-to-buy from ≥ 100 visitors. |
| Distribution works | Run 1 channel for 7 days (e.g., 5 TikToks, 10 Reddit posts, ASO test). Pass: CAC below modeled threshold. |
| Retention holds | Concierge MVP or manual-ops version with 10–30 users for 14 days. Pass: ≥ 3 return sessions per user. |
| Differentiator matters | Show competitor + your concept side-by-side to 30 target users. Pass: ≥ 60% prefer your concept. |
Pass/Fail Threshold
Define the threshold before running the experiment. The threshold is written into scores.json so it can be evaluated later. Thresholds must be:
- Specific: a number, not "good engagement"
- Time-bound: measured within the experiment window
- Binary: pass or fail, no "sort of passed"
Process (step by step)
- Load all available dimension files from
memory/ideas/<slug>/. - Check minimum viable input (≥ 3 dimensions, including Demand and Distribution). If not met, refuse and list missing inputs.
- Map each dimension's source data to a 0–100 sub-score using the rubrics.
- Compute
floor_penaltyfrom any sub-scores below 25. - Compute
base_scoreusing weighted sum. - Apply
floor_penaltyandmissing_discountto getfinal_score. - Determine
score_confidence. - Issue
verdictfrom threshold table. - Identify
top_strengths(top 2 dimensions) andtop_weaknesses(bottom 2 dimensions). - Run RAT identification: list assumptions, score criticality × uncertainty, select the riskiest.
- Design RAT experiment with pass/fail threshold.
- If this is a pivot re-score, write to
pivot_scores.jsoninstead.
Output
Write to memory/ideas/<slug>/scores.json (or pivot_scores.json for re-scores):
{
"dimension_scores": {
"demand": 0,
"competition": 0,
"monetization": 0,
"distribution": 0,
"retention": 0,
"founder_market_fit": 0
},
"weights_applied": {
"demand": 0.20,
"competition": 0.10,
"monetization": 0.20,
"distribution": 0.20,
"retention": 0.15,
"founder_market_fit": 0.15
},
"floor_penalty": 1.0,
"base_score": 0,
"missing_discount": 1.0,
"final_score": 0,
"verdict": "pursue | test | pivot | drop",
"score_confidence": "high | medium | low",
"missing_inputs": [],
"top_strengths": [
{ "dimension": "", "score": 0, "reason": "" }
],
"top_weaknesses": [
{ "dimension": "", "score": 0, "reason": "" }
],
"killer_dimensions": [],
"riskiest_assumption_test": {
"assumption": "",
"category": "demand | monetization | distribution | retention | competition | founder_fit",
"criticality": 0,
"uncertainty": 0,
"rat_score": 0,
"experiment": {
"type": "",
"description": "",
"duration": "",
"estimated_cost": "",
"pass_threshold": "",
"fail_action": "pivot | drop | re-test with different channel"
},
"all_assumptions_ranked": [
{ "assumption": "", "criticality": 0, "uncertainty": 0, "rat_score": 0 }
]
}
}
Notes
- Re-scoring pivots: When scoring a pivot variant from
pivot_options.json, write output tomemory/ideas/<slug>/pivot_scores.json. Include apivot_idfield referencing the option. - Score decay: If source data is older than 90 days (check
analyzed_ator file timestamps), apply a 10% confidence penalty and flag stale inputs. - Geometric vs. additive: The floor penalty provides multiplicative dynamics (one zero kills the score) while the weighted sum provides interpretable dimension contributions. This hybrid outperforms pure additive (hides fatal flaws) and pure geometric (too punishing for moderate weaknesses).