11-star-framework - SKILL.md Agent Skill

name: 11-star-framework
description: Rate any product, feature, or experience on the 11-star scale (Brian Chesky's Airbnb thought experiment). Use when user says "rate this experience", "11-star", "star rating", "experience audit", "how good is this", "experience rating", "product audit", "quality assessment", or wants to evaluate product quality and identify improvement paths. Also trigger when user wants to benchmark a feature, assess where a product stands, or map out what "great" looks like - even if they don't explicitly say "11-star".

11-Star Experience Framework

Based on a thought experiment from Brian Chesky (Airbnb): "What would a 1-star through 11-star experience look like?"

Forces teams to think beyond "good enough" and imagine transformative experiences.

The Scale

Star Level	Definition
1-star	Broken, unusable
2-star	Barely functional
3-star	Meets basic need
4-star	Reliable, useful
5-star	Delights
6-star	Memorable
7-star	Magical
8-star	Personalized magic
9-star	Science fiction
10-star	Impossible today
11-star	Transforms the domain
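When the scale is used programmatically (for example, by an agent filling in an audit), it can be held as a simple lookup table. The sketch below is a minimal Python illustration; `STAR_SCALE` and `describe` are hypothetical names, and rounding a fractional rating such as 4.6 down to the last whole level it has fully reached is an assumption, not part of the framework.

```python
# Hypothetical lookup table mirroring the scale above.
STAR_SCALE: dict[int, str] = {
    1: "Broken, unusable",
    2: "Barely functional",
    3: "Meets basic need",
    4: "Reliable, useful",
    5: "Delights",
    6: "Memorable",
    7: "Magical",
    8: "Personalized magic",
    9: "Science fiction",
    10: "Impossible today",
    11: "Transforms the domain",
}


def describe(rating: float) -> str:
    """Return the definition of the whole star level a (possibly fractional) rating has reached."""
    if not 1 <= rating <= 11:
        raise ValueError(f"rating must be between 1 and 11, got {rating}")
    level = int(rating)  # assumption: 4.6 still counts as 4-star until it reaches 5
    return f"{level}-star: {STAR_SCALE[level]}"


print(describe(4.6))  # 4-star: Reliable, useful
```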

How to Use

Step 1: Rate Your Feature

Describe what each star level looks like for THIS specific feature or product.

Example - E-Commerce Checkout:

Star	Experience
1-star	Page crashes; payment fails silently; user gives up
3-star	Checkout works; basic form fields; no saved payment methods
5-star	One-click checkout; saved cards; instant confirmation with ETA
7-star	Predicts what you want before you search; auto-applies best discount; delivery arrives same day
9-star	Knows you need something before you do; orders it; perfect every time
11-star	Commerce friction doesn't exist - things you need appear when you need them

Step 2: Choose Target

Below 5-star: Acceptable for MVP; communicate the limitations clearly to users
5-7 star: Sweet spot for v1 release
Above 7-star: Aspirational; roadmap for future
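Step 2's bands can be written as a small classifier. This is a sketch that assumes the "5-7 star" band is inclusive at both ends, so exactly 7-star still counts as the v1 sweet spot; `target_phase` is a hypothetical helper, not part of any library.

```python
def target_phase(star: float) -> str:
    """Classify a target star level using the Step 2 bands (inclusive 5-7 is an assumption)."""
    if star < 5:
        return "MVP: acceptable, but communicate the limitations clearly"
    if star <= 7:
        return "v1 release: sweet spot"
    return "Aspirational: put it on the future roadmap"


for target in (4, 5, 6.5, 7, 9):
    print(target, "->", target_phase(target))
```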

Step 3: Use as Scope Filter

"4-star to 5-star?" - Worth it
"5-star to 5.2-star?" - Probably not
"Gets us to 7-star?" - Ask whether that is realistic for the current phase
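One way to read the scope filter is as a rule of thumb: work that pushes a feature across a whole star boundary is worth doing, while work that only nudges it within the same level usually is not. The sketch below encodes that reading; `scope_check` and its `realism_ceiling` parameter (defaulting to 7, echoing the 7-star example) are hypothetical, and using whole-number boundaries as the threshold is an assumption.

```python
import math


def scope_check(current: float, proposed: float, realism_ceiling: float = 7.0) -> str:
    """Hypothetical scope filter: favour work that crosses a whole star level."""
    if proposed <= current:
        return "No gain: skip"
    if math.floor(proposed) <= math.floor(current):
        return "Marginal gain within the same star level: probably not worth it"
    if proposed >= realism_ceiling:
        return "Crosses a star level, but check it is realistic for the current phase"
    return "Crosses a star level: worth it"


print(scope_check(4, 5))    # the 4-star to 5-star case
print(scope_check(5, 5.2))  # the 5-star to 5.2-star case
print(scope_check(6, 7))    # the "gets us to 7-star" case
```

The ceiling is a parameter rather than a constant because what counts as realistic depends on the phase chosen in Step 2.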

Common Benchmarks

These benchmarks apply across product categories. Adapt the specific examples to your domain.

Domain	3-star	5-star	7-star
Onboarding	Basic docs with commands	Guided walkthrough, validation at each step, troubleshooting	Interactive course, adapts to user's level
Documentation	Single page covering basics	Multi-page docs, examples, architecture guide	Inline help, contextual guidance, video walkthroughs
Output quality	Raw data dump	Rich formatting, severity indicators, actionable next steps	Trend tracking, executive summaries, copy-paste actions
Extensibility	Hardcoded defaults	Config file, multiple output formats	Plugin system, custom extensions, API access
Trust & safety	No explanation of what runs	Permissions documented, read-only by default	Dry-run default, full audit trail, airgapped mode

Output Format

When rating a project or feature:

Star Level	Experience Description	Realistic Now?
1-star	[broken version]	--
3-star	[basic version]	--
5-star	[delightful version]	Yes/No
7-star	[magical version]	Yes/No
9-star	[sci-fi version]	No
11-star	[transformative version]	No

Target Star Level: {N} - {rationale}
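When the rating is produced by a script or an agent, the template can be rendered from structured rows so that every report has the same columns. The sketch below assumes tab-separated output to match the table above; `LevelRow` and `render_rating` are hypothetical names, the check that 9-star and above are never marked realistic follows the template's fixed "No" entries, and the sample Yes/No values and rationale are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class LevelRow:
    star: int
    description: str
    realistic: str  # "--" for 1- and 3-star, "Yes"/"No" for 5- and 7-star, "No" above that


def render_rating(rows: list[LevelRow], target: int, rationale: str) -> str:
    """Render the rating table (tab-separated) followed by the target line."""
    lines = ["Star Level\tExperience Description\tRealistic Now?"]
    for row in sorted(rows, key=lambda r: r.star):
        if row.star >= 9 and row.realistic != "No":
            raise ValueError(f"{row.star}-star should not be marked realistic today")
        lines.append(f"{row.star}-star\t{row.description}\t{row.realistic}")
    lines.append("")
    lines.append(f"Target Star Level: {target} - {rationale}")
    return "\n".join(lines)


checkout = [
    LevelRow(1, "Page crashes; payment fails silently; user gives up", "--"),
    LevelRow(3, "Checkout works; basic form fields; no saved payment methods", "--"),
    LevelRow(5, "One-click checkout; saved cards; instant confirmation with ETA", "Yes"),
    LevelRow(7, "Predicts what you want; auto-applies best discount; same-day delivery", "No"),
    LevelRow(9, "Knows you need something before you do; orders it; perfect every time", "No"),
    LevelRow(11, "Commerce friction doesn't exist", "No"),
]
print(render_rating(checkout, 5, "saved cards and one-click checkout are within reach"))
```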

Dimension-by-Dimension Rating

For a thorough audit, rate each dimension independently:

Dimension	Rating	Evidence
Onboarding	X.X	[specific evidence]
Documentation	X.X	[specific evidence]
Output quality	X.X	[specific evidence]
Trust & safety	X.X	[specific evidence]
Extensibility	X.X	[specific evidence]
Code quality	X.X	[specific evidence]

Then identify:

Strengths that push above 5-star (numbered list with evidence)
Remaining gaps to next star level (table with gap, impact)
Path to next star level (prioritized action items)
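The dimension audit can also be kept as data, which makes the strengths list and the gap table mechanical to produce. This is a minimal sketch assuming each rating is a float on the 1-11 scale; `DimensionRating`, `strengths`, and `gaps` are hypothetical names. "Strength" follows the above-5-star rule from the list, while measuring a gap as the distance to the next whole star level is one possible reading, not a prescribed formula. The sample ratings are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class DimensionRating:
    dimension: str
    rating: float  # X.X on the 1-11 scale
    evidence: str  # specific, observed evidence for the rating


def strengths(ratings: list[DimensionRating]) -> list[str]:
    """Numbered list of dimensions rated above 5-star, each with its evidence."""
    above = [r for r in ratings if r.rating > 5]
    return [f"{i}. {r.dimension} ({r.rating:.1f}): {r.evidence}"
            for i, r in enumerate(above, start=1)]


def gaps(ratings: list[DimensionRating]) -> list[tuple[str, int, float]]:
    """(dimension, next whole star level, distance to it) for every dimension below 11-star."""
    rows = []
    for r in ratings:
        if r.rating >= 11:
            continue  # nothing above 11-star
        next_level = math.floor(r.rating) + 1
        rows.append((r.dimension, next_level, round(next_level - r.rating, 1)))
    return rows


audit = [
    DimensionRating("Onboarding", 4.5, "Guided setup, but no validation step"),
    DimensionRating("Trust & safety", 6.0, "Read-only by default; permissions documented"),
]
print(strengths(audit))
print(gaps(audit))  # [('Onboarding', 5, 0.5), ('Trust & safety', 7, 1.0)]
```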

Anti-Patterns

1. Rating everything 3-star
Bad: Defaulting every dimension to "meets basic need" without investigating actual behavior
Good: Test the real experience, check edge cases, and rate based on evidence, not assumptions

2. Skipping dimensions
Bad: Rating only the dimensions you feel confident about, ignoring the rest
Good: Rate every dimension. Gaps in your knowledge are themselves evidence of a lower rating (if you can't tell, the user probably can't either)

3. Aspirational scoring
Bad: "We plan to add X next sprint" bumps the rating from 3 to 5
Good: Rate what exists today. Plans are roadmap items, not current-state evidence

4. Flat improvement list
Bad: "Improve onboarding, improve docs, improve output" with no priority
Good: Rank by impact: which gap, if closed, moves the overall experience up a full star level?

5. Ignoring the 1-star description
Bad: Jumping straight to 5-star and above
Good: Describe the 1-star experience first. It grounds the team in what "broken" actually looks like and keeps higher ratings better calibrated
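Several of these anti-patterns can be caught mechanically before an audit is shared. The sketch below is a heuristic linter plus a gap ranker, not a definitive method: `lint_audit`, `rank_gaps`, and the keyword list for spotting plan-based evidence are hypothetical. `rank_gaps` assumes the overall rating is the lowest dimension rating (a weakest-link model) and that closing a gap means raising that dimension to its next whole star level. The sample audit is invented.

```python
import math
import re
from dataclasses import dataclass

# Dimensions from the audit table; every one should be rated (anti-pattern 2).
REQUIRED_DIMENSIONS = {"Onboarding", "Documentation", "Output quality",
                       "Trust & safety", "Extensibility", "Code quality"}

# Crude, hypothetical signal that evidence describes plans rather than current state.
FUTURE_WORDS = re.compile(r"\b(plan|plans|planned|will|roadmap|next sprint)\b", re.IGNORECASE)


@dataclass
class DimensionRating:
    dimension: str
    rating: float
    evidence: str


def lint_audit(ratings: list[DimensionRating]) -> list[str]:
    """Heuristic warnings for anti-patterns 1-3; a flag means 'look again', not 'wrong'."""
    warnings = []
    rated = {r.dimension for r in ratings}
    for missing in sorted(REQUIRED_DIMENSIONS - rated):
        warnings.append(f"Skipped dimension: {missing}")
    if ratings and all(r.rating == 3 for r in ratings):
        warnings.append("Every dimension is 3-star: check the real experience")
    for r in ratings:
        if not r.evidence.strip():
            warnings.append(f"{r.dimension}: no evidence recorded")
        elif FUTURE_WORDS.search(r.evidence):
            warnings.append(f"{r.dimension}: evidence cites plans, not current state")
    return warnings


def rank_gaps(ratings: list[DimensionRating]) -> list[tuple[str, float]]:
    """Rank gaps by how much closing each one raises the overall (weakest-link) rating."""
    overall = min(r.rating for r in ratings)
    ranked = []
    for r in ratings:
        if r.rating >= 11:
            continue  # no next level to reach
        raised = math.floor(r.rating) + 1
        others = [o.rating for o in ratings if o is not r]
        ranked.append((r.dimension, round(min([raised, *others]) - overall, 1)))
    # sorted() is stable, so ties keep their original order
    return sorted(ranked, key=lambda item: item[1], reverse=True)


audit = [
    DimensionRating("Onboarding", 4.5, "Guided setup, but no validation step"),
    DimensionRating("Documentation", 3.0, "Single README page"),
    DimensionRating("Output quality", 5.5, "Severity indicators and next steps"),
]
print(lint_audit(audit))  # flags the three skipped dimensions
print(rank_gaps(audit))   # Documentation first: closing it lifts the overall to 4.0
```

Under the weakest-link assumption, only raising Documentation moves the overall experience up a full star here; a different aggregation, such as a weighted average, would rank the gaps differently.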