name: kpi-frameworks description: KPI hierarchy definition, north star metrics, OKR trees, and leading vs. lagging indicator frameworks. Use when defining success metrics for a product, building a measurement plan, or evaluating whether current metrics actually reflect business health.
KPI Frameworks
Metrics that don't connect to business outcomes are decoration. This skill covers how to build a coherent metric hierarchy — from north star down to feature-level indicators — so that every measurement tells you something actionable.
When to Use This Skill
- Defining success metrics for a new product or feature
- Auditing whether existing metrics reflect what actually matters
- Building an OKR tree for a planning cycle
- Distinguishing leading from lagging indicators for a feature
- Designing the measurement plan for a product evaluation
Core Concepts
1. North Star Metric
The north star metric is the single number that best captures the core value your product delivers to users. It should:
- Reflect genuine user value, not proxy activity
- Correlate with long-term business health
- Be something the product team can influence directly
| Product Type | Example North Star |
|---|---|
| Marketplace | GMV (Gross Merchandise Value) |
| SaaS | Weekly Active Users (WAUs) or NRR |
| Content platform | Time in product (engaged minutes) |
| Communication tool | Messages sent per active user |
| Payments product | Transaction volume |
North Star ≠ Revenue. Revenue is a lagging outcome; the north star explains why revenue happens.
2. Leading vs. Lagging Indicators
| Type | Definition | Example | Lag Time |
|---|---|---|---|
| Leading | Predicts future outcomes; actionable early | Onboarding completion rate | Days to weeks |
| Lagging | Confirms outcomes after they've occurred | Monthly revenue, annual churn | Weeks to months |
For product evaluation, leading indicators let you course-correct before lagging indicators confirm a problem. Track both but act on leading indicators.
3. Guardrail Metrics
Guardrail metrics are things that must NOT get worse when you optimize for your primary metric. Every feature evaluation needs them.
Optimizing for: Checkout conversion rate
Guardrails:
- Cart abandonment rate must not increase > 2pp
- Support ticket volume must not increase > 10%
- Average order value must not decrease > 5%
Metric Hierarchy
North Star Metric
├── Revenue Metrics (lagging)
│ ├── MRR / ARR
│ ├── ARPU
│ └── LTV
├── Retention Metrics
│ ├── D7 / D30 / D90 retention
│ ├── Monthly churn rate
│ └── Net Revenue Retention (NRR)
├── Engagement Metrics (leading)
│ ├── DAU / WAU / MAU
│ ├── Session depth
│ └── Core action completion rate
└── Acquisition Metrics
├── New user activations
├── Activation rate (registered → activated)
└── CAC by channel
OKR Tree Format
OKRs (Objectives and Key Results) translate strategic intent into measurable quarterly commitments.
## Objective: [What we're trying to achieve — qualitative]
### KR1: [Measurable outcome 1]
Current baseline: [X]
Target: [Y]
Owner: [Team / Person]
### KR2: [Measurable outcome 2]
Current baseline: [X]
Target: [Y]
Owner: [Team / Person]
### KR3: [Measurable outcome 3]
Current baseline: [X]
Target: [Y]
Owner: [Team / Person]
OKR Quality Checklist
- Objective is inspiring and direction-setting, not a task
- Each KR is a measurable outcome, not a deliverable ("ship X" is not a KR)
- Each KR has a current baseline documented
- Targets are ambitious but achievable (70–80% confidence)
- KRs together would constitute evidence that the Objective was achieved
- No KR is a leading indicator that could be gamed without achieving the Objective
Feature-Level Measurement Plan
For each feature, define:
## Measurement Plan: [Feature Name]
**Measurement window:** [How long after launch before evaluating?]
**Minimum viable data:** [What volume of data is needed for meaningful conclusions?]
### Primary Success Metric
- Metric: [Name]
- Baseline: [Current value]
- Target: [Success threshold]
- Instrument: [How it's tracked — event name, query, dashboard]
### Leading Indicators (visible within 1-2 weeks)
| Metric | Baseline | Target | Instrument |
|--------|---------|--------|------------|
| [Metric] | [Current] | [Target] | [Tracking method] |
### Lagging Indicators (confirm success, 4-8 weeks)
| Metric | Baseline | Target | Instrument |
|--------|---------|--------|------------|
| [Metric] | [Current] | [Target] | [Tracking method] |
### Guardrail Metrics (must not degrade)
| Metric | Baseline | Threshold | Instrument |
|--------|---------|-----------|------------|
| [Metric] | [Current] | Must stay > [value] | [Tracking method] |
Statistical Significance for Feature Evaluation
When evaluating whether a metric moved meaningfully:
| Sample Size | Minimum Detectable Effect | Confidence Level |
|---|---|---|
| < 100 events | ≥ 20% relative change | Low — directional only |
| 100–500 events | ≥ 10% relative change | Moderate |
| 500–2,000 events | ≥ 5% relative change | Good |
| 2,000+ events | ≥ 2% relative change | High |
Rule: Don't declare success or failure until you have sufficient data. "We launched 3 days ago and conversion went up 2%" is noise, not signal.
A/B Test Significance
For A/B tests, use a minimum:
- 95% confidence level (p < 0.05) for shipping decisions
- 80% confidence level for directional learning only
- Run tests for at least 2 full weeks to account for weekly variation
Common Metric Mistakes
| Mistake | Effect | Fix |
|---|---|---|
| Optimizing for activity metrics (page views, clicks) | Looks good; may not reflect value | Replace with outcome metrics |
| No baseline documented | Can't evaluate if the metric moved | Always record baseline before launch |
| Measuring too early | Noise looks like signal | Wait for minimum viable data volume |
| No guardrail metrics | Win on primary while breaking something else | Define guardrails before launch |
| Single metric evaluation | Optimization for one metric degrades others | Multi-metric evaluation with guardrails |
Best Practices
- Define metrics before building, not after — post-hoc metric selection finds the metric that makes the feature look good
- Instrument before launch — if tracking isn't in place on day 1, early data is lost
- Distinguish "did it move" from "did we cause it" — correlation vs. causation matters; control for concurrent changes
- Publish baselines — metrics without baselines can be selectively interpreted
- Review metrics quarterly — a metric that made sense 18 months ago may be measuring the wrong thing now