name: configuring-experiment-analytics
description: Configures the analytics side of a PostHog experiment — exposure criteria (default $feature_flag_called vs custom exposure events), primary and secondary metrics, the supported metric types (count, sum, ratio with math and math_property, retention with retention_window_start and start_handling), multivariate user handling ("Exclude" vs "First seen variant"), and how to read results once the experiment is live. Use when the user adds or edits a primary or secondary metric (e.g. "add a secondary metric tracking 'downloaded_file' per user"), sets up a ratio metric (e.g. "revenue from purchase_completed / pageviews"), sets up a retention metric (e.g. "$pageview → uploaded_file, 7-day window"), configures custom exposure (e.g. "only count users who hit /checkout"), changes multivariate handling, or asks "who is in the analysis?", "how do I measure impact?", "is this winning?", "what's the confidence level?", or "should I ship?".
Configuring experiment analytics
This skill answers two questions: "Who is included in the analysis?" and "How do I measure impact?"
Exposure criteria
Exposure criteria determine which users are counted in the experiment analysis.
Include people when
Two options:
- Feature flag called (default): users are included when the `$feature_flag_called` event fires for the experiment's flag. This is the standard approach: a user is included only when they actually encounter the feature flag in your code.
- Custom exposure event: users are included when a specific custom event fires. Use this when you want tighter control over who enters the analysis (e.g., only users who actually visit the page where the experiment runs).
Multiple variant handling
When a user is exposed to multiple variants (e.g., due to flag changes or race conditions):
- Exclude multivariate users — removes these users from the analysis entirely. Cleaner data, smaller sample.
- First seen variant: assigns users to the first variant they were exposed to. Keeps all users in the analysis. Note that "first seen" can introduce its own bias, because behavior cannot be clearly attributed to a single variant. It is not recommended unless necessary.
Bias risk on uneven splits. "Exclude multivariate users" combined with an uneven variant split can introduce bias — multi-variant users are dropped asymmetrically and the smaller variant loses a larger fraction of its assignments. If those users behave differently from the rest, the smaller variant's metrics will be skewed.
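To make the asymmetry concrete, here is a minimal arithmetic sketch. It assumes a toy model in which every multi-variant user was seen in both variants, so each variant loses the same absolute number of users; the split sizes and the count of 200 multi-variant users are hypothetical.

```python
from typing import Dict


def excluded_fraction(exposed_per_variant: Dict[str, int], multi_variant_users: int) -> Dict[str, float]:
    """Fraction of each variant's exposed users dropped by "Exclude multivariate users".

    Toy model (an assumption): every multi-variant user appears in every
    variant, so each variant loses the same absolute number of users.
    """
    return {
        variant: multi_variant_users / exposed
        for variant, exposed in exposed_per_variant.items()
    }


# Hypothetical 90/10 split with 200 users who were seen in both variants.
uneven = excluded_fraction({"control": 9000, "test": 1000}, multi_variant_users=200)

# The same 200 users under a 50/50 split of the same 10,000 exposed users.
even = excluded_fraction({"control": 5000, "test": 5000}, multi_variant_users=200)

print(uneven)  # control loses about 2.2% of its users, test loses 20%
print(even)    # both variants lose 4%
```

Under these assumptions the smaller variant loses roughly nine times the share of its users, which is why an even split removes the asymmetry.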
The right mitigation depends on experiment state:
- Not yet launched, or only exposed to a few users so far: switch to an even variant split and use the overall rollout percentage to limit test-variant exposure. This removes the bias and preserves statistical power. See configuring-experiment-rollout.
- Live experiment with significant exposures: changing the split mid-run reassigns users across variants, which is bad for user experience and data quality. Switch this setting to "First seen variant" instead. It keeps already-assigned users in their original variant (no reassignment) and removes the asymmetric exclusion.
Filter test accounts
exposure_criteria.filterTestAccounts (default: true) — excludes internal/test users from the analysis.
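The exposure settings above could be assembled into a single `exposure_criteria` object along these lines. This is a sketch under assumptions: only `filterTestAccounts` is named in this document; the `multiple_variant_handling` and `exposure_config` keys, their values, and the event name `checkout_viewed` are illustrative guesses that should be checked against the actual tool schema.

```python
from typing import Optional


def build_exposure_criteria(
    custom_event: Optional[str] = None,
    multiple_variant_handling: str = "exclude",
    filter_test_accounts: bool = True,
) -> dict:
    """Assemble an exposure_criteria dict for an experiment update payload."""
    if multiple_variant_handling not in ("exclude", "first_seen"):
        raise ValueError("multiple_variant_handling must be 'exclude' or 'first_seen'")

    criteria = {
        # Named in this document; defaults to true.
        "filterTestAccounts": filter_test_accounts,
        # ASSUMPTION: key name and values are not confirmed by this document.
        "multiple_variant_handling": multiple_variant_handling,
    }
    if custom_event is not None:
        # ASSUMPTION: shape of a custom exposure event config.
        criteria["exposure_config"] = {
            "kind": "ExperimentEventExposureConfig",
            "event": custom_event,
            "properties": [],
        }
    return criteria


default_criteria = build_exposure_criteria()
custom_criteria = build_exposure_criteria(
    custom_event="checkout_viewed",  # hypothetical event name
    multiple_variant_handling="first_seen",
)
```

Leaving `custom_event` unset omits the exposure config entirely, which in this sketch stands for the default `$feature_flag_called` behavior.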
Resolving experiments
Metric changes require an experiment ID. If the user refers to an experiment by name
or description (e.g. "add metrics to the checkout test"), load the finding-experiments
skill to resolve it to a concrete ID before proceeding.
Metrics
A metric reaches an experiment one of two ways, both via experiment-update:
- Inline metric: defined directly on the experiment. Sent in the `metrics` array, which replaces the entire inline list, so always get the current experiment first via `experiment-get` to preserve existing metrics.
- Shared (saved) metric: a reusable metric object that can be attached to many experiments. Attached by ID via `saved_metrics_ids` (this list also replaces the experiment's existing saved-metric links, so resend the full set; see Step 1).
Prefer reusing a shared metric over duplicating it inline. Build a new inline metric only when no suitable shared metric already exists.
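Because the `metrics` array replaces the whole inline list, adding one metric means resending all of them. A minimal sketch of that read-then-append step, assuming the `experiment-get` response is available as a dict with a `metrics` key (the helper name and sample events are hypothetical):

```python
import copy


def build_metrics_update(current_experiment: dict, new_metric: dict) -> dict:
    """Return an experiment-update payload that keeps existing inline metrics.

    The existing list is copied first so the experiment-get response is not mutated.
    """
    existing = copy.deepcopy(current_experiment.get("metrics") or [])
    return {"metrics": existing + [new_metric]}


# Hypothetical experiment-get response with one inline metric already set.
experiment = {
    "id": 42,
    "metrics": [
        {"kind": "ExperimentMetric", "metric_type": "funnel",
         "series": [{"kind": "EventsNode", "event": "order_confirmed"}]},
    ],
}
new_metric = {
    "kind": "ExperimentMetric", "metric_type": "mean",
    "source": {"kind": "EventsNode", "event": "payment_processed"},
}

payload = build_metrics_update(experiment, new_metric)
print(len(payload["metrics"]))  # 2: the existing metric plus the new one
```

Sending only `[new_metric]` instead would silently drop the funnel metric that was already configured.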
Step 1: Check for an existing shared metric (REQUIRED — always do this first)
Before building any new inline metric, you MUST call experiment-saved-metrics-list to see what
reusable metrics already exist in the project. Do NOT skip this step — duplicating a metric that is
already set up as a shared metric fragments measurement and is exactly what we want to avoid.
Workflow:
- Call `experiment-saved-metrics-list` (pass `search` to resolve by name, e.g. `{ "search": "checkout conversion" }`) and scan the results (`id`, `name`, `description`, `query`) for one that matches the user's intent. Results are paginated: if you're browsing without a `search` term and the project has many, page through with `limit`/`offset` before concluding nothing matches.
- If a shared metric clearly matches, confirm the match with the user by name/description, then attach it instead of building a new one:
  - Call `experiment-get` to read the experiment's current `saved_metrics`.
  - Call `experiment-update` with `saved_metrics_ids` set to the full desired set. It replaces existing links, so include the already-attached ones plus the new entry. Each entry has shape `{ "id": <saved-metric id>, "metadata": { "type": "primary" } }`; set `type` to `"primary"` or `"secondary"`. `metadata` is optional and defaults to primary.
  - Watch the id when rebuilding the set: each item in the `saved_metrics` you just read has a top-level `id` (the link id) AND a `saved_metric` field (the metric id). `saved_metrics_ids` wants the `saved_metric` value, not the link `id`. Sending the link `id` attaches the wrong metric or fails validation.
  - You do not need to discover events (Step 2): the shared metric already encodes them.
- If the list is empty or nothing matches, fall through to Step 2 and build an inline metric.
When a new inline metric you're about to build is likely to be reused across experiments, offer to
create it as a shared metric instead, via experiment-saved-metrics-create, then attach it as above.
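The attach flow in Step 1 hinges on rebuilding `saved_metrics_ids` from the links returned by `experiment-get`. A minimal sketch of that rebuild, assuming each link carries `id`, `saved_metric`, and (as an assumption) a `metadata` field; the helper name and sample ids are hypothetical:

```python
from typing import List


def build_saved_metrics_ids(current_links: List[dict], new_metric_id: int,
                            metric_type: str = "primary") -> List[dict]:
    """Return the full saved_metrics_ids list: existing links plus the new metric."""
    if metric_type not in ("primary", "secondary"):
        raise ValueError("metric_type must be 'primary' or 'secondary'")

    entries = []
    for link in current_links:
        # Use the metric id (link["saved_metric"]), NOT the link id (link["id"]).
        entry = {"id": link["saved_metric"]}
        if link.get("metadata"):
            entry["metadata"] = link["metadata"]
        entries.append(entry)

    # Avoid attaching the same shared metric twice.
    if all(entry["id"] != new_metric_id for entry in entries):
        entries.append({"id": new_metric_id, "metadata": {"type": metric_type}})
    return entries


# Hypothetical saved_metrics from experiment-get: link id 901 points at metric 12.
links = [{"id": 901, "saved_metric": 12, "metadata": {"type": "primary"}}]
ids = build_saved_metrics_ids(links, new_metric_id=34, metric_type="secondary")
print(ids)
```

Note that the link id `901` never appears in the result; only the metric ids `12` and `34` do.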
Step 2: Discover available events (REQUIRED before building an inline metric)
Before suggesting or building any new inline metric, you MUST call read-data-schema to discover
what events actually exist in the project. Do NOT skip this step. Do NOT suggest event names
based on what you think the project might track — only use events you have confirmed exist.
(Attaching an existing shared metric from Step 1 does not need this — it already encodes its events.)
This applies even when:
- The user provides event names — look them up to confirm they exist and are spelled correctly
- The user asks "what metrics do you suggest?" — look up events first, then suggest from real data
- The context makes certain events seem obvious — they may not exist or may be named differently
Workflow:
- Call `read-data-schema` to get the project's events
- Present relevant events to the user based on the experiment's hypothesis
- User picks which events to use for metrics
- Configure metrics with those confirmed event names
Legitimate exception — allow_unknown_events: true:
Pass this on experiment-create / experiment-update only when the user is intentionally instrumenting an event that hasn't been ingested yet (e.g. setting up the experiment before the code change ships). Confirm this with the user — never use it as a workaround for "the event lookup didn't return what I expected".
Example:
User: "Let's add some metrics for the checkout experiment"
WRONG: "I'd suggest using purchase_completed as the primary metric..."
(hallucinated event name — never seen the project's actual events)
RIGHT: *calls read-data-schema* → "Here are the events in your project
related to checkout: `checkout_step_completed`, `payment_processed`,
`order_confirmed`. Which of these represents a successful checkout?"
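The "confirm before you build" rule in Step 2 can be sketched as a small check of requested event names against the list returned by `read-data-schema`. The helper name is hypothetical and the fuzzy matching via `difflib` is an illustrative choice for surfacing likely misspellings, not part of the skill itself.

```python
import difflib
from typing import Dict, List


def check_event_names(requested: List[str], known_events: List[str]) -> Dict[str, dict]:
    """Mark each requested event as confirmed or not, with near-miss suggestions."""
    report = {}
    for name in requested:
        if name in known_events:
            report[name] = {"confirmed": True, "suggestions": []}
        else:
            # Offer close spellings to ask the user about; never substitute silently.
            close = difflib.get_close_matches(name, known_events, n=3, cutoff=0.6)
            report[name] = {"confirmed": False, "suggestions": close}
    return report


# Event names taken from the example above, as if returned by read-data-schema.
known = ["checkout_step_completed", "payment_processed", "order_confirmed"]
report = check_event_names(["order_confirmed", "order_confirmd"], known)
print(report["order_confirmed"]["confirmed"])  # True
print(report["order_confirmd"]["confirmed"])   # False
```

Any unconfirmed name should go back to the user as a question; `allow_unknown_events: true` stays reserved for the intentional pre-instrumentation case.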
Step 3: Choose metric type
There are four metric types. Each has kind: "ExperimentMetric":
| metric_type | When to use | Required fields |
|---|---|---|
| `"mean"` | Average of a numeric property per user (revenue, session duration, pageviews per user) | `source` |
| `"funnel"` | Conversion rate from exposure through one or more ordered actions | `series` (1 or more steps) |
| `"ratio"` | Rate of one event relative to another | `numerator`, `denominator`; set `math: "sum"` + `math_property` on a side to aggregate a property; filters never aggregate |
| `"retention"` | Do users come back after exposure? | `start_event`, `completion_event`, `retention_window_start`, `retention_window_end`, `retention_window_unit`, `start_handling` |
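The required-fields column can be turned into a quick pre-flight check before sending a metric. This is a sketch that mirrors the table only; the helper name is hypothetical and it does not validate the shape of each field.

```python
from typing import List

# Required fields per metric_type, copied from the table above.
REQUIRED_FIELDS = {
    "mean": ["source"],
    "funnel": ["series"],
    "ratio": ["numerator", "denominator"],
    "retention": [
        "start_event", "completion_event", "retention_window_start",
        "retention_window_end", "retention_window_unit", "start_handling",
    ],
}


def missing_fields(metric: dict) -> List[str]:
    """List the required fields that a metric dict does not provide."""
    metric_type = metric.get("metric_type")
    if metric_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown metric_type: {metric_type!r}")

    # Compare against None so a legitimate 0 (e.g. retention_window_start) passes.
    missing = [f for f in REQUIRED_FIELDS[metric_type] if metric.get(f) is None]

    # A funnel needs at least one step in series.
    if metric_type == "funnel" and metric.get("series") == []:
        missing.append("series")
    return missing


incomplete_retention = {
    "kind": "ExperimentMetric", "metric_type": "retention",
    "start_event": {}, "completion_event": {},
    "retention_window_start": 0, "retention_window_end": 7,
}
print(missing_fields(incomplete_retention))  # ['retention_window_unit', 'start_handling']
```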
Funnel metrics and the implicit exposure step
Funnel metrics automatically prepend the experiment's exposure event as step_0.
So a funnel with 1 step in series is a valid 2-step funnel: exposure → action.
This is the correct choice for measuring "what percentage of exposed users did X?"
Examples:
- "What % of exposed users reached /login?" → funnel with 1 step (`$pageview` filtered to /login)
- "What % of exposed users completed checkout?" → funnel with 1 step (`checkout_completed`)
- "What % of exposed users went cart → checkout → purchase?" → funnel with 3 steps
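The examples above can be sketched as payload builders. The `{"kind": "EventsNode", "event": ...}` step shape is an assumption not spelled out in this document, the `<exposure>` label is only a placeholder for the implicit first step, and the three-step event names are hypothetical.

```python
from typing import List


def funnel_metric(*events: str) -> dict:
    """Build a funnel metric whose series holds only the steps AFTER exposure."""
    if not events:
        raise ValueError("a funnel needs at least one step in series")
    return {
        "kind": "ExperimentMetric",
        "metric_type": "funnel",
        # ASSUMPTION: each step is an EventsNode keyed by event name.
        "series": [{"kind": "EventsNode", "event": event} for event in events],
    }


def effective_steps(metric: dict) -> List[str]:
    """Steps as analyzed: the exposure step is prepended automatically."""
    return ["<exposure>"] + [step["event"] for step in metric["series"]]


one_step = funnel_metric("checkout_completed")
three_step = funnel_metric("cart_viewed", "checkout_started", "purchase_completed")

print(effective_steps(one_step))    # ['<exposure>', 'checkout_completed']
print(len(effective_steps(three_step)))  # 4
```

The exposure step is never added to `series` by hand; doing so would measure exposure twice.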
Mean vs funnel for the same event
- Mean measures average count/value per user (e.g. "pageviews per user", "revenue per user").
- Funnel measures conversion rate (e.g. "% of exposed users who purchased").
Both can reference the same event — the difference is whether you care about count/magnitude (mean) or yes/no conversion (funnel).
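As an illustration, the same event can back both shapes. The event name `purchase_completed` is hypothetical and the `EventsNode` shape is an assumption; only `metric_type`, `source`, and `series` come from the table in Step 3.

```python
EVENT = "purchase_completed"  # hypothetical; confirm it exists via read-data-schema

# Mean: how many purchases per exposed user (count/magnitude).
purchases_per_user = {
    "kind": "ExperimentMetric",
    "metric_type": "mean",
    "source": {"kind": "EventsNode", "event": EVENT},  # ASSUMPTION: node shape
}

# Funnel: what share of exposed users purchased at least once (yes/no).
purchase_conversion = {
    "kind": "ExperimentMetric",
    "metric_type": "funnel",
    "series": [{"kind": "EventsNode", "event": EVENT}],  # ASSUMPTION: node shape
}
```

A user who purchases five times moves the mean metric five times as much as a single purchase, but counts once in the funnel.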
Retention: same vs different start/completion event
The retention window is measured from the start event, so the events you pick decide what's measured. The start occurrence never counts as its own completion (only a distinct later event does), so both shapes are valid:
- Different start and completion events → conversion-style retention ("did they reach the target action within the window?").
- Same event → repeat retention ("did they fire it again?"). `From 0` counts a repeat from the same period onward (same-day repeats included); `From ≥ 1` requires an occurrence in a later period. Use `start_handling: "first_seen"`.
When a user says "retention of `<event>`" they usually mean repeat retention.
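Both retention shapes can be sketched with one builder. The field names come from the table in Step 3; the `EventsNode` shape and the `"day"` unit value are assumptions, and the helper names are hypothetical. The first example follows the "`$pageview` → `uploaded_file`, 7-day window" case from this skill's description.

```python
def retention_metric(start_event: str, completion_event: str,
                     window_start: int, window_end: int,
                     unit: str = "day", start_handling: str = "first_seen") -> dict:
    """Build a retention metric with every required field set explicitly."""
    if window_start < 0 or window_end < window_start:
        raise ValueError("need 0 <= retention_window_start <= retention_window_end")
    return {
        "kind": "ExperimentMetric",
        "metric_type": "retention",
        "start_event": {"kind": "EventsNode", "event": start_event},            # ASSUMPTION: node shape
        "completion_event": {"kind": "EventsNode", "event": completion_event},  # ASSUMPTION: node shape
        "retention_window_start": window_start,
        "retention_window_end": window_end,
        "retention_window_unit": unit,  # ASSUMPTION: "day" as the unit value
        "start_handling": start_handling,
    }


def retention_shape(metric: dict) -> str:
    """'repeat' when start and completion are the same event, else 'conversion'."""
    same = metric["start_event"]["event"] == metric["completion_event"]["event"]
    return "repeat" if same else "conversion"


# Conversion-style: did a $pageview user upload a file within 7 days?
conversion = retention_metric("$pageview", "uploaded_file", window_start=0, window_end=7)

# Repeat retention: did they upload again in a later period (From >= 1)?
repeat = retention_metric("uploaded_file", "uploaded_file", window_start=1, window_end=7)

print(retention_shape(conversion), retention_shape(repeat))  # conversion repeat
```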
See references/metric-configuration.md for the full rendered ExperimentMetric schema (all four metric types, with required fields per type) plus WRONG/RIGHT JSON pairs for the failure modes that come up most often (ratio with is_set filter instead of math: "sum" + math_property; retention without retention_window_start / start_handling). Read it before assembling a ratio or retention payload — the required fields are authoritative.
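For orientation only, a ratio payload for "revenue from `purchase_completed` / pageviews" might be sketched as below. The reference file remains authoritative: the `EventsNode` shape, the property name `revenue`, and the idea that a side without `math` counts events are all assumptions here.

```python
from typing import Optional


def ratio_side(event: str, sum_property: Optional[str] = None) -> dict:
    """One side of a ratio metric; aggregates a property only when asked to."""
    node = {"kind": "EventsNode", "event": event}  # ASSUMPTION: node shape
    if sum_property is not None:
        # Aggregation comes from math + math_property, never from a filter.
        node["math"] = "sum"
        node["math_property"] = sum_property
    return node


revenue_per_pageview = {
    "kind": "ExperimentMetric",
    "metric_type": "ratio",
    "numerator": ratio_side("purchase_completed", sum_property="revenue"),  # hypothetical property
    "denominator": ratio_side("$pageview"),  # ASSUMPTION: no math means a plain event count
}
```

An `is_set` filter on `revenue` would only restrict which events are included; it would not sum the property, which is the failure mode the reference file calls out.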
Step 4: Primary vs secondary
- Primary metrics — the main success criteria for the experiment. These drive the ship/end decision.
- Secondary metrics — additional measurements for context. Useful for guardrail metrics (e.g., ensuring a conversion improvement doesn't increase error rates).
Interpreting results
See references/interpreting-results.md for guidance on reading experiment results, statistical significance, and when to ship vs end.