Growth Experiment Prioritization Template
ICE scoring with calibrated anchors so two people score the same idea the same way, plus 20 worked examples from real devtools growth backlogs, ranked by composite. Copy the process, swap in your hypotheses, cap the list at your actual bandwidth.
Format · ICE template · 20 worked examples · Free, no gate
Impact
If this works, how much does it move the one metric we are paid to move this quarter (activation, signups, conversion, retention)?
Cosmetic; nobody downstream notices the difference in any dashboard.
Helps a narrow segment (one persona, one channel); single-digit percent lift at best.
Moves a secondary metric clearly, or nudges the primary metric a few points.
Meaningfully shifts the primary metric for a large share of users or accounts.
Step-change on the primary metric; the kind of result you put in the board deck.
Confidence
How sure are we this will work, based on data, prior tests, and analogues, rather than hope?
Pure hunch; no data, no analogue, no user signal supporting it.
One anecdote or a single support ticket; mostly opinion.
Qualitative signal plus a plausible mechanism, but no hard numbers yet.
Backed by our own funnel data or a directly comparable past experiment.
Proven elsewhere on this exact funnel, or a near-certain instrumentation or bug fix.
Ease
How little effort, scope, and cross-team coordination does it take to ship and measure this?
Multi-sprint build; needs engineering, design, and legal sign-off.
One to two weeks of focused work, plus a dependency on another team.
A few days of work owned mostly within the growth or marketing team.
One day of work; copy, config, or a small self-serve content change.
An afternoon; a settings toggle, a copy swap, or scheduling an existing asset.
20 Worked Examples, Ranked by Composite
Composite is the unweighted average of the Impact, Confidence, and Ease scores. Note how often a boring, easy fix outranks the exciting build.
If we rewrite the quickstart so the first code block returns a successful API response in under five minutes, then activation will rise because most drop-off happens before the first successful call.
First-call success is our top activation lever and funnel data shows the cliff, but a clean rewrite plus testing takes several days of work.
If we add a clear 'Get a free API key' call to action to the top of our open-source README, then signups from GitHub rise because high-intent visitors currently leave without a conversion path.
README traffic is high-intent and the fix is a one-line content change, though absolute signup volume from this surface is modest.
If we send a day-two activation email triggered when a user creates a key but sends zero events, then activation improves because we re-engage accounts at the exact moment they stall.
Behavior-triggered emails are a proven activation lever and the trigger is easy to wire on existing events.
If we add a transparent free-tier limit table next to the plans, then signup-to-trial rises because developers stop bouncing over fear of a surprise bill.
Free-tier transparency is a known conversion booster for developer products and it is a simple page edit.
If we publish a comparison page targeting 'us vs the incumbent for X use case', then organic high-intent traffic rises because those queries currently send buyers straight to competitors.
Bottom-funnel comparison pages convert well and we can write one fast, but SEO ranking takes weeks to confirm impact.
If we ship an empty-state checklist inside the dashboard (create a key, send a test event, invite a teammate), then week-one retention rises because new accounts get a guided path to value.
Checklists reliably lift activation in SaaS analogues, but this is scoped product work that needs design, so Ease is moderate.
If we add copy-paste snippets for the five most common languages to every API reference endpoint, then time-to-first-call drops because developers stop translating curl examples by hand.
Multi-language snippets remove real friction and support tickets back it, but generating and testing them across endpoints is steady work.
If we pre-provision a working demo project and sample data on signup, then activation rises because users explore a populated product instead of facing a blank account.
Seeded demo data reliably reduces blank-state churn, but provisioning it cleanly per account is a moderate product task.
If we turn each changelog entry into a short shareable post with a code snippet and screenshot, then product-led top-of-funnel grows because shipped features become repeatable content.
Cheap to operationalize from work we already do, but each entry drives only incremental reach.
If we run a small search campaign on three high-intent competitor and integration keywords, then qualified trial signups rise because those searchers are mid-evaluation.
High-intent paid can convert, but for a developer audience CTRs and CAC are uncertain at our spend, so confidence is middling.
If we add a weekly usage digest showing events processed and errors caught, then paid retention improves because the recurring email reinforces ongoing value to the buyer.
Recurring value reminders help retention, but the effect is gradual and depends on the digest staying genuinely useful.
If we add a persistent search bar with instant results across docs and API reference, then activation and retention improve because developers self-serve answers instead of filing tickets or churning.
Better docs search measurably reduces support load and stuck users, but integrating and tuning a search index takes real work.
If we add an interactive in-browser sandbox to the docs so developers run our SDK without installing anything, then trial-to-activation improves because we remove the local setup barrier.
High potential ceiling, but it is a real engineering build and we have no internal data proving sandboxes lift activation for us yet.
If we add a usage-based pricing calculator to the pricing page, then free-to-paid conversion improves because developers can self-estimate cost before they ever talk to sales.
Pricing clarity removes a known objection, but it needs accurate metering logic and design, and we lack a prior test on it.
If we run weekly Discord office hours where an engineer debugs live, then community-sourced activation rises because stuck users get unblocked instead of churning silently.
Plausible retention and word-of-mouth upside, but the link to a hard metric is fuzzy and depends on attendance.
If we surface a real-time 'requests in the last hour' counter on the dashboard home, then retention improves because users see the product working and feel the value of staying.
A visible value moment is a reasonable retention bet, but it is a clear product build with no prior result to lean on.
If we publish an original benchmark comparing latency across our SDK and three alternatives, then referral and citation traffic rises because original data attracts links and AI-overview mentions.
Original research earns links and credibility, but running a defensible benchmark takes engineering time and the payoff compounds slowly.
If we ship a 'Powered by us' badge that users can embed, then referral signups rise because happy developers passively advertise us in their own projects.
Cheap to build, but adoption and click-through on developer badges are historically low, capping both impact and confidence.
If we replace the generic signup form with a role question (backend, ML, platform) and branch the first-run experience, then activation rises because each persona sees the relevant quickstart first.
Personalization could help, but branching adds product complexity and we have weak evidence the segments diverge enough.
If we get listed in two adjacent platform integration marketplaces, then top-of-funnel signups rise because we reach developers already in a buying context on those platforms.
Marketplace placement can drive durable qualified traffic, but approval timelines and required engineering work make it slow and uncertain.
The Process That Makes the Scores Matter
Backlog intake
Capture every idea as a one-line 'If we X, then Y because Z' hypothesis in a single backlog. No idea gets discussed or scored until it is written in that form, which forces a metric and a mechanism up front.
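As a rough illustration of the intake rule, the sketch below rejects any backlog line that is not written as 'If we X, then Y because Z'. The regular expression, the Hypothesis class, and the parse_hypothesis function are hypothetical names invented for this example, and the pattern is only one reasonable reading of the form.

```python
import re
from dataclasses import dataclass

# Assumed pattern for the 'If we X, then Y because Z' form; loosen or
# tighten it to match how your team actually writes hypotheses.
HYPOTHESIS_PATTERN = re.compile(
    r"^If we (?P<change>.+?), then (?P<outcome>.+?) because (?P<mechanism>.+?)\.?$",
    re.IGNORECASE,
)


@dataclass
class Hypothesis:
    change: str      # X: what we will do
    outcome: str     # Y: the metric movement we expect
    mechanism: str   # Z: why we expect it


def parse_hypothesis(line: str) -> Hypothesis:
    """Split a backlog line into its parts, or raise if it is not in the required form."""
    match = HYPOTHESIS_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("Rewrite as: If we X, then Y because Z")
    return Hypothesis(**match.groupdict())
```

A line such as "Add a badge" raises ValueError, which is the point: the idea does not enter scoring until it names an outcome and a mechanism.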
Weekly scoring ritual
Once a week the growth team scores new hypotheses on Impact, Confidence, and Ease (1 to 10) using the shared anchors, then re-ranks the whole backlog by composite (the average of the three). Score live together so anchors stay calibrated and outliers get debated, not buried.
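The scoring and re-ranking step can be sketched in a few lines. ScoredIdea and rank_backlog are hypothetical names, and the scores in the usage lines are made up for illustration; they are not the scores of the worked examples above.

```python
from dataclasses import dataclass


@dataclass
class ScoredIdea:
    name: str
    impact: int
    confidence: int
    ease: int

    def __post_init__(self):
        # Enforce the 1 to 10 scale used in the weekly scoring ritual.
        for field_name in ("impact", "confidence", "ease"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be 1 to 10, got {value}")

    @property
    def composite(self) -> float:
        # Composite is the plain average of the three scores.
        return (self.impact + self.confidence + self.ease) / 3


def rank_backlog(ideas):
    """Return ideas sorted by composite, highest first (ties keep input order)."""
    return sorted(ideas, key=lambda idea: idea.composite, reverse=True)


# Illustrative scores only.
backlog = [
    ScoredIdea("exciting build", impact=8, confidence=7, ease=5),
    ScoredIdea("boring fix", impact=5, confidence=8, ease=10),
]
ranked = rank_backlog(backlog)  # "boring fix" ranks first
```

Because the average is unweighted, a high Ease score can lift a modest-impact fix above a bigger but harder build, which is the pattern the worked examples point out.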
Bandwidth cap
Pull from the top of the ranked list only as many experiments as the team can actually run this week (typically two to three). Ranking is meaningless without a cap; the cap is what turns the list into a plan.
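A minimal sketch of the cap, assuming the backlog is held as a mapping from hypothesis name to composite score. The plan_week function and the default capacity of three are assumptions for this example; set capacity to what your team can actually run.

```python
def plan_week(backlog, capacity=3):
    """Split a {name: composite} backlog into (running, deferred) lists.

    Only the top `capacity` ideas by composite are pulled into this week's plan;
    everything else stays in the backlog for the next scoring session.
    """
    if capacity < 0:
        raise ValueError("capacity must be zero or more")
    ranked = sorted(backlog.items(), key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ranked]
    return names[:capacity], names[capacity:]
```

Calling plan_week with a capacity of two on a four-item backlog returns the two highest-composite names as the plan and the remaining two as deferred.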
Run rules
Each experiment names one primary metric, one owner, and a minimum sample or duration agreed before launch (for example, 500 exposed users or two full weeks). One metric per experiment keeps the read clean; the pre-agreed sample stops you from calling a result early.
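One way to make the pre-agreed threshold binding is to encode it and check it before anyone reads the result. The sketch below is an assumption-laden illustration: RunRules and ready_to_read are hypothetical names, and the choice to require every threshold that was set (if both a sample and a duration were agreed) is a conservative assumption, not something the run rules specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RunRules:
    primary_metric: str                          # exactly one metric per experiment
    owner: str                                   # exactly one owner
    start: date
    min_exposed_users: Optional[int] = None      # e.g. 500
    min_duration: Optional[timedelta] = None     # e.g. timedelta(weeks=2)

    def __post_init__(self):
        # The rules require a minimum sample or duration agreed before launch.
        if self.min_exposed_users is None and self.min_duration is None:
            raise ValueError("Agree a minimum sample or duration before launch")


def ready_to_read(rules: RunRules, exposed_users: int, today: date) -> bool:
    """Return True only once every agreed threshold has been met (assumed policy)."""
    if rules.min_exposed_users is not None and exposed_users < rules.min_exposed_users:
        return False
    if rules.min_duration is not None and today - rules.start < rules.min_duration:
        return False
    return True
```

With a 500-user minimum, ready_to_read stays False at 499 exposed users regardless of how promising the dashboard looks, which is what stops a result from being called early.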
Retro loop
When an experiment closes, log the outcome (win, loss, or inconclusive) against its original scores and feed the learning back into the Confidence anchors. Over a few cycles, the retro loop is what makes future Confidence scores honest instead of optimistic.
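To see whether Confidence scores are honest, one option is to tabulate the win rate at each Confidence score from the retro log. This is a sketch under assumptions: the log is taken to be a list of (confidence, outcome) pairs, win_rate_by_confidence is a hypothetical name, and inconclusive results are counted as non-wins, which is a judgment call.

```python
from collections import defaultdict

OUTCOMES = {"win", "loss", "inconclusive"}


def win_rate_by_confidence(retro_log):
    """Map each Confidence score to the share of closed experiments that won.

    retro_log is an iterable of (confidence_score, outcome) pairs.
    Inconclusive results count as non-wins here (an assumption).
    """
    tallies = defaultdict(lambda: {"wins": 0, "total": 0})
    for confidence, outcome in retro_log:
        if outcome not in OUTCOMES:
            raise ValueError(f"Unknown outcome: {outcome!r}")
        tallies[confidence]["total"] += 1
        if outcome == "win":
            tallies[confidence]["wins"] += 1
    return {
        confidence: tally["wins"] / tally["total"]
        for confidence, tally in sorted(tallies.items())
    }
```

If ideas scored 8 on Confidence win no more often than ideas scored 4, the anchors are being applied too generously and the next scoring session should tighten them.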
Use the free A/B Test Calculator to work out sample sizes, or let the Growth Engine retainer run the whole experimentation cadence for you.