
PIE and ICE are scoring frameworks that help CRO teams prioritize which tests to run first. Both are better than gut instinct or the CEO's preference. Neither is perfect. The real value of either framework isn't the exact scores—it's the discipline of structured thinking that forces you to articulate why a test is worth running before you run it. This guide explains both frameworks, when each is better, and how to avoid the subjectivity problem that makes scoring unreliable.
PIE was developed by WiderFunnel and is widely used by ecommerce CRO teams. It scores test ideas on three dimensions:
Potential: How much room is there for improvement at this point in the funnel?
A page with a 4% conversion rate and clear UX problems has high potential. A page that's already highly optimized and performing near best-practice benchmarks has low potential.
Data signals for high Potential:
Scoring guide:
Importance: How much traffic does this page or step affect?
Testing your highest-traffic page is more important than testing a niche category page, all else equal. A 5-percentage-point gain in conversion rate on 10,000 monthly visitors generates 500 additional conversions. The same gain on 200 monthly visitors generates 10.
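The traffic arithmetic above can be checked with a short Python sketch. It assumes the lift is an absolute change in conversion rate (percentage points), which is the reading under which 10,000 visitors yield 500 extra conversions. The function names are illustrative, and the 4% baseline used for the relative-lift comparison is an assumed figure.

```python
def extra_conversions_absolute(visitors: int, lift_points: float) -> int:
    """Extra conversions when the conversion rate rises by `lift_points`.

    `lift_points` is an absolute change, so 0.05 means +5 percentage points.
    """
    return round(visitors * lift_points)


def extra_conversions_relative(
    visitors: int, baseline_rate: float, relative_lift: float
) -> int:
    """Extra conversions when the baseline rate improves by a relative factor."""
    return round(visitors * baseline_rate * relative_lift)


high_traffic = extra_conversions_absolute(10_000, 0.05)
niche_page = extra_conversions_absolute(200, 0.05)

# Assumed 4% baseline: a 5% *relative* lift is a much smaller absolute gain.
relative_case = extra_conversions_relative(10_000, 0.04, 0.05)

print(high_traffic, niche_page, relative_case)
```

Under either reading, the gap between the two pages scales directly with traffic, which is what the Importance score is meant to capture.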
Data signals for high Importance:
Scoring guide:
Ease: How easy is this test to design, build, and launch?
A copy change on a product page is a 9. A complete checkout redesign that requires backend changes is a 2. Ease matters because an untested promising hypothesis is worth nothing—it needs to get live to generate learning.
Scoring guide:
PIE Score = (Potential + Importance + Ease) / 3
Run tests in descending PIE score order.
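The formula and ordering rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Idea` class, the 1-10 scale, and the sample ideas and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """A test idea scored on the three PIE dimensions (assumed 1-10 scale)."""
    name: str
    potential: int
    importance: int
    ease: int

    @property
    def pie(self) -> float:
        # PIE Score = (Potential + Importance + Ease) / 3
        return (self.potential + self.importance + self.ease) / 3


# Hypothetical backlog; the scores are made up for illustration.
backlog = [
    Idea("Checkout redesign", potential=8, importance=9, ease=2),
    Idea("Product page copy change", potential=7, importance=8, ease=9),
    Idea("Niche category banner", potential=5, importance=3, ease=8),
]

# Run tests in descending PIE score order.
ranked = sorted(backlog, key=lambda idea: idea.pie, reverse=True)

for idea in ranked:
    print(f"{idea.name}: {idea.pie:.1f}")
```

Because the score is a plain average, a very low Ease can pull a high-Potential idea down the list, as the checkout redesign shows here.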
ICE was popularized by Sean Ellis (of growth hacking fame) and is common in product and growth teams. It scores:
Impact: How much will this change move the key metric if it works?
Similar to PIE's Potential, but Impact often incorporates both the size of the drop-off and the likely magnitude of improvement. A small UX fix might have high Potential (big problem) but low Impact (the fix only helps marginally). Impact asks: if this test wins, how big is the win?
Confidence: How confident are you that this will work?
This is the key differentiator from PIE. Confidence scores your evidence quality:
High Confidence score: Customer research shows 40% of buyers cite the specific issue you're testing. Published case studies show similar tests winning in comparable contexts.
Low Confidence score: "I saw a good-looking competitor doing this" with no supporting data.
Ease: The same as PIE's Ease. How hard is the change to implement?
ICE Score = (Impact + Confidence + Ease) / 3
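As a rough sketch of how Confidence changes the ranking, the Python below scores two ideas that differ only in evidence quality. The `ice_score` helper, the 1-10 range check, and the example scores are assumptions made for illustration.

```python
def ice_score(impact: int, confidence: int, ease: int) -> float:
    """Average of the three ICE dimensions, each assumed to be scored 1-10."""
    for label, value in (
        ("impact", impact),
        ("confidence", confidence),
        ("ease", ease),
    ):
        if not 1 <= value <= 10:
            raise ValueError(f"{label} must be between 1 and 10, got {value}")
    # ICE Score = (Impact + Confidence + Ease) / 3
    return (impact + confidence + ease) / 3


# Same Impact and Ease; only the evidence behind the idea differs.
well_evidenced = ice_score(impact=8, confidence=9, ease=6)    # backed by research
competitor_hunch = ice_score(impact=8, confidence=3, ease=6)  # "a competitor does it"

print(f"{well_evidenced:.1f} vs {competitor_hunch:.1f}")
```

The idea with supporting research scores two full points higher, even though both would look identical if only Impact and Ease were considered.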
| Situation | Use PIE | Use ICE |
|---|---|---|
| Traffic varies a lot across pages | ✓ | |
| You want to reward high-traffic pages | ✓ | |
| Evidence quality varies significantly | | ✓ |
| Product team context (feature testing) | | ✓ |
| Pure CRO / ecommerce context | ✓ | |
| Startup with limited test data | ✓ | |
| Strong research culture | | ✓ |
The practical summary:
Both frameworks suffer from the same weakness: scores are subjective. Two people scoring the same test idea will give different numbers. "This is a 7 for Potential" means different things to different people.
Without calibration, scoring becomes post-hoc justification—people score tests high that they already want to run.
How to reduce subjectivity:
Anchor scores to specific data: Potential of 8 = exit rate above 70% on this page. Importance of 9 = more than 5,000 sessions per month. Define what each score level means before scoring.
Score as a team: Have 2–3 people score each test independently, then discuss discrepancies. The discussion reveals assumptions and forces better rationale.
Review scoring retrospectively: After a test concludes, revisit your original scores. If you gave Potential a 9 but the test showed only a 2% lift, your estimate was too optimistic. Use the gap to recalibrate future scoring.
Separate scoring from politics: Document scores before anyone knows which result would please leadership. If your CEO wants to test a new homepage hero, score that idea honestly against the same criteria as every other idea before you frame the roadmap conversation.
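The team-scoring step above can be sketched as a small Python helper that collects independent scores and flags disagreements worth discussing. The function name, the rater labels, and the two-point spread threshold are hypothetical choices, not part of either framework.

```python
from statistics import median


def review_scores(scores_by_rater: dict[str, int], max_spread: int = 2) -> dict:
    """Summarize independent scores for one dimension of one test idea.

    Flags the idea for discussion when raters disagree by more than
    `max_spread` points (an assumed threshold; tune it for your team).
    """
    values = list(scores_by_rater.values())
    spread = max(values) - min(values)
    return {
        "median": median(values),
        "spread": spread,
        "needs_discussion": spread > max_spread,
    }


# Three people score Potential for the same idea without seeing each other's numbers.
aligned = review_scores({"rater_1": 7, "rater_2": 8, "rater_3": 7})
split = review_scores({"rater_1": 9, "rater_2": 4, "rater_3": 6})

print(aligned)
print(split)
```

Using the median rather than the mean keeps one outlier from dominating the recorded score, while the spread flag makes sure the outlier's reasoning still gets heard.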
Some teams combine elements of PIE and ICE into a four-factor model. RICE (Reach, Impact, Confidence, Effort) is one variant:
RICE Score = (Reach × Impact × Confidence) / Effort
RICE is more precise on Reach because it uses a measured number (visitors per month) rather than a subjective score. It takes more work to calculate but rewards rigor.
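A minimal Python sketch of the RICE formula follows. The units are assumptions: Reach in visitors per month, Impact as a relative multiplier, Confidence as a fraction between 0 and 1, and Effort in person-weeks. The example ideas and their numbers are made up.

```python
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE Score = (Reach x Impact x Confidence) / Effort.

    Assumed units: reach in visitors/month, impact as a multiplier,
    confidence as a fraction in [0, 1], effort in person-weeks.
    """
    if effort <= 0:
        raise ValueError("effort must be positive")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")
    return reach * impact * confidence / effort


# Hypothetical ideas: a high-traffic change versus a niche but cheap one.
homepage_change = rice_score(reach=10_000, impact=2, confidence=0.8, effort=2)
niche_change = rice_score(reach=200, impact=3, confidence=0.5, effort=1)

print(homepage_change, niche_change)
```

Because Reach is multiplied rather than averaged, traffic differences dominate the result far more than they do in PIE or ICE, so RICE scores are only comparable within one backlog that uses the same units.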
A mid-size D2C skincare brand scores 5 potential tests:
Test A: Mobile PDP headline rewrite
Test B: Checkout address autofill
Test C: Product page social proof section reorder
Test D: Homepage trust badges test
Test E: Email-to-landing-page personalization
Priority order: A (8.7) → D (8.0) → C (7.7) → B / E (tied at 6.3)
This ordering might surprise you—the homepage trust badge test ranks second despite lower Potential, because its Importance (highest traffic page) and Ease (simple visual element) compensate.
Once you've scored and prioritized your tests, you need to launch them efficiently. CustomFit.ai's no-code editor means Ease scores shift upward for most ecommerce tests—changes that would have been a 4 (requires development) become an 8 (visual editor handles it).
This matters because Ease scores affect prioritization. When development is removed from the equation, more high-Potential tests become feasible to run in parallel, and your overall test velocity increases.