
From the conversion glossary
Concepts referenced in this article, defined.

A holdout group is a segment of users excluded from all improvements for the duration of an experimentation period, allowing you to measure the true cumulative impact of your entire CRO program, not just individual tests. While standard A/B tests measure whether one change is better than another in isolation, holdout groups answer a more important business question: "Is our entire experimentation program actually generating revenue?" For ecommerce brands running continuous optimization, holdout groups are the most honest proof of program value.
Run 20 A/B tests. Ship all 20 winners. Your conversion rate should be 20x better, right?
Not quite. The problem is interaction effects and cumulative drift.
Interaction effects: When you run multiple simultaneous tests, or ship multiple winners in sequence, the changes can interact with each other in unexpected ways. A new CTA button color that won in isolation might not work as well combined with a new homepage layout that also won in isolation, because buyers' attention patterns have shifted.
Cumulative drift: Over time, your buyer population changes, competitive context changes, and seasonal patterns shift. A winner from January may not represent a true improvement by December. Individual tests don't capture this drift.
Attribution: If you shipped 20 changes and your conversion rate improved 8%, which changes drove the improvement? Some winners add no incremental value after the previous changes. Some changes that lost individually might perform better in combination with others.
A holdout group measures the real, net cumulative impact of everything you've shipped.
Setup:
What you're measuring: The difference between the holdout group's performance and the rest of your users is the true, cumulative impact of all your optimizations.
If the holdout group converts at 2.5% and optimized users convert at 2.9%, your CRO program has generated a genuine 0.4 percentage point improvement (16% relative lift) that wouldn't exist without the program.
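The arithmetic behind that example can be sketched in a few lines of Python. The function name `lift` is illustrative only and does not come from any particular tool.

```python
def lift(holdout_cvr: float, optimized_cvr: float) -> tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift as a fraction)."""
    if holdout_cvr <= 0:
        raise ValueError("holdout_cvr must be positive")
    difference = optimized_cvr - holdout_cvr
    absolute_pp = difference * 100        # conversion rates are fractions, so x100 gives points
    relative = difference / holdout_cvr   # lift relative to the holdout baseline
    return absolute_pp, relative


abs_pp, rel = lift(holdout_cvr=0.025, optimized_cvr=0.029)
print(f"{abs_pp:.1f} percentage points, {rel:.0%} relative lift")
```

Reporting both numbers avoids the common confusion between a 0.4 point change and a 16% change.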
Step 1: Define the holdout period. A period of 30–90 days is typical. Shorter periods may not capture enough data; longer periods mean withholding improvements from users for an extended time.
Step 2: Size the holdout group correctly. Use a power analysis to determine the minimum holdout group size. Rules of thumb:
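As an illustration of the power analysis itself (not a substitute for the rules of thumb), here is a minimal sketch using the standard normal-approximation formula for comparing two proportions with unequal group sizes. The function name, the default `ratio` of 19 (a 5% holdout against 95% optimized traffic), and the defaults of 5% significance and 80% power are all assumptions chosen for the example.

```python
from math import ceil
from statistics import NormalDist


def holdout_size(p_holdout: float, p_optimized: float, ratio: float = 19.0,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of holdout users needed for a two-sided test.

    ratio is optimized-group size divided by holdout-group size
    (19.0 corresponds to a 5% / 95% split).
    """
    if p_holdout == p_optimized:
        raise ValueError("the two conversion rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = (p_holdout * (1 - p_holdout)
                + p_optimized * (1 - p_optimized) / ratio)
    effect = p_optimized - p_holdout
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical inputs: 2.5% baseline, hoping to detect a lift to 2.9%.
print(holdout_size(0.025, 0.029))
```

The smaller the lift you want to detect, the larger the holdout must be, because the required size grows with the inverse square of the difference in conversion rates.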
Step 3: Ensure consistent assignment. Users in the holdout group should be consistently assigned. A user in the holdout on Monday should still be in the holdout on Friday. This requires cookie-based or user-ID-based assignment, not session-based assignment.
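One common way to get consistent assignment is to hash a stable identifier rather than draw a random number on each visit. The sketch below assumes you have such an identifier (a user ID or a long-lived cookie value); the function name, the salt string, and the 5% default are hypothetical.

```python
import hashlib


def in_holdout(user_id: str, salt: str = "holdout-period-1",
               holdout_pct: float = 5.0) -> bool:
    """Deterministically place roughly holdout_pct percent of users in the holdout."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # stable bucket in the range 0..9999
    return bucket < holdout_pct * 100      # 5.0% maps to buckets 0..499


# The same identifier always lands in the same group.
print(in_holdout("user-42") == in_holdout("user-42"))
```

Changing the salt reshuffles membership, which is one way to start a fresh holdout for the next measurement period.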
Step 4: Freeze the holdout group's experience. While your primary users receive all new features and test winners, holdout group users see the baseline experience. This requires either:
Step 5: Monitor health metrics. Check that holdout users have similar characteristics to your general population (traffic source mix, device type, new vs. returning ratio). If the holdout group skews differently, results will be misleading.
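A simple version of this health check compares the share of each category between the two groups. The sketch below is one possible approach; the function names and the 2-point tolerance are assumptions, and the device data is made up for illustration.

```python
from collections import Counter


def share_gaps(holdout: list[str], others: list[str]) -> dict[str, float]:
    """Per-category share in the holdout minus the share among everyone else."""
    if not holdout or not others:
        raise ValueError("both groups need at least one observation")
    h, o = Counter(holdout), Counter(others)
    categories = sorted(set(h) | set(o))
    return {c: h[c] / len(holdout) - o[c] / len(others) for c in categories}


def skewed(gaps: dict[str, float], tolerance: float = 0.02) -> list[str]:
    """Categories whose share differs by more than the (hypothetical) tolerance."""
    return [c for c, gap in gaps.items() if abs(gap) > tolerance]


# Made-up device mix: the holdout is 70% mobile, everyone else 60% mobile.
holdout_devices = ["mobile"] * 70 + ["desktop"] * 30
other_devices = ["mobile"] * 60 + ["desktop"] * 40
print(skewed(share_gaps(holdout_devices, other_devices)))
```

The same comparison can be repeated for traffic source and new vs. returning status.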
Step 6: Analyze at end of period. Compare primary metrics (CVR, AOV, revenue per visitor) between holdout group and rest-of-users. Also compare secondary metrics to ensure no unintended effects.
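For the conversion-rate comparison, one standard option is a two-proportion z-test. The sketch below uses only the Python standard library; the function name and the visitor counts are hypothetical, and other tests (or a Bayesian analysis) may suit your setup better.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_holdout: int, n_holdout: int,
                          conv_optimized: int, n_optimized: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for optimized vs. holdout CVR."""
    p_holdout = conv_holdout / n_holdout
    p_optimized = conv_optimized / n_optimized
    pooled = (conv_holdout + conv_optimized) / (n_holdout + n_optimized)
    se = sqrt(pooled * (1 - pooled) * (1 / n_holdout + 1 / n_optimized))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_optimized - p_holdout) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 250 of 10,000 holdout users and 5,510 of 190,000 optimized users converted.
z, p = two_proportion_z_test(250, 10_000, 5_510, 190_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z means the optimized group converts better than the holdout; the p-value indicates how surprising a gap of that size would be if the program had no effect.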
Proving CRO program ROI: If leadership is questioning whether your experimentation program generates real value, a holdout group is the definitive answer. "Our CRO program generated ₹X in incremental annual revenue, proven by comparing optimized users vs. holdout users" is a compelling business case.
Measuring personalization impact: Personalization programs, which show different content to different segments, are hard to measure with standard A/B tests. A holdout group that sees no personalization, compared to the personalized majority, shows the true lift from personalization.
This is particularly relevant for Indian D2C brands using tools like CustomFit.ai for personalization. The platform's holdout functionality lets you prove the ROI of your personalization program.
Evaluating compounding optimization: When you run 30+ tests per year and ship most winners, holdout groups let you understand whether improvements are truly compounding or whether later changes are cannibalizing earlier wins.
Before major platform changes: If you're considering switching CRO tools or making major technical changes to your site, establish a holdout baseline first. You can then compare post-change performance to the holdout to isolate the impact of the platform change from natural conversion trends.
| Feature | Control Group | Holdout Group |
|---|---|---|
| Exists during | A/B test period | Extended period post-ship |
| Purpose | Compare test variants | Measure cumulative program impact |
| Duration | Test duration (2–4 weeks) | 30–90+ days |
| What they receive | Original version of one element | Original site experience (all elements) |
| When dissolved | When test ends | At end of holdout period |
| Traffic allocation | 50% (standard A/B test) | 5–10% |
The key insight: you can run A/B tests AND maintain holdout groups simultaneously. Your 5% holdout group never sees any winning variant. Your remaining 95% participate in normal A/B testing.
Challenge: Users don't want to be in the holdout. You're withholding improvements from 5–10% of your buyers. These users receive an inferior experience for 30–90 days.
How to handle: Accept this as a cost of measurement. The cost (slightly worse experience for a small segment for a limited period) is worth the value of knowing whether your program generates real returns. Critically, dissolve the holdout group after the measurement period.
Challenge: The holdout group is too small to detect differences. If your store has low traffic, a 5% holdout group may not achieve statistical significance in 30 days.
How to handle: Extend the holdout period or increase holdout percentage temporarily. Accept that holdout groups are more suited to medium/high traffic stores. For very small stores, focus on individual test significance rather than holdout measurement.
Challenge: Holdout users see stale content during seasonal events. If you run a holdout during Diwali, the holdout group sees pre-Diwali content while the rest see your Diwali campaign. This creates an unintended experiment within your experiment.
How to handle: Avoid running holdouts through major seasonal events, or design holdout groups that see seasonal campaign updates but not conversion optimization changes. Be explicit about what the holdout does and doesn't include.
Challenge: Tracking holdout group membership over time. Users who clear cookies or switch devices may exit and re-enter the holdout group inconsistently.
How to handle: Use user-ID-based holdout assignment where possible (requires login). If using anonymous tracking, accept some noise in holdout group membership and correct for it in analysis.
Positive result (optimized > holdout): Your program is working. The delta is the measurable value of your CRO investment. Calculate the annualized revenue impact: delta in CVR × annual traffic × average order value.
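That formula can be sketched directly in code. The conversion rates reuse the 2.5% vs. 2.9% example; the annual visitor count and the average order value of ₹1,500 are made-up figures for illustration, and the function name is hypothetical.

```python
def annualized_revenue_impact(holdout_cvr: float, optimized_cvr: float,
                              annual_visitors: int, average_order_value: float) -> float:
    """Delta in CVR x annual traffic x average order value."""
    delta_cvr = optimized_cvr - holdout_cvr
    return delta_cvr * annual_visitors * average_order_value


# Made-up traffic and order value: 1,000,000 visitors per year at an AOV of 1,500.
impact = annualized_revenue_impact(0.025, 0.029, 1_000_000, 1_500.0)
print(f"Incremental annual revenue: {impact:,.0f}")
```

This simple version assumes average order value is the same in both groups; if the holdout analysis shows AOV also moved, comparing revenue per visitor directly is the safer calculation.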
Null result (no significant difference): Your optimization program hasn't generated cumulative improvement. This is valuable information. Investigate whether: (a) individual test wins were real, (b) negative interactions between changes are canceling out wins, or (c) external factors are masking improvement.
Negative result (holdout > optimized): Some combination of your shipped changes has hurt overall performance. This is the most actionable result: it tells you to review recent changes, identify which may be causing harm, and run reverse experiments to diagnose.
Build holdout group measurement into your annual CRO planning and quarterly CRO reviews. It transforms your CRO reporting from "here are the tests we ran and the individual results" to "here is the business value our program generated."
The cumulative lift number from holdout group analysis becomes the definitive measure of CRO program ROI, and it is more honest than aggregating individual test wins.