How Long Should You Run an A/B Test?

Run your A/B test long enough to collect a statistically valid sample — typically a minimum of 7 days and until you reach your pre-calculated sample size. Stopping early because one variant looks like it's winning is the single most common mistake in A/B testing. The answer depends on your traffic volume, baseline conversion rate, and the minimum detectable effect you care about.

Many D2C brands in India fall into this trap: they see Variant B performing 20% better after two days and declare victory, only to watch conversions revert after rolling out the change. This guide gives you a framework for deciding when your test is truly done.

Why Test Duration Matters More Than Statistical Significance

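Statistical significance is a probability, not a certainty. At 95% confidence, you accept a 5% (1-in-20) chance of declaring a winner when there is no real difference between variants. That guarantee only holds if you evaluate the test once, at the sample size you planned. The earlier and more often you check, the worse the odds get.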

The peeking problem: Every time you check results and consider stopping, you're running an implicit hypothesis test. If you check daily for two weeks and stop at the first "significant" reading, you've run 14 implicit tests, not one. Under standard assumptions this can push your real false-positive rate from the nominal 5% to roughly 20%.

Day-of-week effects: Consumer behavior on weekdays differs from weekends. Indian shoppers buying beauty products on Nykaa or Plum's website behave differently on Saturday evenings versus Tuesday mornings. A test running only 3 days may capture only weekday behavior.

Novelty effects: When you change something on your site, some visitors click it simply because it's new. This inflates early results. Running a test for at least one full business cycle smooths this out.

Seasonality and campaigns: A test launched during a sale or festive season (Diwali, Holi, Raksha Bandhan) captures atypical behavior. Either avoid launching tests during major promotions or explicitly account for this in your analysis.
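The peeking problem described above is easy to underestimate, so here is a rough simulation sketch. It runs A/A tests (both variants identical, so every "winner" is a false positive) and compares stopping at the first significant daily check against evaluating once at the end. The traffic level, conversion rate, function names, and the pooled two-proportion z-test are all assumptions chosen for illustration, not a description of any particular testing tool.

```python
import math
import random
from statistics import NormalDist

def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_tests(n_tests=400, days=14, daily_per_arm=200,
                      cvr=0.05, alpha=0.05, seed=42):
    """Return (share of tests 'won' by daily peeking, share 'won' at the end)."""
    rng = random.Random(seed)
    won_by_peeking = 0
    won_at_end = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        peeked_significant = False
        for _day in range(days):
            n += daily_per_arm
            # Both arms share the same true conversion rate (A/A test).
            conv_a += sum(rng.random() < cvr for _ in range(daily_per_arm))
            conv_b += sum(rng.random() < cvr for _ in range(daily_per_arm))
            p = two_sided_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                peeked_significant = True  # a daily peeker would stop here
        won_by_peeking += peeked_significant
        won_at_end += p < alpha  # p now holds the final-day value
    return won_by_peeking / n_tests, won_at_end / n_tests

peek_rate, final_rate = simulate_aa_tests()
print(f"False positives with daily peeking: {peek_rate:.1%}")
print(f"False positives with one final check: {final_rate:.1%}")
```

With these settings the single final check should stay near the nominal 5%, while the daily-peeking rate typically lands several times higher. Exact figures vary with the seed and the number of simulated tests.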

The Right Formula: How to Calculate Test Duration

Use this process before launching any test:

Step 1: Establish your baseline conversion rate

Pull your current conversion rate for the specific goal you're testing. If you're testing a product page add-to-cart button, your baseline might be 4.2%. Use at least 30 days of historical data.

Step 2: Define your Minimum Detectable Effect (MDE)

This is the smallest improvement worth detecting. If you need at least a 15% relative lift (e.g., 4.2% → 4.83%) to justify the effort, set that as your MDE. Smaller MDEs require much larger samples: halving the MDE roughly quadruples the visitors you need.

Step 3: Set your statistical parameters

Confidence level: 95% (standard)
Statistical power: 80% (standard), meaning an 80% chance of detecting a real effect at least as large as your MDE

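Step 4: Calculate required sample size

For a baseline CVR of 4%, a relative MDE of 15% (4% → 4.6%), 95% confidence, and 80% power, a standard two-sided test of two proportions needs roughly:

Required visitors per variant: ~18,000
Total visitors needed: ~36,000

Step 5: Divide by daily traffic

If your page gets 500 visitors/day, you need about 72 days, or 11 weeks once you round up to full weeks. That is longer than a test should usually run, so at this traffic level you would need to send more traffic to the test or accept a larger MDE. At 2,000 visitors/day the same test takes 18 days, which rounds up to 3 weeks.

Quick reference table (two-sided test, 95% confidence, 80% power, 50/50 split, durations rounded up to full weeks):

Daily Visitors	Baseline CVR	MDE (relative)	Test Duration
200	3%	20%	~20 weeks
500	4%	15%	~11 weeks
1,000	5%	10%	~9 weeks
2,000	5%	10%	~5 weeks
5,000+	5%	10%	~2 weeks or less

Rows that come out well above 6 weeks are not practical as specified. Treat them as a signal to raise the MDE or route more traffic to the test.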
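Steps 4 and 5 can be sketched in a few lines. This is a minimal version using the common normal-approximation formula for a two-sided test of two proportions. The function names are illustrative, and online calculators use slightly different formulas, so expect small differences from whatever tool you use.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_cvr, relative_mde,
                            confidence=0.95, power=0.80):
    """Visitors needed in EACH variant (two-sided test, equal split)."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_mde)  # CVR the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                      # ~0.84 at 80%
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def duration_days(baseline_cvr, relative_mde, daily_visitors, variants=2):
    """Days of traffic needed if all daily visitors are split across variants."""
    total = variants * sample_size_per_variant(baseline_cvr, relative_mde)
    return math.ceil(total / daily_visitors)

n = sample_size_per_variant(0.04, 0.15)
days = duration_days(0.04, 0.15, daily_visitors=500)
print(f"{n:,} visitors per variant, {days} days at 500 visitors/day")
```

For a 4% baseline and a 15% relative MDE this comes out at roughly 18,000 visitors per variant. Because the required sample scales with the inverse square of the MDE, doubling the MDE to 30% cuts it to roughly a quarter.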

Minimum Duration Rules: Always Apply These

Regardless of what your sample size calculator says, follow these non-negotiable minimums:

1. Always run for at least 7 full days

This captures at least one complete weekly cycle. An ayurvedic supplement brand like Kapiva, for example, may see very different conversion patterns on weekdays (research intent) versus weekends (purchase intent).

2. Run through at least one full business cycle

If your brand runs weekly email campaigns, run the test long enough to include two send cycles. If you do UPI cashback promotions every fortnight, include both cycles.

3. Don't run longer than 4–6 weeks

Extended tests get contaminated by seasonality shifts, competitor actions, and user learning effects. If you can't reach your required sample size within 6 weeks, either send more traffic to the test (for example, via paid promotion) or increase your MDE.

4. Pre-commit to your duration before launch

Write it down. "This test runs from March 1–21 and will be evaluated on March 22, regardless of interim results." This prevents peeking-induced bias.
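The four rules above can be folded into a small planning helper. This is a sketch under stated assumptions: the `DurationPlan` and `plan_duration` names are hypothetical, the 42-day cap is the upper end of the 4–6 week guideline, and the visitor numbers and year in the example are made up.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class DurationPlan:
    start: date
    end: date           # last day the test collects traffic
    evaluate_on: date   # the only day results are judged
    weeks: int
    exceeds_cap: bool   # True if the plan runs past the 6-week ceiling

def plan_duration(start, required_total_visitors, daily_visitors,
                  min_days=7, max_days=42):
    raw_days = math.ceil(required_total_visitors / daily_visitors)
    # Rule 1 and 2: never under 7 days, and only whole weeks.
    weeks = math.ceil(max(raw_days, min_days) / 7)
    days = weeks * 7
    end = start + timedelta(days=days - 1)
    # Rule 3: flag plans that run past the cap instead of silently trimming.
    return DurationPlan(start, end, end + timedelta(days=1),
                        weeks, days > max_days)

# Hypothetical test: 40,000 total visitors needed, 2,000 visitors/day.
plan = plan_duration(date(2025, 3, 1),
                     required_total_visitors=40_000, daily_visitors=2_000)
print(f"Run {plan.start} to {plan.end}, evaluate on {plan.evaluate_on}")
```

Here the raw requirement of 20 days rounds up to 3 full weeks, giving a March 1–21 run with evaluation on March 22. When `exceeds_cap` comes back true, revisit the MDE or the traffic allocation before launching rather than starting a test you cannot finish.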

Common Mistakes Indian D2C Brands Make with Test Duration

Stopping during a sale spike: Brands with frequent sale events, such as Mamaearth or mCaffeine, can be tempted to run tests during a sale and declare winners based on inflated sale-period CVRs. The winner often fails after the sale ends because it was optimized for a different customer cohort.

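Ignoring the COD vs prepaid split: Cash on delivery (COD) accounts for a significant share of orders for many Indian ecommerce brands, and COD customers have different purchase patterns and return rates than prepaid customers. If your variant shifts the COD/prepaid ratio, a lift in placed orders may not carry through to delivered, paid orders, so check both.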

Testing on too-narrow segments: Running a test only on mobile visitors but applying results to all devices is a common error. Always segment your results by device and validate before full rollout.

Confusing sessions with visitors: Some analytics tools report sessions, not unique visitors. A single visitor might have 3 sessions. Use unique visitors for sample size calculations.
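To make the sessions-versus-visitors point concrete, here is a tiny sketch that counts both from the same export. The record layout and the `visitor_id` field name are assumptions. Your analytics tool will use its own field names.

```python
def count_unique_visitors(sessions):
    """Count distinct visitor IDs in a list of session records."""
    return len({s["visitor_id"] for s in sessions})

# Hypothetical export: visitor v1 came back three times.
sessions = [
    {"session_id": "s1", "visitor_id": "v1"},
    {"session_id": "s2", "visitor_id": "v1"},
    {"session_id": "s3", "visitor_id": "v1"},
    {"session_id": "s4", "visitor_id": "v2"},
    {"session_id": "s5", "visitor_id": "v3"},
]
print(len(sessions), "sessions,", count_unique_visitors(sessions), "unique visitors")
```

Five sessions here are only three visitors. Feeding the session count into a sample size target would make the test look further along than it is.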

How CustomFit.ai Handles Test Duration

CustomFit.ai runs on your Shopify store and includes a built-in sample size calculator that tells you exactly how long to run each test before you launch. The platform:

Flags if you're about to stop a test prematurely
Shows confidence intervals, not just a "winning" label
Automatically pauses tests when reaching the pre-set sample size
Sends alerts when tests hit 95% confidence

This is especially useful for brands like Bellavita, which achieved an 11% CVR improvement — those results came from tests that ran to statistical completion, not from early winners being called.

Tips / Best Practices

Use a sample size calculator every single time — don't guess. Tools like Evan Miller's calculator or CustomFit.ai's built-in tool take 2 minutes.
Write your test plan before launch — include start date, end date, sample size target, and what "winning" means in absolute numbers, not just percentages.
Never peek at results and adjust the duration — if you extend a test because the current variant is losing, you've invalidated the test.
Run one full festive cycle minimum for seasonal businesses — for brands selling during Diwali, Holi, or Valentine's Day, test during the season and validate outside it too.
Split traffic 50/50 unless you have strong reasons not to — unequal splits require larger total sample sizes and increase test duration.
Document novelty effects — for major redesigns, watch your data for a novelty spike in the first 3–5 days and weight later data more heavily.
Validate on a holdout group — after declaring a winner, roll out to 80% of traffic and keep 20% on control for 1 week to confirm the lift holds.
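The cost of an unequal split mentioned in the tips above can be estimated with a one-line formula. This sketch assumes both variants have roughly the same variance, which is reasonable when their conversion rates are close. The function name is illustrative.

```python
def split_inflation(control_share):
    """Factor by which TOTAL sample size grows versus a 50/50 split.

    Assumes two variants with roughly equal variance. The variance of the
    difference scales with 1 / (share * (1 - share)), which is smallest
    at share = 0.5.
    """
    if not 0 < control_share < 1:
        raise ValueError("control_share must be strictly between 0 and 1")
    return 1 / (4 * control_share * (1 - control_share))

for share in (0.5, 0.7, 0.8, 0.9):
    print(f"{share:.0%} / {1 - share:.0%} split: "
          f"{split_inflation(share):.2f}x the visitors of a 50/50 test")
```

An 80/20 split needs about 1.56 times the total visitors of a 50/50 test for the same power, and a 90/10 split needs nearly 2.8 times, which translates directly into a longer test.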

Key Takeaways

Never stop an A/B test just because one variant looks like it's winning — always complete your pre-determined sample size
Run a minimum of 7 days to capture weekly traffic variation, even on high-traffic sites
Calculate required sample size before launching using your baseline CVR, MDE, confidence level, and power
Indian D2C brands must account for COD behavior, festive season effects, and weekly promotional cycles when setting test duration
Cap tests at 4–6 weeks maximum to avoid seasonal contamination
Pre-commit to your test end date before launch to eliminate peeking bias