Sequential Testing in A/B Testing

Sequential testing is a statistical approach that allows you to monitor A/B test results continuously and make valid decisions before reaching a fixed sample size — without inflating false positive rates. Unlike standard fixed-horizon testing (which requires you to pre-specify a sample size and not peek at results), sequential testing adjusts significance thresholds as data accumulates, enabling earlier decisions on clear winners or losers. It's particularly valuable for time-sensitive ecommerce tests and low-traffic stores where reaching fixed sample sizes takes too long.

The Problem with "Peeking" at Test Results

The most common A/B testing mistake is checking results before the test reaches its target sample size — and stopping the test when you see a significant result. This is called "peeking" and it dramatically inflates false positive rates.

Here's why: if you check a test every day for 30 days and stop whenever you see p < 0.05, your actual false positive rate is much higher than 5%. You're essentially running 30 hypothesis tests (one per day) but only reporting the favorable one.

Studies show that peeking and stopping early can inflate false positive rates to 25–30%. In other words, when a variant is truly no better than the control, you will still declare it a "winner" in roughly one test out of every three or four.
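The inflation is easy to reproduce. The sketch below simulates A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first daily peek with p < 0.05 against a single analysis on day 30. The 10% conversion rate, 100 visitors per arm per day, and 500 simulated tests are illustrative assumptions, and the two-proportion z-test stands in for whatever fixed-horizon statistic your tool reports.

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using the pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to test
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_test(rng, days=30, daily_visitors=100, rate=0.10, alpha=0.05):
    """Run one A/A test.

    Returns (significant_on_any_daily_peek, significant_on_final_day).
    """
    conv_a = conv_b = n_a = n_b = 0
    significant_on_a_peek = False
    p = 1.0
    for _ in range(days):
        conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
        conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
        n_a += daily_visitors
        n_b += daily_visitors
        p = two_sided_p_value(conv_a, n_a, conv_b, n_b)
        if p < alpha:
            significant_on_a_peek = True  # a peeker would have stopped here
    return significant_on_a_peek, p < alpha


rng = random.Random(42)
runs = 500
results = [simulate_aa_test(rng) for _ in range(runs)]
peeking_fpr = sum(peek for peek, _ in results) / runs
fixed_fpr = sum(final for _, final in results) / runs
print(f"False positive rate with daily peeking: {peeking_fpr:.1%}")
print(f"False positive rate with one final look: {fixed_fpr:.1%}")
```

The single final look should land near the nominal 5%, while the peeking rate comes out several times higher. The exact figures depend on the seed and on the assumed traffic.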

The conventional solution: Don't peek. Pre-specify your sample size, collect all the data, then analyze once. This is statistically rigorous but often impractical:

A test might take 3 months to reach the pre-specified sample size
A clearly terrible variant continues running, hurting conversion
Business urgency (upcoming festive campaign, product launch) demands faster decisions

Sequential testing provides a statistically valid alternative.

How Sequential Testing Works

Sequential testing uses methods that adjust the significance threshold over time to account for multiple looks at the data. The key approaches:

Sequential Probability Ratio Test (SPRT)

The original sequential testing method, published by Abraham Wald in 1945. At each observation (or batch of observations), it updates a likelihood ratio and compares it against two thresholds to decide whether to:

Conclude the variant is better
Conclude the variant is worse
Continue collecting data (the result is still uncertain)

SPRT is theoretically elegant but sensitive to distributional assumptions and pre-specified effect sizes.
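A minimal sketch of the idea, assuming the simplest setting: a single stream of conversion outcomes tested against a known baseline rate `p0`, with `p1` as the one pre-specified alternative. The function name, the default error rates (`alpha=0.05`, `beta=0.20`), and the one-sample framing are assumptions made for illustration. The thresholds are Wald's standard approximations.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.20):
    """Wald's SPRT for H0: rate = p0 versus H1: rate = p1.

    observations: iterable of booleans (True = converted).
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    log_ratio = 0.0
    used = 0
    for used, converted in enumerate(observations, start=1):
        if converted:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= upper:
            return "accept_h1", used
        if log_ratio <= lower:
            return "accept_h0", used
    return "continue", used  # still between the thresholds


# Hypothetical stream: baseline 10% conversion, alternative 30%
print(sprt_bernoulli([True, True, True], p0=0.10, p1=0.30))
print(sprt_bernoulli([False] * 20, p0=0.10, p1=0.30))
```

The dependence on `p1` is visible here: every increment to the log ratio is computed from the pre-specified alternative, which is why a poorly chosen effect size makes the test inefficient.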

mSPRT (Mixture Sequential Probability Ratio Test)

A modern enhancement of SPRT that mixes over possible effect sizes rather than requiring a single pre-specified effect size. mSPRT is used by companies like Airbnb, Booking.com, and Netflix for continuous monitoring of their A/B tests.

The math: rather than testing against one specific alternative (e.g., "the variant is exactly 10% better"), mSPRT tests against a weighted mixture of possible effect sizes — making it more robust to uncertainty about the expected effect.
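One common way to make this concrete is a normal mixture. The sketch below assumes each observed difference between B and A is approximately Normal with known per-observation `variance`, and mixes the true effect over a Normal(0, `tau_sq`) distribution. Under those assumptions the mixture likelihood ratio has a closed form, and the running minimum of its reciprocal behaves as an always-valid p-value. The function names and the known-variance simplification are assumptions for illustration, not a description of any vendor's implementation.

```python
import math


def msprt_log_ratio(mean_diff, n, variance, tau_sq):
    """Log mixture likelihood ratio against H0: true difference = 0.

    Assumes differences are Normal(theta, variance) and theta is mixed
    over Normal(0, tau_sq).
    """
    shrink = variance / (variance + n * tau_sq)
    exponent = (n ** 2) * tau_sq * mean_diff ** 2 / (
        2 * variance * (variance + n * tau_sq)
    )
    return 0.5 * math.log(shrink) + exponent


def always_valid_p_values(diffs, variance, tau_sq):
    """Return the always-valid p-value after each observation.

    The p-value is the running minimum of 1 / likelihood_ratio, capped at 1,
    so it can only go down as evidence accumulates.
    """
    p_values = []
    running_sum = 0.0
    p = 1.0
    for n, diff in enumerate(diffs, start=1):
        running_sum += diff
        log_ratio = msprt_log_ratio(running_sum / n, n, variance, tau_sq)
        p = min(p, math.exp(-log_ratio))
        p_values.append(p)
    return p_values


# Hypothetical data: 50 observed differences, each exactly 1.0
p_path = always_valid_p_values([1.0] * 50, variance=1.0, tau_sq=1.0)
print(p_path[0], p_path[-1])
```

Because the p-value never increases, you can stop the first time it drops below your significance level without the inflation that peeking causes in a fixed-horizon test. The choice of `tau_sq` affects how quickly the test reacts to effects of different sizes, not its validity.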

Bayesian Sequential Testing

Bayesian approaches don't use p-values at all. Instead, they calculate the probability that variant B is better than variant A, expressed as "there's an 83% chance B is better." This probability updates continuously as data accumulates.

Bayesian sequential testing has no fixed-horizon requirement by design. You can check results at any time and get a meaningful, interpretable probability statement — though you should still define decision thresholds (e.g., "stop when we're 95% confident B is better, or 95% confident it's not").
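As a rough sketch of how such a probability can be computed for conversion rates, the example below assumes a Beta-Binomial model with a uniform Beta(1, 1) prior on each arm and estimates P(B > A) by sampling from the two posteriors. The function name, the prior, the visitor counts, and the number of draws are all illustrative assumptions; a given platform's engine may use different priors or metrics.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


def decide(probability, threshold=0.95):
    """Apply a pre-specified, symmetric decision threshold."""
    if probability >= threshold:
        return "B is better"
    if probability <= 1 - threshold:
        return "B is not better"
    return "keep collecting data"


# Hypothetical counts: 1,000 visitors per arm
prob = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"P(B > A) = {prob:.3f} -> {decide(prob)}")
```

The probability can be recomputed whenever new data arrives. The threshold passed to `decide` is what keeps the stopping rule from drifting with each look.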

CustomFit.ai's statistical engine uses Bayesian methods that allow continuous monitoring without peeking inflation — well-suited to the smaller sample sizes of typical D2C ecommerce stores.

Sequential Testing vs Fixed-Horizon Testing

Factor	Fixed-Horizon	Sequential Testing
Sample size	Pre-specified	Flexible, test ends when conclusion reached
Can peek at results?	No (inflates false positives)	Yes (by design)
Stops early?	Never	Yes, when a conclusion is reached
Statistical guarantee	Type I error rate = α	Type I error rate ≤ α across all peeks
Complexity	Simpler	More complex (requires correct method)
Best for	Planned experiments with fixed timelines	Ongoing experimentation, time-sensitive tests

When Sequential Testing Adds Value for Ecommerce

Festive season campaigns: During Diwali, Republic Day, or Holi campaigns, you may only have 7–10 days to run a test before the campaign ends. Sequential testing allows you to identify a clear winner faster — or stop a clearly losing variant before it costs you more festive traffic.

Product launches: A new product page launching on a specific date needs to be optimized quickly. Sequential testing's early-stopping capability lets you identify the better variant within the launch window.

Budget-constrained traffic: If you're spending ₹50,000/day on paid traffic to test a new landing page, keeping a clear loser running for 4 more weeks while a fixed-horizon test reaches 5% significance costs ₹14 lakh in wasted spend (₹50,000 × 28 days). Sequential testing stops the loser faster.
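The arithmetic behind that figure, as a small sketch. The 7-day early stop is a hypothetical comparison point, not a promise of how quickly a sequential test will conclude, and the calculation follows the example above in counting all of the paid traffic as spend on the test.

```python
def spend_on_test(daily_spend_inr, days):
    """Total paid-traffic spend (in rupees) while the test keeps running."""
    return daily_spend_inr * days


full_run = spend_on_test(50_000, 28)    # 4 more weeks on the losing page
early_stop = spend_on_test(50_000, 7)   # hypothetical: loser stopped after 1 week
saved = full_run - early_stop
print(f"Full run: ₹{full_run:,}  Early stop: ₹{early_stop:,}  Saved: ₹{saved:,}")
```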

Low-traffic stores: Sequential methods that allow earlier decisions (when the probability threshold is met rather than a fixed sample size) are particularly useful for stores where reaching a pre-specified fixed sample takes months.

Implementing Sequential Testing

Most major testing platforms are moving toward always-valid inference:

Statsig: Uses sequential testing methods by default — you can check results at any time without inflating false positives.

Optimizely: Stats Engine uses sequential testing with false discovery rate control, so results stay valid under continuous monitoring.

VWO: Bayesian engine allows continuous monitoring without fixed-horizon requirements.

CustomFit.ai: Uses Bayesian methods that provide always-valid probability statements.

For teams running analysis in Python or R, libraries like always_valid (Python) and the OptimalDesign package (R) implement mSPRT and related methods.

Common Mistakes with Sequential Testing

Using sequential testing as an excuse to peek without controls. Sequential testing is not permission to peek at standard (fixed-horizon) A/B test results. If your testing tool uses fixed-horizon statistics, peeking still inflates false positives regardless of how you frame it. Sequential testing requires a correctly implemented sequential method — not just checking results more often.

Confusing "95% probability B is better" with 95% confidence. Bayesian probability statements are not the same as frequentist confidence levels. A "90% probability B is better" from a Bayesian sequential test doesn't mean the result would be significant at the 10% level in a frequentist test. Interpret each approach on its own terms.
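To see that the two numbers answer different questions, the sketch below computes both on the same made-up data (100 of 1,000 conversions for A, 120 of 1,000 for B). The two-proportion z-test and the Beta(1, 1) priors are assumptions chosen for simplicity. On this data the posterior probability that B is better comes out above 90%, while the two-sided p-value stays above 0.10.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Frequentist two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def posterior_prob_b_better(conv_a, n_a, conv_b, n_b, draws=20_000, seed=1):
    """Bayesian P(rate_B > rate_A) under Beta(1, 1) priors, by sampling."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        > rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    return wins / draws


p_value = z_test_p_value(100, 1000, 120, 1000)
posterior = posterior_prob_b_better(100, 1000, 120, 1000)
print(f"Frequentist two-sided p-value: {p_value:.3f}")
print(f"Bayesian P(B > A):             {posterior:.3f}")
```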

Stopping too early without a decision threshold. Sequential testing still requires pre-specified decision thresholds — "we'll call a winner when we're 95% confident" or "we'll stop at 10,000 visitors if no clear winner emerges." Without thresholds, teams stop tests based on impatience rather than statistical logic.

Tips and Best Practices

Use sequential testing for operational flexibility, not as a shortcut. The value of sequential testing is that it allows valid decisions when you genuinely need to make them early — not that it makes tests faster by default. Many sequential tests will still run to full sample.

Pair sequential testing with minimum test duration rules. Even with sequential methods, respect a minimum run time of 7–14 days to capture weekly behavioral variation. A test that reaches significance in 3 days may be capturing a novelty effect or weekend traffic anomaly, not a real behavioral difference.

Understand the guarantee your method provides. Different sequential testing methods make different guarantees. mSPRT guarantees false positive rate control across all sample sizes. Bayesian methods provide probability statements that require interpretation. Know what your tool is calculating before you act on it.

Document your sequential testing decision thresholds before the test starts. Pre-register your decision rules: "We'll declare a winner at 95% posterior probability B is better. We'll stop the test at 30,000 visitors if no conclusion is reached." This prevents motivated reasoning from driving early termination.
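One way to make pre-registered rules hard to bend is to write them down as data before launch and evaluate every look against them. The sketch below is a hypothetical structure, not a feature of any particular tool: the `DecisionRule` fields mirror the example thresholds above (95% posterior probability, a 30,000-visitor cap), and `min_days` is an added assumption reflecting a minimum run time of 7 days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRule:
    """Decision thresholds fixed before the test starts."""
    win_probability: float = 0.95   # posterior P(B > A) needed to call a winner
    max_visitors: int = 30_000      # stop here if no conclusion is reached
    min_days: int = 7               # cover at least one full weekly cycle


def evaluate(rule, prob_b_better, visitors, days_running):
    """Return the action the pre-registered rule prescribes at this look."""
    if days_running < rule.min_days:
        return "keep_running"
    if prob_b_better >= rule.win_probability:
        return "declare_b_winner"
    if prob_b_better <= 1 - rule.win_probability:
        return "declare_a_winner"
    if visitors >= rule.max_visitors:
        return "stop_inconclusive"
    return "keep_running"


rule = DecisionRule()
print(evaluate(rule, prob_b_better=0.99, visitors=5_000, days_running=3))
print(evaluate(rule, prob_b_better=0.97, visitors=12_000, days_running=9))
print(evaluate(rule, prob_b_better=0.70, visitors=30_000, days_running=21))
```

Freezing the dataclass is a deliberate choice: the thresholds cannot be edited in place once the test is running, so changing them requires creating a new rule, which is an explicit and reviewable step.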

Key Takeaways

Sequential testing allows continuous monitoring of A/B test results without inflating false positive rates — unlike peeking at standard fixed-horizon tests.
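The most common implementations are mSPRT (frequentist) and Bayesian sequential testing. Both support continuous monitoring, but they make different guarantees: mSPRT controls the false positive rate across all looks, while Bayesian methods report posterior probabilities that need their own interpretation.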
Sequential testing is most valuable for time-sensitive ecommerce tests (festive campaigns, product launches) and for stopping clearly losing variants faster.
Peeking at standard A/B test results without sequential methods inflates false positive rates to 25–30% — don't do it.
CustomFit.ai uses Bayesian statistical methods that allow continuous monitoring without fixed-sample-size requirements.
Always set decision thresholds before the test starts — sequential testing doesn't mean stopping whenever the number looks good.
