A t-test is a statistical test that compares the means of two groups to determine whether the difference between them is statistically significant. In A/B testing, t-tests are most appropriate for continuous metrics — average order value, revenue per visitor, time on page, or items per cart — where you are comparing averages rather than proportions. The t-test accounts for the variability within each group (the spread of individual order values) when calculating whether the mean difference is meaningful or just noise.
For an independent samples t-test (the type used in A/B testing):
t = (Mean₁ − Mean₂) / √[(s₁²/n₁) + (s₂²/n₂)]
Where:
- Mean₁, Mean₂ = sample means for control and variant
- s₁², s₂² = sample variances
- n₁, n₂ = sample sizes
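The t-statistic is converted to a p-value using a t-distribution. For the unequal-variance form shown above, the degrees of freedom come from the Welch-Satterthwaite approximation; with large groups of similar size and spread, this is close to n₁ + n₂ − 2. If p < 0.05, the difference in means is statistically significant at the 5% significance level (often described as 95% confidence).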
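The calculation can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a replacement for a statistics package: the function name welch_t and the order values are hypothetical, and the degrees of freedom use the Welch-Satterthwaite approximation, which approaches n₁ + n₂ − 2 when the groups are large and similar.

```python
import math
import statistics


def welch_t(control, variant):
    """Return (t, df) for two independent samples (hypothetical helper)."""
    n1, n2 = len(control), len(variant)
    mean1, mean2 = statistics.fmean(control), statistics.fmean(variant)
    # statistics.variance is the sample variance (divides by n - 1).
    var1, var2 = statistics.variance(control), statistics.variance(variant)

    # Squared standard error contributed by each group: s^2 / n.
    se1, se2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(se1 + se2)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical order values, for illustration only.
control_orders = [40.0, 52.0, 47.0, 61.0, 38.0, 55.0]
variant_orders = [49.0, 63.0, 58.0, 71.0, 44.0, 66.0]
t_stat, dof = welch_t(control_orders, variant_orders)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Because the formula subtracts the variant mean from the control mean, a negative t here means the variant's average is higher.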
Why T-Tests Matter for Ecommerce
Conversion rate tests use chi-square or z-tests (binary outcomes). But when your experiment's success metric is average order value, cart size, or revenue per visitor, those tests don't apply; you need a t-test. AOV data in ecommerce is notoriously noisy: most orders cluster around a typical value, but a small number of high-value orders skew the mean. The t-test incorporates that variance directly into the significance calculation: a few very large orders inflate the standard error, which makes the test slower to declare a winner on the strength of those orders alone. It is not immune to outliers, though, so heavily skewed revenue data still deserves the checks described later in this article.
Real-World Example
Sugar Cosmetics ran a bundle recommendation experiment where the variant showed "Frequently Bought Together" sets on the cart page. Their primary metric was AOV. Control AOV: ₹1,140 (standard deviation ₹680, n = 8,200). Variant AOV: ₹1,290 (SD ₹710, n = 8,400). Applying the two-sample formula above to these figures gives a standard error of about ₹10.8 and t ≈ 13.9 (variant minus control), p < 0.001, a highly significant result. The high SD (large spread in order values) was expected for a cosmetics brand with a wide price range; the t-test accounts for this variance through the standard error, and the ₹150 lift was still far larger than sampling noise would explain.
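A reported result like this can be sanity-checked by recomputing the statistic from the summary figures alone. The sketch below plugs the quoted means, standard deviations, and sample sizes into the formula from earlier. The helper name t_from_summary is hypothetical, and the p-value and confidence interval rely on a normal approximation, which is an assumption that is reasonable only because each group has thousands of observations.

```python
import math
from statistics import NormalDist


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, standard error) for mean_b - mean_a from summary stats."""
    std_err = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / std_err, std_err


# Figures quoted in the example: control first, then variant (in rupees).
t_stat, std_err = t_from_summary(1140, 680, 8200, 1290, 710, 8400)

# Two-sided p-value under a normal approximation to the t-distribution.
# erfc keeps precision for very small tail probabilities.
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

# Approximate 95% confidence interval for the AOV lift.
z = NormalDist().inv_cdf(0.975)
lift = 1290 - 1140
ci = (lift - z * std_err, lift + z * std_err)

print(f"t = {t_stat:.2f}, p ~ {p_value:.3g}")
print(f"95% CI for lift: {ci[0]:.0f} to {ci[1]:.0f}")
```

Reporting the confidence interval alongside the p-value shows how large the lift plausibly is, not just whether it differs from zero.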
When to Use a T-Test vs. Other Tests
- Use t-test for continuous metrics: AOV, RPV, session duration, items per cart.
- Use chi-square or z-test for binary metrics: conversion rate, click-through rate, form completion rate.
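- Consider the Mann-Whitney U test (a rank-based, non-parametric alternative) when AOV or revenue data is heavily skewed. It is less sensitive to extreme values, but it compares the overall distributions rather than the means, so it answers a slightly different question than a t-test.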
- Use Welch's t-test when the two groups may have unequal variances. It is the default in R's t.test(), but not in every library, so check your tool's settings.
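To make the rank-based alternative concrete, here is a minimal standard-library sketch of the Mann-Whitney U test. It assumes samples large enough for the normal approximation and omits the continuity correction, so treat it as an illustration of the idea; the function name and the order values are hypothetical, and a maintained implementation such as scipy.stats.mannwhitneyu is the safer choice in practice.

```python
import math
from collections import Counter


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value via normal approximation)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = sorted(list(a) + list(b))

    # Give each distinct value the average of the ranks it occupies.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_a = sum(ranks[x] for x in a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2

    # Variance of U, with the standard correction for tied values.
    tie_term = sum(c ** 3 - c for c in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a shift
        return u_a, 1.0

    z = (u_a - n1 * n2 / 2) / math.sqrt(var_u)
    return u_a, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical skewed order values, each group with one very large order.
control = [12.0, 15.0, 14.0, 13.0, 250.0, 16.0, 11.0, 15.0]
variant = [18.0, 21.0, 19.0, 17.0, 22.0, 20.0, 300.0, 23.0]
u_stat, p_value = mann_whitney_u(control, variant)
print(f"U = {u_stat}, p ~ {p_value:.3g}")
```

Because only ranks enter the calculation, the single very large order in each group counts as "the largest value" rather than pulling the result by its full amount.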
How to Apply T-Tests Correctly
- Check that your sample sizes are sufficient. The common rule of thumb is 30+ per group, but heavily skewed revenue data often needs far more (hundreds or thousands per group) before the t-test is reliable.
- Look for and cap extreme outliers in revenue data before running the test, or consider a log transformation.
- Treat AOV as a secondary metric in most tests; use revenue per visitor as the primary metric since it includes both conversion rate and AOV effects.
- Use two-tailed t-tests unless you have a strong prior justification for a one-tailed test.
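The outlier and revenue-per-visitor advice above can be sketched as follows. The data, the 95th-percentile cutoff, and the helper name cap_threshold are all assumptions chosen for illustration; the right cap depends on your own revenue distribution. One design choice worth noting is that the cap is derived from the pooled data and applied to both groups, so control and variant are treated by the same rule.

```python
import math
import statistics


def cap_threshold(values, pct=99):
    """Return the pct-th percentile of values, for use as an upper cap."""
    # quantiles(n=100) yields 99 cut points; index pct - 1 is the pct-th.
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


# Hypothetical per-visitor revenue; zeros are visitors who did not buy.
control = [0, 0, 0, 25.0, 30.0, 0, 42.0, 0, 28.0, 900.0]
variant = [0, 0, 31.0, 0, 35.0, 0, 47.0, 0, 33.0, 39.0]

# One cap from the pooled data, applied identically to both groups.
cap = cap_threshold(control + variant, pct=95)
control_capped = [min(v, cap) for v in control]
variant_capped = [min(v, cap) for v in variant]

# Revenue per visitor averages over every visitor, including non-buyers,
# so it reflects both conversion rate and order value.
rpv_control = statistics.fmean(control_capped)
rpv_variant = statistics.fmean(variant_capped)

# Alternative to capping: a log transform. log1p copes with the
# zero-revenue visitors that a plain log cannot.
control_logged = [math.log1p(v) for v in control]

print(f"cap = {cap:.2f}")
print(f"RPV control = {rpv_control:.2f}, variant = {rpv_variant:.2f}")
```

The capped or transformed values can then be passed to the t-test in place of the raw revenue figures.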
T-Test in A/B Testing
Advanced testing platforms allow you to specify revenue or AOV as an experiment metric and will typically apply a t-test automatically. In Python, scipy.stats.ttest_ind runs an independent samples t-test (pass equal_var=False for Welch's version); in R, t.test() does the same and uses Welch's version by default. Understanding which test your platform uses for which metric type helps you interpret results accurately.