What Is Multi-Armed Bandit? Definition & Guide

A multi-armed bandit (MAB) is an adaptive experimentation method that explores multiple variants while exploiting the best performer, shifting traffic allocation dynamically as results come in. The name comes from the analogy of a gambler facing a row of slot machines ("one-armed bandits") with unknown payoff probabilities: the goal is to maximise total reward by learning which machine pays best while still occasionally trying the others to avoid missing a better option. Unlike a fixed A/B test, which holds its traffic split constant (typically 50/50 for two variants) until a winner is declared, a MAB continuously updates traffic weights based on observed performance.

Why Multi-Armed Bandit Matters for Ecommerce

Traditional A/B testing has a known opportunity cost: in a 50/50 test, half of all traffic goes to a variant that may be significantly worse than control, and it keeps going there for the whole duration of the test. For high-traffic, high-stakes pages like a product listing page or a flash sale landing page, this "regret" (the revenue lost by showing visitors an inferior variant) can be substantial.
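
To make the idea of regret concrete, here is a minimal Python sketch of the arithmetic. The function name, the conversion rates, and the visitor count are all hypothetical and chosen only for illustration; in a live test the true rates are unknown, which is exactly why a bandit has to estimate them as it goes.

```python
def expected_regret(conversion_rates, allocation, visitors, value_per_conversion=1.0):
    """Expected value lost versus sending every visitor to the best variant."""
    if abs(sum(allocation) - 1.0) > 1e-9:
        raise ValueError("allocation must sum to 1")
    best = max(conversion_rates)
    # Each visitor sent to a variant costs (best - rate) conversions on average.
    lost_conversions = sum(
        share * visitors * (best - rate)
        for rate, share in zip(conversion_rates, allocation)
    )
    return lost_conversions * value_per_conversion


# Hypothetical inputs: control converts at 3.0%, the variant at 2.4%.
rates = [0.030, 0.024]
fixed_split = expected_regret(rates, [0.5, 0.5], visitors=100_000)
shifted = expected_regret(rates, [0.8, 0.2], visitors=100_000)
print(round(fixed_split), round(shifted))
```

Under these assumed numbers the 50/50 split gives up roughly 300 conversions, while an 80/20 split in favour of the better variant gives up roughly 120. Pass an average order value as value_per_conversion to express the same loss as revenue.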

MAB reduces this regret by sending progressively more traffic to the better-performing variant as evidence accumulates. If Variant B is clearly outperforming control after 2 days, a bandit algorithm might automatically shift to 70% Variant B / 30% control, capturing more revenue from the leader while still gathering data on the other variant. This is particularly valuable during short, high-intensity events like festive sales, where the cost of running an inferior experience for 7+ days at a fixed traffic share is high.
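
One way such a shift can arise is probability matching with Beta posteriors, the idea behind Thompson Sampling. The sketch below is an illustration under stated assumptions, not any particular tool's implementation: the function name thompson_weights, the uniform Beta(1, 1) prior, and the conversion counts are all hypothetical.

```python
import random


def thompson_weights(successes, failures, draws=10_000, seed=0):
    """Estimate the traffic share each variant would get under probability matching.

    Each variant's conversion rate gets a Beta(1 + successes, 1 + failures)
    posterior (uniform prior). A variant's weight is the fraction of posterior
    draws in which it has the highest sampled rate.
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        samples = [
            rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
        ]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical counts for [control, Variant B]: conversions, then non-conversions.
early = thompson_weights(successes=[3, 5], failures=[97, 95])
later = thompson_weights(successes=[30, 45], failures=[970, 955])
print(early, later)
```

With only 100 visitors per variant the weights stay fairly close together; after 1,000 visitors per variant at similar observed rates, most of the weight moves to Variant B. The allocation follows the strength of the evidence, not just the size of the observed gap.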

For Indian D2C brands with a mix of bestselling SKUs and experimental products, MAB is well-suited to recommendation systems and dynamic content personalisation — continuously optimising which product or offer each user sees.

Real-World Example

Pilgrim (Indian skincare brand) runs a multi-armed bandit test on their "Recommended for You" section on the cart page, testing three different recommendation algorithms. Control shows bestsellers; Variant B shows recently viewed items; Variant C shows "frequently bought together" bundles. The MAB algorithm starts with roughly a third of traffic on each. Within 48 hours, Variant C (bundle recommendations) shows an average order value (AOV) lift of ₹180. The algorithm shifts traffic to 60% Variant C, 25% control, 15% Variant B. By Day 7, Variant C receives 80% of traffic and is generating an estimated additional ₹2.1 lakh/month in bundle revenue. The team ships Variant C as the default.

How to Improve / Optimize Multi-Armed Bandit

Choose MAB for optimisation, A/B testing for learning. MAB excels at finding and exploiting a winner fast. Traditional A/B tests are better when you need clean statistical evidence of whether, and by how much, a variant won, because a fixed split gives unbiased comparisons that adaptive allocation can distort. Use MAB when you care more about revenue during the test than about clean data.
Set appropriate exploration rates. Algorithms like Epsilon-Greedy reserve a fixed fraction of traffic (epsilon) for exploration, assigning those visitors to a variant at random regardless of current performance. Too low and you may miss a better variant that starts slow; too high and you sacrifice exploitation gains.
Use Thompson Sampling for better exploration-exploitation balance. Thompson Sampling serves each variant in proportion to the probability that it is the best one given the data so far, so exploration tapers off on its own as evidence builds. It is typically more sample-efficient than simpler bandit algorithms and handles multiple variants gracefully, which makes it a common default for ecommerce applications.
Be cautious with MAB on long-tail pages. Low-traffic pages provide sparse reward signals, so the bandit can't learn fast enough to outperform a standard A/B test. Set a minimum volume threshold (at least 200–300 conversions/day in total across all variants) before using MAB.
Monitor for concept drift. If user behaviour changes (a seasonal shift, a viral social media post), the bandit's learned allocation may become stale. Schedule periodic resets, or monitor variant performance for sudden reversals.
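
The exploration-rate and drift tips above can be sketched with a small epsilon-greedy bandit. This is a minimal illustration, not a production implementation: the class name, the default epsilon of 0.1, the optional step_size parameter (a fixed step size that weights recent rewards more heavily, used here as an alternative to hard resets), and the simulated conversion rates are all assumptions.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit with optional recency weighting (illustrative)."""

    def __init__(self, n_variants, epsilon=0.1, step_size=None, seed=None):
        self.epsilon = epsilon
        self.step_size = step_size  # None means a plain running average
        self.counts = [0] * n_variants
        self.values = [0.0] * n_variants  # estimated reward per variant
        self.rng = random.Random(seed)

    def choose(self):
        # Explore: with probability epsilon, pick any variant uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Exploit: otherwise pick the variant with the highest current estimate.
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, variant, reward):
        self.counts[variant] += 1
        # A fixed step size favours recent rewards, so estimates can follow
        # drift; a step of 1/n reproduces the ordinary sample mean.
        if self.step_size is None:
            alpha = 1.0 / self.counts[variant]
        else:
            alpha = self.step_size
        self.values[variant] += alpha * (reward - self.values[variant])


# Simulated traffic with hypothetical conversion rates the bandit cannot see.
true_rates = [0.03, 0.05, 0.04]
bandit = EpsilonGreedy(n_variants=3, epsilon=0.1, seed=42)
visitors = random.Random(7)
for _ in range(20_000):
    arm = bandit.choose()
    reward = 1.0 if visitors.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
print(bandit.counts)
```

Raising epsilon spreads more visitors across all variants, and lowering it concentrates traffic on the current leader sooner. Setting step_size (for example 0.05) trades some stability for faster reaction when a variant's performance changes.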

Multi-Armed Bandit in A/B Testing

MAB sits at the intersection of A/B testing and machine learning. It is most appropriate when speed and revenue optimisation matter more than statistical purity, and when you have multiple variants to evaluate simultaneously. CustomFit.ai supports bandit-style testing alongside traditional A/B experiments, letting teams choose the right approach for each experiment's goals.

Run smarter A/B tests with CustomFit.ai — 14-day free trial, no credit card required.
