Put this into practice
Run A/B tests and personalize your store without code. 14-day free trial, no credit card.
Epsilon-greedy is a simple multi-armed bandit algorithm that balances exploration (trying different variants to gather data) and exploitation (sending traffic to the current best variant to maximise reward). For each visitor, the algorithm does the following. With probability epsilon (ε), it picks a variant uniformly at random (exploration). With probability 1 − ε, it picks the variant with the highest observed conversion rate so far (exploitation). Epsilon is a fixed value between 0 and 1, commonly 0.1 (10% exploration) or 0.2 (20% exploration).
Decision rule: select a random variant with probability ε; select the best-known variant with probability 1 − ε.
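The decision rule above translates into very little code. The following is a minimal Python sketch, not CustomFit.ai's implementation. The `EpsilonGreedy` class and its `select`/`update` methods are hypothetical names. The sketch assumes each visitor produces a binary converted/not-converted outcome, and that exploration picks uniformly among all variants, so the current leader can also be drawn while exploring.

```python
import random


class EpsilonGreedy:
    """Minimal epsilon-greedy bandit over a fixed list of variants (hypothetical API)."""

    def __init__(self, variants, epsilon=0.1, rng=None):
        if not 0.0 <= epsilon <= 1.0:
            raise ValueError("epsilon must be between 0 and 1")
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random()
        # Per-variant counters: visitors shown each variant, and how many converted.
        self.shows = {v: 0 for v in self.variants}
        self.conversions = {v: 0 for v in self.variants}

    def conversion_rate(self, variant):
        """Observed conversion rate; 0.0 for a variant not yet shown."""
        shown = self.shows[variant]
        return self.conversions[variant] / shown if shown else 0.0

    def select(self):
        """Choose the variant for the next visitor."""
        if self.rng.random() < self.epsilon:
            # Exploration (probability epsilon): uniform over ALL variants.
            return self.rng.choice(self.variants)
        # Exploitation (probability 1 - epsilon): best observed rate, ties broken at random.
        rates = {v: self.conversion_rate(v) for v in self.variants}
        best = max(rates.values())
        return self.rng.choice([v for v, r in rates.items() if r == best])

    def update(self, variant, converted):
        """Record whether a visitor who was shown `variant` converted."""
        self.shows[variant] += 1
        self.conversions[variant] += int(converted)


# Usage: one visitor at a time.
bandit = EpsilonGreedy(["control", "variant_b"], epsilon=0.1)
variant = bandit.select()                 # decide what this visitor sees
bandit.update(variant, converted=False)   # record the outcome once it is known
```

Ties are broken at random so that, before any conversions are recorded, exploitation does not always favour whichever variant happens to be listed first.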
Epsilon-greedy is one of the simplest practical bandit algorithms, which makes it easy to implement and to explain to non-technical stakeholders. For ecommerce teams running personalisation experiments or homepage variant tests, it offers a straightforward way to exploit winners while still exploring. Because every variant keeps receiving some traffic, a better variant that underperforms early is not permanently shut out.
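The fixed exploration rate (epsilon) is both its strength and its weakness. At ε = 0.1, 10% of traffic is always exploring, even after a clear winner has emerged. Because exploration picks uniformly among all K variants (including the leader), the inferior variants together receive ε(K − 1)/K of traffic. That works out to 5% with two variants and about 6.7% with three, and it approaches the full 10% as the number of variants grows. This is consistent and predictable, but that share of traffic is sent to weaker experiences indefinitely. For a high-revenue site (for example, a ₹50 crore/year ecommerce business), this exploration tax is a meaningful ongoing cost.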
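The exploration tax can be estimated with a short calculation. The sketch below assumes exploration is uniform over all K variants and that the current leader really is the best variant. Under those assumptions, the leader's long-run share is (1 − ε) + ε/K and each other variant gets ε/K. The expected conversion rate also falls short of the best variant's by ε × (best rate − average rate across all variants). The function names and the conversion rates in the usage lines are hypothetical.

```python
def allocation_shares(epsilon, k):
    """Long-run traffic share of the leader and of each non-leading variant.

    Assumes exploration is uniform over all k variants, leader included.
    """
    leader = (1 - epsilon) + epsilon / k
    other = epsilon / k
    return leader, other


def exploration_cost(epsilon, rates):
    """Conversion rate lost versus always serving the best variant.

    Expected rate = (1 - eps) * best + eps * mean(rates), so the shortfall
    is eps * (best - mean(rates)). Assumes the leader is the truly best variant.
    """
    best = max(rates)
    mean_rate = sum(rates) / len(rates)
    return epsilon * (best - mean_rate)


# Hypothetical example: two variants converting at 3.0% and 2.5%, epsilon = 0.1.
leader_share, other_share = allocation_shares(0.1, 2)
print(f"leader {leader_share:.0%}, other {other_share:.0%}")  # leader 95%, other 5%
lost = exploration_cost(0.1, [0.030, 0.025])
print(f"{lost / 0.030:.2%} of conversions lost")  # 0.83% of conversions lost
```

With these made-up rates, and assuming revenue scales with conversions, losing about 0.83% of conversions on a ₹50 crore/year site would be on the order of ₹40 lakh a year. The real figure depends entirely on how far the weaker variants trail the leader.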
For Indian D2C brands experimenting with push notification copy, email subject lines, or landing page headlines at scale, epsilon-greedy is a practical first bandit algorithm because it requires almost no statistical expertise to tune — just set epsilon and let it run.
A Shopify seller of home furnishings tests three product image styles (lifestyle photos, white background, 360-degree view) using epsilon-greedy at ε = 0.15. Over 10 days, the algorithm observes that lifestyle photos drive the highest add-to-cart rate (6.2% vs. 4.8% and 4.1%). Exploitation now sends 85% of traffic to lifestyle photos. The remaining 15% is spread uniformly across all three styles for continued exploration, so lifestyle photos receive about 90% of traffic in total and each of the other two styles about 5%. The seller keeps the algorithm running because the catalogue has 200+ products. When results are broken down by product category (textiles, decor, furniture), the ongoing exploration traffic keeps showing which style works best in each, enabling category-level personalisation.
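To see how the allocation in this example plays out, the sketch below simulates epsilon-greedy with ε = 0.15 on the three image styles. It assumes each session is an independent add-to-cart trial. It also treats the observed rates from the example (6.2%, 4.8%, 4.1%) as the true underlying rates, which the algorithm itself never sees. The `simulate` function, the seed and the session count are hypothetical.

```python
import random


def simulate(true_rates, epsilon, sessions, seed=0):
    """Simulate epsilon-greedy with Bernoulli (add-to-cart or not) outcomes.

    true_rates maps each style to its assumed underlying add-to-cart rate.
    The algorithm only sees the simulated outcomes, never these rates.
    Returns two dicts: sessions shown per style and add-to-carts per style.
    """
    rng = random.Random(seed)
    styles = list(true_rates)
    shows = {s: 0 for s in styles}
    carts = {s: 0 for s in styles}

    def observed(style):
        # Observed add-to-cart rate so far; 0.0 before the style has been shown.
        return carts[style] / shows[style] if shows[style] else 0.0

    for _ in range(sessions):
        if rng.random() < epsilon:
            # Explore: uniform over all styles, including the current leader.
            style = rng.choice(styles)
        else:
            # Exploit: best observed rate, ties broken at random.
            best = max(observed(s) for s in styles)
            style = rng.choice([s for s in styles if observed(s) == best])
        shows[style] += 1
        # Simulated visitor adds to cart with the style's assumed true rate.
        if rng.random() < true_rates[style]:
            carts[style] += 1
    return shows, carts


# Hypothetical run using the example's rates as the "true" rates.
rates = {"lifestyle": 0.062, "white background": 0.048, "360-degree view": 0.041}
total = 200_000
shows, carts = simulate(rates, epsilon=0.15, sessions=total)
for style, n in shows.items():
    print(f"{style}: {n / total:.1%} of traffic, observed rate {carts[style] / n:.2%}")
```

Over enough sessions, lifestyle photos should settle at roughly 90% of traffic and the other two styles at roughly 5% each, in line with the uniform-exploration arithmetic. Early on, noisy estimates can briefly put a weaker style in the lead before exploration data corrects it.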
Epsilon-greedy occupies the middle ground between a pure A/B test (a fixed split such as 50/50, with no adaptation) and fully adaptive algorithms such as Thompson Sampling and UCB, which shift exploration toward promising variants as evidence accumulates. It is best thought of as A/B testing with a traffic rebalancing rule: a predetermined share of traffic always explores, and the rest always exploits the current leader. For teams starting with bandit-style optimisation, epsilon-greedy is a practical entry point before graduating to more sophisticated methods.
Run smarter A/B tests with CustomFit.ai — 14-day free trial, no credit card required.