The Multi-Armed Bandit (MAB) is a classic decision-making problem in which an agent must choose among multiple options (called “arms”) and maximize the total reward over a series of trials. The problem gets its name from a metaphor involving a player facing a row of slot machines (one-armed bandits), each with a different but unknown payout probability. The goal is to find the best strategy for pulling arms (selecting actions) so as to maximize the player's overall reward over time. The MAB problem is a fancy name for the exploration-exploitation trade-off.
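To make the setup concrete, here is a minimal sketch of the slot-machine metaphor in Python. It assumes Bernoulli arms (each pull pays 1 or 0) and uses epsilon-greedy, one common baseline strategy, purely as an illustration of the trade-off. The function name `run_epsilon_greedy`, the payout probabilities, and the value of `epsilon` are all hypothetical choices, not part of the problem definition.

```python
import random


def run_epsilon_greedy(payout_probs, n_trials, epsilon, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli slot machines.

    payout_probs are the true payout probabilities, hidden from the agent.
    The agent only sees the rewards it collects.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    pulls = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: try an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm that looks best so far
            # (ties go to the lowest-indexed arm).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Bernoulli reward drawn from the arm's hidden probability.
        reward = 1 if rng.random() < payout_probs[arm] else 0

        pulls[arm] += 1
        # Incremental update of the arm's mean reward.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return total_reward, pulls, estimates


if __name__ == "__main__":
    # Hypothetical payout probabilities for three machines.
    total, pulls, estimates = run_epsilon_greedy([0.2, 0.5, 0.7], 1000, 0.1)
    print("total reward:", total)
    print("pulls per arm:", pulls)
    print("estimated payouts:", estimates)
```

With `epsilon=0` the agent never explores and can stay stuck on the first arm it tries; with `epsilon=1` it never uses what it has learned. Values in between are one simple way to balance the two.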
The multi-armed bandit problem is fundamental and arises in numerous industrial applications. Let's explore it and examine some interesting strategies for solving it.
You have just arrived in a new city. You are a spy and plan to stay for 120 days to complete your next mission. There are three restaurants in town: Italian, Chinese, and Mexican. You want to maximize your gastronomic satisfaction during your stay. However, you don't know which restaurant will be the best for you. Here's how the three restaurants compare:
- Italian restaurant: average satisfaction score of…