We continue to dig deeper into Sutton's great book on RL (1), and here we focus on Monte Carlo (MC) methods. These can learn from experience alone, i.e. they do not require any kind of model of the environment, unlike, for example, the dynamic programming (DP) methods we introduced in the previous post.
This is extremely tempting, because the model is often unknown or the transition probabilities are difficult to specify. Consider the game of Blackjack: although we fully understand the game and its rules, solving it with DP methods would be very tedious, since we would have to calculate all kinds of probabilities, for example, given the cards currently on the table, what is the probability of a "blackjack", or what is the probability of another seven being dealt? With MC methods we don't have to deal with any of this: we just play and learn from the experience.
Because MC methods estimate values from complete sampled returns rather than bootstrapping from other estimates, their value estimates are unbiased. They are conceptually simple and easy to understand, but they suffer from high variance, and they must wait until the end of an episode before updating, i.e. they do not bootstrap.
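To make this concrete, here is a minimal sketch of first-visit MC prediction in the spirit of Chapter 5: the value of each state is simply the average of the full returns observed after first visiting it. The `sample_episode` function and the `(state, reward)` episode format are assumptions for illustration; all that matters is that episodes come from interaction with the environment, not from a model of it.

```python
from collections import defaultdict

def mc_prediction(sample_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo prediction (sketch).

    `sample_episode` is assumed to return a list of (state, reward)
    pairs obtained by following some fixed policy, e.g. one hand of
    Blackjack -- no transition probabilities are ever needed.
    """
    returns_sum = defaultdict(float)   # sum of returns observed per state
    returns_count = defaultdict(int)   # number of first visits per state
    V = defaultdict(float)             # value estimates

    for _ in range(num_episodes):
        episode = sample_episode()
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit: only update if this is the first occurrence of the state
            if all(s != state for s, _ in episode[:t]):
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```

Note that the update uses the full return G observed until the end of the episode, which is exactly why the estimate is unbiased but has high variance.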
As mentioned, here we will present these methods following Chapter 5 of Sutton's book…