In my previous articles on reinforcement learning, I showed you how to implement (deep) Q-learning using nothing more than a little numpy and TensorFlow. While this was an important step in understanding how these algorithms work on the inside, the code tended to be long, and I only implemented one of the most basic versions of deep Q-learning.
Given the explanations in those articles, understanding the code should have been fairly simple. However, if we want to get things done, we must rely on well-documented, maintained, and optimized libraries. Just as we don't want to implement linear regression over and over again, we don't want to do the same with reinforcement learning.
In this article, I will show you the Stable-Baselines3 library, which is as easy to use as scikit-learn. However, instead of training models to predict labels, we get trained agents that can navigate their environment well.
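To give a feel for how little code this takes, here is a minimal sketch, assuming a recent Stable-Baselines3 (2.x) together with Gymnasium; the environment name and the timestep budget are my own illustrative choices, not something prescribed by this article. We train a DQN agent on CartPole-v1 and then run the trained policy.

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Create the environment and a DQN agent with a simple MLP policy.
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, verbose=1)

# Train the agent; the timestep budget is just an illustrative number.
model.learn(total_timesteps=50_000)

# Use the trained agent: predict actions greedily and step the environment.
obs, info = env.reset()
for _ in range(500):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```

Swapping in another algorithm such as PPO or A2C only changes the import and the class name, which is exactly what makes the library feel as convenient as scikit-learn.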
If you are not sure what (deep) Q-learning is all about, I suggest reading my previous articles. At a high level, we want to train an agent that interacts with its environment with the goal of maximizing its total reward. The most important part of reinforcement learning is finding a good reward function for the agent.
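To make the interaction loop concrete, here is a small sketch of the agent-environment cycle using Gymnasium; the environment and the random policy are my own illustrative choices. At each step the agent picks an action, the environment returns the next observation and a reward, and the agent's goal is to make the accumulated reward as large as possible.

```python
import gymnasium as gym

# The environment defines the observations, actions, and rewards.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    # A real agent would pick the action it expects to maximize future
    # reward; here we sample randomly just to illustrate the loop.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
```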
I usually imagine a character in a game looking for a way to get the highest score, for example, Mario running from start to finish without dying and, ideally, as fast as possible.
To do this, in Q-learning we learn quality values for each pair (s, a), where s is a state and a is an action that the agent can perform. Q(s, a) is the…