Stable-Baselines3 (sb3) is like a Swiss Army knife: a multifunction tool that can be used for many purposes. And just as a Swiss Army knife can save your life if you’re stranded in the jungle, sb3 can save your life in the office when you have seemingly impossible deadlines to meet.
This guide uses gymnasium 0.28.1 and stable-baselines3 2.1.0. If you use different versions, or follow older guides, you may not get the same results. But don’t worry: an installation guide is also provided here. I guarantee you can get the results if you follow my instructions.
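Before going further, it can help to confirm which versions are actually installed. The snippet below is a minimal sketch, not part of the original guide: it assumes the versions mentioned above refer to the PyPI packages `gymnasium` and `stable-baselines3`, and the helper name `check_versions` is purely illustrative. It uses only the Python standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions this guide states it was written against.
# Assumption: these are the PyPI distribution names.
EXPECTED = {"gymnasium": "0.28.1", "stable-baselines3": "2.1.0"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, expected_version, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, wanted, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: installed={installed}, expected={wanted} [{status}]")
```

A mismatch does not necessarily mean the code will fail, but it is the first thing to rule out if your results differ from the ones shown in this guide.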
Stable-Baselines3 is easy to use. It is also well documented, and you can follow the tutorials on your own. But…
- Have you consulted older guides (perhaps those that use `gym`), only to find errors on your machine?
- Can you always guarantee compatibility?
- What if you want to use `gymnasium` environments and perhaps modify the rewards?
- Do you know how to wrap your own tasks, so that SOTA models can be applied in a few lines?
That is the goal of this article! After reading this guided demo, you will…
- Solve classic environments with sb3 models, visualize the results, and save (or load) the trained model in a few lines of code. (Section 3.1)
- Understand how to check the compatibility of action space and observation space. (Section 3.2)
- Learn to wrap `gymnasium` environments so that any SB3 model can be used, without being restricted to either `box` or `discrete` spaces. (Section 4.1)
- Learn to wrap `gymnasium` environments for setting rewards. (Section 4.2)
- Learn how to tune your own custom environments to be compatible with sb3, with minimal changes to your original code, which may follow a different structure. (Section 5)
Create a virtual environment and configure the relevant dependencies. To serve the majority of readers, this guide was created on Windows…