Dissecting Richard S. Sutton's “Reinforcement Learning” with Custom Python Implementations, Episode V
In our previous post, we concluded the introductory series on fundamental reinforcement learning (RL) techniques by exploring temporal difference (TD) learning. TD methods combine the strengths of Dynamic Programming (DP) and Monte Carlo (MC) methods, forming the basis of some of the most important RL algorithms, such as Q-learning.
Building on that foundation, this post delves into n-step TD learning, a versatile approach presented in Chapter 7 of Sutton's book (1). This method bridges the gap between classical TD and MC techniques. Like TD, n-step methods use bootstrapping (leveraging prior estimates), but they also incorporate the next n rewards, offering a unique combination of short- and long-term learning. In a future post, we will generalize this concept even further with eligibility traces.
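To make the idea concrete, here is a minimal sketch (not the book's pseudocode) of how an n-step return could be computed from a stored trajectory. The names `rewards`, `states`, `V`, and the indexing convention are illustrative assumptions for this snippet, not part of the original post:

```python
def n_step_return(rewards, states, t, n, gamma, V, T):
    """Sketch of the n-step return G_{t:t+n}: the next n discounted rewards
    plus a bootstrapped value estimate of the state reached after n steps.

    Assumed (illustrative) convention: rewards[k] is the reward received when
    entering states[k + 1]; T is the terminal time step; V maps states to
    current value estimates.
    """
    G = 0.0
    # Accumulate up to n discounted rewards, stopping early if the episode ends.
    for k in range(t, min(t + n, T)):
        G += gamma ** (k - t) * rewards[k]
    # Bootstrap from the estimated value of the state n steps ahead,
    # unless the episode has already terminated by then.
    if t + n < T:
        G += gamma ** n * V[states[t + n]]
    return G
```

With n = 1 this reduces to the familiar one-step TD target, while letting n grow toward the episode length recovers the Monte Carlo return, which is exactly the spectrum this chapter explores.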
We will follow a structured approach, starting with the prediction problem before moving on to control. Along the way, we will:
- introduce n-step Sarsa,
- extend it to off-policy learning,
- explore the n-step tree-backup algorithm, and
- present a unifying perspective with n-step Q(σ).
As always, you can find all the code on GitHub. Let's dive in!