Human reward-guided learning is typically modeled with simple reinforcement learning (RL) algorithms that summarize past experience in a few key variables, such as Q-values, which represent expected rewards. However, recent findings suggest that these models oversimplify the complexity of human memory and decision making. For example, individual events and global reward statistics can significantly influence behavior, indicating that memory involves more than summary statistics. Artificial neural networks (ANNs), particularly recurrent neural networks (RNNs), offer a more expressive model class by capturing long-term dependencies and intricate learning mechanisms, although they are typically less interpretable than traditional RL models.
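To make the contrast concrete, here is a minimal sketch (Python/NumPy, not taken from the paper) of the kind of incremental Q-value update and softmax choice rule these classical RL models assume; the learning rate and inverse temperature values are illustrative.

```python
import numpy as np

def q_update(q_values, action, reward, alpha=0.1):
    """Delta-rule update used by classical incremental RL models:
    the chosen action's Q-value moves toward the observed reward."""
    q_values = q_values.copy()
    q_values[action] += alpha * (reward - q_values[action])
    return q_values

def softmax_policy(q_values, beta=5.0):
    """Choice probabilities from Q-values via a softmax with inverse temperature beta."""
    prefs = beta * q_values
    prefs = prefs - prefs.max()          # numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

# Example with four actions, matching the task described later in the article.
q = np.zeros(4)
q = q_update(q, action=2, reward=0.8)
print(softmax_policy(q))
```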
Researchers from institutions including Google DeepMind, the University of Oxford, Princeton University, and University College London studied human reward-based learning behavior using a hybrid approach that combines reinforcement learning models with artificial neural networks. Their findings suggest that human behavior cannot be adequately explained by algorithms that incrementally update choice variables. Instead, human reward-based learning relies on a flexible memory system that forms complex representations of past events over multiple time scales. By iteratively replacing components of a classical reinforcement learning model with artificial neural networks, they uncovered insights into how experiences shape memory and guide decision making.
A dataset was collected from a reward-learning task involving 880 participants. In this task, participants repeatedly chose between four actions, each rewarded according to noisy and varying reward magnitudes. After filtering, the study included 862 participants and 617,871 valid trials. Most participants learned the task, consistently choosing actions with higher rewards. This large dataset allowed recurrent neural networks and hybrid models to capture substantial behavioral variance, outperforming basic reinforcement learning models at capturing human decision-making patterns.
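For intuition, the sketch below simulates a task of this general shape: four actions whose reward magnitudes are noisy and change slowly over trials. The drift and noise parameters, and the random choices standing in for a participant, are assumptions for illustration, not the study's actual reward schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_task(n_trials=200, n_actions=4, drift_sd=0.05, noise_sd=0.1):
    """Toy restless bandit: each action's mean reward drifts over trials
    (random walk) and the observed reward is a noisy sample around it.
    Parameters and the random 'participant' are illustrative only."""
    means = rng.uniform(0.25, 0.75, size=n_actions)
    history = []
    for t in range(n_trials):
        action = int(rng.integers(n_actions))            # stand-in for a participant's choice
        reward = float(means[action] + rng.normal(0.0, noise_sd))
        history.append((t, action, reward))
        means = np.clip(means + rng.normal(0.0, drift_sd, n_actions), 0.0, 1.0)
    return history

print(simulate_task(n_trials=3))
```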
The data were initially modeled using a traditional RL model (Best RL) and a flexible Vanilla RNN. Best RL, identified as the most effective of the incremental-update models, employed a reward module to update Q-values and an action module to capture action perseveration. However, its simplicity limited its expressiveness. The Vanilla RNN, which processes actions, rewards, and latent states together, predicted choices more accurately (68.3% vs. 58.9%). Other hybrid models such as RL-ANN and Context-ANN improved on Best RL but still fell short of the Vanilla RNN. Memory-ANN, which incorporates recurrent memory representations, matched the Vanilla RNN's performance, suggesting that fine-grained memory use was key to participants' learning on the task.
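As an illustration of the Vanilla RNN side of this comparison, the sketch below (PyTorch; the GRU cell, layer sizes, and training details are illustrative assumptions rather than the paper's specification) shows how a recurrent network can take each trial's action and reward and emit logits over the next choice, which would then be fit to participants' choices and scored against the RL baseline.

```python
import torch
import torch.nn as nn

class ChoiceRNN(nn.Module):
    """Vanilla-RNN-style choice model: a GRU consumes the previous action
    (one-hot) and reward on every trial and predicts the next choice."""
    def __init__(self, n_actions=4, hidden_size=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_actions + 1, hidden_size=hidden_size,
                          batch_first=True)
        self.readout = nn.Linear(hidden_size, n_actions)

    def forward(self, actions_onehot, rewards):
        # actions_onehot: (batch, trials, n_actions); rewards: (batch, trials, 1)
        x = torch.cat([actions_onehot, rewards], dim=-1)
        hidden, _ = self.rnn(x)
        return self.readout(hidden)      # per-trial logits over the next choice

# Fitting would minimize cross-entropy between these logits and participants'
# actual next choices; held-out accuracy is then compared with the RL baseline.
model = ChoiceRNN()
logits = model(torch.zeros(1, 10, 4), torch.zeros(1, 10, 1))
print(logits.shape)  # torch.Size([1, 10, 4])
```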
The study reveals that traditional reinforcement learning models, which rely solely on incrementally updated decision variables, fall short in predicting human choices compared to a new model that incorporates memory-sensitive decision making. This new model distinguishes decision variables that drive choices from memory variables that modulate how those decision variables are updated in light of past rewards. Unlike reinforcement learning models, in which decision and learning variables are intertwined, this approach separates them, providing a clearer understanding of how learning influences choices. The model suggests that human learning is shaped by compressed memories of task history, reflecting both short- and long-term reward and action histories, which modulate learning regardless of how they are implemented.
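A minimal sketch of this separation, assuming the compressed memory is a pair of fast- and slow-decaying reward traces and that these traces set a per-trial learning rate via a simple linear-sigmoid mapping (all hypothetical forms, not the paper's equations):

```python
import numpy as np

def update_traces(traces, reward, decays=(0.5, 0.95)):
    """Compressed memory of reward history at two time scales:
    a fast- and a slow-decaying exponential trace (decay rates are assumptions)."""
    return np.array([d * tr + (1.0 - d) * reward for d, tr in zip(decays, traces)])

def memory_modulated_update(q, traces, action, reward,
                            w=np.array([1.0, -1.0]), b=0.0):
    """Decision variables (q) drive choice; the memory traces only set how
    strongly the new outcome updates them (hypothetical linear-sigmoid mapping)."""
    alpha = 1.0 / (1.0 + np.exp(-(w @ traces + b)))   # memory-dependent learning rate
    q = q.copy()
    q[action] += alpha * (reward - q[action])
    return q, update_traces(traces, reward)

# One trial: choose action 1, observe reward 0.7, update values and memory.
q, traces = np.zeros(4), np.zeros(2)
q, traces = memory_modulated_update(q, traces, action=1, reward=0.7)
print(q, traces)
```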
Memory-ANN, the proposed modular cognitive architecture, separates reward-based learning from action-based learning, supported by evidence from computational modeling and neuroscience. The architecture comprises a "shallow" level of decision rules that operate on observable data and a "deep" level that handles complex, context-rich representations. This dual-level system enables flexible, context-driven decision making, suggesting that human reward-based learning involves both simple, surface-level processes and deeper, memory-based mechanisms. These findings support the view that models with rich representations are needed to capture the full spectrum of human behavior, particularly in learning tasks. The insights gained here could have broader applications, extending to other learning tasks and to cognitive science more generally.
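One way to picture the shallow/deep split is the toy module below: a recurrent "deep" layer compresses the action and reward history into a memory state, while a "shallow" delta-rule update on per-action values takes its step size from that memory. The class name, sizes, and sigmoid mapping are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class ShallowDeepLearner(nn.Module):
    """Toy two-level learner: a 'deep' recurrent memory of the action/reward
    history and a 'shallow' delta-rule on per-action values whose step size
    is read out from that memory (names and sizes are illustrative)."""
    def __init__(self, n_actions=4, memory_size=16):
        super().__init__()
        self.memory = nn.GRUCell(n_actions + 1, memory_size)   # deep, context-rich level
        self.alpha_head = nn.Linear(memory_size, 1)             # memory -> learning rate

    def step(self, q, h, action_onehot, reward):
        x = torch.cat([action_onehot, reward], dim=-1)
        h = self.memory(x, h)                                    # compress history
        alpha = torch.sigmoid(self.alpha_head(h))                # context-dependent step size
        q = q + alpha * action_onehot * (reward - q)             # shallow decision-rule update
        return q, h

# One trial with batch size 1: action 0 chosen, reward 0.6 observed.
net = ShallowDeepLearner()
q, h = torch.zeros(1, 4), torch.zeros(1, 16)
a = torch.tensor([[1.0, 0.0, 0.0, 0.0]])
q, h = net.step(q, h, a, torch.tensor([[0.6]]))
print(q)
```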
Check out the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a Consulting Intern at Marktechpost and a dual degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, she brings a fresh perspective to the intersection of AI and real-life solutions.