Cultural accumulation, the ability to learn skills and accumulate knowledge across generations, is considered a key factor in human success. However, current methodologies in artificial learning systems, such as deep reinforcement learning (RL), generally frame the learning problem as occurring during a single “lifetime.” This approach fails to capture the open-ended, generational nature of cultural accumulation observed in humans and other species. Achieving effective cultural accumulation in artificial agents poses significant challenges, including balancing social learning from other agents with independent exploration and discovery, as well as operating on multiple time scales that govern the acquisition of knowledge, skills, and technological advances.
Previous work has explored various approaches to social learning and cultural accumulation. The expert dropout method gradually increases the proportion of episodes without a carefully selected demonstrator. Bayesian reinforcement learning with constrained intergenerational communication uses domain-specific languages to model social learning in human populations. Large language models have also been used, with language acting as a medium of communication across generations. While promising, these techniques rely on explicit communication channels, incremental adjustments, or domain-specific representations, limiting their broader applicability. More general approaches are needed that can facilitate knowledge transfer without such limitations.
The researchers propose an approach that balances social learning from other agents with independent exploration, enabling cultural accumulation in reinforcement learning agents. They build two models that explore this accumulation under different notions of a generation: episodic generations for in-context learning (accumulation of knowledge) and generations over training time for in-weights learning (accumulation of skills). By striking the right balance between social learning and independent discovery, agents can continually accumulate knowledge and skills over multiple generations, outperforming agents trained over a single lifetime with the same accumulated experience. This work presents the first general models for achieving emergent cultural accumulation in reinforcement learning, paving the way for more open-ended learning systems and presenting new opportunities for modeling human cultural evolution.
The researchers propose two models for investigating cultural accumulation in agents: in-context accumulation and in-weights accumulation. For in-context accumulation, a meta-reinforcement-learning process produces a policy network with fixed parameters θ; cultural accumulation then takes place during online adaptation to new environments, with generations delineated through the agent's internal state ϕ and each generation lasting T episodes. For in-weights accumulation, each successive generation is trained from randomly initialized parameters θ, so the network weights themselves serve as the substrate for accumulation, and a single generation corresponds to the T environment steps used to train it.
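Below is a minimal sketch of the two generational schemes, not the authors' implementation: the helper callables (`rollout`, `init_weights`, `train_policy`) and their signatures are hypothetical placeholders, and only the loop structure reflects the description above.

```python
# Schematic sketch of the two accumulation schemes (hypothetical helpers,
# not the authors' code). Only the generational loop structure is meant
# to mirror the description above.

def in_context_accumulation(theta, env, n_generations, episodes_per_gen, rollout):
    """In-context accumulation: policy parameters theta stay fixed after
    meta-training; only the recurrent internal state phi changes. A
    generation is a block of T episodes, and each new generation can
    observe the previous one acting while also exploring on its own."""
    oracle = None                      # generation 0 has no demonstrator
    for _ in range(n_generations):
        phi = None                     # fresh internal state each generation
        for _ in range(episodes_per_gen):
            phi = rollout(env, theta, phi, oracle=oracle)
        oracle = (theta, phi)          # accumulated context seeds the next generation
    return oracle


def in_weights_accumulation(env, n_generations, steps_per_gen,
                            init_weights, train_policy):
    """In-weights accumulation: each generation trains a new network from
    randomly initialized weights theta for a budget of T environment steps,
    with the previous generation's network as an imperfect demonstrator."""
    prev = None
    for _ in range(n_generations):
        theta = init_weights()         # random re-initialization each generation
        theta = train_policy(env, theta, demonstrator=prev, budget=steps_per_gen)
        prev = theta                   # the weights are the substrate of accumulation
    return prev
```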
The researchers introduce three settings to assess cultural accumulation: Goal Sequence, Traveling Salesman Problem (TSP), and Memory Sequence. These environments are designed to require agents to discover and transmit information across generations, mimicking the cultural accumulation processes observed in humans.
The results demonstrate the effectiveness of the proposed cultural accumulation models, which surpass single-lifetime reinforcement learning baselines across multiple environments.
In the Memory Sequence environment, in-context learners trained with the cultural accumulation algorithm outperformed single-lifetime RL2 baselines and even surpassed the noisy oracles they were trained with when tested on new sequences. Interestingly, accumulation performance degraded when the oracles were too accurate, suggesting that over-reliance on social learning impedes independent in-context learning. In the Goal Sequence environment, in-context accumulation significantly outperformed single-lifetime RL2 when tested on novel goal sequences. Higher but imperfect oracle accuracies during training produced the most effective accumulating agents, likely because learning to follow demonstrations is itself challenging in this partially observable navigation task. In the TSP, cultural accumulation sustained improvements beyond RL2 within a single continuous context: the routes traveled by the agents were refined over generations, with later generations exploiting a progressively smaller subset of edges.
In summary, the contributions of this research are as follows:
- Proposes two models of cultural accumulation in reinforcement learning:
  - An in-context model operating on episodic timescales
  - An in-weights model operating over entire training runs
- Defines successful cultural accumulation as a generational process that outperforms independent learning given the same total experience budget (see the worked example after this list).
- Presents algorithms for both in-context and in-weights cultural accumulation.
- Key results:
  - In-context accumulation can be hampered by oracles that are either too reliable or too unreliable, requiring a balance between social learning and independent discovery.
  - In-weights accumulation effectively mitigates primacy bias.
  - Network resets further improve in-weights accumulation performance.
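As a concrete reading of the success criterion in the list above, the illustrative snippet below (hypothetical function and made-up numbers, not from the paper) checks whether the final generation of an accumulating lineage beats a single-lifetime learner trained with the same total experience budget.

```python
# Illustrative check of the success criterion (hypothetical numbers).

def accumulation_is_successful(generational_returns, single_lifetime_return):
    """generational_returns: evaluation returns of generations 1..N, where the
    N generations together used the same number of environment steps as the
    single-lifetime baseline."""
    return generational_returns[-1] > single_lifetime_return

gens = [0.42, 0.55, 0.63, 0.71]    # returns of generations 1..4 (made up)
single = 0.58                       # single-lifetime agent, same total budget
print(accumulation_is_successful(gens, single))  # True
```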