Why, in a world where the only constant is change, do we need a continual learning approach to AI models?
Imagine you have a small robot designed to walk around your garden and water your plants. Initially, you spend a few weeks collecting data to train and test the robot, which is a considerable investment of time and resources. The robot learns to navigate your garden efficiently when the ground is covered with grass and bare soil.
However, as the weeks go by, the flowers begin to bloom and the appearance of the garden changes significantly. The robot, trained with data from a different season, now fails to accurately recognize its surroundings and struggles to complete its tasks. To fix this, new examples of the blooming garden need to be added to the training data.
Your first thought is to add new data examples to the training and retrain the model from scratch. But this is expensive and you don't want to do it every time the environment changes. Plus, you just realized that you don't have all the historical training data available.
Now, consider simply fine-tuning the model with the new samples. This is risky, because the model may lose some of its previously learned capabilities, leading to catastrophic forgetting (a situation in which the model loses previously acquired knowledge and skills when it learns new information).
So, is there an alternative? Yes: use continual learning!
Of course, the robot that waters the plants in a garden is only an illustrative example of the problem. In the later parts of the text you will see more realistic applications.
Learn adaptively with continual learning (CL)
It is not possible to anticipate and prepare for every scenario a model may face in the future, so in many cases adaptively training the model as new samples arrive is a good option.
In CL, we want to find a balance between the stability of a model and its plasticity. Stability is the ability of a model to retain previously learned information, and plasticity is its ability to adapt to new information as new tasks are introduced.
“(…) in the continual learning scenario, a learning model is required to incrementally build and dynamically update internal representations as the task distribution dynamically changes throughout its lifetime.” (2)
But how can we control stability and plasticity?
Researchers have identified various ways of building adaptive models. In (3), the following categories are established:
1. Regularization-based approach
- In this approach, we add a regularization term that balances the effects of old and new tasks on the model's parameters.
- For example, weight regularization aims to control parameter change by adding a penalty term to the loss function, which penalizes a parameter's change according to how much it contributed to previous tasks.
2. Replay-based approach
- This group of methods focuses on reusing some of the historical data so that the model can continue to solve previous tasks reliably. One limitation of this approach is that we need access to the historical data, which is not always possible.
- For example, experience replay, where we retain and replay a sample of old training data. When training on a new task, some examples from previous tasks are added to expose the model to a mix of old and new task types, limiting catastrophic forgetting.
3. Optimization-based approach
- Here we manipulate the optimization procedure to maintain performance on all tasks while reducing the effects of catastrophic forgetting.
- For example, gradient projection is a method in which gradients computed for new tasks are projected so that they do not interfere with the gradient directions that matter for previous tasks.
4. Representation-based approach
- This group of methods focuses on obtaining and using robust feature representations to avoid catastrophic forgetting.
- For example, self-supervised learning, where a model can learn a robust representation of the data before being trained on specific tasks. The idea is to learn high-quality features that generalize well across the different tasks a model may encounter in the future.
5. Architecture-based approach
- The above methods assume a single model with a single parameter space, but there are also a number of techniques in CL that exploit the model architecture.
- For example, parameter allocation, where during training each new task is assigned a dedicated parameter subspace in the network, which eliminates the problem of destructive parameter interference. However, if the network is not fixed, its size will grow with the number of new tasks.
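Two of these families can be illustrated with a deliberately tiny sketch. The toy problem below (a hypothetical linear-regression setup, not taken from the cited papers) fine-tunes a model on a second, conflicting task in two ways: naively, and with a small replay buffer of old examples plus a quadratic penalty anchoring the weights to the task-1 solution, a simplified stand-in for weight regularization. All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(w_true):
    """Toy regression task: y = X @ w_true + noise."""
    X = rng.normal(size=(200, 2))
    y = X @ w_true + rng.normal(scale=0.1, size=len(X))
    return X, y

def train(w, X, y, steps=300, lr=0.05, anchor=None, lam=0.0):
    """Gradient descent on MSE, optionally with a quadratic penalty
    lam * ||w - anchor||^2 that discourages drifting away from the
    weights learned on earlier tasks (weight regularization)."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        if anchor is not None:
            grad = grad + 2 * lam * (w - anchor)
        w = w - lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Task 1, then a task 2 whose optimal weights conflict with task 1
X1, y1 = make_task(np.array([1.0, -1.0]))
X2, y2 = make_task(np.array([1.0, 1.0]))
w_old = train(np.zeros(2), X1, y1)

# (a) Naive fine-tuning on task 2 only -> catastrophic forgetting of task 1
w_naive = train(w_old.copy(), X2, y2)

# (b) Replay a small buffer of stored task-1 examples mixed into the
#     task-2 data, plus a penalty anchored at the task-1 weights
keep = rng.choice(len(X1), size=20, replace=False)
X_mix = np.vstack([X2, X1[keep]])
y_mix = np.concatenate([y2, y1[keep]])
w_cl = train(w_old.copy(), X_mix, y_mix, anchor=w_old, lam=0.1)

print("task-1 MSE, naive fine-tuning:", round(mse(w_naive, X1, y1), 3))
print("task-1 MSE, replay + penalty: ", round(mse(w_cl, X1, y1), 3))
```

Naive fine-tuning drives the weights all the way to the task-2 optimum, so the task-1 error grows sharply; the replayed examples and the anchor penalty keep the final weights at a compromise closer to the old task.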
And how do we evaluate the performance of CL models?
The basic performance of CL models can be measured from several angles (3):
- Overall performance evaluation: the average performance across all tasks
- Memory stability: for each task, the difference between its maximum past performance and its current performance after continued training
- Learning plasticity: the difference between joint-training performance (i.e., if the model were trained on all the data at once) and the performance obtained when training with CL
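These three metrics are typically computed from a matrix of per-task accuracies recorded after each training stage. The sketch below uses made-up accuracy numbers and a hypothetical joint-training reference value purely to show the bookkeeping.

```python
import numpy as np

# Hypothetical accuracy matrix for 3 tasks: acc[i, j] is the accuracy on
# task j measured right after finishing training stage i (rows = training
# stages, columns = evaluated tasks). All numbers are made up.
acc = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.88, 0.00],
    [0.75, 0.82, 0.91],
])
T = acc.shape[0]

# Overall performance: average accuracy over all tasks after the last stage
avg_acc = float(acc[-1].mean())

# Memory stability: for each earlier task, the drop between its best
# accuracy seen so far and its accuracy after the final stage (forgetting)
forgetting = float(np.mean([acc[:T - 1, j].max() - acc[-1, j]
                            for j in range(T - 1)]))

# Learning plasticity: gap between a joint-training reference accuracy
# (an assumed value, as if trained on all data at once) and the accuracy
# each task reached when it was first learned
joint_acc = 0.93  # hypothetical offline reference
plasticity_gap = float(np.mean([joint_acc - acc[i, i] for i in range(T)]))

print(f"average accuracy:   {avg_acc:.3f}")
print(f"average forgetting: {forgetting:.3f}")
print(f"plasticity gap:     {plasticity_gap:.3f}")
```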
So why don’t all AI researchers switch to continual learning right away?
If you have access to historical training data and are not concerned about computational cost, it may seem easier to train from scratch.
One reason is that the interpretability of what happens inside a model during continual training is still limited. If training from scratch gives the same or better results than continual training, people may prefer the easier approach, i.e. retraining from scratch, rather than spending time trying to understand the performance issues of CL methods.
Furthermore, current research tends to focus on evaluating models and frameworks in ways that may not reflect the actual use cases that enterprises have. As mentioned in (6), there are many synthetic incremental benchmarks that do not reflect real-world situations in which tasks evolve naturally.
Finally, as noted in (4), many papers on the topic of CL focus on storage rather than computational costs, and in fact storing historical data is much less expensive and energy-intensive than retraining the model.
If more attention were paid to including computational and environmental costs in model retraining, more people might be interested in improving the current state of the art in CL methods, as they would see measurable benefits. For example, as mentioned in (4), retraining a recent large model can exceed 10,000 GPU-days of computation.
Why should we work on improving CL models?
Continual learning seeks to address one of the most difficult obstacles facing current AI models: the fact that the distribution of data changes over time. Retraining is expensive and requires large amounts of computation, which is not a very sustainable approach from an economic or environmental perspective. Therefore, in the future, well-developed continual learning methods may enable models that are more accessible and reusable for a larger community of people.
As surveyed in (4), there is a list of applications that inherently require, or could benefit from, well-developed CL methods:
1. Model editing
- Selectively editing an error-prone part of a model without damaging the rest of it. Continual learning techniques could help correct model errors continuously at a much lower computational cost.
2. Personalization and specialization
- Sometimes general-purpose models need to be tailored to specific users. With continual learning, we could update only a small set of parameters without introducing catastrophic forgetting into the model.
3. On-device learning
- Small devices have limited memory and computational resources, so methods that can efficiently train the model in real time as new data arrives, without having to start from scratch, could be useful in this area.
4. Faster retraining with warm start
- Models need to be updated when new samples become available or when the data distribution changes significantly. With continual learning, this process can be made more efficient by updating only the parts affected by the new samples, rather than retraining from scratch.
5. Reinforcement learning
- Reinforcement learning involves agents interacting with an environment that is often non-stationary. Efficient continual learning methods could therefore be useful for this use case.
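As a rough illustration of the warm-start idea, the toy sketch below (hypothetical data and drift, not from the cited papers) counts how many gradient steps a linear model needs to fit a slightly shifted distribution when it starts from previously trained weights versus from scratch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Old data, and new data from a slightly drifted distribution (hypothetical)
X_old = rng.normal(size=(500, 3))
y_old = X_old @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=500)
X_new = rng.normal(size=(500, 3))
y_new = X_new @ np.array([1.1, 2.0, -1.0]) + rng.normal(scale=0.1, size=500)

def steps_to_fit(w, X, y, tol=0.03, lr=0.05, max_steps=10_000):
    """Run gradient descent (updating w in place) until the training MSE
    drops below tol; return the number of steps taken."""
    for step in range(max_steps):
        if np.mean((X @ w - y) ** 2) < tol:
            return step
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return max_steps

# Pretrain on the old data, then warm-start on the new distribution
w = np.zeros(3)
steps_to_fit(w, X_old, y_old)                # w is updated in place
warm = steps_to_fit(w.copy(), X_new, y_new)  # start from old weights
cold = steps_to_fit(np.zeros(3), X_new, y_new)  # start from scratch
print("warm start:", warm, "steps; from scratch:", cold, "steps")
```

Because the new optimum is close to the old one, the warm-started run crosses the error threshold in far fewer steps; the gap only grows with model size, which is the economic argument for warm-start retraining.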
Learn more
As you can see, there is still much room for improvement in continual learning methods. If you are interested, you can start with the materials below:
- Introductory course: Continual Learning Course, Lesson #1: Introduction and Motivation, from ContinualAI on YouTube: https://youtu.be/z9DDg2CJjeE?si=j57_qLNmpRWcmXtP
- Paper on the motivation for continual learning: Continual Learning: Applications and the Way Forward (4)
- Paper on the state of the art of continual learning techniques: A Comprehensive Survey of Continual Learning: Theory, Method and Application (3)
If you have any questions or comments, feel free to share them in the comments section.
Cheers!
(1) Awasthi, A., & Sarawagi, S. (2019). Continual learning with neural networks: A review. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 362–365). Association for Computing Machinery.
(2) ContinualAI Wiki: Introduction to continual learning. https://wiki.continualai.org/the-continualai-wiki/introduction-to-continuous-learning
(3) Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A comprehensive survey of continual learning: Theory, method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5362–5383.
(4) Verwimp, E., Aljundi, R., Ben-David, S., Bethge, M., Cossu, A., Gepperth, A., Hayes, T. L., Hüllermeier, E., Kanan, C., Kudithipudi, D., Lampert, C. H., Mundt, M., Pascanu, R., Popescu, A., Tolias, A. S., van de Weijer, J., Liu, B., Lomonaco, V., Tuytelaars, T., & van de Ven, G. M. (2024). Continual learning: Applications and the way forward. https://arxiv.org/abs/2311.11908
(6) Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V., & Faghri, F. (2024). TiC-CLIP: Continual training of CLIP models.