Mike's Notes
This came through today. I copied part of Alyona Vert's article from TuringPost below as an introduction to these papers.
- A Comprehensive Survey of Continual Learning: Theory, Method and Application, by Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu, submitted to arXiv on 31/01/2023.
- Continual Learning and Catastrophic Forgetting, by Gido M. van de Ven, Nicholas Soures, and Dhireesha Kudithipudi, submitted to arXiv on 8/03/2024.
Resources
- https://www.turingpost.com/t/AI-101
- https://www.turingpost.com/p/continuallearning
- https://arxiv.org/abs/2302.00487
- https://arxiv.org/pdf/2302.00487
- https://arxiv.org/abs/2403.05175
- https://arxiv.org/pdf/2403.05175
- https://www.youtube.com/@RealTuringPost
References
- A Comprehensive Survey of Continual Learning: Theory, Method and Application, by Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu.
- Continual Learning and Catastrophic Forgetting, by Gido M. van de Ven, Nicholas Soures, and Dhireesha Kudithipudi.
Repository
- Home > Ajabbi Research > Library > Subscriptions > Turing Post
- Home > Handbook >
Last Updated
27/11/2025
AI 101: What is Continual Learning?
Joined Turing Post in April 2024. Studied aircraft control systems at BMSTU (Moscow, Russia), where she conducted research on helicopter models. Now more into AI and writing.
Can models add new knowledge without wiping out what they already know? We look at why continual learning is becoming important right now and explore the new methods emerging for it, including Google’s Nested Learning and Meta’s Sparse Memory Finetuning.
If you think about the term AGI, especially in the context of pre-training, you will realize that the human being is not an AGI, because a human being lacks a huge amount of knowledge. Instead, we rely on continual learning. - Ilya Sutskever
Do you feel this shift too? The idea of models learning endlessly is showing up everywhere. We see it, we hear it, and it’s all pushing the spotlight toward continual learning.
Continual learning is the ability to keep learning new things over time without forgetting what you already know. Humans do this naturally (as Ilya Sutskever also noted) and adapt flexibly to changing data. Unfortunately, neural networks do not. When developers change the training data, they often face something called catastrophic forgetting: the model starts losing its previous knowledge, forcing them back to training the model from scratch.
Finding the right balance between a model’s plasticity and its stability on previously learned knowledge and skills is becoming a serious challenge right now. Continual learning is the path to more “intelligent” systems: it saves time, resources, and money spent on training; it helps mitigate biases and errors; and, in the end, it makes model deployment easier and more natural.
Today we’ll look at the basics of continual learning and two approaches that are worth your attention: Google’s very recent Nested Learning and Meta FAIR’s Sparse Memory Finetuning. There is a lot to explore →
In today’s episode, we will cover:
- Continual Learning: the essential basics
- Setups and scenarios for Continual Learning training
- How to help models learn continually? General methods
- What is Nested Learning?
- How does Nested Learning work?
- HOPE: Google’s architecture for continual learning
- Not without limitations
- Cautious continual learning with memory layers
- Sparse Memory Finetuning
- Limitations
- Conclusion: Why continual learning is important now
- Sources and further reading
Continual Learning: The essential basics
Continual learning means learning step-by-step from data that changes over time. So it is related to two main things:
- Non-stationary data, which means the data distribution does not stay the same and keeps shifting.
- Incremental learning – the model should add new knowledge without wiping out what it learned before.
The new pieces of information can be new skills, new examples, new environments, or new contexts. Because the data arrives gradually over a model’s lifetime, continual learning is also known as lifelong learning. Continual learning takes place after the model is already deployed.
Everything would be great if models didn’t face one major challenge – catastrophic forgetting. This problem generally looks like this: a neural network is trained on Task 2 after Task 1, and its weights are updated for Task 2. This often pushes them away from the optimum for Task 1, and the model suddenly performs very poorly on that task.
The problem here is not the model’s capacity – it usually arises from the sequential training procedure. As early as 1989–1990, Michael McCloskey and Neal J. Cohen, and separately Roger Ratcliff, identified this problem and showed that simple networks lose previous knowledge extremely quickly when trained sequentially. They also highlighted that this forgetting is much worse than in humans.
But if you train on Tasks 1 and 2 interleaved, forgetting does not happen.
Image Credit: Illustration of catastrophic forgetting, “Continual Learning and Catastrophic Forgetting” paper
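The sequential-vs-interleaved contrast can be reproduced with a toy model. The sketch below is my own construction (not from either paper): two linear-regression tasks that share a weight and are jointly solvable by w = (1, 2). Plain SGD on Task 2 after Task 1 drags the shared weight away from Task 1’s solution, while interleaving the tasks finds the joint solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(task, n=1):
    """Toy tasks that share weight w1 but are jointly solvable by w = (1, 2).
    Task 1: x = (t, 0), y = t.  Task 2: x = (t, t), y = 3t."""
    t = rng.uniform(-1, 1, n)
    if task == 1:
        X, y = np.stack([t, np.zeros(n)], axis=1), t
    else:
        X, y = np.stack([t, t], axis=1), 3 * t
    return X, y

def sgd_step(w, X, y, lr=0.1):
    # One step of SGD on the squared error.
    return w - lr * 2 * X.T @ (X @ w - y) / len(y)

def mse(w, task):
    X, y = sample(task, n=1000)
    return float(np.mean((X @ w - y) ** 2))

# Sequential: Task 1, then Task 2. Task 2 updates drag w1 off Task 1's optimum.
w_seq = np.zeros(2)
for _ in range(500):
    w_seq = sgd_step(w_seq, *sample(1))
for _ in range(500):
    w_seq = sgd_step(w_seq, *sample(2))

# Interleaved: alternating samples from both tasks converges to the joint solution.
w_mix = np.zeros(2)
for _ in range(500):
    w_mix = sgd_step(w_mix, *sample(1))
    w_mix = sgd_step(w_mix, *sample(2))

print(f"sequential : task-1 MSE = {mse(w_seq, 1):.3f}")  # large: Task 1 forgotten
print(f"interleaved: task-1 MSE = {mse(w_mix, 1):.3f}")  # near zero: Task 1 retained
```

Note that the forgetting here is purely an artifact of the training order, not of capacity – two weights are enough for both tasks, which mirrors the point above.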
Preventing forgetting is only one part of the solution. Effective continual learning also requires:
- Fast adaptation
- Ability to leverage task similarities
- Task-agnostic behavior
- Robustness to noise
- High efficiency in memory and compute
- Avoiding storing all past data and retraining on all previous data
If tasks are related, the model should get better at one after learning another, which marks positive knowledge transfer:
- Forward transfer → Task 1 helps Task 2 later.
- Backward transfer → Task 2 helps improve Task 1. This is a more difficult variant for neural networks.
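These two directions are commonly quantified with backward-transfer (BWT) and forward-transfer (FWT) metrics computed from a matrix of per-task accuracies; the sketch below uses the definitions standard in the continual-learning literature, with made-up accuracy numbers for illustration.

```python
import numpy as np

# R[i, j] = test accuracy on task j after finishing training on task i.
# Hypothetical numbers for a 3-task run.
R = np.array([
    [0.95, 0.40, 0.35],
    [0.80, 0.93, 0.50],
    [0.70, 0.85, 0.94],
])
b = np.array([0.33, 0.33, 0.33])  # accuracy of an untrained model on each task
T = len(R)

# Backward transfer: how training on later tasks changed earlier-task accuracy.
bwt = np.mean([R[T - 1, i] - R[i, i] for i in range(T - 1)])
# Forward transfer: how training on earlier tasks helps tasks not yet trained on.
fwt = np.mean([R[i - 1, i] - b[i] for i in range(1, T)])

print(f"BWT = {bwt:+.3f}")  # negative values indicate forgetting
print(f"FWT = {fwt:+.3f}")  # positive values indicate helpful forward transfer
```

With these numbers BWT is negative (some forgetting) while FWT is positive (earlier tasks help later ones), which is the typical pattern for a plain sequentially trained network.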
So, a good continual learning system needs the right balance: it should stay stable (not forget old things) while still being plastic enough to learn new ones. It also needs to handle differences within each task and across different tasks. How is this realized in practice?
Image Credit: “A Comprehensive Survey of Continual Learning: Theory, Method and Application” paper
Setups and scenarios for Continual Learning training
Continual learning is mainly about moving from one task to the next while keeping performance stable or improving it during ongoing learning. That’s why two fundamental setups are used for it:
- Task-based continual learning: Data is organized into clear, separate tasks which are shown one after another, with explicit task boundaries. It is the most common setup, because it is convenient and controlled – you know exactly when tasks switch. But it doesn’t represent gradual changes found in the real world, and models may rely too heavily on boundaries for memory updates.
- Task-free continual learning: This one is more realistic, because it better reflects real-world data where distributions shift continuously. There is still an underlying set of tasks, but task boundaries are not given and transitions are smooth.
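The difference between the two setups can be sketched as two data streams. This is a toy construction (the per-task sampler and the drift schedule are invented for illustration): the task-based stream exposes hard boundaries and task identity, while the task-free stream drifts between tasks without ever announcing a switch.

```python
import random

random.seed(0)

def sample_task(task_id):
    """Hypothetical per-task sampler: each task is a different linear target."""
    x = random.uniform(-1, 1)
    return x, (task_id + 1) * x

def task_based_stream(n_tasks=3, steps_per_task=4):
    """Hard boundaries: the learner is told which task each example belongs to."""
    for task_id in range(n_tasks):
        for _ in range(steps_per_task):
            yield task_id, sample_task(task_id)

def task_free_stream(n_tasks=3, total_steps=12):
    """No boundaries: the underlying task drifts, but identity is never revealed."""
    for step in range(total_steps):
        progress = step / (total_steps - 1)            # 0 -> 1 over the stream
        task_id = min(int(progress * n_tasks), n_tasks - 1)
        if random.random() < 0.3:                      # soft, gradual transitions
            task_id = max(0, min(n_tasks - 1, task_id + random.choice((-1, 1))))
        yield sample_task(task_id)                     # no task_id for the learner
```

A learner on the first stream can consolidate its memory at each announced boundary; a learner on the second stream must detect distribution shift on its own, which is what makes the task-free setting harder and more realistic.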
Image Credit: “Continual Learning and Catastrophic Forgetting” paper
Continual learning researchers often use three main scenarios to describe what the model is expected to know at test time and whether it receives task-identity information. Importantly, these scenarios are defined by how the changing data relates to the function the network must learn:
Upgrade to read the rest on Turing Post