A new ML paradigm for continuous learning

The last decade has seen incredible advances in machine learning (ML), driven primarily by powerful neural network architectures and the algorithms used to train them. However, despite the success of large language models (LLMs), some fundamental challenges remain, particularly with regard to continuous learning, i.e. the ability of a model to actively acquire new knowledge and skills over time without forgetting old ones.

When it comes to continuous learning and self-improvement, the human brain is the gold standard. It adapts through neuroplasticity – the remarkable ability to change its structure in response to new experiences, memories and learning. Without this ability, a person would be limited to their immediate context, much like someone with anterograde amnesia. Current LLMs face a similar limitation: their knowledge is confined either to the immediate context of their input window or to the static information they acquire during pre-training.

The simple approach of continually updating a model's parameters with new data often leads to “catastrophic forgetting” (CF), where learning new tasks comes at the expense of proficiency on old ones. Researchers have traditionally combated CF through architectural modifications or better optimization rules. However, for too long we have treated the model's architecture (the network structure) and the optimization algorithm (the training rule) as two separate things, which prevents us from achieving a truly unified, efficient learning system.
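The forgetting effect described above can be reproduced even in a tiny model. The sketch below is purely illustrative (the tasks, feature map, and learning rates are invented for this example): a cubic regression model is first fit to one target function, then naively kept training on a second target over the same inputs. Because both tasks share the same parameters, the second round of gradient descent overwrites the first solution, and the task-A error jumps.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    # cubic polynomial feature map: [1, x, x^2, x^3]
    return np.stack([x**k for k in range(4)], axis=1)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def gd_fit(w, X, y, lr=0.3, steps=3000):
    # plain full-batch gradient descent on squared error
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

x = rng.uniform(-1, 1, 200)
X = features(x)
y_a, y_b = np.sin(2 * x), np.cos(2 * x)   # two tasks over the same inputs

w = gd_fit(np.zeros(4), X, y_a)           # learn task A
loss_a_before = mse(w, X, y_a)
w = gd_fit(w, X, y_b)                     # then naively keep training on task B
loss_a_after = mse(w, X, y_a)
print(f"task-A loss: {loss_a_before:.4f} -> {loss_a_after:.4f}")
```

Nothing in the update rule protects the old solution, which is exactly the gap that continual-learning methods try to close.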

In our paper “Nested Learning: The Illusion of Deep Learning Architectures,” published at NeurIPS 2025, we introduce nested learning, a paradigm that closes this gap. Nested learning treats a single ML model not as one continuous process, but as a system of interconnected, multi-level learning problems that are optimized simultaneously. We argue that the architecture of the model and the rules used to train it (i.e., the optimization algorithm) are fundamentally the same concept; they are simply different “levels” of optimization, each with its own internal information flow (“context flow”) and update rate. By recognizing this inherent structure, nested learning reveals a new dimension for designing more capable AI, allowing us to build learning components with greater computational depth, which ultimately helps address problems such as catastrophic forgetting.
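The paper's formal construction is more general, but the core intuition of “levels with different update rates” can be sketched in a few lines. In this toy analogy (every name, rate, and the two-level split are invented for illustration, not taken from the paper), a prediction is shared by a fast parameter that updates every step and a slow parameter that updates only every few steps, so the two levels learn at different frequencies:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression: y = 3 * x. Two nested "levels" share the prediction
# (w_fast + w_slow) * x but adapt at different frequencies and step sizes.
X = rng.normal(size=500)
Y = 3.0 * X

w_fast, w_slow = 0.0, 0.0
FAST_LR, SLOW_LR, PERIOD = 0.1, 0.05, 10

for t, (x, y) in enumerate(zip(X, Y)):
    err = (w_fast + w_slow) * x - y
    w_fast -= FAST_LR * err * x        # inner level: updates every step
    if t % PERIOD == 0:
        w_slow -= SLOW_LR * err * x    # outer level: updates every PERIOD steps

print(f"combined weight after training: {w_fast + w_slow:.3f}")
```

The point of the sketch is only that one system can contain several optimization processes running at different timescales over the same context flow; nested learning formalizes this and shows that familiar architectures and optimizers already have this structure.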

We test and validate nested learning using a self-modifying proof-of-concept architecture we call “Hope.” It achieves superior language modeling performance and demonstrates better long-context memory management than existing state-of-the-art models.
