DeepMind’s New AI With a Memory Outperforms Algorithms 25 Times Its Size

Edd Gent

Dec 20, 2021

Bigger is better—or at least that's been the attitude of those designing AI language models in recent years. But now DeepMind is questioning this rationale, and says giving an AI a memory can help it compete with models 25 times its size.

When OpenAI released its GPT-3 model in mid-2020, it rewrote the rulebook for language AIs. The lab’s researchers showed that simply scaling up the size of a neural network and the data it was trained on could significantly boost performance on a wide variety of language tasks.

Since then, a host of other tech companies have jumped on the bandwagon, developing their own large language models and achieving similar boosts in performance. But despite the successes, concerns have been raised about the approach, most notably by former Google researcher Timnit Gebru.

In the paper that led to her being forced out of the company, Gebru and colleagues highlighted that the sheer size of these models and their datasets makes them even more inscrutable than the average neural network, and neural networks are already known for being black boxes. This is likely to make detecting and mitigating bias in these models even harder.

Perhaps an even bigger problem they identify is the fact that relying on ever more computing power to make progress in AI means that the cutting edge of the field lies out of reach for all but the most well-resourced commercial labs. The seductively simple proposition that just scaling models up can lead to continual progress also means that fewer resources go into looking for promising alternatives.

But in new research, DeepMind has shown that there might be another way. In a series of papers, the team explains how they first built their own large language model, called Gopher, which has 280 billion parameters, 60 percent more than GPT-3’s 175 billion. Then they showed that a far smaller model imbued with the ability to look up information in a database could go toe-to-toe with Gopher and other large language models.

The researchers have dubbed the smaller model RETRO, which stands for Retrieval-Enhanced Transformer. Transformers are the specific type of neural network used in most large language models; they are trained on large amounts of text to predict what comes next in a passage, which is how they generate replies to questions or prompts from a human user.

RETRO also relies on a transformer, but it has been given a crucial augmentation. As well as predicting what text should come next based on its training, the model can search a database of roughly two trillion tokens (words and fragments of words) of text for passages with similar language that could improve its predictions.
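
To make the lookup step concrete, here is a minimal Python sketch of retrieval from a chunked text database. It is not DeepMind's implementation: RETRO compares chunks using learned text embeddings and approximate nearest-neighbour search over an enormous database, whereas this toy version, assuming plain word counts and cosine similarity, brute-forces a handful of made-up passages. The helper names `chunk_text`, `bag_of_words`, `cosine` and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter

def chunk_text(text, chunk_size=8):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def bag_of_words(text):
    """Count lower-cased words: a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (missing words count as 0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force nearest neighbours)."""
    q = bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]

# Hypothetical toy database: each passage is split into short chunks.
passages = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Transformers are neural networks built around a mechanism called attention.",
    "Bangalore is a technology hub in southern India.",
]
database = [chunk for p in passages for chunk in chunk_text(p)]
print(retrieve("When was the Eiffel Tower in Paris completed?", database, k=2))
# -> ['The Eiffel Tower is in Paris and was', 'completed in 1889.']
```

This covers only the search half of the idea. In RETRO, the retrieved neighbours are fed into the network alongside the input text rather than simply printed, and the model learns how much to rely on them.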

The researchers found that a RETRO model with just 7 billion parameters could outperform the 178-billion-parameter Jurassic-1 transformer made by AI21 Labs on a wide variety of language tasks, and even did better than the 280-billion-parameter Gopher model on most of them.
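
The headline's "25 times" figure appears to come from comparing RETRO with Jurassic-1. A few lines of Python make the size ratios explicit, using the parameter counts quoted in this article plus GPT-3's published 175 billion; the `params` dictionary and `size_ratio` function are illustrative names only.

```python
# Parameter counts in billions: RETRO, Jurassic-1 and Gopher as quoted above; GPT-3 from OpenAI.
params = {"RETRO": 7, "Jurassic-1": 178, "Gopher": 280, "GPT-3": 175}

def size_ratio(bigger, smaller):
    """How many times more parameters `bigger` has than `smaller`."""
    return params[bigger] / params[smaller]

for bigger, smaller in [("Jurassic-1", "RETRO"), ("Gopher", "RETRO"), ("Gopher", "GPT-3")]:
    print(f"{bigger} / {smaller}: {size_ratio(bigger, smaller):.1f}x")
# Jurassic-1 / RETRO: 25.4x
# Gopher / RETRO: 40.0x
# Gopher / GPT-3: 1.6x
```

The last ratio is the source of the "60 percent larger" comparison between Gopher and GPT-3.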

As well as cutting down the amount of training required, the researchers point out that the ability to see which chunks of text the model consulted when making predictions could make it easier to explain how it reached its conclusions. The reliance on a database also opens up opportunities for updating the model’s knowledge without retraining it, or even modifying the corpus to eliminate sources of bias.
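
Both benefits, traceable sources and knowledge updates without retraining, follow from keeping facts in a database rather than in a model's weights. The Python sketch below is a hypothetical toy, not DeepMind's system: the `RetrievalDatabase` class stores passages under a source ID, scores them by word overlap (Jaccard similarity) instead of learned embeddings, and returns the source ID with each hit so a reader can see what was consulted.

```python
import re

def words(text):
    """Lower-cased set of words in a passage."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

class RetrievalDatabase:
    """Toy store of passages keyed by a source ID (hypothetical design)."""

    def __init__(self):
        self.passages = {}  # source_id -> passage text

    def add(self, source_id, text):
        # New knowledge goes into the database; no model weights change.
        self.passages[source_id] = text

    def remove(self, source_id):
        # Dropping an outdated or biased source is just a deletion.
        self.passages.pop(source_id, None)

    def lookup(self, query, k=1):
        """Return the k (source_id, passage) pairs with the highest Jaccard word overlap."""
        q = words(query)

        def score(item):
            p = words(item[1])
            union = q | p
            return len(q & p) / len(union) if union else 0.0

        return sorted(self.passages.items(), key=score, reverse=True)[:k]

db = RetrievalDatabase()
db.add("memo-1", "Office hours are on Monday.")
print(db.lookup("When are office hours?"))  # -> [('memo-1', 'Office hours are on Monday.')]

# Update the "knowledge" by swapping sources, with no retraining.
db.remove("memo-1")
db.add("memo-2", "Office hours are now on Friday.")
print(db.lookup("When are office hours?"))  # -> [('memo-2', 'Office hours are now on Friday.')]
```

Because every answer carries its source ID, auditing or editing the corpus becomes a data-management task rather than a training run.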

Interestingly, the researchers showed that they can take an existing transformer and retrofit it to work with a database by retraining only a small section of its network. These retrofitted models easily outperformed the originals and even came close to the performance of RETRO models trained from scratch.

It’s important to remember, though, that RETRO is still a large model by most standards; it’s nearly five times larger than GPT-3’s predecessor, GPT-2. And it seems likely that people will want to see what’s possible with an even bigger RETRO model with a larger database.

DeepMind certainly thinks further scaling is a promising avenue. In the Gopher paper, the researchers found that while increasing model size didn’t significantly improve performance on logical reasoning and common-sense tasks, the benefits were clear for tasks like reading comprehension and fact-checking.

Perhaps the most important lesson from RETRO is that scaling models isn’t the only—or even the fastest—route to better performance. While size does matter, innovation in AI models is also crucial.

Image Credit: DeepMind

Edd Gent

Edd is a freelance science and technology writer based in Bangalore, India. His main areas of interest are engineering, computing, and biology, with a particular focus on the intersections between the three.
