Like Humans, This Breakthrough AI Makes Concepts Out of the Words It Learns

Prairie dogs are anything but dogs. With a body resembling a Hershey’s Kiss and a highly sophisticated chirp for communication, they’re more hamster than golden retriever.

Humans immediately get that prairie dogs aren’t dogs in the usual sense. AI struggles.

Even as toddlers, we have an uncanny ability to turn what we learn about the world into concepts. With just a few examples, we form an idea of what makes a “dog” or what it means to “jump” or “skip.” These concepts are effortlessly mixed and matched inside our heads, resulting in a toddler pointing at a prairie dog and screaming, “But that’s not a dog!”

Last week, a team from New York University created an AI model that mimics a toddler’s ability to generalize language learning. In a nutshell, generalization is a sort of flexible thinking that lets us use newly learned words in new contexts—like an older millennial struggling to catch up with Gen Z lingo.

When pitted against adult humans in a language generalization task, the model matched their performance. It also beat GPT-4, the AI algorithm behind ChatGPT.

The secret sauce was surprisingly human. The new neural network was trained to reproduce errors from human test results and learn from them.

“For 35 years, researchers in cognitive science, artificial intelligence, linguistics, and philosophy have been debating whether neural networks can achieve human-like systematic generalization,” said study author Dr. Brenden Lake. “We have shown, for the first time, that a generic neural network can mimic or exceed human systematic generalization in a head-to-head comparison.”

A Brainy Feud

Most AI models rely on deep learning, a method loosely based on the brain.

The idea is simple. Artificial neurons interconnect to form neural networks. By changing the strengths of connections between artificial neurons, neural networks can learn many tasks, such as driving autonomous taxis or screening chemicals for drug discovery.
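To make the idea concrete, here is a minimal sketch of a single artificial neuron learning by nudging its connection strengths. The toy task and all numbers are invented for illustration; real deep learning stacks millions of such units, but the learning rule is the same in spirit.

```python
# A single artificial neuron learning the toy rule y = 2*x1 + 3*x2.
# Its "connection strengths" are the two weights, nudged after every
# example to shrink the error (plain gradient descent).
import random

w = [0.0, 0.0]                          # connection strengths (weights)
for step in range(1000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    target = 2 * x[0] + 3 * x[1]        # the rule the neuron must learn
    pred = w[0] * x[0] + w[1] * x[1]    # the neuron's current output
    err = pred - target
    for i in range(2):
        w[i] -= 0.1 * err * x[i]        # nudge each weight to reduce error

print(w)  # converges toward [2.0, 3.0]
```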

Neural networks in the brain, however, are far more powerful. Their connections rapidly adapt to ever-changing environments and stitch together concepts from individual experiences and memories. As an example, we can easily identify a wild donkey crossing the road and know when to hit the brakes. A robot car may falter without wild-donkey-specific training.

The pain point is generalization. For example: What is a road? Is it a paved highway, a rugged dirt path, or a hiking trail surrounded by shrubbery?

Back in the 1980s, cognitive scientists Jerry Fodor and Zenon Pylyshyn famously proposed that artificial neural networks aren’t capable of understanding concepts—such as a “road”—much less flexibly using them to navigate new scenarios.

The scientists behind the new study took the challenge head-on. Their solution? An artificial neural network fine-tuned on human reactions.

Man With Machine

As a baseline, the team first asked 25 people to learn a new made-up language. Unlike an existing language, a fantasy one keeps prior knowledge from biasing the human participants.

The research went “beyond classic work that relied primarily on thought experiments” to tap into human linguistic abilities, the authors explained in their study. Unlike previous setups, which mostly focused on grammar, the test asked participants to understand the made-up language and generalize from its words alone.

As if teaching a new language, the team started with a handful of simple nonsense words: “dax,” “lug,” “wif,” and “zup.” Each translates to a basic action, such as skipping or jumping.

The team then introduced more complex words, “blicket” and “kiki,” that string the simple words together into sentences, and in turn, into concepts and notions. Combined with the simple words, these abstract words can mean “skip backwards” or “hop three times.”

The volunteers were trained to associate each word with a color: “dax” was red, “lug” was blue, and so on. The colors helped the volunteers learn the rules of the new language. One word combination resulted in three red circles; another flashed blue. But importantly, some words, such as “fep,” behaved the same way regardless of the words paired with them, suggesting the fantasy language had a grammatical backbone.
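The article doesn’t spell out the full grammar, so the sketch below invents plausible rules to show how such a language composes: assume “fep” repeats the preceding color three times and “kiki” swaps the phrases around it, with “wif” and “zup” assigned to green and yellow. Only “dax” as red and “lug” as blue come from the text above; everything else is illustrative.

```python
# A toy interpreter for a made-up language of this kind. The color
# assignments for "wif"/"zup" and the rules for "fep"/"kiki" are
# assumptions for illustration, not the study's published grammar.
PRIMITIVES = {"dax": "RED", "lug": "BLUE", "wif": "GREEN", "zup": "YELLOW"}

def interpret(phrase: str) -> list[str]:
    """Translate a phrase in the toy language into a sequence of colors."""
    words = phrase.split()
    if "kiki" in words:
        # [x kiki y] -> interpret(y) followed by interpret(x)
        i = words.index("kiki")
        left, right = words[:i], words[i + 1:]
        return interpret(" ".join(right)) + interpret(" ".join(left))
    colors = []
    for word in words:
        if word == "fep":
            # "fep" repeats the preceding color, three copies in total
            colors.extend([colors[-1]] * 2)
        else:
            colors.append(PRIMITIVES[word])
    return colors

print(interpret("dax fep"))       # ['RED', 'RED', 'RED'] -- three red circles
print(interpret("lug kiki dax"))  # ['RED', 'BLUE'] -- "kiki" reverses order
```

Crucially, a participant (or a model) that infers rules like these can decode combinations never seen during training, which is exactly the kind of generalization being tested.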

After 14 rounds of learning, the volunteers were challenged with 10 questions about the meaning of the made-up words and asked to generalize to more complex questions. For each task, the participants had to select the corresponding color circles and place them in the appropriate order to form a phrase.

They excelled, picking the correct colors roughly 80 percent of the time. Many of the errors were “one-to-one” translation mistakes, in which a word was mapped to its basic meaning without considering the larger context.

A second group of 29 people also rapidly learned the fantasy language, translating combinations such as “fep fep” without trouble.

Language Learned

To build the AI model, the team focused on several criteria.

One, it had to generalize from just a few instances of learning. Two, it needed to respond like humans to errors when challenged with similar tasks. Finally, the model had to learn and easily incorporate words into its vocabulary, forming a sort of “concept” for each word.

To do this, the team used meta-learning for compositionality. Yes, it sounds like a villain’s superpower. But what it does is relatively simple.

The team gave an artificial neural network tasks like the ones given to the human volunteers. Rather than being trained on a static data set, the network was optimized over a constantly shifting stream of tasks, letting it learn on the fly in a way standard AI approaches can’t. Usually, these machines process a problem using a fixed set of study examples. Think of it as deciphering Morse code: they receive a message of dots and dashes and translate the sequence into plain English.

But what if the language isn’t English, and it has its own concepts and rules? A static training set would fail the AI wordsmith.

Here, the team guided the AI through a “dynamic stream” of tasks that required the machine to mix and match concepts. In one example, it was asked to skip twice. The AI model independently learned the notion of “skip,” as opposed to “jump,” and that “twice” means two times. Each instruction was fed through the neural network, and its output was compared to the correct behavior. If, say, the model skipped three times, the mismatch provided feedback that nudged it toward the correct response. Through repetition, it eventually learned to associate and combine different concepts.
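The training loop is easiest to see in miniature. The sketch below is not the study’s architecture (the paper itself used a standard sequence-to-sequence transformer); the tiny network, the word-to-color task, and all sizes are invented stand-ins. What it keeps is the heart of meta-learning: every episode hides a fresh mapping from words to colors, so the only thing the network can profitably learn across episodes is the general skill of inferring a mapping from a few study examples and applying it to a new query.

```python
# Miniature meta-learning sketch: each episode samples a secret
# word-to-color mapping; the network sees three study pairs plus a
# query word and must infer the query's color from context alone.
import torch
import torch.nn as nn

N_WORDS, N_COLORS = 4, 4  # e.g. dax/lug/wif/zup and four colors

def make_episode():
    perm = torch.randperm(N_COLORS)              # this episode's mapping
    query = torch.randint(N_WORDS, (1,)).item()  # held-out word
    study = [(w, perm[w].item()) for w in range(N_WORDS) if w != query]
    return study, query, perm[query].item()

def encode(study, query):
    # One-hot encode the three study pairs and the query word.
    x = torch.zeros(len(study) * (N_WORDS + N_COLORS) + N_WORDS)
    for i, (w, c) in enumerate(study):
        base = i * (N_WORDS + N_COLORS)
        x[base + w] = 1.0
        x[base + N_WORDS + c] = 1.0
    x[len(study) * (N_WORDS + N_COLORS) + query] = 1.0
    return x

model = nn.Sequential(
    nn.Linear(3 * (N_WORDS + N_COLORS) + N_WORDS, 64),
    nn.ReLU(),
    nn.Linear(64, N_COLORS),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5000):  # a dynamic stream of fresh episodes
    study, query, answer = make_episode()
    logits = model(encode(study, query))
    loss = nn.functional.cross_entropy(logits.unsqueeze(0),
                                       torch.tensor([answer]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the mapping changes every episode, memorizing any single vocabulary is useless; the loss only falls if the network learns the structure shared by all episodes, in this toy case that the mapping is one-to-one, so the held-out color can be deduced by elimination.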

Then came the second step. The team added a new word, say, “tiptoe,” into a context the AI model had already learned, like movement, and then asked it to “tiptoe backwards.” The model now had to incorporate “tiptoe” into its existing vocabulary and concepts of movement.
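In the spirit of the earlier toy-language sketch, this step is cheap once composition rules exist: a brand-new word is a single vocabulary entry, and every learned rule applies to it for free. The word meanings below are again invented glosses, not the study’s actual stimuli.

```python
# Composition rules are defined once; a new word inherits them all.
ACTIONS = {"dax": "skip", "lug": "jump"}
MODIFIERS = {"kiki": "backwards", "fep": "three times"}  # loose glosses

def perform(phrase: str) -> str:
    """Turn '<action word> <modifier words...>' into an English gloss."""
    words = phrase.split()
    action = ACTIONS[words[0]]
    mods = " ".join(MODIFIERS[w] for w in words[1:])
    return f"{action} {mods}".strip()

ACTIONS["tiptoe"] = "tiptoe"   # one new word, zero new rules
print(perform("tiptoe kiki"))  # -> "tiptoe backwards"
print(perform("tiptoe fep"))   # -> "tiptoe three times"
```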

To further train the AI, the team fed it data from the human participants so it might learn from human errors. When challenged with new puzzles, the AI mimicked human responses in 65 percent of the trials, outperforming similar AI models—and in some cases, beating human participants.

The model raises natural questions for the future of language AI, wrote the team. Rather than teaching AI models grammar with examples, giving them a broader scope might help them mimic children’s ability to grasp languages by combining different linguistic components.

AI models like this one could help us understand how humans learned to combine words into phrases, sentences, poetry, and essays. They could also yield insights into how children build their vocabularies and, in turn, form a gut understanding of concepts and knowledge about the world. Language aside, the new approach could also help machines parse other fields, such as mathematics, logic, and even, coming full circle, computer programming.

“It’s not magic, it’s practice. Much like a child also gets practice when learning their native language, the models improve their compositional skills through a series of compositional learning tasks,” Lake told Nature.

Image Credit: Andreas Fickl / Unsplash 

Shelly Fan
https://neurofantastic.com/
Dr. Shelly Xuelai Fan is a neuroscientist-turned-science-writer. She's fascinated with research about the brain, AI, longevity, biotech, and especially their intersection. As a digital nomad, she enjoys exploring new cultures, local foods, and the great outdoors.