AI Agents With ‘Multiple Selves’ Learn to Adapt Quickly in a Changing World

Every day we’re juggling different needs. I’m hungry but exhausted; should I collapse on the couch or make dinner? I’m overheating in dangerous temperatures but also extremely thirsty; should I chug the tepid water that’s been heating under the sun, or stick my head in the freezer until I have the mental capacity to make ice?

When faced with dilemmas, we often follow our basic instincts without a thought. But under the hood, multiple neural networks are competing to make the “best” decision at any moment. Sleep over food. Freezer over lukewarm water. They may be terrible decisions in hindsight—but next time around, we learn from our past mistakes.

Our adaptability to an ever-changing world is a superpower that currently escapes most AI agents. Even the most sophisticated AI agents break down—or require untenable amounts of computing time—as they juggle conflicting goals.

To a team led by Dr. Jonathan Cohen at the Princeton Neuroscience Institute, the reason is simple: machine learning systems generally act as a single entity, forced to evaluate, calculate, and execute one goal at a time. Although able to learn from its mistakes, the AI struggles to find the right balance when challenged with multiple opposing goals simultaneously.

So why not break the AI apart?

In a new study published in PNAS, the team took a page from cognitive neuroscience and built a modular AI agent.

The idea is seemingly simple. Rather than a monolithic AI—a single network that encompasses the entire “self”—the team constructed a modular agent, each part with its own “motivation” and goals but commanding a single “body.” Like a democratic society, the AI system argues with itself to decide on the best response, and the action predicted to yield the best overall outcome guides its next step.

In several simulations, the modular AI outperformed its classic monolithic peer. Its adaptability especially shone when the researchers artificially increased the number of goals that it had to simultaneously maintain. The Lego-esque AI rapidly adapted, whereas its monolithic counterpart struggled to catch up.

“One of the most fundamental questions about agency is how an individual manages conflicting needs,” said the team. By deconstructing an AI agent, the research doesn’t just provide insight into smarter machine learning agents. It also “paves the way to understanding psychological conflicts inherent in the human psyche,” wrote Dr. Rober Boshra at Princeton University, who was not involved in the work.

The Video Game of Life

How do intelligent beings learn to balance conflicting needs in a complex, changing world?

The philosophical question has haunted multiple fields—neuroscience, psychology, economics—that delve into human nature. We don’t yet have clear answers. But with AI increasingly facing similar challenges as it enters the real world, it’s time to tackle the age-old problem head-on.

The new study took up the challenge in the form of a simple RPG (role-playing game). There are two characters that navigate a grid-like world, each trying to find resources to survive.

The first contestant: the monolithic agent—otherwise known as the “self”—trained using deep Q-learning (DQL). Popularized by DeepMind, the algorithm is especially powerful at figuring out the optimal next step given the agent’s current state. For example, as in a video game, should I go left or right? Which chess or Go piece should move, and to where? Here, the algorithm surveys the entire environment while following a single reward signal—that is, its final goal. In a sense, the monolithic agent is a unified brain that tries to maximize the overall outcome after processing all resources in tandem.
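The core update behind DQL can be sketched in a few lines. This is a minimal, tabular version for illustration only—the study uses a neural network to approximate the Q-function, and the constants and state names here are assumptions, not values from the paper:

```python
import random

# Tabular sketch of the Q-learning rule underlying DQL. One agent, one
# combined reward signal covering all of its needs at once.
ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor (assumed values)

Q = {}  # maps (state, action) -> estimated long-run reward

def q_update(state, action, reward, next_state):
    """Nudge Q(s, a) toward reward + gamma * max over a' of Q(s', a')."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy: mostly exploit the best-known action, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
```

After enough of these updates, the action with the highest Q-value in a given state is the agent’s best guess at the optimal next step.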

The opponent: the modular AI. Like an octopus with semi-autonomous limbs, the agent is broken down into sub-agents, each with its own goals and feedback. To make it a fair fight, each module is also trained with DQL. The separate “brains” observe their surroundings and learn to select the best option—but tailored only to their own goals. The predicted outcomes are then summed, and the action with the best combined outcome is selected, piloting the AI agent to its next choice.
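The decision rule for the shared “body” can be sketched as a simple sum-and-argmax over each module’s preferences. The module names and Q-values below are illustrative assumptions, not numbers from the study:

```python
# Each module scores every action against its own goal; the scores are summed
# and the single shared body takes the action with the highest total.
ACTIONS = ["up", "down", "left", "right"]

def modular_choice(module_q_values):
    """module_q_values: list of dicts, one per module, mapping action -> Q-value."""
    totals = {a: sum(m[a] for m in module_q_values) for a in ACTIONS}
    return max(totals, key=totals.get)

# A "thirst" module strongly favors "left" (toward water, say); a "hunger"
# module mildly favors "up" (toward food).
thirst = {"up": 0.1, "down": 0.0, "left": 0.9, "right": 0.2}
hunger = {"up": 0.5, "down": 0.1, "left": 0.3, "right": 0.2}
```

Here the summed preferences pick “left” (0.9 + 0.3) even though the hunger module alone would have gone “up”—the losing module is dragged along by the winning vote, exactly the internal competition the article describes.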

And the playing field?

The game is an extremely stripped-down version of a survival game. Each AI agent roams around a two-dimensional grid that has different types of resources hidden in some regions. The goal is to keep the agent’s four stats at their set level, with each gradually decreasing over time. When multiple stats tumble, it’s up to the AI to decide which one to prioritize.

For video gamers, think of the test as being thrown into a new game map and trying to find resources to boost, for example, health, magic, stamina, and attack power. For our everyday lives, it’s balancing hunger, temperature, sleep, and other basic physiological needs.

“For example, if the agent had a low ‘hunger’ stat, it could collect the ‘food’ resource by moving to the location of that resource,” explained the team.
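The homeostatic setup described above can be sketched as a tiny environment step: stats drift downward every tick, and standing on the matching resource tile refills one of them. The decay rate, set point, and stat names are assumed for illustration:

```python
# Minimal sketch of the survival game's homeostatic loop: four stats decay
# each step, and collecting a resource restores the matching stat.
DECAY = 0.05     # per-step drop in every stat (assumed value)
SET_POINT = 1.0  # target level each stat should be kept at (assumed value)

def step(stats, tile):
    """stats: dict like {'food': 0.8, ...}; tile: resource name here, or None."""
    new = {name: max(0.0, value - DECAY) for name, value in stats.items()}
    if tile in new:
        new[tile] = SET_POINT  # collecting the resource refills that stat
    return new

stats = {"food": 0.30, "water": 0.90, "sleep": 0.50, "warmth": 0.70}
stats = step(stats, "food")  # the agent moves onto the food tile
```

After the step, “food” is back at its set point while the other three stats have each drifted down a notch—so the agent must constantly decide which shrinking stat to chase next.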

Forest for the Trees

The first test started with a relatively simple environment. The location for each resource goal was fixed at the corner of the gaming arena. The monolithic agent readily maintained its four stats after 30,000 training steps, though it went through a period of overshooting and undershooting until reaching the targeted goals. In contrast, the modular agent learned far faster. By 5,000 learning steps, the agent had already captured an understanding of the “state of the world.”

Part of the modular AI’s prowess came from an intrinsic sense of free exploration, said the authors. Unlike previous methods for modular systems that divide and conquer to move towards a final goal, here the AI represents a more holistic social relationship—one in which some modules gain and some lose through a constant state of internal competition.

Because the AI agent’s “body” is guided only by the winning module, the losing ones have to go along with a decision they didn’t agree with and are forced into a new reality. They then have to rapidly adapt and recalculate the best solution for the next step. In other words, modules often find themselves outside their comfort zone. It’s tough love, but the unexpected results force them to ponder new solutions—sometimes yielding better results they wouldn’t have considered if tackling the problem alone.

Overall, the modular system forms a “virtuous cycle with exploration” to further improve AI actions, said study author Zack Dulberg.

This adaptability further shone when the team challenged both AI agents in changing environments. In one test, the resource goal positions moved to a random grid location at sporadic time scales. The modular AI quickly picked up on the changes and adapted to them, whereas the monolithic agent performed far worse.

In another test the team turned up the dial, requiring the AI agents to simultaneously maintain eight stats rather than the original four. The test tackled the problem that computation becomes increasingly intractable in terms of time and energy as the number of variables goes up—dubbed the “curse of dimensionality.”
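A back-of-the-envelope calculation shows why doubling the number of stats hurts a monolithic agent so much more than a modular one. Assume each stat is discretized into ten levels (a number chosen here purely for illustration):

```python
# Curse of dimensionality, roughly: a monolithic agent must distinguish every
# joint combination of stat levels, while n modules each track only one stat.
def joint_states(levels: int, n_stats: int) -> int:
    """State count for one agent tracking all stats jointly."""
    return levels ** n_stats

def modular_states(levels: int, n_stats: int) -> int:
    """Total states across n independent single-stat modules."""
    return levels * n_stats
```

Going from four stats to eight multiplies the joint state space by a factor of ten thousand (10^4 to 10^8), while the modular tally merely doubles (40 to 80)—a rough intuition for why the monolithic agent falls behind as goals pile up.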

The modular agent rapidly adapted to hunt down resources to maintain its goals. In contrast, the monolithic agent again struggled, taking far longer to return to the desired levels for each of its stats.

One Versus Many

The modular approach is another example of tapping into neuroscience for the development of AI—while providing insight into how our noggins work.

Similar to previous work, the modular agents show that it’s possible for a single AI agent to learn separate, easier sub-problems in parallel in a way that’s relatively decentralized in terms of data processing. Adding a model with hierarchical control could further bolster the AI, said the authors, because both structures exist in the natural world.

For now, each module is programmed for its own gains—multiple separate selves. But our goals in life are often interlinked; for example, alleviating thirst and battling heat aren’t mutually exclusive. The team highlights the need to integrate these crossovers—and to learn whether they are inherited or learned—in future tests.

To Dulberg, the unknown is part of the excitement. “How do modules develop? What features of the developmental environment put pressure on different solutions?” he asked. “And do the benefits of modularity explain why internal psychological conflict seems so central to the human condition?”

Image Credit: Anestiev/Pixabay

Shelly Fan
Shelly Xuelai Fan is a neuroscientist-turned-science writer. She completed her PhD in neuroscience at the University of British Columbia, where she developed novel treatments for neurodegeneration. While studying biological brains, she became fascinated with AI and all things biotech. Following graduation, she moved to UCSF to study blood-based factors that rejuvenate aged brains. She is the co-founder of Vantastic Media, a media venture that explores science stories through text and video, and runs an award-winning blog. Her first book, "Will AI Replace Us?" (Thames & Hudson), was published in 2019.