Babies are bubbly, cuddly, giggly balls of joy. They’re also enormously powerful learning machines. At three months old, they already have intuition about how things around them behave—without anyone explicitly teaching them the rules of the game.
This ability, dubbed “intuitive physics,” seems trivial on the surface. If I fill a glass with water and set it on the table, I know that the glass is an object—something I can wrap my hands around without it melting into my palms. It won’t sink through the table. And if it started levitating, I’d stare, then immediately run out the door.
Babies rapidly develop this ability by soaking up data from their external environments, forming a sort of “common sense” about the dynamics of the physical world. When things don’t move as expected—say, in magic tricks where objects disappear—they’ll show surprise.
For AI, it’s a completely different matter. While recent AI models have trounced humans at everything from game play to decades-old scientific conundrums, they still struggle to develop intuition about the physical world.
This month, researchers at Google-owned DeepMind took inspiration from developmental psychology and built an AI that naturally extracts simple rules about the world through watching videos. Netflix and chill didn’t work on its own; the AI model only learned the rules of our physical world when given a basic idea of objects, such as what their boundaries are, where they are, and how they move. Similar to babies, the AI expressed “surprise” when shown magical situations that didn’t make sense, like a ball rolling up a ramp.
Dubbed PLATO (for Physics Learning through Auto-encoding and Tracking Objects), the AI was surprisingly flexible. It needed only a relatively small set of examples to develop its “intuition.” Once trained, the software could generalize its predictions about how objects move and interact with each other, even in scenarios it had never encountered.
In a way, PLATO hits the sweet spot between nature and nurture. Developmental psychologists have long argued about whether learning in babies can be achieved from finding patterns in data from experiences alone. PLATO suggests the answer is no, at least not for this particular task. Both built-in knowledge and experience are critical to completing the whole learning story.
To be clear, PLATO isn’t a digital replica of a three-month-old baby—and was never designed to be. However, it does provide a glimpse into how our own minds potentially develop.
“The work…is pushing the boundaries of what everyday experience can and cannot account for in terms of intelligence,” commented Drs. Susan Hespos and Apoorva Shivaram, at Northwestern University and Western Sydney University, respectively, who were not involved in the study. It may “tell us how to build better computer models that simulate the human mind.”
The Common Sense Conundrum
At just three months old, most babies won’t bat an eye if they drop a toy and it falls to the ground; they’ve already picked up the concept of gravity.
How this happens is still baffling, but there are some ideas. At that age, babies still struggle to wriggle, crawl, or otherwise move around. Their input from the outside world is mostly through observation. That’s great news for AI: it means that rather than building robots to physically explore their environment, it’s possible to imbue a sense of physics into AI through videos.
It’s a theory endorsed by Dr. Yann LeCun, a leading AI expert and chief AI scientist at Meta. In a talk from 2019, he posited that babies likely learn through observation. Their brains build upon these data to form a conceptual idea of reality. In contrast, even the most sophisticated deep learning models still struggle to build a sense of our physical world, which limits how much they can engage with the world—making them almost literally minds in the clouds.
So how do you measure a baby’s understanding of everyday physics? “Luckily for us, developmental psychologists have spent decades studying what infants know about the physical world,” wrote lead scientist Dr. Luis Piloto. One particularly powerful test is the violation-of-expectation (VoE) paradigm. Show a baby a ball rolling up a hill, randomly disappearing, or suddenly reversing direction, and the baby will stare at the anomaly longer than it would at an event matching its expectations. Something strange is up.
In the new study, the team adapted VoE for testing AI. They tackled five different physical concepts to build PLATO. Among those are solidity—that is, two objects can’t pass through each other; and continuity—the idea that things exist and don’t blink out even when hidden by another object (the “peek-a-boo” test).
To build PLATO, the team first started with a standard method in AI with a two-pronged approach. One component, the perceptual model, takes in visual data to parse discrete objects in an image. Next is the dynamics predictor, which uses a neural network to consider the history of previous objects and predict the behavior of the next one. In other words, the model builds a “physics engine” of sorts that maps objects or scenarios and guesses how something would behave in real life. This setup gave PLATO an initial idea of the physical properties of objects, such as their position and how fast they’re moving.
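The two-pronged setup can be sketched as a toy object-centric model. The class names and the hand-coded constant-velocity rule below are illustrative stand-ins, not from the paper—PLATO’s actual perceptual model segments objects from raw pixels and its dynamics predictor is a learned neural network:

```python
import numpy as np

class PerceptualModel:
    """Toy stand-in: parses a 'frame' into per-object feature vectors.
    Here a frame is already a dict of object name -> (x, y, vx, vy);
    the real perceptual model learns this segmentation from pixels."""
    def parse(self, frame):
        return {name: np.asarray(state, dtype=float)
                for name, state in frame.items()}

class DynamicsPredictor:
    """Toy stand-in: predicts each object's next state from its current one.
    Here it just extrapolates constant velocity; the real dynamics
    predictor is a neural network over each object's history."""
    def predict(self, objects):
        preds = {}
        for name, (x, y, vx, vy) in objects.items():
            preds[name] = np.array([x + vx, y + vy, vx, vy])
        return preds

frame = {"ball": (0.0, 1.0, 0.5, -0.1)}
objs = PerceptualModel().parse(frame)
print(DynamicsPredictor().predict(objs)["ball"])  # ball advances one step under constant velocity
```

Together the two parts act as the “physics engine” described above: perception supplies each object’s position and motion, and the predictor guesses where everything will be next.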
Next came training. The team showed PLATO just under 30 hours of synthetic video from an open-source dataset. These aren’t videos of real-life events. Rather, imagine old-school Nintendo-like blocky animations of a ball rolling down a ramp, bouncing into another ball, or suddenly disappearing. PLATO eventually learned to predict how a single object would move in the next video frame, and also updated its memory for that object. With training, its predictions for the next “scene” became more accurate.
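The training signal—predict the next state, compare against what actually happens, and adjust—can be sketched by fitting a linear next-state predictor with gradient descent on squared error. The synthetic “video” (a ball gliding at constant velocity) and the linear model are illustrative assumptions, far simpler than the paper’s neural network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "video": the state (x, v) of a ball at constant velocity,
# so the true next-state map is linear: next = [x + 0.1*v, v].
def rollout(steps=10):
    x, v = rng.uniform(0, 1), rng.uniform(-1, 1)
    states = []
    for _ in range(steps):
        states.append((x, v))
        x += 0.1 * v
    return np.array(states)

# Build (state, next-state) pairs, pairing frames within each trajectory.
trajs = [rollout() for _ in range(200)]
S = np.vstack([t[:-1] for t in trajs])
S_next = np.vstack([t[1:] for t in trajs])

# Fit a linear predictor W by gradient descent on mean squared error.
W = rng.normal(size=(2, 2))
for _ in range(3000):
    grad = 2 * S.T @ (S @ W - S_next) / len(S)  # d(MSE)/dW
    W -= 0.1 * grad

print(np.round(W, 2))  # approaches the true transition [[1, 0], [0.1, 1]]
```

Because the data are noise-free and the true dynamics really are linear here, the fitted matrix recovers the underlying rule of motion—the toy analogue of PLATO’s predictions sharpening with training.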
The team then threw a wrench into the works. They presented PLATO with both a normal scene and an impossible one, such as a ball suddenly disappearing. By measuring the difference between the actual event and PLATO’s predictions, the team could gauge the AI’s level of “surprise”—which went through the roof for magical events.
The learning generalized to other moving objects. Challenged with a completely different dataset developed by MIT, featuring, among other items, rabbits and bowling pins, PLATO expertly discriminated between impossible and realistic events. PLATO had never “seen” a rabbit before, yet without any re-training, it showed surprise when a rabbit defied the laws of physics. Like babies, PLATO acquired its physical intuition with as little as 28 hours of video training.
To Hespos and Shivaram, “These findings also parallel characteristics that we see in infant studies.”
PLATO isn’t meant as an AI model for infant reasoning. But it showcases how tapping into our burgeoning baby brains can inspire computers with a sense of physicality, even when the software “brain” is literally trapped inside a box. It’s not just about building humanoid robots. From prosthetics to self-driving cars, an intuitive grasp of the physical world bridges the amorphous digital world of 0s and 1s into everyday, run-of-the-mill reality.
It’s not the first time AI scientists have thought to turbo-charge machine minds with a dash of toddler ingenuity. One idea is to give AI a sense of theory of mind—the ability to distinguish oneself from others and to picture oneself in others’ shoes. It’s an ability that comes naturally to kids around four years old, and if embedded into AI models, it could dramatically help them understand social interactions.
The new study builds on our early months of life as a rich resource for developing AI with common sense. The field is still in its infancy. The authors are releasing their dataset for others to build on and explore an AI model’s ability to interact with more complex physical concepts, including videos from the real world. For now, “these studies could serve as a synergistic opportunity across AI and developmental science,” said Hespos and Shivaram.