There’s a classic scene in almost every police procedural: a weathered detective stands staring at a collection of photos pinned to a wall. Thin, red yarn traces the connections between the different players. Something’s clearly missing.
In a sudden flash of inspiration, the final link snaps into the detective’s mind. He dashes off, frantically yelling to his partner that he finally figured out “whodunnit.”
Although we’re not all seasoned crime solvers, under the hood our brains share one remarkable skill: the ability to reason about how one thing relates to another.
This type of logical acrobatics—dubbed “relational reasoning”—silently operates behind even the most banal situations: when is it safe to cross the street with multiple oncoming cars? Which entrée and wines go best together? How many attractions are around your hotel?
To a human, reasoning about relationships feels intuitive and simple. To an AI, it’s unfathomably hard.
That may be set to change. Last week, researchers at DeepMind, the mysterious deep learning company that gave us AlphaGo, published a paper detailing a new algorithm that endows machines with a spark of human ingenuity.
The plug-and-play “relation networks” (RNs) are bits of code that force an AI to explicitly think about relations between a group of mental representations, whether static objects, moving people, or even abstract ideas.
Like a powerful turbocharger, RNs gave AIs a logic boost when combined with existing machine learning tools, so much so that they outperformed humans on several image-based reasoning tasks.
As a “fundamental part of human intelligence,” relational reasoning acts like a multitool to transfer know-how from one domain to another, says Dr. Sam Gershman, a computational neuroscientist at Harvard who was not involved in the study.
And while RNs only capture a snippet of human reasoning, they’re a “step in the right direction” towards generally intelligent machines with the flexibility and efficiency of human thought.
Thinking fast and slow
Not all AIs are created equal. Like students specializing in either arts or sciences, the two main types of AIs—symbolic and statistical—each have their own quirks.
Symbolic AIs use a powerful set of math operations to reason about relations between things, so they handle logic well. The problem is that they’re constrained by predetermined rules. In other words, they’re terrible at learning on the fly, and any small variation in the task can throw them off track, which is not exactly ideal for tackling the challenges of our ever-changing world.
In contrast, statistical AIs (better known as machine learning) rely on millions of examples to find patterns in a dataset. The poster child of statistical AIs is deep learning, the driving force behind AlphaGo and the various face-tagging services that have taken the world by storm.
As revolutionary as they are, however, deep neural networks are still terrible at finding complex relations in a data structure, especially when they don’t have enough training examples.
DeepMind combines the best of both worlds with their new algorithm: an artificial neural network capable of pattern recognition and reasoning about those patterns.
Artificial neural networks are loosely based on their biological counterparts in our brains. Rather than operating on pre-set rules, they learn to discover patterns by tweaking the connections between their “neurons”—like fine-tuning a guitar.
Each neural network has its own structure to support one task: labeling images, translating languages or playing Go and Atari games. DeepMind’s RN is similar in this way: it has a unique structure that “primes” it to compare every possible pair of objects within a system.
“We’re explicitly forcing the network to discover the relationships that exist between the objects,” says study author Timothy Lillicrap. “The capacity to compute relations is baked into the RN architecture,” he adds.
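To make the "every possible pair" idea concrete, here is a minimal Python sketch of a pairwise relation module. It is an illustration under stated assumptions, not DeepMind's code: in a real RN the two functions `g` and `f` would be small learned neural networks, whereas here they are hand-written stand-ins, and the scene, the function names and the choice to visit all ordered pairs (including an object paired with itself) are hypothetical.

```python
from itertools import product

def relation_network(objects, g, f):
    """Toy relation module: f applied to the sum of g over all ordered pairs."""
    # g scores one pair at a time; here every ordered pair is visited,
    # including an object paired with itself (an assumption of this sketch).
    pair_outputs = [g(a, b) for a, b in product(objects, repeat=2)]
    # Summing makes the result independent of the order the objects arrive in.
    summed = [sum(column) for column in zip(*pair_outputs)]
    return f(summed)

# Hypothetical scene: each object is (horizontal position, shape).
objects = [(0.0, "cube"), (1.0, "sphere"), (2.5, "cube")]

def g(a, b):
    # Hand-written stand-in for a learned network:
    # [is a left of b?, horizontal distance between a and b]
    return [1.0 if a[0] < b[0] else 0.0, abs(a[0] - b[0])]

def f(summed):
    # Stand-in for the learned readout network; it passes the sum through.
    return summed

result = relation_network(objects, g, f)
print(result)  # [3.0, 10.0]
```

The first number counts the "left of" relations in the scene and the second adds up all pairwise distances. A trained RN would learn which pairwise features matter instead of having them written by hand.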
In a series of experiments, the team carefully tested the RN’s capabilities. First, they trained the algorithm on CLEVR—a database of images composed of simple objects designed to explicitly explore an AI’s ability to perform several types of reasoning, such as counting, comparing or querying.
In each image, the algorithm had to answer questions about the relations between objects in a scene. For example, “What shape is the small object that is in front of the yellow matte thing and behind the gray sphere?” or “What number of objects are blocks that are in front of the large red cube or green balls?”
What seems like a no-brainer to humans is actually a two-step process. To get it right, you need to first identify the objects and characterize their properties. Then, you have to put them all into a broader context of the image to build hypotheses about how they relate to each other.
But the RN didn’t go it alone. To tackle this task, the authors combined it with two other neural networks: one for image processing, and one for interpreting the questions. After rounds and rounds of training, the combined network answered correctly 96 percent of the time, more than the 92 percent humans scored. Traditional neural networks without the RN module trailed far behind, netting around 63 percent.
Next, DeepMind switched gears and tested the RN on a word-based task to gauge its versatility. The network was exposed to short stories like “Sandra picked up the football,” and “Sandra went to the office,” which led to the question “Where is the football?”
The RN-augmented network performed just as well as state-of-the-art models at 95 percent on most of the tasks, but especially excelled at questions requiring inference— “The dog is a black Deerhound. The Deerhound’s name is Sirius. What color is Sirius?”—scoring twice as high as conventional networks.
Finally, the algorithm parsed a simulation of 10 bouncing balls, with some randomly selected to pair up, as if tied by invisible springs or rigid constraints. By analyzing the relative positions and speed of the balls, the RN identified more than 90 percent of the connected pairs.
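For intuition about why relative positions carry the signal in this task, the sketch below flags pairs of balls whose separation stays constant over time, which is what a rigid link would produce. This is a hand-written heuristic for illustration only, not what the RN does: the RN learns its own pairwise features from data. The function name, the tolerance and the toy trajectories are all hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import pvariance

def rigid_pairs(trajectories, tolerance=1e-6):
    """Return index pairs whose separation stays (nearly) constant over time.

    trajectories[k] is a list of (x, y) positions for ball k, one per time step.
    """
    linked = []
    for i, j in combinations(range(len(trajectories)), 2):
        # Distance between ball i and ball j at each time step.
        separations = [dist(p, q) for p, q in zip(trajectories[i], trajectories[j])]
        # A rigidly linked pair keeps the same separation, so the variance is ~0.
        if pvariance(separations) < tolerance:
            linked.append((i, j))
    return linked

# Three hypothetical balls: 0 and 1 move together, 2 drifts away on its own.
toy = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    [(5.0, 5.0), (5.0, 7.0), (5.0, 10.0)],
]
print(rigid_pairs(toy))  # [(0, 1)]
```

A fixed rule like this would miss spring-like links, where the separation oscillates. That is one reason a learned pairwise function is attractive.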
The beauty of the RN lies in its simplicity. The core of the algorithm is a single equation, meaning it can be tacked onto existing network structures to give them a boost. RN-enhanced networks could one day automatically analyze surveillance footage, study social networks, or guide self-driving cars through complex intersections with many moving components.
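For readers who want to see it, the single equation is usually written in roughly the following form. The symbols are not given in this article and follow common convention: O is the set of objects o_1 through o_n, g is a function that scores one pair of objects, and f turns the summed pair scores into an answer. Both g and f are small neural networks whose parameters (written θ and φ here) are learned during training.

```latex
\mathrm{RN}(O) = f_{\phi}\!\left( \sum_{i,j} g_{\theta}(o_i, o_j) \right)
```

Because the pair scores are simply added up, the output does not depend on the order in which the objects are listed.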
That said, the RN only analyzes pairwise connections. To really understand ever more complex relational structures, it will have to compare triplets, quadruplets or (more meta) pairs of pairs. And while it deals with moving objects to an extent, it doesn’t predict the future trajectory of objects, a crucial part of relational reasoning.
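One practical hurdle in going beyond pairs is that the number of combinations grows quickly. The short Python sketch below counts unordered groupings for a given number of objects; the function name is hypothetical, and it counts distinct unordered groups, which is one of several reasonable ways to count.

```python
from math import comb

def relation_counts(n_objects):
    """Count distinct unordered groupings among n_objects items."""
    pairs = comb(n_objects, 2)
    return {
        "pairs": pairs,
        "triplets": comb(n_objects, 3),
        # A "pair of pairs" is an unordered choice of two distinct pairs.
        "pairs_of_pairs": comb(pairs, 2),
    }

# Ten objects, the same number as the balls in the simulation described earlier.
print(relation_counts(10))
# {'pairs': 45, 'triplets': 120, 'pairs_of_pairs': 990}
```

Even with only ten objects, moving from pairs to pairs of pairs multiplies the number of comparisons by more than twenty.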
“There is a lot of work needed to solve richer real-world data sets,” says study author Adam Santoro.
DeepMind has already made strides on this problem. In another paper, they described a “Visual Interaction Network (VIN)” that predicts the future of moving objects based on their properties and physical surroundings—a sort of physics engine, like the one built into our brains.
Both of the studies show that by carving the world into objects and their relations, we could give AIs the ability to generalize. They learn to form “new combinations of objects and reason about scenes that superficially might look very different but have underlying common relations,” explain the authors.
And while that’s not the only aspect of intelligence, it’s certainly a necessary one.