Machines Are Getting Smarter—Now They Should Explain Themselves

Neural networks’ powers of prediction have fueled the recent AI boom, but it can be hard to explain how they reach their decisions. A new technique aimed at uncovering the inner workings of language processing networks is just the latest effort to shed some light on these “black boxes.”

It’s probably not surprising that we find neural networks so inscrutable, seeing as they are broadly based on the human brain, which we’re also struggling to decipher. The models they learn are not neatly stored as sequences of bits in a database like a conventional computer program, but in the weights of the connections between their thousands of virtual neurons.

These weights are not set by a human programmer; instead, the neural network essentially programs itself by looking for patterns in reams of data. So while you can test how well a neural network detects cats in a photo, it’s tricky to tell what visual patterns it uses to determine their presence or absence.

When it comes to cat detection, that’s not a major problem, but this technology is creeping into fields where being able to explain decisions could be important, like financial trading and disease diagnosis. That has led to a growing body of research trying to make the decisions of these algorithms more explainable.

Earlier this month, MIT engineers unveiled a technique that promises to provide insight into any natural language processing network, regardless of the underlying software. That’s because it works simply by varying the input to the algorithm and measuring the impact on the output.

The group used their own neural network, which compresses and decompresses natural sentences, to come up with lists of closely related sentences that can then be fed into the neural network being interrogated. By analyzing how slight variations in the input changed the output, the researchers were able to discover how the network reacts to particular words and phrases.
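To make the technique concrete, here is a minimal sketch of that kind of black-box probing, not the MIT group’s code: `model` stands in for whatever system is being interrogated, and `nearby_sentences` for a paraphrase generator (the researchers used their own sentence compression network for that step).

```python
# Minimal sketch of black-box probing by input perturbation (illustrative,
# not the MIT code). `model(sentence)` wraps the system being interrogated;
# `nearby_sentences(sentence)` returns close paraphrases of the sentence.

def probe(model, sentence, nearby_sentences):
    """Return the baseline output and the variants whose output differs from it."""
    baseline = model(sentence)
    changes = []
    for variant in nearby_sentences(sentence):
        output = model(variant)
        if output != baseline:
            changes.append((variant, output))
    return baseline, changes

# The variants that flip the output reveal which words and phrases the
# network is most sensitive to.
```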

One of the tests they conducted was on a translation service provided as part of Microsoft’s Azure cloud services. French nouns that refer to people often have masculine and feminine forms, depending on the gender of the person described. For instance, a male dancer is a “danseur” and a female one is a “danseuse.”

They found the model tended to show a preference for the masculine form in sentences containing occupations such as doctor or professor or adjectives such as smart or talented, while it chose the feminine form for charming, compassionate subjects who are dancers or nurses.
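A rough sketch of that kind of bias probe, again illustrative rather than the researchers’ code, might look like this; `translate_to_french` is a hypothetical callable wrapping whatever black-box translation service is being tested.

```python
# Illustrative bias probe (not the researchers' code). `translate_to_french`
# is a hypothetical callable wrapping a black-box translation service.

SENTENCES = [
    "The dancer is talented.",  # adjective the study found skews masculine
    "The dancer is charming.",  # adjective the study found skews feminine
]

def noun_form(french_text):
    """Report which French form of 'dancer' a translation chose."""
    text = french_text.lower()
    if "danseuse" in text:
        return "feminine"
    if "danseur" in text:
        return "masculine"
    return "unclear"

def run_probe(translate_to_french):
    for sentence in SENTENCES:
        french = translate_to_french(sentence)
        print(f"{sentence!r} -> {french!r} [{noun_form(french)}]")
```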

This kind of gender bias would be hard to detect by simply scouring the architecture of the translation service’s neural network, but the effects could be insidious. Being able to detect this kind of prejudice is a key driver for efforts to make neural networks more accountable, but it could also help researchers improve their performance by weeding out assumptions that lead to error.

The MIT research follows similar work from the University of Washington, which also used variations in the input to see how a model’s predictions behave. It dealt with the simpler problem of classification algorithms, but it was able to work on image-processing models too, highlighting which sections of an image led the model to its predictions.
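The same perturb-and-observe idea extends naturally to images. The sketch below is a simple occlusion sweep, not the Washington group’s own method: it covers one patch of the image at a time and records how much the prediction drops, assuming a hypothetical `predict` function that maps an image array to the probability of the class of interest.

```python
# Occlusion sweep (illustrative, not the University of Washington method).
# `predict` is assumed to map an HxWx3 image array to the probability of the
# class of interest; patches whose occlusion causes the biggest drop are the
# ones the model relied on most.

import numpy as np

def occlusion_map(predict, image, patch=16, fill=0.5):
    h, w, _ = image.shape
    base = predict(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch, :] = fill
            heat[i, j] = base - predict(occluded)  # big drop = important region
    return heat
```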

NVIDIA has come up with what it claims is a simpler way to achieve the same result when dealing with the video used by its PilotNet system, which is designed to steer a self-driving car. By taking the output of the network’s higher layers and superimposing it on the layers below, the researchers are able to create a “visualization mask” that highlights the features in the live video feed that the network thinks are important.
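As a very rough sketch of that layered-mask recipe, based only on the description above rather than NVIDIA’s code: average each convolutional layer’s feature maps, then walk from the deepest layer back toward the input, upsampling the running mask to the next layer’s size and multiplying it in.

```python
# Rough sketch of a layer-by-layer "visualization mask" (not NVIDIA's code).
# `layer_activations` is assumed to be a list of (H, W, C) activation arrays
# from the convolutional layers, ordered shallowest to deepest.

import numpy as np

def upsample(mask, shape):
    """Nearest-neighbour resize of a 2D mask (a stand-in for deconvolution)."""
    rows = np.linspace(0, mask.shape[0] - 1, shape[0]).round().astype(int)
    cols = np.linspace(0, mask.shape[1] - 1, shape[1]).round().astype(int)
    return mask[np.ix_(rows, cols)]

def visualization_mask(layer_activations):
    """Combine averaged feature maps from deep to shallow into one saliency mask."""
    averaged = [act.mean(axis=-1) for act in layer_activations]
    mask = averaged[-1]                    # start from the deepest layer
    for lower in reversed(averaged[:-1]):  # walk back toward the input
        mask = upsample(mask, lower.shape) * lower
    return mask / (mask.max() + 1e-8)      # normalize before overlaying on the frame
```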

Going a step further, some researchers are trying to create AI able to explain its decisions to laypeople, not just experts. Researchers from the US and Germany recently unveiled an algorithm that can not only analyze pictures to answer questions like “Which sport is being played?” but also justify an answer like “baseball” with phrases like “the player is holding a bat.”

Mark Riedl, director of the Entertainment Intelligence Lab at the Georgia Institute of Technology in Atlanta, got humans to play the computer game Frogger and explain their tactics as they went. He recorded this data alongside code describing the game state at each moment and then trained a neural network on both. When he wired this network to another designed to play the game, he created an AI that could rationalize its actions as it played.

While research into explainable AI is still in its infancy, a recent directive from the EU may give the field an added sense of urgency. The General Data Protection Regulation (GDPR), due to take effect next year, will effectively create a “right to explanation,” which will allow citizens to demand the reasoning behind an algorithmic decision made about them.

As Accenture analysts note in a blog post, there is debate about the extent of this new right, but they still recommend that companies embrace explainable AI to future-proof their businesses against regulators.

There’s also likely to be big money in the field.

Finance giant Capital One is conducting research into how to get machine learning algorithms to explain their decisions, and the US Defense Advanced Research Projects Agency (DARPA) is funding 13 different research groups working on the problem. That includes a group from Oregon State University that plans to analyze neural networks with a second neural net to identify what neural activity influences particular decisions.

But Google’s director of research, Peter Norvig, recently questioned how useful these approaches could ultimately be. He said that even with humans, cognitive psychologists have found that when you ask someone to explain their decision they often make sense of their actions after the fact in ways that may or may not be linked to the actual decision-making process.

“So we might end up being in the same place with machine learning where we train one system to get an answer and then we train another system to say, given the input of this first system, now it’s your job to generate an explanation,” he said at an event in Sydney.

Instead, he said, it may be more useful to look at the output of these algorithms over time to identify bias and error. The question, then, is whose responsibility that would be: overstretched public bodies and academics, or companies with a vested interest in protecting the reputation of their AI capabilities?

In reality, it will probably require a combination of the two. AI developers will need to find ways to explain the decisions made by their creations, but we can’t just take their word for it. There will also need to be careful monitoring of how those decisions impact people’s lives.

Stock Media provided by chrisroll / Pond5

Edd Gent
http://www.eddgent.com/
Edd is a freelance science and technology writer based in Bangalore, India. His main areas of interest are engineering, computing, and biology, with a particular focus on the intersections between the three.