Not So Mysterious After All: Researchers Show How to Crack AI’s Black Box

Edd Gent

Oct 25, 2021

The deep learning neural networks at the heart of modern artificial intelligence are often described as “black boxes” whose inner workings are inscrutable. But new research calls that idea into question, with significant implications for privacy.

Unlike traditional software whose functions are predetermined by a developer, neural networks learn how to process or analyze data by training on examples. They do this by continually adjusting the strength of the links between their many neurons.

By the end of this process, the way they make decisions is tied up in a tangled network of connections that can be impossible to follow. As a result, it’s often assumed that even if you have access to the model itself, it’s more or less impossible to work out the data that the system was trained on.

But a pair of recent papers has called this assumption into question, according to MIT Technology Review, by showing that two very different techniques can be used to identify the data a model was trained on. This could have serious implications for AI systems trained on sensitive information like health records or financial data.

The first approach takes aim at generative adversarial networks (GANs), the AI systems behind deepfake images. These systems are increasingly being used to create synthetic faces that are supposedly completely unrelated to real people.

But researchers from the University of Caen Normandy in France showed that they could easily link generated faces from a popular model to real people whose data had been used to train the GAN. They did this by using a separate facial recognition model to compare the generated faces against training samples and check whether they shared the same identity.

The images aren’t an exact match, as the GAN has modified them, but the researchers found multiple examples where generated faces were clearly linked to images in the training data. In a paper describing the research, they point out that in many cases the generated face is simply the original face in a different pose.
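The matching step described above can be illustrated with a minimal sketch. It assumes a face recognition model has already turned each image into an embedding vector, and it treats two faces as the same identity when their cosine similarity exceeds a threshold. The function names, the three-dimensional embeddings, and the 0.9 threshold are all hypothetical; this is not the researchers' code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def link_to_training_set(generated, training, threshold=0.9):
    """Return (training_id, similarity) for the closest training face,
    or None if no training face is similar enough to count as a match."""
    best_id, best_sim = None, -1.0
    for train_id, embedding in training.items():
        sim = cosine_similarity(generated, embedding)
        if sim > best_sim:
            best_id, best_sim = train_id, sim
    return (best_id, best_sim) if best_sim >= threshold else None

# Hypothetical embeddings; a real face model would produce hundreds of dimensions.
training_embeddings = {
    "person_a": [0.9, 0.1, 0.2],
    "person_b": [0.1, 0.8, 0.5],
}
generated_embedding = [0.85, 0.15, 0.25]  # a GAN output resembling person_a

match = link_to_training_set(generated_embedding, training_embeddings)
print(match)
```

Because the comparison happens in embedding space rather than pixel space, a generated face in a different pose can still land close to the training face it derives from.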

While the approach is specific to face-generation GANs, the researchers point out that similar ideas could be applied to things like biometric data or medical images. A second, more general approach to reverse engineering neural nets could be applied to those kinds of data directly, though.

A group from Nvidia has shown that it can infer the data a model was trained on without seeing any examples of the training data. The group used an approach called model inversion, which effectively runs the neural net in reverse. This technique is often used to analyze neural networks, but using it to recover input data had previously been achieved only on simple networks under very specific sets of assumptions.

In a recent paper, the researchers described how they were able to scale the approach to large networks by splitting the problem up and carrying out inversions on each of the networks’ layers separately. With this approach, they were able to recreate training data images using nothing but the models themselves.
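To give a feel for what running a model "in reverse" means, here is a minimal sketch of the basic idea behind gradient-based inversion. It is not the layer-by-layer method from the Nvidia paper. It assumes a tiny logistic classifier with made-up weights, starts from a neutral gray input, and repeatedly nudges the input in the direction that raises the model's confidence. All names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Model confidence that input x belongs to the target class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def invert(weights, bias, steps=200, lr=0.1):
    """Gradient ascent on the input (not the weights) to maximize log-confidence.
    Pixel values are clipped to the valid range [0, 1] after every step."""
    x = [0.5] * len(weights)  # start from a neutral gray input
    for _ in range(steps):
        p = predict(weights, bias, x)
        # For a logistic model, d/dx log(p) = (1 - p) * w
        x = [min(1.0, max(0.0, xi + lr * (1.0 - p) * w))
             for xi, w in zip(x, weights)]
    return x

# Hypothetical trained parameters for a 4-pixel "image" classifier.
weights = [2.0, -3.0, 1.5, -0.5]
bias = -1.0

reconstruction = invert(weights, bias)
print(reconstruction, predict(weights, bias, reconstruction))
```

The recovered input is the pattern this toy model responds to most strongly. Inverting a deep network is much harder, which is why scaling the approach to large models was notable.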

While carrying out either attack is a complex process that requires intimate access to the model in question, both highlight the fact that AIs may not be the black boxes we thought they were, and determined attackers could extract potentially sensitive information from them.

Given that it’s becoming increasingly easy to reverse engineer someone else’s model using your own AI, the requirement to have access to the neural network isn’t even that big of a barrier.

The problem isn’t restricted to image-based algorithms. Last year, researchers from a consortium of tech companies and universities showed that they could extract news headlines, JavaScript code, and personally identifiable information from the large language model GPT-2.

These issues are only going to become more pressing as AI systems push their way into sensitive areas like health, finance, and defense. There are some solutions on the horizon, such as differential privacy, which adds carefully calibrated noise so that a model's outputs reveal very little about any individual data point, or homomorphic encryption, an emerging paradigm that makes it possible to compute directly on encrypted data.
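As a rough illustration of the differential privacy idea, the sketch below applies the Laplace mechanism to a simple statistic, the mean of a list of ages. Training a neural network privately typically adds noise to gradients instead, so treat this only as a small-scale analogy. The function names, the age bounds, and the epsilon value are assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two
    exponential draws that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-differentially private mean via the Laplace mechanism.
    Values are clipped to [lower, upper] so that changing one record
    can shift the mean by at most (upper - lower) / n."""
    clipped = [min(upper, max(lower, v)) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical patient ages; the true mean is 44.2.
ages = [34, 45, 29, 61, 52]
rng = random.Random(0)
print(private_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng))
```

A smaller epsilon means more noise and stronger privacy, while a larger dataset shrinks the sensitivity and so needs less noise for the same guarantee.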

But these approaches are still a long way from being standard practice, so for the time being, entrusting your data to the black box of AI may not be as safe as you think.

Image Credit: Connect world / Shutterstock.com

Edd Gent

Edd is a freelance science and technology writer based in Bangalore, India. His main areas of interest are engineering, computing, and biology, with a particular focus on the intersections between the three.
