AI Is Easy to Fool—Why That Needs to Change

Edd Gent

Oct 10, 2017

magician-card-trick-ace-hearts-tricking-artificial-intelligence

Con artistry is one of the world’s oldest and most innovative professions, and it may soon have a new target. Research suggests artificial intelligence may be uniquely susceptible to tricksters, and as its influence in the modern world grows, attacks against it are likely to become more common.

The root of the problem lies in the fact that artificial intelligence algorithms learn about the world in very different ways than people do, and so slight tweaks to the data fed into these algorithms can throw them off completely while remaining imperceptible to humans.

Much of the research into this area has been conducted on image recognition systems, in particular those relying on deep learning neural networks. These systems are trained by showing them thousands of examples of images of a particular object until they can extract common features that allow them to accurately spot the object in new images.

But the features they extract are not necessarily the same high-level features a human would be looking for, like the word STOP on a sign or a tail on a dog. These systems analyze images at the individual pixel level to detect patterns shared between examples. These patterns can be obscure combinations of pixel values, in small pockets or spread across the image, that would be impossible to discern for a human, but highly accurate at predicting a particular object.

"An attacker can trick the object recognition algorithm into seeing something that isn’t there, without these alterations being obvious to a human."

What this means is that by identifying these patterns and overlaying them over a different image, an attacker can trick the object recognition algorithm into seeing something that isn’t there, without these alterations being obvious to a human. This kind of manipulation is known as an “adversarial attack.”

Early attempts to trick image recognition systems this way required access to the algorithm’s inner workings to decipher these patterns. But in 2016 researchers demonstrated a “black box” attack that enabled them to trick such a system without knowing its inner workings.

By feeding the system doctored images and seeing how it classified them, they were able to work out what it was focusing on and therefore generate images they knew would fool it. Importantly, the doctored images were not obviously different to human eyes.

These approaches were tested by feeding doctored image data directly into the algorithm, but more recently, similar approaches have been applied in the real world. Last year it was shown that printouts of doctored images that were then photographed on a smartphone successfully tricked an image classification system.

Another group showed that wearing specially designed, psychedelically-colored spectacles could trick a facial recognition system into thinking people were celebrities. In August scientists showed that adding stickers to stop signs in particular configurations could cause a neural net designed to spot them to misclassify the signs.

These last two examples highlight some of the potential nefarious applications for this technology. Getting a self-driving car to miss a stop sign could cause an accident, either for insurance fraud or to do someone harm. If facial recognition becomes increasingly popular for biometric security applications, being able to pose as someone else could be very useful to a con artist.

Unsurprisingly, there are already efforts to counteract the threat of adversarial attacks. In particular, it has been shown that deep neural networks can be trained to detect adversarial images. One study from the Bosch Center for AI demonstrated such a detector, an adversarial attack that fools the detector, and a training regime for the detector that nullifies the attack, hinting at the kind of arms race we are likely to see in the future.

While image recognition systems provide an easy-to-visualize demonstration, they’re not the only machine learning systems at risk. The techniques used to perturb pixel data can be applied to other kinds of data too.

"Bypassing cybersecurity defenses is one of the more worrying and probable near-term applications for this approach."

Be Part of the Future

100% Free. No Spam. Unsubscribe any time.

Chinese researchers showed that adding specific words to a sentence or misspelling a word can completely throw off machine learning systems designed to analyze what a passage of text is about. Another group demonstrated that garbled sounds played over speakers could make a smartphone running the Google Now voice command system visit a particular web address, which could be used to download malware.

This last example points toward one of the more worrying and probable near-term applications for this approach: bypassing cybersecurity defenses. The industry is increasingly using machine learning and data analytics to identify malware and detect intrusions, but these systems are also highly susceptible to trickery.

At this summer’s DEF CON hacking convention, a security firm demonstrated they could bypass anti-malware AI using a similar approach to the earlier black box attack on the image classifier, but super-powered with an AI of their own.

Their system fed malicious code to the antivirus software and then noted the score it was given. It then used genetic algorithms to iteratively tweak the code until it was able to bypass the defenses while maintaining its function.

All the approaches noted so far are focused on tricking pre-trained machine learning systems, but another approach of major concern to the cybersecurity industry is that of “data poisoning .” This is the idea that introducing false data into a machine learning system’s training set will cause it to start misclassifying things.

This could be particularly challenging for things like anti-malware systems that are constantly being updated to take into account new viruses. A related approach bombards systems with data designed to generate false positives so the defenders recalibrate their systems in a way that then allows the attackers to sneak in.

How likely it is that these approaches will be used in the wild will depend on the potential reward and the sophistication of the attackers. Most of the techniques described above require high levels of domain expertise, but it’s becoming ever easier to access training materials and tools for machine learning.

Simpler versions of machine learning have been at the heart of email spam filters for years, and spammers have developed a host of innovative workarounds to circumvent them. As machine learning and AI increasingly embed themselves in our lives, the rewards for learning how to trick them will likely outweigh the costs.

Image Credit: Nejron Photo / Shutterstock.com

Edd Gent

Edd is a freelance science and technology writer based in Bangalore, India. His main areas of interest are engineering, computing, and biology, with a particular focus on the intersections between the three.

A sea-based data center prototype made by startup Panthalassa

In the Scramble to Power AI, Investors Bet $140 Million on Data Centers at Sea

Edd Gent

May 11, 2026

A smartphone with the message "What can I help you with" on its display

You Probably Wouldn’t Notice if a Chatbot Slipped Ads Into Its Responses

Brian Jay Tang

and

Kang G. Shin

May 08, 2026

A doctor with a stethoscope and white coat consults a phone

An AI Just Beat Doctors at Diagnosing ER Patients

Shelly Fan

May 04, 2026

Energy

In the Scramble to Power AI, Investors Bet $140 Million on Data Centers at Sea

Edd Gent

May 11, 2026

Future

You Probably Wouldn’t Notice if a Chatbot Slipped Ads Into Its Responses

Brian Jay Tang

and

Kang G. Shin

May 08, 2026

Artificial Intelligence

An AI Just Beat Doctors at Diagnosing ER Patients

Shelly Fan

May 04, 2026

What we’re reading