AI Behaving Badly: New Model Could Help AI Make More Ethical Choices

Edd Gent

Jul 06, 2020

The ethics of AI is a hot topic at the moment, particularly given the ongoing controversies around facial recognition software. Now mathematicians have developed a model that can help businesses spot when commercial AI systems might make shady choices in pursuit of profit.

Modern AI is great at optimizing—finding the shortest route, the perfect pricing sweet spot, or the best distribution of a company’s resources. But it’s also blind to a lot of the context that a human making similar decisions would be cognizant of, particularly when it comes to ethics.

As an example, most people realize that while jacking up the price of a medicine during a health crisis would boost profits, it would also be morally indefensible. But AI has no sense of ethics, so if put in charge of pricing strategy, it might treat this as a promising approach.

In fact, in a recent paper in Royal Society Open Science, researchers showed that AI tasked with maximizing returns is disproportionately likely to pick an unethical strategy under fairly general conditions. Fortunately, they also showed it's possible to predict the circumstances in which this is likely to happen, which could guide efforts to modify AI to avoid it.

The fact that AI is likely to pick unethical strategies seems intuitive. There are plenty of unethical business practices that can reap huge rewards if you get away with them, not least because few of your competitors dare use them. There’s a reason companies often bend or even break the rules despite the reputational and regulatory backlash they could face.

Those potential repercussions should be of considerable concern to companies deploying AI solutions, though. While efforts to build ethical principles into AI are already underway, they are nascent, and in many contexts there is a vast number of potential strategies to choose from. These systems often make decisions with little or no human input, and it can be hard to predict the circumstances under which they are likely to choose an unethical approach.

The authors of the paper have proven mathematically that AI designed to maximize returns is disproportionately likely to pick an unethical strategy, a result they dub the “unethical optimization principle.” Fortunately, they say it’s possible for risk managers or regulators to estimate the impact of this principle to help detect potentially unethical strategies.

The key is to focus on the strategies likely to provide the biggest returns, as these are the ones the optimization process is likely to settle on. The authors recommend ranking strategies by their returns and then manually inspecting the highest-ranked ones to determine if they’re ethical or not.

This will not only weed out the unethical strategies most likely to be adopted, they say, but will also help develop intuition about how the AI approaches the problem, giving a better sense of where to look for other problematic strategies.
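
As a rough illustration of that ranking step, here is a minimal Python sketch. It is not the authors' code: the `Strategy` class, the `top_strategies_for_review` helper, and the example strategies and return figures are all hypothetical, invented only to show the idea of sorting candidates by return and pulling the top few for a human to inspect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str               # hypothetical label for a candidate strategy
    expected_return: float  # the return figure the optimizer would see


def top_strategies_for_review(strategies, k=3):
    """Return the k highest-return strategies, best first, for manual ethical review."""
    ranked = sorted(strategies, key=lambda s: s.expected_return, reverse=True)
    return ranked[:k]


# Made-up candidates; the return figures are illustrative only.
candidates = [
    Strategy("loyalty discount", 1.02),
    Strategy("raise price during shortage", 1.40),
    Strategy("bundle with service plan", 1.10),
    Strategy("hide renewal fees", 1.31),
    Strategy("match competitor price", 0.98),
]

# The optimizer is most likely to settle on these, so a human looks at them first.
review_queue = top_strategies_for_review(candidates, k=2)
for s in review_queue:
    print(f"{s.name}: {s.expected_return:.2f}")
```

The code only decides which strategies a person should look at first; the judgment about whether each one is ethical stays with the reviewer.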

The hope is that this would make it possible to then redesign the AI to avoid these kinds of strategies. If that’s not possible, the authors recommend analyzing the strategy space to estimate how likely it is that the AI will choose unethical solutions.

They found that if the probability of extreme returns for a small number of strategies is high, there are statistical techniques that could help estimate the risk that the AI will choose an unethical one. But if the probability of returns is evenly distributed, then it’s highly likely the optimal strategy will be unethical, and companies shouldn’t allow the system to make decisions without human input.
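
To get a feel for why a return-maximizer might favor unethical strategies more often than their share of the strategy space suggests, a toy simulation can help. The Python sketch below is not the statistical technique from the paper. It assumes, purely for illustration, that returns are normally distributed, that 5 percent of strategies are unethical, and that unethical strategies earn a slightly higher average return; the function name `unethical_pick_rate` and every parameter value are hypothetical.

```python
import random


def unethical_pick_rate(n_strategies=200, unethical_fraction=0.05,
                        unethical_edge=0.5, trials=2000, seed=0):
    """Estimate how often the single best-returning strategy is an unethical one.

    Assumptions (illustrative only): returns are Gaussian with unit spread,
    and unethical strategies have a mean return higher by `unethical_edge`.
    """
    rng = random.Random(seed)
    n_unethical = int(n_strategies * unethical_fraction)
    n_ethical = n_strategies - n_unethical
    unethical_wins = 0
    for _ in range(trials):
        # Draw one return per strategy and keep the best of each group.
        best_ethical = max(rng.gauss(0.0, 1.0) for _ in range(n_ethical))
        best_unethical = max(rng.gauss(unethical_edge, 1.0)
                             for _ in range(n_unethical))
        # The optimizer simply picks whichever strategy has the top return.
        if best_unethical > best_ethical:
            unethical_wins += 1
    return unethical_wins / trials


rate = unethical_pick_rate()
baseline = unethical_pick_rate(unethical_edge=0.0)
print(f"share of strategies that are unethical: {0.05:.0%}")
print(f"estimated pick rate with a return edge: {rate:.1%}")
print(f"estimated pick rate with no edge:       {baseline:.1%}")
```

Under these assumptions, the estimated pick rate with a return edge should come out well above the 5 percent base share, while the no-edge baseline should sit close to it. Changing the assumed return distribution changes the numbers, which is one reason the estimate depends so heavily on the shape of the strategy space.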

Even when it’s possible to estimate the risk, the authors say it’s unwise to put too much faith in these predictions. They suggest it may instead be necessary to rethink how AI operates so that unethical strategies are automatically weeded out at the training stage.

How exactly that would happen is far from clear, so for the time being it is probably a good idea to keep humans in the loop for most AI decision-making.

Image Credit: Arek Socha from Pixabay

Edd Gent

Edd is a freelance science and technology writer based in Bangalore, India. His main areas of interest are engineering, computing, and biology, with a particular focus on the intersections between the three.
