AI has mastered some of the most complex games known to man, but while it often excels at competition, cooperation doesn’t come as naturally. Now an AI from Meta has mastered the game Diplomacy, which requires you to work with other players to win.
Google’s mastery of the game of Go was hailed as a major milestone for AI, but despite its undeniable complexity, the game is in many ways well suited to the cold, calculating logic of a machine. It’s a game of perfect information, where you have full visibility of your opponent’s moves, and winning simply means being able to outfox one other player.
Diplomacy, on the other hand, is a much messier affair. The board game sees up to seven players take over European military powers and use their armies to take control of strategic cities. Along the way, players are allowed to negotiate with each other, forming and breaking alliances in pursuit of total domination.
What’s more, all players’ moves are made simultaneously at each turn, so you can’t simply react to what others do. This means that winning games requires a complex combination of strategic thinking, the ability to cooperate with other players, and persuasive negotiation skills. While AI has already mastered pure strategy, those other skills have proved much trickier to replicate.
A new AI designed by researchers at Meta may have taken a big step in that direction, though. In a paper published last week in Science, they describe a system called Cicero that ranked in the top 10 percent of players in an online Diplomacy league and achieved more than double the average score of the human players.
“Cicero is resilient, it’s ruthless, and it’s patient,” three-time Diplomacy world champion Andrew Goff said in a video produced by Meta. “It plays without a lot of the human emotion that sometimes makes you make bad decisions. It just assesses the situation and makes the best decision, not only for it, but for the people it’s working with.”
Creating Cicero required Meta researchers to combine state-of-the-art AI methods from two different sub-fields: strategic reasoning and natural language processing. At its heart, the system has a planning algorithm that predicts other players’ moves and uses this to determine its own strategy. This algorithm was trained by getting the AI to play itself over and over again, while also trying to mimic the way humans play the game.
The researchers had already shown that this planning module alone was able to beat human pros in a simplified version of the game. But in this latest research, the team combined it with a large language model that was trained on vast amounts of text from the internet and then fine-tuned using dialogue from 40,000 online games of Diplomacy. This gave the upgraded Cicero the ability both to interpret messages from other players and to craft its own messages to persuade them to work together.
The combined system starts by using the current state of the board and past dialogue to predict what each player is likely to do. It then comes up with a plan of action for both itself and its partners before generating messages designed to outline its intent and ensure the cooperation of other players.
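The three steps described above (predict, plan, then message) can be illustrated with a toy Python sketch. It is not Cicero’s code: the function names, the simple rules, and the board and dialogue data are all hypothetical stand-ins chosen only to show how the stages might feed into one another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    my_move: str
    partner: Optional[str]
    partner_move: Optional[str]


def predict_moves(board, dialogue):
    """Step 1: guess each player's next move from the board and past dialogue."""
    predictions = {}
    for player, default_move in board.items():
        # Toy rule (an assumption): a move announced in dialogue
        # overrides the guess based on the board alone.
        predictions[player] = dialogue.get(player, default_move)
    return predictions


def plan_turn(me, predictions):
    """Step 2: choose a move for ourselves and, if possible, for a partner."""
    for player, move in predictions.items():
        if player != me and move != "hold":
            return Plan(f"support {player}", player, move)
    return Plan("hold", None, None)


def write_message(plan):
    """Step 3: turn the plan into a message that states our intent."""
    if plan.partner is None:
        return None
    return f"{plan.partner}, I will support you if you {plan.partner_move}."


# Hypothetical game state: the board alone suggests everyone holds,
# but England has said in earlier messages that it will attack Germany.
board = {"France": "hold", "England": "hold", "Germany": "hold"}
dialogue = {"England": "attack Germany"}

predictions = predict_moves(board, dialogue)
plan = plan_turn("France", predictions)
message = write_message(plan)
print(message)
```

In the real system each stage is a learned model rather than a hand-written rule, but the order of operations is the point of the sketch: the message is written only after a plan exists, so the plan controls what gets said.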
Over 40 games in the online tournament, Cicero effectively communicated with 82 other players to explain its intentions, coordinate actions, and negotiate alliances. Crucially, the researchers say they saw no evidence from in-game messages that human players suspected they were teaming up with an AI.
However, the model’s communicative abilities weren’t flawless. It is more than capable of spitting out nonsensical messages or ones inconsistent with its goals, so the researchers had to generate multiple candidate messages at each move and then use various filtering mechanisms to weed out the garbage. And even then, the researchers admit that illogical messages sometimes slipped through.
This suggests that the language model at the heart of Cicero still doesn’t really understand what is going on and is simply producing plausible-sounding messages that then need to be vetted to make sure they achieve the desired results.
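The generate-then-filter approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers’ method: `generate_candidates` is a hypothetical stand-in for a language model that returns fixed drafts, and the two filters are deliberately crude checks invented for this example.

```python
def generate_candidates(intent):
    """Hypothetical stand-in for a language model: returns several drafts,
    some of which are nonsensical or inconsistent with the intent."""
    return [
        "asdf qwer zxcv",
        "I will attack you in Burgundy this turn.",
        f"I plan to {intent}. Will you help?",
        f"Let us work together: I will {intent}.",
    ]


def is_readable(message, intent):
    # Crude nonsense check (an assumption): a usable message
    # ends with sentence punctuation.
    return message.rstrip().endswith((".", "?", "!"))


def matches_intent(message, intent):
    # Crude consistency check (an assumption): the message must
    # actually mention the move we intend to make.
    return intent.lower() in message.lower()


FILTERS = [is_readable, matches_intent]


def pick_message(intent):
    """Generate several drafts and return the first that passes every filter."""
    candidates = generate_candidates(intent)
    survivors = [m for m in candidates if all(f(m, intent) for f in FILTERS)]
    return survivors[0] if survivors else None


chosen = pick_message("support England into Belgium")
print(chosen)
```

Filters this simple would let plenty of bad messages through, which mirrors the limitation the researchers acknowledge: filtering reduces illogical output but does not give the language model any understanding of the game.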
Writing in The Conversation, AI researcher Toby Walsh at the University of New South Wales in Australia also notes that Cicero is unerringly honest, unlike most human players. While this is a surprisingly effective strategy, it could be a major weakness if competitors work out that their opponent is never going to try to deceive them.
The advance is a significant one, nonetheless, and Meta hopes it could have applications far beyond board games. In a blog post, the researchers say the ability to use planning algorithms to control language generation could make it possible to have much longer and richer conversations with AI chatbots or create video game characters who can adapt to a player’s behavior.