AI hates uncertainty. Yet to navigate our unpredictable world, it needs to learn to make choices with imperfect information—as we do every single day.
DeepMind just took a stab at solving this conundrum. The trick was to interweave game theory into an algorithmic strategy loosely based on the human brain called deep reinforcement learning. The result, DeepNash, toppled human experts in a highly strategic board game called Stratego. A notoriously difficult game for AI, Stratego requires multiple strengths of human wit: long-term thinking, bluffing, and strategizing, all without knowing your opponent’s pieces on the board.
“Unlike chess and Go, Stratego is a game of imperfect information: players cannot directly observe the identities of their opponent’s pieces,” DeepMind wrote in a blog post. With DeepNash, “game-playing artificial intelligence (AI) systems have advanced to a new frontier.”
It’s not all fun and games. AI systems that can easily maneuver the randomness of our world and adjust their “behavior” accordingly could one day handle real-world problems with limited information, such as optimizing traffic flow to reduce travel time and (hopefully) quenching road rage as self-driving cars become ever more present.
“If you’re making a self-driving car, you don’t want to assume that all the other drivers on the road are perfectly rational, and going to behave optimally,” said Dr. Noam Brown at Meta AI, who wasn’t involved in the research.
DeepNash’s triumph comes hot on the heels of another AI advance this month, where an algorithm learned to play Diplomacy—a game that requires negotiation and cooperation to win. As AI gains more flexible reasoning, becomes more generalized, and learns to navigate social situations, it may also spark insights into our own brains’ neural processes and cognition.
Meet Stratego
In terms of complexity, Stratego is a completely different beast compared to chess, Go, or poker—all games that AI has previously mastered.
The game is essentially capture the flag. Each side has 40 pieces they can place at any position on the board. Each piece has a different name and numerical rank, such as “marshal,” “general,” “scout,” or “spy.” Higher ranking pieces can capture lower ones. The goal is to eliminate the opposition and capture their flag.
Stratego is especially challenging for AI because players can’t see the location of their opponents’ pieces, both during initial setup and throughout gameplay. Unlike chess or Go, in which each piece and movement is in view, Stratego is a game with limited information. Players must “balance all possible outcomes” any time they make a decision, the authors explained.
This level of uncertainty is partly why Stratego has stumped AI for ages. Even the most successful game-play algorithms, such as AlphaGo and AlphaZero, rely on complete information. Stratego, in contrast, has a touch of Texas Hold ’em, a poker game DeepMind previously conquered with an algorithm. But that strategy faltered for Stratego, largely because of the length of game, which unlike poker, normally encompasses hundreds of moves.
The number of potential game plays is mind-blowing. Chess has one starting position. Stratego has over 1066 possible starting positions—far more than all the stars in the universe. Stratego’s game tree, the sum of all potential moves in the game, totals a staggering 10535.
“The sheer complexity of the number of possible outcomes in Stratego means algorithms that perform well on perfect-information games, and even those that work for poker, don’t work,” said study author Dr. Julien Perolat at DeepMind. The challenge is “what excited us,” he said.
A Beautiful Mind
Stratego’s complexity means that the usual strategy for searching gameplay moves is out of the question. Dubbed the Monte Carlo tree search, a “stalwart approach to AI-based gaming,” the technique plots out potential routes—like branches on a tree—that could result in victory.
Instead, the magic touch for DeepNash came from the mathematician John Nash, portrayed in the film A Beautiful Mind. A pioneer in game theory, Nash won the Nobel Prize for his work for the Nash equilibrium. Put simply, in each game, players can tap into a set of strategies followed by everyone, so that no single player gains anything by changing their own strategy. In Statego, this brings about a zero-sum game: any gain a player makes results in a loss for their opponent.
Because of Stratego’s complexity, DeepNash took a model-free approach to their algorithm. Here, the AI isn’t trying to precisely model its opponent’s behavior. Like a baby, it has a blank slate, of sorts, to learn. This set-up is particularly useful in early stages of gameplay, “when DeepNash knows little about its opponent’s pieces,” making predictions “difficult, if not impossible,” the authors said.
The team then used deep reinforcement learning to power DeepNash, with the goal of finding the game’s Nash equilibrium. It’s a match made in heaven: reinforcement learning helps decide the best next move at every step of the game, while DeepNash provides an overall learning strategy. To evaluate the system, the team also engineered a “tutor” using knowledge from the game to filter out obvious mistakes that likely wouldn’t make real-world sense.
Practice Makes Perfect
As a first learning step, DeepNash played against itself in 5.5 billion games, a popular approach in AI training dubbed self-play.
When one side wins, the AI gets awarded, and its current artificial neural network parameters are strengthened. The other side—the same AI—receives a penalty to dampen its neural network strength. It’s like rehearsing a speech to yourself in front of a mirror. Over time, you figure out mistakes and perform better. In DeepNash’s case, it drifts towards a Nash equilibrium for best gameplay.
What about actual performance?
The team tested the algorithm against other elite Stratego bots, some of which won the Computer Stratego World Championship. DeepNash squashed its opponents with a win rate of roughly 97 percent. When unleashed against Gravon—an online platform for human players—DeepNash trounced its human opponents. After over two weeks of matches against Gravon’s players in April this year, DeepNash rose to third place in all ranked matches since 2002.
It shows that bootstrapping human play data to AI isn’t needed for DeepNash to reach human-level performance—and beat it.
The AI also exhibited some intriguing behavior with the initial setup and during gameplay. For example, rather than settling on a particular “optimized” starting position, DeepNash constantly shifted the pieces around to prevent its opponent from spotting patterns over time. During gameplay, the AI bounced between seemingly senseless moves—such as sacrificing high-ranking pieces—to locate the opponent’s even higher-ranking pieces upon counterattack.
DeepNash can also bluff. In one play, the AI moved a low-ranking piece as if it were a high-ranking one, luring the human opponent to chase after the piece with its high-ranking colonel. The AI sacrificed the pawn, but in turn, lured the opponent’s valuable spy piece into an ambush.
Although DeepNash was developed for Stratego, it’s generalizable to the real-world. The core method can potentially instruct AI to better tackle our unpredictable future using limited information—from crowd and traffic control to analyzing market turmoil.
“In creating a generalizable AI system that’s robust in the face of uncertainty, we hope to bring the problem-solving capabilities of AI further into our inherently unpredictable world,” the team said.
Image Credit: Derek Bruff / Flickr