The Poker-Playing AI That Beat the World’s Best Players

Poker is a powerful combination of strategy and intuition, something that’s made it the most iconic of card games and devilishly difficult for machines to master. Now an AI built by Facebook and Carnegie Mellon University has managed to beat top professionals in a multiplayer version of the game for the first time.

Games have proven a popular test-bed for AI in recent years, and when Google DeepMind’s AlphaGo cracked the ancient Chinese board game Go, it was a watershed moment for the field. But most of the games AI has been tested on have been so-called “perfect information” games.

As complex as Go is, you can see where all of your opponent’s pieces are, and it’s theoretically possible to map out every possible future sequence of moves from the current configuration of pieces on the board. In poker, your opponents’ hands remain hidden, which makes it much harder to predict what moves they might make.

Despite this, poker-playing AI, including Libratus, an earlier system from the same team, has already mastered two-player “no-limit” poker, where bets have no upper bound (something that adds to the complexity). The most popular form of poker, though, isn’t a head-to-head contest; it’s played against a full table of players, and that had so far been beyond the scope of AI.

Now, though, researchers have developed an AI that was able to best a host of pro players at six-player no-limit Texas hold’em. The breakthrough is a big win for game-playing AI, but the technology at the system’s heart could have applications for everything from military planning to cyber-security.

“Thus far, superhuman AI milestones in strategic reasoning have been limited to two-party competition,” Tuomas Sandholm, a CMU professor of computer science who led the design of the system, said in a press release. “The ability to beat five other players in such a complicated game opens up new opportunities to use AI to solve a wide variety of real-world problems.”

Nicknamed Pluribus, the system described in a new paper in Science relied on a tried-and-tested tactic for game-playing AI. It first played practice games against copies of itself to build up a “blueprint” strategy for how to play the game. During live play it follows that blueprint for the first round of betting on each hand; after that the complexity of the problem balloons, so it switches to a search algorithm that looks ahead to predict what other players might do.
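
Pluribus built that blueprint with a form of self-play based on counterfactual regret minimization. As a rough, toy-scale sketch of the regret-matching idea at the heart of that family of algorithms (the names below are illustrative, and rock-paper-scissors stands in for poker), here’s a self-play loop that gradually converges on an unexploitable strategy:

```python
import random

ACTIONS = ["rock", "paper", "scissors"]

def payoff(mine, theirs):
    """Payoff to the first player: +1 win, 0 tie, -1 loss."""
    if mine == theirs:
        return 0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1 if (mine, theirs) in wins else -1

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(ACTIONS)] * len(ACTIONS)  # no regrets yet: play uniformly
    return [p / total for p in positive]

def train_blueprint(iterations=50_000):
    regrets = [0.0] * len(ACTIONS)
    strategy_sum = [0.0] * len(ACTIONS)
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Self-play: the opponent is a copy using the same evolving strategy.
        mine = random.choices(range(len(ACTIONS)), weights=strategy)[0]
        theirs = random.choices(range(len(ACTIONS)), weights=strategy)[0]
        # Regret tracks how much better each alternative action would have done.
        for a in range(len(ACTIONS)):
            regrets[a] += payoff(ACTIONS[a], ACTIONS[theirs]) - payoff(ACTIONS[mine], ACTIONS[theirs])
            strategy_sum[a] += strategy[a]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # the average strategy is the "blueprint"

print(train_blueprint())  # converges towards the (1/3, 1/3, 1/3) equilibrium
```

The same principle, scaled up with Monte Carlo sampling and abstraction to cope with poker’s enormous game tree, produces the blueprint Pluribus starts each hand with.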

While this approach is common in game-playing AI, such systems typically plan out alternative futures all the way to the end of the game. With five opponents and so much hidden information, that simply isn’t practical.

So the researchers devised a more efficient approach that looked only a few moves ahead and considered just four continuation strategies for each player, itself included: the blueprint the system had learned, plus three modifications to that blueprint that bias a player towards folding, calling, or raising.
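
To make that concrete, here’s a minimal, self-contained sketch of the depth-limited idea. The payoff numbers and strategy names are invented for illustration, and the real system reasons over probability distributions and hidden cards rather than single known values, but the structure is the same: at a leaf of the lookahead, each player is assumed to finish the hand with one of four fixed continuation strategies instead of being rolled out to the end of the game.

```python
HERO_ACTIONS = ["fold", "call", "raise"]
CONTINUATIONS = ["blueprint", "fold_biased", "call_biased", "raise_biased"]

# Hypothetical leaf values (chips won by the hero) for each pairing of a hero
# action with the opponent's chosen continuation strategy.
LEAF_VALUES = {
    ("fold", "blueprint"): -10, ("fold", "fold_biased"): -10,
    ("fold", "call_biased"): -10, ("fold", "raise_biased"): -10,
    ("call", "blueprint"): 5, ("call", "fold_biased"): 12,
    ("call", "call_biased"): 3, ("call", "raise_biased"): -4,
    ("raise", "blueprint"): 8, ("raise", "fold_biased"): 15,
    ("raise", "call_biased"): -2, ("raise", "raise_biased"): -9,
}

def best_action():
    """Pick the hero action with the best worst-case leaf value, assuming a
    zero-sum opponent picks whichever continuation hurts the hero most."""
    def worst_case(action):
        return min(LEAF_VALUES[(action, c)] for c in CONTINUATIONS)
    return max(HERO_ACTIONS, key=worst_case)

print(best_action())  # "call": its worst case (-4) beats fold (-10) and raise (-9)
```

Restricting opponents to a handful of continuation strategies keeps the search tractable while still accounting for players who deviate from the blueprint.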

They found this new approach was more than enough to outplay some of the world’s best poker players. First the team had Darren Elias, who holds the record for most World Poker Tour titles, and Chris “Jesus” Ferguson, winner of six World Series of Poker events, each play 5,000 hands against five copies of Pluribus.

Then Pluribus went head to head with 13 top pros, all of whom have won more than $1 million playing poker, sitting alone against five of them at a time over 10,000 hands. In both competition formats it emerged victorious.

In the CMU press release Elias said the machine’s major strength was its ability to use mixed strategies. “That’s the same thing that humans try to do,” he said. “It’s a matter of execution for humans—to do this in a perfectly random way and to do so consistently. Most people just can’t.”
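
That kind of randomization is precisely where software has the edge Elias describes: a program can sample its actions from the intended probabilities flawlessly, hand after hand. A minimal sketch (the probabilities below are invented, not taken from Pluribus):

```python
import random
from collections import Counter

# A mixed strategy assigns a probability to each available action.
mixed_strategy = {"fold": 0.10, "call": 0.55, "raise": 0.35}

def act(strategy):
    """Sample one action according to the strategy's probabilities."""
    actions, probs = zip(*strategy.items())
    return random.choices(actions, weights=probs)[0]

# Over many decisions the realized frequencies match the targets, which is
# the perfectly random yet consistent execution humans struggle with.
print(Counter(act(mixed_strategy) for _ in range(100_000)))
```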

One of the most significant breakthroughs is the new approach’s computational efficiency. Learning the blueprint took eight days on a 64-core server, which works out to around 12,400 CPU core hours. By contrast, the team’s previous system, Libratus, took 15 million core hours to train.
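
A quick back-of-the-envelope check of those figures (the small gap to the reported number presumably comes from rounding the runtime to a whole number of days):

```python
core_hours = 64 * 8 * 24       # 12,288, close to the ~12,400 core hours reported
ratio = 15_000_000 / 12_400    # Libratus took roughly 1,200x more compute to train
print(core_hours, round(ratio))
```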

Even after training, game-playing AIs typically need hefty hardware to run. Libratus required 100 CPUs during matches, and AlphaGo used a whopping 1,920 CPUs and 280 GPUs. Pluribus ran on just two CPUs.

While beating humans at poker competitions is certainly one way to make money, Sandholm has already spun off two companies to make practical use of the technology at the heart of Libratus and Pluribus.

In 2018 he founded a startup called Strategy Robot, which has received a $10 million contract from the US Army and aims to adapt the AI for strategic planning and military simulations. Sandholm has also started another company, Strategic Machine, to bring the same technology to bear on problems in gaming, business, and medicine.

Image Credit: rawf8 / Shutterstock.com
