Recent progress in AI has been startling. Barely a week goes by without a new algorithm, application, or implication making headlines. But OpenAI, the source of much of the hype, only recently completed its flagship algorithm, GPT-4, and according to OpenAI CEO Sam Altman, its successor, GPT-5, hasn’t begun training yet.
It’s possible the tempo will slow down in the coming months, but don’t bet on it. A new AI model as capable as GPT-4, or more so, may drop sooner rather than later.
This week, in an interview with Will Knight, Google DeepMind CEO Demis Hassabis said their next big model, Gemini, is currently in development, “a process that will take a number of months.” Hassabis said Gemini will be a mashup drawing on AI’s greatest hits, most notably DeepMind’s AlphaGo, which employed reinforcement learning to topple a champion at Go in 2016, years before experts expected the feat.
“At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models,” Hassabis told Wired. “We also have some new innovations that are going to be pretty interesting.” All told, the new algorithm should be better at planning and problem-solving, he said.
The Era of AI Fusion
Many recent gains in AI have been thanks to ever-bigger algorithms consuming more and more data. As engineers increased the number of internal connections—or parameters—and began to train them on internet-scale data sets, model quality and capability increased like clockwork. As long as a team had the cash to buy chips and access to data, progress was nearly automatic because the structure of the algorithms, called transformers, didn’t have to change much.
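To make that scaling concrete, here is a rough back-of-the-envelope sketch of how a transformer’s parameter count grows with its depth and width. The layer counts, widths, and vocabulary size below are illustrative assumptions for the example, not published specs for any particular model.

```python
# Rough parameter count for a decoder-only transformer: token embeddings plus,
# per layer, four attention projection matrices and a 4x-wide feed-forward block.
def transformer_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    embeddings = vocab_size * d_model
    per_layer = 4 * d_model**2 + 8 * d_model**2  # attention + feed-forward
    return embeddings + n_layers * per_layer

# Illustrative configurations only: widening and deepening the network is what
# pushed parameter counts from millions to hundreds of billions.
small = transformer_params(n_layers=12, d_model=768, vocab_size=50_000)
large = transformer_params(n_layers=96, d_model=12_288, vocab_size=50_000)
print(f"small: {small/1e6:.0f}M parameters, large: {large/1e9:.0f}B parameters")
```

Because the per-layer terms grow with the square of the width, doubling a model’s width roughly quadruples each layer’s parameter count, which is why simply scaling up delivered gains for years, and why it eventually got expensive.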
Then in April, Altman said the age of big AI models was over. Training costs and computing power had skyrocketed, while gains from scaling had leveled off. “We’ll make them better in other ways,” he said, but didn’t elaborate on what those other ways would be.
GPT-4, and now Gemini, offer clues.
Last month, at Google’s I/O developer conference, CEO Sundar Pichai announced that work on Gemini was underway. He said the company was building it “from the ground up” to be multimodal—that is, trained on and able to fuse multiple types of data, like images and text—and designed for API integrations (think plugins). Now add in reinforcement learning and perhaps, as Knight speculates, other DeepMind specialties in robotics and neuroscience, and the next step in AI is beginning to look a bit like a high-tech quilt.
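As a loose illustration of what “multimodal” means mechanically, the sketch below projects image and text features into one shared embedding space and stitches them into a single sequence a model could attend over. The dimensions and the plain linear projections are assumptions made for the example, not details of Gemini’s actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 512  # width of the shared embedding space

# Pretend upstream encoders have already produced features for each modality.
text_tokens = rng.normal(size=(16, 768))     # 16 text tokens, 768-dim each
image_patches = rng.normal(size=(64, 1024))  # 64 image patches, 1024-dim each

# Modality-specific projections map everything into the shared space.
project_text = rng.normal(size=(768, d_model)) * 0.02
project_image = rng.normal(size=(1024, d_model)) * 0.02

# One fused sequence: downstream layers can attend across words and image patches alike.
fused = np.concatenate([text_tokens @ project_text, image_patches @ project_image])
print(fused.shape)  # (80, 512)
```

Once everything lives in one sequence, the same attention machinery that relates words to words can relate words to image patches.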
But Gemini won’t be the first multimodal algorithm. Nor will it be the first to use reinforcement learning or support plugins. OpenAI has integrated all of these into GPT-4 with impressive effect.
If Gemini goes that far, and no further, it may match GPT-4. What’s interesting is who’s working on the algorithm. Earlier this year, DeepMind joined forces with Google Brain. The latter invented the transformer architecture in 2017; the former designed AlphaGo and its successors. Mixing DeepMind’s reinforcement learning expertise into large language models may yield new abilities.
In addition, Gemini may set a high-water mark in AI without a leap in size.
GPT-4 is believed to have around a trillion parameters, and according to recent rumors, it might be a “mixture-of-experts” model made up of eight smaller models, each a fine-tuned specialist roughly the size of GPT-3. Neither the size nor the architecture has been confirmed by OpenAI, which, for the first time, did not release specs on its latest model.
Similarly, DeepMind has shown interest in making smaller models that punch above their weight class (Chinchilla), and Google has experimented with mixture-of-experts (GLaM).
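For a sense of how a mixture-of-experts design keeps compute in check, here is a toy routing sketch: a small router scores each token, and only the top-scoring experts run on it, so the total parameter count can be large while the compute spent per token stays modest. The expert count, sizes, and top-k value are made up for illustration; none of this is confirmed detail about GPT-4, GLaM, or Gemini.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2
tokens = rng.normal(size=(4, d_model))  # a tiny batch of 4 token embeddings

router = rng.normal(size=(d_model, n_experts))                  # scores each expert per token
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.1  # one weight matrix per expert

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = softmax(tokens @ router)  # (4, 8) routing weights
outputs = np.zeros_like(tokens)
for i, token in enumerate(tokens):
    chosen = np.argsort(scores[i])[-top_k:]             # indices of the top-k experts
    weights = scores[i, chosen] / scores[i, chosen].sum()
    for w, e in zip(weights, chosen):
        outputs[i] += w * (token @ experts[e])           # weighted sum of expert outputs

print(outputs.shape)  # (4, 64): same shape out, but only 2 of 8 experts ran per token
```

Because only a fraction of the network is active for any given token, a collection of GPT-3-sized experts could behave like a far larger model without a matching jump in per-query cost.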
Gemini may be a bit bigger or smaller than GPT-4, but likely not by much.
Still, we may never learn exactly what makes Gemini tick, as increasingly competitive companies keep the details of their models under wraps. That makes testing advanced models for capability and controllability as they’re built all the more important, work that Hassabis suggested is also critical for safety. He also said Google might open models like Gemini to outside researchers for evaluation.
“I would love to see academia have early access to these frontier models,” he said.
Whether Gemini matches or exceeds GPT-4 remains to be seen. As architectures become more complicated, gains may be less automatic. Still, it seems a fusion of data and approaches—text with images and other inputs, large language models with reinforcement learning models, the patching together of smaller models into a larger whole—may be what Altman had in mind when he said we’d make AI better in ways other than raw size.
When Can We Expect Gemini?
Hassabis was vague on an exact timeline. If he meant training wouldn’t be complete for “a number of months,” it could be a while before Gemini launches. A trained model is no longer the end point, either. OpenAI spent months rigorously testing and fine-tuning the raw version of GPT-4 before its eventual release. Google may be even more cautious.
But Google DeepMind is under pressure to deliver a product that sets the bar in AI, so it wouldn’t be surprising to see Gemini later this year or early next. If that’s the case, and if Gemini lives up to its billing—both big question marks—Google could, at least for the moment, reclaim the spotlight from OpenAI.
Image Credit: Hossein Nasr / Unsplash