At an MIT event in March, OpenAI cofounder and CEO Sam Altman said his team wasn’t yet training its next AI, GPT-5. “We are not and won’t for some time,” he told the audience.
This week, however, new details about GPT-5’s status emerged.
In an interview, Altman told the Financial Times the company is now working to develop GPT-5. Though the article did not specify whether the model is in training (it likely isn’t), Altman did say it would need more data. That data would come from two places: public online sources, which is how such algorithms (called large language models) have been trained so far, and proprietary private datasets.
This lines up with OpenAI’s call last week for organizations to collaborate on private datasets, as well as its earlier efforts to acquire valuable content from major publishers like the Associated Press and News Corp. In a blog post, the team said it wants to partner on text, images, audio, or video but is especially interested in “long-form writing or conversations rather than disconnected snippets” that express “human intention.”
It’s no surprise OpenAI is looking to tap higher-quality sources that aren’t publicly available. AI’s extreme data needs are a sticking point in its development. The rise of the large language models behind chatbots like ChatGPT was driven by ever-bigger algorithms consuming ever more data. Of those two ingredients, more and higher-quality data may yield the greater near-term gains: recent research suggests smaller models fed larger amounts of data perform as well as or better than larger models fed less.
“The trouble is that, like other high-end human cultural products, good prose ranks among the most difficult things to produce in the known universe,” Ross Andersen wrote in The Atlantic this year. “It is not in infinite supply, and for AI, not any old text will do: Large language models trained on books are much better writers than those trained on huge batches of social-media posts.”
With much of the internet already scraped to train GPT-4, the low-hanging fruit seems to have largely been picked. A team of researchers estimated last year that the supply of publicly accessible, high-quality online data would run out by 2026. One way around this, at least in the near term, is to make deals with the owners of private information hoards.
Computing is another roadblock Altman addressed in the interview.
Foundation models like OpenAI’s GPT-4 require vast supplies of graphics processing units (GPUs), a type of specialized computer chip widely used to train and run AI. Chipmaker Nvidia is the leading supplier of GPUs, and since the launch of ChatGPT, its chips have been the hottest commodity in tech. Altman said OpenAI recently took delivery of a batch of Nvidia’s latest H100 chips, and he expects supply to loosen up even more in 2024.
In addition to greater availability, the new chips appear to be speedier too.
In tests released this week by MLCommons, the organization behind the MLPerf AI benchmarks, the chips trained large language models nearly three times faster than the mark set just five months ago. (Since MLPerf first began benchmarking AI chips five years ago, overall performance has improved by a factor of 49.)
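To put the factor-of-49 figure in per-year terms, here is a back-of-the-envelope calculation. It assumes, as a simplification, that the improvement compounded evenly over the five years; real gains came in uneven jumps from new chips, software, and larger systems.

```python
# Rough sketch: what constant yearly gain compounds to a 49x improvement over five years?
# Assumes smooth compounding, which real benchmark history does not follow exactly.
total_gain = 49.0  # overall MLPerf performance improvement cited in the article
years = 5          # period since MLPerf began benchmarking AI chips

# Geometric (compound) annual factor: annual_gain ** years == total_gain
annual_gain = total_gain ** (1 / years)
print(f"Implied average improvement: about {annual_gain:.2f}x per year")
```

Under that assumption, performance roughly doubled every year, with a little to spare.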
Reading between the lines, which has become more challenging as the industry has grown less transparent, the GPT-5 work Altman is alluding to is likely more about assembling the necessary ingredients than training the algorithm itself. The company is working to secure funding from investors (GPT-4 cost over $100 million to train), chips from Nvidia, and quality data from wherever it can lay its hands on it.
Altman didn’t commit to a timeline for GPT-5’s release, but even if training began soon, the algorithm wouldn’t see the light of day for a while. Depending on its size and design, training could take weeks or months. Then the raw algorithm would have to be stress-tested and fine-tuned by lots of people to make it safe. It took the company eight months to polish and release GPT-4 after training. And though the competitive landscape is more intense now, it’s worth noting that GPT-4 arrived almost three years after GPT-3.
But it’s best not to get too caught up in version numbers. OpenAI is still pressing forward aggressively with its current technology. Two weeks ago, at its first developer conference, the company launched custom chatbots, called GPTs, as well as GPT-4 Turbo. The enhanced algorithm includes more up-to-date information (its knowledge cutoff moves from September 2021 to April 2023), can work with much longer prompts, and is cheaper for developers.
And competitors are hot on OpenAI’s heels. Google DeepMind is currently working on its next AI algorithm, Gemini, and big tech is investing heavily in other leading startups, like Anthropic, Character.AI, and Inflection AI. All this action has governments eyeing regulations they hope can reduce near-term risks such as algorithmic bias, privacy violations, and intellectual property infringement, as well as make future algorithms safer.
In the longer term, however, it’s not clear if the shortcomings associated with large language models can be solved with more data and bigger algorithms or will require new breakthroughs. In a September profile, Wired’s Steven Levy wrote OpenAI isn’t yet sure what would make for “an exponentially powerful improvement” on GPT-4.
“The biggest thing we’re missing is coming up with new ideas,” Greg Brockman, president at OpenAI, told Levy. “It’s nice to have something that could be a virtual assistant. But that’s not the dream. The dream is to help us solve problems we can’t.”
It was Google’s 2017 invention of transformers that brought about the current moment in AI. For several years afterward, researchers made their algorithms bigger and fed them more data, and this scaling yielded almost automatic, often surprising boosts in performance.
But at the MIT event in March, Altman said he thought the age of scaling was over and researchers would find other ways to make the algorithms better. It’s possible his thinking has changed since then. It’s also possible GPT-5 will be better than GPT-4 in the way the latest smartphone is better than the last one, and the technology enabling the next step change hasn’t been born yet. Altman doesn’t seem entirely sure either.
“Until we go train that model, it’s like a fun guessing game for us,” he told the FT. “We’re trying to get better at it, because I think it’s important from a safety perspective to predict the capabilities. But I can’t tell you here’s exactly what it’s going to do that GPT-4 didn’t.”
In the meantime, it seems we’ll have more than enough to keep us busy.