As generative AI models grow more powerful, their energy use is becoming a serious bottleneck. A new fully optical generative AI chip could help by running advanced image and video generation tasks at speeds and efficiencies orders of magnitude beyond today's hardware.

Training generative AI models requires an enormous amount of computing power and energy. But as demand explodes, the process of actually running the models to create images, text, or video—known as inference—is quickly becoming an even bigger drain on resources.

Video and image generation models are particularly energy intensive. While the efficiency of these models is constantly improving, a 2023 study found that generating 1,000 images using a leading model produced carbon emissions equivalent to driving a gas-powered car more than four miles.

One promising approach for slashing energy use is photonic computing, where processors use light instead of electricity. It’s a tactic multiple well-funded startups are pursuing in earnest. But most advances have been limited to simpler tasks like image classification or text generation.

Now, researchers from Shanghai Jiao Tong University and Tsinghua University in China have demonstrated an all-optical chip they call LightGen that is more than 100 times faster and more energy efficient than a leading Nvidia GPU on tasks like video and image generation.

"LightGen provides a new way to bridge the new chip architectures to daily complicated AI without impairment of performance and with speed and efficiency that are orders of magnitude greater," the researchers write in a recent paper on the chip in Science.

A key aspect of the new design is its density. Generative models typically require millions of parameters to produce high-quality outputs, but previous photonic chips have had, at most, a few thousand artificial neurons. Using 3D packaging, however, LightGen integrates more than two million neurons onto a device measuring just a quarter of a square inch.

The resulting processing boost allows the chip to work with images at resolutions up to 512-by-512 pixels. Older photonic chips typically broke up high-resolution images into smaller patches to process them. This not only takes longer but also reduces a model’s ability to draw statistical correlations between the different patches.
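To get a rough feel for what patching costs, the sketch below tiles a 512-by-512 image and counts how many pixel pairs ever land in the same tile. It is an illustration only: the 64-pixel tile size and the helper names `split_into_patches` and `same_patch_pair_fraction` are assumptions made for this example, not details from the paper or from any earlier photonic chip.

```python
from math import comb


def split_into_patches(image, patch):
    """Split a square 2D list into non-overlapping patch-by-patch tiles."""
    size = len(image)
    assert size % patch == 0, "patch size must divide the image size"
    tiles = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def same_patch_pair_fraction(size, patch):
    """Fraction of all pixel pairs whose two pixels fall inside the same tile."""
    n_tiles = (size // patch) ** 2
    within = n_tiles * comb(patch * patch, 2)  # pairs visible inside one tile
    total = comb(size * size, 2)               # pairs visible in the whole image
    return within / total


SIZE = 512   # resolution quoted in the article
PATCH = 64   # illustrative tile size (assumption)

# Synthetic grayscale image, just to have something to tile.
image = [[(r * SIZE + c) % 256 for c in range(SIZE)] for r in range(SIZE)]
tiles = split_into_patches(image, PATCH)
fraction = same_patch_pair_fraction(SIZE, PATCH)

print(len(tiles), f"{fraction:.4f}")  # prints: 64 0.0156
```

Under these assumptions, only about 1.6 percent of pixel pairs ever sit in the same tile, so a processor that handles each tile in isolation never sees the vast majority of pairwise relationships directly.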

The researchers also developed what they call an "optical latent space." Generative AI models work, in part, by compressing high-dimensional data into simpler representations. This forces them to remove less important information and only retain the bits that are integral to the input.

These condensed representations are then stored in a multi-dimensional map of concepts called a latent space. Models use these representations to generate new outputs when given a prompt.
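The compression idea can be shown with a toy example in ordinary software. The sketch below is not LightGen's optical latent space, and it is not how production generative models are built. It swaps in a much simpler technique, finding the single dominant direction in some synthetic data by power iteration, to show how eight numbers can be squeezed into one and then approximately restored. The data, the `pattern` vector, and the `encode` and `decode` helpers are all hypothetical.

```python
import random


def top_direction(rows, steps=100):
    """Power iteration: unit vector along the dominant direction of centered rows."""
    dim = len(rows[0])
    v = [1.0] * dim
    for _ in range(steps):
        # Apply X^T X to v without building the full matrix.
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
        v = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return v


random.seed(0)

# Hypothetical hidden structure: every 8-number sample is one underlying
# value t stretched along this pattern, plus a little noise.
pattern = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0, 0.0, 1.5]
data = []
for _ in range(200):
    t = random.gauss(0, 1)
    data.append([t * p + random.gauss(0, 0.05) for p in pattern])

mean = [sum(col) / len(data) for col in zip(*data)]
centered = [[x - m for x, m in zip(row, mean)] for row in data]
direction = top_direction(centered)


def encode(row):
    """Compress an 8-number sample into a single latent coordinate."""
    return sum((x - m) * w for x, m, w in zip(row, mean, direction))


def decode(z):
    """Rebuild an approximate 8-number sample from its latent coordinate."""
    return [m + z * w for m, w in zip(mean, direction)]


sample = data[0]
latent = encode(sample)
restored = decode(latent)
error = max(abs(a - b) for a, b in zip(sample, restored))

print("latent coordinate:", latent)
print("largest reconstruction error:", error)
```

The one-number code keeps the structure the samples share and discards the noise, which is the same trade the article describes. Choosing a different latent coordinate and passing it to `decode` yields a new sample that resembles the originals, a very loose analogue of generating output from a point in latent space.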