The ability of machines to read our minds has been steadily progressing in recent years. Now, researchers have used AI video generation technology to give us a window into the mind’s eye.
The main driver behind attempts to interpret brain signals is the hope that one day we might offer new channels of communication to people in comas or with various forms of paralysis. But there is also hope that the technology could create more intuitive interfaces between humans and machines, with applications for healthy people as well.
So far, most research has focused on recreating the internal monologues of patients, using AI systems to pick out which words they are thinking of. The most promising results, however, have come from invasive brain implants that are unlikely to be a practical approach for most people.
Now though, researchers from the National University of Singapore and the Chinese University of Hong Kong have shown that they can combine non-invasive brain scans and AI image generation technology to create short snippets of video that are uncannily similar to clips that the subjects were watching when their brain data was collected.
The work is an extension of research the same authors published late last year, where they showed they could generate still images that roughly matched the pictures subjects had been shown. This was achieved by first training one model on large amounts of data collected using fMRI brain scanners. This model was then combined with the open-source image generation AI Stable Diffusion to create the pictures.
In a new paper published on the preprint server arXiv, the authors take a similar approach, but adapt the system to interpret streams of brain data and convert them into videos rather than stills. First, they trained one model on large amounts of fMRI data so it could learn the general features of these brain scans. This model was then augmented to process a succession of fMRI scans rather than individual ones, and trained again on combinations of fMRI scans, the video snippets that elicited that brain activity, and text descriptions of those videos.
Separately, the researchers adapted the pre-trained Stable Diffusion model to produce video rather than still images. It was then trained again on the same videos and text descriptions that the first model had been trained on. Finally, the two models were combined and fine-tuned together on fMRI scans and their associated videos.
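The pipeline described above can be sketched in a few lines of code. This is a minimal illustrative mock-up, not the authors' implementation: the class names, dimensions, and the linear maps standing in for the real neural networks are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

class FMRIEncoder:
    """Stand-in for the model pre-trained on fMRI scans (hypothetical)."""
    def __init__(self, n_voxels, n_features):
        # A random linear map plays the role of the learned encoder weights.
        self.w = rng.normal(size=(n_voxels, n_features)) * 0.01

    def encode(self, scan):
        # One fMRI scan -> one feature vector.
        return np.tanh(scan @ self.w)

    def encode_window(self, scans):
        # The augmented version: a succession of scans -> a feature sequence.
        return np.stack([self.encode(s) for s in scans])

class VideoGenerator:
    """Stand-in for the Stable Diffusion model adapted to video (hypothetical)."""
    def __init__(self, n_features, frame_pixels):
        self.w = rng.normal(size=(n_features, frame_pixels)) * 0.01

    def generate(self, feature_seq):
        # One generated frame per feature vector in the sequence.
        return feature_seq @ self.w

# "Fine-tuned together": in the real system both components are trained
# jointly on fMRI windows and their associated video clips.
encoder = FMRIEncoder(n_voxels=1000, n_features=64)
generator = VideoGenerator(n_features=64, frame_pixels=256)

scans = rng.normal(size=(8, 1000))   # 8 consecutive fMRI scans
video = generator.generate(encoder.encode_window(scans))
print(video.shape)                   # 8 frames of 256 "pixels" each
```

The point of the sketch is the data flow, a sequence of brain scans in, a sequence of video frames out, rather than the (far more complex) diffusion models doing the actual generation.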
The resulting system was able to take fresh fMRI scans it hadn’t seen before and generate videos that broadly resembled the clips human subjects had been watching at the time. While far from a perfect match, the AI’s output was generally pretty close to the original video, accurately recreating crowd scenes or herds of horses and often matching the color palette.
To evaluate their system, the researchers used a video classifier designed to assess how well the model had understood the semantics of the scene—for instance, whether it had realized the video was of fish swimming in an aquarium or a family walking down a path—even if the imagery was slightly different. Their model scored 85 percent, which is a 45 percent improvement over the state-of-the-art.
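A semantic-matching score of this kind reduces to a simple comparison: a video classifier labels both the original clip and the reconstruction, and the score is the fraction of pairs whose labels agree. The function below is an illustrative sketch of that idea, not the authors' actual evaluation code.

```python
def semantic_accuracy(original_labels, reconstruction_labels):
    """Fraction of generated videos assigned the same semantic class
    as the clip that elicited the brain activity (illustrative metric)."""
    matches = sum(o == r for o, r in zip(original_labels, reconstruction_labels))
    return matches / len(original_labels)

# Toy example: classifier labels for five original clips vs. reconstructions.
originals       = ["fish", "horses", "crowd", "path", "fish"]
reconstructions = ["fish", "horses", "crowd", "path", "dog"]
print(semantic_accuracy(originals, reconstructions))  # 0.8
```

A classifier-based metric like this credits the model for getting the scene's meaning right (fish in an aquarium, a family on a path) even when the pixels differ from the original clip.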
While the videos the AI generates are still glitchy, the authors say this line of research could ultimately have applications in both basic neuroscience and also future brain-machine interfaces. However, they also acknowledge potential downsides to the technology. “Governmental regulations and efforts from research communities are required to ensure the privacy of one’s biological data and avoid any malicious usage of this technology,” they write.
That is likely a nod to concerns that the combination of AI and brain scanning technology could make it possible to intrusively record others' thoughts without their consent. Anxieties were also voiced earlier this year when researchers used a similar approach to essentially create a rough transcript of the voice inside people's heads, though experts have pointed out that this would be impractical, if not impossible, for the foreseeable future.
But whether you see it as a creepy invasion of your privacy or an exciting new way to interface with technology, it seems machine mind readers are edging closer to reality.
Image Credit: Claudia Dewald from Pixabay