In late 2020, AI pioneer DeepMind achieved a breakthrough 50 years in the making. By predicting the shape of proteins with atomic accuracy, its deep learning algorithm, AlphaFold, all but solved one of biology’s grand challenges.
From metabolism to brain function, proteins are the molecules that make our bodies go. When they go wrong, things break down, and we suffer. Much of modern medicine focuses on this aspect of disease: Identifying a dysfunctional protein culprit and modifying its behavior with another molecule specially selected to interact with it—a drug.
Thing is, proteins are extremely complex. Made up of hundreds or thousands of molecular building blocks called amino acids, they form long ribbon-like chains that fold in on themselves in nuanced ways. Nestled within these folds are active sites that give the protein its function by connecting with other proteins or catalyzing chemical reactions.
Designing effective drugs depends on predicting a protein’s shape, locating its functional sites, and identifying another protein or molecule that can dock to them.
AlphaFold, AlphaFold 2, and an algorithm called RoseTTAFold, developed by Baker Lab at the University of Washington, took crucial steps in accelerating this process. By mid-2022, DeepMind said AlphaFold 2 had predicted the structures of 200 million proteins, nearly all those known, and offered them up in an open database.
But it didn’t end there. The creation of protein structures has since taken center stage. These newer algorithms are in the same family as DALL-E and GPT-4—the algorithm behind ChatGPT—only instead of generating images or written passages, they generate novel proteins.
Baker Lab, in particular, has been building on RoseTTAFold to design proteins. In July 2023, in a paper published in Nature, the team said their latest algorithm, RFdiffusion, was speedier and more accurate than older algorithms. It can generate a 100-amino-acid protein in 11 seconds on an Nvidia chip, compared to 8.5 minutes with an older algorithm. RFdiffusion is also roughly 100 times more effective at generating new proteins that bind strongly to sites of interest on known proteins.
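To put the speed claim in perspective, here is a quick back-of-the-envelope calculation using only the two timings quoted above. The function and constant names are made up for illustration, and the result is only as precise as those two reported figures.

```python
# Timings quoted in the article for generating a 100-amino-acid protein.
OLDER_ALGORITHM_SECONDS = 8.5 * 60  # 8.5 minutes, converted to seconds
RFDIFFUSION_SECONDS = 11


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    factor = speedup(OLDER_ALGORITHM_SECONDS, RFDIFFUSION_SECONDS)
    print(f"Roughly {factor:.0f}x faster")
```

By these numbers, RFdiffusion comes out around 46 times faster at generating a protein of that size.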
“In a manner reminiscent of the generation of images from text prompts, RFdiffusion makes possible, with minimal specialist knowledge, the generation of functional proteins from minimal molecular specifications,” the team wrote in the July paper.
All this can be hard to visualize. There’s no substitute for seeing these algorithms in action. The reason ChatGPT was a viral hit was less about it being a zero-to-one breakthrough—the tech had been growing more sophisticated for several years—and more that it was a simple portal through which we could all experience that sophistication directly.
Luckily, here, we have a visual to hammer the point home. The video below, credited to Ian C. Haydon and the University of Washington Institute for Protein Design, shows RFdiffusion at work, designing a protein for a specific site on an insulin receptor in seconds.
Watch this #AI design a protein in seconds.
Learn more: https://t.co/7oYxpmjW4r @NewsfromScience pic.twitter.com/iPvquos8uA
— Science Magazine (@ScienceMagazine) July 24, 2023
Of course, there’s much more work to be done—designing effective new drugs is a difficult, years-long process—but it’s clear that AI tools continue to make quick progress in biotechnology.
Image Credit: Baker Lab/University of Washington