Proteins are biology’s molecular machines. They’re our bodies’ construction workers—making muscle, bone, and brain; regulators—keeping systems in check; and local internet—responsible for the transmission of information between cells and regions. In a word, proteins are crucial to our survival. When they work, we’re healthy. When they don’t, we aren’t.
Which is why recent leaps in our understanding of protein structure, and the emerging ability to design entirely new proteins from scratch with the help of AI, are such a huge development. It’s why David Baker, Demis Hassabis, and John Jumper shared this year’s Nobel Prize in Chemistry for their work in the field.
Things are by no means standing still. 2024 was another winning year for AI protein design.
Earlier this year, scientists expanded AI’s ability to model how proteins bind to other biomolecules, such as DNA, RNA, and the small molecules that regulate their shape and function. The study broadened the scope of RoseTTAFold, a popular AI tool for predicting protein structure, into a version called RoseTTAFold All-Atom that can map out complex protein-based molecular machines at the atomic level, in turn paving the way for more sophisticated therapies.
DeepMind soon followed with the release of AlphaFold3, an AI model that also predicts protein interactions with other molecules. Now available to researchers, the sophisticated AI tool will likely lead to a flood of innovations, therapeutics, and insights into biological processes.
Meanwhile, protein design went flexible this year. AI models generated “effector” proteins that could shape-shift in the presence of a molecular switch. This flip-flop structure altered their biological impact on cells. A subset of these morphed into a variety of arrangements, including cage-like structures that could encapsulate and deliver medicines like tiny spaceships.
They’re novel, but do any AI-designed proteins actually work? Yes, according to several studies.
One used AI to dream up a universe of potential CRISPR gene editors. Inspired by large language models, like those that gave birth to ChatGPT, the AI model in the study eventually designed a gene editing system that was as accurate as existing CRISPR-based tools when tested in cells. Another AI designed circle-shaped proteins that reliably turned stem cells into different blood vessel cell types. Still other AI-generated proteins directed protein “junk” into the lysosome, an acid-filled compartment that acts as the cell’s waste-treatment plant and keeps it neat and tidy.
Outside of medicine, AI designed mineral-forming proteins that, if integrated into aquatic microbes, could soak up excess carbon and transform it into limestone. The work is still early, but the technology could one day tackle climate change with a carbon sink that lasts millions of years.
It seems imagination is the only limit to AI-based protein design. But there are still a few cases that AI can’t yet fully handle. Nature has a comprehensive list, but these stand out.
Back to Basics: Binders
When proteins interact with each other, binder molecules can strengthen or break apart those interactions. These molecules initially caught the eye of protein designers because they can serve as drugs that block damaging cellular responses or boost useful ones.
There have been successes. Generative AI models, such as RFdiffusion, can readily model binders, especially for free-floating proteins inside cells. These proteins coordinate much of the cell’s internal signaling, including signals that trigger senescence or cancer. Binders that break the chain of communication could potentially halt these processes. Binders can also be developed into diagnostic tools. In one example, scientists engineered a glowing tag that reports on a cell’s status: it detected the presence of a hormone when its binder grabbed onto it.
But binders remain hard to develop. They need to interact with key regions on proteins, and because proteins are dynamic 3D structures that twist and turn, it’s often tough to nail down which regions are crucial for binders to latch onto.
Then there’s the data problem. Thanks to hundreds of thousands of protein structures available in public databases, generative AI models can learn to predict protein-protein interactions. Data on binders, by contrast, is often kept secret by pharmaceutical companies, each of which maintains an in-house database cataloging how small molecules interact with proteins.
Several teams are now using AI to design simple binders for research. But experts stress these need to be tested in living organisms. AI can’t yet predict the biological consequences of a binder—it could either boost a process or shut it down. Then there’s the problem of hallucination, where an AI model dreams up binders that are completely unrealistic.
From here, the goal is to gather more and better data on how proteins grab onto molecules, and perhaps to add a dose of the underlying biophysics.
Designing New Enzymes
Enzymes are proteins that catalyze life. They break molecules down or build new ones, allowing us to digest food, build up our bodies, and maintain healthy brains. Synthetic enzymes can do even more, like sucking carbon dioxide from the atmosphere or breaking down plastic waste.
But designer enzymes are still tough to build. Most models are trained on natural enzymes, yet the same biological function doesn’t always rely on the same structure: enzymes that look vastly different can perform similar chemical reactions. These models evaluate structure, not function, meaning we’ll need to better understand how one leads to the other.
Like binders, enzymes also have “hotspots.” Scientists are racing to hunt these down with machine learning. There are early signs AI can design hotspots on new enzymes, but they still need to be heavily vetted. An active hotspot usually requires a good bit of scaffolding to work properly—without which it may not be able to grab its target or, if it does, let it go.
Enzymes are a tough nut to crack especially because they’re in motion. For now, AI struggles to model their transformations. This is, as it turns out, a challenge for the field at large.
Shape-Shifting Headaches
AI models are trained on static protein structures. These snapshots were hard-won through decades of work, in which scientists freeze a protein in time to image its structure. But these images capture only a protein’s most stable shape, not its shape in motion, such as when a protein grabs onto a binder or when an enzyme twists to fit into a protein nook.
For AI to truly “understand” proteins, researchers will have to train models on the changing structures as proteins shapeshift. Biophysics can help model a protein’s twists and turns, but it’s extremely difficult. Scientists are now generating libraries of synthetic and natural proteins and gradually mutating each to see how simple changes alter their structures and flexibility.
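To make the idea of a mutational library concrete, here is a minimal Python sketch that enumerates every single-amino-acid substitution of a protein sequence. The function name and the three-residue example are hypothetical, and the sketch stops at generating variants; measuring or predicting how each one changes the protein’s structure and flexibility, the part that makes such libraries useful, is left out.

```python
# Hypothetical sketch: enumerate every single-point mutant of a protein
# sequence, the kind of systematic variant library described in the text.
# Measuring or predicting each variant's structure is NOT done here.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_point_mutants(sequence: str) -> list[tuple[str, str]]:
    """Return (label, mutant_sequence) pairs, e.g. ("M1A", "AKT").

    Labels use the common wild-type / 1-based position / new residue notation.
    """
    mutants = []
    for i, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip "mutating" a residue into itself
            mutant = sequence[:i] + aa + sequence[i + 1:]
            mutants.append((f"{wild_type}{i + 1}{aa}", mutant))
    return mutants


if __name__ == "__main__":
    library = single_point_mutants("MKT")
    print(len(library))  # 3 positions x 19 substitutions = 57
    print(library[0])    # ('M1A', 'AKT')
```

A protein of length L yields 19 × L single mutants, so even a modest 300-residue protein gives 5,700 variants before any combinations of mutations are considered.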
Adding a bit of variety to how an AI model generates new structures could also help. AF-Cluster, built on AlphaFold2, splits the evolutionary sequence data the model relies on into clusters of similar sequences and makes a separate prediction from each cluster. Tested on a known shape-shifting protein, it captured more than one of the protein’s structures.
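The AF-Cluster authors describe clustering the multiple sequence alignment (MSA), the set of related sequences AlphaFold2 takes as input, with DBSCAN and then running AlphaFold2 on each cluster. The Python sketch that follows illustrates only the clustering step. The hand-written DBSCAN over Hamming distances, the toy sequences, the `eps` and `min_samples` values, and the function names are assumptions chosen to keep it dependency-free, and no structure predictor is called.

```python
# Minimal sketch of the clustering idea behind AF-Cluster: group similar
# sequences in an MSA so a structure predictor could be run once per group.
# Distance metric, parameters, and toy data are illustrative assumptions.
from typing import Optional


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dbscan(seqs: list[str], eps: float, min_samples: int) -> list[int]:
    """Plain DBSCAN over Hamming fractions; returns a label per sequence (-1 = noise)."""
    labels: list[Optional[int]] = [None] * len(seqs)
    # Each neighbor list includes the sequence itself (distance 0).
    neighbors = [
        [j for j in range(len(seqs)) if hamming_fraction(seqs[i], seqs[j]) <= eps]
        for i in range(len(seqs))
    ]
    cluster = 0
    for i in range(len(seqs)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep growing the cluster
        cluster += 1
    return labels


def group_by_cluster(seqs: list[str], labels: list[int]) -> dict[int, list[str]]:
    """Collect sequences per cluster, dropping noise (-1)."""
    clusters: dict[int, list[str]] = {}
    for seq, label in zip(seqs, labels):
        if label != -1:
            clusters.setdefault(label, []).append(seq)
    return clusters


if __name__ == "__main__":
    msa = [
        "MKTAYIAK", "MKTAYIAR", "MKSAYIAK",  # one family of close variants
        "GLFDWVRE", "GLFDWVKE", "GLYDWVRE",  # a second, divergent family
        "PPPPPPPP",                          # an unrelated outlier
    ]
    labels = dbscan(msa, eps=0.25, min_samples=2)
    for label, members in group_by_cluster(msa, labels).items():
        # In AF-Cluster, each group would become a separate AlphaFold2 input.
        print(label, members)
```

With these settings the two families form clusters 0 and 1, and the outlier is labeled noise and dropped. The design point is that each cluster gives the predictor a different slice of evolutionary evidence; if different slices favor different conformations, the model can recover more than one shape.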
Protein prediction is a competitive race, but teams will likely need to work together too. Building a collaborative infrastructure for rapidly sharing data could speed efforts. Adding so-called “negative data,” such as cases where AI-designed proteins or binders turn out to be toxic in cells, could also guide other protein designers. A harder problem is that verifying an AI-designed protein can take years, by which time the underlying algorithm may already have been updated.
Regardless, there’s no doubt AI is speeding protein design. Let’s see what next year has to offer.
Image Credit: Baker Lab