Designing a protein is a bit like making a cabinet. The first step is building the backbone that holds the protein together. But then comes the hard part: figuring out where to install hinges on the scaffold—that is, finding the best “hotspots”—to put on doors, shelves, and other attachments that ultimately make the cabinet fully functional.
In a way, proteins also have hotspots embedded in their structures. True to their name, “functional sites,” these intriguing nooks and crannies form intricate docks for other proteins or drugs to grab onto. The sites are central to performing most of our basic biological processes. They’re also a massive gold mine for designing new treatments and medical drugs.
The problem? Functional sites are hard to map. Scientists traditionally had to mutate suspected areas on a protein one by one, switching one amino acid for another, to nail down precise binding spots. Like a detective screening hundreds of suspects, it’s extremely tedious.
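To get a feel for the scale of that detective work, here is a minimal Python sketch that simply lists every possible single-letter swap in a short chain. The eight-letter sequence and the function name are made up for illustration; this is not the procedure used in any particular lab, just the counting behind it.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_point_mutants(sequence):
    """Yield (label, mutant_sequence) for every single-residue substitution."""
    for position, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement == original:
                continue  # swapping a residue for itself is not a mutation
            # Label in the usual style: original residue, 1-based position, new residue.
            label = f"{original}{position + 1}{replacement}"
            mutant = sequence[:position] + replacement + sequence[position + 1:]
            yield label, mutant

# Hypothetical 8-residue fragment, not a real protein.
toy_sequence = "MKTAYIAK"
mutants = list(single_point_mutants(toy_sequence))
print(len(mutants))  # 8 positions x 19 alternatives = 152
```

Even this tiny fragment yields 152 variants to test. A real protein with hundreds of amino acids pushes the count into the thousands, and each variant would need its own experiment.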
A new study in Science threw out the whole playbook. Led by Dr. David Baker at the University of Washington, a team tapped into an AI’s “imagination” to dream up a myriad of functional sites from scratch. It’s a machine mind’s “creativity” at its best: a deep learning algorithm that predicts the general area of a protein’s functional site, then further sculpts the structure.
As a reality check, the team used the new software to generate drugs that battle cancer and design vaccines against common, if sometimes deadly, viruses. In one case, the digital mind came up with a solution that, when tested in isolated cells, was a perfect match for an existing antibody against a common virus. In other words, the algorithm “imagined” a hotspot from a viral protein, turning it into a target for designing new treatments.
The algorithm is deep learning’s first foray into building proteins around their functions, opening a door to treatments that were previously unimaginable. But the software isn’t limited to natural protein hotspots. “The proteins we find in nature are amazing molecules, but designed proteins can do so much more,” said Baker in a press release. The algorithm is “doing things that none of us thought it would be capable of.”
The Protein Hotspot
Baker’s team is no stranger to predicting proteins with artificial minds. A few years back, they rocked the structural biology field by releasing Rosetta, software that can predict a protein’s 3D structure based on its amino acid sequence alone. They further mapped protein complexes and designed protein “screwdrivers” from scratch to pry apart undesirable protein interactions. Late last year, they released a deep learning network dubbed trRosetta, an AI “architect” that generalizes how strings of amino acids arrange into intricate structures at the nanoscale.
Let’s back up.
It’s easy to picture proteins as the meaty, sinewy chicken wing I’m biting into as I type this sentence. But on the molecular level, they’re far more elegant. Imagine multiple Lego blocks—amino acids—held together by a string. Now swirl it around, twisting the chain until some blocks snap onto each other. This forms a delicate structure that often resembles a helix or rumpled bedsheets. In some proteins, these building blocks further assemble into complexes—for example, crafting a channel that tunnels through a cell’s protective membrane like a patrolled interstate highway.
Proteins power every single biological process, often through a cascade of interactions with other proteins or drugs, which, depending on the partner, can trigger completely different consequences: should a cell live or die? Attack a potential invader or stand down? In other words, proteins are the building blocks of life, and parsing their structure is how we can hack into life.
Here’s the thing: not all parts of a protein are created equal. If a protein is a human body, functional sites are its “hands”—where it grabs onto another protein or drug, stirs up enzymatic reactions, or fights off invading pathogens. Embedded directly into the protein’s structure, these sites are hard to pin down and even harder to recreate.
The new study tackled the problem with a version of Rosetta: with some previous knowledge, is it possible for a computer to dream up a chain of amino acids that naturally fold into a functional site?
The Dreamer and the Realist
The problem may seem exotic, but there is a previous example—in a different field. Using a neural network, OpenAI created a wide range of images from text captions alone. A spinoff of the rockstar AI text generator GPT-3, the DALL·E algorithm generated fantastical but realistic-looking images based on simple text prompts by detecting patterns from its training. “It takes the deepest, darkest recesses of your imagination and renders it into something that is eerily pertinent,” said Dr. Hany Farid at UC Berkeley after the tool’s initial release.
Building a protein functional site is similar. Here, amino acids are the letters and the protein functional site is the image. “The idea is the same: neural networks can be trained to see patterns in data. Once trained, you can give it a prompt and see if it can generate an elegant solution,” said Dr. Joseph Watson, a lead author of the new work. Except rather than writing a novel, the algorithm could help rewrite life.
The team started with a previous creation, trRosetta. It’s a neural network originally designed to dream up new proteins based on amino acid sequences while being able to predict their structure. Some of its designs were so alien from natural ones that the team dubbed the network’s inner workings “hallucination.” The algorithm seemed perfect: it could predict both a protein’s amino acid sequence and its structure.
The hiccup? It didn’t really work. In contrast, the OG of protein structure prediction, RoseTTAFold, performed like a champ. The algorithm’s power comes from its design: it models each amino acid at the nanoscale, providing coordinates for each atom. Like pinning a geographical site using Google Maps, this provides a level of ground truth for a structure that an AI can further riff on, a sort of “constrained hallucination.”
Translation? RoseTTAFold can predict a functional structure—specific to the problem at hand—and come up with a rough sketch as the final design.
Then came another clever trick, dubbed “inpainting.” Here, the team hid parts of the protein sequence or structure. The software had to learn how to decipher information from what’s essentially a noisy radio interception, where you can only hear the first few words but try to understand the message by filling in the blanks. RoseTTAFold tackled the “missing information recovery problem” with gusto, autocompleting both amino acid sequences and structures to construct a given functional region with high fidelity.
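The hide-and-fill idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the made-up sequence, the underscore used as a mask symbol, and especially the filler, which is a deliberately naive stand-in rather than RoseTTAFold. The real network works on both sequence and 3D structure and is far more sophisticated.

```python
from collections import Counter

MASK = "_"  # placeholder for a hidden residue (an assumption for this sketch)

def mask_span(sequence, start, length):
    """Hide `length` residues beginning at `start`; return (masked, hidden)."""
    hidden = sequence[start:start + length]
    masked = sequence[:start] + MASK * len(hidden) + sequence[start + len(hidden):]
    return masked, hidden

def naive_fill(masked):
    """Stand-in for a trained network: fill each gap with the most common visible residue."""
    visible = [residue for residue in masked if residue != MASK]
    guess = Counter(visible).most_common(1)[0][0]
    return masked.replace(MASK, guess)

def recovery(predicted, original, start, length):
    """Fraction of the hidden positions that were filled in correctly."""
    hits = sum(predicted[i] == original[i] for i in range(start, start + length))
    return hits / length

# Hypothetical 13-residue fragment, not a real protein.
toy_sequence = "MKTAYIAKQRAAL"
masked, hidden = mask_span(toy_sequence, start=4, length=3)
filled = naive_fill(masked)
score = recovery(filled, toy_sequence, 4, 3)
print(masked)  # MKTA___KQRAAL
print(filled)
print(score)
```

The naive filler gets only one of the three hidden letters right. Training a network on this kind of masked example is what pushes that recovery score up, and the same score gives a simple way to check how faithfully a hidden region was reconstructed.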
RoseTTAFold can tackle the problems of building amino acid sequences and generating a backbone for the site at the same time. It’s like putting words on paper: the writer makes sure each letter is in the right place, all the while checking that the grammar and meaning make sense.
Questioning the Nature of Reality
Putting their new creation to the test, the team generated several drug and vaccine designs that could potentially fight off viruses and cancer or help with low-iron health issues.
To lead author Dr. Jue Wang, the algorithm became unexpectedly pertinent. While he was working on the project, his two-year-old son was admitted to the emergency unit with a lung infection caused by RSV (respiratory syncytial virus), a virus that normally causes cold-like symptoms but can be deadly in the young and the elderly.
At the time, Wang was using the algorithm to design new treatments, including potential sites on RSV against which vaccines and drugs could be tested. It’s a relatively well-mapped-out structure. The software hallucinated designs that recapitulated two sites for vaccines to potentially bind to. In tests, the hallucinated proteins, reconstructed in bacteria, rapidly grabbed onto existing antibodies, a sign that they’re functional and that the deep learning approach works.
The incident “made me realize that even the ‘test’ problems we were working on were actually quite meaningful,” said Wang.
In several additional tests, the team designed functional sites for an enzyme, protein-binding proteins, and proteins that grab onto metal ions—basically, how you absorb iron and other important metals.
Although the method is powerful, there’s room for growth. It opens the door not only to demystifying natural proteins, but also to potentially designing new ones for synthetic biology. “These are very powerful new approaches, but there is still much room for improvement,” said Baker.
Altogether, it’s another win for deep learning and a riveting showcase of how AI and biology can synergize. “Deep learning transformed protein structure prediction in the past two years, we are now in the midst of a similar transformation of protein design,” said Baker.
Image Credit: Ian C. Haydon/UW Institute for Protein Design. New artificial intelligence software trained on protein structures can generate functional proteins, including these candidate vaccines for the respiratory virus RSV, in seconds.