DeepMind may just have cracked one of the grandest challenges in biology. One that rivals the discovery of DNA’s double helix. It could change biomedicine, drug discovery, and vaccine development forever.
The actual achievement sounds far less sexy at first glance. One of DeepMind’s powerful AI algorithms, called AlphaFold, used its deep learning prowess to predict a protein’s three-dimensional (3D) shape, down to the width of an atom. It’s a challenge that’s mystified biologists for 50 years and counting, so much so that the search for a breakthrough has spawned crowd-sourcing games and global competitions, and the underlying science has earned a Nobel Prize.
We’re at that inflection point. AlphaFold triumphed over roughly 100 other teams in a long-running challenge called Critical Assessment of Structure Prediction, or CASP, with a knockout, jaw-dropping performance. Speaking to Nature, CASP cofounder Dr. John Moult at the University of Maryland said “in some sense the problem is solved.”
Dr. Mohammed AlQuraishi at Columbia University, who also participated in CASP, lauded the AI as transformational. “It’s a breakthrough of the first order, certainly one of the most significant scientific results of my lifetime,” he said to Nature.
It also comes as a triumph for DeepMind, which rose to fame with a slew of algorithms that outperformed humans in games such as Go and the entire Atari laundry list. The win for protein structure prediction, however, marks its dazzling debut into the real world—one that nixes negative punditry on the value of AI for real-life quandaries.
DeepMind isn’t the only contender in the protein folding game. AlphaFold relies on biological data and insights, and this week a group of experimental scientists delivered both. By tactically changing the genes of a complicated protein assembly and observing the outcome, the team was able to build an algorithm that reconstructs the protein with extremely high accuracy.
Together, the two advances put us on a fast track to a paradigm shift. “This will change medicine,” said Dr. Andrei Lupas at the Max Planck Institute for Developmental Biology. “It will change research. It will change bioengineering. It will change everything.”
What’s the Big Deal?
A central tenet in biology is “structure explains function.” The discovery of the double helix shape of DNA, for example, led to an explosion of insights into how genetic information is stored and copied. Without that structure, we wouldn’t have gene editing, DNA computers, or DNA storage devices.
Protein structures arguably contain as much, if not more, information. But they’re far harder to decipher. They start their lives as ribbons of linear components, called amino acids, like beads on a string. Based on enormously complicated biophysics—much of which remains mysterious—the string folds into delicate shapes, such as sheets of twisting and turning strands, or helices that wrap around each other. Many of these structures further couple into a megaplex. Only then can they function as intended to sustain life.
If we know a protein’s structure, we can make educated guesses about its function. And by mapping thousands of protein structures, we can begin to decipher the biology of life—and find ways to manipulate it.
Take Covid-19 vaccines. One major breakthrough was to map the structure of “spike” proteins on the surface of the virus, which the virus relies on to invade our cells. Imagine a protein’s 3D structure as a lock. If we can map the shape of the lock, it’s then possible to design “keys”—drugs or vaccines—to disrupt it. It’s not surprising that DeepMind’s AlphaFold went after these spike protein structures in March, just as Covid-19 cases began skyrocketing across the world.
The classic “gold standard” for uncovering protein structures relies on an extremely tedious and difficult lab technique called X-ray crystallography. Scientists essentially “freeze” proteins into delicate crystal-esque structures and use a combination of X-rays, high-tech microscopes, and math to figure out their shapes. But not all proteins can be “flash-frozen” to be analyzed, leaving a Grand Canyon-sized gap for decoding biology. Other methods, with unfriendly names like “nuclear magnetic resonance spectroscopy,” are just as expensive and finicky.
But here’s the thing. The instructions for building a 3D protein are inherently embedded inside its 1D amino acid sequence, a discovery that won a Nobel Prize. And if there’s one thing AI is good at, it’s finding patterns in complicated sequences beyond our puny human ability.
3D Chess
The CASP challenge crowd-sourced predictions of protein structures that had already been identified using X-ray crystallography but had not yet been made public. DeepMind isn’t a newcomer to the challenge; back in 2018, its performance shocked many academic scientists who had long worked in the field.
AlphaFold’s strategy is similar to that of most entries in CASP this year, in that it relies on deep learning. Remember: a protein’s sequence of amino acids, its building blocks, contains data about its final 3D shape, which seems perfect for a deep learning approach.
DeepMind went a step further. They took on the leviathan task of incorporating data about physics, geometry, and evolutionary history into their model. The neural network, trained on roughly 170,000 protein structures from the Protein Data Bank, could then interpret the protein’s structure as a “3D map” and analyze any buried relationships or patterns. By iterating this process, AlphaFold was able to “determine highly-accurate structures in a matter of days,” wrote DeepMind.
These aren’t empty words. At CASP, the algorithm put competitors to shame. Nearly two-thirds of its predictions were comparable to experimental data at a similar resolution of a few atoms. It scored a mind-boggling 90 out of 100—a massive 25-point margin above other contenders.
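For a sense of what a score like that measures: CASP grades predictions with a metric called the Global Distance Test (GDT), which roughly asks what share of a predicted structure’s atoms land close to their experimentally determined positions. The Python sketch below is a simplified illustration, not CASP’s scoring code. It assumes the two structures are already superimposed and are given as matching lists of (x, y, z) coordinates in angstroms; the function name and the toy coordinates are made up for the example.

```python
import math


def gdt_ts(predicted, reference, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score on a 0-100 scale.

    Assumption: both structures are already superimposed, and the i-th
    point in `predicted` corresponds to the i-th point in `reference`.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")

    # Distance between each predicted atom and its experimental position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of atoms that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in thresholds
    ]

    # Average the fractions over all cutoffs and scale to 0-100.
    return 100.0 * sum(fractions) / len(fractions)


# Toy four-atom "protein" (hypothetical coordinates, in angstroms).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(predicted, reference))  # 56.25
```

A real evaluation also searches for the superposition that maximizes the score, which this sketch skips. A perfect prediction scores 100, and every atom that drifts outside a cutoff pulls the number down.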
More to Go
More practically, AlphaFold’s success means that we could have access to previously “un-druggable” proteins—many of which are involved in cancer and other serious diseases.
Nearly all of our drugs are designed to dock onto a protein, like keys to a lock. The first step is to know thy enemy; that is, to learn the protein’s structure and find its vulnerable points of attack. An AI-based method for decoding protein structure could make it possible to rapidly screen tens of thousands of new drug targets. “AlphaFold will open up a new area of research,” Dame Janet Thornton at the European Bioinformatics Institute in the UK told MIT Technology Review.
Overwhelming accolades aside, there’s room for improvement. AlphaFold is relatively slow compared to some algorithms that deliver results in seconds, though with the trade-off of less accuracy. But more importantly, it struggled with deciphering protein complexes—mega-structures of multiple individual 3D building blocks that form into a collective functional entity. These are hardly rare in biology—most of the chemical receptors in our brain cells, for example, rely on these structures. They’re also like shape-shifting mega-Rubik’s cubes in that their 3D structure can change depending on the state of the body. For example, a mega-protein in the shape of a closed tunnel can open when it detects a chemical docked on its surface—a process that’s central to how our brains work.
The plus side? DeepMind has help. This week, a team took a separate approach to analyzing protein complexes in living cells, something AlphaFold hasn’t yet dominated. Their approach to the vexing problem went back to genes, the blueprints that guide the construction of amino acid chains and so carry information on 3D protein folding.
It’s also an out-of-the-box idea. The team found that they could quickly screen through thousands of mutations for a gene that makes a protein in living cells. By observing the structure of resulting protein complexes, they could then use AI-based methods to map out how one mutation affects another—and in turn, reveal the “rules” behind how these mega-structures form by just looking at their underlying genetic instructions.
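One common way to put a number on how one mutation affects another is an epistasis score: compare the measured fitness of a double mutant with what the two single mutants would predict if they acted independently. The Python sketch below illustrates that general idea only. It is not the team’s actual pipeline, and the multiplicative model, the function names, the mutation labels, and the fitness values are all hypothetical.

```python
def epistasis(w_a, w_b, w_ab):
    """Epistasis under an assumed multiplicative model.

    w_a, w_b: fitness of each single mutant relative to wild type (1.0).
    w_ab: measured fitness of the double mutant.
    Returns 0 when the mutations act independently, a negative value when
    the pair is worse than expected, a positive value when it is better.
    """
    expected = w_a * w_b
    return w_ab - expected


def interaction_map(single, double):
    """Score every measured pair of mutations.

    single: dict mapping mutation label -> single-mutant fitness.
    double: dict mapping (label_a, label_b) -> double-mutant fitness.
    """
    return {
        (a, b): epistasis(single[a], single[b], w_ab)
        for (a, b), w_ab in double.items()
    }


# Hypothetical measurements for three made-up mutations.
single = {"mutA": 0.5, "mutB": 0.5, "mutC": 0.8}
double = {("mutA", "mutB"): 0.25, ("mutA", "mutC"): 0.1}

for pair, score in interaction_map(single, double).items():
    print(pair, round(score, 3))
```

In this toy data the first pair behaves as if the two mutations were independent, while the second pair does far worse together than expected. Strongly interacting pairs like that are a hint that two sites may work together in the folded complex.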
Similar to AlphaFold, the technology, called “integrative modeling,” isn’t yet ready to replace the gold standard of protein mapping. But more than ever before, we’re close. From singular proteins to mega-protein complexes, we now have faster, simpler, cheaper ways to accurately visualize a biological Invisible Man. With AI and biology working in tandem, protein folding may just be the first major breakthrough for medicine in our generation.
“AlphaFold is one of our most significant advances to date,” wrote the DeepMind team. [The progress] “gives us further confidence that AI will become one of humanity’s most useful tools in expanding the frontiers of scientific knowledge, and we’re looking forward to the many years of hard work and discovery ahead!”
Image Credit: fdecomite/flickr