Proteins are like Spider-Man in the multiverse.
The underlying story is the same: each building block of a protein is based on a three-letter DNA code. However, change one letter, and the same protein becomes a different version of itself. If we’re lucky, some of these mutants can still perform their normal functions.
When we’re unlucky, single DNA letter changes can trigger a myriad of inherited disorders, such as cystic fibrosis and sickle cell disease. For decades, geneticists have hunted down these disease-causing mutations by tracing shared genes through family trees. Once a culprit is found, gene-editing tools such as CRISPR are beginning to help correct the genetic typo and bring life-changing cures.
The problem? There are more than 70 million possible single DNA letter swaps in the human genome that would change a protein’s amino acid sequence. Even with the advent of high-throughput DNA sequencing, scientists have painstakingly uncovered only a sliver of the mutations that could be linked to disease.
This week, Google DeepMind brought a new tool to the table: AlphaMissense. Built on AlphaFold, the company’s blockbuster algorithm for predicting protein structures, the new algorithm analyzes DNA sequences and works out which DNA letter swaps are likely to lead to disease.
The tool focuses only on single DNA letter changes that swap one amino acid for another, called “missense mutations.” Of the tens of millions of possible missense mutations, it categorized 89 percent as either likely benign or likely pathogenic, said DeepMind.
AlphaMissense expands DeepMind’s work in biology. Rather than focusing only on protein structure, the new tool goes straight to the source code—DNA. Just a tenth of a percent of missense mutations in human DNA have been mapped using classic lab tactics. AlphaMissense opens a new genetic universe in which scientists can explore targets for inherited diseases.
“This knowledge is crucial to faster diagnosis,” wrote the authors in a blog post, and to getting to the “root cause of disease.”
For now, the company is releasing only the catalog of AlphaMissense predictions, not the code itself. It also warns that the algorithm isn’t meant for diagnosis. Rather, it should be viewed more like a tip line for disease-causing mutations. Scientists will have to examine and validate each tip using biological samples.
“Ultimately, we hope that AlphaMissense, together with other tools, will allow researchers to better understand diseases and develop new life-saving treatments,” said study authors Žiga Avsec and Jun Cheng at DeepMind.
Let’s Talk Proteins
A quick intro to proteins. These molecules are made from genetic instructions in our DNA represented by four letters: A, T, C, and G. Combining three of these letters codes for a protein’s basic building block—an amino acid. Proteins are made up of 20 different types of amino acids.
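A quick count shows why the code needs three letters rather than two: two-letter words would not cover all 20 amino acids, while three-letter words leave room to spare. Of the 64 three-letter combinations, three act as “stop” signals, so 61 remain to encode 20 amino acids.

```latex
\underbrace{4^2}_{\text{two-letter codes}} = 16 \;<\; 20 \;\le\; \underbrace{4^3}_{\text{three-letter codes}} = 64 = 61\ (\text{amino-acid codes}) + 3\ (\text{stop codes})
```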
Evolution programmed redundancy into the DNA-to-protein translation process. Multiple three-letter DNA codes create the same amino acid. Even if some DNA letters mutate, the body can still build the same proteins and ship them off to their normal workstations without issue.
The trouble comes when a single letter change swaps in a different amino acid and bulldozes the entire operation.
Scientists have long known that these missense mistakes can have devastating health consequences. But hunting them down is tedious work. Scientists manually edit the DNA sequence of a suspect gene letter by letter, turn each version into protein, then observe how it functions to pinpoint the harmful change. With hundreds of potential suspects, nailing down a single mutation can take years.
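To make that redundancy concrete, here is a minimal Python sketch. It builds the standard genetic code from the widely used compact 64-letter string (each codon position cycling through T, C, A, G) and labels a single-letter change in one codon as synonymous (same amino acid), missense (a different amino acid), or nonsense (an early stop). The function names are hypothetical, and the sketch looks only at the codon itself, ignoring everything else that happens in a real cell.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1). Each of the three codon
# positions cycles through T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Label a single-letter change at `position` (0, 1 or 2) of a sense codon."""
    codon, new_base = codon.upper(), new_base.upper()
    before = CODON_TABLE[codon]
    if before == "*":
        raise ValueError("expected a codon that encodes an amino acid, not a stop")
    if codon[position] == new_base:
        raise ValueError("new_base must differ from the original letter")
    mutant = codon[:position] + new_base + codon[position + 1:]
    after = CODON_TABLE[mutant]
    if after == before:
        return "synonymous"  # redundancy: the same amino acid is still built
    if after == "*":
        return "nonsense"    # the change creates an early stop signal
    return "missense"        # a different amino acid is swapped in


def tally_all_point_mutations() -> Counter:
    """Classify every possible single-letter change in every sense codon."""
    counts: Counter = Counter()
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue
        for position in range(3):
            for base in BASES:
                if base != codon[position]:
                    counts[classify_point_mutation(codon, position, base)] += 1
    return counts


# Usage: GAG -> GTG (glutamate to valine) in the beta-globin gene causes sickle cell disease.
print(classify_point_mutation("GAG", 1, "T"))  # missense
print(classify_point_mutation("GCT", 2, "C"))  # synonymous (both alanine)
print(classify_point_mutation("TGG", 2, "A"))  # nonsense (TGA is a stop)
print(tally_all_point_mutations())
```

Tallying all 549 possible single-letter changes (61 codons times 9 swaps each) shows that most of them are missense, which is why the redundancy only partly protects against typos.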
Can we speed it up? Enter machine minds.
AI Learning ATCG
DeepMind joins a burgeoning field that uses software to predict disease-causing mutations.
Compared to previous computational methods, AlphaMissense has a leg up. The tool builds on lessons from its predecessor, AlphaFold. Known for solving protein structure prediction, a grand challenge in the field, AlphaFold is in the algorithmic biology hall of fame.
AlphaFold predicts protein structures—which often determine function—based on amino acid sequences alone. Here, AlphaMissense uses AlphaFold’s “intuition” about protein structures to predict whether a mutation is benign or detrimental, study author and DeepMind’s vice president of research Dr. Pushmeet Kohli said at a press briefing.
The AI also borrows the large language model approach. In this way, it’s a little like GPT-4, the AI behind ChatGPT, only rejiggered to decode the language of proteins. Such protein language models are great at homing in on protein variants and flagging which sequences are biologically plausible and which aren’t. To Avsec, that’s AlphaMissense’s superpower: it already knows the rules of the protein game, that is, which sequences work and which fail.
As a proof of concept, the team challenged their AI system with ClinVar, a public database of human genetic variants and their reported links to disease. Many of these genetic typos are linked to developmental disorders. AlphaMissense bested existing models at nailing down disease-causing mutations.
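One common way to turn a protein language model into a variant scorer is to compare how plausible the model finds the mutant amino acid against the original one at the same position. The Python sketch below shows only that generic log-likelihood-ratio idea; it is not AlphaMissense’s method, whose code has not been released, and the probability table is hypothetical, standing in for a real model’s output.

```python
import math

# Hypothetical output of a protein language model: for each position, a probability
# for each amino acid given the rest of the sequence. Real models score all 20 amino
# acids at every position; this toy column lists only four, summing to 1.
toy_predictions: dict[int, dict[str, float]] = {
    10: {"E": 0.70, "D": 0.20, "K": 0.09, "V": 0.01},
}


def log_likelihood_ratio(predictions, position: int, ref: str, alt: str) -> float:
    """Score a swap of `ref` for `alt` as log p(alt) - log p(ref) at one position.

    Zero means the model finds both amino acids equally plausible; strongly
    negative values mean the mutant looks far less plausible than the original.
    """
    column = predictions[position]
    return math.log(column[alt]) - math.log(column[ref])


# Usage: a chemically similar swap (E -> D) versus a dissimilar one (E -> V).
print(round(log_likelihood_ratio(toy_predictions, 10, "E", "D"), 2))  # -1.25
print(round(log_likelihood_ratio(toy_predictions, 10, "E", "V"), 2))  # -4.25
```

Working in log space keeps the score well behaved when probabilities are tiny, and a negative ratio reads naturally as “less plausible than what evolution settled on.”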
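Head-to-head comparisons like this are commonly summarized with the area under the ROC curve (AUROC): the probability that a randomly chosen known-pathogenic variant gets a higher score than a randomly chosen known-benign one. The sketch below computes it straight from that definition. Choosing this metric is our assumption, not a quote of the study’s exact evaluation, and the scores and labels are made-up toy values, not ClinVar data.

```python
from itertools import product


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC via pairwise comparison (the Mann-Whitney formulation).

    labels: 1 = pathogenic, 0 = benign. Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two hypothetical predictors scoring the same five labeled variants.
labels = [1, 1, 1, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.3, 0.5]
model_b = [0.6, 0.4, 0.2, 0.5, 0.3]
print(auroc(model_a, labels))  # 5 of 6 pairs ranked correctly
print(auroc(model_b, labels))  # 0.5, no better than a coin flip
```

The pairwise loop is quadratic, which is fine for illustration; on millions of variants one would sort the scores instead.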
Predicting protein structures can be useful for stabilizing protein drugs and nailing down other biophysical properties. However, solving structure alone has “generally been of little benefit” when it comes to predicting variants that cause diseases, said the authors.
With AlphaMissense, DeepMind wants to turn the tide.
The team is releasing its entire database of potential disease-causing mutations to the public. Overall, the algorithm classified 32 percent of all missense variants as likely pathogenic and 57 percent as likely benign. It joins others in the field, such as PrimateAI, first released in 2018 to screen for dangerous mutations.
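The 32 and 57 percent figures add up to the 89 percent of variants DeepMind says it could categorize; the remaining 11 percent sit in between. A minimal sketch of such a three-way split follows, assuming each variant has a pathogenicity score between 0 and 1 and two cutoffs. The function names, the scores, and the cutoffs of 0.3 and 0.6 are hypothetical choices for illustration; anyone using the released catalog should check the thresholds DeepMind documents.

```python
from collections import Counter

LABELS = ("likely_benign", "ambiguous", "likely_pathogenic")


def classify(score: float, benign_below: float, pathogenic_above: float) -> str:
    """Bin a 0-1 pathogenicity score into one of three classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if benign_below > pathogenic_above:
        raise ValueError("benign cutoff must not exceed pathogenic cutoff")
    if score < benign_below:
        return "likely_benign"
    if score > pathogenic_above:
        return "likely_pathogenic"
    return "ambiguous"


def class_fractions(scores: list[float], benign_below: float, pathogenic_above: float) -> dict[str, float]:
    """Fraction of variants falling in each class."""
    counts = Counter(classify(s, benign_below, pathogenic_above) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}


# Hypothetical scores for ten variants, binned with arbitrary illustrative cutoffs.
toy_scores = [0.05, 0.12, 0.2, 0.28, 0.45, 0.5, 0.71, 0.88, 0.93, 0.97]
fractions = class_fractions(toy_scores, benign_below=0.3, pathogenic_above=0.6)
print(fractions)  # {'likely_benign': 0.4, 'ambiguous': 0.2, 'likely_pathogenic': 0.4}
```

Keeping an explicit “ambiguous” band is a deliberate design choice: it lets a tool decline to call borderline variants instead of forcing every one into benign or pathogenic.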
To be clear: the results are only predictions. Scientists will have to validate these AI-generated leads in lab experiments. AlphaMissense provides “only one piece of evidence,” said Dr. Heidi Rehm at the Broad Institute, who wasn’t involved in the work.
Nevertheless, the AI model has already generated a database that scientists can tap into “as a starting point for designing and interpreting experiments,” said the team.
Moving forward, AlphaMissense will likely have to tackle protein complexes, said Marsh and Teichmann. These sophisticated biological architectures are fundamental to life. Any mutation can crack their delicate structure, cause them to malfunction, and lead to disease. Dr. David Baker’s lab at the University of Washington, another pioneer in protein structure prediction, has already begun using machine learning to explore these protein cathedrals.
For now, no single tool that predicts disease-causing DNA mutations can be relied on to diagnose genetic diseases, as symptoms often result from both inherited mutations and environmental cues. This applies to AlphaMissense as well. But as the algorithm—and interpretation of its results—advances, its use in the “diagnostic odyssey will continue to improve,” they said.