Deep learning is solving biology’s deepest secrets at breathtaking speed.
Just a month ago, DeepMind cracked a 50-year-old grand challenge: protein folding. A week later, they produced a totally transformative database of more than 350,000 protein structures, including over 98 percent of known human proteins. Structure is at the heart of biological functions. The data dump, set to explode to 130 million structures by the end of the year, allows scientists to foray into previous “dark matter”—proteins unseen and untested—of the human body’s makeup.
The end result is nothing short of revolutionary. From basic life science research to developing new medications to fight our toughest disease foes like cancer, deep learning gave us a golden key to unlock new biological mechanisms—either natural or synthetic—that were previously unattainable.
Now, the AI darling is set to do the same for RNA.
As the middle child of the “DNA to RNA to protein” central dogma, RNA didn’t get much press until its Covid-19 vaccine contribution. But the molecule is a double hero: it both carries genetic information, and—depending on its structure—can catalyze biological functions, regulate which genes are turned on, tweak your immune system, and even crazier, potentially pass down “memories” through generations.
It’s also frustratingly difficult to understand.
Similar to proteins, RNA also folds into complicated 3D structures. The difference, according to Drs. Rhiju Das and Ron Dror at Stanford University, is that we know comparatively little about these molecules. There are 30 times as many types of RNA as there are proteins, but the number of deciphered RNA structures is less than one percent of the number for proteins.
The Stanford team decided to bridge that gap. In a paper published last week in Science, they described a deep learning algorithm called ARES (Atomic Rotationally Equivariant Scorer) that efficiently solves RNA structures, blowing previous attempts out of the water.
The authors “have achieved notable progress in a field that has proven recalcitrant to transformative advances,” said Dr. Kevin Weeks at the University of North Carolina, who was not involved in the study.
Even more impressive, ARES was trained on only 18 RNA structures, yet was able to extract substantial “building block” rules for RNA folding that’ll be further tested in experimental labs. ARES is also input agnostic, in that it isn’t specifically tailored to RNA.
“This approach is applicable to diverse problems in structural biology, chemistry, materials science, and beyond,” the authors said.
Meet RNA
The importance of this biomolecule for our everyday lives is probably best summarized as “Covid vaccine, mic drop.”
But it’s so much more.
Like proteins, RNA traces back to DNA: it is transcribed directly from the DNA template. It also has four letters, A, U, C, and G, with A grabbing U and C tethered to G. RNA is a whole family, with the most well-known type being messenger RNA, or mRNA, which carries the genetic instructions to build proteins. But there’s also transfer RNA, or tRNA (I like to think of this as a transport drone), which grabs onto amino acids and shuttles them to the protein factory; microRNA, which controls gene expression; and even stranger cousins that we understand little about.
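To make the pairing rule concrete, here is a minimal Python sketch of the A-U and C-G matching described above. The function names are made up for illustration, and the sketch deliberately ignores real-world complications such as G-U “wobble” pairs, so treat it as a toy rather than a folding tool.

```python
# Pairing rule from the text: A pairs with U, and C pairs with G.
PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complement(rna: str) -> str:
    """Return the pairing partner of each base, position by position."""
    return "".join(PAIRS[base] for base in rna.upper())


def can_form_stem(left: str, right: str) -> bool:
    """Check whether two stretches of one strand could zip together.

    The two stretches run in opposite directions when the strand folds
    back on itself, so `right` is compared in reverse order.
    """
    return len(left) == len(right) and complement(left) == right.upper()[::-1]


if __name__ == "__main__":
    print(complement("AUCG"))
    print(can_form_stem("GGAC", "GUCC"))
```

Two stretches that pass this check can fold back on each other, which is the first step toward the loops and twists described next.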
Bottom line: RNA is both a powerful target and inspiration for genetic medicine or vaccines. One way to shut off a gene without actually touching it, for example, is to kill its RNA messenger. Compared to gene therapy, targeting RNA could have fewer unintended effects, all the while keeping our genetic blueprint intact.
In my head, RNA often resembles tangled headphones. It starts as a string, but subsequently tangles into a loop-de-loop—like twisting a rubber band. That twisty structure then twists again with surrounding loops, forming a tertiary structure.
Unlike frustratingly annoying headphones, RNA twists in semi-predictable ways. It tends to settle into one of several structures. These are kind of like the shape your body contorts into during a bunch of dance moves. Tertiary RNA structures then stitch these dance moves together into a “motif.”
“Every RNA likely has a distinct structural personality,” said Weeks.
This seeming simplicity is what makes researchers tear their hair out. RNA’s building blocks are simple—just four letters. They also fold into semi-rigid structures before turning into more complicated tertiary models. Yet “despite these simplifying features, the modeling of complex RNA structures has proven to be difficult,” said Weeks.
The Prediction Conundrum
Current deep learning solutions usually start with one requirement: a ton of training examples, so that each layer of the neural network can begin to learn how to efficiently extract features—information that allows the AI to make solid predictions.
That’s a no-go for RNA. Unlike protein structures, RNA simply doesn’t have enough experimentally tried and true examples.
With ARES, the authors took an eyebrow-raising approach. The algorithm doesn’t care about RNA. It discards anything we already know about the molecule and its functions. Instead, it focuses only on the arrangement of atoms.
ARES was first trained with a small set of motifs known from previous RNA structures. The team also added a large set of incorrect alternative versions of the same structures. Digesting these examples, ARES slowly tweaked its neural network parameters so that the program began learning how each atom and its placement contributes to the overall molecule’s function.
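The idea of mixing correct structures with incorrect variants can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the article does not say how the incorrect examples were produced or labeled, so this sketch jitters atoms with random noise and labels each variant with its root-mean-square deviation (RMSD), a standard measure of how far one structure is from another. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Coords = List[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Root-mean-square deviation between two pre-aligned atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and the same size")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def make_decoy(native: Coords, noise: float, rng: random.Random) -> Coords:
    """Build an incorrect variant by nudging every atom (assumed method)."""
    return [
        (x + rng.gauss(0, noise), y + rng.gauss(0, noise), z + rng.gauss(0, noise))
        for x, y, z in native
    ]


def build_training_set(native: Coords, noise_levels: List[float], seed: int = 0):
    """Pair the correct structure and each decoy with an error label."""
    rng = random.Random(seed)
    examples = [(native, 0.0)]  # the correct structure has zero error
    for noise in noise_levels:
        decoy = make_decoy(native, noise, rng)
        examples.append((decoy, rmsd(decoy, native)))
    return examples


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    for structure, error in build_training_set(native, [0.5, 2.0]):
        print(round(error, 3))
```

A network trained on pairs like these would have to guess the error label from the atoms alone, which is one plausible way to learn what a “wrong” arrangement looks like.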
ARES gradually extracts features, much as a classic computer vision algorithm moves from pixels to lines and shapes. The layers in its neural network cover both fine and coarse scales. When challenged with a new set of RNA structures, many of which are far more complex than the training ones, ARES was able to distill patterns and novel motifs, recognizing how the letters bind.
“It learns entirely from atomic structure, using no other information…and it makes no assumptions about what structural features might be important,” the authors said. They didn’t even provide any basic information to the algorithm, such as that RNA is made up of four-letter chains.
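One consequence of working purely from atom positions is that the answer should not depend on how the molecule happens to be oriented in space. The Python sketch below is a toy illustration of that property, not ARES’s actual network: it reduces a structure to its sorted list of atom-to-atom distances, which stays the same when the whole structure is rotated. The function names and the sample coordinates are invented for the example.

```python
import math
from itertools import combinations


def rotate_z(points, angle):
    """Rotate every atom around the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def distance_fingerprint(points):
    """Sorted distances between every pair of atoms.

    Rotating or shifting the whole structure leaves this list unchanged.
    """
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def matches_by_distances(a, b, tol=1e-9):
    """True if two structures share the same distance fingerprint."""
    fa, fb = distance_fingerprint(a), distance_fingerprint(b)
    return len(fa) == len(fb) and all(abs(x - y) <= tol for x, y in zip(fa, fb))


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 2.0, 1.0)]
    rotated = rotate_z(atoms, 1.234)
    stretched = [(2 * x, y, z) for x, y, z in atoms]
    print(matches_by_distances(atoms, rotated))
    print(matches_by_distances(atoms, stretched))
```

A distance list is a crude stand-in (it cannot tell a shape from its mirror image, for instance), but it shows why a scorer built on geometry alone can ignore orientation while still noticing when the shape itself changes.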
As another benchmark, the team next challenged ARES to RNA-Puzzles. Kicked off in 2011, RNA-Puzzles is a community challenge for structural biologists to test their prediction algorithms against known experimental RNA structures. ARES blew the competition away.
The average resolution “has stayed stubbornly stuck” around 10 times less than that for a protein, said Weeks. ARES improved the accuracy by roughly 30 percent. It’s a seemingly small step, but a giant leap for one of biology’s most intractable problems.
An RNA Structural Code
Compared to protein structure prediction, RNA is far harder. And for now, ARES still can’t get to the level of accuracy needed for drug discovery efforts, or find new “hot spots” on RNA molecules that can tweak our biology.
But ARES is a powerful step forward in “piercing the fog” of RNA, one that’s “poised to transform RNA structure and function discovery,” said Weeks. One improvement to the algorithm could be to incorporate some experimental data to further model these intricate structures. What’s clear is that RNA seems to have a “structural code” that helps regulate gene circuits, something that ARES and its next generations may help parse.
Much of RNA has been the “dark matter” of biology. We know it’s there, but it’s difficult to visualize and even harder to study. ARES represents the next telescope into that fog. “As it becomes possible to measure, (deeply) learn, and predict the details of the tertiary RNA structure-ome, diverse new discoveries in biological mechanisms await,” said Weeks.