For all their simplicity, viruses are sneaky little life forces.

Take SARS-CoV-2, the virus behind Covid-19. Challenged by the human immune system, the virus has gradually reshuffled parts of its genetic material, making it easier to spread through the human population. The new strain has already terrorized South Africa and shut down the UK, and recently popped up in the United States.

The silver lining is that our existing vaccines and antibody therapies are still likely to be effective against the new strain. But that’s not always the case. “Viral escape” is a nightmare scenario, in which the virus mutates just enough so that existing antibodies no longer recognize it. The consequences are dire: it means that even if you’ve already had the infection, or produced antibodies from a vaccine, those protections are now kneecapped or useless.

From an evolutionary perspective, viral mutations and our immune system are constantly engaged in a cat-and-mouse game. Last week, thanks to an utterly unexpected resource, we may have gained a leg up. In a mind-bending paper published in Science, one team developed a tool to predict viral escape, and it came from natural language processing (NLP), the AI field devoted to understanding and mimicking human language.

Weird, right?

The team’s critical insight was to construct a “viral language” of sorts, based purely on its genetic sequences. This language, if given sufficient examples, can then be analyzed using NLP techniques to predict how changes to its genome alter its interaction with our immune system. That is, using artificial language techniques, it may be possible to hunt down key areas in a viral genome that, when mutated, allow it to escape roaming antibodies.

It’s a seriously kooky idea. Yet when tested on some of our greatest viral foes, like influenza (the seasonal flu), HIV, and SARS-CoV-2, the algorithm was able to discern critical mutations that “transform” each virus just enough to escape the grasp of our immune surveillance system.

“The language of viral evolution and escape … provides a powerful framework for predicting mutations that lead to viral escape,” said Drs. Yoo-Ah Kim and Teresa Przytycka at the National Institutes of Health, who were not involved in the study but provided perspectives on it.

“This is a phenomenal way of narrowing down the entire universe of potential mutant viruses,” added Dr. Benhur Lee at Mount Sinai. And if further validated, the algorithm could bolster attempts at an effective HIV vaccine, or a universal flu vaccine—rather than the piecemeal prediction approach we have now. It could also provide insight into how the new coronavirus could further mutate and put our immune system in “check,” and in turn, give us time to battle its escape plans and end the pandemic once and for all.

A Useful Analogy

The idea of using NLP to examine viruses started with an analogy. Last winter, study author Brian Hie was cruising around the snowy grounds of MIT when an idea popped into his head: what if it’s possible to explain the interaction between a virus and the immune system in the same way we analyze language?

It’s an uber-nerdy realization that takes a few leaps of faith. But the more Hie thought about it, the more it made sense. Language contains both grammar and semantics. The first is rather immutable, because it sets up the structure of a sentence. But the second, semantics, is just the meaning of the sentence. Changing a single word could immediately alter the meaning to the point that a listener could no longer comprehend it, all the while keeping the grammar intact. In other words, it’s totally possible to say grammatically correct gibberish (Mad Libs comes to mind) while “escaping” the understanding of a listener.

Here’s the analogy leap. Viruses also run on two main traits to survive. Both involve their interaction with our immune system. The first is their ability to enter a cell to replicate more of themselves. This trait, dubbed “virulence,” needs to stay semi-consistent so that the virus can maintain itself inside a host.

Take SARS-CoV-2. Like most viruses, it’s a bubble-like being with spikes dotted on its surface. Encapsulated within is its genomic sequence. The spike proteins are necessary for the virus to “talk” to our cells, allowing the virus to enter. But it’s the viral genes that dictate the shape of the spike proteins. In other words, if changes to the viral genes also alter spike proteins, these mutations would change the virus’s interaction with our cells and immune system.

In order to survive, any given virus needs to follow its own “grammar.” These fundamental sequences, captured in its genome, allow its survival. Break the grammar with too many mutations, or mutations in critical spots, and the virus will no longer be able to enter a cell and replicate, and will reach an evolutionary dead end. Bottom line: a virus needs to keep its “grammar” intact.

Yet grammar is just half of comprehension. The other is semantics, the meaning of words. This, thought Hie, is where viruses have more leeway. Imagine the virus as a speaker, and our immune system as a listener. Mutations to a viral genome that swap out “words”—but leave the grammar intact—could fool the immune “listener” just enough so that it no longer understands the virus’s language, and halts an attack. Yet because the virus’s grammar remains, it’s free to replicate and cause havoc, hidden away from the immune system’s defenses. In other words, if a mutation allows a virus to keep its grammar but changes its semantics, it also allows viral escape.

The question is, how do we predict those nightmare mutations?

Enter Algorithms

Hie’s second leap in thought was to tap into a completely different field: AI language.

In recent years, AI has gotten extremely efficient at modeling both grammar and semantics in human language, without any prior knowledge or understanding of the content. Take GPT-3 by OpenAI, which produces startlingly human-like prose that’s both grammatically correct and stays mostly on topic. Rather than studying linguistics, these NLP algorithms learn from a vast corpus of text, arranged in words, short phrases, sentences, and paragraphs. Even without being handed any rules of grammar, an NLP algorithm is capable of grasping patterns in human language. Forget rules: it’s pattern recognition all the way through.

Now imagine that the example text is the virus’s “normal” genome and that mutations are alternative, novel phrases; it’s then possible to analyze the language of the virus using NLP techniques. Take “grammar,” for example, or sequences in a viral genome that enable its entry into a cell. If the genome is treated as a language, an NLP algorithm could begin grasping sequences related to a virus’s infectiousness, without needing any previous knowledge of microbiology.
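To see how "grammar" can be picked up from examples alone, here is a minimal sketch in Python. It is not the team's model: it swaps in a far simpler technique (counting which letter shows up at each position), and the five short sequences are invented for illustration. The function names, the pseudocount, and the alphabet size of 20 are all assumptions.

```python
from collections import Counter

# Made-up, aligned toy sequences (assumption); a real training set would
# hold thousands of genuine viral sequences.
TRAINING = ["MKTAY", "MKTAY", "MKSAY", "MRTAY", "MKTAF"]


def position_frequencies(seqs):
    """Count how often each letter appears at each position."""
    length = len(seqs[0])
    return [Counter(s[i] for s in seqs) for i in range(length)]


def grammaticality(seq, freqs, n_seqs, pseudocount=0.1, alphabet_size=20):
    """Multiply smoothed per-position letter probabilities together.

    The pseudocount keeps never-seen letters from scoring exactly zero.
    """
    score = 1.0
    for i, letter in enumerate(seq):
        score *= (freqs[i][letter] + pseudocount) / (
            n_seqs + pseudocount * alphabet_size
        )
    return score


freqs = position_frequencies(TRAINING)
typical = grammaticality("MKTAY", freqs, len(TRAINING))
seen_variant = grammaticality("MKSAY", freqs, len(TRAINING))    # S was seen at position 3
unseen_variant = grammaticality("MKWAY", freqs, len(TRAINING))  # W was never seen there
```

In this toy setup, the sequence with a never-seen letter scores far lower than the ones built from familiar letters, with no biology supplied. A neural language model makes a much richer version of the same judgment, because it takes the whole surrounding sequence into account rather than one position at a time.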

A similar idea works for viral semantics. It’s possible to systematically change one viral genetic letter at a time. Using NLP, we can then analyze how far the mutant strays in its “meaning,” that is, its behavior. Using the language example, swapping “cat” for “feline” is a tiny change. Swapping “cat” for “bulldozer,” however, yields a much larger difference. The degree of these alterations is captured by a number, rather than intuition, and allows the algorithm to judge how far a virus has strayed from its original form.
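How does a difference in meaning become a number? One common approach is to represent each word (or each viral sequence) as a list of numbers, called an embedding, and measure the distance between two lists. The sketch below is a toy illustration only: the three-number vectors are made up by hand, whereas a real model would learn much longer ones from data, and the name `semantic_change` is hypothetical.

```python
import math

# Hand-made 3-number "meanings" (assumption); a trained model would learn
# vectors with hundreds of dimensions from its training text.
EMBEDDINGS = {
    "cat": (0.90, 0.80, 0.10),
    "feline": (0.85, 0.75, 0.15),
    "bulldozer": (0.10, 0.05, 0.95),
}


def semantic_change(original, replacement):
    """Straight-line (Euclidean) distance between two embeddings."""
    return math.dist(EMBEDDINGS[original], EMBEDDINGS[replacement])


small_shift = semantic_change("cat", "feline")
large_shift = semantic_change("cat", "bulldozer")
```

With these toy vectors, `small_shift` comes out much smaller than `large_shift`, matching intuition. The same recipe carries over to a virus: embed the original sequence and the mutant, then measure how far apart they land.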

Using influenza, HIV, and SARS-CoV-2, the team set out to find genetic mutations that allow viral escape: ones that preserve the virus’s “grammar,” but alter its “semantics.” Scoring each region with their algorithm, the team uncovered several targeted protein spots—and their genetic blueprint—that massively raised the chance of viral escape. Remember: the algorithm had never previously encountered any data remotely related to the biology of a virus. But based solely on the “language” of the virus, it replicated previous lab results of sequences that led to influenza escape.
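The "preserve the grammar, alter the semantics" search can be sketched as a ranking problem. The Python below is a minimal illustration under loud assumptions: the four mutants and their scores are invented, and adding the two ranks together is just one simple way to combine the scores, not necessarily the formula the team used.

```python
# Hypothetical scores for four single-letter mutants (all numbers made up).
MUTANTS = {
    "silent_tweak": {"grammaticality": 0.90, "semantic_change": 0.05},
    "broken":       {"grammaticality": 0.02, "semantic_change": 0.80},
    "escape_like":  {"grammaticality": 0.70, "semantic_change": 0.60},
    "dud":          {"grammaticality": 0.10, "semantic_change": 0.10},
}


def ranks(values):
    """Map each name to its rank, where 1 is the lowest value."""
    ordered = sorted(values, key=values.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


def escape_priority(mutants):
    """Order mutants so those high on BOTH scores come first."""
    grammar_rank = ranks({n: s["grammaticality"] for n, s in mutants.items()})
    change_rank = ranks({n: s["semantic_change"] for n, s in mutants.items()})
    return sorted(
        mutants,
        key=lambda n: grammar_rank[n] + change_rank[n],
        reverse=True,
    )


priority = escape_priority(MUTANTS)
```

Here the mutant that scores well on both counts rises to the top of `priority`. A mutant that changes meaning but breaks the grammar, or keeps the grammar but changes nothing, lands lower, which is the intuition behind hunting for escape mutations.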

It’s not often that unrelated branches of science give each other a push. And Hie’s not about to stop. Further tapping into the language analogy, it’s possible that some people comprehend the same sentence differently based on their history, culture, and experience. Similarly, our immune systems aren’t all the same—each has its own plethora of molecules, antibodies, and immune cells, and overall “strength.”

“It will be interesting to see whether the proposed approach can be adapted to provide a ‘personalized’ view of the language of virus evolution,” said Kim and Przytycka.


Shelly Xuelai Fan is a neuroscientist-turned-science writer. She completed her PhD in neuroscience at the University of British Columbia, where she developed novel treatments for neurodegeneration. While studying biological brains, she became fascinated with AI and all things biotech. Following graduation, she moved to UCSF to study blood-based factors that rejuvenate aged brains. She is the ...
