After the COVID-19 pandemic, everybody is keen on predicting and avoiding the next big viral threat. New research at the University of Glasgow in the UK is harnessing the power of AI towards that goal.
Machine learning, an approach to data analysis whose goal is to teach machines how to automate certain tasks, could help predict the next zoonosis: a disease caused by a pathogen, such as a virus, that jumps from an animal species to humans. Such pathogens are the most significant drivers of epidemics and pandemics and have been throughout human history. COVID-19 was, very likely, also a zoonosis, caused by a coronavirus that jumped to humans from bats.
Manually sifting through all known animal viruses in an attempt to predict zoonosis is a monumental task. There are an estimated 1.67 million animal viruses out there, and although only a small fraction should be able to infect humans, the sheer volume of work makes the task impractical, especially as such predictions require specialized skills and laboratories.
This is where, a new study hopes, machines will come to the rescue.
Let the computer crunch it
“Our findings show that the zoonotic potential of viruses can be inferred to a surprisingly large extent from their genome sequence,” the study reads. “By highlighting viruses with the greatest potential to become zoonotic, genome-based ranking allows further ecological and virological characterization to be targeted more effectively.”
Predicting that a virus is likely to become a threat is not the same thing as actually preventing it from doing so, but it does go a long, long way in helping us prepare. That preparation would, in turn, lead to many lives saved, and much suffering avoided. It would also allow us to better monitor the behavior of particular threats, and focus preventative efforts more effectively.
To develop this AI, the team used the genetic sequences (full genomes) of roughly 860 virus species belonging to 36 families. The algorithm was trained to look for patterns in these viral genomes alongside species-level records of human infection. Based on these datasets, each virus was assigned a probability of being able to infect human hosts. The algorithm’s estimates were then compared to our best existing models for predicting a virus’ zoonotic potential. The authors used this step both to validate the estimates as far as possible and to analyze patterns in them across viral families.
“Although our primary interest was in zoonotic transmission, we trained models to predict the ability to infect humans in general, reasoning that patterns found in viruses predominantly maintained by human-to-human transmission may contain genomic signals that also apply to zoonotic viruses.”
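To make the idea of scoring a virus from its genome more concrete, here is a minimal sketch in Python. It is not the study's model: the choice of features (dinucleotide frequencies), the logistic scoring function, the weights, and all function names are assumptions made purely for illustration. In a real system the weights would be learned from labeled genomes rather than set by hand.

```python
from itertools import product
from math import exp

BASES = "ACGT"
# The 16 possible dinucleotides (AA, AC, ..., TT), used as toy features.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def dinucleotide_frequencies(genome: str) -> list[float]:
    """Return the relative frequency of each of the 16 dinucleotides."""
    genome = genome.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(genome) - 1):
        pair = genome[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def infection_probability(features, weights, bias=0.0):
    """Logistic score between 0 and 1 from features and hypothetical weights."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))


def rank_by_probability(genomes: dict[str, str], weights, bias=0.0):
    """Sort (name, probability) pairs from highest to lowest score."""
    scored = [
        (name, infection_probability(dinucleotide_frequencies(seq), weights, bias))
        for name, seq in genomes.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sequences and hand-set weights, for demonstration only.
    toy_genomes = {"virus_a": "CGCGATCG", "virus_b": "AATTAATT"}
    toy_weights = [0.0] * len(DINUCLEOTIDES)
    toy_weights[DINUCLEOTIDES.index("CG")] = 2.0
    for name, prob in rank_by_probability(toy_genomes, toy_weights):
        print(f"{name}: {prob:.3f}")
```

The ranking step mirrors the article's point that such scores are mainly useful for prioritizing which viruses deserve closer study, not for delivering a final verdict.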
Overall, the team reports, there are genetic features that seem to predispose viruses to infecting humans. These are largely independent of their taxonomy (evolutionary relationships to other viral species). Based on the AI’s estimations, they then developed machine learning models tailored specifically to look for these features across known viral genomes. We would still have to test any viral strain flagged by such a system in the lab to confirm that it can infect human cells, the authors explain, before major resources are devoted to researching it and how best to counter it.
That said, a virus’ ability to infect human cells is, by itself, only one factor in its overall zoonotic potential. How virulent and infectious it is in humans, how easily it transmits between different hosts, and environmental factors (such as a period of economic downturn or starvation) all have a sizable part to play in the formation of pandemics.
“These findings add a crucial piece to the already surprising amount of information that we can extract from the genetic sequence of viruses using AI techniques,” says study co-author Simon Babayan, from the Institute of Biodiversity, Animal Health and Comparative Medicine at the University of Glasgow.
“A genomic sequence is typically the first, and often only, information we have on newly-discovered viruses, and the more information we can extract from it, the sooner we might identify the virus’ origins and the zoonotic risk it may pose. As more viruses are characterized, the more effective our machine learning models will become at identifying the rare viruses that ought to be closely monitored and prioritized for preemptive vaccine development.”
The paper “Identifying and prioritizing potential human-infecting viruses from their genome sequences” has been published in the journal PLOS Biology.