Literature detail

Identifying and prioritizing potential human-infecting viruses from their genome sequences.

Nardus Mollentze (1,2), Simon A Babayan (2), Daniel G Streicker (1,2)
Affiliations (2 institutions)
  1. Medical Research Council-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.
  2. Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.
PMID 34582436; 2021; PLoS Biol; eng; epublish

Article

Publication summary

Determining which animal viruses may be capable of infecting humans is currently intractable at the time of their discovery, precluding prioritization of high-risk viruses for early investigation and outbreak preparedness. Given the increasing use of genomics in virus discovery and the otherwise sparse knowledge of the biology of newly discovered viruses, we developed machine learning models that identify candidate zoonoses solely using signatures of host range encoded in viral genomes. Within a dataset of 861 viral species with known zoonotic status, our approach outperformed models based on the phylogenetic relatedness of viruses to known human-infecting viruses (area under the receiver operating characteristic curve [AUC] = 0.773), distinguishing high-risk viruses within families that contain a minority of human-infecting species and identifying putatively undetected or so far unrealized zoonoses. Analyses of the underpinnings of model predictions suggested the existence of generalizable features of viral genomes that are independent of virus taxonomic relationships and that may preadapt viruses to infect humans. Our model reduced a second set of 645 animal-associated viruses that were excluded from training to 272 high and 41 very high-risk candidate zoonoses and showed significantly elevated predicted zoonotic risk in viruses from nonhuman primates, but not other mammalian or avian host groups. A second application showed that our models could have identified Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) as a relatively high-risk coronavirus strain and that this prediction required no prior knowledge of zoonotic Severe Acute Respiratory Syndrome (SARS)-related coronaviruses. Genome-based zoonotic risk assessment provides a rapid, low-cost approach to enable evidence-driven virus surveillance and increases the feasibility of downstream biological and ecological characterization of viruses.
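
The summary reports model performance as an area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that metric measures, the function below computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not the authors' code or data and do not reproduce the reported value.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive
    is scored above a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = human-infecting, 0 = not known to infect humans.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]  # made-up predicted probabilities
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of examples, which is acceptable for a dataset of a few hundred species; a rank-based formulation would scale better.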

Animals; COVID-19; Disease Outbreaks; Forecasting; Genome, Viral; Host Specificity; Humans; Machine Learning; Models, Theoretical; Phylogeny; SARS-CoV-2; Viruses; Zoonoses

Structured evidence records

4 total
2 records
Extraction confidence 0.70
Key finding

Analyses of model predictions suggested that generalizable features of viral genomes, independent of virus taxonomic relationships, may preadapt viruses to infect humans.

Virus
Not specified
Host
Not specified
Location
Not specified
Supporting text

Analyses of the underpinnings of model predictions suggested the existence of generalizable features of viral genomes that are independent of virus taxonomic relationships and that may preadapt viruses to infect humans.

Mechanism types
preadaptation; host_range_determination
Extraction confidence 0.70
Key finding

The genome-based models could have identified SARS-CoV-2 as a relatively high-risk coronavirus strain without prior knowledge of zoonotic SARS-related coronaviruses, which is consistent with zoonotic potential being detectable from genomic signatures.

Virus
SARS-CoV-2
Host
Not specified
Location
Not specified
Supporting text

A second application showed that our models could have identified Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) as a relatively high-risk coronavirus strain and that this prediction required no prior knowledge of zoonotic Severe Acute Respiratory Syndrome (SARS)-related coronaviruses.

Mechanism types
genome_signature; host_range_prediction
1 record
Extraction confidence 0.80
Key finding

Machine learning models using only signatures of host range encoded in viral genomes outperformed models based on phylogenetic relatedness to known human-infecting viruses, and could have identified SARS-CoV-2 as a relatively high-risk coronavirus strain.

Virus
Location
Not specified
Supporting text

We developed machine learning models that identify candidate zoonoses solely using signatures of host range encoded in viral genomes... our approach outperformed models based on the phylogenetic relatedness of viruses to known human-infecting viruses... A second application showed that our models could have identified Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) as a relatively high-risk coronavirus strain.

Genes or proteins
whole genome
Analysis methods
genome-based machine learning; phylogenetic analysis
1 record
Extraction confidence 0.75
Key finding

Applied to 645 animal-associated viruses excluded from training, the genome-based model flagged 272 high-risk and 41 very high-risk candidate zoonoses, with significantly elevated predicted zoonotic risk in viruses from nonhuman primates but not in other mammalian or avian host groups.

Virus
Not specified
Host
Location
Not specified
Supporting text

Our model reduced a second set of 645 animal-associated viruses that were excluded from training to 272 high and 41 very high-risk candidate zoonoses and showed significantly elevated predicted zoonotic risk in viruses from nonhuman primates ... Genome-based zoonotic risk assessment provides a rapid, low-cost approach to enable evidence-driven virus surveillance and increases the feasibility of downstream biological and ecological characterization of viruses.

Method
machine learning; genome-based zoonotic risk assessment