Literature detail

On the origin and continuing evolution of SARS-CoV-2.

Xiaolu Tang (1), Changcheng Wu (1), Xiang Li (2), Yuhe Song (2), Xinmin Yao (1), Xinkai Wu (1), Yuange Duan (1), Hong Zhang (1), Yirong Wang (1), Zhaohui Qian (3), Jie Cui (2), Jian Lu (1)
Affiliations 3 institutions
  1. State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China.
  2. CAS Key Laboratory of Molecular Virology & Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai 200031, China.
  3. NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China.
PMID 34676127; 2020; Natl Sci Rev; language: eng; publication status: ppublish

Article

Publication summary

The SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China, and has since impacted a large portion of China and raised major global concern. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was 17%, suggesting the divergence between the two viruses is much larger than previously estimated. Our results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs is likely caused by natural selection in addition to recombination. Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses had two major lineages (designated L and S) that are well defined by two different SNPs that show nearly complete linkage across the viral strains sequenced to date. We found that the L lineage was more prevalent than the S lineage within the limited patient samples we examined. The implication of these evolutionary changes on disease etiology remains unclear. These findings strongly underscore the urgent need for further comprehensive studies that combine viral genomic data with epidemiological studies of coronavirus disease 2019 (COVID-19).

molecular evolution population genetics SARS-CoV-2 virus

Structured evidence records

Evidence records

6 total
3 records
Extraction confidence 1.00
Key finding

Comparative genomic analysis showed SARS-CoV-2 differs by 4% overall and 17% at neutral sites from the bat SARSr-CoV RaTG13, indicating substantial evolutionary divergence.

Virus
Host
Location
Not specified
Supporting text

We investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was 17%, suggesting the divergence between the two viruses is much larger than previously estimated.

Analysis methods
comparative genomics
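
The contrast in the record above between 4% genome-wide and 17% at neutral sites can be illustrated with a minimal sketch. It compares the proportion of differing sites over a whole alignment with the proportion at third codon positions only. Third codon positions are used here as a crude stand-in for neutral sites; this is an assumption for illustration, not the estimator used by the authors. The two sequences and the names `p_distance` and `third_codon_positions` are hypothetical.

```python
def p_distance(seq_a, seq_b, positions=None):
    """Fraction of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if positions is None:
        positions = range(len(seq_a))
    compared = 0
    diffs = 0
    for i in positions:
        a, b = seq_a[i], seq_b[i]
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def third_codon_positions(length):
    """0-based indices of third codon positions in an in-frame coding alignment."""
    return range(2, length, 3)


# Hypothetical toy coding alignment (six codons), not real viral sequence.
seq_x = "ATGGCTAAAGGTCTGACC"
seq_y = "ATGGCCAAAGGACTGACT"

overall = p_distance(seq_x, seq_y)
at_third = p_distance(seq_x, seq_y, third_codon_positions(len(seq_x)))
print(f"all sites: {overall:.3f}, third codon positions: {at_third:.3f}")
```

In this toy alignment every difference falls at a third codon position, so the divergence measured there is higher than the genome-wide figure. That is the same qualitative pattern the record reports, where constrained sites dilute the overall percentage.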
Extraction confidence 1.00
Key finding

Sequence variation analysis indicates that functional site mutations in the spike RBD of SARS-CoV-2 and pangolin SARSr-CoVs likely arose through natural selection in addition to recombination.

Virus
Location
Not specified
Supporting text

The development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs are likely caused by natural selection besides recombination.

Genes or proteins
spike; receptor-binding domain
Analysis methods
comparative genomics; molecular evolution analysis
Extraction confidence 1.00
Key finding

Population genetic analysis of 103 SARS-CoV-2 genomes identified two major lineages, L and S, distinguished by two SNPs that show nearly complete linkage.

Virus
Location
Not specified
Supporting text

Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses had two major lineages (designated L and S), that are well defined by two different SNPs that show nearly complete linkage across the viral strains sequenced to date.

Analysis methods
population genetic analysis; phylogenetic analysis
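
"Nearly complete linkage" between two SNPs, as described in the record above, is usually quantified with linkage disequilibrium statistics. The sketch below computes D, r² and D' for two biallelic sites from a list of haplotypes. The function name `linkage_stats` and the ten toy haplotypes are hypothetical and are not taken from the 103 genomes analysed in the paper.

```python
def linkage_stats(haplotypes):
    """Linkage disequilibrium between two biallelic sites.

    haplotypes: iterable of (allele_at_site_1, allele_at_site_2) pairs,
    one pair per genome. Returns a dict with D, r2 and D_prime.
    """
    haps = [tuple(h) for h in haplotypes]
    n = len(haps)
    if n == 0:
        raise ValueError("no haplotypes given")
    alleles_1 = sorted({h[0] for h in haps})
    alleles_2 = sorted({h[1] for h in haps})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both sites must be biallelic in the sample")

    # Pick one reference allele per site; r2 and D' do not depend on the choice.
    a, b = alleles_1[0], alleles_2[0]
    p_a = sum(h[0] == a for h in haps) / n
    p_b = sum(h[1] == b for h in haps) / n
    p_ab = sum(h == (a, b) for h in haps) / n

    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by the largest value possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return {"D": d, "r2": r2, "D_prime": abs(d) / d_max}


# Hypothetical sample: two common haplotypes plus one rare third haplotype.
toy = [("C", "T")] * 6 + [("T", "C")] * 3 + [("C", "C")]
stats = linkage_stats(toy)
print({k: round(v, 3) for k, v in stats.items()})
```

With only the two common haplotypes present, r² equals 1 (complete linkage). Adding a single genome that carries a third haplotype lowers r² while D' stays at 1, which is one way a sample can show linkage that is nearly but not fully complete.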
2 records
Extraction confidence 0.90
Key finding

New functional site variations in the spike receptor-binding domain of SARS-CoV-2 and pangolin SARS-related coronaviruses likely arose through natural selection, indicating molecular adaptation related to host receptor binding.

Virus
Host
Not specified
Location
Not specified
Supporting text

Our results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs are likely caused by natural selection besides recombination.

Genes or proteins
spike; receptor-binding domain
Mechanism types
receptor_binding; host_adaptation; natural_selection
Extraction confidence 0.85
Key finding

Two major SARS-CoV-2 lineages (L and S) are defined by two SNPs in nearly complete linkage, representing ongoing molecular diversification within the virus.

Virus
Host
Not specified
Location
Not specified
Supporting text

Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses had two major lineages (designated L and S), that are well defined by two different SNPs that show nearly complete linkage across the viral strains sequenced to date.

Mutations
two lineage-defining SNPs
Mechanism types
genomic_divergence; lineage_differentiation
1 record
Extraction confidence 0.70
Key finding

Recombination likely contributed to variation in the spike receptor-binding domain of SARS-CoV-2 and pangolin SARS-related coronaviruses, suggesting a role in the virus’s evolutionary origin.

Host
Not specified
Location
Not specified
Supporting text

Our results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs are likely caused by natural selection besides recombination.

Event type
recombination
Genes or segments
spike; receptor-binding domain