Volume 26, Number 7—July 2020
Possible Bat Origin of Severe Acute Respiratory Syndrome Coronavirus 2
We showed that severe acute respiratory syndrome coronavirus 2 is probably a novel recombinant virus. Its genome is closest to that of severe acute respiratory syndrome–related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. Its origin and direct ancestral viruses have not been identified.
Seventeen years after the severe acute respiratory syndrome (SARS) epidemic, an outbreak of pneumonia, now called coronavirus disease (COVID-19), was reported in Wuhan, China. Some of the early case-patients had a history of visiting the Huanan Seafood Wholesale Market, where wild mammals are sold, suggesting a zoonotic origin. The causative agent was rapidly isolated from patients and identified as a coronavirus, now designated as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the International Committee on Taxonomy of Viruses (1). SARS-CoV-2 has spread rapidly to other places; 113,702 cases and 4,012 deaths had been reported in 110 countries/areas as of March 10, 2020 (2). In Hong Kong, 130 cases and 3 deaths had been reported.
SARS-CoV-2 is a member of subgenus Sarbecovirus (previously lineage b) in the family Coronaviridae, genus Betacoronavirus, and is closely related to SARS-CoV, which caused the SARS epidemic during 2003, and to SARS-related-CoVs (SARSr-CoVs) in horseshoe bats discovered in Hong Kong and mainland China (3–5). Whereas SARS-CoV and Middle East respiratory syndrome coronavirus were rapidly traced to their immediate animal sources (civets and dromedaries, respectively), the origin of SARS-CoV-2 remains obscure.
SARS-CoV-2 showed high genome sequence identities (87.6%–87.8%) to SARSr-Rp-BatCoV-ZXC21/ZC45, detected in Rhinolophus pusillus bats from Zhoushan, China, during 2015 (6). A more closely related strain, SARSr-Ra-BatCoV-RaTG13 (96.1% genome identity with SARS-CoV-2), was recently reported in Rhinolophus affinis bats captured in Pu’er, China, during 2013 (7). Subsequently, Pangolin-SARSr-CoV/P4L/Guangxi/2017 (85.3% genome identity with SARS-CoV-2) and related viruses were also detected in smuggled pangolins captured in Nanning, China, during 2017 (8) and Guangzhou, China, during 2019 (9). To elucidate the evolutionary origin and pathway of SARS-CoV-2, we performed an in-depth genomic, phylogenetic, and recombination analysis in relation to SARSr-CoVs from humans, civets, bats, and pangolins (10).
We downloaded 4 SARS-CoV-2, 16 human/civet-SARSr-CoV, 63 bat-SARSr-CoV and 2 pangolin-SARSr-CoV genomes from GenBank and GISAID (https://www.gisaid.org). We also sequenced the complete genome of SARS-CoV-2 strain HK20 (GenBank accession no. MT186683) from a patient with COVID-19 in Hong Kong. We performed genome, phylogenetic, and recombination analysis as described (11).
The 5 SARS-CoV-2 genomes had overall 99.8%–100% nt identities with each other. These genomes showed 96.1% genome identities with SARSr-Ra-BatCoV-RaTG13, 87.8% with SARSr-Rp-BatCoV-ZC45, 87.6% with SARSr-Rp-BatCoV-ZXC21, 85.3% with pangolin-SARSr-CoV/P4L/Guangxi/2017, and 73.8%–78.6% with other SARSr-CoVs, including human/civet-SARSr-CoVs (Table 1).
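The genome identity figures above are pairwise comparisons over aligned sequences. The following is a minimal sketch of how such a percentage can be computed, not the authors' pipeline: the function name `percent_identity`, the choice to skip columns containing a gap, and the toy fragments are all assumptions made for illustration (the text does not state how gaps were handled).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from both numerator and
    denominator. Other conventions exist and give slightly different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real viral sequences).
query = "ATGGCTAGCTAGGATC"
reference = "ATGGCTAGTTAG-ATC"
print(f"{percent_identity(query, reference):.1f}% identity")
```

In practice the two genomes would first be aligned with a dedicated alignment tool; the sketch only covers the counting step that follows.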
Most predicted proteins of SARS-CoV-2 showed high amino acid sequence identities with those of SARSr-Ra-BatCoV RaTG13, except the receptor-binding domain (RBD) region. SARS-CoV-2 possessed an intact open reading frame 8 without the 29-nt deletion found in most human SARS-CoVs. The concatenated conserved replicase domains for coronavirus species demarcation by the International Committee on Taxonomy of Viruses showed >92.9% aa identities (threshold >90% for same species) between SARS-CoV-2 and other SARSr-CoVs, supporting their classification under the same coronavirus species (Table 2) (1).
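The species demarcation rule described above (amino acid identity above 90% across the concatenated conserved replicase domains) can be sketched as follows. This is an illustration under stated assumptions: the helper names, the gap handling, and the short peptide fragments are hypothetical, and only the 90% threshold comes from the text.

```python
from typing import List

SPECIES_THRESHOLD = 90.0  # percent aa identity; value taken from the text


def concatenated_identity(domains_a: List[str], domains_b: List[str]) -> float:
    """Percent aa identity over pre-aligned domains joined end to end.

    Assumption: each pair of domains is already aligned to equal length,
    and columns containing a gap ('-') are skipped.
    """
    if len(domains_a) != len(domains_b):
        raise ValueError("both viruses need the same number of domains")
    for da, db in zip(domains_a, domains_b):
        if len(da) != len(db):
            raise ValueError("each domain pair must be aligned to equal length")
    a = "".join(domains_a)
    b = "".join(domains_b)
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def same_species(identity: float) -> bool:
    """Apply the >90% threshold quoted in the text."""
    return identity > SPECIES_THRESHOLD


# Hypothetical peptide fragments standing in for replicase domains.
virus_a = ["MKVLAAGITR", "DEQLNPSWTY"]
virus_b = ["MKVLAAGITR", "DEQLNPSWTF"]  # one substitution relative to virus_a
virus_c = ["MKVIGGGITR", "DEHMNPAWTF"]  # more divergent

for name, other in (("virus_b", virus_b), ("virus_c", virus_c)):
    ident = concatenated_identity(virus_a, other)
    print(name, f"{ident:.1f}%", "same species" if same_species(ident) else "different species")
```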
Unlike other members of the subgenus Sarbecovirus, SARS-CoV-2 has a spike protein that contains a unique insertion that results in a potential cleavage site at the S1/S2 junction, which might enable proteolytic processing that enhances cell–cell fusion. SARS-CoV-2 was demonstrated to use the same receptor, human angiotensin-converting enzyme 2 (hACE2), as does SARS-CoV (7). The predicted RBD region of SARS-CoV-2 spike protein, corresponding to aa residues 318–513 of SARS-CoV (12), showed the highest (97% aa) identities with pangolin-SARSr-CoV/MP789/Guangdong and 74.1%–77.7% identities with human/civet/bat-SARSr-CoVs known to use hACE2 (Table 1). Moreover, similar to the human/civet/bat-SARSr-CoV hACE2-using viruses, the 2 deletions (5 aa and 12 aa) found in all other SARSr-BatCoVs (10) were absent in SARS-CoV-2 RBD (Appendix Figure 1). Of the 5 critical residues needed for RBD-hACE2 interaction in SARSr-CoVs (13), 3 (F472, N487, and Y491) were present in SARS-CoV-2 RBD and pangolin SARSr-CoV/MP789/2019-RBD.
Phylogenetic analysis showed that the RNA-dependent RNA polymerase gene of SARS-CoV-2 is most closely related to that of SARSr-Ra-BatCoV RaTG13, whereas its predicted RBD is closest to that of pangolin-SARSr-CoVs (Figure 1). This finding suggests a distinct evolutionary origin for SARS-CoV-2 RBD, possibly as a result of recombination. Moreover, the SARS-CoV-2 RBD was also closely related to SARSr-Ra-BatCoV RaTG13 and the hACE2-using cluster containing human/civet-SARSr-CoVs and Yunnan SARSr-BatCoVs previously successfully cultured in VeroE6 cells (4,5).
To identify putative recombination events, we performed sliding window analysis using SARS-CoV-2-HK20 as query and SARSr-Ra-BatCoV RaTG13, pangolin-SARSr-CoV/P4L/Guangxi/2017, SARSr-Rp-BatCoV ZC45, SARSr-Rs-BatCoV Rs3367, and SARSr-Rs-BatCoV Longquan-140 as potential parents (Figure 2; Appendix Figure 2). A similarity plot showed that SARS-CoV-2 is most closely related to SARSr-Ra-BatCoV RaTG13 in the entire genome, except for its RBD, which is closest to pangolin-SARSr-CoV/MP789/Guangdong, and shows potential recombination breakpoints. Moreover, different regions of SARS-CoV-2 genome showed different similarities to pangolin-SARSr-CoV/P4L/Guangxi/2017, SARSr-Rp-BatCoV ZC45, SARSr-Rs-BatCoV Rs3367, and SARSr-Rs-BatCoV Longquan-140, as supported by phylogenetic analysis (Appendix Figures 2, 3).
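The idea behind a sliding window similarity analysis can be sketched in a few lines. This is a simplified illustration, not the software or parameters used in the study: the function names, the window and step sizes, and the toy sequences are assumptions. For each window along a multiple alignment, the query is scored against every candidate parent; a change in which parent scores highest hints at a possible recombination breakpoint.

```python
from typing import Dict, List, Tuple


def window_identity(a: str, b: str) -> float:
    """Percent identity over ungapped columns of two aligned windows."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0  # assumption: an all-gap window scores zero
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)


def similarity_scan(
    query: str, parents: Dict[str, str], window: int, step: int
) -> List[Tuple[int, Dict[str, float]]]:
    """Score the query against each parent in successive windows.

    All sequences must come from one multiple alignment (equal length).
    Returns a list of (window_start, {parent_name: percent_identity}).
    """
    for name, seq in parents.items():
        if len(seq) != len(query):
            raise ValueError(f"{name} is not aligned to the query length")
    scan = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {
            name: window_identity(query[start:end], seq[start:end])
            for name, seq in parents.items()
        }
        scan.append((start, scores))
    return scan


def closest_parent(scan: List[Tuple[int, Dict[str, float]]]) -> List[Tuple[int, str]]:
    """Name the best-scoring parent per window (ties go to the first listed)."""
    return [(start, max(scores, key=scores.get)) for start, scores in scan]


# Toy alignment (hypothetical): the middle block of the query resembles
# parent_y, while the flanking blocks resemble parent_x.
query = "A" * 10 + "C" * 10 + "A" * 10
parents = {
    "parent_x": "A" * 30,
    "parent_y": "G" * 10 + "C" * 10 + "G" * 10,
}
for start, name in closest_parent(similarity_scan(query, parents, window=10, step=10)):
    print(f"window starting at {start}: closest to {name}")
```

Real analyses use overlapping windows over full genomes, often with distance corrections and bootstrap support, so a switch in the closest parent is a cue for further testing rather than proof of recombination.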
Sequence alignment around the RBD supported potential recombination between SARSr-Ra-BatCoV RaTG13 and pangolin-SARSr-CoV/MP789/Guangdong/2019, with the receptor-binding motif region showing exceptionally high sequence similarity to that of pangolin-SARSr-CoV/MP789/Guangdong/2019. This finding suggested that SARS-CoV-2 might be a recombinant virus between viruses closely related to SARSr-Ra-BatCoV RaTG13 and pangolin-SARSr-CoV/MP789/Guangdong/2019.
Despite the close relatedness of SARS-CoV-2 to bat and pangolin viruses, none of the existing SARSr-CoVs represents its immediate ancestor. Most of the genome region of SARS-CoV-2 is closest to SARSr-Ra-BatCoV-RaTG13 from an intermediate horseshoe bat in Yunnan, whereas its RBD is closest to that of pangolin-SARSr-CoV/MP789/Guangdong/2019 from smuggled pangolins in Guangzhou. Potential recombination sites were identified around the RBD region, suggesting that SARS-CoV-2 might be a recombinant virus, with its genome backbone evolved from Yunnan bat virus–like SARSr-CoVs and its RBD region acquired from pangolin virus–like SARSr-CoVs.
Because bats are the major reservoir of SARSr-CoVs and the pangolins harboring SARSr-CoVs were captured from the smuggling center, it is possible that pangolin SARSr-CoVs originated from bat viruses as a result of animal mixing, and there might be an unidentified bat virus containing an RBD nearly identical to that of SARS-CoV-2 and pangolin SARSr-CoV. Similar to SARS-CoV, SARS-CoV-2 is most likely a recombinant virus that originated from bats.
The ability of SARS-CoV-2 to emerge and infect humans is likely explained by its hACE2-using RBD region, which is genetically similar to that of culturable Yunnan SARSr-BatCoVs and human/civet-SARSr-CoVs. Most SARSr-BatCoVs have not been successfully cultured in vitro, except for some Yunnan strains that had human/civet SARS-like RBDs and were shown to use hACE2 (4,5). For example, SARSr-Rp-BatCoV ZC45, which has an RBD that is more divergent from that of human/civet-SARSr-CoVs, did not propagate in VeroE6 cells (6). Factors that determine hACE2 use among SARSr-CoVs remain to be elucidated.
Although the Wuhan market was initially suspected to be the epicenter of the epidemic, the immediate source remains elusive. The close relatedness among SARS-CoV-2 strains suggested that the Wuhan outbreak probably originated from a point source with subsequent human-to-human transmission, in contrast to the polyphyletic origin of Middle East respiratory syndrome coronavirus (14). If the Wuhan market was the source, a possibility is that bats carrying the parental SARSr-BatCoVs were mixed in the market, enabling virus recombination. However, no animal samples from the market were reported to be positive. Moreover, the first identified case-patient and other early case-patients had not visited the market (15), suggesting the possibility of an alternative source.
Because the RBD is considered a hot spot for construction of recombinant CoVs for receptor and viral replication studies, the evolutionarily distinct SARS-CoV-2 RBD and the unique insertion of S1/S2 cleavage site among Sarbecovirus species have raised the suspicion of an artificial recombinant virus. However, there is currently no evidence showing that SARS-CoV-2 is an artificial recombinant, which theoretically might not carry signature sequences. Further surveillance studies in bats are needed to identify the possible source and evolutionary path of SARS-CoV-2.
Dr. Lau is a professor and head of the Department of Microbiology at The University of Hong Kong, Hong Kong, China. Her primary research interest is using microbial genomics for studying emerging infectious diseases, including coronaviruses.
This study was partly supported by the theme-based research scheme (project no. T11-707/15-R) of the University Grants Committee; Health and Medical Research Fund of the Food and Health Bureau of HKSAR; Consultancy Service for Enhancing Laboratory Surveillance of Emerging Infectious Disease for the HKSAR Department of Health; and the University Development Fund of the University of Hong Kong.
1. Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol. 2020;5:536–44.
2. World Health Organization. Coronavirus disease 2019 (COVID-19) situation report 50, March 10, 2020 [cited 2020 Apr 11]. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200310-sitrep-50-covid-19.pdf
3. Lau SK, Woo PC, Li KS, Huang Y, Tsoi HW, Wong BH, et al. Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats. Proc Natl Acad Sci U S A. 2005;102:14040–5.
4. Ge XY, Li JL, Yang XL, Chmura AA, Zhu G, Epstein JH, et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature. 2013;503:535–8.
5. Hu B, Zeng LP, Yang XL, Ge XY, Zhang W, Li B, et al. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathog. 2017;13:
6. Hu D, Zhu C, Ai L, He T, Wang Y, Ye F, et al. Genomic characterization and infectivity of a novel SARS-like coronavirus in Chinese bats. Emerg Microbes Infect. 2018;7:154.
7. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270–3.
8. Liu P, Chen W, Chen JP. Viral metagenomics revealed sendai virus and coronavirus infection of Malayan pangolins (Manis javanica). Viruses. 2019;11:
9. Lam TT, Shum MH, Zhu HC, Tong YG, Ni XB, Liao YS, et al. Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins. Nature. 2020; Epub ahead of print.
10. Luk HKH, Li X, Fung J, Lau SKP, Woo PCY. Molecular epidemiology, evolution and phylogeny of SARS coronavirus. Infect Genet Evol. 2019;71:21–30.
11. Lau SKP, Li KS, Huang Y, Shek CT, Tse H, Wang M, et al. Ecoepidemiology and complete genome comparison of different strains of severe acute respiratory syndrome-related Rhinolophus bat coronavirus in China reveal bats as a reservoir for acute, self-limiting infection that allows recombination events. J Virol. 2010;84:2808–19.
12. Wong SK, Li W, Moore MJ, Choe H, Farzan M. A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2. J Biol Chem. 2004;279:3197–201.
13. Li W, Zhang C, Sui J, Kuhn JH, Moore MJ, Luo S, et al. Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2. EMBO J. 2005;24:1634–43.
14. Lau SK, Wernery R, Wong EY, Joseph S, Tsang AK, Patteril NA, et al. Polyphyletic origin of MERS coronaviruses and isolation of a novel clade A strain from dromedary camels in the United Arab Emirates. Emerg Microbes Infect. 2016;5:
15. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506.
Original Publication Date: April 21, 2020
1These authors contributed equally to this article.