Volume 10, Number 5—May 2004


Genetic Variation of SARS Coronavirus in Beijing Hospital

Dongping Xu*, Zheng Zhang*, Fuliang Chu*, Yonggang Li*, Lei Jin*, Lingxia Zhang*, George F. Gao†, and Fu-Sheng Wang*
Author affiliations: *Beijing 302 Hospital, Beijing, China; †University of Oxford, Headington, Oxford, United Kingdom


To characterize genetic variation of severe acute respiratory syndrome–associated coronavirus (SARS-CoV) transmitted in the Beijing area during the epidemic outbreak of 2003, we sequenced 29 full-length S genes of SARS-CoV from 20 hospitalized SARS patients on our unit, the Beijing 302 Hospital. Viral RNA templates for the S-gene amplification were directly extracted from raw clinical samples, including plasma, throat swab, sputum, and stool, during the course of the epidemic in the Beijing area. We used a TA-cloning assay together with direct sequencing of nested reverse transcription–polymerase chain reaction products. One hundred thirteen sequence variations at nine recurrent variant sites were identified in the analyzed S-gene sequences compared with the BJ01 strain of SARS-CoV. Eight of these variant sites are, to our knowledge, documented here for the first time. Our findings demonstrate the coexistence of S-gene sequences with and without substitutions (relative to BJ01) in samples analyzed from some patients.

A novel severe acute respiratory syndrome–associated coronavirus (SARS-CoV) has been implicated as the causative agent of a worldwide outbreak of SARS during the first 6 months of 2003 (1–3). From March 4 to June 18, Beijing had 2,521 cases and 191 deaths from SARS (4). Because of the poor fidelity of RNA-dependent RNA polymerase, genetic variation typically forms a heterogeneous virus pool in RNA virus populations, including coronaviruses such as mouse hepatitis virus (MHV) (5,6). This feature makes viruses highly adaptable and contributes to difficulties in preventing and controlling viral disease. SARS-CoV, a single-stranded RNA virus, has been reported to show relatively little variability in analyses of a limited number of viral isolates (7–10). Furthermore, no SARS-CoV quasispecies have been documented, as they have been in many other RNA viruses, including hepatitis C virus (HCV) (11), HIV (12), and MHV (6).

During the SARS outbreak in Beijing, 132 SARS patients were hospitalized and treated on our unit at Beijing Hospital, including the first cluster of case-patients in the area (13). To characterize genetic variation among SARS-CoV transmitted in the Beijing area, we sequenced 29 full-length S genes of SARS-CoV from 20 hospitalized SARS patients, since S glycoprotein plays a key role in virus-host interaction and is predicted to be the main target of immune response (14). Samples that were analyzed represented the timespan of the epidemic. To exclude culture-derived artifacts and estimate mutational heterogeneity, viral RNA was directly extracted from raw clinical samples, and a TA-cloning assay was used with direct analysis of reverse transcriptase–polymerase chain reaction (RT-PCR) products. We compared these sequences with all previously documented S-gene sequences of SARS-CoV.

Materials and Methods

Patients and Samples

Figure 1. Diagram showing amplification of six overlapping fragments covering full-length spike gene sequence of severe acute respiratory syndrome–associated coronavirus by nested reverse transcriptase–polymerase chain reaction.

All patients in the study were hospitalized on our unit with a confirmed diagnosis of SARS. Samples from patients included plasma, throat swab, sputum, and stool; these were stored at –70°C for extraction of viral RNA. A total of 64 RNA samples from 28 SARS-CoV–positive patients (detected by using BNI primers recommended by the World Health Organization [15]) were initially used in S-gene amplification, but only those that generated all six overlapping fragments covering the full-length S-gene sequence (see Nested RT-PCR below and Figure 1) were included in the sequence analysis. As a result, 29 RNA samples from 20 patients were included in the study (Table 1). All patients had received ribavirin and steroid combination therapy.

RNA Extraction

RNA extraction was performed in a biosafety level 3 (P3) laboratory. RNA was extracted directly from plasma samples. Sputum samples were shaken for 30 min with an equal volume of 1.0% acetylcysteine and 0.9% sodium chloride, and the supernatant was then isolated by centrifugation (10,000 × g for 3 min). Throat swab and stool samples were suspended in phosphate-buffered saline (PBS) containing 10 U/mL RNasin (Promega, Madison, WI) and shaken for 10 min, and the supernatant was isolated by centrifugation as described above. RNA was extracted according to the manufacturer’s instructions by using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany).

Nested RT-PCR

Screening RNA for SARS-CoV was based on the method by Drosten et al. (1). For the S-gene amplification, 18 pairs of primers were designed by using MacVector computer software (Accelrys Inc, San Diego, CA) based on the BJ01 strain of SARS-CoV (GenBank accession no. AY278488) (16). Among them, six pairs (sense/antisense: S1aF/S1aB, S2aF/S2aB, S3aF/S3aB, S4aF/S4aB, S5aF/S5aB, S6aF/S6aB) were used as outer primers, six pairs (sense/antisense: S1bF/S1bB, S2bF/S2bB, S3bF/S3bB, S4bF/S4bB, S5bF/S5bB, S6bF/S6bB) were used as inner primers, and six pairs (sense/antisense: S1cF/S1cB, S2cF/S2cB, S3cF/S3cB, S4cF/S4cB, S5cF/S5cB, S6cF/S6cB) were designed for direct RT-PCR product sequencing. The sequences covering the full-length S gene were amplified separately as six overlapping fragments (F1b, F2b, F3b, F4b, F5b, and F6b) (Figure 1). The one-step RT-PCR Kit (Qiagen) was used for reverse transcription and the first round of PCR amplification with outer primers. Thermal cycling consisted of 50°C for 30 min; 95°C for 15 min; 10 cycles of 95°C for 30 s, 57.5°C for 30 s (decreasing by 1.5°C every other cycle), 72°C for 1 min; 40 cycles of 95°C for 30 s, 54°C for 30 s, 72°C for 1 min. Afterwards, 2 μL of the product was used as a template for the second round of PCR amplification in a 100-μL volume with inner primers and Taq DNA polymerase (MBI Fermentas, Hanover, MD). Thermal cycling consisted of 30 cycles of 95°C for 25 s, 54°C for 25 s, 72°C for 50 s. In some cases, SuperScript III RNase H– Reverse Transcriptase (Invitrogen, Carlsbad, CA) was used for reverse transcription, according to the manufacturer’s instructions. The next two rounds of PCR amplification were then performed by using Platinum Pfx DNA Polymerase (Invitrogen), which has a higher fidelity. The reaction conditions were as above, except that the elongation time was doubled and elongation was performed at 68°C instead of 72°C. All reactions were carefully carried out to avoid contamination.
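The touchdown phase of the first-round program can be written out explicitly. The sketch below is only one reading of "decreasing by 1.5°C every other cycle", namely that each annealing temperature is held for two consecutive cycles before dropping; the function name and its parameters are hypothetical and are not part of the published protocol.

```python
def touchdown_annealing(start_c=57.5, step_c=1.5, cycles=10, cycles_per_step=2):
    """Return the annealing temperature (in °C) for each touchdown cycle.

    Assumption: the temperature is held for `cycles_per_step` cycles,
    then lowered by `step_c`. This is an interpretation of the text,
    not a confirmed instrument program.
    """
    return [start_c - step_c * (i // cycles_per_step) for i in range(cycles)]


if __name__ == "__main__":
    for cycle, temp in enumerate(touchdown_annealing(), start=1):
        print(f"cycle {cycle:2d}: anneal at {temp:.1f} °C")
```

Under this reading, the 10 touchdown cycles would precede the 40 fixed cycles at 54°C described above.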


RT-PCR products were purified by QIAquick PCR Purification Kit (Qiagen) or QIAquick Gel Extraction Kit (Qiagen), with a final elution volume of 30 μL. The ligation and transformation were performed according to the manufacturer’s instructions by using pGEM-T Vector System II (Promega). Transformants were selected on LB-agar plates containing 100 μg of ampicillin, 100 μg of 5-bromo-4-chloro-3-indolyl β-D-galactopyranoside (X-gal), and 200 μg of isopropylthiogalactoside. Escherichia coli from white colonies was added to 5 mL of LB culture and grown overnight at 37°C with vigorous shaking. Plasmid was purified by QIAprep Spin Miniprep Kit (Qiagen). The recombinant plasmids selected for sequence analysis were screened by electrophoresis in 1% agarose containing 0.5 μg/mL of ethidium bromide.

Sequencing and DNA Analysis

For each S-gene fragment, four to six clones were screened. To verify variations, 5–50 additional clones generated from independently prepared, RNA-derived RT-PCR products were sequenced in two to four independent experiments. The cloned plasmids were prepared from different RT-PCR products and were directly sequenced for confirmation. DNA sequences were obtained with the use of an automated ABI 377 sequencer (Applied Biosystems Inc., Foster City, CA). For cloned plasmids, SP6 and T7 primers were used for two-directional sequencing reactions. For PCR products, specific primers (sense: S1cF–S6cF; antisense: S1cB–S6cB) were used for two-directional sequencing reactions. Analysis and comparison of nucleotide and amino acid sequences were carried out with the DNASTAR computer software (DNASTAR Inc., Madison, WI). The S gene sequence of BJ01 strain was taken as the reference for variation analysis.
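The comparison of each clone with the reference can be illustrated with a minimal sketch. It assumes that the clone sequence is already aligned to the reference and contains no insertions or deletions; the function name `find_substitutions`, the `gene_start` offset, and the toy sequences are hypothetical and are not taken from the study, in which DNASTAR was used.

```python
from typing import List, Tuple


def find_substitutions(reference: str, clone: str,
                       gene_start: int = 1) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, clone_base) for every mismatch.

    Both sequences must be aligned and of equal length (no indels).
    `gene_start` is the 1-based genome coordinate of the first base,
    so reported positions follow genome numbering.
    """
    if len(reference) != len(clone):
        raise ValueError("sequences must be aligned to equal length")
    substitutions = []
    pairs = zip(reference.upper(), clone.upper())
    for offset, (ref_base, clone_base) in enumerate(pairs):
        if ref_base != clone_base:
            substitutions.append((gene_start + offset, ref_base, clone_base))
    return substitutions


if __name__ == "__main__":
    # Toy sequences for illustration only, not study data.
    reference = "ATGTTTATTTTC"
    clone = "ATGTTCATTTTC"
    print(find_substitutions(reference, clone, gene_start=100))
```

Passing the genome coordinate of the first base keeps the reported positions in the same numbering as the reference strain.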


Results

With the six designed primer pairs, all six overlapping S-gene fragments were amplified by nested RT-PCR from 29 RNA samples. However, most RNA samples initially included in the study, though positive for SARS-CoV with BNI primers, failed to generate all six overlapping S-gene fragments and were excluded from further sequence analysis. Disintegration of the virus and low viral load in the raw samples likely accounted for these failures.

Figure 2. Variants identified from 29 full-length S genes of severe acute respiratory syndrome–associated coronavirus from 20 SARS patients in comparison with BJ01 strain (GenBank accession no. AY278488). The nucleotide positions are numbered according to the sequence of BJ01 strain. Numbers start from the beginning of the genome, but the amino acid numbers start from the S protein. The filled arrows represent nonsynonymous mutations, and the hollow arrows represent synonymous ones...

One hundred and thirteen sequence variations distributed in nine variant sites were identified in analyzed sequences that were compared to the reference BJ01 strain of SARS-CoV. BJ01 is an isolate from a tissue-culture propagated sample (16) and is used as reference strain in other studies (9,10). With the exception of one site (position 21702), other variant sites have not, to our knowledge, been documented in humans. Seven of nine variant sites were nonsynonymous. Figure 2 shows the identified variant sites compared to the reference sequence.
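The distinction between synonymous and nonsynonymous sites follows from translating the affected codon before and after the substitution. The sketch below shows the idea with the standard genetic code; the function name `classify_substitution` and the toy coding sequence are hypothetical, and the sketch assumes that the coding sequence starts at the first base of the start codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(cds: str, index: int, new_base: str):
    """Classify a single-base change in a coding sequence.

    `index` is 0-based within `cds`. Returns a tuple of
    (codon_number, reference_aa, new_aa, kind), where codon_number is
    1-based and kind is "synonymous" or "nonsynonymous".
    """
    codon_start = (index // 3) * 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    pos = index - codon_start
    new_codon = ref_codon[:pos] + new_base.upper() + ref_codon[pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    new_aa = CODON_TABLE[new_codon]
    kind = "synonymous" if ref_aa == new_aa else "nonsynonymous"
    return codon_start // 3 + 1, ref_aa, new_aa, kind


if __name__ == "__main__":
    toy_cds = "ATGTTTATTTTC"  # illustration only, not study data
    print(classify_substitution(toy_cds, 5, "C"))  # third base of codon 2
    print(classify_substitution(toy_cds, 3, "C"))  # first base of codon 2
```

The codon number is counted from the start of the coding sequence, which mirrors the Figure 2 convention of numbering amino acids from the S protein rather than from the genome.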


Discussion

We identified novel variant sites and the coexistence of sequences with and without S-gene substitutions in SARS-CoV. Theoretically, a replicating RNA virus expresses a range of genetic and phenotypic variants and has the potential to generate novel virions, which may be selected in response to environmental pressures. RNA viruses generally tolerate high levels of mutagenesis because of their limited genetic complexity (17). Mutations have the potential to be pathogenic (e.g., by allowing the virus to escape neutralizing antibodies, cytotoxic T cells, or antiviral drugs [18–20]). The dynamics of error copying and sequence decomposition are time-dependent. In HIV infection, for example, one adaptive substitution in the env gene occurred every 3.3 months or 25 viral generations, averaging across patients (21).

In our study, a higher variation frequency in the S gene was identified for SARS-CoV compared to previous reports (7–10). This difference may be due to a broader sample collection covering a longer timespan of infection. In addition, since virus isolates were not passaged in culture, the whole mutant repertoire is more likely to be detected, because no reverse mutation can occur in cell culture. Our observation most likely reflects the real situation in vivo. Variations were unlikely to result from Taq polymerase errors, since we repeated the experiments for all variations starting from independently prepared RNA and RT-PCR products and, in some cases, used high-fidelity Platinum Pfx DNA polymerase to confirm the results. We could not exclude the possibility that some variations were from defective genomes. However, the fact that the variations remained detectable in the sequences from two or three specimens of the same patient, obtained at different times, suggests that these variations might be active and extensible in vivo.

Sequences with and without substitutions (relative to BJ01) were simultaneously detected in the sequences from seven samples, which suggests the existence of SARS-CoV quasispecies. Furthermore, S-gene sequences from different samples collected at different times from the same patient showed similar, but not exactly identical, variation profiles in four participants (patients 4, 5, 6, and 19 in Table 1); this implies that a dynamic mutational process may exist in vivo. Table 2 summarizes the variations occurring in 29 analyzed S-gene sequences from 20 individual SARS patients.
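The notion of coexisting sequences can be made concrete by tallying, for one sample and one variant site, how many clones carry the reference base and how many carry a substitution. The sketch below is only an illustration of that bookkeeping; the function name `site_composition` and the clone calls are hypothetical, and no threshold for calling a mixed population is taken from the study.

```python
from collections import Counter
from typing import Dict, List


def site_composition(clone_bases: List[str], reference_base: str) -> Dict[str, object]:
    """Summarize the bases observed at one site across the clones of one sample.

    A site is flagged as mixed when at least one clone carries the
    reference base and at least one clone carries a different base.
    """
    counts = Counter(base.upper() for base in clone_bases)
    ref = reference_base.upper()
    n_reference = counts.get(ref, 0)
    n_variant = sum(n for base, n in counts.items() if base != ref)
    return {
        "reference_clones": n_reference,
        "variant_clones": n_variant,
        "mixed": n_reference > 0 and n_variant > 0,
        "counts": dict(counts),
    }


if __name__ == "__main__":
    # Hypothetical base calls from five clones at a single site.
    print(site_composition(["A", "A", "G", "A", "G"], reference_base="A"))
```

Because a minor variant can only be seen if enough clones are sequenced, the number of clones per fragment bounds the sensitivity of such a tally.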

One nonsynonymous change, A1023G, lies within the heptad repeat (HR) domains, which are thought to be important for virus entry; a previous study on MHV showed that changes in this region can affect virus infection (22). At this stage, we cannot rule out the possibility that this change affects the biological behavior of the virus, and further experiments are needed to address this question.

We observed the coexistence of S-gene sequences with and without substitutions and a time-dependent variation profile in some patients. These observations suggest the possible existence of SARS-CoV quasispecies in an acute infection. In this study, however, the limited collection of clinical samples and the difficulty of directly amplifying the full-length S gene from raw clinical samples restricted more extensive study of the dynamic mutant distributions of the virus. In addition, the number of clones sequenced was limited by the scale of the project, and some minor variant sequences may therefore have escaped analysis. Another factor possibly affecting the stability of the viral genome is the administration of the antiviral drug ribavirin, which has been reported to enhance mutagenesis of RNA viruses (23). Therefore, the effect of ribavirin on the SARS-CoV mutant spectrum remains to be clarified.

The genetic variation of SARS-CoV remains limited in relation to many other RNA viruses such as HIV-1, HCV, and MHV. The probable reason is that SARS-CoV causes only an acute, self-limited infection, which may prevent the persistent long-term mutant development in vivo that occurs in chronic RNA viral infections. Notably, some modules in the S protein remain conserved, e.g., the HR domains that are important for fusion. Although some variations may predict changes in functional features of the protein, no obvious correlation exists between mutation and clinical disease manifestation in the limited data reported here. Instead, the variation profile was closely correlated with epidemiologic links; e.g., patients 3–8 were infected in one hospital.

In conclusion, we report here some new variant sites in the S gene of coronavirus and possible existence of SARS-CoV quasispecies in some patients, though in limited numbers. This knowledge furthers our understanding of this emerging virus.

Dr. Xu is an associate professor of medicine at the Beijing Institute of Infectious Diseases. His work focuses on cancer gene therapy, medical viral molecular biology, and immunology.


Acknowledgments

We thank K.Y. Yuen for his valuable suggestions for this project and Panyong Mao and Yuanli Mao for providing some of the samples.

The work was supported by Beijing Natural Science Foundation (Number: 7034051), Emergent Foundation for SARS Treatment and Prevention (Number: 03F017), and in part by Sino-UK Collaboration Foundation for SARS Immunopathogenesis Study (Number: H030230100130).


References

  1. Drosten C, Gunther S, Preiser W, van der Werf S, Brodt HR, Becker S, et al. Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N Engl J Med. 2003;348:1967–76.
  2. Ksiazek TG, Erdman D, Goldsmith CS, Zaki SR, Peret T, Emery S, et al. A novel coronavirus associated with severe acute respiratory syndrome. N Engl J Med. 2003;348:1953–66.
  3. Marra MA, Jones SJ, Astell CR, Holt RA, Brooks-Wilson A, Butterfield YS, et al. The Genome sequence of the SARS-associated coronavirus. Science. 2003;300:1399–404.
  4. Liang WN, Mi J, Information Branch Joint Leadership Group of SARS Prevention and Control in Beijing. Epidemiological features of severe acute respiratory syndrome in Beijing. Zhonghua Liu Xing Bing Xue Za Zhi. 2003;24:1096–9.
  5. Domingo E. Quasispecies and the development of new antiviral strategies. Prog Drug Res. 2003;60:133–58.
  6. Adami C, Pooley J, Glomb J, Stecker E, Fazal F, Fleming JO, et al. Evolution of mouse hepatitis virus (MHV) during chronic infection: quasispecies nature of the persisting MHV RNA. Virology. 1995;209:337–46.
  7. Ruan YJ, Wei CL, Ee AL, Vega VB, Thoreau H, Su ST, et al. Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection. Lancet. 2003;361:1779–85.
  8. Tsui SK, Chim SS, Lo YM. Coronavirus genomic-sequence variations and the epidemiology of the severe acute respiratory syndrome. N Engl J Med. 2003;349:187–8.
  9. Hu LD, Zheng GY, Jiang HS, Xia Y, Zhang Y, Kong XY. Mutation analysis of 20 SARS virus genome sequences: evidence for negative selection in replicase ORF1b and spike gene. Acta Pharmacol Sin. 2003;24:741–5.
  10. Li L, Wang Z, Lu Y, Bao Q, Chen S, Wu N, et al. Severe acute respiratory syndrome–associated coronavirus genotype and its characterization. Chin Med J (Engl). 2003;116:1288–92.
  11. Bukh J, Miller RH, Purcell RH. Genetic heterogeneity of hepatitis C virus: quasispecies and genotypes. Semin Liver Dis. 1995;15:41–63.
  12. Wain-Hobson S. Human immunodeficiency virus type 1 quasispecies in vivo and ex vivo. Curr Top Microbiol Immunol. 1992;176:181–93.
  13. Zhou XZ, Zhao M, Wang FS, Jiang TJ, Li YG, Nie WM, et al. Epidemiologic features, clinical diagnosis and therapy of first cluster of patients with severe acute respiratory syndrome in Beijing area. Zhonghua Yi Xue Za Zhi. 2003;83:1018–22.
  14. Wang FS, Xu D. Study of characteristics and pathogenic mechanism of SARS coronavirus. Infect Dis Inform. 2003;16:67–8.
  15. PCR primers for SARS developed by WHO network laboratories [monograph on the Internet]. Geneva: World Health Organization. 2003 Apr 17. Available from:
  16. Qin ED, Zhu QY, Yu M, Fan B, Chang G, Si B, et al. A complete sequence and comparative analysis of a SARS-associated virus (isolate BJ01). Chin Sci Bull. 2003;48:941–8.
  17. Domingo E, Baranowski E, Ruiz-Jarabo CM, Martin-Hernandez AM, Saiz JC, Escarmis C. Quasispecies structure and persistence of RNA viruses. Emerg Infect Dis. 1998;4:521–7.
  18. Maekawa S, Enomoto N, Kurosaki M, Nagayama K, Marumo F, Sato C. Genetic changes in the interferon sensitivity determining region of hepatitis C virus during the natural course of chronic hepatitis C. J Med Virol. 2000;61:303–10.
  19. Yang OO, Sarkis PT, Ali A, Harlow JD, Brander C, Kalams SA, et al. Determinant of HIV-1 mutational escape from cytotoxic T lymphocytes. J Exp Med. 2003;197:1365–75.
  20. Pewe L, Wu GF, Barnett EM, Castro RF, Perlman S. Cytotoxic T cell-resistant variants are selected in a virus-induced demyelinating disease. Immunity. 1996;5:253–62.
  21. Williamson S. Adaptation in the env gene of HIV-1 and evolutionary theories of disease progression. Mol Biol Evol. 2003;20:1318–25.
  22. Luo Z, Weiss SR. Roles in cell-to-cell fusion of two conserved hydrophobic regions in the murine coronavirus spike protein. Virology. 1998;244:483–94.
  23. Crotty S, Maag D, Arnold JJ, Zhong W, Lau JY, Hong Z, et al. The broad-spectrum antiviral ribonucleoside ribavirin is an RNA virus mutagen. Nat Med. 2000;6:1375–9.



Suggested citation for this article: Xu D, Zhang Z, Chu F, Li Y, Jin L, Zhang L, et al. Genetic variation of SARS coronavirus in Beijing hospital. Emerg Infect Dis [serial on the Internet]. 2004 May [date cited]. Available from:

DOI: 10.3201/eid1005.030875

Address for correspondence: Fu-Sheng Wang, Beijing Institute of Infectious Diseases, 302 Hospital, 100 Xi Si Huan Middle Road, Beijing 100039, China; fax: 86-10-63831870