SARS-CoV-2 Phylogenetic Analysis, Lazio Region, Italy, February–March 2020

We report phylogenetic and mutational analysis of severe acute respiratory syndrome coronavirus 2 virus strains from the Lazio region of Italy and provide information about the dynamics of virus spread. Data suggest effective containment of clade V strains, but subsequently, multiple waves of clade G strains were circulating widely in Europe.


The Study
We analyzed nasopharyngeal swab (n = 6) and bronchoalveolar lavage (n = 3) samples from 9 patients with COVID-19 to perform SARS-CoV-2 whole-genome reconstruction and mutational analysis. We collected samples in late February and early March 2020 (Table 1). At sampling time, all patients reported symptoms such as fever, sore throat, cough, or other respiratory symptoms. Two sequences were identical, so we included only 1 of them in the analysis, resulting in 8 total sequences. We named the sequences INMI3-INMI10 for their detection at the National Institute for Infectious Diseases and analyzed them together with the previously published INMI1 and INMI2 (6), along with all the sequences from Italy posted to the GISAID database by April 11, 2020.
We submitted consensus sequences to GISAID. We used the proposed phylogenetic lineage classification (A. Rambaut

DISPATCHES
according to GISAID phylogenetics, as reported (6), and clade B2; the clade includes other sequences from EU countries, but no additional sequences from Italy. All other INMI sequences cluster with the GISAID G clade, and with the B1 clade; we focused subsequent analysis on clade B1 (Figure).
The clade B1 INMI sequences are distributed in 2 main clusters, one including most of the northern Italy strains and the other including sequences mainly from central Italy. In particular, INMI4, which was epidemiologically linked to Bergamo (Lombardy region), clusters with sequences from central Italy (Abruzzo region). The other INMI sequences cluster with strains from northern Italy. Of note, in both clusters the sequences from Italy are intermixed with sequences from other EU countries, which can also be seen in the broader phylogenetic analysis on GISAID, in which more EU sequences are analyzed. We have identified 5 synonymous and 9 nonsynonymous substitutions distributed along the whole genome (Table 2).
Each patient's sequence carried 4 to 7 amino acid substitutions. The G clade-specific single-nucleotide polymorphism A23403G led to the amino acid change D614G in the S protein. We observed one additional mutation in this protein, C21575T (L5F) in INMI7, which is detected in a few other sequences in GISAID, interspersed among different non-G clades (M. Chiara et al., unpub. data, https://doi.org/10.1101/2020.03.30.016790). Its location in a marginal region of the gene and its sporadic distribution in different clades indicate repeated occurrence without subsequent fixation, consistent with the absence of an evolutionary advantage.
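The correspondence between a nucleotide substitution and the amino acid change it produces can be checked with simple codon arithmetic. The following is a minimal sketch, not the pipeline used in this study. It assumes that genome positions follow the Wuhan-Hu-1 reference (GenBank NC_045512.2), in which the S coding sequence starts at position 21563, and that the reference codons at S positions 614 and 5 are GAT and CTT. The function names and the reduced codon table are hypothetical and cover only this illustration.

```python
# Assumption: coordinates follow the Wuhan-Hu-1 reference (NC_045512.2),
# in which the S coding sequence starts at genome position 21563.
S_CDS_START = 21563

# Reduced codon table: only the codons needed for this illustration.
CODON_TABLE = {"GAT": "D", "GGT": "G", "CTT": "L", "TTT": "F"}


def locate_in_codon(genome_pos, cds_start):
    """Return (1-based codon number, 0-based offset within that codon)."""
    offset = genome_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3


def amino_acid_change(ref_codon, genome_pos, alt_base, cds_start=S_CDS_START):
    """Apply one substitution to a reference codon and name the change."""
    codon_number, idx = locate_in_codon(genome_pos, cds_start)
    alt_codon = ref_codon[:idx] + alt_base + ref_codon[idx + 1:]
    return f"{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[alt_codon]}"


print(amino_acid_change("GAT", 23403, "G"))  # D614G
print(amino_acid_change("CTT", 21575, "T"))  # L5F
```

Under these assumptions, A23403G falls on the second base of codon 614 and C21575T on the first base of codon 5, in agreement with the changes named in the text.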
The S protein in the SARS-CoV-2 virus is a chief determinant of the host range and pathogenicity. The virion attaches to the cell membrane by binding the S protein with the host ACE2 receptor (7). The D614G mutation, located in the putative S1-S2 junction region near the furin polybasic cleavage site (RRAR), might have an effect on priming by host cell proteases; however, the real impact of this high-frequency mutation is unclear.
The variants C241T (located in the noncoding region), C3037T, and C14408T (in open reading frame 1ab, orf1ab) were present in all INMI3-INMI10 sequences. These mutations have been detected in several SARS-CoV-2 isolates throughout Europe and are characteristic of clade G (C. Yin, unpub. data). A nonsynonymous substitution, D3G, in the membrane glycoprotein was detected in 1 sequence, INMI9.
In INMI4, we detected 3 nucleotide changes in 2 adjacent codons located in a highly variable region of the nucleocapsid (N) gene, resulting in 2 amino acid changes, R203K and G204R. The N protein, responsible for the formation of the helical nucleocapsid, can elicit humoral and cell-mediated immune responses and has potential value in vaccine development. However, none of the observed mutations has so far been associated with changes in viral pathogenicity or transmissibility.
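How 3 consecutive nucleotide changes can alter 2 adjacent codons is easiest to see by grouping the substitutions by codon. The sketch below is illustrative only. It assumes NC_045512.2 coordinates, with the N coding sequence starting at position 28274 and reference bases AGGGGA at positions 28880-28885, and it assumes the 3 changes are G28881A, G28882A, and G28883C, the substitutions commonly reported for R203K and G204R. The exact positions in INMI4 are not given in the text above, so these values and all names in the code are hypothetical.

```python
# Assumptions (not stated in the text): NC_045512.2 coordinates, N coding
# sequence starting at 28274, reference bases AGGGGA at 28880-28885, and
# the three substitutions commonly reported for R203K/G204R.
N_CDS_START = 28274
SEGMENT_START = 28880          # first base of N codon 203
REF_SEGMENT = "AGGGGA"         # codons 203 (AGG) and 204 (GGA)
SUBSTITUTIONS = {28881: "A", 28882: "A", 28883: "C"}

# Reduced codon table: only the codons needed for this illustration.
CODON_TABLE = {"AGG": "R", "AAA": "K", "GGA": "G", "CGA": "R"}


def changes_by_codon(subs, segment, segment_start, cds_start, table):
    """Apply substitutions to a codon-aligned segment and list the
    resulting amino acid changes, one entry per altered codon."""
    if (segment_start - cds_start) % 3 != 0 or len(segment) % 3 != 0:
        raise ValueError("segment must start and end on codon boundaries")
    mutated = list(segment)
    for pos, base in subs.items():
        mutated[pos - segment_start] = base
    mutated = "".join(mutated)
    first_codon = (segment_start - cds_start) // 3 + 1
    changes = []
    for i in range(0, len(segment), 3):
        ref_aa = table[segment[i:i + 3]]
        alt_aa = table[mutated[i:i + 3]]
        if ref_aa != alt_aa:
            changes.append(f"{ref_aa}{first_codon + i // 3}{alt_aa}")
    return changes


print(changes_by_codon(SUBSTITUTIONS, REF_SEGMENT, SEGMENT_START,
                       N_CDS_START, CODON_TABLE))  # ['R203K', 'G204R']
```

Under these assumptions, the first 2 substitutions fall in codon 203 (AGG to AAA) and the third in codon 204 (GGA to CGA), which is why 3 nucleotide changes yield only 2 amino acid changes.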

Conclusions
The phylogenetic reconstruction we report suggests multiple introductions of SARS-CoV-2 into Italy, supporting previously reported analyses conducted on a more limited number of sequences (3–5).
The analysis consistently places the strains described in this study in 2 distinct clusters in clade B1. No other sequence from Italy clusters in clade B2 (or GISAID clade V), indicating the positive effect of containment measures established by health authorities in both Italy and China to limit viral transmission directly from China. The same measures were unable to contain a wave of subsequent multiple introductions in Italy of strains that were widely circulating in Europe, all clustering with clade B1.
The inclusion of the viral sequences from infections occurring in the Lazio region helps to demonstrate the dynamics of virus circulation in Italy. In particular, a small number of mutations have been detected in these strains, but the real impact and role that these mutations may have on the pathogenicity and transmissibility of SARS-CoV-2 remain to be determined.
A limitation of our research is that only a portion of viral sequences, including the sequences from Italy, have been published as of April 10, 2020; phylogenetic analysis could substantially change when more sequences are made available. Continued genomic surveillance strategies are needed to improve monitoring and understanding of current SARS-CoV-2 epidemics, which might help to lessen the public health impact of COVID-19. Furthermore, increased sequencing capacity is necessary for contact tracing and enhanced surveillance activity.