Molecular Epidemiology and Evolutionary Trajectory of Emerging Echovirus 30, Europe

In 2018, an upsurge in echovirus 30 (E30) infections was reported in Europe. We conducted a large-scale epidemiologic and evolutionary study of 1,329 E30 strains collected in 22 countries in Europe during 2016–2018. Most E30 cases affected persons 0–4 years of age (29%) and 25–34 years of age (27%). Sequences were divided into 6 genetic clades (G1–G6). Most (53%) sequences belonged to G1, followed by G6 (23%), G2 (17%), G4 (4%), G3 (0.3%), and G5 (0.2%). Each clade encompassed unique individual recombinant forms; G1 and G4 displayed >2 unique recombinant forms. Rapid turnover of new clades and recombinant forms occurred over time. Clades G1 and G6 dominated in 2018, suggesting the E30 upsurge was caused by emergence of 2 distinct clades circulating in Europe. Investigation into the mechanisms behind the rapid turnover of E30 is crucial for clarifying the epidemiology and evolution of these enterovirus infections.

We performed an in-depth analysis of the genetic diversity of E30 strains detected during a large-scale upsurge in cases in Europe during 2018. We collated sequences obtained by participating laboratories in 22 countries and analyzed the epidemiologic and evolutionary profi les in this molecular study.

Data Collection
An invitation to participate in this study was sent on November 13, 2018, to co-authors of the E30-2018 study (2) through the European Centre for Disease Prevention and Control (ECDC) Epidemic Intelligence Information System Vaccine-Preventable Diseases platform (https://www.ecdc.europa.eu/en/publications-data/ epidemic-intelligence-information-system-epis), and to members of the European Non-Polio Enterovirus Network (ENPEN; https://www.escv.eu/enpen). We requested pseudonymized data from 2016-2018 with sample identifi er, sampling date, specimen type, and sequence in FASTA be sent to ECDC secure fi le transfer protocol server by January 7, 2019. We also collected optional data, such as patient age, clinical presentation, whether they were hospitalized, and infection outcome. We excluded submissions without virus sequence data (Appendix Figure 1, https://wwwnc.cdc.gov/EID/ article/27/6/20-3096-App1.pdf).

Sequence Data Collection
We requested that the FASTA sequence data contain the VP1 gene and collected 1,784 records (Appendix Figure 1). Sequences were obtained from enteroviruspositive samples by using 5′ UTR PCR (23) and typed within the VP1 gene by using Sanger sequencing (2,24). We excluded sequences with indicators of poor sequence quality, such as >2 ambiguous or undefi ned bases, in-frame stop codons, identical to reference E30 strains; sequences of the wrong type, such as E3; or sequences shorter than 200 basepairs or spanning a non-VP1 region. In total, we had 1,407 study sequences that comprised 2 nonoverlapping regions, 1,262 sequences from region 1 (nt positions 2543-2902, according to the prototype E30 strain Bastianni, GenBank accession no. AF311938) and 145 sequences from region 2 (nt positions 2916-3428). Of these, 1,329 sequences were collected during 2016-2018 and 78 during 2010-2015. We used the 2010-2015 sequences In 2018, an upsurge in echovirus 30 (E30) infections was reported in Europe. We conducted a large-scale epidemiologic and evolutionary study of 1,329 E30 strains collected in 22 countries in Europe during 2016-2018. Most E30 cases aff ected persons 0-4 years of age (29%) and 25-34 years of age (27%). Sequences were divided into 6 genetic clades (G1-G6). Most (53%) sequences belonged to G1, followed by G6 (23%), G2 (17%), G4 (4%), G3 (0.3%), and G5 (0.2%). Each clade encompassed unique individual recombinant forms; G1 and G4 displayed >2 unique recombinant forms. Rapid turnover of new clades and recombinant forms occurred over time. Clades G1 and G6 dominated in 2018, suggesting the E30 upsurge was caused by emergence of 2 distinct clades circulating in Europe. Investigation into the mechanisms behind the rapid turnover of E30 is crucial for clarifying the epidemiology and evolution of these enterovirus infections.
for phylogenetic reconstruction but excluded these from further data analysis (Appendix Figure 1).
For additional analysis of the 3D polymerase (3Dpol) region, we randomly selected records from each clade to ensure fair distribution of sequence data. We asked participants to send either extracted RNA in a QIAGEN (https://www.qiagen.com) spin column at room temperature for next-generation sequencing (NGS) or to conduct 3Dpol sequencing of the 549 nucleotides, as previously described (17).

Epidemiologic and Statistical Analyses
We descriptively analyzed clinical symptoms and age. Patients were stratified into the following age groups: <3 months, 3-23 months, 2-5 years, 6-15 years, 16-25 years, 26-45 years, 46-65 years, and >65 years. Crude odds ratios with 95% CI were used to express magnitude of association between continuous or categorical variables in multivariate logistic regression.

Next-Generation Sequencing
Stool suspensions and CSF samples were processed to remove as much nonviral material as possible by using centrifugation, filtration, and endonuclease treatment. RNA was extracted by using the MagNAPure 96 (Roche Diagnostics, https://www.roche.com) automated extraction kit or QIAGEN filters and eluted in 50 µL of elution buffer (Appendix).

Nucleotide Sequences and Phylogenetic Analysis
We conducted VP1 phylogenetic reconstruction with the 1,407 study sequences and 324 sequences extracted from Genbank. We selected region 1 for clade analysis because it is more commonly used for enterovirus typing (24). We performed analysis of region 2 sequences based on sequence clustering, in which both region 1 and 2 were available, such as fulllength sequences or sequences spanning the entire VP1 gene. Sequencing of the 540 nt 3Dpol gene, positions 5825-6364, also was provided for 12 samples with region 1 sequences (Appendix Figure 1). Sanger sequencing indicated that samples did not display double infection and that VP1 and 3Dpol were from 1 virus. Complete genomes (≈7.3 kb) were generated for 48 sequences by using NGS. To compare 3Dpol groupings within EV species B, we downloaded all sequences available from GenBank as of October 18, 2019, that were >70% complete between positions 5825-6364 with <6 ambiguous base positions and <6 undetermined bases and without stop codons. We aligned the downloaded sequences with complete genomes or 3Dpol sequences from our study.
We aligned data by using sequence editor SSE version 1.3 (http://www.virus-evolution.org). We generated maximum-likelihood and neighbor-joining trees for VP1 and 3Dpol regions by using MEGA version 7 (https://www.megasoftware.net) with the optimal model (general time reversible plus invariant sites plus gamma distribution for rates over sites) and 100 bootstraps (25). We analyzed the species B dataset with neighbor-joining and maximum composite likelihood distances.

Nextstrain VP1 Phylodynamic Analysis
The dataset used for Nextstrain phylodynamic analysis comprised 1,285 sequences; 1,215 study sequences (region 1) and 70 complete VP1 sequences extracted from GenBank (Appendix Figure 1). We excluded sequences shorter than 250 bp, sequences from samples collected before 1958, and sequences deemed as outliers during phylogenetic reconstruction. We deemed these outliers recombinants with possible recombination breakpoints within the sequence fragment used made phylogenetic reconstruction impossible.
We aligned sequences by using MAFFT (26). We inferred a phylogenetic tree by using IQ-TREE (27) and generated time-resolved trees by using TreeTime (28) by estimating the mutation rate. When available, we attached to sequences data on country, sample type, E30 clade, age groups, and clinical data, such as whether patients were hospitalized and their symptoms. We provided the resulting Nextstrain build for viewing (https://nextstrain.org/community/enterovirusphylo/echo30-2019/vp1).
Clinical information was available for 734 (56.8%) E30 cases, of which 380 cases had unknown symptomology. For most (28.7%, n = 211) cases, the recorded signs and symptoms suggested meningitis. Symptoms of acute flaccid paralysis were reported in 1 case, encephalitis in 3 cases, and meningoencephalitis in 8 cases. Fever, either as sole symptom or in combination with other signs and symptoms, was recorded in only 52 (7.1%) cases. Unfortunately, not all records were filled in completely, and clinical data were absent for some samples. Other signs and symptoms mentioned were gastrointestinal symptoms in 6 cases, respiratory symptoms in 6, rash in 2, other neurologic symptoms in 4, or other unspecified in 53 cases; 8 cases had no symptoms. We created an interactive representation of age and clinical features of sequences from E30 cases, which we made available on Nextstrain (https://nextstrain.org/community/ enterovirus-phylo/echo30-2019/vp1). Hospitalization status was available for only 17.6% (n = 228) of cases, only 5 of which had no hospitalization. The low fraction of hospitalization reported limited further analysis. No deaths were reported.

Recombination Analysis
We used 110 sequences containing both VP1 and 3Dpol region and complete genome sequences to analyze recombination events between VP1 and the 3′ distal end of the E30 genome (Appendix Figure 1). The E30 3Dpol sequences formed a series of separate clusters interspersed with those of other species B types, indicative of many within-species recombination events during their diversification (Figure 4). We took the entire published sequence dataset and used a nucleotide sequence distance threshold of 8%, based on pairwise sequence comparisons, which divided sequences into distinct groups (Appendix Figure 3), comparable to those derived from a previous analysis of E30 RFs (17). Accordingly, species B could be divided into ≈442 RFs, an indication of the frequency and complexity of recombination events occurring during the evolution of this species. We used Nextstrain to generate an interactive tanglegram of VP1 and 3Dpol RFs (https://nextstrain.org/community/ enterovirus-phylo/echo30-2019/3D:community/enterovirus-phylo/echo30-2019/vp1) ( Figure 5). We found that 3Dpol sequences of G1-G6 formed 8 recombination groups, which were separated by other published E30 variants and by other species B types ( Figure 5). We noted that viruses within most VP1 clades were monophyletic in 3Dpol, but that G1 and G4 each had undergone further recombination ( Figure 5; Appendix Figure  4), a split corresponding to the sublineages evident in the VP1-based tree. The split was identified as a time-related phenomenon, with G1 circulating in 2018 representing a different RF from G1 circulating during 2016 and 2017.

Discussion
A large upsurge of E30 infections was reported in several countries in Europe during 2018 (2). We  conducted a comprehensive molecular characterization of E30 by using VP1, 3Dpol, and whole genome sequences. Our molecular characterization enabled an analysis of the recombination events occurring during E30 diversification in Europe, which can be conducted only when dealing with a single infection. Our study used a large EV sequence dataset collected worldwide, comprising 1,329 E30 sequences collected from 22 countries in Europe during 2016-2018 and was made possible due to the large-scale collaboration between countries through ENPEN and ECDC.
The data clearly demonstrate that analysis based on phylogenetic clade assignment shows differential dominance of many different clades. The upsurge in 2018 was caused by appearance of several different clades or genogroups of E30 viruses; G1 in GGII (7) and G6 in a novel genogroup, GGIII, which we propose in this study. Viruses from both clades had been circulating for >2 years. In total, 6 clades were identified during the study period and circulated in a pattern of rapid turnover of newly emerging genetic lineages and RFs and their relatively rapid disappearance over time, a pattern that is typical for other enteroviruses (1,7,10,12,13,(16)(17)(18)29). In this study, G2 predominated in 2016 and 2017 in central Europe and were subsequently replaced by the G1 and G6 in 2018 (Figures 1, 2). This genetic turnover and the associated string of recombination events during lineage diversification occurred within the 2-to 5-year cyclical pattern of E30 incidence. As expected, each VP1 group corresponded to a separate RF, but G1 underwent a further recombination event as the virus diversified from a common ancestor dated to around 2011 and G4 underwent a further recombination from an ancestor around 2008. Of note, clade G1 showed a time related split in which G1 sequences circulating in 2018 emerged from those circulating in 2016-2017, coinciding with a recombination generating a novel RF. The absence of G3 and G5 sequences in the study population might reflect a generally lower circulation of these strains or perhaps a period of relative quiescence during the survey period. Long-term  surveillance is essential to monitor for potential emergence of these strains in future incidence cycles.
The cocirculation of different E30 clades during the 2018 upsurge and in previous years argues against the idea that the periodic emergence of E30 occurs through the evolution of more pathogenic or transmissible forms of the virus. The cocirculation of several different groups fits better with changes in population susceptibility from birth cohort effects and a breach of a critical immunity level that controls E30 spread within the population (19). The high susceptibility is reflected by the high number of infected infants, who would have no immunity, and adults whom we hypothesize have no or waning immunity. However, another possibility is that the appearance of several, potentially convergent, amino acid substitutions in VP1 among different E30 groups represented a form of antigenic selection for escape from existing population immunity. The clustering of sites under selection in the BC loop associated with receptor interactions is consistent with this possibility. Serologic studies are required to explore this hypothesis.
As shown in the original description of the upsurge (2), a high percentage of cases showed central nervous system involvement, particularly for infants 0-3 months of age and adults 25-44 years of age, consistent with previous observations (2,(30)(31)(32). The distribution of E30 clades varied among age groups; most infections in infants <3 months of age were caused by G1 and symptomatology varied from fever to acute flaccid paralysis. However, analysis of the clinical correlates was limited by incomplete reporting; only 30% of reported E30 infections included history of symptoms, which hampered comparisons of clinical presentation between different clades. Another limitation is the retrospective study design and bias toward severe and hospitalized cases.
Using Nextstrain, we visualized the various categories of demographic and clinical data, clades, and RFs. Unfortunately, G5 could not be inferred due to possible recombination events within the fragment. Complete reconstruction of E30 temporal events with geographic spread was hampered by the inevitably uneven sampling and testing in different years by the different contributing countries.
This study underpins the strength of the ENPEN consortium, which brings together virologists, public health experts, infectious disease doctors, and scientists across Europe to enable rapid detection and early warning through standardized surveillance. Previous studies using Nextstrain with 2 EV-D68 datasets have shown the value of combining demographic and phylogenetic analysis, both as retrospective (33) and real-time analysis (34). The E30 dataset and the 2 EV-D68 datasets (33,34)   time tracking of viruses over time and across countries. These data are of considerable value in infection containment and control of nonpolio enteroviruses. Differences in surveillance systems, case definitions, and sample selection between institutes and countries make standardized data collection difficult, particularly for denominator data. The differences in data collection proved to be a limitation in our study, and the extent of the circulation of the different strains remains unknown. The emergence and disappearance of viruses from different clades across the years suggests that some form of predictive modeling might be undertaken if data were standardized and provided in real-time through networks such as ENPEN.
The mechanisms underlying the complex cyclic pattern of E30 and other enteroviruses and the effects of changing population immunity, antigenic changes, virus diversification, pathogenicity, and recombination need further exploration. The emergence of different enterovirus types, and their associated periodicities and population penetrance, might be driven by multiple mechanisms (19), making outbreak and upsurge prediction complex. However, continued structured surveillance can clarify enterovirus circulation and evolution and slowly aid in unraveling the complex nature of enteroviruses.