Disclaimer: Early release articles are not considered as final versions. Any changes will be reflected in the online version in the month the article is officially released.

Volume 31, Supplement—May 2025

Large-Scale Genomic Analysis of SARS-CoV-2 Omicron BA.5 Emergence, United States

Author affiliation: Yale School of Public Health, New Haven, Connecticut, USA (K. Pham, C. Chaguza, R. Lopes, T. Cohen, E. Taylor-Salmon, N.D. Grubaugh, V. Hill); Yale School of Medicine, New Haven (E. Taylor-Salmon); Centers for Disease Control and Prevention, Atlanta, Georgia, USA (M. Wilkinson, V. Katebi); Yale University, New Haven (N.D. Grubaugh)

Abstract

The COVID-19 pandemic has been marked by continuous emergence of novel SARS-CoV-2 variants. Questions remain about the mechanisms with which those variants establish themselves in new geographic areas. We performed a discrete phylogeographic analysis on 18,529 sequences of the SARS-CoV-2 Omicron BA.5 sublineage sampled during February–June 2022 to elucidate emergence of that sublineage in different regions of the United States. The earliest BA.5 sublineage introductions came from Africa, the putative variant origin, but most were from Europe, matching a high volume of air travelers. In addition, we discovered extensive domestic transmission between different US regions, driven by population size and cross-country transmission between key hotspots. We found most BA.5 virus transmission within the United States occurred between 3 regions in the southwestern, southeastern, and northeastern parts of the country. Our results form a framework for analyzing emergence of novel SARS-CoV-2 variants and other pathogens in the United States.

SARS-CoV-2, the causative virus of the COVID-19 pandemic, has demonstrated the ability to evolve into novel variants. The Omicron (B.1.1.529) variant, detected in late 2021 in southern Africa, was deemed a variant of concern by the World Health Organization and soon became dominant in the United States and worldwide (1). Omicron lineages are defined by ≈60 mutations, 32 of those in the spike protein, that have granted evolutionary advantage over co-circulating variants because they enhance intrinsic transmissibility and immune escape (1–4). New Omicron sublineages, as well as recombinants, have subsequently emerged (5,6). Furthermore, the complex mosaic of immunity in the human population, likely caused by different levels of vaccination or previous infection, indicates the landscape for SARS-CoV-2 variant emergence has changed since the start of the pandemic. With ongoing variant emergence, changing patterns of spread must be elucidated, because those patterns have considerable implications for prevention and mitigation plans.

Recent advances in virus sequencing and phylogenetics have enabled the timely use of large-scale phylogenetic analyses to determine SARS-CoV-2 dynamics (7). Studies have been conducted globally, including in Brazil (8), The Gambia (9), and New Zealand (10), to explore the origins, emergence, and dynamics of SARS-CoV-2 variants. In the United Kingdom, multiple analyses of national-level spread from major population centers have been conducted, showing early spread from the origin(s) of introduction and the seeding and subsequent local transmission to new locations (11–14). Furthermore, studies in the United States have shown the increased risk for virus importation among states compared with international origin (15), the importance of superspreading events promoting early transmission (16), and effects of international introductions of the Alpha variant (17).

Figure 1. Number of estimated weekly SARS-CoV-2 infections in study of large-scale genomic analysis of Omicron BA.5 emergence, United States, January 2022–June 2022. Source: https://covidestim.org

Figure 2. SARS-CoV-2 variant frequency during January–June 2022 in study of large-scale genomic analysis of Omicron BA.5 emergence, United States.

We used a Bayesian discrete phylogeographic framework to determine the introduction and spread of a novel SARS-CoV-2 lineage into different regions of the United States. We focused on Omicron sublineage BA.5 during its global emergence period within the first 6 months of 2022 because of its rapid national spread, long-term persistence, and public health importance (Figures 1, 2). Omicron sublineage BA.5 established itself during times of lower SARS-CoV-2 incidence and remained prominent until the end of 2022 (Figure 1) (18). However, BA.5 never achieved complete dominance in the United States, co-circulating instead with other major Omicron sublineages, such as BA.2.12.1, BA.4, and XBB.1 (5). Moreover, BA.5 dissemination occurred on the background of a highly immune population because of vaccination and previous infections with other Omicron sublineages (5,19). Newer variants are likely to be introduced onto a similar immune landscape; thus, the dynamics of BA.5 introductions and dissemination offer a useful case study for how new lineages might spread across the United States. Furthermore, because most social and travel restrictions have been lifted and data streams have become more limited, clarifying within-country spread will enable targeted surveillance activities in the future.

Methods

Dataset Generation

To define our study period, we balanced having a large enough time period to cover key events with avoiding an intractably large final dataset. Therefore, we compared the frequencies of Omicron BA.5 with other variants in each continent and selected the week for which every continent had a BA.5 frequency of >25% (week commencing June 13, 2022). That cutoff is somewhat arbitrary, but the speed of BA.5 spread on a continental level meant that changing the threshold only resulted in a few weeks’ difference either way (e.g., changing it to 50% added 2 weeks to the dataset; changing to 10% resulted in 1 week less).
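The cutoff rule described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming weekly BA.5 frequencies have already been tabulated per continent; the function name `first_week_all_above` and the frequencies shown are hypothetical and are not study data.

```python
from typing import Dict, List, Optional


def first_week_all_above(
    weeks: List[str],
    freq_by_continent: Dict[str, List[float]],
    threshold: float = 0.25,
) -> Optional[str]:
    """Return the first week in which every continent exceeds the threshold.

    weeks: week-commencing labels in chronological order.
    freq_by_continent: BA.5 frequency (0-1) per week, aligned with `weeks`.
    """
    for i, week in enumerate(weeks):
        if all(freqs[i] > threshold for freqs in freq_by_continent.values()):
            return week
    return None  # the threshold was never exceeded on every continent


# Hypothetical frequencies for illustration only (not study data).
weeks = ["2022-05-30", "2022-06-06", "2022-06-13", "2022-06-20"]
freqs = {
    "Africa": [0.60, 0.65, 0.70, 0.72],
    "Europe": [0.20, 0.30, 0.40, 0.55],
    "Asia": [0.05, 0.15, 0.26, 0.40],
}
cutoff_week = first_week_all_above(weeks, freqs, threshold=0.25)
print(cutoff_week)
```

Rerunning the function with a different `threshold` value gives a quick sensitivity check of the kind described for the 10% and 50% alternatives.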

We assembled a dataset of BA.5 whole-genome sequences sampled in the United States and globally during the inferred emergence period, estimated to be during February–June 2022. First, we downloaded all sequences that had complete location and collection date metadata from GISAID (https://www.gisaid.org) and had the BA.5 pango lineage designation (20). We then used Nextclade (21) to exclude sequences that had a low quality control score or genome coverage of <70%. To mitigate sampling bias, we categorized global BA.5 data by continent.

Figure 3. Ten regions of the United States evaluated in large-scale genomic analysis of SARS-CoV-2 Omicron BA.5 emergence. Regions have been designated by the US Department of Health and Human Services (https://www.hhs.gov/about/agencies/regional-offices/index.html).

Within the United States, the genomic surveillance policy is largely decided by the individual state, causing potential bias in data from each region (22,23). To ameliorate that disparity, we divided the country into 10 regions according to the locations of the 10 regional offices of the US Department of Health and Human Services (DHHS) (Figure 3). To account for possible selection bias from that heterogeneity, we subsampled the full dataset in 1-week windows proportional to the population of each region. We chose to use population because case counts are also biased between and within countries (especially those as large as the United States) because of varying availability of resources and case definitions. We felt this choice was appropriate for SARS-CoV-2 because so much of each country’s population was infected; thus, in this specific case, we decided that population was a less biased metric on which to base our subsampling scheme than case counts. Specifically, we used the population proportion of the region, either global continent or US region, and multiplied by the total number of BA.5 genomes to find 1 fixed number of genomes (selected every week for that region). The final dataset selected for analysis consisted of 18,529 sequences, 9,350 from the United States and 10,258 from non-US countries (Table). For the emergence period, the earliest sample was collected on February 25, 2022, whereas the latest sample was collected on June 19, 2022.
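A population-proportional weekly subsampling scheme of this kind might look like the following sketch. It is not the study's actual script: the names `region_quotas` and `subsample`, the `genomes_per_week` target, and the toy populations are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Genome = Tuple[str, str, str]  # (sequence_id, region, week)


def region_quotas(population: Dict[str, int], genomes_per_week: int) -> Dict[str, int]:
    """Fixed weekly quota per region, proportional to its population share."""
    total_pop = sum(population.values())
    return {r: round(genomes_per_week * p / total_pop) for r, p in population.items()}


def subsample(genomes: List[Genome], quotas: Dict[str, int], seed: int = 0) -> List[str]:
    """Randomly keep at most `quotas[region]` genomes per region per week."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    buckets = defaultdict(list)
    for seq_id, region, week in genomes:
        buckets[(region, week)].append(seq_id)
    selected = []
    for (region, week), ids in sorted(buckets.items()):
        k = min(quotas.get(region, 0), len(ids))  # keep all if fewer than quota
        selected.extend(rng.sample(ids, k))
    return selected


# Hypothetical populations and genomes for illustration only.
population = {"A": 300, "B": 100}
quotas = region_quotas(population, genomes_per_week=8)
genomes = (
    [(f"A_w1_{i}", "A", "w1") for i in range(10)]
    + [(f"A_w2_{i}", "A", "w2") for i in range(3)]
    + [(f"B_w1_{i}", "B", "w1") for i in range(5)]
)
selected = subsample(genomes, quotas)
print(quotas, len(selected))
```

Weeks in which a region has fewer genomes than its quota simply contribute everything available, so the realized sample can be smaller than the nominal target.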

Phylogeographic Analysis

We performed multiple sequence alignments by using the Nextclade tool, Nextalign (21), and Wuhan-Hu-1/2019 as the reference genome (GenBank accession no. MN908947.3). We then constructed a maximum-likelihood phylogenetic tree by using IQ-TREE version 2.2.2 (24), the Hasegawa-Kishino-Yano nucleotide substitution model (25), and outgroup rooting on the MN908947 reference genome. We assessed the temporal signal by using TempEst version 1.5.3 (26) and found the timeframe for the dataset was too short to have a strong temporal signal (Appendix Figure 1). We were still able to prune molecular clock outliers (27) by using jclusterfunk version 0.0.25 (https://github.com/snake-flu/jclusterfunk).

Because of the large size of the genomic dataset, we used the alternative tree likelihood function in BEAST version 1.10.4, which was developed for efficient estimation of large phylogenies (13,28). We used maximum-likelihood trees described previously for the topologic estimation and time-calibrated those trees approximately by using TreeTime version 0.9.4 (29) to reduce the percentage of states that needed to be discarded for burn-in.

Because of the low temporal signal in the dataset, we fixed the clock rate at 8 × 10⁻⁴ substitutions/site/year, as previously described (30–32). We used the nonparametric Skygrid coalescent model (33) with 23 grid points defined according to approximately equal intervals within the global emergence period. We ran 2 Markov chain Monte Carlo chains for 1 billion iterations each to ensure convergence on the same part of the posterior distribution. We used Tracer version 1.7.1 to assess convergence after run completion and discarded 10% of Markov model states for burn-in (34).

We chose 1 random tree from the post–burn-in posterior distribution from the previous analysis to use as the fixed tree in a discrete trait analysis. We analyzed 2 separate geographic scales: 1 analysis at the global level, which included 6 continents (Africa, Asia, Europe, North America without the United States, Oceania, and South America) and the United States as a country; and 1 analysis at the US national level, which included 10 DHHS regions using the continental dataset as background global context. In both analyses, we used an asymmetric continuous-time Markov chain to estimate transition rates between locations. Each chain ran for 2 million states; we discarded 10% of Markov model states for burn-in.

For the international analysis, we used custom Python scripts to estimate the average number of introductions across each tree in the posterior distribution and then selected a final tree closest to that average number. For the domestic analysis, we chose a random tree in the posterior distribution to maintain stability of clades within the United States across the analysis. We defined an introduction event as the point at which a node is in a different location than its parent, either originating from another continent into the United States (for international introduction) or between US regions (for domestic analysis). We did not account for reintroduction within the same clade; thus, once the location changed to the United States, that clade was counted as only 1 introduction event. If a node in 1 subtree coincided with a node in another subtree, we only counted the node that had the older root and eliminated the other. We determined the size of an introduction to be the number of sequences that immediately followed a change in location within a node. We estimated the time of introduction as halfway between the first US/domestic location node and its parent. We generated figures by using custom Python scripts and trees by using the Baltic Python package (https://github.com/evogytis/baltic).
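The introduction-counting rules above can be illustrated with a small, self-contained sketch. It does not reproduce the study's scripts or the Baltic API; the `Node` class, the function names, and the toy tree are hypothetical, and introduction size is assumed here to mean the number of tips in the target location at or below the introduction node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    location: str
    time: float  # decimal year
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def count_tips(node: Node, location: str) -> int:
    """Count tips at or below `node` that were sampled in `location`."""
    if not node.children:
        return 1 if node.location == location else 0
    return sum(count_tips(c, location) for c in node.children)


def find_introductions(root: Node, target: str) -> List[dict]:
    """List nodes whose location is `target` while the parent's is not."""
    events = []
    stack = [root]
    while stack:
        node = stack.pop()
        parent = node.parent
        if parent is not None and node.location == target and parent.location != target:
            events.append({
                "node": node.name,
                "origin": parent.location,
                "time": (node.time + parent.time) / 2,  # midpoint of the branch
                "size": count_tips(node, target),
            })
            continue  # do not descend: re-entries within this clade are not recounted
        stack.extend(node.children)
    return events


# Toy tree for illustration only.
root = Node("root", "Africa", 2022.00)
n1 = root.add(Node("n1", "USA", 2022.10))
n1.add(Node("t1", "USA", 2022.15))
n2 = n1.add(Node("n2", "Europe", 2022.16))
n2.add(Node("t2", "USA", 2022.20))  # re-entry inside a US clade, not recounted
n3 = root.add(Node("n3", "Europe", 2022.12))
n3.add(Node("t3", "USA", 2022.18))

events = sorted(find_introductions(root, "USA"), key=lambda e: e["node"])
for event in events:
    print(event)
```

In the toy tree, the export from `n1` to Europe and back does not create a second event, which mirrors the rule that a clade is counted as only 1 introduction once its location has changed to the United States.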

To examine drivers of BA.5 domestic spread, we constructed a linear regression model that incorporated geographic proximity and population. Within the model, the proportion of directional domestic introductions between a pair of US regions was the outcome; the 2 independent variables were the binary neighboring relationship between that pair and the numeric total population of the 2 regions. We obtained population data from the US Census Bureau (https://www.census.gov).
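A regression of this form can be sketched with ordinary least squares. The source does not state which software was used for the regression, so the pure-Python solver below is only an assumed stand-in; the function names and the 6 region pairs are hypothetical, and the outcome values are generated from known coefficients so that the fit can be checked.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, b2, ...] that minimize the squared error."""
    x = [[1.0, *r] for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical region pairs: (shares a border: 0 or 1, total population in millions).
pairs = [(1, 50.0), (0, 80.0), (1, 30.0), (0, 40.0), (1, 90.0), (0, 60.0)]
# Synthetic outcome built from known coefficients, for illustration only.
proportion = [0.01 + 0.02 * nb + 0.001 * pop for nb, pop in pairs]

intercept, beta_neighbor, beta_population = fit_ols(pairs, proportion)
print(intercept, beta_neighbor, beta_population)
```

Because the synthetic outcome is noise-free, the fit recovers the generating coefficients; with real data, a statistics package would also be needed to obtain standard errors and p-values.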

Travel Data

To examine possible factors affecting BA.5 spread in different US regions, we collected data for monthly international and domestic air travel into US states during February–June 2022 (34). Those data were adjusted air passenger estimates, sampled according to ticket sales and reporting from airline carriers and assumed to represent 100% of the market. Adjusted travel volume represents the aggregate number of passenger journeys, not necessarily unique persons. We defined passenger journeys as airline transport between original embarkment and disembarkment in the United States. Both direct and indirect (i.e., connecting) flights were included.

Data Availability

The flight travel volume data were provided by OAG Aviation Worldwide Ltd. OAG Traffic Analyser, version 2.6.1 (http://analytics.oag.com/analyser-client/home; accessed 2023 Apr 24). The data were used under the US Centers for Disease Control and Prevention license for the current study and so are not publicly available. The authors are available to share the air passenger data upon reasonable request and with the permission of OAG Aviation Worldwide Ltd.

We obtained all genomic data from GISAID (acknowledgements table at https://doi.org/10.55876/gis8.240620dg). The XML files and outputs from the BEAST analyses are also available (https://github.com/grubaughlab/2025_paper_BA.5_United-States).

Results

International Introductions of Omicron BA.5 Sublineage into the United States

Figure 4. Time-scaled phylogeographic analysis of SARS-CoV-2 Omicron BA.5 sequences in the United States during January–June 2022. Analysis of BA.5 emergence was conducted by using 18,529 sequences collected globally and in the United States. Purple dotted line indicates the inferred date of the first introduction. The blue dotted line indicates the first sample of BA.5 sequenced in the United States. Colors indicate origin of the BA.5 variant.

Figure 5. Numbers and timeline of domestic and international introductions of SARS-CoV-2 Omicron BA.5 in the United States during January–June 2022.

Figure 6. Air travel volume into different regions of the United States in study of large-scale genomic analysis of SARS-CoV-2 Omicron BA.5 emergence, January–June 2022. Domestic and international air travel volume are indicated. Regions designated by the US Department of Health and Human Services are shown in Figure 3.

Figure 7. Total number of introductions of Omicron BA.5 into regions of the United States in study of large-scale genomic analysis of SARS-CoV-2 BA.5 emergence, January–June 2022. Cumulative numbers during the study period are indicated according to domestic or international origin. Regions designated by the US Department of Health and Human Services are shown in Figure 3.

Figure 8. Spatiotemporal dynamics of international introductions of SARS-CoV-2 Omicron BA.5 lineage into the United States during February–June 2022. A) Numbers and timeline of BA.5 introduction events according to continent. B) Total introduction cluster size (number of sequences) of BA.5 international introduction events into the United States during the entire study period. Size was determined by the number of sequences per introduction. GISAID, https://www.gisaid.org; tMRCA, time to most recent common ancestor.

We examined the dynamics of BA.5 global introductions into the United States by using a discrete phylogeographic analysis at the continent level and between US regions (Figure 3) and reconstructed introductions across the resulting phylogeny (Figure 4). An average of 1,168 (95% CI 1,137–1,198) introductions occurred from other continents into the United States across the posterior distribution of the entire period (January 1, 2022, through the week of June 13, 2022). The inferred time of the first introduction into the United States was the second week of February 2022, nearly 3 weeks before the collection date of the earliest US sequence on February 26, 2022 (Figure 4). During the earlier part of this emergence period (until mid-May 2022), most (68%) introductions were from international importation (Figure 5), despite air travel in the United States being predominantly between US regions during the study period (domestic volume was ≈80% of all air travel volume) (Figure 6). After BA.5 became established in the United States in mid-May 2022 (Figure 2), 72% of the between-region introductions came from domestic sources (Figure 7). During the entire study period, most international introductions came from Asia (27.8%), Europe (26.3%), and Africa (14.7%) (Figure 8, panel A).

Figure 9. Associations between travel from different countries and number and cluster size of SARS-CoV-2 Omicron BA.5 introductions into the United States, February–June 2022. A) Linear regressions indicating associations between the number of introductions into the United States from different continents and international travel volume according to that continent. B) Cluster sizes of BA.5 introductions originating from different continents into the 10 Department of Health and Human Services regions of the United States. Regions designated by the US Department of Health and Human Services are shown in Figure 3.

We observed a chronological change in the relative dominance of continents as origins of BA.5 introductions into the United States (Figure 8). Introductions from Africa, despite only representing 14.7% of total BA.5 international introductions, comprised 41.9% of all international introductions before mid-May 2022. A high rate of introductions from Africa into all 10 US regions occurred, despite low travel volumes (Figure 9; Appendix Figures 2, 3). Indeed, Africa had the highest ratio of BA.5 introductions per travel volume, at ≈0.3 introductions per 1,000 passengers (Figure 9, panel A), likely because the BA.5 sublineage originated in Africa. As BA.5 prevalence increased globally, introductions from Europe, Asia, and North America became more critical (Figures 4, 5, 8), matching high travel volumes from those areas. Therefore, early emergence was determined by the variant’s geographic origin, whereas later introductions were connected to travel volume.
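The introductions-per-travel-volume ratio is simple arithmetic, sketched below. The function name and the counts for the 2 origins are hypothetical and chosen only to show the calculation; they are not the study's values.

```python
def introductions_per_1000(introductions: int, passengers: int) -> float:
    """Number of inferred introductions per 1,000 incoming air passengers."""
    if passengers <= 0:
        raise ValueError("passenger volume must be positive")
    return 1000 * introductions / passengers


# Hypothetical counts for illustration only: (introductions, passengers).
example = {"Origin A": (150, 500_000), "Origin B": (300, 6_000_000)}
rates = {name: introductions_per_1000(i, p) for name, (i, p) in example.items()}
print(rates)
```

In this toy example, Origin A contributes fewer introductions in absolute terms but a higher ratio per traveler, which is the kind of pattern described for Africa relative to higher-volume origins.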

We examined the effect of timing on the size of international introduction events. During the first BA.5 introduction into the United States in early February 2022 and its detection ≈3 weeks later, 5 total introductions occurred (Figures 4, 8). Although 4 of those were singletons, 1 introduction from Africa during late February contained 3,980 sequences, the largest during the entire study period (Figures 8, panel B; Figure 9, panel B). Cluster size was highest during early introductions and decreased over time (Figure 8, panel B). Introduction events from Africa, most occurring earlier during the study period, tended to have high outbreak clade sizes; 9 clusters had >100 sequences (Figure 9, panel B). Introductions from Europe had only 4 clusters with >100 sequences; no other global regions had clusters of that size (Figure 9, panel B).

We found 2 main phases of BA.5 emergence in the United States. Large introductions from Africa dominated the early emergence phase before May 2022. As prevalence increased globally, international introductions had greater ties to air travel volume; hence, more introductions came from Europe, Asia, and North America. Because of a decrease in the susceptible population and possible behavior changes after an uptick in Omicron BA.5 cases, introductions from Europe, Asia, and North America did not expand as much as the earlier events from Africa.

Domestic Movement of Omicron BA.5 in the United States

To evaluate BA.5 transmission within the United States, we performed a discrete phylogeographic analysis using 10 DHHS-defined regions (Figure 3; Appendix Table 1). We inferred 3,137 within-country introductions across a single posterior tree, ≈70% of total introductions across the entire study period. Early international introduction events were followed by substantial domestic transmission (Figures 4, 5), and all 10 DHHS regions received >50% of their introductions from domestic sources (Figure 6). Those domestic movements grew in proportion throughout the study period and overtook the number of international introductions (Figure 5), aligning with the high (80%) proportion of domestic air travel (Figure 6).

Figure 10. Time-scaled phylogenetic analysis of domestic SARS-CoV-2 Omicron BA.5 introductions between regions within the United States during February–June 2022. Phylogenies of the 3 largest and earliest US clades are indicated. Trees were rooted according to region 2 (A), region 9 (B), and region 4 (C). Regions designated by the US Department of Health and Human Services are shown in Figure 3.

Figure 11. SARS-CoV-2 Omicron BA.5 movements between regions within the United States from study of BA.5 emergence during January–June 2022. Thickness of the lines indicates the prevalence of the movement across the maximum clade credibility tree; arrows indicate direction of introduction.

No noticeable geographic structure within the phylogeny was observed (i.e., sequences from different locations were intermixed, implying frequent interregion transmission during the emergence period) (Figure 4). Inspection of the 3 largest and earliest US clades, rooted in region 2 (several northeastern states, including New York), region 9 (southwestern states, including California), and region 4 (southeastern states, including Florida) (Figure 10), indicated that geographically close locations tended to have more interregion movement. Clades from regions 2 and 4 were primarily transmitted to other East Coast regions, and clades from region 9 were transmitted to other West Coast and West/Central regions (Figures 10, 11). Nonetheless, interactions between regions 4 and 9, and to a lesser extent regions 2 and 5 (including Illinois), indicated coast-to-coast spread was a critical BA.5 emergence mechanism.

Figure 12. Number of domestic SARS-CoV-2 Omicron BA.5 introductions into each region of the United States in study of BA.5 emergence during January–June 2022. Regions designated by the US Department of Health and Human Services are indicated in Figure 3.

Several key hotspots for transmission existed. All DHHS regions had considerably higher introduction counts originating from regions 4 and 9 (Figure 12). The interaction rate between regions 4 and 9 and other regions represented 71.6% of total domestic BA.5 movements (Figure 12). Correspondingly, regions 4 and 9 also had the highest volumes of both international and domestic air travel (Figure 6). Region 1 (New England) had the highest (≈70%) number of incoming domestic introductions originating from regions 4 and 9 (Figure 12). Therefore, the strong transmission from regions 4 and 9 likely underpinned BA.5 emergence in the United States. We also theorize that region 1 was the top recipient of domestic introduction events because of the higher rate of interstate travel between regions 1–3, as well as incoming air travel from other regions (Figures 6, 11).

To explore possible underlying drivers of virus movement across the United States, we performed linear regressions between pairs of locations, using population sizes and whether those locations shared a border as predictors. We found that the population size of the origin location was a significant predictor for the number of virus movements between a pair of locations (p<0.0001). In comparison, the destination population combined with whether the 2 locations shared a land border was not a significant predictor for virus movement (p>0.1) (Appendix Table 2).

Discussion

As SARS-CoV-2 continues to spread in the United States and globally, it will be essential to elucidate how new variants disseminate. We found that Omicron BA.5 was first introduced into the United States primarily from its geographic origin in Africa and then spread domestically from large populations and key hotspots, which are common between variants.

The earliest BA.5 introductions into the United States came from Africa despite low rates of air travel, indicating the importance of a variant’s geographic origin. Early introduction events were also much larger than later introductions, which is a common thread among the waves of SARS-CoV-2 across the globe, despite different demographic and intervention contexts (13). As prevalence rose globally in the latter half of the study period, a higher proportion of introductions from Europe and Asia occurred, potentially corresponding to higher travel volume (35). Similar dynamics have occurred with Delta variant introductions into the United Kingdom (11). The combination of the earliest introductions being the most important and later introductions coming from many locations makes international travel restrictions challenging to implement, even aside from ethical concerns (36); the speed required to prevent the most critical early introductions from a particular origin, if it is even known, is unachievable in most settings.

Domestic transmission played a substantial role in BA.5 dissemination in the United States. Whereas rates of interregion transmission exceeded those of global importation across the entire study period, most domestic virus movement occurred during the later phase. We show widespread secondary transmission occurred across the United States after the initial international introduction, which corroborates previous findings indicating SARS-CoV-2 transmission is driven by domestic dynamics (15,17). The domestic BA.5 spread was significantly associated with population size of the origin location, which fits with previous descriptions of SARS-CoV-2 transmission starting from large urban centers into other areas (37,38). Along with geographic proximity being somewhat essential, that finding fits a classical gravity model of disease transmission (39).

Cross-country BA.5 spread between DHHS regions 2, 4, and 9 highlights the role of specific hotspots in promoting BA.5 emergence. Those 3 regions received the most introductions from Africa and had the 3 largest and earliest US clades, playing a critical role in receiving and disseminating early BA.5 introductions. That finding is similar to the dissemination of the SARS-CoV-2 Alpha variant (17); New York, New York, received the most introductions from the Alpha variant’s origin, followed by California and Florida. Therefore, we might expect those regions to be critical during future variant introductions. Furthermore, we found that region 1 (New England) was the highest recipient of domestic introductions, likely from high interaction rates with 2 of the key hotspots (regions 2 and 4). We suggest that regions 2, 4, and 9 were primary hotspots because of their major urban centers (e.g., New York, Atlanta, and Los Angeles). Those findings fit the description of early virus lineage movements between larger cities, followed by spatial expansion into nearby areas (14).

The first limitation of our study is that our subsampling method reflected the broader inequality in genomic surveillance worldwide (22,23). We attempted to minimize those biases through subsampling and categorization into broader continents and US DHHS regions. Rooting our tree in Africa, despite sequences from Europe overwhelming the global dataset, suggests that our attempts to mitigate this international bias were somewhat successful. Our categorization into larger regions (within and outside the United States) might have introduced residual confounding and prevented exploration of interstate introduction events. We also chose to subsample by population size, rather than by case-based metrics that might appear more relevant. However, obtaining unbiased incidence, hospitalization, or death estimates during an outbreak is challenging, especially when comparing large geographic areas, such as the United States or entire continents. All such data are imperfect sources of information in this context because how they are recorded varies widely with resource limitations, case definitions, and political concerns. We therefore used population size, which we concluded should be less biased. Second, geographic variation in sequencing efforts might have affected our cluster size results by artificially increasing the size of introductions from Africa compared with Europe (i.e., there might be missing sequences from Africa, which would split clusters into smaller introductions). Our downsampling scheme should have helped mitigate this limitation, and the pattern of early and large introductions fits with other settings (13). Third, we defined the variant emergence phase according to a frequency growth curve to filter for early BA.5 sequences, which we deemed essential to our research; that definition might not properly reflect the true emergence time for a novel variant, although this only changes the length of our study period. Finally, we did not test other factors that might have driven the international introduction of Omicron BA.5 into the United States, such as distance through air networks or income levels.
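To make the population-based subsampling idea discussed above concrete, the following minimal Python sketch draws sequences from each region in proportion to that region's population. It is an illustration only and is not the pipeline used in this study; the function name `subsample_by_population`, the record fields, and the toy population figures are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def subsample_by_population(records, populations, total, seed=0):
    """Sample records so each region's share tracks its population share.

    records: list of dicts with a "region" key (hypothetical schema).
    populations: dict mapping region -> population size.
    total: target number of records across all regions.
    Quotas are rounded per region and capped at the number of available
    records, so the final count can differ slightly from `total`.
    """
    rng = random.Random(seed)

    # Group records by region, ignoring regions without a population entry.
    by_region = defaultdict(list)
    for rec in records:
        if rec["region"] in populations:
            by_region[rec["region"]].append(rec)

    pop_sum = sum(populations[region] for region in by_region)

    sample = []
    for region in sorted(by_region):
        quota = round(total * populations[region] / pop_sum)
        pool = by_region[region]
        # Undersampled regions contribute everything they have.
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample


# Toy example: region A has 3 times the population of region B.
toy_records = [{"id": f"A{i}", "region": "A"} for i in range(100)]
toy_records += [{"id": f"B{i}", "region": "B"} for i in range(100)]
toy_sample = subsample_by_population(toy_records, {"A": 3, "B": 1}, total=40)
print(Counter(rec["region"] for rec in toy_sample))
```

Because quotas are capped at the number of available sequences, regions with sparse sequencing remain underrepresented in such a scheme, which is the residual bias described in the first limitation.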

In conclusion, our findings support the role of phylogenetics in SARS-CoV-2 surveillance and contribute a phylogeographic framework for studying the emergence of other infectious pathogens in the United States. Countries have lifted pandemic restrictions and the general population has a mosaic of immunity; thus, the epidemiologic landscape presents opportunities for positive selection of novel SARS-CoV-2 variants. Determining the different dynamics of introduction in US regions will be critical for timely and cost-effective policymaking, particularly for health authorities. Our methods can be extended beyond SARS-CoV-2 analyses and can form a framework for phylogeographic analysis of large datasets to discern the spatiotemporal spread of other novel pathogens.

Mr. Pham holds an MPH from Yale School of Public Health in Connecticut and works as a technical officer and data analyst for the Program for Appropriate Technology in Health (PATH), Southeast Asia regional hub, and as an adjunct lecturer at Hanoi Medical University in Vietnam. His research interests focus on infectious disease modeling and phylogenetic methods.

Acknowledgments

We thank Anne Hahn and Nicholas Chen for their help with this study and everyone who has contributed genomic data to GISAID, which makes work like this possible. We gratefully acknowledge all data contributors; that is, the authors and their originating laboratories responsible for obtaining the specimens and their submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based.

This work was supported by the Centers for Disease Control and Prevention Broad Agency Announcement (contract nos. 75D30122C14697 and 75D30120C09570) (to N.D.G.).

N.D.G. is a paid consultant for Pfizer-BioNTech.

References

  1. Viana R, Moyo S, Amoako DG, Tegally H, Scheepers C, Althaus CL, et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature. 2022;603:679–86.
  2. Chaguza C, Coppi A, Earnest R, Ferguson D, Kerantzas N, Warner F, et al. Rapid emergence of SARS-CoV-2 Omicron variant is associated with an infection advantage over Delta in vaccinated persons. Med (N Y). 2022;3:325–334.e4.
  3. Cao Y, Wang J, Jian F, Xiao T, Song W, Yisimayi A, et al. Omicron escapes the majority of existing SARS-CoV-2 neutralizing antibodies. Nature. 2022;602:657–63.
  4. Tegally H, Moir M, Everatt J, Giovanetti M, Scheepers C, Wilkinson E, et al.; NGS-SA consortium. Emergence of SARS-CoV-2 Omicron lineages BA.4 and BA.5 in South Africa. Nat Med. 2022;28:1785–90.
  5. Ma KC, Shirk P, Lambrou AS, Hassell N, Zheng XY, Payne AB, et al. Genomic surveillance for SARS-CoV-2 variants: circulation of Omicron lineages—United States, January 2022–May 2023. MMWR Morb Mortal Wkly Rep. 2023;72:651–6.
  6. Chakraborty C, Bhattacharya M, Chopra H, Islam MA, Saikumar G, Dhama K. The SARS-CoV-2 Omicron recombinant subvariants XBB, XBB.1, and XBB.1.5 are expanding rapidly with unique mutations, antibody evasion, and immune escape properties - an alarming global threat of a surge in COVID-19 cases again? Int J Surg. 2023;109:1041–3.
  7. Hill V, Githinji G, Vogels CBF, Bento AI, Chaguza C, Carrington CVF, et al. Toward a global virus genomic surveillance network. Cell Host Microbe. 2023;31:861–73.
  8. Giovanetti M, Slavov SN, Fonseca V, Wilkinson E, Tegally H, Patané JSL, et al. Genomic epidemiology of the SARS-CoV-2 epidemic in Brazil. Nat Microbiol. 2022;7:1490–500.
  9. Kanteh A, Jallow HS, Manneh J, Sanyang B, Kujabi MA, Ndure SL, et al. Genomic epidemiology of SARS-CoV-2 infections in The Gambia: an analysis of routinely collected surveillance data between March, 2020, and January, 2022. Lancet Glob Health. 2023;11:e414–24.
  10. Douglas J, Winter D, McNeill A, Carr S, Bunce M, French N, et al. Tracing the international arrivals of SARS-CoV-2 Omicron variants after Aotearoa New Zealand reopened its border. Nat Commun. 2022;13:6484.
  11. McCrone JT, Hill V, Bajaj S, Pena RE, Lambert BC, Inward R, et al.; COVID-19 Genomics UK (COG-UK) Consortium. Context-specific emergence and growth of the SARS-CoV-2 Delta variant. Nature. 2022;610:154–60.
  12. Kraemer MUG, Hill V, Ruis C, Dellicour S, Bajaj S, McCrone JT, et al.; COVID-19 Genomics UK (COG-UK) Consortium. Spatiotemporal invasion dynamics of SARS-CoV-2 lineage B.1.1.7 emergence. Science. 2021;373:889–95.
  13. du Plessis L, McCrone JT, Zarebski AE, Hill V, Ruis C, Gutierrez B, et al.; COVID-19 Genomics UK (COG-UK) Consortium. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science. 2021;371:708–12.
  14. Tsui JLH, McCrone JT, Lambert B, Bajaj S, Inward RPD, Bosetti P, et al.; COVID-19 Genomics UK (COG-UK) Consortium. Genomic assessment of invasion dynamics of SARS-CoV-2 Omicron BA.1. Science. 2023;381:336–43.
  15. Fauver JR, Petrone ME, Hodcroft EB, Shioda K, Ehrlich HY, Watts AG, et al. Coast-to-coast spread of SARS-CoV-2 during the early epidemic in the United States. Cell. 2020;181:990–996.e5.
  16. Zeller M, Gangavarapu K, Anderson C, Smither AR, Vanchiere JA, Rose R, et al. Emergence of an early SARS-CoV-2 epidemic in the United States. Cell. 2021;184:4939–4952.e15.
  17. Alpert T, Brito AF, Lasek-Nesselquist E, Rothman J, Valesano AL, MacKay MJ, et al. Early introductions and transmission of SARS-CoV-2 variant B.1.1.7 in the United States. Cell. 2021;184:2595–2604.e13.
  18. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20:533–4.
  19. Klaassen F, Chitwood MH, Cohen T, Pitzer VE, Russi M, Swartwood NA, et al. Changes in population immunity against infection and severe disease from severe acute respiratory syndrome coronavirus 2 Omicron variants in the United States between December 2021 and November 2022. Clin Infect Dis. 2023;77:355–61.
  20. Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5:1403–7.
  21. Aksamentov I, Roemer C, Hodcroft EB, Neher RA. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J Open Source Softw. 2021;6:3773.
  22. Brito AF, Semenova E, Dudas G, Hassler GW, Kalinich CC, Kraemer MUG, et al.; Swiss SARS-CoV-2 Sequencing Consortium. Global disparities in SARS-CoV-2 genomic surveillance. Nat Commun. 2022;13:7003.
  23. Abbasi J. How the US failed to prioritize SARS-CoV-2 variant surveillance. JAMA. 2021;325:1380–2.
  24. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4.
  25. Hasegawa M, Kishino H, Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985;22:160–74.
  26. Rambaut A, Lam TT, Max Carvalho L, Pybus OG. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016;2:vew007.
  27. Hill V, Baele G. Bayesian estimation of past population dynamics in BEAST 1.10 using the Skygrid coalescent model. Mol Biol Evol. 2019;36:2620–8.
  28. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4:vey016.
  29. Sagulenko P, Puller V, Neher RA. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018;4:vex042.
  30. Hodcroft EB, Zuber M, Nadeau S, Vaughan TG, Crawford KHD, Althaus CL, et al.; SeqCOVID-SPAIN consortium. Spread of a SARS-CoV-2 variant through Europe in the summer of 2020. Nature. 2021;595:707–12.
  31. Aggarwal D, Warne B, Jahun AS, Hamilton WL, Fieldman T, du Plessis L, et al.; Cambridge Covid-19 testing Centre; University of Cambridge Asymptomatic COVID-19 Screening Programme Consortium; COVID-19 Genomics UK (COG-UK) Consortium. Genomic epidemiology of SARS-CoV-2 in a UK university identifies dynamics of transmission. Nat Commun. 2022;13:751.
  32. Ghafari M, du Plessis L, Raghwani J, Bhatt S, Xu B, Pybus OG, et al. Purifying selection determines the short-term time dependency of evolutionary rates in SARS-CoV-2 and pH1N1 influenza. Mol Biol Evol. 2022;39:msac009.
  33. Gill MS, Lemey P, Faria NR, Rambaut A, Shapiro B, Suchard MA. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol. 2013;30:713–24.
  34. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018;67:901–4.
  35. Lemey P, Rambaut A, Bedford T, Faria N, Bielejec F, Baele G, et al. Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog. 2014;10:e1003932.
  36. Jecker NS, Atuire C. Who’s in? Who’s out? The ethics of COVID-19 travel rules. The Conversation, Nov 30, 2021 [cited 2024 May 22]. https://theconversation.com/whos-in-whos-out-the-ethics-of-covid-19-travel-rules-172053
  37. McBroome J, Martin J, de Bernardi Schneider A, Turakhia Y, Corbett-Detig R. Identifying SARS-CoV-2 regional introductions and transmission clusters in real time. Virus Evol. 2022;8:veac048.
  38. Barajas-Carrillo VW, Covantes-Rosales CE, Zambrano-Soria M, Castillo-Pacheco LA, Girón-Pérez DA, Mercado-Salgado U, et al. SARS-CoV-2 transmission risk model in an urban area of Mexico, based on GIS analysis and viral load. Int J Environ Res Public Health. 2022;19:3840.
  39. Truscott J, Ferguson NM. Evaluating the adequacy of gravity models as a description of human mobility for epidemic modelling. PLOS Comput Biol. 2012;8:e1002699.

DOI: 10.3201/eid3113.240981

Original Publication Date: May 02, 2025

1Current affiliation: Program for Appropriate Technology in Health (PATH) Southeast Asia, and Hanoi Medical University, Hanoi, Vietnam.

Address for correspondence:

Verity Hill, Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT 06510, USA


Page created: March 24, 2025
Page updated: May 02, 2025
Page reviewed: May 02, 2025
The conclusions, findings, and opinions expressed by authors contributing to this journal do not necessarily reflect the official position of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions. Use of trade names is for identification only and does not imply endorsement by any of the groups named above.