Spatiotemporal Analyses of 2 Co-Circulating SARS-CoV-2 Variants, New York State, USA

The emergence of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants in late 2020 and early 2021 raised alarm worldwide because of their potential for increased transmissibility and immune evasion. Elucidating the evolutionary and epidemiologic dynamics among novel SARS-CoV-2 variants is essential for understanding the trajectory of the coronavirus disease pandemic. We describe the interplay between B.1.1.7 (Alpha) and B.1.526 (Iota) variants in New York State, USA, during December 2020–April 2021 through phylogeographic analyses, space-time scan statistics, and cartographic visualization. Our results indicate that B.1.526 probably evolved in New York City, where it was displaced as the dominant lineage by B.1.1.7 months after its initial appearance. In contrast, B.1.1.7 became dominant earlier in regions with fewer B.1.526 infections. These results suggest that B.1.526 might have delayed the initial spread of B.1.1.7 in New York City. Our combined spatiotemporal methodologies can help disentangle the complexities of shifting SARS-CoV-2 variant landscapes.

The emergence of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants in late 2020 and early 2021 raised alarm worldwide because of their potential for increased transmissibility and immune evasion. Elucidating the evolutionary and epidemiologic dynamics among novel SARS-CoV-2 variants is essential for understanding the trajectory of the coronavirus disease pandemic. We describe the interplay between B.1. This study employed spatial scan statistics paired with phylogeographic analyses to describe the shifting SARS-CoV-2 variant landscape in NYS during December 2020-April 2021, specifically the interplay between co-circulating B.1.526 and B.1.1.7 lineages. Our findings elucidate the dynamics of competing SARS-CoV-2 variants at a time when the highly transmissible VOC Delta had overtaken B.1.1.7 worldwide and future variant displacements were likely to occur.

Sample Acquisition and RNA Extraction
This study was approved by the NYSDOH Institutional Review Board, under study numbers 02-054 and 07-022. The NYSDOH Wadsworth Center coordinated with >30 clinical laboratories throughout NYS that routinely submitted respiratory swabs positive for SARS-CoV-2 for whole-genome sequencing (WGS). Specimens were required to have a real-time cycle threshold value <30. We performed nucleic acid extraction on a MagNAPure 96 with the Viral NA Small Volume Kit (Roche, https://www.roche.com) with 100 μL sample input and 100 μL eluate.

Sample Inclusion Criteria
We included specimens with collection dates during December 2020-April 2021 with ZIP codes of patient addresses. We removed specimens that were prescreened for specific mutations or for clinical or epidemiologic criteria. For persons with multiple specimens collected, we included only the earliest specimen.

COVID Incidence Calculation
We obtained monthly COVID-19 case counts by ZIP code from online NYC COVID-19 data (https:// github.com/nychealth/coronavirus-data) and from the NYSDOH Communicable Disease Electronic Surveillance System. We included reports with case status of confirmed or probable in the case count and assigned a month on the basis of diagnosis date. We converted ZIP code data to ZIP code tabulation area (ZCTA) and calculated incidence using population data from the 2019 1-year American Community Survey estimates (https://data.census.gov/cedsci).

Retrospective Multinomial Space-Time Scan Statistic
We used the retrospective multinomial space-time scan statistic in SaTScan version 9.6 and applied the nonordinal method (12,13). We calculated estimated SARS-CoV-2 variant data for each ZCTA-month aggregation by multiplying the proportion of either B.1.1.7, B.1.526, or other variants in our sample by the total number of COVID-19 cases.
We set the maximum cluster size parameter a priori to 10% of the population at risk (14) . Spacetime cluster detection in SaTScan has a noted limitation where the size of clusters cannot change over time (15,16). Given that our data are aggregated to the temporal unit of months (December 2020-April 2021), we set the maximum temporal cluster size parameter to 1 month, to enable clusters to change their shape from month to month by being designated as new clusters (Appendix).

Inverse-Distance Weighted Interpolation and Spatial Average of SARS-CoV-2 Whole-Genome Sequencing
We used inverse-distance weighted (IDW) interpolation to visualize the spatiotemporal variation in the proportion of COVID-19 cases attributable to each SARS-CoV-2 variant in NYS and to provide estimates for these proportions in areas where we were missing data (17). We assigned the percentage of COVID-19 cases attributable to each variant per ZCTA to the ZCTA's centroid for the IDW calculation. IDW interpolation generated a continuous surface of values representing the percentage of total COVID-19 cases attributed to B.1.1.7 and B.1.526, which we then averaged over each ZCTA geometry.
We then multiplied the estimated percentage of each SARS-CoV-2 variant generated from IDW interpolation by the total number of COVID-19 cases for each ZCTA and month to estimate the total number of COVID-19 cases attributable to each variant. Estimated numbers of variant cases generated geographic mean centers for each month of the study period (18) (Appendix).

Phylogeographic Analyses
We incorporated into the analysis all NYS B.1.526 genomes generated by Wadsworth from the study period, barring a small fraction that did not pass quality control and those removed as redundant (Appendix). We downloaded all B.1.526 genomes from the United States and associated metadata (excluding NYS sequences) from GISAID (https://www.gisaid.org) and randomly subsampled them proportionally to their overall frequency in the United States. The final dataset included B.1.526 genomes from domestic locations (Massachusetts, New Jersey, Pennsylvania, Connecticut, California, Florida, Maryland, Michigan, Minnesota, and North Carolina), the 5 boroughs of NYC (Bronx, Brooklyn, Queens, Staten Island, and Manhattan), Long Island, the Hudson Valley and upstate New York (Western NYS, the Finger Lakes, the Capital District, and Central NYS regions). We aligned genomes in mafft 7.475 (19), masking problematic sites (https://github.com/W-L/Problemat-icSites_SARS-CoV2). We generated a maximumlikelihood phylogeny in IQTree 1.6.12 (20) with 1,000 ultrafast bootstrap replicates (21) and time-calibrated it in TreeTime 0.7.6 (22). This tree served as the fixed tree for ancestral state reconstruction in Beast 2.6.2 (23) to infer timing and source of B.1.526 introductions within NYS. We allowed the Bayesian analysis to run for >4 million generations and monitored it in Tracer 1.7.1 (24) until the effective sample size of all parameters >200 and the Markov chain Monte Carlo appeared to reach stationarity.
We conducted a B.1.1.7 phylogeographic analysis in the same manner with the states inferred for a fixed topology until all effective sample sizes reached >200. The final dataset included B.1.1.7 genomes from domestic locations (Massachusetts, New Jersey, Pennsylvania, Connecticut, California, and Florida) NYC, Long Island, Mid-Hudson, Finger Lakes, southwestern NYS (the Southern Tier and western regions of NYS) and Northern NYS (Capital District, Mohawk Valley, Central NYS, and the North Country).

Summary Statistics
We included in the study a total of 8,517 SARS-CoV  , 362). Results from the phylogeographic analysis indicated that B.1.526 emerged within the NYC area near the end of 2020 and that the Bronx was a major source of spread to other regions of NYS and the United States (domestic) (Figure 3). Although sampling biases could have influenced the number of introductions assigned to the Bronx, the domestic category had greater representation in the dataset but led to substantially fewer introductions (Appendix Table 2). Domestic genomes represented 29.5% of the dataset but this location was responsible   Table 2). Excluding the Bronx, B.1.526 transmission between boroughs and from these boroughs to other locations was relatively infrequent. We used subsampling strategies to investigate the strength of our results from the full B.1.526 phylogeographic analysis. These strategies included evenly sampling each region or borough, sampling evenly across time (except for December, which had very few B.1.526 cases compared with other months), sampling proportionally to the total incidence of SARS-CoV-2 per region or borough, and downsampling high-incidence regions or boroughs to the mean of B.1.526 cases per month. We performed each subsampling analysis in triplicate. Despite the different subsampling strategies, the root of the tree consistently fell within NYC, and the Bronx continued to serve as a major source of B.1.526 (data not shown).
Multiple domestic introductions contributed to the initial presence of B.1.1.7 in NYS (11) (Figure 4), with transmission occurring most frequently in the Finger Lakes and Northern NYS (Figure 4). The Finger Lakes and Northern NYS were well-represented in the dataset (32% of genomes) but contributed substantially less to the distribution of B.1.1.7 (accounting for 13% of the total number of introductions) than domestic sites, which represented 20% of the data and were responsible for the highest percentage of introductions (≈39%) (Appendix Table  3). The Finger Lakes showed the lowest proportion of sequenced cases attributable to introductions but the largest sample size in NYS, suggesting more sustained transmission of B.1.1.7 in this region.

Discussion
The repeated emergence of novel variants of SARS-CoV-2 has largely defined the COVID-19 pandemic response in 2021. As vaccination rates, prior exposure levels, and behavioral public health measures continuously change, so too will selective pressures (26). Given that selective pressures likely vary across regions, it follows that the emergence and spread of SARS-CoV-2 variants are also regionally dynamic. We combined spatial statistical, phylogeographic, and cartographic visualization techniques to examine the spatiotemporal dynamics of the VOC B.   There are some limitations to our study. A degree of selection bias exists within the dataset, given that specimens were screened by cycle threshold value and were submitted by a selected group of clinical and commercial laboratories that cannot perfectly represent all COVID-19 cases in NYS. We were unable to assess the demographic and clinical representativeness of our dataset because these data were not available to us. In addition, the number of specimens sequenced varied over the space and time of the study period, which created small sample sizes within many ZCTA-months. This limitation extended to the multinomial scan statistic, which was run with estimated values for COVID-19 cases attributable to B.1.1.7 and B.1.526, giving all ZCTAs with samples equal weight. However, the spatial scan assesses data according to their proximity to each other. In this context, ZCTAs are analyzed together rather than individually, which has the potential to reduce bias. Another consequence of our limited sampling was that our data exhibited zero samples from many ZCTAs for each month, which we addressed by using IDW interpolation of the proportion of B.1.1.7 and B.1.526 sequenced samples at the ZCTA-month level to visualize general patterns of variant proportions over geography. Phylogeographic analyses were hampered by similar limitations; uneven sampling among regions and the lack of global representation in our datasets could lead to incorrect trait assignments. Smaller sample sizes for some regions might have caused an underestimation of their contributions to variant transmission in NYS, whereas larger sample sizes might have inflated the number of introductions assigned. However, we believe our results largely capture the transmission dynamics of B.1.526 and B.1.1.7 in NYS, given that larger sample sizes did not always correspond to regions with outsized contributions to the spread of either variant, and subsampling the B.1.526 dataset consistently showed NYC as the dominant source of introductions.
Our phylogeographic and spatiotemporal analyses offer a method for evaluating the competitive advantages of co-circulating SARS-CoV-2 variants. We believe the emergence of VOI B.1.526 contributed to the slower rise of VOC B.1.1.7 as the dominant lineage in NYC compared with regions devoid of B.1.526. In this way, our study describes important dynamic interactions between variants with unequal transmissibility and is potentially generalizable to interactions between any known variants, including the highly transmissible Delta and Omicron variants and other variants to come.