Detecting Emerging COVID-19 Community Outbreaks at High Spatiotemporal Resolution - New York City, June 2020

To quickly detect hotspots, the New York City Health Department launched a SARS-CoV-2 percent positivity cluster detection system using census tract resolution and the SaTScan prospective Poisson-based space-time scan statistic. Soon after implementation, this system prompted an investigation identifying a gathering with inadequate social distancing where viral transmission likely occurred.

Spatiotemporal analysis of high-resolution COVID-19 data can help local health officials monitor disease spread and target interventions (1,2). Publicly available data have been used to detect COVID-19 space-time clusters at county and daily resolution across the United States (3,4) and purely spatial clusters at ZIP code resolution in New York City (NYC) (5).
For routine public health surveillance, the NYC Department of Health and Mental Hygiene (DOHMH) uses the case-only space-time permutation scan statistic (6) in SaTScan* to detect new outbreaks of reportable diseases (7) (e.g., Legionnaires' disease (8) and salmonellosis (9)). Given wide variability in testing across space and time, case-only analyses would be poorly suited for COVID-19 monitoring, because true differences in disease rates would be indistinguishable from differences in testing rates. In addition, we sought to detect newly emerging or re-emerging hotspots during an ongoing epidemic, which is more challenging than detecting a newly emerging outbreak in the context of minimal or stable disease incidence. A new approach was needed to detect areas where COVID-19 diagnoses were increasing, or decreasing more slowly, relative to other parts of the city. We developed a system to detect community-based clusters of increased percent test positivity for SARS-CoV-2 in near-real time at census tract resolution in NYC, accounting for testing variability. DOHMH launched the system on June 11, 2020, and the first COVID-19 cluster with a verified common exposure was detected on June 22.

The Study
Clinical and commercial laboratories are required to report all results (including positive, negative, and indeterminate results) for SARS-CoV-2 tests for New York State residents to the *
To detect emerging clusters, the space-time scan statistic uses a cylinder where the circular base covers a geographical area and the height corresponds to time (11). This cylinder is moved, or "scanned," over both space and time to cover different areas and time periods. At each position, the number of cases inside the cylinder is compared with the expected count under the null hypothesis of no clusters using a likelihood function, and the position with the maximum likelihood is the primary candidate for a cluster. The statistical significance of this cluster is then evaluated, adjusting for the multiple testing inherent in the many cylinder positions evaluated.
To quickly detect emerging hotspots, prospective analyses are conducted daily (12). To adjust for the multiple testing stemming from daily analyses, recurrence intervals are used instead of p-values (13). A recurrence interval of D days means that under the null hypothesis, if we conduct the analysis repeatedly over D days, then the expected number of clusters of the same or larger magnitude is one.
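The link between a p-value and a recurrence interval can be illustrated with a minimal sketch. It assumes the common convention that, for analyses run once per day, the recurrence interval in days is the reciprocal of the p-value. The function names and example p-values are hypothetical; the 100-day threshold is the signal definition used in this report.

```python
def recurrence_interval_days(p_value: float, analyses_per_day: float = 1.0) -> float:
    """Days of repeated analysis over which one cluster of this magnitude
    (or larger) is expected by chance under the null hypothesis.

    Assumption: recurrence interval = 1 / (p-value * analyses per day).
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if analyses_per_day <= 0:
        raise ValueError("analyses_per_day must be positive")
    return 1.0 / (p_value * analyses_per_day)


def is_signal(p_value: float, threshold_days: float = 100.0) -> bool:
    """Flag a cluster whose recurrence interval meets the signal threshold."""
    return recurrence_interval_days(p_value) >= threshold_days


if __name__ == "__main__":
    # Hypothetical p-values for three candidate clusters.
    for p in (0.05, 0.005, 0.001):
        ri = recurrence_interval_days(p)
        print(f"p={p}: recurrence interval {ri:.0f} days, signal={is_signal(p)}")
```

Under this convention, a daily analysis with a 100-day threshold is expected to produce roughly one false signal every 100 days.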
The space-time scan statistic can be utilized with different probability models. We used the Poisson model (11), where the number of cases is distributed according to the Poisson probability model, with an expected count proportional to the number of persons tested.
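As an illustration of how a single cylinder is scored under the Poisson model, the sketch below computes an expected count proportional to persons tested and the standard Poisson log-likelihood ratio for a high-rate cluster. It is a simplified example, not the SaTScan implementation: it omits the spatial and temporal adjustments described in this report, and the function names and counts are hypothetical.

```python
import math


def expected_cases(total_cases: float, tested_in_cylinder: float, total_tested: float) -> float:
    """Expected cases in a cylinder, proportional to the persons tested within it."""
    if total_tested <= 0:
        raise ValueError("total_tested must be positive")
    return total_cases * tested_in_cylinder / total_tested


def poisson_llr(cases_in: float, expected_in: float, total_cases: float) -> float:
    """Poisson log-likelihood ratio for one cylinder when scanning for high rates.

    Returns 0 when the observed count does not exceed the expected count,
    because only excess positivity is of interest.
    """
    c, e, big_c = cases_in, expected_in, total_cases
    if e <= 0:
        raise ValueError("expected_in must be positive")
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if big_c > c:  # contribution of cases outside the cylinder
        llr += (big_c - c) * math.log((big_c - c) / (big_c - e))
    return llr


if __name__ == "__main__":
    # Hypothetical counts: 100 positives among 10,000 persons tested citywide;
    # one cylinder contains 50 persons tested, 6 of whom tested positive.
    e = expected_cases(100, 50, 10_000)
    print(f"expected={e:.2f}, LLR={poisson_llr(6, e, 100):.2f}")
```

The cylinder with the largest log-likelihood ratio across all positions is the primary cluster candidate; its significance is then assessed by Monte Carlo replication rather than from this value alone.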
Analyses were adjusted non-parametrically for purely geographical variations that were consistent over time, as the goal was to detect newly emerging hotspots. Fitting a log-linear function, we also adjusted for citywide temporal trends in percent positivity, as the goal was to detect local hotspots rather than general citywide trends.
We developed SAS code (SAS Institute, Inc., Cary, NC, USA) that generated input and parameter files (Table 1, Technical Appendix Table 1), invoked SaTScan in batch mode, read analysis results back into SAS for further processing, and output files to secured folders. For any signals (defined as clusters with recurrence interval ≥100 days), the code also generated a patient line list, visualizations, and an investigator notification email. Similar SAS code referencing markedly different input parameters is freely available.†
During June 11-30, 28 unique primary clusters were detected (Table 2). Despite a permissive maximum spatial cluster size setting of half of persons tested, clusters during this period were geographically small (median radius: 0.69 km). Citywide during this period, SARS-CoV-2 percent positivity was 1.3%, whereas median percent positivity within these clusters was 4.7% (range: 1.2%-30.6%). In 10 clusters, at least half of patients were 18-34 years old (Table 2).
On June 22, in the context of waning case counts citywide, the system detected a cluster of 6 patients (median age: 40 years) residing within a 0.64-km radius, all with specimens collected on June 17 (Figure). DOHMH staff interviewed patients for common exposures, such as attending the same event or visiting the same location. On June 23, a DOHMH surveillance investigator (D.B.) determined that two patients in the cluster had attended the same gathering, where recommended social distancing practices had not been observed. In response, DOHMH launched an effort to limit further transmission, including testing, contact tracing, community engagement, and health education emphasizing the importance of isolation and quarantine.
† https://github.com/CityOfNewYork/communicable-disease-surveillance-nycdohmh

Conclusions
Automated spatiotemporal cluster detection analyses detected emerging, highly focused areas to target COVID-19 containment efforts in NYC. One-third of clusters consisted predominantly of young adults, suggesting poor adherence to social distancing guidelines in this age group (14).
Cluster investigations required substantial effort, and while only one cluster included patients with a verified common exposure, detecting localized transmission is important to prioritize focused interventions such as promoting increased testing and public messaging.
During June, we made several adjustments to improve signal prioritization, including increasing the minimum temporal cluster size from 2 to 3 days and increasing the minimum number of cases in clusters from 2 to 5 cases.
Our system is subject to several limitations. First, analyses were based on specimen collection date, but given delays in testing availability and care seeking, these dates did not necessarily represent recent infections. Timeliness was further limited by delays from specimen collection to laboratory testing and reporting. Clusters dominated by asymptomatic patients or patients with illness onset >14 days prior to diagnosis may not require intervention, as a positive PCR result indicates the presence of viral RNA but not necessarily viable virus (15). Second, geocoding is required for precision, and of unique NYC residents with a specimen collected during June 2020 for a PCR test for SARS-CoV-2 RNA, 4.9% had a non-geocodable residential address and were excluded from analyses. Finally, automation coding was complex (Technical Appendix). Planned SaTScan software enhancements that will facilitate wider adoption by other health departments include: adding a software interface for prospective surveillance, enabling temporal and spatial adjustments for the Bernoulli probability model, and enabling the log-linear temporal trend adjustment with automatically calculated trend at a sub-annual scale.
Our COVID-19 early detection system has highlighted areas in NYC warranting a rapid response. This work has guided prioritization of case investigations, contact tracing efforts, health education, and community engagement activities. Such locally targeted, place-based approaches are necessary to minimize further transmission and to better protect people at high risk for severe illness, including older adults and people with underlying health conditions.

Acknowledgments
We thank all staff members of the DOHMH Incident Command System Surveillance and Epidemiology Section for processing, cleaning, and managing input data; for conducting patient interviews and cluster investigations; and for logistical support. We also thank the NYC Test and Trace Corps for their assistance in managing the cases and contacts included in and identified by

First author biographical sketch
Dr. Greene
Extending the study period further may decrease the accuracy of the temporal trend adjustment but might be of interest to detect more prolonged clusters. If citywide percent positivity reaches an inflection point (e.g., begins to increase again after a period of decrease), the study period will need to be temporarily shortened and reset after that inflection point to accurately adjust for the temporal trend.

Parameter: Lag for data accrual
Parameter setting: 3 days
Notes: Given lags between specimen collection and report, exclude very incomplete data at end of study period when estimating the temporal trend. Three days is the minimum lag possible to preserve a timely analysis while allowing for at least some data to be reported, geocoded, and analyzed prior to open of business.

‡To account for data accrual lags, a 3-day delay was imposed between the end of the SaTScan study period and the detection date. Analyses were not run on June 13, 2020.
§Excluding >1 case per building and same last name ("household") for detection dates through June 29; excluding >1 case per building for detection June 30 (see Table 1: omissions from input files). Increased minimum number of cases in cluster from 2 to 5, effective detection date June 25, 2020.
¶Including all persons without regard to same-household or same-building residence.
#Patients in this cluster shared a common exposure: attendance at a social gathering.
**One patient in this cluster reported having attended the same social gathering that was identified by investigating the cluster detected on June 22.

Table 1. Analysis parameter settings for SARS-CoV-2 percent positivity analyses in New York City, using the prospective Poisson-based space-time scan statistic.

Parameter: Analysis type
Parameter setting: Prospective space-time
Notes: For timely cluster detection, prospective (rather than retrospective) analyses are used, evaluating only the subset of possible clusters that encompass the last day of the study period. To detect acute, ongoing, localized disease clusters, space-time analyses (rather than purely temporal or purely spatial analyses) are used.

Parameter: Model type
Parameter setting: Discrete Poisson
Notes: We apply the discrete Poisson-based scan statistic, defining the "population" file as persons tested, to scan for clusters of increased percent positivity. If SARS-CoV-2 percent positivity is high (say, >10%), then the discrete Poisson-based scan statistic is a poor approximation for Bernoulli-type data of persons testing positive and negative. The analysis would produce conservative p-values (i.e., recurrence intervals biased too low), and true clusters might be missed. However, SaTScan v9.6 does not include features for spatial and temporal adjustments for the Bernoulli probability model.

Parameter: Maximum spatial cluster size
Parameter setting: 50% of the population being tested
Notes: The option that imposes the fewest assumptions is to allow the cluster to expand in size to include up to 50% of all cases during the study period. Forcing clusters to be smaller than 50%, or restricting in terms of geographic size by setting a maximum circle radius, can be motivated in geographically larger study regions.

Parameter: Maximum temporal cluster size
Parameter setting: 7 days*
Notes: To focus on hotspots emerging during the most recent week.

Parameter: Minimum temporal cluster size
Parameter setting: 3 days*
Notes: Clusters of <3-day duration considered less credible for investigation as an emerging hotspot.

Parameter: Minimum number of cases
Parameter setting: 5 cases
Notes: Require a minimum number of cases to improve the probability of at least 3 patients within a given cluster being reachable for interview to support identification of a common exposure.

Parameter: Temporal trend adjustment
Parameter setting: Log-linear with automatically calculated trend*
Notes: If citywide percent positivity is decreasing overall, then we wish to detect areas where it is decreasing more slowly than the citywide average. If citywide percent positivity is increasing overall, then we wish to detect areas where it is increasing more than the citywide average. Adjusting for temporal trend nonparametrically is not possible if also using nonparametric spatial adjustment.

Parameter: Spatial adjustment
Parameter setting: Nonparametric, with spatial stratified randomization
Notes: Goal is to detect areas with relative increases from baseline, even if still lower than average citywide. This method adjusts the expected count separately for each location, removing all purely spatial clusters. The randomization is then stratified by location ID to ensure that each location has the same number of events in the real and random data sets.

Parameter: Scan for areas with:
Parameter setting: High rates
Notes: Interested only in increased disease transmission.

Parameter setting: No cluster centers in other clusters
Notes: Any disease may have multiple active clusters at any moment, so secondary clusters should be reviewed. By reviewing clusters with no cluster centers in other clusters (rather than no, or more geographic overlap), secondary clusters with some overlap can be detected.

* See "study period and time precision" section below.
Geocoding
Patient addresses were geocoded daily using version 20A of the NYC Department of City Planning's Geosupport geocoding software, implemented in R through C++ using the Rcpp package (3). Addresses that failed to geocode were then cleaned using a string searching algorithm performed against the Department of City Planning's Street Name Dictionary and Property Address Directory. Addresses that failed to geocode after cleaning were then verified using the IBM Infosphere USPS service.

Study period and time precision
SaTScan v9.6 can estimate a temporal trend (see below), but only at an annual time scale, because this feature was originally developed to accommodate long-term secular trends across multiple years, as for cancer incidence. As a workaround to accommodate a rapidly changing trend, as for SARS-CoV-2 test positivity, reassign one day as if it were one year in the SaTScan case and population input files and conduct analyses at annual resolution. For example, for a 21-day study period ending June 19, 2020, reassign May 30, 2020 as the year "2000" and June 19, 2020 as the year "2020," and indicate a time precision and a time aggregation of "year" (i.e., PrecisionCaseTimes=1 and TimeAggregationUnits=1 in the SaTScan parameter file). The minimum and maximum temporal cluster sizes would be input as years instead of days.
Similarly, with input data expressed in years, nonparametric adjustment for space by day-of-week interaction was not possible.
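The day-to-year reassignment can be sketched as follows. This is an illustrative Python example rather than the SAS code used in production; the function names and the choice of 2000 as the first pseudo-year are assumptions taken from the worked example above.

```python
from datetime import date, timedelta


def period_start_for(period_end: date, n_days: int) -> date:
    """First calendar day of a study period of n_days ending on period_end (inclusive)."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    return period_end - timedelta(days=n_days - 1)


def pseudo_year(day: date, period_start: date, first_year: int = 2000) -> int:
    """Map a calendar day to a pseudo-year, so that one day is treated as one year."""
    offset = (day - period_start).days
    if offset < 0:
        raise ValueError("day precedes the start of the study period")
    return first_year + offset


if __name__ == "__main__":
    end = date(2020, 6, 19)
    start = period_start_for(end, 21)
    print(start, "->", pseudo_year(start, start))  # 2020-05-30 -> 2000
    print(end, "->", pseudo_year(end, start))      # 2020-06-19 -> 2020
```

Each case and population record would carry the pseudo-year in place of its specimen collection date, and temporal cluster sizes of 3 and 7 days would be entered as 3 and 7 years.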

Temporal trend adjustment
As a workaround for a bug in SaTScan v9.6 in calculating a temporal trend adjustment in the prospective setting, first use the case and population files to run a retrospective purely temporal Poisson analysis, with the temporal adjustment "Log linear with automatically calculated trend" (TimeTrendAdjustmentType=3 in the SaTScan parameter file). Read in this automatically calculated temporal trend from the SaTScan text output. Retain the magnitude of the trend ("X") and the sign of X, as determined by "increase" or "decrease."
Example SaTScan text output:
Retrospective Purely Temporal analysis scanning for clusters with high rates using the Discrete Poisson model. Adjusted for time trend with an annual decrease of 6.42984%.
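Reading the trend back from the text output can be sketched in Python as below. This is a hypothetical illustration, not the production SAS code: it assumes the output contains a sentence worded like the example above, and the function names are invented for this sketch. The parameter names TimeTrendAdjustmentType and TimeTrendPercentage are those cited in this section.

```python
import re

# Matches wording such as "an annual decrease of 6.42984%".
_TREND_RE = re.compile(
    r"annual\s+(increase|decrease)\s+of\s+([0-9]*\.?[0-9]+)\s*%",
    re.IGNORECASE,
)


def parse_time_trend(output_text: str) -> float:
    """Return the signed trend: negative for a decrease, positive for an increase."""
    match = _TREND_RE.search(output_text)
    if match is None:
        raise ValueError("no time trend statement found in output text")
    magnitude = float(match.group(2))
    return -magnitude if match.group(1).lower() == "decrease" else magnitude


def trend_parameter_lines(trend: float) -> list[str]:
    """Parameter-file lines for a user-specified log-linear trend."""
    return ["TimeTrendAdjustmentType=2", f"TimeTrendPercentage={trend}"]


if __name__ == "__main__":
    text = (
        "Retrospective Purely Temporal analysis scanning for clusters with "
        "high rates using the Discrete Poisson model. Adjusted for time trend "
        "with an annual decrease of 6.42984%."
    )
    for line in trend_parameter_lines(parse_time_trend(text)):
        print(line)
```

Raising an error when no trend statement is found keeps the daily job from silently running the prospective analysis with a stale or missing trend.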
The time trend is the same for retrospective and prospective analyses. Then, run the prospective spatio-temporal Poisson analysis, inserting the calculated time trend in the parameter file as user-specified (TimeTrendAdjustmentType=2, TimeTrendPercentage=-6.42984 in the SaTScan parameter file).