Volume 27, Number 5—May 2021

Detecting COVID-19 Clusters at High Spatiotemporal Resolution, New York City, New York, USA, June–July 2020

Sharon K. Greene, Eric R. Peterson, Dominique Balan, Lucretia Jones, Gretchen M. Culp, Annie D. Fine, and Martin Kulldorff
Author affiliations: New York City Department of Health and Mental Hygiene, Long Island City, New York, USA (S.K. Greene, E.R. Peterson, D. Balan, L. Jones, G.M. Culp, A.D. Fine); Harvard Medical School, Boston, Massachusetts, USA (M. Kulldorff)

Main Article

Table 1

Input file specifications for SARS-CoV-2 test percent positivity cluster detection analyses in New York City, NY, USA, June–July 2020*

Feature Selection Notes
Geographic aggregation
Census tract (defined by using US Census 2010 boundaries) of residential address at time of report
The less aggregated the data, the more precisely areas with elevated rates can be identified. New York City has 2,165 census tracts located on land. If geocoding is not feasible, ZIP code could be used instead, but with a loss of spatial precision.
Case file
Unique persons reported with a positive result for a molecular amplification detection (PCR) test for SARS-CoV-2 RNA in a clinical specimen. Retain specimen collection date of first positive test.
Confirmed COVID-19 cases (Interim-20-ID-01_COVID-19.pdf)
Population file
Unique persons reported with a molecular amplification detection (PCR) test for SARS-CoV-2 RNA in a clinical specimen. For persons who ever tested positive, retain specimen collection date of first positive test. Otherwise, retain most recent specimen collection date. For a given census tract and date, if no specimens were collected, then include in file as having 0 population.
Necessary to control for spatial and temporal variability in testing access. A census-based population denominator would not control for variable testing uptake because the number of persons tested is not necessarily proportional to population size.
Omissions from input files
Residents of long-term care facilities, correctional facilities, facilities housing people with developmental disabilities, or homeless shelters; persons whose home address matches selected providers or facilities; persons diagnosed in the 14 d before a more recent case residing in the same building identification number from geocoding; persons with COVID-19 illness onset (where available from patient interview) >14 d before specimen collection.
To focus on detecting recent community-based transmission, exclude residents of congregate settings because building-level clusters are detected by using other methods (13), persons whose listed home address is not a residence, >1 case/building, and patients whose diagnosis was made long after illness onset.
Date of interest for analysis
Specimen collection date
Defining reportable disease clusters according to when patients became ill is preferred, although a large proportion of COVID-19 infections are asymptomatic. Specimen collection date is the earliest date available for the study population of persons tested.
Study period
21 d for analysis to support prioritization of case investigations; since June 1, 2020, for analysis to support place-based resource allocation
Defining a study period >3 times the maximum temporal window helps with statistical power. Extending the study period further may decrease the accuracy of the log-linear temporal trend adjustment but might be of interest for detecting more prolonged clusters. If citywide percent positivity reaches an inflection point (e.g., begins to increase again after a period of decrease), either the study period would need to be temporarily shortened and reset after that inflection point to preserve suitability of a log-linear temporal trend adjustment, or a nonparametric temporal trend adjustment could be used. For a longer temporal window, June 1, 2020, was selected as the earliest date when the citywide percent positivity trend seemed stable without an inflection point. After 63 d had elapsed from June 1, 2020, the analysis switched to a 63-d rolling study period until the next inflection point was reached.
Lag for data accrual
3 d
Given lags between specimen collection and report, exclude very incomplete data at end of study period when estimating the temporal trend. Three days is the minimum lag possible to preserve a timely analysis while allowing for at least some data to be reported, geocoded, and analyzed before open of business.

*The prospective Poisson-based space-time scan statistic was used. COVID-19, coronavirus disease; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
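
The case file, population file, study period, and data-accrual lag rows of Table 1 can be illustrated with a minimal Python sketch. This is not the authors' code. The function names (`study_window`, `build_input_files`), the record layout (person ID, census tract, specimen collection date, positive flag), and the choice to end the study period exactly 3 d before the run date are assumptions made for illustration. The omissions listed in the table (congregate settings, repeat cases in a building, late diagnoses) are assumed to have been applied upstream and are not shown.

```python
from collections import Counter
from datetime import date, timedelta


def study_window(run_date, study_days=21, lag_days=3):
    """Return the inclusive (start, end) dates of the study period.

    Assumption: the period ends lag_days before the run date, so the
    most recent, very incomplete days are left out.
    """
    end = run_date - timedelta(days=lag_days)
    start = end - timedelta(days=study_days - 1)
    return start, end


def build_input_files(tests, tracts, start, end):
    """Build case and population rows from (person, tract, date, positive) records.

    Each person contributes one record: the first positive specimen
    collection date if they ever tested positive, otherwise the most
    recent collection date.
    """
    retained = {}  # person -> (tract, date, positive)
    for person, tract, day, positive in tests:
        kept = retained.get(person)
        if kept is None:
            retained[person] = (tract, day, positive)
        elif positive and (not kept[2] or day < kept[1]):
            retained[person] = (tract, day, positive)  # earliest positive wins
        elif not positive and not kept[2] and day > kept[1]:
            retained[person] = (tract, day, positive)  # otherwise latest negative

    cases, population = Counter(), Counter()
    for tract, day, positive in retained.values():
        if start <= day <= end:
            population[(tract, day)] += 1  # everyone tested is in the denominator
            if positive:
                cases[(tract, day)] += 1

    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Tract-dates with no specimens collected appear with a population of 0.
    population_rows = [(t, d, population[(t, d)]) for t in tracts for d in days]
    case_rows = [(t, d, n) for (t, d), n in sorted(cases.items())]
    return case_rows, population_rows


# Hypothetical records: person "a" tests negative and then positive twice;
# person "b" tests negative twice.
tests = [
    ("a", "T1", date(2020, 6, 10), False),
    ("a", "T1", date(2020, 6, 12), True),
    ("a", "T1", date(2020, 6, 14), True),
    ("b", "T1", date(2020, 6, 11), False),
    ("b", "T1", date(2020, 6, 12), False),
]
start, end = study_window(date(2020, 6, 17))
case_rows, population_rows = build_input_files(tests, ["T1", "T2"], start, end)
```

In this example, person "a" is counted once on the first positive date and person "b" once on the most recent negative date, so tract T1 has 1 case among 2 persons tested on June 12, and every other tract-date is written with a population of 0.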


Page created: March 09, 2021
Page updated: April 26, 2021
Page reviewed: April 26, 2021
The conclusions, findings, and opinions expressed by authors contributing to this journal do not necessarily reflect the official position of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions. Use of trade names is for identification only and does not imply endorsement by any of the groups named above.