Volume 22, Number 10—October 2016
Daily Reportable Disease Spatiotemporal Cluster Detection, New York City, New York, USA, 2014–2015
Each day, the New York City Department of Health and Mental Hygiene uses the free SaTScan software to apply prospective space–time permutation scan statistics to strengthen early outbreak detection for 35 reportable diseases. This method prompted early detection of outbreaks of community-acquired legionellosis and shigellosis.
The Bureau of Communicable Disease (BCD) at the New York City Department of Health and Mental Hygiene (DOHMH) monitors and investigates >70 reportable diseases among the city’s 8.49 million residents. Each day, healthcare providers and laboratories submit ≈1,000 communicable disease reports to BCD. Clusters (significant increases in observed vs. expected cases) and outbreaks (clusters believed to be associated with a common infection source) are detected through several methods, including notification by astute healthcare providers and by applying the modified historical limits method to detect increases in disease counts during the previous 4 weeks (1). This temporal analysis is applied weekly citywide and for each of 5 boroughs and 42 neighborhoods.
Cluster detection methods have been applied to syndromic data sources (e.g., emergency department visits) since the early 2000s (2,3). Less extensively described is cluster detection using reportable disease data, which reflect specific laboratory-confirmed diagnoses, contain patient home addresses, and may include illness onset dates and work addresses collected during patient interviews and medical record reviews. Other public health practitioners have applied purely temporal prospective cluster detection methods to reportable disease data (4,5) or conducted proof-of-concept spatiotemporal prospective analyses (6,7). However, published descriptions of actual prospective application of spatiotemporal methods to reportable diseases are rare (8,9), suggesting lack of widespread adoption among public health officials. We describe BCD’s experience with automated daily reportable disease spatiotemporal cluster detection using prospective space–time permutation scan statistics (3) in SaTScan (10) during February 2014–September 2015, highlighting instances in which findings guided public health action.
For 35 reportable communicable diseases for which cluster detection could inform programmatic activities (1), we analyzed disease counts for patients of all ages combined. For amebiasis, cryptosporidiosis, and giardiasis, for which outbreaks among young children are of particular interest, additional analyses were restricted to disease counts among patients <5 years of age, for 38 total daily analyses.
In BCD’s application, the space–time permutation scan statistic detects disease clusters in space–time cylinders centered on every census tract centroid; the circular base represents space (maximum geographic cluster size of 50% of all reported cases), and the height represents time (maximum temporal window length of 30 days, for most diseases). For each cylinder, a likelihood ratio–based test statistic is calculated. The test statistic is considered elevated if the observed disease count during the time window in census tracts with centroids inside the cylinder’s circular base exceeds the expected number of cases, which is a function of 1) the case count in the circle during a baseline period (which accounts for any purely geographic variations in disease occurrence, diagnosis, and reporting) and 2) the total case count citywide during the time window (which accounts for citywide purely temporal patterns, such as seasonality or secular trends) (3). The cylinder with the maximum test statistic is the cluster least likely to be due to chance under the null hypothesis that the same process generated disease counts inside and outside the cylinder.
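The expected count and test statistic for one cylinder can be sketched as follows. This is a minimal illustration, not BCD's implementation or SaTScan's code: it assumes the formulation in the space–time permutation paper (3) as we read it, in which the expected count is the product of the cylinder's tract totals and day totals divided by the overall total. The function names and the toy `counts` matrix are hypothetical.

```python
import math

def expected_count(counts, tracts, days):
    """Expected cases in a cylinder under space-time independence.

    counts[z][d] holds the cases in tract z on day d (hypothetical layout).
    tracts and days are the indices covered by the cylinder.
    """
    total = sum(sum(row) for row in counts)
    tract_totals = [sum(row) for row in counts]        # purely geographic pattern
    day_totals = [sum(col) for col in zip(*counts)]    # purely temporal pattern
    in_tracts = sum(tract_totals[z] for z in tracts)
    in_days = sum(day_totals[d] for d in days)
    return in_tracts * in_days / total

def log_glr(observed, expected, total):
    """Log likelihood ratio; zero unless observed exceeds expected."""
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    if observed < total:
        llr += (total - observed) * math.log((total - observed) / (total - expected))
    return llr

# Toy data: 3 tracts, 4 days, 10 cases in total.
counts = [[1, 0, 0, 4],
          [1, 1, 0, 0],
          [0, 1, 1, 1]]
mu = expected_count(counts, tracts=[0], days=[3])  # 5 * 5 / 10 = 2.5
stat = log_glr(observed=4, expected=mu, total=10)  # positive, since 4 > 2.5
```

A full scan would evaluate this statistic for every candidate cylinder and keep the maximum.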
To create a simulated dataset, cases’ dates are randomly shuffled and assigned to the original census tracts. The maximum statistic for each simulated dataset is calculated in the same way as for the observed dataset. For each disease, this process is repeated daily 999 times. The maximum value for the observed dataset is ranked among the 999 trial maxima. A p value (range 0.001–1) is derived from this ranking; p = 0.001 represents the highest significance relative to the permutation trials. The Monte Carlo approach to deriving significance by using repeated trials, each permuting observed data attributes, is designed to control for multiple testing.
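The permutation step and the rank-based p value described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not SaTScan's code: cases are represented as hypothetical `(tract, day)` pairs, and the simulated maxima are passed in as a plain list rather than computed by a full scan.

```python
import random

def permute_dates(cases, rng):
    """Shuffle dates across cases while keeping each case's census tract.

    cases is a list of (tract, day) pairs (hypothetical representation).
    Tract totals and day totals are both preserved by the shuffle.
    """
    tracts = [tract for tract, _ in cases]
    days = [day for _, day in cases]
    rng.shuffle(days)
    return list(zip(tracts, days))

def monte_carlo_p(observed_max, simulated_maxima):
    """Rank the observed maximum among the simulated maxima."""
    rank = 1 + sum(1 for s in simulated_maxima if s >= observed_max)
    return rank / (len(simulated_maxima) + 1)

rng = random.Random(0)
shuffled = permute_dates([("A", 1), ("A", 2), ("B", 3)], rng)

# With 999 simulated maxima, the smallest attainable p value is 1/1000.
p_best = monte_carlo_p(10.0, [1.0] * 999)
```

Because only the single maximum from each simulated dataset enters the ranking, the p value refers to the most extreme cylinder anywhere in the scan, which is how the approach accounts for the many cylinders evaluated.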
A recurrence interval (RI) is calculated as the reciprocal of the p value and represents the number of days of daily surveillance required for the expected number of clusters at least as unusual as the observed cluster to be equal to 1 by chance (11). We defined a signal as any cluster with an RI >100 days (equivalently, p <0.01); that is, during any 100-day period of daily analysis, fewer than 1 cluster at least as unlikely as the current cluster would be expected by chance.
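As a small worked example of the relationship between p value and RI, the following sketch converts a p value from a daily analysis into an RI and applies the signal threshold. The function names are hypothetical; the 100-day threshold is the one stated above.

```python
def recurrence_interval(p_value):
    """RI in days for a daily analysis: the reciprocal of the p value."""
    return 1.0 / p_value

def is_signal(p_value, threshold_days=100):
    """Flag a cluster whose RI exceeds the threshold."""
    return recurrence_interval(p_value) > threshold_days

ri = recurrence_interval(0.002)  # about 500 days
flagged = is_signal(0.002)       # True, since 500 > 100
```

With 999 permutations, p values fall on multiples of 0.001, so the RIs reported in this article (for example, 333, 500, and 1,000 days) correspond to p = 0.003, 0.002, and 0.001.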
We developed a SAS program (SAS Institute, Inc., Cary, NC, USA) to generate case and parameter files (Table 1), read in a coordinate file of census tract centroids, invoke SaTScan in batch mode, read analysis results back into SAS for further processing, and output files to secured folders. For any signals, the program also generated emails notifying BCD leadership and staff responsible for follow-up (Technical Appendix).
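The authors' program is written in SAS and is provided in the Technical Appendix. Purely as an illustration of the same workflow in another language, the sketch below aggregates geocoded cases into case-file lines and hands a parameter file to a batch executable. Several details are assumptions: the space-delimited `location count date` layout and the `YYYY/MM/DD` date format reflect our understanding of SaTScan's case file, and the executable path, parameter file path, and tract identifiers are hypothetical placeholders.

```python
import subprocess
from collections import Counter

def build_case_lines(cases):
    """Aggregate (tract_id, date) pairs into 'tract count date' lines.

    Assumes dates are already formatted as YYYY/MM/DD strings.
    """
    tallies = Counter(cases)
    return [f"{tract} {n} {date}" for (tract, date), n in sorted(tallies.items())]

def run_batch(satscan_exe, parameter_file):
    """Invoke the batch executable on one parameter file.

    Both paths are supplied by the caller; nothing here is a documented
    default. A nonzero exit status raises CalledProcessError.
    """
    subprocess.run([satscan_exe, parameter_file], check=True)

# Hypothetical tract identifiers, for illustration only.
lines = build_case_lines([
    ("T001", "2015/07/17"),
    ("T001", "2015/07/17"),
    ("T002", "2015/07/16"),
])
```

In a daily job, one such case file and parameter file would be produced per disease, with results read back afterward to decide whether any signal warrants a notification email.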
This automated analysis detected the second largest US outbreak of community-acquired legionellosis (12), identifying a cluster of 8 cases centered in the South Bronx on Friday, July 17, 2015 (RI = 500 days) (Figure), before any human public health monitor noticed it. On Monday, July 20, an increase in cases was independently noticed by BCD staff members routinely investigating individual cases, and on July 21, an infection-control nurse working in the outbreak area called BCD to report an increase. The DOHMH and state and federal partners conducted an extensive epidemiologic, environmental, and laboratory investigation to identify and remediate the outbreak source, a cooling tower.
A shigellosis outbreak among the observant Jewish community in Brooklyn (13) began in late October 2014 and was detected with 9 cases on November 14, 2014 (RI = 333 days). BCD does not routinely investigate individual shigellosis reports, so automated analysis alone prompted early outbreak identification. Shigellosis outbreaks within this community occur cyclically and have been linked to daycare and preschool attendance (14). Starting in mid-November, BCD staff visited community schools, daycare centers, and health fairs to promote appropriate handwashing. The outbreak subsided by mid-March 2015. Other clusters prompting investigations included legionellosis (Queens, April–May 2015) and campylobacteriosis (Brooklyn, October 2014). During a 1-year period, 28 unique signals were observed across 15 diseases (Table 2), which staff perceived as a reasonable number for investigation.
Not all detected clusters were actionable. No public health response was conducted for an amebiasis cluster (Manhattan, April 2015; RI = 143 days) consisting of 6 men (34–49 years of age) diagnosed within a 12-day period and residing within a 0.35-mile radius because no case-patients were identified as food handlers or daycare workers. A public health response also was not conducted for a giardiasis cluster (Bronx, April 2015; RI = 1,000 days) that consisted of 6 household members who acquired the infection during international travel. Investigators were interested in being notified of and following such clusters over time, even if they ultimately were not actionable or verified as true outbreaks.
Several outbreaks in New York City, New York, were detected by daily automated spatiotemporal analyses. Early cluster detection facilitated prioritization of individual case investigations, outbreak recognition and investigation, provider and community outreach, and timely intervention to limit sickness and death. This method has proven particularly useful for identifying and monitoring outbreaks of shigellosis (6,8,9) and legionellosis and might be useful for monitoring additional diseases with outbreak potential, including pertussis, syphilis, and tuberculosis.
Key to the system’s success is a strong informatics infrastructure, especially electronic laboratory reporting and near real-time geocoding of surveillance data. Other facilitators include a powerful statistical disease surveillance methodology, knowledgeable epidemiologists to interpret signals, and adequate outbreak investigation resources.
These methods could be useful to other health departments receiving more reports than can be rapidly reviewed manually. State health departments could consider conducting similar analyses to detect clusters spanning multiple jurisdictions.
Dr. Greene is director of the Data Analysis Unit at the Bureau of Communicable Disease of the New York City Department of Health and Mental Hygiene, Queens, New York. Her research interests include infectious disease epidemiology and applied surveillance methods for outbreak detection.
We thank Alison Levin-Rector for contributions to the SAS code and Lisa Alleyne, Catherine Dentinger, Robert Fitzhenry, Lucretia Jones, Lan Li, Ellen Lee, Sally Slavinski, Vasudha Reddy, HaeNa Waechter, and Don Weiss for contributions to signal interpretations for particular diseases.
S.K.G., E.R.P., and D.K. were supported by the Public Health Emergency Preparedness Cooperative Agreement (grant 5U90TP000546-03) from the Centers for Disease Control and Prevention. A.D.F. was supported by New York City tax levy funds. M.K. was funded by the National Institutes of Health (grant R01CA165057).
SaTScan is a trademark of Martin Kulldorff. The SaTScan software was developed under the joint auspices of Martin Kulldorff, the National Cancer Institute, and Farzad Mostashari of the New York City Department of Health and Mental Hygiene.
- Levin-Rector A, Wilson EL, Fine AD, Greene SK. Refining historical limits method to improve disease cluster detection, New York City, New York, USA. Emerg Infect Dis. 2015;21:265–72.
- Heffernan R, Mostashari F, Das D, Karpati A, Kulldorff M, Weiss D. Syndromic surveillance in public health practice, New York City. Emerg Infect Dis. 2004;10:858–64.
- Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F. A space-time permutation scan statistic for disease outbreak detection. PLoS Med. 2005;2:e59.
- Hutwagner L, Thompson W, Seeman GM, Treadwell T. The bioterrorism preparedness and response Early Aberration Reporting System (EARS). J Urban Health. 2003;80(Suppl 1):i89–96.
- Rigdon SE, Turabelidze G, Jahanpour E. Trigonometric regression for analysis of public health surveillance data. J Appl Math. 2014;2014:1.
- Jones RC, Liberatore M, Fernandez JR, Gerber SI. Use of a prospective space-time scan statistic to prioritize shigellosis case investigations in an urban jurisdiction. Public Health Rep. 2006;121:133–9.
- Hughes GJ, Gorton R. An evaluation of SaTScan for the prospective detection of space-time Campylobacter clusters in the North East of England. Epidemiol Infect. 2013;141:2354–64.
- Viñas MR, Tuduri E, Galar A, Yih K, Pichel M, Stelling J, et al.; Group MIDAS - Argentina. Laboratory-based prospective surveillance for community outbreaks of Shigella spp. in Argentina. PLoS Negl Trop Dis. 2013;7:e2521.
- Glatman-Freedman A, Kaufman Z, Kopel E, Bassal R, Taran D, Valinsky L, et al. Near real-time space-time cluster analysis for detection of enteric disease outbreaks in a community setting. J Infect. 2016;73:99–106.
- Kulldorff M; Information Management Services, Inc. SaTScan v9.1.1: software for the spatial and space-time scan statistics. 2015 [cited 2015 Sep 24]. http://www.satscan.org/
- Kleinman K, Lazarus R, Platt R. A generalized linear mixed models approach for detecting incident clusters of disease in small areas, with an application to biological terrorism. Am J Epidemiol. 2004;159:217–24.
- New York City Department of Health and Mental Hygiene. Health Alert Network. 2015 alert 21: increase in Legionnaire’s disease in the Bronx [cited 2015 Sep 24]. https://a816-health30ssl.nyc.gov/sites/nychan/Lists/AlertUpdateAdvisoryDocuments/HAN_LegionellaSouthBronx.pdf
- New York City Department of Health and Mental Hygiene. 2014 Alert 39: Outbreak of shigellosis in Borough Park and Williamsburg [cited 2015 Sep 24]. https://a816-health30ssl.nyc.gov/sites/nychan/Lists/AlertUpdateAdvisoryDocuments/HAN_Shigella.pdf
- Garrett V, Bornschlegel K, Lange D, Reddy V, Kornstein L, Kornblum J, et al. A recurring outbreak of Shigella sonnei among traditionally observant Jewish children in New York City: the risks of daycare and household transmission. Epidemiol Infect. 2006;134:1231–6.