Daily Reportable Disease Spatiotemporal Cluster Detection, New York City, New York, USA, 2014–2015

Each day, the New York City Department of Health and Mental Hygiene uses the free SaTScan software to apply prospective space–time permutation scan statistics to strengthen early outbreak detection for 35 reportable diseases. This method prompted early detection of outbreaks of community-acquired legionellosis and shigellosis.


Exceptions to Case File Specifications in Main Text
1. Exceptions to the 1-y study period: For analyses with a longer maximum temporal window, the study period is extended accordingly so that the temporal window of interest does not constitute a disproportionately high fraction of the study period. For the 60-d maximum temporal window, the study period is extended to 1.5 y. For shigellosis, outbreaks occur cyclically and have prolonged person-to-person transmission, so the study period was extended to 2 y; as cases in ongoing outbreaks shift over time from the temporal window of interest to the baseline, they then do not constitute a disproportionately high fraction of the study period. Including true outbreak-associated cases in the baseline period of the case file can bias prospective analyses, making it more difficult to detect an elevated number of diagnoses in an area with a past outbreak. To minimize this bias, for example, after a large legionellosis outbreak in the South Bronx (1), all cases citywide with event dates of July 8–August 3, 2015, were removed from the case file. Including nosocomial outbreak-associated cases in the temporal period of interest in the case file can make it more difficult to detect a simultaneous outbreak of community-associated cases. To minimize this bias, after an outbreak of legionellosis in a nursing home, the 7 cases for patients residing in this home during the outbreak period were removed from the case file.
2. Exceptions to including all reported cases: Including all cases assumes that the overall reporting volume is constant over time. This assumption can be violated by geographic changes in laboratory testing practices. For legionellosis, 1 case status (unresolved) is reserved for events for which the only laboratory report is an antibody test; such events do not meet the case definition. Before this case status was excluded, several false legionellosis signals occurred. In addition, one hospital laboratory adopted a new culture-independent diagnostic test for a panel of pathogens causing gastrointestinal illness. Before cases for which the only laboratory report was from this panel were excluded, several clusters of Shiga toxin-producing Escherichia coli were attributable to reports from this facility using this diagnostic test, but the cases did not meet the case definition.
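The case file exceptions above can be sketched in a few lines of Python. This is an illustration only, not the authors' SAS code: the function names, the dictionary layout of a case record, and the day counts used to approximate 1 y, 1.5 y, and 2 y are all assumptions made for this example.

```python
from datetime import date, timedelta

def study_period_start(end, max_window_days, prolonged_transmission=False):
    """Return the first day of a study period that ends on `end`.

    Day counts are assumed approximations of 1 y, 1.5 y, and 2 y.
    """
    if prolonged_transmission:        # e.g., shigellosis
        length_days = 730             # about 2 y
    elif max_window_days >= 60:       # longer maximum temporal window
        length_days = 548             # about 1.5 y
    else:
        length_days = 365             # default 1 y
    return end - timedelta(days=length_days - 1)

def build_case_file(cases, start, end, excluded_windows=(),
                    excluded_statuses=(), excluded_ids=()):
    """Keep cases inside the study period that no exception removes.

    cases: iterable of dicts with hypothetical keys 'id', 'event_date', 'status'.
    excluded_windows: (first_day, last_day) pairs for past community outbreaks.
    excluded_statuses: case statuses that do not meet the case definition.
    excluded_ids: individual cases linked to a nosocomial outbreak.
    """
    kept = []
    for case in cases:
        event = case["event_date"]
        if not (start <= event <= end):
            continue
        if case.get("status") in excluded_statuses:
            continue
        if case["id"] in excluded_ids:
            continue
        if any(first <= event <= last for first, last in excluded_windows):
            continue
        kept.append(case)
    return kept
```

For example, passing `excluded_windows=[(date(2015, 7, 8), date(2015, 8, 3))]` removes all cases citywide with event dates in the South Bronx outbreak period, and `excluded_statuses={"unresolved"}` drops legionellosis events with only an antibody test.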
Technical Appendix Table 1. Analysis parameter settings for routine reportable disease analyses in New York City using the prospective space-time permutation scan statistic.

| Parameter | Parameter setting | Notes |
| --- | --- | --- |
| Analysis type | Prospective space-time | For timely cluster detection, prospective (rather than retrospective) analyses are used, evaluating only the subset of possible clusters that encompass the last day of the study period. Such clusters do not necessarily require the existence of cases at the very end of the study period. To detect acute, ongoing, localized disease clusters, space-time analyses (rather than purely temporal or purely spatial analyses) are used. |
| Model type | Space-time permutation | The space-time permutation probability model has several advantages over other models developed for count data, including requiring only case data and not requiring the assumption that the probability of being diagnosed and reported as a case is independent of location of patient residence. |
| Scan areas | High rates | For disease cluster detection, areas with high (rather than low) rates are of interest. Separate quality control measures are in place to detect unusual drop-offs in disease reporting. |
| Time aggregation and length | 1 d | This setting must equal the interval between prospective analyses. For daily analyses, data must be aggregated into units of one day, corresponding with the daily resolution of data in the case file. |
| Maximum spatial cluster size | 50% of all cases during the study period | The option that imposes the fewest assumptions is to allow the cluster to expand in size to include up to 50% of all cases during the study period. Forcing clusters to be smaller than 50% (or restricting in terms of geographic size by setting a maximum circle radius) can be arbitrary and lead to selection bias. |
| Maximum temporal cluster size | 30 d* | Thirty days is long enough to encompass most cases in point-source clusters, which typically span a few days or weeks, depending on the pathogen. Thirty days may not be long enough for diseases with long incubation periods, extended propagated transmission, or intermittent common-source outbreaks. |
| Maximum number of Monte Carlo replications | 999 | The number of replications must be at least 999 for adequate power. To minimize computing time across the many diseases being independently analyzed each day, we chose not to increase to 9,999 replications. |
| Secondary cluster reporting criteria (output parameter) | No cluster centers in other clusters | Any disease may have multiple active clusters at any moment, so secondary clusters should be reviewed. By reviewing clusters with no cluster centers in other clusters (rather than no, or more, geographic overlap), secondary clusters with some overlap can be detected. |

For each disease regardless of signal status, the daily case input file and SaTScan results files were archived. Cluster summary information (one row per disease per day) was appended to a cumulative SAS dataset to track how clusters strengthened or weakened over time.
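The space-time permutation model listed in Technical Appendix Table 1 needs only case data. Under that model, the expected number of cases in a space-time cylinder is the product of the cylinder's zone total and day total divided by the overall case count. The Python sketch below illustrates that calculation for a single candidate cylinder; it is not the SaTScan implementation, and the function name and the (zone, day) input layout are assumptions made for this example.

```python
from collections import Counter

def observed_and_expected(cases, zones, days):
    """Compare observed and expected case counts for one space-time cylinder.

    cases: list of (zone, day) pairs, one pair per case (assumed layout).
    zones: set of zones (e.g., census tracts) inside the cylinder's circle.
    days:  set of days inside the cylinder's temporal window.
    """
    if not cases:
        raise ValueError("the case file is empty")
    cases_by_zone = Counter(zone for zone, _ in cases)
    cases_by_day = Counter(day for _, day in cases)
    total = len(cases)

    observed = sum(1 for zone, day in cases if zone in zones and day in days)
    # Expected count if the timing of cases were independent of their location:
    # (cases in the zones) * (cases in the days) / (all cases).
    zone_total = sum(cases_by_zone.get(zone, 0) for zone in zones)
    day_total = sum(cases_by_day.get(day, 0) for day in days)
    expected = zone_total * day_total / total
    return observed, expected
```

A cylinder is a candidate cluster when its observed count exceeds the expected count; the scan statistic then evaluates all candidate cylinders and assesses the strongest by Monte Carlo permutation.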
Timely cluster detection is dependent on up-to-date data. Delays resulted when diseases were reported by telephone or fax instead of electronically and reports were still awaiting data entry at the time of analysis. Relevant spatial and temporal data elements obtained from case investigation (e.g., work address and onset date) also needed to be entered quickly.
Strong data quality control procedures must be maintained to avoid missed and false signals. Patient addresses that were not geocodable (e.g., due to typos) were reviewed and corrected if possible to keep those cases in analysis and retain power. Multiple disease reports for the same patient were de-duplicated to avoid false signals. Rather than using a fixed statistical threshold to launch a formal outbreak investigation, it is better to rely on experienced epidemiologists to weigh several factors, including the type of disease, the patient population, and the RI magnitude. Signals suggest excess disease activity and are intended to facilitate investigations, but provide only approximate spatial and temporal boundaries for a suspected outbreak. Some patients within the cluster likely represent background, non-outbreak-associated cases, while some patients outside the cluster may have had the outbreak-causing exposure. In addition, a geographically large cluster may have a less clear public health action compared with highly spatially focused clusters.
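To make the RI magnitude concrete, the following Python sketch assumes that RI denotes the recurrence interval, calculated as the interval between analyses divided by the Monte Carlo p-value. The function names are hypothetical, and the sketch is an illustration under those assumptions rather than the calculation SaTScan performs internally.

```python
def monte_carlo_p_value(observed_stat, replicate_stats):
    """Rank-based p-value: rank of the observed statistic among all values."""
    rank = 1 + sum(1 for stat in replicate_stats if stat >= observed_stat)
    return rank / (len(replicate_stats) + 1)

def recurrence_interval_days(p_value, days_between_analyses=1):
    """Days of analysis expected between chance signals of at least this strength."""
    return days_between_analyses / p_value
```

With 999 replications, the smallest p-value obtainable this way is 1/1,000, so a daily analysis cannot report a rank-based recurrence interval above 1,000 d.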
Analytic adjustments were warranted in response to false, delayed, and difficult-to-interpret signals. False signals were attributable to the inclusion of legionellosis cases with only antibody test results and Shiga toxin-producing E. coli cases with only culture-independent diagnostic test results. These reports did not meet the case definitions and reflected the adoption of specific tests by specific facilities rather than true disease increases. Case files were modified to exclude such cases (main text Table 1). A legionellosis outbreak in the East Bronx in September 2015 was detected later than it might have been, because the only spatial element included in analysis initially was the home address of NYC residents, whereas the outbreak also affected those who worked or attended programs but did not live in the area. The case file was modified to include the NYC work address obtained from case investigation if a NYC home address was unavailable (main text Table 1). In addition, in response to heightened public concern following a large community-acquired legionellosis outbreak, the RI signaling threshold
Several signals were difficult to interpret due to the influence of past outbreaks in the study period. Following a nosocomial legionellosis outbreak in a nursing home, the individual outbreak-associated cases were excluded from the case file. Following large community-associated legionellosis outbreaks, all cases citywide during each outbreak period were excluded from the case file. In addition, a shigellosis outbreak was so prolonged that after 3 months, the center of the cluster appeared to migrate to a different part of the city, despite continued excess shigellosis activity in the original outbreak location. The case file was modified to double the study period so that as older cases in the ongoing outbreak shifted from the temporal cluster window of interest into the baseline period, they would not unduly influence the baseline (main text

Sample SAS Code for Running SaTScan Analyses and Producing Output
This code generates case and parameter files formatted for SaTScan for each disease, reads in the NYC census tract coordinate file, calls SaTScan in batch mode, reads in analysis results, creates output files when a cluster is identified above the RI threshold, and sends e-mails to Bureau of Communicable Disease (BCD) leadership and designated staff for follow-up. The code is partitioned into three files, which are presented here in the order they are run by SAS (5,6). The first file reads in a series of macros that establish SaTScan input, analysis, and output parameters; call SaTScan in batch mode; flag new clusters above the RI threshold; and generate figures and line lists for output. The second file creates an event-level dataset of all diseases and runs iteratively over a list of designated disease-analysis parameter combinations to create disease-specific event-level datasets for analysis in SaTScan, then calls the macros to perform analyses and produce output files. The third file sends e-mails to alert staff to new and ongoing signal summary output and to inform BCD data unit staff that the program has run successfully.
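The control flow described above (iterate over disease-analysis parameter combinations, run the scan, keep clusters above the RI threshold) can be outlined in Python as follows. This is a schematic of the workflow only, not a translation of the SAS macros: `run_scan` is a hypothetical callable standing in for the batch SaTScan call and results parsing, and the dictionary keys are assumed names.

```python
def run_daily_analyses(analyses, run_scan, ri_threshold_days):
    """Run every disease-analysis combination and collect signals.

    analyses: list of dicts, e.g. {'disease': ..., 'max_temporal_days': ...}.
    run_scan: hypothetical callable; takes one analysis dict and returns a
              list of cluster dicts that each carry 'recurrence_interval_days'.
    ri_threshold_days: signaling threshold chosen by the analyst.
    """
    signals = []
    for analysis in analyses:
        for cluster in run_scan(analysis):
            if cluster["recurrence_interval_days"] >= ri_threshold_days:
                # Tag the cluster with its disease for the summary output.
                signals.append({"disease": analysis["disease"], **cluster})
    return signals
```

Passing the scan step in as a function keeps the loop testable without SaTScan installed; in production, that callable would write the case and parameter files, invoke SaTScan, and read the results back.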
Jurisdictions seeking to adopt SaTScan for cluster detection should keep in mind that the parameters macro program presented here (TractParam) is compatible only with SaTScan version 9.1.1. If using a different version of SaTScan, it will be necessary to replace this code with parameters derived from the version of SaTScan to be used for analysis. This is done by setting input, analysis, and output parameters in the SaTScan user interface, saving the SaTScan session as a .txt file, and replacing the code in the body of the TractParam macro program with the text of the saved .txt file. All lines of the substituted text will need to be enclosed in double