Characterizing Norovirus Transmission from Outbreak Data, United States

Norovirus is the leading cause of acute gastroenteritis outbreaks in the United States. We estimated the basic (R0) and effective (Re) reproduction numbers for 7,094 norovirus outbreaks reported to the National Outbreak Reporting System (NORS) during 2009–2017 and used regression models to assess whether transmission varied by outbreak setting. The median R0 was 2.75 (interquartile range [IQR] 2.38–3.65), and median Re was 1.29 (IQR 1.12–1.74). Long-term care and assisted living facilities had an R0 of 3.35 (95% CI 3.26–3.45), but R0 did not differ substantially for outbreaks in other settings, except for outbreaks in schools, colleges, and universities, which had an R0 of 2.92 (95% CI 2.82–3.03). Seasonally, R0 was lowest (3.11 [95% CI 2.97–3.25]) in summer and peaked in fall and winter. Overall, we saw little variability in transmission across different outbreak settings in the United States.

Norovirus is the most common cause of outbreaks of acute gastroenteritis (AGE) in the United States (1,2). The Centers for Disease Control and Prevention (CDC) collects data on AGE outbreaks through the National Outbreak Reporting System (NORS). During 2009-2017, norovirus was the suspected or confirmed etiology of 47% of AGE outbreaks reported to NORS (3). The size and severity of outbreaks vary across different settings, times of year, and genotypes, suggesting norovirus transmissibility is variable across different outbreak settings and contexts (4). Generally, the transmission potential of infectious diseases is influenced by the infectiousness of the pathogen, the duration of infectiousness, and the number of susceptible contacts exposed during the infectious period (5).
The reproduction number is a metric for quantifying transmissibility of a pathogen. The basic reproduction number (R0) is the average number of secondary cases that arise from a primary case in a completely susceptible population. The effective reproduction number (Re) quantifies the average number of secondary cases that arise from a primary case in a population that is not completely susceptible. Re varies over the course of an outbreak as the proportion of the susceptible population changes (6,7). R0 and Re are not just metrics of the biologic properties of pathogens but also measures of the transmissibility of a pathogen within a specific population or setting (8,9).
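As a brief worked illustration, and assuming the standard homogeneous-mixing relation between the two quantities (the symbols S and N, the number susceptible and the exposed population size, follow the definitions used later in the Methods), the effective reproduction number at the start of an outbreak can be written as the basic reproduction number scaled by the susceptible fraction. With the 47% susceptibility assumed in this study, the reported medians are approximately consistent with that relation:

```latex
R_e = R_0 \cdot \frac{S}{N}, \qquad 2.75 \times 0.47 \approx 1.29
```

The agreement is only approximate because medians are taken across outbreaks and S is rounded and adjusted per outbreak.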
Several transmission modeling studies in different settings have estimated R0 and Re of norovirus, but these estimates vary widely, with R0 ranging from 1.1 to 7.2 (10). Much of the R0 variation likely is due to differences in the structures, population mixing assumptions, and data between transmission models in different settings (10). Generally, model estimates from community surveillance data result in an R0 of ≈2, but estimates from outbreak data tend to be higher and more variable. The variability of estimates from models that use outbreak data likely is driven by context; outbreaks might occur in populations that are not representative of the population as a whole, and transmission likely is higher in these settings than in the community (4).
We estimated R0 and Re for thousands of norovirus outbreaks in the United States. We evaluated whether R0 was associated with setting, season, year, or geographic region. In addition, we assessed whether R0 differed between outbreaks in which norovirus was suspected and those in which it was confirmed as the cause.

Data
An outbreak was defined as >2 epidemiologically linked cases of suspected or laboratory-confirmed norovirus. NORS data consist of web-based reports of all foodborne, waterborne, and enteric disease outbreaks transmitted by contact with environmental sources, infected persons or animals, or unknown modes of transmission reported by state, local, and territorial public health agencies. This web-based reporting system collects epidemiologic information, including the dates; settings, such as long-term care facilities, child daycare facilities, hospitals, and schools; geographic location of the outbreak; the estimated total number of cases; and exposed population (2). For settings that report staff and guest case numbers, we included these data in the estimated total number of cases and exposed population. CaliciNet data consist of sequence-derived genotypes and epidemiologic data from norovirus outbreaks submitted from local, state, and federal public health laboratories. We obtained CaliciNet genotypes that were linked to outbreak data we acquired from NORS.
For all outbreaks reported to NORS, data are collected on the total estimated primary cases, including all laboratory-confirmed and suspected primary cases. These data exclude cases associated with secondary illnesses, such as person-to-person norovirus transmission in households after a restaurant-based outbreak. However, data for calculating attack rates, specifically the number of exposed persons and the subset of the exposed persons who became ill, are only collected for outbreaks with person-to-person, environmental, or unknown transmission modes. In addition, data collected from outbreaks might not be documented consistently across a report. For example, the setting-specific total number of guests and staff reported to be ill, referred to as total ill, might not match the reported total estimated primary cases. During 2009-2017, a total of 17,822 suspected and confirmed norovirus outbreaks were reported to NORS. We excluded 10,728 outbreaks based on the following criteria, which we imposed hierarchically: transmission was not person-to-person (n = 3,866); the outbreak exposure occurred in multiple states (n = 8); the outbreak occurred in Puerto Rico (n = 3), which we excluded because of small sample size; the size of the total exposed population or the major setting was not reported (n = 5,573); the total estimated primary cases and the total ill among the exposed population were not equal (n = 1,231); or the total estimated primary cases or the total ill among the exposed population were reported to be greater than the total exposed population size (n = 47) (Appendix Figure 1, https://wwwnc.cdc.gov/EID/article/26/8/19-1537-App1.pdf). In all, 7,094 norovirus outbreaks met our inclusion criteria and were included in subsequent analyses (Appendix Table 1). We did not use imputation techniques to infer values for missing data because no good proxy variables were available to infer missing data for major settings and exposed population size.
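The hierarchical exclusion described above can be tallied step by step. The short sketch below simply replays the counts reported in the text to confirm that they sum to the stated totals; the function and variable names are illustrative and not part of NORS or the original analysis.

```python
# Counts are copied from the text; names below are hypothetical helpers.
REPORTED_TOTAL = 17_822

# Exclusion criteria in the order they were applied (hierarchically).
EXCLUSIONS = [
    ("transmission not person-to-person", 3_866),
    ("exposure occurred in multiple states", 8),
    ("outbreak occurred in Puerto Rico", 3),
    ("exposed population size or major setting not reported", 5_573),
    ("primary cases and total ill not equal", 1_231),
    ("cases greater than exposed population size", 47),
]


def apply_exclusions(total, exclusions):
    """Subtract each exclusion in order; return the final count and a step log."""
    remaining = total
    steps = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        steps.append((reason, n_excluded, remaining))
    return remaining, steps


remaining, steps = apply_exclusions(REPORTED_TOTAL, EXCLUSIONS)
total_excluded = sum(n for _, n in EXCLUSIONS)

for reason, n_excluded, left in steps:
    print(f"{reason:<55} -{n_excluded:>6,} -> {left:>6,}")
print(f"total excluded: {total_excluded:,}; included: {remaining:,}")
```

Because the criteria were imposed hierarchically, each count refers only to outbreaks not already removed by an earlier criterion, which is why the counts can be summed without double counting.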

Estimating R0 and Re
We used the final size method to calculate R0, Re, and associated SEs (12; Appendix). The final size method calculates R0 and Re based on 3 variables: the total population size of the outbreak (N), the total number of cases in the outbreak (C), and the number of susceptible persons at the start of the outbreak (S). In our calculations, C was informed by NORS outbreak data for the estimated total number ill and N by the exposed population. NORS data do not include, and cannot inform, the number of susceptible persons at the start of an outbreak. Therefore, to estimate S, we used norovirus challenge study data on the percentage of persons who become infected and develop AGE after challenge with virus. Across all published studies, the weighted average of participants in whom gastroenteritis developed after challenge is 47% (range 27%-80%; Appendix Table 1) (13-19). We assumed S is the number of persons susceptible to disease, as opposed to infection. To calculate S, we multiplied 47% by N and rounded to the nearest integer. For 890 outbreaks, the total number of cases, C, was greater than our estimated S; for these outbreaks we set S equal to C, corresponding to a 100% attack rate. We also calculated S assuming 27% and 80% of N were susceptible to assess the sensitivity of our model results to this parameter.
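To make the roles of N, C, and S concrete, the sketch below implements a discrete analogue of the classic final-size relation, R0 ≈ (N/C) ln[S/(S − C)], with Re taken as R0 × S/N. This is an illustration under stated assumptions, not the authors' estimator: the exact formula and its SE are given in the Appendix and reference 12 and may differ in detail. The function names and the example outbreak (100 exposed, 20 ill) are hypothetical.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with halves rounded up."""
    return int(math.floor(x + 0.5))


def final_size_estimates(n_exposed, n_cases, frac_susceptible=0.47):
    """Return (S, R0, Re) from a final-size style calculation.

    Assumption: the log term ln(S / (S - C)) is replaced by the sum of 1/i
    for i = S-C+1 .. S, which stays finite when every susceptible is infected.
    """
    if not 0 < n_cases <= n_exposed:
        raise ValueError("need 0 < cases <= exposed population")
    s = round_half_up(frac_susceptible * n_exposed)
    if n_cases > s:
        # As in the text: set S equal to C (a 100% attack rate among susceptibles).
        s = n_cases
    log_term = sum(1.0 / i for i in range(s - n_cases + 1, s + 1))
    r0 = n_exposed / n_cases * log_term
    re = r0 * s / n_exposed  # effective number at the start of the outbreak
    return s, r0, re


# Sensitivity to the assumed susceptible fraction for one hypothetical outbreak.
for frac in (0.27, 0.47, 0.80):
    s, r0, re = final_size_estimates(100, 20, frac)
    print(f"susceptible={frac:.0%}  S={s}  R0={r0:.2f}  Re={re:.2f}")
```

In this sketch, a lower assumed susceptible fraction yields a higher R0 for the same observed outbreak, because the same number of cases must arise from a smaller susceptible pool.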

Regression Analysis
After estimating R0, Re, and associated SEs for each norovirus outbreak, we fit a linear regression model to the log-transformed estimated reproduction numbers to assess whether outbreak setting, census region, season, year, suspected or confirmed norovirus, or genotype were associated with transmissibility. All variables were categorical, where the reference was assigned as the group with the most outbreaks reported, except for the suspected or confirmed variable, for which we set the referent to outbreaks with confirmed norovirus etiology. We used weighted least squares combined with estimated standard errors to produce robust estimates accounting for heteroscedasticity and non-normally distributed model residuals by using the estimatr package in R version 3.4.2 (20,21). We included the following variables in our models: outbreak setting; census region; meteorological season, defined as spring (March 1-May 31), summer (June 1-August 31), fall (September 1-November 30), or winter (December 1-February 28); year, defined as July-June; whether norovirus was suspected or confirmed; and norovirus genotype, categorized as GI, GII.4, or GII.non4.
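The season and year definitions above translate directly into a date classification. The following sketch shows one way to derive both categorical variables from an outbreak date; the function names and the "2012-2013" label format are hypothetical, and treating February 29 as winter is an assumption, because the text gives February 28 as the end of the season.

```python
from datetime import date


def meteorological_season(d):
    """Map a date to spring (Mar-May), summer (Jun-Aug), fall (Sep-Nov), or winter."""
    if 3 <= d.month <= 5:
        return "spring"
    if 6 <= d.month <= 8:
        return "summer"
    if 9 <= d.month <= 11:
        return "fall"
    return "winter"  # December through February, including February 29 (assumed)


def july_june_year(d):
    """Label the July-June year that contains the date, e.g. '2012-2013'."""
    start = d.year if d.month >= 7 else d.year - 1
    return f"{start}-{start + 1}"


for d in (date(2012, 6, 30), date(2012, 7, 1), date(2012, 12, 15)):
    print(d.isoformat(), meteorological_season(d), july_june_year(d))
```

Defining the year as July-June keeps each winter season within a single year category rather than splitting it across two calendar years.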
For outbreaks for which we calculated R0 and Re, we had norovirus genotype data for only 22% (1,571). In a preliminary analysis, we fit a univariate linear regression model to estimate R0 by norovirus genotype alone and by norovirus genotype and year and found no evidence for variation (Appendix). Given these results and the small sample size, we did not include norovirus genotype in our models and performed model selection on the remaining variables. To determine which variables to include, we used a forward selection process and selected the model with the lowest Akaike information criterion and Bayes information criterion values.

Sensitivity Analysis
We tested the sensitivity of our regression model results to different modeling approaches and different assumptions of the percent susceptible at the start of an outbreak. We also fit a logistic regression model of binary transmission and a negative binomial regression of the final outbreak size by using the log-transformed exposed population size as a measure of the attack rate of an outbreak. Thus, we could make comparisons between the models to see if the results from modeling continuous transmission were consistent with the results of modeling binary transmission and attack rates. In addition, we ran all the regression models again under the assumptions that 27% and 80% of the population were susceptible at the start of an outbreak, which correspond to the minimum and maximum percent susceptible to AGE from published challenge studies (Appendix Table 2).

Model Selection and Regression Analysis
The final selected model included the following variables: major setting, census region, season, year, and whether norovirus was suspected or confirmed (Akaike information criterion = 5,803; Bayes information criterion = 5,968) (Appendix Table 3). For long-term care and assisted living facilities, R0 was 3.35 (95% CI 3.26-3.45). R0 for outbreaks in all other settings did not differ substantially, except for outbreaks in schools, colleges, and universities, in which R0 was slightly reduced, 2.92 (95% CI 2.82-3.03) (Table 2; Appendix Figure 2). We found that R0 differed substantially by outbreak status; suspected norovirus outbreaks had a lower R0, 3.02 (95% CI 2.94-3.10), than that for confirmed outbreaks (R0 = 3.35; Figure 2). Our findings were generally robust to assumptions about the proportion susceptible at the start of the outbreak and whether we modeled the outcome of R0, Re, or final outbreak size (Appendix Tables 4-6, Figure 3).
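The two information criteria reported for the selected model can be cross-checked against each other. Under the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, their difference depends only on the number of estimated parameters k and the sample size n. The sketch below back-calculates k, assuming that all 7,094 outbreaks entered the model and that the reported values follow these definitions; the function name is hypothetical, and the rounding of the reported criteria leaves some uncertainty.

```python
import math


def implied_parameter_count(aic, bic, n_obs):
    """Solve BIC - AIC = k * (ln(n) - 2) for k, the number of estimated parameters."""
    return (bic - aic) / (math.log(n_obs) - 2.0)


# Reported values from the text; n_obs = 7,094 is an assumption about the model fit.
k = implied_parameter_count(aic=5803, bic=5968, n_obs=7094)
print(f"implied number of parameters: {k:.1f}")
```

The result is about 24 parameters, which is plausible for an intercept plus dummy variables for setting, census region, season, year, and outbreak status. Whether a residual variance term is counted depends on the software.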

Discussion
By using a large national outbreak dataset, we investigated transmission patterns of norovirus outbreaks. Our analysis led to several key findings. First, reported norovirus outbreaks in the United States have modest R0 (2.75 [IQR 2.38-3.65]) and Re (1.29 [IQR 1.12-1.74]) values. Second, we found that R0 and Re did not vary across most settings, except for outbreaks in schools, colleges, and universities, which had lower estimated transmission values. Third, we found higher transmission in laboratory-confirmed outbreaks relative to suspected outbreaks and higher transmission for outbreaks occurring in the winter months relative to summer months.
Our finding that norovirus outbreaks in the United States have modest transmission values is somewhat surprising. In a recent review of norovirus modeling studies, Gaythorpe et al. (10) found R0 estimates for norovirus were 1.1-7.2. Of note, R0 and Re estimates from transmission modeling studies that analyzed data from norovirus outbreaks were high, and variability between studies was also high; Re estimates were ≈1-14 (22-24). Our estimates are within the range of reproduction numbers estimated by using transmission models of norovirus based on outbreak data (22,25).
However, our estimates are higher than those from several studies that estimated reproduction numbers by using population-level transmission models (26-29), suggesting that transmission of norovirus in outbreak settings is higher than sporadic transmission in the community.
From our main analysis, we found that outbreaks in schools, colleges, and universities had lower estimated transmission, but transmission varied little across all other settings. Relative to outbreaks in long-term care and assisted living facilities, outbreaks that occurred in private homes or residences and restaurants had higher final sizes, and schools, colleges, and universities had lower estimated attack rates. Our finding that outbreaks in the winter had higher estimated transmissibility than outbreaks that occurred in summer likely reflects the strong wintertime seasonality of noroviruses in the United States (30,31). Consistent with this finding are the observations that norovirus case and outbreak reports are inversely correlated with temperature (30,31) and that survival of norovirus surrogate viruses, such as murine norovirus and feline calicivirus, declines with increasing temperatures (32,33).
Several differences we found might be driven by surveillance biases rather than differences in norovirus transmission. Suspected norovirus outbreaks without a laboratory-confirmed outbreak etiology had lower transmission than laboratory-confirmed norovirus outbreaks, perhaps because suspected outbreaks are not investigated as well as confirmed outbreaks and have lower rates of case ascertainment. Outbreaks reported in the south had higher estimated R0 and Re relative to outbreaks in the northeast, which might be related to differences in the quality of reporting between these regions. For example, if surveillance in certain regions only captured larger, more easily detectable outbreaks with higher attack rates, this could bias our estimates of transmissibility upwards. Tremendous variability exists in outbreak reporting between states, with an ≈100-fold difference between the highest and lowest reporting states, which likely affects the observed outbreak characteristics we included (34). Similarly, NORS has been collecting outbreak reports since January 2009, but in August 2012 CDC began a concerted effort to improve norovirus outbreak reporting to NORS and CaliciNet with the introduction of NoroSTAT (35,36). Thus, our finding that norovirus outbreaks reported before August 2012 were larger and had higher estimated R0 and Re values might be related to CDC's efforts to capture outbreaks that previously would not have been reported, such as smaller outbreaks. Further, because the transmission mode can be difficult to identify for norovirus outbreaks, our analysis might have included outbreaks for which the mode of transmission was misclassified as person-to-person. Larger outbreaks with higher transmission are more likely to be reported, and our results might not reflect transmission in smaller outbreaks. In addition, the exposed population size is difficult to quantify and is not consistently reported to NORS.
Thus, the differences we found in estimated attack rates across different settings could be due to true variability in the exposed population size across settings or variability in the reliable reporting of the exposed population size. However, our analysis restricted to outbreaks in long-term care and assisted living facilities found the same trends among the variables for outbreak status, census region, season, and year as our analysis of all outbreaks, which suggests the results are robust. Our study has several additional limitations. First, our process of data selection might have introduced bias into our analyses. We excluded outbreaks that occurred in multiple states, which are likely to have higher transmissibility given the larger geographic range involved; however, only 8 multistate outbreaks occurred during the study period, thus the bias is likely negligible. A substantial proportion of the dataset, 5,573 (31%) outbreaks, had to be excluded because the exposed population size was not reported. Excluding these outbreaks could introduce bias if the exposed population size is more likely to be reported for outbreaks with smaller, or larger, exposed population sizes. We only included outbreaks with person-to-person transmission; thus, our estimates of transmissibility are not generalizable to norovirus outbreaks where transmission occurs via other modes, such as foodborne, waterborne, or environmental transmission.
A second set of limitations relates to the final size method. This method assumes a susceptible-infected-recovered type infection in a homogeneously mixing population (12), but this simplification likely does not reflect true mixing patterns. In addition, we might observe different mixing patterns in each of the different outbreak settings, such as older persons in long-term care facilities versus young children in childcare. The final size method also underestimates reproduction numbers for outbreaks with high attack rates. For example, in private homes, attack rates are high, but exposed population sizes are small. If everyone in the household is infected, then no additional infections can occur in the home. Thus, the final size method cannot capture any additional transmission that could have happened if the exposed population size had been larger, such as a higher number of persons in the household. Becker termed this limitation the "wasted infection potential" (37). Further, the final size method does not account for the effect of control measures. For some of the outbreaks represented in our dataset, control measures were most likely implemented, such as isolating ill persons and cleaning contamination. Such interventions likely would reduce the number ill, and the estimated R0 would be lower than the R0 in the absence of control measures. In addition, the final size method assumes that the proportion of susceptible persons is known at the start of an outbreak; however, the level of susceptibility to norovirus is not well known. Certain host genetic factors are associated with the ability of norovirus to establish an infection within a human host (38-41), leading to variable susceptibility to norovirus infection (42-44). Secretor-negative persons have nonfunctional fucosyltransferase-2 genes, causing infection failure for norovirus genogroups I and II type 4 (38,40,41,45,46).
Our estimates of R0 and Re assume that 47% of the population in our dataset is susceptible at the start of all outbreaks. However, the proportion susceptible varies among outbreaks and potentially over time and age as the distribution of circulating norovirus genotypes changes. Further, our regression model estimates were sensitive to our assumption of the percent susceptible at the start of an outbreak. When we assumed 47% and 80% of the population was susceptible, the estimated transmissibility of norovirus in private homes or residences and restaurants was higher than transmissibility in long-term care and assisted living facilities. However, when we assumed 27% of the population was susceptible at the start of an outbreak, the association for private homes or residences and restaurants reversed. These settings then had lower estimated transmission relative to outbreaks in long-term care and assisted living facilities because the population size that can be infected is much lower, thus reducing the estimates of R0 and Re. For example, if a household had 15 persons, the maximum possible R0 assuming 27% susceptibility is 4, which is lower than the average predicted R0 for outbreaks in the reference group. Therefore, the results for private homes or residences and restaurants, where exposed population sizes are lower, should be interpreted with caution because transmission values in these settings might be underestimated. We also assumed that only symptomatic persons contribute to transmission in our calculation; persons with asymptomatic norovirus infections can contribute to transmission, but they likely are not as infectious as persons with symptomatic infections (22,47).
Finally, our main analysis does not account for norovirus genotype. Because of the limited data available on outbreak genotype, we were not able to fully assess whether certain genotypes were more transmissible. As more genotyping data become available, future studies should investigate whether transmissibility differs by genotype.
We estimated reproduction numbers by using the final size method for >7,000 outbreaks from a national outbreak reporting system, then used these estimates to examine factors associated with norovirus transmission. Our analyses suggest that norovirus transmission rates are modest. Such modest Re values suggest there are opportunities for effective control measures to curtail transmission of norovirus. However, challenges remain. Transmission by asymptomatic persons, which we did not account for in this analysis and which generally goes undetected in surveillance, can limit the effectiveness of traditional control methods focused on ill persons, even for pathogens with modest transmission (48).
Overall, we found limited variation in R0 and Re for reported norovirus outbreaks in the United States, particularly across different settings. Our findings highlight the need for better data on the total exposed population sizes in outbreaks, which heavily influence estimates of attack rates, R0, and Re, to further refine estimates of these outbreak factors.