Comprehensive Profiling of Zika Virus Risk with Natural and Artificial Mitigating Strategies, United States

Zika virus is transitioning to become a long-term public health challenge, and countries should remain informed of the risk for emergence. We developed a stochastic epidemiologic model to profile risk for Zika virus emergence, including trimester-specific fetal risk across time, in all 3,208 counties in the United States, including Puerto Rico. Validation against known transmission in North America demonstrated accuracy to predict epidemic dynamics and absolute case counts across scales (R2 = 0.98). We found that, although sporadic single transmission events could occur in most US counties, outbreaks will likely be restricted to the Gulf Coast region and to late spring through autumn. Seasonal fluctuations in birth rates will confer natural population-level protection against early-trimester infections. Overall, outbreak control will be more effective and efficient than prevention, and vaccination will be most effective at >70% coverage. Our county-level risk profiles should serve as a critical resource for resource allocation.

transmission and EIP are borrowed from the large body of literature surrounding dengue virus dynamics, as it is a closely related but more completely studied mosquito-borne flavivirus that shares the same mosquito vector host system. Such a strategy for closely related viruses is common for emerging pathogens, and has been successfully employed for other Zika virus transmission models (3,4).

Spatial Properties
To best inform at the county-level (and by extension also inform at state and national levels), our aim was to describe the potential for Zika virus transmission within each county and municipality independently. Simulations thus assume that an index case arrives in that county, regardless of whether the case arrives from a neighboring county or via a traveler from a distant country. The current investigation therefore provides information on potential for Zika virus transmission in a given county in a manner that is source agnostic, and thus equally useful for a county sharing a border with a nearby county with an ongoing epidemic, or a county likely to receive travelers coming from Zika infected regions.

Mosquito-Human Interactions per Day (m)
For each county, a maximum number of mosquitos-human interactions per day (λ = mmax) during peak vector abundance was calculated using high resolution (5km x 5km) maps detailing the global spatial distribution of Aedes aegypti and Aedes albopictus (5). In brief, these maps, produced by Kraemer et  Additionally, these global maps, which are freely available for download (http://goo.gl/Zl2P7J) agree with the estimated range of Aedes aegypti and Aedes albopictus in the U.S. as put forth by the U.S. Centers for Disease Control and Prevention (6,7). Using these high resolution Aedes occurrence probability maps, we calculated mean probability of occurrence of Ae. aegypti and Ae. albopictus for each county or municipality using border data retrieved from the Global Administrative Areas or GADM database, freely available at http://www.gadm.org, and easily accessible in R using the getData() function of the Raster package (8). The mean occurrence probability for each county was converted to vector abundance as described by Perkins,et al. (3) In brief, by assuming that mosquito abundance is distributed as a Poisson random variable, Perkins et al. nicely point out that the expected abundance of mosquitos (λ) in a given location (i) can be calculated from the probability of a mosquito (PM) at the i th location as PMi = 1-exp(-λi).
Thus, the expected peak abundance of mosquitos available per person per day (λ) can be calculated as λ = -ln(1 -PM), and λ is thus taken to approximate the maximum mean daily number of mosquitos available per person. Importantly, because the probability of occurrence maps were generated to reflect the extent of the geographic distribution of Ae. aegypti and Ae.
albopictus, as mentioned above, λ is considered to reflect the maximum expected number of mosquitos per person, or the number of mosquitos available per person during peak vector abundance. λ therefore represents an upper bound of m, the maximum average number of mosquitos per person per day (calculated separately for each Aedes vector) throughout the year.
For each county, simulations were initiated at time of peak vector abundance (see below and Appendix Figure 2) and as simulations progressed, mosquitos available per human changed dynamically. For each simulation, λ was used to initialize the mosquito populations ( 0 ) such that: 0 = , where represents the total human population in the respective county.
Simulations were thus initiated during the month of the year in each county when vector abundance was expected to be greatest (see below). Once initialized however, vector populations were modeled explicitly via a series of stochastic ordinary differential equations (ODE) and thus human-mosquito interactions changed dynamically. At peak mosquito abundance (i.e. = ), the rate of mosquito bites per person per day (m) was given by = = , Where represents the mean number of daily blood meals taken by each respective Aedes vector, and the proportion of blood meals taken on humans. To account for fluctuating mosquito vector dynamics, which occur over orders of magnitude between summer and winter, the mosquitohuman bite rate (m) was constantly updated at every time-step to account for changes in the individual Aedes mosquito populations, as well as the number of blood meals taken per day, which is dependent on the duration of the gonotrophic cycle. Blood meals per mosquito per day (τ) was defined as the number of expected feeding events that a mosquito takes per day; calculated as the number of feeds, including interrupted feeding patterns per gonotrophic cycle: two and four for Ae. Aegypti and Ae. Albopictus, respectively (9,(11)(12)(13)(14), divided by the duration (days) of the gonotrophic cycle. were fit to the monthly data to model expected proportions of annual births per month per county. For counties with populations fewer than 50,000, stochastic effects masked clear seasonality of human birth rates. Therefore, for those counties with populations below 50,000, the proportion of annual births in each month were pooled within a given state (excluding counties with populations above 50,000) and GAMs were fit to the pooled statewide data. The county-level (for counties ≥50,000) or statewide (counties <50,000) GAM output, indicating the expected proportion of annual births in each calendar month, was then coupled to annual birth numbers for each individual county to calculate monthly county-specific expected pregnancies.

Trimester-Specific Pregnancies and Exposure Calculations
From the monthly birth data for each county, we calculated the number of pregnancies in their first, second or third trimester during each month for each county or municipality. To achieve this, we assumed that within a given month, births were uniformly distributed and that each trimester is 13.33-weeks long, for a 40-week gestation. Therefore, for example, among January births, we assume that all of those pregnancies delivering in January were in their third trimester in January (up until delivery), as well as in December and November, and that half were in their third trimester in October (those with deliveries in the first half of January), and half were in their second trimester in October (those with deliveries in the second half of January). We iterated this routine for each trimester and month for each county to derive numbers of pregnancies in each trimester across all months.
To calculate infections during pregnancy during the simulations, we assumed that pregnancies were among women who remained well-mixed within the population. Given the large numbers of simulations, for efficiency, rather than including numerous compartments for women in each trimester of their pregnancy into our SEIR model, using the assumption of homogenous mixing, we derived the number of fetal exposures per trimester per week for each simulation by drawing from a binomial distribution, with the size equal to the number of first, or second, or third trimester pregnancies in the county, during the week of interest, and with a probability equal to the proportion of the population infected during that week. Therefore, for example, to calculate number of first trimester pregnancies infected during week 36 in a given county, q, in R we coded: rbinom(n = 1, size = n_tri1_wk36_q, prob = prop_Inf_wk36_q), where n_tri1_wk36_q is the number of first trimester pregnancies expected in week 36 for county q, and prop_Inf_wk36_q is the proportion of the county (q) population infected during that week.
By drawing from a binomial distribution, we incorporate stochastic effects that influence the number of infections among gravid women, relative to the proportion infected across the population as a whole.

Model
The model is a stochastic SEIR / SEI model with an additional two equations to describe mosquito egg, larval/pupal, and adult development for simulating mosquito populations as functions of temperature throughout the year in each county. The model is defined as follows (Appendix Figure 1 and equations below).

SEIR Model (Human)
Where SH, EH, IH, RH, RHs indicate compartments for the susceptible, exposed, infectious, recovered following infectious period, and recovered following non-infectious period, respectively. Additionally, SM, EM, IM, LV and Egg indicate population compartments for susceptible, exposed and infectious mosquitos, as well as larvae and egg populations. Definitions for each of the variables and rate parameters are presented in Appendix Table 1.

Simulations
Simulations were run using an adaptation of the Gillespie Stochastic Simulation Algorithm (SSA), adaptive tau-leaping algorithm on a continuous time scale. Tau-leaping was initially described by Gillespie (20). Briefly, tau-leaping procedures achieve efficient implementations of the SSA by relaxing the requirement to model every change (step) in the system -a hugely resource consuming process -by using a Poisson approximation to take "leaps" over fast (stable) steps, while well approximating the stochastic behavior that would be expected over the duration (tau) of the leap. A limitation of tau-leaping however is that the leap size must be pre-specified. When the system is stiff and, by definition predictable, such as during the exponential growth phase of an epidemic, relatively large leaps can be taken because the Poisson approximation is sufficient to approximate the behavior of the system. In fact, under stiff (stable) conditions, with very fast steps and large populations, the tau leaping method limits to the explicit Euler method. However, when the system is nonstiff, and a particular single event can have profound effects on the simulation, i.e., a "critical reaction," such as following introduction of an index case to a community or, more generally, when an extinction of a compartment is possible, the leap sizes must be much smaller. Therefore, if critical reactions are possible, the leap sizes must be reduced, eventually approximating the fundamental Gillespie SSA such that no more than a single critical reaction can take place during a given time-step or leap, also ensuring a population cannot become negative. Because epidemic behavior shifts from nonstiff systems (for example just following introduction of an index case, when extinction of an infectious disease in a community is likely, or when mosquito populations become very low in winter months) to very stiff systems (for example once an epidemic takes off in a susceptible population) declaring a prespecified leap size will either risk missing critical reactions when tau is large but the system is nonstiff, or will be highly inefficient if tau is small but the system is stiff. The adaptive tau-leaping procedure solves this problem by monitoring the stiffness of the system and determines tau by automatically switching between explicit (best for nonstiff systems) and implicit (best for stiff systems) tau selection formulas to determine tau size compatible for the leap conditions at each step. The details of tau selection, including switching between implicit-and explicit-tau selection over the course of a simulation is outside of the scope of this description, and we refer to initial work by Cao, Gillespie and Petzold (21).

Simulation Initialization
Initial mosquito abundance per person was given by λ, the maximum mosquito:human ratio described in previous sections and every simulation was initialized at the time of peak vector abundance. For each simulation however, a 'settling' period of at least 1 year was incorporated to allow mosquito population dynamics to be driven by their temperature functions, rather than purely by their initialized values. This was important to allow for simulations with index cases introduced at any time of the year, not just at peak vector abundance (initialization). To determine the start-month for each county's simulations (i.e., the month of peak Aedes abundance when TM/TH = λ for each county) we ran 36-month simulations for every county and municipality and recorded calendar month when vector abundance was maximized during each year of the simulations. The calendar month recorded to have highest average mosquito population was taken to be the start month of all further simulations for each respective county (this was important to determine because mosquito populations were initialized by their maximum abundance). Nevertheless, sensitivity analysis of the starting month demonstrated no effect of the particular starting month on outcomes, owing to the winter bottleneck during the minimum 1-year 'settling' period (these results not shown). This was further confirmed by checking that the month of peak Aedes abundance was consistent from year one to year three.
Additionally, our findings of month of peak vector abundance are in agreement with results For each simulation and each vector, Aedes abundance was initialized by λ mosquitos available per person, and the simulation was allowed to progress for at least a full year to allow mosquito dynamics to be driven by their respective temperature dependent functions, as described above, before a single infected individual (index case) was input into a fully susceptible population at a time specified by the simulation of interest, as described in the main text.

Duration
For each simulation, epidemics were initiated by a single index case and were allowed to progress either until 4 consecutive months passed without any new infections, or for a maximum 60 months.

Design and Analysis
For all of the simulations described in the main text, at least 500 simulations were run for each county or municipality and for each scenario, requiring 1.6 million simulations for a full evaluation of single scenario (for example, a June index case in the setting of 10% vaccination is one scenario). We explored running as many as 1000 simulations per county, however results were generally robust, with minimal added information gained beyond even 200 simulations.
Thus, 500 simulations were chosen owing to the already long run times per simulation (3. Summary statistics discussed in the text were calculated from daily summaries (Appendix Table   2).

Computational Environment
All simulations were performed in R installed on Amazon Web Services® elastic cloud We therefore coupled the monthly fractional abundance to the estimated case counts by Chevalier et al. to derive monthly incidence and case counts between April and August 12 th , 2016, which was the duration over which the blood donor screening data was reported. 6 (27,28) No θ: probability that a human who enters the exposed class will become infectious (regardless of symptomatic vs. asymptomatic state) 0.8 (Assumes a fraction of exposed individuals will not become symptomatic nor infectious (distinct from just asymptomatic The probability that an index case will result in at least a single transmission event. In other words, the fraction of simulations with at least with local transmission event.
1A, 3A, 3B, 3D, 3E, 3G, 3H Minimal transmission The idea of defining a county as having 'minimal transmission' is to be highly sensitive to detect places where transmission is possible (even if very unlikely). A county where 'minimal transmission' is achieved is defined as a county that had at least 1 local transmission event occur in at least 0.5% of simulations. Within a given simulation, minimal transmission was considered to be a single local transmission event.

Proportion infected
Among those simulations where minimal transmission from index cases is achieved, the median proportion of the population infected.

Infections
Among those simulations where minimal transmission from index cases is achieved, the median number infected.

1C, 1E
Incidence per 100,000 Among those simulations where minimal transmission from index cases is achieved, the median number infected per 100,000 population.
1D, 3C, 3F, 3I Standardized prevalence (per trimester) Among counties in the southeastern United States and Texas, county-specific prevalence of pregnancy for each respective trimester per month (see Trimester-specific pregnancies and exposure calculations) was standardized for plotting qualitative dynamics on a single scale by scaling to a mean prevalence across months of zero and standard deviation of one.

Standardized incidence
Among those simulations where minimal transmission from index cases is achieved, monthly incidence was standardized for plotting qualitative dynamics on a single scale by scaling to a mean incidence across months of zero and standard deviation of one.

2A
Exposure risk ratios The relative risk of exposure to Zika virus during the first (Figure 2, panel B) or second trimester (Figure 2, panel C) of pregnancy compared to the risk of Zika virus exposure during the third trimester of pregnancy, using data in Figure 2, panel A. This was calculated by first isolating all simulations per county with at least a minimum number of infections during pregnancy. For each simulation in each county, the total number of infections per trimester was calculated (described in the methods) and divided by the total number of pregnancies to obtain a trimester specific ZKV exposure incidence per simulation. For each county, the median trimester specific ZKV exposure incidence was calculated and the risk ratio per county was taken as the median (across simulations) trimester 1 (Figure 2,  . Notable is the difference in the month that maximizes incidence versus the month that maximizes probability of transmission, and that this difference is primarily apparent in the top 10% of counties for each respective metric, but not the bottom 80%, suggesting that among the bottom 80%, transmission chains are not limited by onset of winter, but rather likely die out within a short time-frame, regardless of month of import, and thus overall incidence is driven primarily by mosquito density at time of import. Appendix Figure 13. Monthly incidence of infections throughout Puerto Rico, in the absence of reintroductions. Incidence per month per municipality when index cases were introduced in late 2015 or early 2016 to match timing of initial cases reported in the current epidemic in Puerto Rico. Data shown, and that reported in the text, is absent of reintroductions. Therefore, once an outbreak takes off (cases >10), no reintroductions occur. In the absence of reintroductions here, most municipalities fail to sustain transmission through the winter, suggesting that sustained transmission will be driven by reintroduction events from municipalities where transmission persists uninterrupted, in particular the metropolitan centers of San Juan, Bayamón, Caguas and Ponce. Beyond the first year, these estimates likely underestimate the true state-wide incidence because reintroduction events are expected to be common due to human movement.