Using Big Data to Monitor the Introduction and Spread of Chikungunya, Europe, 2017

With regard to fully harvesting the potential of big data, public health lags behind other fields. To determine this potential, we applied big data (air passenger volume from international areas with active chikungunya transmission, Twitter data, and vectorial capacity estimates of Aedes albopictus mosquitoes) to the 2017 chikungunya outbreaks in Europe to assess the risks for virus transmission, virus importation, and short-range dispersion from the outbreak foci. We found that indicators based on voluminous and velocious data can help identify virus dispersion from outbreak foci and that vector abundance and vectorial capacity estimates can provide information on local climate suitability for mosquitoborne outbreaks. In contrast, more established indicators based on Wikipedia and Google Trends search strings were less timely. We found that a combination of novel and disparate datasets can be used in real time to prevent and control emerging and reemerging infectious diseases.


Twitter Data
We developed a mining algorithm and collected tweets by using the Twitter Streaming Application Programming Interface (API) (https://developer.twitter.com). Although the tweets collected from the API represent only 1% of the total Twitter feed, when geographic bounding boxes are used for data collection, they provide a high representation of the overall geo-located activity on Twitter (8). We filtered the collected tweets by location by using geocodes and extracted only those originating from the study area in July, August, and up to September 19, 2017. We longitudinally analyzed 8,120,417 tweets. When tweets from the same users could be followed by geographic coordinates, we obtained users' individual files. We analyzed unidirectional mobility of Twitter users by estimating the frequency of a user being observed in a specific geographic department within the study area and later being observed in any other department within the same month. To compute a rate, we aggregated the total number of movements in a month between any 2 departments and divided this by the total movement across all the departments. All between-department mobility values ranged from 0 to 1 and summed to 1 across the departments for inbound and outbound movements. We derived this quantity as a proxy for mobility proximity between any 2 departments and computed it for each month.
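The rate calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the mining algorithm used in the study: the function name `monthly_mobility`, the input layout (user, timestamp, department), and the rule that a movement is any pair of consecutive observations of the same user in 2 different departments are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def monthly_mobility(observations):
    """Share of all movements made between each ordered pair of departments.

    observations: iterable of (user_id, timestamp, department) tuples,
    all from the same month (hypothetical input layout).
    Returns {(origin, destination): rate}; the rates sum to 1.
    """
    # Group each user's geo-located observations.
    by_user = defaultdict(list)
    for user, timestamp, department in observations:
        by_user[user].append((timestamp, department))

    # Count a movement whenever consecutive observations of the same user
    # fall in different departments (assumed definition of a movement).
    moves = Counter()
    for visits in by_user.values():
        visits.sort()
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:
                moves[(origin, destination)] += 1

    # Normalize by the total movement across all departments.
    total = sum(moves.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in moves.items()}


# Toy data: 4 users, 3 departments labeled A, B, C.
toy = [
    ("u1", 1, "A"), ("u1", 2, "B"), ("u1", 3, "B"),
    ("u2", 1, "A"), ("u2", 5, "C"),
    ("u3", 2, "B"), ("u3", 4, "A"),
    ("u4", 1, "A"), ("u4", 2, "B"),
]
rates = monthly_mobility(toy)
```

In the toy data, 2 of the 4 movements go from A to B, so that pair receives a rate of 0.5, and the rates over all pairs sum to 1.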

Vectorial Capacity
To estimate seasonal variability in the ability of Ae. albopictus mosquitoes to transmit chikungunya virus, we modified our previously established climate-dependent vectorial capacity arbovirus models (9,10). The model uses temperature and diurnal temperature range to estimate the epidemic potential of an outbreak. Theoretically, vectorial capacity is related to R0. More precisely, R0 is the product of vectorial capacity (VC) and the duration of viremia in humans (Th); that is, R0 = VC × Th. Vectorial capacity is a function of vector competence, vector lifespan, and extrinsic incubation period (11) and is defined mathematically in Appendix 3.
The 5 vector-related parameters in the vectorial capacity are 1) average vector biting rate, a; 2) the product of the probability of vector infection (bmi) and transmission per bite (bmt), bm; 3) extrinsic incubation period, n (i.e., the interval between the acquisition of a pathogen by a vector and the vector's ability to then transmit the pathogen to another susceptible host); 4) vector mortality rate, μm; and 5) female vector-to-human population ratio, m.
The effect of temperature on the ability of Ae. albopictus mosquitoes to transmit chikungunya virus has not been well studied. However, μm and a in relation to temperature have been described for Ae. albopictus mosquitoes. We assumed that n and bm would have a dependence on temperature for chikungunya virus transmission similar to that for dengue virus, although we found evidence that bm can be slightly lower, at around 90%, and that n is shorter, peaking at around 8 instead of 10 days (11-13). Similar to a previous study (9), m was assumed to be proportional to its temperature-dependent survival curve. Parameter relationships used in the analysis are provided in Appendix 1 Figure.
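To make the relationship among these parameters concrete, the following Python sketch evaluates vectorial capacity and R0 for a single set of values. The exact expression used in the study is given in Appendix 3, which is not reproduced here; the sketch assumes the commonly used form VC = m × a^2 × bm × exp(−μm × n) / μm. The function names and all numeric values are hypothetical placeholders chosen for illustration, not estimates from the study, and the temperature dependence of the parameters is omitted.

```python
import math


def vectorial_capacity(a, bm, n, mu_m, m):
    """Assumed standard form: VC = m * a^2 * bm * exp(-mu_m * n) / mu_m.

    a    : average vector biting rate (per day)
    bm   : product of probability of vector infection and transmission per bite
    n    : extrinsic incubation period (days)
    mu_m : vector mortality rate (per day)
    m    : female vector-to-human population ratio
    """
    return m * a ** 2 * bm * math.exp(-mu_m * n) / mu_m


def basic_reproduction_number(vc, th):
    """R0 = VC x Th, where Th is the duration of viremia in humans (days)."""
    return vc * th


# Hypothetical placeholder values, for illustration only.
vc = vectorial_capacity(a=0.3, bm=0.4, n=8.0, mu_m=0.1, m=1.5)
r0 = basic_reproduction_number(vc, th=5.0)
print(f"VC = {vc:.3f} per day, R0 = {r0:.3f}")
```

In this form, a longer extrinsic incubation period or a higher mortality rate lowers vectorial capacity, because fewer mosquitoes survive long enough to become infectious.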

Climate Data
We used data from the Climatic Research Unit of the University of East Anglia (14) to estimate the average vectorial capacity for July, August, September, and October during 1996-2015. To describe the effect of warmer-than-usual temperatures, we increased the average monthly temperature to its 75th percentile value for each month and recalculated the vectorial capacity.
The Climatic Research Unit data, originally provided in 0.5° × 0.5° grids by latitude and longitude, were resampled to a 0.01° grid to better align with the geographic departments of the study area.
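The 2 data-preparation steps above can be sketched with NumPy. The sketch rests on assumptions the text does not settle: the 75th percentile is taken across the years of record for each grid cell and calendar month, and resampling is done by nearest-neighbor replication, so that each 0.5° cell becomes 50 × 50 cells of 0.01°. The function names and the synthetic temperature array are hypothetical.

```python
import numpy as np


def monthly_baseline_and_warm(temps):
    """Per-cell mean and 75th percentile across years.

    temps: array of shape (years, lat, lon) holding the temperature of one
    calendar month (e.g., every July during 1996-2015) on the 0.5 degree grid.
    """
    baseline = temps.mean(axis=0)
    warm = np.percentile(temps, 75, axis=0)
    return baseline, warm


def refine_grid(field, factor=50):
    """Nearest-neighbor refinement (assumed method): 0.5 / 0.01 = 50."""
    rows = np.repeat(field, factor, axis=0)
    return np.repeat(rows, factor, axis=1)


# Synthetic example: 20 years on a 4 x 6 block of 0.5 degree cells.
rng = np.random.default_rng(0)
temps = rng.normal(loc=20.0, scale=2.0, size=(20, 4, 6))

baseline, warm = monthly_baseline_and_warm(temps)
fine_warm = refine_grid(warm)  # shape (200, 300) at 0.01 degree resolution
```

Either field could then be passed to a temperature-dependent vectorial capacity model to compare average and warmer-than-usual conditions.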