Volume 25, Number 6—June 2019
Using Big Data to Monitor the Introduction and Spread of Chikungunya, Europe, 2017
With regard to fully harvesting the potential of big data, public health lags behind other fields. To determine this potential, we applied big data (air passenger volume from international areas with active chikungunya transmission, Twitter data, and vectorial capacity estimates of Aedes albopictus mosquitoes) to the 2017 chikungunya outbreaks in Europe to assess the risks for virus transmission, virus importation, and short-range dispersion from the outbreak foci. We found that indicators based on voluminous and velocious data can help identify virus dispersion from outbreak foci and that vector abundance and vectorial capacity estimates can provide information on local climate suitability for mosquitoborne outbreaks. In contrast, more established indicators based on Wikipedia and Google Trends search strings were less timely. We found that a combination of novel and disparate datasets can be used in real time to prevent and control emerging and reemerging infectious diseases.
Many sectors of society have taken full advantage of new opportunities provided by big data, but public health has not (1). Although electronic health records have long been used in surveillance, novel applications of big data are rare. Internet search query data from Google or Wikipedia have been applied to anticipate influenza epidemics but are hampered by several limitations, including specificity and granularity (2–4). More recently, crowdsourcing of symptoms through emails, text messages, or tweets has been explored, and outbreaks have been tracked by scanning high-volume surveillance systems (5,6). However, when it comes to fully harvesting the potential of big data, public health still lags behind other fields. Using chikungunya as a case study, we illustrate how big data can help tackle emerging infectious diseases through prevention, detection, and response.
A key driver of the emergence and spread of vectorborne diseases is human mobility (7–10), yet little is known about the epidemiologic consequences of mobility patterns at different spatial scales within the context of vectorborne diseases. A main obstacle to studying the complex interactions between human hosts, pathogens, and vectors has been the limited availability of spatiotemporal datasets for analyzing human mobility patterns. Prior research relied on low-resolution mobile phone records, such as call and messaging logs from mobile phone networks (11–13), for which biases were notable (14,15). Furthermore, use of mobile phone data for tracking human mobility is likely to be fraught with privacy concerns and data access restrictions (15).
Recently, social media has emerged as an alternative source of real-time, high-resolution geospatial data on a large scale (1,15). Use of this unique aspect of publicly available social media data to study the human dimensions of the introduction and spread of emerging infectious diseases has not been explored to its fullest extent. In areas where risk for virus importation and onward transmission is heightened, such knowledge can inform outbreak preparedness and response planning by pinpointing receptive areas where proactive countermeasures should be implemented in a timely fashion (16,17).
The impediments to using big data in public health are not only the size of the databases but also the complexity of their processing. The challenges include 3 main dimensions: volume, velocity, and variety (18–20). Volume calls for statistical sampling; velocity, for instant access to near real-time transaction data; and variety, for management of nonaligned data structures. We illustrate how big data can be used to monitor the introduction and spread of the 2017 chikungunya outbreak in Europe by tackling these challenges (18–20).
To assess risk for virus importation from international areas with active chikungunya transmission, we extracted air passenger volume from large-scale aviation data. To quantify the risk for short-range dispersion (defined as the potential for onward transmission and spread of chikungunya virus from the initial outbreak foci to other areas during transmission season), we used a mining algorithm to process quasi–real-time, geolocated Twitter activity data and computed mobility patterns of users. We have previously shown that mobility data from Twitter users is predictive of disease spread (21). We then estimated the seasonal vectorial capacity of Aedes albopictus mosquitoes to transmit chikungunya virus and linked it with human mobility patterns. We further complemented these data with Internet and information search activities related to chikungunya infection, vectors, and clinical signs and symptoms collected from Wikipedia and Google Trends. Last, we estimated the empirical basic reproduction number (R0) from the outbreaks and compared these numbers with our model predictions of epidemic potential based on climate conditions. More detail on our methods is provided in Appendix 1.
The vectorial capacity of Ae. albopictus mosquitoes to transmit chikungunya virus in areas of Europe where the vector is established (17), such as the outbreak zones in France and Italy, was estimated to be high in July and August but lower in September and October. Estimates of suitability were low in October for most areas, except those in southern Italy and Greece and southeastern Spain (Figure 1). Overall, warmer than average temperatures led to a substantial increase in vectorial capacity during the study period (June–October 2017) (Appendix 2 Figure 1). Using empirical data from the outbreaks in Italy (22), we estimated R0 to be 2.28 (95% CI 2.01–2.59) for the Anzio region, 3.54 (95% CI 2.62–4.97) for the Rome region, and 3.11 (95% CI 2.16–4.79) for the Calabria region (Figure 2).
On average, ≈50,000 air passenger-journeys (1 passenger flight, including all legs of travel) were taken each month from areas with active chikungunya transmission worldwide to the outbreak zones (Figure 3). Specifically, in August, 56,300 passengers from outbreak zones were estimated to arrive in Rome, 6,484 in Nice, and 5,629 in Marseille. The passenger-journey volume into Europe when the outbreak started in June is shown in Appendix 2 Figure 2. The countries with the highest number of departing passengers in August were Thailand (352,332 passengers), India (301,298 passengers), and Brazil (255,439 passengers). According to molecular epidemiology, the genome sequence of a chikungunya virus isolate from the Lazio region of Italy revealed the East/Central/South African lineage, Indian Ocean sublineage, which is similar to that of recent sequences from Pakistan and India (23). We also extracted air passenger-journey data for flights from the outbreak zones in southeastern France and central Italy to other areas in Europe (Figure 3). The top 5 destinations with the highest volume were the larger metropolitan areas of Europe, most of which were outside the boundaries of areas where the vector is known to be present (Figure 4). However, high flight connectivity was observed from the outbreak zones to Barcelona (Spain) and Catania and Palermo (Italy).
The spatiotemporal analysis of geocoded Twitter data showed strong human mobility from Lazio (Figure 4) and the Var department in France (Appendix 2 Figure 3) toward several larger cities where Ae. albopictus mosquitoes are present. The top 10 estimates of mobility out of the 2 outbreak zones of Var and Lazio showed the strongest pattern for potential dispersion of chikungunya virus not only into the areas geographically close to the outbreak zones but also to several relatively large cities in Italy, France, and Spain (Table). Mobility patterns varied from month to month during the study period; for example, the vacation month of August showed a stronger mobility pattern out of Var to areas not in direct connectivity, most notably to Rome (Appendix 2 Figure 4). When we contrasted the mobility proximities between the 2 outbreak zones, we observed the highest proximities within countries (Figure 4; Appendix 2 Figure 3). Although the Var and Lazio outbreak zones experienced high mobility proximity to Barcelona, Lazio was also highly connected to southern Italy (e.g., Catania and Palermo), in close proximity to the chikungunya outbreak in the Calabria area, which was also observed in the International Air Transport Association (IATA) flight passenger data (Figures 3, 4). In Italy, cases were first notified in Anzio at the end of June, followed by notifications in Rome later in July and in Calabria in early August (Figure 2). In our mobility analysis, we identified the mobility links to all outbreak regions (Figure 4), with the exception of the Emilia-Romagna region, although the region neighboring Emilia-Romagna was positive in our analysis. The mobility patterns correlated more strongly to the outbreak regions in July and August.
A closer look at the Lazio outbreak zone in Italy revealed strong connectivity between Anzio (where the first cases in Italy were confirmed) and Rome (where a higher number of cases were notified) (Figure 5). We compiled the top 10 mobility proximity areas from the outbreak zones of Anzio and Rome in August and September (Table). Although the highest mobility proximity from Anzio was to Rome in August and September, the mobility proximity from Rome to Anzio was also found among the top 10 destinations. Overall, Rome had higher connectivity to many more areas than Anzio.
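As a rough illustration of how mobility links of this kind could be derived from geocoded posts, the following Python sketch counts how often a user's consecutive posts fall in different areas and ranks the destinations for a given origin. It is not the mining algorithm used in this study (see Appendix 1); the record layout `(user_id, timestamp, area)`, the function names, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict


def mobility_flows(records):
    """Count origin -> destination moves between consecutive posts per user.

    records: iterable of (user_id, timestamp, area) tuples (hypothetical layout).
    Returns a Counter keyed by (origin_area, destination_area).
    """
    posts_by_user = defaultdict(list)
    for user_id, timestamp, area in records:
        posts_by_user[user_id].append((timestamp, area))

    flows = Counter()
    for posts in posts_by_user.values():
        posts.sort()  # chronological order within each user
        for (_, origin), (_, destination) in zip(posts, posts[1:]):
            if origin != destination:  # only count moves between areas
                flows[(origin, destination)] += 1
    return flows


def top_destinations(flows, origin, n=10):
    """Rank destinations reached from `origin` by number of observed moves."""
    outgoing = Counter(
        {dest: count for (src, dest), count in flows.items() if src == origin}
    )
    return outgoing.most_common(n)


# Made-up sample records for illustration only; not study data.
sample = [
    ("u1", 1, "area_a"), ("u1", 2, "area_b"),
    ("u2", 1, "area_a"), ("u2", 3, "area_b"), ("u2", 5, "area_a"),
    ("u3", 1, "area_b"), ("u3", 2, "area_b"),
]
flows = mobility_flows(sample)
ranking = top_destinations(flows, "area_a")
```

Counting moves between consecutive posts is only one possible definition of connectivity; a real analysis would also need to handle posting frequency, time windows, and users who rarely share their location.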
We derived risk maps for autochthonous chikungunya transmission by combining the vectorial capacity and mobility proximity estimates for the Lazio region in Italy and Var department in France for August–October 2017 (Appendix 2 Figure 4). The areas at risk because of the outbreak in Var were identified to be located along the French and northern Spanish Mediterranean coastlines, Mallorca, and Rome in August (Appendix 2 Figure 4); the risk regions for the Lazio outbreak in August included large parts of Italy as well as areas in France, Spain, and Greece (Figure 6). In general, the size of the area at risk contracted in September and more so in October because of less favorable climate conditions, except in the most southern region of Italy (Figure 6), such as the Calabria region, where the outbreak also empirically continued longer in the fall (Figure 2).
In the Lazio region, an analysis of the combination of vectorial capacity (Appendix 2 Figure 5) and mobility proximity revealed a higher transmission potential in August (Appendix 2 Figure 6), with implications for targeting surveillance and outbreak control activities to this region. The largest area of risk for spread from Anzio was Rome, but the risk for spread from Rome was more widespread in the region (Appendix 2 Figures 6, 7). The areas at risk for spread in the Lazio region differed during August compared with September and October.
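One simple way to combine the 2 layers, shown here only as a sketch, is to multiply each area's vectorial capacity by its mobility proximity to the outbreak zone and rescale the result so that the highest-risk area scores 1. The multiplicative form, the function name, and the input values are assumptions made for illustration; they are not the method or the estimates used in this study.

```python
def combined_risk(vectorial_capacity, mobility_proximity):
    """Hypothetical risk index per area: VC x mobility proximity, scaled to a maximum of 1.

    Both arguments are dicts keyed by area name; only areas present in both are scored.
    """
    areas = vectorial_capacity.keys() & mobility_proximity.keys()
    raw = {a: vectorial_capacity[a] * mobility_proximity[a] for a in areas}
    peak = max(raw.values(), default=0.0)
    if peak == 0:
        # No area has both a competent vector season and a mobility link.
        return {a: 0.0 for a in raw}
    return {a: value / peak for a, value in raw.items()}


# Placeholder inputs for illustration only; not study estimates.
vc = {"area_a": 0.8, "area_b": 0.4, "area_c": 0.0}
proximity = {"area_a": 0.5, "area_b": 0.5, "area_c": 1.0}
risk = combined_risk(vc, proximity)
```

Under this formulation, an area with strong mobility links but no climate suitability (area_c in the placeholder data) scores 0, which mirrors how the areas at risk contract when conditions become less favorable later in the season.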
For the outbreaks in Italy, several pathogen and vector-related Wikipedia and Google Trends search pattern anomalies are illustrated (Appendix 2 Figure 8). The peaks in these anomalies coincided with the peak of the outbreak and therefore are not useful for early detection and response activities. Detailed information about Wikipedia and Google Trends indicators is provided in Appendix 3.
In light of the arrival and explosive expansion of chikungunya in the Americas in 2013 through Ae. aegypti mosquitoes (24), big data offer the opportunity to monitor the introduction and spread of chikungunya in Europe. An outbreak can be divided, broadly speaking, into 2 distinct phases. The first phase is importation of the virus via a viremic person into a virus-naive population. For this phase, we used big data (volume) to estimate air passenger-journeys from areas with active chikungunya transmission as a measure of the force of introduction of the virus into the outbreak zones in Europe. To identify areas with onward transmission risk, we also considered the volume of air passengers leaving these outbreak zones. For the second phase, the establishment of autochthonous transmission in Europe is a function of virus importation, population density, vector activity, climate conditions, exposure patterns, and several other factors that are more difficult to quantify (17). Our study addressed some of these epidemiologic challenges by using big data. Rather than a Twitter content analysis, which has been performed for several outbreaks (25–28), we used near–real-time geocoded Twitter data (velocity) to quantify human mobility patterns and disentangle connectivity between populations. Mobility estimates also reflect population density and indirectly take into account exposure patterns because such populations on the move are occasionally susceptible to exposure and are also a source of exposure. The ecology of the virus and the human-vector transmission cycle were captured by vectorial capacity (variety), which quantified transmission risk on the basis of climate conditions. Thus, we were able to quantify the trajectory of an arbovirus outbreak by dissecting and better understanding its phases.
Our analysis of big data revealed distinct mobility patterns between the outbreak zones in France and Italy, between Rome and Anzio, and between Rome and most of the local outbreak clusters in Italy. However, the potential effects of these mobility patterns on local spread need to be confirmed epidemiologically by phylogenetic analyses. Although the sensitivity of our risk maps based on mobility and climate data to identify areas at risk for virus spread was good, the specificity needs to be further improved, for example, by including local contextual factors such as land use and vector activity. Wikipedia page hits and Google Trends have been proposed as resources for disease surveillance and outbreak detection. However, our analysis indicates that these sources mainly reflected public awareness of the chikungunya outbreaks as they peaked. For this reason, they seem to be of little use for early response.
The combination of short-distance air passenger-journeys (within Europe, as opposed to overseas) and geocoded Twitter data lends itself to cross-validation. We found that the 2 approaches consistently identified several cities with established vector populations at a heightened risk for virus importation, reflecting the potential for spread between countries and cities in Europe. Some of these regions had previously encountered autochthonous transmission (29).
The R0 estimates, which were derived by using epidemiologic data, were in accordance with the vectorial capacity predictions for the outbreak zones based on local climate conditions. Based on the vectorial capacity, R0 can be derived by multiplication with the infectious period. For chikungunya, an infectious period of 3–7 days was reported (30). The vectorial capacity of ≈0.7 would give rise to an R0 of ≈2–3. This range is within that which we observed in the Rome and Anzio regions in July and August, but the vectorial capacity was estimated to be higher (≈0.8) in the Calabria region, translating into an R0 of just over 3–4, which is in agreement with the epidemiologic analysis of the outbreak data (Figure 2).
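The multiplication described above can be written out as a short Python sketch. The function names are hypothetical, and the inputs are the approximate values quoted in the text (a daily vectorial capacity of ≈0.7 or ≈0.8 and an infectious period of 3–7 days); the calculation shows only the arithmetic, not the model used to estimate vectorial capacity.

```python
def r0_from_vectorial_capacity(vc_per_day, infectious_days):
    """R0 as daily vectorial capacity multiplied by the infectious period in days."""
    return vc_per_day * infectious_days


def r0_range(vc_per_day, min_days, max_days):
    """R0 at the shortest and longest infectious period, rounded to 1 decimal place."""
    return (
        round(r0_from_vectorial_capacity(vc_per_day, min_days), 1),
        round(r0_from_vectorial_capacity(vc_per_day, max_days), 1),
    )


# Approximate values quoted in the text.
low_07, high_07 = r0_range(0.7, 3, 7)
low_08, high_08 = r0_range(0.8, 3, 7)
```

Across the full 3–7 day range, the product is wider than the ≈2–3 quoted for a vectorial capacity of ≈0.7; that figure corresponds to infectious periods toward the shorter end of the reported range.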
Although our mobility analysis showed that the local mobility from Var was considerable, no autochthonous chikungunya cases were reported from other identified risk regions along the Mediterranean coast of France and in northern Spain. However, the vectorial capacity of Ae. albopictus mosquitoes to transmit the virus is lower in Var than in Lazio, which may explain this discrepancy. Previous studies assessing the risk for local outbreaks after outbreaks outside of Europe found that inbound flight traveler frequencies correlated strikingly well with local reports of virus importation frequencies into Europe (9). However, most of these studies evaluated these risks independently and did not attempt to estimate the combined risk for virus importation and climate suitability (31,32). Moreover, they did not assess local dispersion patterns from airports or outbreak areas. We analyzed big data for long- and short-distance mobility. A major strength of this big data approach is the near real-time availability of mobility patterns based on social media, which are timelier and more accessible and less costly than air passenger data available from commercial providers, such as the IATA. This approach can identify areas of heightened mobility that are potentially at risk for onward transmission, as we have shown in this analysis. Geocoded Twitter data can be a good proxy for human mobility (15), but prior research did not explore how such data can be a timely resource for preparedness and response to infectious disease outbreaks.
Similar to others who have used IATA and Twitter data in their studies, we found these novel data sources to be reliable and useful. However, we note that Twitter data can potentially be biased because Twitter users may represent a select population whose mobility patterns differ from those of the general population; more specifically, they represent a population of Twitter users who have allowed Twitter to follow their geolocations. Future studies need to validate the use of social media data in such applications. These methods are an improvement over mobile telephone tracking data because they do not rely on a single provider network and are a less costly data source to acquire.
Seasonal weather forecasts may have provided better input into the assessment of vectorial capacity, specifically for the fall of 2017. Moreover, autochthonous transmission risk may also be related to local proliferation of vectors and local environmental, social, and behavioral characteristics, such as awareness about the symptoms of chikungunya (Appendix 3). Such factors have been found to be associated with the local transmission risk for dengue (33). Last, because of the paucity and underreporting of chikungunya cases, we may have underestimated the passenger volume from active transmission areas in Africa.
This study illustrates the potential value of using big data (18–20) to pinpoint areas at risk for the introduction and dispersion of emerging infectious diseases. The analysis identified that the areas at greatest risk were those in close proximity to the original outbreaks and several larger metropolitan areas. The trajectory and sustained spread of emerging infectious diseases can be anticipated with predictive modeling in real time. This study suggests that big data can be an indispensable tool for the prevention and control of emerging infectious diseases.
J.R. received partial funding from the Swedish Research Council for Sustainable Development (FORMAS) (no. 2017-01300). The funder had no influence on the research conducted.
Dr. Rocklöv is professor of epidemiology and public health at Umeå University. His research interests focus on how climate, environmental, biological, medical, and social information can benefit preparedness and control of infectious diseases and be applied to early warning and response systems.
- Simonsen L, Gog JR, Olson D, Viboud C. Infectious disease surveillance in the big data era: towards faster and locally relevant systems. J Infect Dis. 2016;214(suppl_4):S380–5.
- Butler D. When Google got flu wrong. Nature. 2013;494:155–6.
- Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature. 2009;457:1012–4.
- McIver DJ, Brownstein JS. Wikipedia usage estimates prevalence of influenza-like illness in the United States in near real-time. PLOS Comput Biol. 2014;10:
- Health HSoP. HealthMap. 2016 [cited 2017 Jan 5]. http://www.healthmap.org
- World Health Organization [cited 2017 Jan 5]. http://www.who.int/csr/alertresponse/epidemicintelligence
- Semenza JC, Lindgren E, Balkanyi L, Espinosa L, Almqvist MS, Penttinen P, et al. Determinants and drivers of infectious disease threat events in Europe. Emerg Infect Dis. 2016;22:581–9.
- Semenza JC, Rocklöv J, Penttinen P, Lindgren E. Observed and projected drivers of emerging infectious diseases in Europe. Ann N Y Acad Sci. 2016;1382:73–83.
- Semenza JC, Sudre B, Miniota J, Rossi M, Hu W, Kossowsky D, et al. International dispersal of dengue through air travel: importation risk for Europe. PLoS Negl Trop Dis. 2014;8:
- Stoddard ST, Morrison AC, Vazquez-Prokopec GM, Paz Soldan V, Kochel TJ, Kitron U, et al. The role of human movement in the transmission of vector-borne pathogens. PLoS Negl Trop Dis. 2009;3:
- Bengtsson L, Gaudart J, Lu X, Moore S, Wetter E, Sallah K, et al. Using mobile phone data to predict the spatial spread of cholera. Sci Rep. 2015;5:8923.
- Finger F, Genolet T, Mari L, de Magny GC, Manga NM, Rinaldo A, et al. Mobile phone data highlights the role of mass gatherings in the spreading of cholera outbreaks. Proc Natl Acad Sci U S A. 2016;113:6421–6.
- Wesolowski A, Metcalf CJ, Eagle N, Kombich J, Grenfell BT, Bjørnstad ON, et al. Quantifying seasonal population fluxes driving rubella transmission dynamics using mobile phone data. Proc Natl Acad Sci U S A. 2015;112:11114–9.
- Bansal S, Chowell G, Simonsen L, Vespignani A, Viboud C. Big data for infectious disease surveillance and modeling. J Infect Dis. 2016;214(suppl_4):S375–9.
- Jurdak R, Zhao K, Liu J, AbouJaoude M, Cameron M, Newth D. Understanding human mobility from Twitter. PLoS One. 2015;10:
- Semenza JC, Zeller H. Integrated surveillance for prevention and control of emerging vector-borne diseases in Europe. Euro Surveill. 2014;19:20757.
- Semenza JC, Suk JE. Vector-borne diseases and climate change: a European perspective. FEMS Microbiol Lett. 2018;365:365.
- Laney D. 3D data management: controlling data volume, velocity, and variety [cited 2019 Apr 3]. https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
- Heitmueller A, Henderson S, Warburton W, Elmagarmid A, Pentland AS, Darzi A. Developing public policy to advance the use of big data in health care. Health Aff (Millwood). 2014;33:1523–30.
- Gandomi AHM, Haider M. Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manage. 2015;35:137–44.
- Ramadona AL, Tozan Y, Lazuardi L, Rocklöv J. A combination of incidence data and mobility proxies from social media predicts the intra-urban spread of dengue in Yogyakarta, Indonesia. PLoS Negl Trop Dis. 2019;13:
- Istituto Superiore di Sanita. Italy: autochtonous cases of chikungunya virus [cited 2019 Mar 21]. http://www.salute.gov.it/portale/temi/documenti/chikungunya/bollettino_chikungunya_20171221.pdf
- Carletti F, Marsella P, Colavita F, Meschi S, Lalle E, Bordi L, et al. Full-length genome sequence of a chikungunya virus isolate from the 2017 autochthonous outbreak, Lazio region, Italy. Genome Announc. 2017;5:e01306–17.
- Leparc-Goffart I, Nougairede A, Cassadou S, Prat C, de Lamballerie X. Chikungunya in the Americas. Lancet. 2014;383:514.
- Chew C, Eysenbach G. Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak. PLoS One. 2010;5:
- Signorini A, Segre AM, Polgreen PM. The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS One. 2011;6:
- Kim EK, Seok JH, Oh JS, Lee HW, Kim KH. Use of hangeul twitter to track and predict human influenza infection. PLoS One. 2013;8:
- Broniatowski DA, Dredze M, Paul MJ, Dugas A. Using social media to perform local influenza surveillance in an inner-city hospital: a retrospective observational study. JMIR Public Health Surveill. 2015;1:
- Italy Ministry of Health. National plan of surveillance and response to arbovirus transmitted by mosquitoes (aedes sp.), with particular reference to chikungunya, dengue and zikaviruses–2017 [cited 2019 Apr 3]. http://www.salute.gov.it/portale/temi/documenti/chikungunya/bollettino_chikungunya_20171221.pdf
- Centers for Disease Control and Prevention. Travelers health. Chapter 3. Infectious diseases related to travel: chikungunya [cited 2018 Oct 16]. https://wwwnc.cdc.gov/travel/yellowbook/2018/infectious-diseases-related-to-travel/chikungunya
- Rocklöv J, Quam MB, Sudre B, German M, Kraemer MUG, Brady O, et al. Assessing seasonal risks for the introduction and mosquito-borne spread of Zika virus in Europe. EBioMedicine. 2016;9:250–6.
- Faria NR, Azevedo RDSDS, Kraemer MUG, Souza R, Cunha MS, Hill SC, et al. Zika virus in the Americas: Early epidemiological and genetic findings. Science. 2016;352:345–9.
- Reiter P, Lathrop S, Bunning M, Biggerstaff B, Singer D, Tiwari T, et al. Texas lifestyle limits transmission of dengue virus. Emerg Infect Dis. 2003;9:86–9.
Original Publication Date: 5/3/2019