Comparison of 2016–17 and Previous Epizootics of Highly Pathogenic Avian Influenza H5 Guangdong Lineage in Europe

We analyzed the highly pathogenic avian influenza (HPAI) H5 epizootic of 2016–17 in Europe by epidemiologic and genetic characteristics and compared it with 2 previous epizootics caused by the same H5 Guangdong lineage. The 2016–17 epizootic was the largest in Europe by number of countries and farms affected and greatest diversity of wild birds infected. We observed significant differences among the 3 epizootics regarding region affected, epidemic curve, seasonality, and outbreak duration, making it difficult to predict future HPAI epizootics. However, we know that in 2005–06 and 2016–17 the initial peak of wild bird detections preceded the peak of poultry outbreaks within Europe. Phylogenetic analysis of 2016–17 viruses indicates 2 main pathways into Europe. Our findings highlight the need for global surveillance of viral changes to inform disease preparedness, detection, and control.

All statistical analyses were performed in STATA 14 (StataCorp, College Station, TX, USA).
Epidemiologic analyses were done to: • Describe the size of the epizootics: frequency of poultry outbreaks and wild bird incidents by country and poultry type or wild bird species were described.
• Investigate differences in geographic spread: spatial analyses were performed using ArcMap 10.2.2 (ESRI, USA) to visualize locations of poultry outbreaks and wild bird incidents, and to assess spatial progression of the epizootics by month. In addition, geographic spread was examined in relation to major wild bird migratory patterns. Countries were grouped into 4 regions based on the broad migration patterns of the major migratory water bird species affected by HPAI (Technical Appendix Figure 1). Justification for these groupings can be found under  • Investigate differences in temporal spread : We compared the different epidemic curves. We tested for differences in the shape of epidemic curves using a 2-sample Kolmogorov-Smirnov test to measure the temporal distance to the median point of the poultry epizootic (day at which half of poultry outbreaks have occurred). In addition, seasonality was compared (period of the year where cases have occurred).
• Investigate differences in clinical presentation: Differences in the number of mass dieoff events observed in wild bird species were tested as a measure of disease impact in wild birds.

Page 3 of 17
Differences in clinical presentation in poultry were tested by comparing poultry morbidity and mortality data for each species in each epizootic. Estimation on morbidity using ADNS data was used from farms rearing only 1 species when possible (e.g., only ducks).
Morbidity was calculated based on 3 variables collected on ADNS:  At risk = number of animals at risk of infection on the farm at time of investigation  Cases = number of cases observed on the farm at time of investigation  Deaths = number of bird dead on the farm at time of investigation Due to different system users inputting data to the ADNS databank, it is suspected that while most observations in the "At risk" data field correspond to the total number of birds on the farm before the outbreak (deceased, moribund and healthy birds), in some instances this field may have been interpreted to represent the number of healthy animals remaining on the farm, or the total number of animals within an epidemiologic unit on the farm. For the "Cases" data field, it is suspected that some entries include both deceased and moribund birds within the estimation, while some entries only report moribund birds, alive with clinical signs. The following assumptions were made for the calculation of morbidity:  Number of "At risk" was believed to be the total number of animals on the farm.
 The "Cases" field includes both the number of dead and moribund birds, except in those entries where the number of deaths is larger than number of cases (74 entries, which is 9.3% of poultry morbidity estimates). When the number of deaths is bigger than number of cases, the total number of cases was believed the sum of "cases" and "deaths."  When the "Cases" field was blank, and the number of "Deaths" was known, the number of deaths was used to represent the number of cases observed on the farm (n = 265, which is 33.4% of morbidity estimates).  All farms with null "Cases" and "Deaths" were discarded, as these were considered to be farms without clinical disease, reported as dangerous contacts due to links with infected farms.

A1.2. Phylogenetic Analysis
Virus haemagglutinin (HA) gene sequence data were obtained from countries' submissions to EURL; sequences generated as part of this study; and from the Global Initiative on Sharing All Influenza Data (GISAID) platform downloaded on June 2, 2017. To determine the genetic relationships among strains circulating in each epizootic, we analyzed viral sequence data from each outbreak separately. Haemagglutinin (HA) gene sequences of viruses from each epizootic were first subject to a quality control step where all duplicate sequences, sequences with <900 nt, and those with >10% undefined nucleotides (Ns) were discarded. Sequences in each dataset were aligned using MAFFT v7.305b, and trimmed to remove signal sequence at the N terminus and STOP codon at the C-terminus. Trimmed sequences were realigned using prank v.170427 with model parameters codon (empirical codon model) and F (force insertions to always be skipped).
IQ-TREE version 1.5.5 was used to infer maximum-likelihood trees from each dataset.
The codon model was determined by the Model Finder algorithm within IQ-TREE, and both alrt (approximate likelihood ratio test, 1000 replicates) and bootstrap (100 replicates) were calculated to determine support for branching. The "best tree" was then annotated in Root-to-tip regression analyses were performed using Tempest v1.5 on the downsampled datasets before Bayesian phylogenetic trees were inferred using BEAST v1.8.4 to determine the mean substitution rate and TMRCA (time to most recent common ancestor). The HKY site model was used with estimated base frequencies and gamma site heterogeneity with 4 gamma categories and 3 codon partitions. An uncorrelated relaxed lognormal clock was used with constant population prior and random starting tree. All priors were set to default except allMus which was set to a uniform distribution ranging from 0 to 1E100. MCMC was set to 100 million generations. Log files were analyzed in Tracer v1.6.0, to determine convergence, and check ESS values were beyond threshold (>200). Tree annotator v1.8.4 was used to generate a maximum credibility tree (MCC) using 10% burnin and median node heights. The MCC tree was Page 5 of 17 then annotated to include time scale and mean substitution rate in FigTree v 1.4.3. The nucleotide diversity for each outbreak was measured using the PopGenome package in R.

A1.3. Biases Associated with ADNS Data
The following data gaps and uncertainties were detected in the poultry data: Bias could also be introduced in the following ways: sensitivity of passive surveillance (procedures for detection by state vet services), intensity of scanning surveillance (serology), frequency of serology positive PCR follow-ups (e.g., sampling at slaughter influence proportion of PCR follow-ups?), and changes to intensity of surveillance following recent outbreaks in close proximity. Differences on implementation of passive surveillance in wild birds by countries are shown in the EU Avian Influenza surveillance report (2).

The following data gaps and uncertainties were detected in the wild bird ADNS data:
Biases related to "Notification date": "Date of suspicion" was missing for 1,191 wild birds incidents (76.4%) in the 2016-17 epizootics and date of confirmation was used as the date of suspicion for the epidemic curves. In addition, seven wild bird incidents had the same suspicion and confirmation date. It is possible that some countries have used the same confirmation date to report all incidents that have occurred within a week. This is possible due to work load pressure during the period with high number of outbreaks.

Biases related to number of wild birds death reported by incident:
It is possible that for many incidents countries reported only those animals that they have tested and confirmed positive and did not report all the birds found dead in each incident.
Heterogeneity in wild bird species and density across space and time may lead to differences in detection rates in some of the following ways: density of wild birds combined with level of human traffic influence the likelihood of an individual coming across a sick/dead bird and reporting it, species of wild birds are heterogeneously distributed across Europe, some species are less likely to exhibit disease symptoms/suffer mortality, temporal variation in wild birds densities and species types due to seasonal migration.
Wild bird surveillance is carried out actively (sampling live healthy birds, mainly detecting LPAI H5 and H7) and passively (sampling sick or found-dead birds, mainly detecting HPAI). Each of these strategies may introduce bias in the following ways: France were assigned to South-West Europe, while southern Germany was included in Central Europe.

A2.1. Number of wild bird deaths reported by incident in the 3 epizootics
Results show that number of wild birds death reported by incident were different between the 2005-06 and 2016-17 epizootic (p<0.001) (Technical Appendix Figure 2).

A2.2. Distribution of poultry flock sizes of infected poultry premises between epizootics
Technical Appendix Figure 3 shows the distribution of poultry flock size in farms detected with H5 HPAI in the 3 epizootics.
Page 9 of 17

A2.4. Comparison of the epidemic curves between the 3 epizootics
Three types of analysis were done to assess differences in the distribution and values of the epidemic curve; differences in the distance to the day at which half of poultry outbreaks have occurred; and difference in seasonality.
Differences in the distribution of the 3 poultry epidemic curves