Reassortment of Influenza A Viruses in Wild Birds in Alaska before H5 Clade 2.3.4.4 Outbreaks

Sampling of mallards in Alaska during September 2014–April 2015 identified low pathogenic avian influenza A virus (subtypes H5N2 and H1N1) that shared ancestry with highly pathogenic reassortant H5N2 and H5N1 viruses. Molecular dating indicated reassortment soon after interhemispheric movement of H5N8 clade 2.3.4.4, suggesting genetic exchange in Alaska or surrounds before outbreaks.


Virus Isolation and Sequencing
Viral RNA was extracted using the Omega Mag-Bind Viral DNA/RNA Kit (Omega Bio- Tek, Norcross, GA) and a Kingfisher magnetic particle processor (Thermo Scientific, Waltham, MA). RNA was screened for the presence of influenza A virus (IAV) using qScript XLT onestep RT-qPCR Tough Mix (Quanta Biosciences, Gathersburg, MD) targeting the matrix gene.
PCR assays were run on an ABI 7500 real-time PCR System (Applied Biosystems, Foster City, CA). PCR-positive samples (Ct value <45) were subjected to a H5-specific rRT-PCR to identify potentially highly pathogenic samples (4). All H5-positive samples were sent to the U.S. National Poultry Research Center, USDA to culture potential HPAI viruses. To amplify virus, positive VTM (100ul) was inoculated into the allanotic cavity of 9-11 day old embryonating specific pathogen free chicken eggs (Charles River, CT) and incubated at 37°C for 72 hours or until embryo death, as detected by daily candling. RNA was extracted from the allantoic fluid and the matrix rRT-PCR repeated (5). Whole-genome sequencing was performed at the J. Craig Venter Institute in Rockville, MD as described by Nelson et al. (6) and all sequences deposited into GenBank (online Technical Appendix Table 1). Subtyping was confirmed by BLASTn of the sequence against isolates in GenBank and identifying the subtype match that showed highest percentage identity.

Multiple Sequence Alignment
Global IAV sequences were downloaded from the Influenza Research Database (IRD [7]) and Global Initiative on Sharing All Influenza Data (GISAID) during the period between 1/17/16 -2/16/16. Only full genomic segments (PB2, PB1, PA, H5, NP, N2, N1, M and NS) of isolates collected during the period (1976 -2016) were considered. Only sequences associated with a full date of collection (DD/MM/YYYY) were included. To facilitate filtering out duplicate sequences, taxa names were edited to remove apostrophes, brackets and all hyphens were replaced with underscores. Sequences containing misreads (NNNs) were deleted to improve the quality of alignments and subsequent trees. Sequences were then aligned using MUSCLE v3.8.31 (8), inspected visually in Geneious v7.1.5 to remove indels and taxa containing premature termination codons, then re-aligned once again using MUSCLE.

Down-Sampling before Tree Reconstruction
The aligned global sequence datasets were down-sampled to reduce the number of sequences to a size suitable for molecular clock Bayesian phylogenetic analysis (~300 taxa). The to ensure convergence on the same optimal tree. After removing 10 -30% burn-in from each chain we combined trees and constructed maximum clade credibility trees. High similarity between the closest 10 and 25 relatives for the 3 metrics tested (online Technical Appendix Table 2) validated our decision to use the closest 10 relatives to reduce computational burden without compromising the reproducibility of the Bayesian trees. This data-driven approach guided how to down-sample the internal segments. For the H5 and N2 segments, we included the closest 100 relatives, and for N1, we included the closest 25 relatives owing to the limited taxa available.
To achieve down-sampling stratified by timea strategy we used to increase accuracy of molecular dating of divergence events, a maximum of 100 taxa were randomly selected from our

Bayesian Phylogenetic Analysis
Dated phylogenetic trees for each segment were constructed using BEAST v1.8.3. At least 4 independent MCMC chains, of 40 -80 million generations each, were run using the GMRF Bayesian skyride coalescent tree prior and uncorrelated lognormal distribution to produce 10,000 trees. To ensure MCMC chains converged on the same optimal tree, runs were visually inspected in Tracer v1.6.0. After removing 10 -30% burn-in from each chain, the trees were combined to produce a single maximum clade credibility (MCC) tree.

Molecular Dating Analysis
The time of most recent common ancestry (tMRCA) for each taxon was inferred from the nodes of the MCC tree inspected in FigTree v1.4.3. The tMRCA dates and 95% highest posterior densities (95% HPD) for all segments were used to assess the order of gene introduction events ( Figure 1, panel A). We also considered the posterior probabilities associated with each branch node when estimating gene segment introduction events. Only gene segment introductions associated with a posterior probability greater than 0.85 were considered such that bifurcating nodes with low support were ignored. We considered this an important step in view of the uneven sampling regime (targeting poultry farms) and low number of viruses sequenced during the outbreak period. As a result, not all influenza segments contributed equally to this analysis.   Figures 4-12) are shaded in gray.