Presence of Segmented Flavivirus Infections in North America

Identifying viruses in synanthropic animals is necessary for understanding the origin of many viruses that can infect human hosts and developing strategies to prevent new zoonotic infections. The white-footed mouse, Peromyscus leucopus, is one of the most abundant rodent species in the northeastern United States. We characterized the serum virome of 978 free-ranging P. leucopus mice caught in Pennsylvania. We identified many new viruses belonging to 26 different virus families. Among these viruses was a highly divergent segmented flavivirus whose genetic relatives were recently identified in ticks, mosquitoes, and vertebrates, including febrile humans. This novel flavi-like segmented virus was found in rodents and shares <70% aa identity with known viruses in the highly conserved region of the viral polymerase. Our data will enable researchers to develop molecular reagents to further characterize this virus and its relatives infecting other hosts and to curtail their spread, if necessary.

segments, 2 of which encode a polymerase protein (NS5) and a helicase protein (NS3) that show close phylogenetic relatedness with the corresponding proteins of classical flaviviruses (9). Later, several genetically diverse relatives of JMTV were found in several species of ticks, insects, and mammals (9–13). Together, these JMTV-like viruses are highly diverse and show differences in the number of genomic segments as well as protein coding strategies (9,10,12). In 2018, a metagenomics study revealed the presence of JMTV-like sequences in serum samples from human patients with Crimean-Congo hemorrhagic fever in Kosovo (14). Two studies published in 2019 reported the presence of JMTV sequences in humans in China with febrile illness and a history of recent tick bites (15,16). To date, no information is available on the presence of JMTV infections in insects, ticks, or vertebrates in North America.
Knowledge about viruses infecting P. leucopus and the levels to which humans are being exposed is limited. Although several studies have examined the prevalence of hantaviruses (17) and highly diverse hepaciviruses (18), there has been no attempt to characterize the blood virome of these common rodents. We used an unbiased metaviromics approach to identify all viruses in the serum samples of 978 free-ranging white-footed mice captured over a period of 7 years from suburban and wild areas of Pennsylvania.

Origin and Details of White-Footed Mouse Samples
We collected serum samples from P. leucopus mice each spring and autumn during 2011-2017 from 4 sites in central Pennsylvania. We selected these sites because they provided a gradient of human activity and therefore of human exposure to rodents; site 1 had the highest level of exposure and site 4 had the lowest. At site 1 (Deer Pens), mouse infestation is reported in residences, workplaces, and the surrounding area, providing a potentially high level of exposure to rodents. Site 2 (Spray Fields) is a disturbed forest where treated wastewater is sprayed; it is frequented daily by maintenance employees and dog walkers, but there are no residences on site. Site 3 (Scotia) is on Pennsylvania state game lands that border housing communities and has mostly seasonal visitors, such as hikers, hunters, and trappers. Site 4 (Stone Valley) is in a large contiguous forest that has very few people or residences. Additional details about the sites have been described previously (19).
Whole blood was obtained from the retro-orbital sinus of the anesthetized mice. These blood samples were centrifuged at 8,000 × g for 10 min and the serum was then collected with a micropipette. Samples were stored at −80°C. All procedures were approved by Penn State's Institutional Animal Care and Use Committee (IACUC #46246).

Metagenomics, Metaviromics, and Bioinformatics
We used serum samples from 978 wild free-ranging P. leucopus mice to generate the high-throughput sequence data to characterize the virome of these rodents. We used a 10-µL subsample of serum from each individual mouse to create 9 pools. We assigned individual mice to pools by bodyweight (as a proxy for age) and study site. We centrifuged serum pools at 8,000 × g for 10 min to remove particulate matter. The supernatants were treated with DNase (100 U), RNase (20 U), and Benzonase (250 U) to enrich samples for particle-protected (virion) nucleic acids. We used the QIAamp Viral RNA Mini Kit (QIAGEN Inc., https://www.qiagen.com) to extract nucleic acid from the serum pools and eluted it in 60 μL of elution buffer supplemented with 40 U of ribonuclease inhibitor, then stored it at −80°C before library preparation. We used a unique 20-nt barcoded oligonucleotide primer for each sample pool during reverse transcription PCR (RT-PCR) and second-strand DNA synthesis, as previously described (20). We prepared libraries for Illumina sequencing as previously described (21) and performed sequencing on a HiSeq 4000 platform (Illumina Inc., https://www.illumina.com) for 2 × 150 cycles in the Biomedicine Genomics Core at the Research Institute of Nationwide Children's Hospital (Columbus, Ohio, USA).
We used the unique 20-nt barcodes included in the primers used to make the libraries to demultiplex the sequence data. After removing low-complexity regions and low-quality bases, we aligned paired-end reads to the P. leucopus genomes with Bowtie2 mapper version 2.0.6 (SourceForge, https://sourceforge.net) to remove the host-derived sequences. We used MIRA version 4.0 (SourceForge) (22) for de novo assembly of remaining sequences. We then analyzed all contigs and unique reads using MegaBLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) against the GenBank nonredundant nucleotide database. Next, we used BLASTX to analyze sequences that showed poor or no homology (E-value >0.001) against the viral GenBank protein database, followed by BLASTX against the nonredundant protein database. We then used reference genomes of known viruses available in GenBank to extract related sequences present in the 9 serum pools. Finally, we used specific PCR assays and traditional dideoxy sequencing of amplicons to confirm bioinformatics-based assembly of virus reads.
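The barcode-based demultiplexing step described above can be illustrated with a minimal sketch. It assumes that the 20-nt barcode sits at the 5′ end of each read and that only exact matches are accepted; the barcode sequences, pool names, and the function name `demultiplex` are hypothetical and do not come from the study.

```python
from collections import defaultdict

BARCODE_LEN = 20  # the text states that each pool primer carried a 20-nt barcode


def demultiplex(reads, barcode_to_pool):
    """Group reads by the barcode in their first 20 nt (assumed position).

    Matched reads are stored with the barcode trimmed off; reads whose
    prefix matches no known barcode are kept untrimmed under "unassigned".
    """
    pools = defaultdict(list)
    for read in reads:
        tag, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        pool = barcode_to_pool.get(tag)
        if pool is None:
            pools["unassigned"].append(read)
        else:
            pools[pool].append(insert)
    return dict(pools)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {
    "ACGTACGTACGTACGTACGT": "pool_1",
    "TTGGCCAATTGGCCAATTGG": "pool_2",
}
reads = [
    "ACGTACGTACGTACGTACGT" + "GATTACA",
    "TTGGCCAATTGGCCAATTGG" + "CCCGGG",
    "N" * 20 + "AAAA",
]
result = demultiplex(reads, barcodes)
```

A production pipeline would typically also tolerate sequencing errors in the barcode and handle paired-end mates together; this sketch omits both for brevity.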

Presence and Prevalence of Flavi-Like Segmented Virus and South Bay Virus
We selected 2 viruses recently identified in ticks for further characterization: a highly divergent JMTV-like virus, provisionally named flavi-like segmented virus (FLSV or FLSV-US); and South Bay virus (SBV), a tick virus that belongs to the family Nairoviridae, order Bunyavirales (23,24). We extracted individual serum samples from the 72 P. leucopus mice that made up the pool with the maximum number of unique FLSV-US sequence reads and screened for the presence of FLSV-US RNA by using a nested RT-PCR targeting the conserved NS5 polymerase region. We used the primers FLSV-US-F1 (5′-GGWGCYATGGGYTACCAGAT-3′) and FLSV-US-R1 (5′-TCCARGGTGAGTARTCCTTTCG-3′) in the first round of PCR, and FLSV-US-F2 (5′-GGWGCYATGGGYTACCAGATGGA-3′) and FLSV-US-R2 (5′-CCARGGTGAGTARTCCTTTCGARATC-3′) in the second round. The first-round PCR cycle included 8 min of initial denaturation at 95°C; 10 cycles of 95°C for 40 s, 56°C for 1 min, and 72°C for 1 min; 30 cycles of 95°C for 30 s, 54°C for 30 s, and 72°C for 1 min; and a final extension at 72°C for 5 min. In the first 10 cycles, the annealing temperature was ramped down by 0.5°C each cycle to enable nucleotide mismatch tolerance during primer hybridization. The second-round PCR conditions included 8 min of initial denaturation at 95°C; 10 cycles of 95°C for 40 s, 60°C for 1 min, and 72°C for 1 min; 30 cycles of 95°C for 30 s, 58°C for 30 s, and 72°C for 1 min; and a final extension at 72°C for 5 min.
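The first-round primers contain IUPAC degenerate bases (W, Y, R), so each primer name denotes a small mixture of sequences. The following sketch, which is illustrative and not part of the published methods, counts and enumerates the sequences represented by FLSV-US-F1 and FLSV-US-R1 using the standard IUPAC nucleotide codes; the function names are our own.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every non-degenerate sequence encoded by the primer."""
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in primer))]


flsv_f1 = "GGWGCYATGGGYTACCAGAT"    # FLSV-US-F1, as given in the text
flsv_r1 = "TCCARGGTGAGTARTCCTTTCG"  # FLSV-US-R1, as given in the text

f1_variants = expand(flsv_f1)
f1_degeneracy = degeneracy(flsv_f1)  # one W and two Y positions: 2 * 2 * 2
r1_degeneracy = degeneracy(flsv_r1)  # two R positions: 2 * 2
```

Low degeneracy of this kind keeps the effective concentration of each individual primer sequence high while still covering the expected variation at the wobble positions.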
We used a heminested RT-PCR targeting the viral polymerase gene to screen serum samples from the same 72 mice for the presence of SBV. We used the primers SBV-F1 (5′-AYCCAGATTGGAARCACTTCATAATG-3′) and SBV-R1 (5′-CCATATGTDGTAATMACYTTWGCATA-3′) for the first round of PCR, and SBV-F2 (5′-GTTATGTTGAAGGACCTTAACAAAG-3′) and SBV-R1 for the second round. The PCR cycle for both rounds of PCR included 8 min of initial denaturation at 95°C; 10 cycles of 95°C for 30 s, 55°C for 2 min, and 72°C for 1 min; 30 cycles of 95°C for 25 s, 54°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 5 min. In the first 10 cycles, the annealing temperature was ramped down by 0.5°C each cycle to enable nucleotide mismatch tolerance during the primer hybridization step.
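The touchdown annealing profiles described for these assays can be written out cycle by cycle. The sketch below is a hedged illustration: it assumes that the first touchdown cycle anneals at the stated starting temperature and that each later touchdown cycle is 0.5°C lower, which is one reasonable reading of the text. The function name and parameters are our own.

```python
def annealing_schedule(start_c, step_c, touchdown_cycles, fixed_c, fixed_cycles):
    """Return the annealing temperature (°C) for every cycle of one PCR round.

    Assumption: touchdown cycle 1 uses start_c, and the temperature drops
    by step_c on each subsequent touchdown cycle.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [fixed_c] * fixed_cycles


# FLSV-US first round: 10 touchdown cycles from 56°C, then 30 cycles at 54°C.
flsv_round1 = annealing_schedule(56.0, 0.5, 10, 54.0, 30)

# SBV (both rounds): 10 touchdown cycles from 55°C, then 30 cycles at 54°C.
sbv_round = annealing_schedule(55.0, 0.5, 10, 54.0, 30)
```

Under this reading, the last touchdown cycle of the FLSV-US first round anneals at 51.5°C before the 30 fixed cycles at 54°C begin.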

Phylogenetic Analysis and Genome Organization of FLSV-US
We aligned sequences that showed substantial similarity with JMTV NS5 proteins with reference sequence YP_009029999.1 (GenBank accession no. MN811583) to design PCR primers for direct amplification of 2,568-nt FLSV-US sequences. After we confirmed the FLSV-US assembled sequence by Sanger sequencing, we aligned the translated protein sequence with the sequences of known JMTV-like viruses (aa 55 to 913 of NS5 protein) using the BLOSUM protein weight matrix with default parameters in MEGA 7.1 (25). We constructed a phylogenetic tree using the maximum-likelihood method and the substitution model with the lowest Bayesian information criterion score, the LG+G+I model (Le and Gascuel model, gamma distribution with 5 rate categories and evolutionarily invariable sites) (26). To confirm that the FLSV-US genome is segmented, we used FLSV-US reads showing substantial similarity to JMTV NS3 protein to acquire the 3′ end of the FLSV-US NS3 coding segment using a poly-T oligonucleotide-primed cDNA synthesis followed by specific PCR, as previously described (27). The 3′ end sequence of the FLSV-US NS3 coding segment and the complete 3′ untranslated region (UTR) are available in GenBank (accession no. MN811584).
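Amino acid identity values of the kind reported in this article are derived from pairwise comparison of aligned protein sequences. As a minimal sketch, and not the MEGA procedure used in the study, the function below computes percent identity over alignment columns in which neither sequence has a gap. Other denominators (for example, full alignment length) are also in common use, so this choice is an assumption. The two short sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 6 ungapped columns, 5 of them identical.
identity = percent_identity("MKT-LLGA", "MRTALL-A")
```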

Serum Virome of P. leucopus
A total of 242 million paired-end sequences were generated from the 9 pools representing the serum samples of 978 P. leucopus. Of these, 148 million (61%) were derived from the host genome. Of the remaining 94 million sequences, 65.6% of reads showed substantial similarity to known viruses (E-value <0.001). Further analysis classified these sequences into 26 known RNA and DNA virus families (Figure 1).
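The read accounting above can be checked with simple arithmetic. The sketch below uses only the rounded figures stated in the text, so the derived count of virus-like reads is approximate and is our own calculation rather than a number reported by the authors.

```python
total_reads = 242_000_000  # paired-end sequences from the 9 pools (rounded)
host_reads = 148_000_000   # reads derived from the host genome (rounded)

non_host_reads = total_reads - host_reads             # reads left after host removal
host_percent = 100 * host_reads / total_reads         # share of host-derived reads
viral_like_reads = round(non_host_reads * 0.656)      # 65.6% of the remaining reads

print(f"non-host reads: {non_host_reads:,}")
print(f"host-derived share: {host_percent:.1f}%")
print(f"approximate virus-like reads: {viral_like_reads:,}")
```

With these inputs, the remaining 94 million reads and the 61% host-derived share agree with the text, and the virus-like fraction corresponds to roughly 62 million reads.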

DNA Virome
We found parvoviruses, circoviruses, Torque teno viruses (TTV), polyomaviruses, and papillomaviruses in all 9 serum pools, and these were the most abundant DNA viruses. Among parvoviruses, bocaparvovirus (BocPV) sequences shared up to 90% aa identity with the nonstructural protein of a rodent BocPV identified in Brazil (28), and adeno-associated viruses (AAV) shared up to 65% aa identity with the nonstructural protein of a caprine AAV (29). The identified TTV sequences were genetically equidistant from rodent TTV (GenBank accession no. AEF5869) and mosquito TTV (GenBank accession no. AEF58766) (30); the sequences shared up to 60% aa identity with these 2 species of the genus Omegatorquevirus (31). The polyomavirus sequences shared up to 79% aa identity with the large T antigen of a polyomavirus isolated from the Montane grass mouse (Akodon montensis) (32). The papillomavirus sequences belong to a tentatively identified new virus species within the genus Iotapapillomavirus that shares up to 79% aa identity with the papillomavirus found in P. maniculatus (GenBank accession no. NC_039039) (33).

RNA Virome
The most abundant RNA viruses present in the P. leucopus serum pools were the genetically diverged variants of hepaciviruses and pegiviruses (34). Phylogenetic analyses of these sequences indicated that the P. leucopus hepaciviruses form a distinct new genetic cluster (35). We found sequence reads of a novel hepevirus in 8 of the 9 serum pools. Genetic analysis of these P. leucopus hepevirus sequences suggests that it is a new member of the genus Orthohepevirus because it shares <75% aa identity with the capsid proteins of hepevirus isolated from the common kestrel, Falco tinnunculus (36), and from rodents (37). Astrovirus-like sequences found in 8 serum pools shared up to 64% aa identity in the capsid protein and up to 73% aa identity in the nonstructural protein with the known species of the genus Mamastrovirus. Three of the 9 serum pools had sequences that were genetically closest to bat astroviruses (38) but shared only up to 61% aa identity in the nonstructural protein. Sequences sharing up to 97% aa identity with hantaviruses recently found in Peromyscus sp. were also present in 6 serum pools (39). Coronavirus-related sequences were found in 2 serum pools and shared 98% aa identity with the porcine hemagglutinating encephalomyelitis virus (GenBank accession no. ACH72649). Paramyxovirus sequences were present in 5 serum pools and shared up to 90% nt identity with paramyxoviruses isolated from 3 rodent species: Apodemus peninsulae (accession no. KY370098), Rattus norvegicus (accession no. KX940961), and R. andamanensis (accession no. JN689227).

SBV
Analysis of virome data showed the presence of SBV sequences in 8 of the 9 P. leucopus serum pools. Because SBV is not known to infect vertebrates, we used RT-PCR to screen serum samples of 72 individual mice; 5 samples were positive for SBV. Subsequent sequencing of PCR products showed that the SBV variants present in mouse serum samples share 99%-100% nt identity with the SBV variants identified in Ixodes scapularis black-legged ticks (24,40).

FLSV
We found 370 sequence reads that showed the highest sequence similarity with the NS5 proteins of JMTV-like viruses. Similarly, we found 42 sequence reads that showed the highest sequence similarity with the NS3 proteins of JMTV-like viruses (9,11,12,15). These FLSV-US sequences were genetically equidistant from the 3 prototypic virus members of the JMTV-like virus group, namely JMTV, Mogiana tick virus, and Alongshan virus (15,16,24). Considering the potential consequentiality of a divergent JMTV-like virus infection in a widely distributed mammalian species in North America, we developed a broadly reactive PCR assay to confirm our results and to define infection prevalence of FLSV-US in P. leucopus mice.
PCR screening and amplicon sequencing confirmed the presence of FLSV-US nucleic acids in serum samples from 8 of the 72 mice in this pool, indicating an infection prevalence of 11%. Comparative sequence analysis determined ≈3.8% nt divergence in the NS5 region among FLSV-US variants infecting these mice (data not shown); all of these mutations were synonymous. Next, we acquired the near-complete coding region for an FLSV-US NS5 polymerase segment and used it for phylogenetic analysis (Figure 2). We determined that FLSV-US is more genetically diverse than the previously identified JMTV variants and Alongshan virus and shared <70% aa identity with these viruses (Table). We used a poly-T oligonucleotide-primed cDNA synthesis to acquire the 3′ end of the FLSV-US NS3 coding segment. Sequencing of the amplicon revealed the presence of a 388-nt 3′ untranslated region preceded by a 96-aa NS3 protein coding region that showed highest sequence similarity with the carboxy terminus of JMTV NS3 protein. These results confirm that the FLSV-US genome is segmented like other known JMTV-like viruses and that the FLSV-US NS3 protein coding segment is polyadenylated.
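The prevalence figure follows directly from the screening counts (8 positive among 72 mice). The sketch below reproduces that arithmetic and, as an illustration that is not part of the original analysis, adds a Wilson score interval to show how much uncertainty a sample of 72 animals from a single pool carries. The choice of the Wilson method and the 95% level are our assumptions.

```python
import math


def prevalence(positive, tested):
    """Point estimate of prevalence as a proportion."""
    return positive / tested


def wilson_interval(positive, tested, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = positive / tested
    denom = 1 + z**2 / tested
    center = (p + z**2 / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return center - half, center + half


flsv_prev = prevalence(8, 72)          # FLSV-US: 8 of 72 mice positive
flsv_low, flsv_high = wilson_interval(8, 72)

print(f"FLSV-US prevalence: {100 * flsv_prev:.1f}%")
print(f"illustrative 95% interval: {100 * flsv_low:.1f}% to {100 * flsv_high:.1f}%")
```

Because the 72 mice came from the pool with the most FLSV-US reads, any such interval describes that pool only and should not be read as a population-wide estimate.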

Discussion
Synanthropic small mammals serve as reservoirs for many zoonotic infections (41–46). Mice of the genus Peromyscus are highly adaptable and thrive in human-modified landscapes. In particular, P. leucopus and P. maniculatus are closely related and very abundant in North America, with P. leucopus most common in the eastern two-thirds of the United States, plus Canada and northern Mexico, and P. maniculatus present in the central and western United States. These rodents have been recognized as reservoirs of highly pathogenic hantaviruses, but the diversity of viruses infecting them has remained largely unknown. A recent study (18) showed the presence of highly diverse hepaciviruses and pegiviruses in these rodents, and a study published in 2011 by Phan et al. identified several other viruses in fecal samples of 20 P. leucopus mice (47). Our study expands the host range of the recently identified tick virus SBV and identifies a highly diverse virus, FLSV-US, whose genetic relatives were recently shown to be pathogenic in humans (15,16).
Our results show high genetic diversity among viruses infecting these rodents. We not only confirmed infections of hantaviruses, hepaciviruses, and pegiviruses but also found viruses representing homologs of almost all viruses commonly present in human serum samples: anelloviruses, parvoviruses, polyomaviruses, papillomaviruses, and hepatitis E virus. We also found several viruses not commonly present in human or animal serum samples, such as coronaviruses, paramyxoviruses, astroviruses, enteroviruses, and rotaviruses. It is plausible that some mice, when captured, had transient viremia of these otherwise respiratory or gastroenteric infections. It would also be worthwhile to further investigate other samples including feces, skin, urine, saliva, and tissue or organs to identify additional viral diversity. This may also aid in determining the tissue tropism of these viruses and provide clues to potential routes of transmission.
Recently, several novel viruses have been identified in ticks from the United States (24). However, the host range and public health relevance of most of these new tick viruses remain unknown (24). We found 1 of these newly identified and highly prevalent tick viruses, SBV, in P. leucopus serum samples.
To the best of our knowledge, SBV has not been detected in a vertebrate host; thus, our study indicates that this new tick virus can infect mammals and may have a wider host range than was previously known. Comparative sequence analyses indicate that the SBV variants detected in P. leucopus serum samples share 99%-100% nt identity with the SBV variants identified in ticks, indicating their common origin.
Several recent studies to characterize the viromes of ticks and mosquitoes in North America showed an absence of JMTV-like viruses (24,40,48). The presence of FLSV infections in a rodent species that is also a common host of ticks and mosquitoes is therefore intriguing and raises questions about the source of FLSV infection in these mice. It is plausible that FLSV-US transmits through an arthropod vector other than ticks or mosquitoes because some distantly related viruses were found in pools of insects and arachnids (10).
In conclusion, we detected FLSV infections in a widespread mammalian species in North America, which is important because distant genetic relatives of FLSV-US have been shown to be transmitted by ticks and mosquitoes (9,13) and readily able to infect humans (15,16). Until recently, these unusual flavi-like viruses had only been found in ticks, mosquitoes, and animals from China, Kosovo, and Brazil (11,14,16). However, in 2019, Wang et al. reported JMTV viremia in 86 of 374 humans with febrile illness, headache, and history of tick exposure (16). In addition, JMTV was shown to replicate in several cell lines of human and animal origin (15,16). Taken together, these studies indicate that FLSV-US and related viruses have the potential to infect a wide range of mammals, including humans. Finally, FLSV-US is genetically distinct from all known viruses; therefore, its sequence data will help in the identification of FLSV-US and its related variants infecting animals and humans in North America.

Dr. Vandegrift is a research associate professor in the Center for Infectious Disease Dynamics and Department of Biology at Penn State University. His primary research interests include disease ecology, zoonotic parasites, and emerging infectious diseases.
Dr. Kumar is a postdoctoral fellow at the Center of Vaccine and Immunity, Nationwide Children's Hospital, Columbus, Ohio. His primary research interests include emerging virus discovery using next-generation sequencing and serologic approaches.