Foodborne Origin and Local and Global Spread of Staphylococcus saprophyticus Causing Human Urinary Tract Infections

Staphylococcus saprophyticus is a primary cause of community-acquired urinary tract infections (UTIs) in young women. S. saprophyticus colonizes humans and animals but basic features of its molecular epidemiology are undetermined. We conducted a phylogenomic analysis of 321 S. saprophyticus isolates collected from human UTIs worldwide during 1997–2017 and 232 isolates from human UTIs and the pig-processing chain in a confined region during 2016–2017. We found epidemiologic and genomic evidence that the meat-production chain is a major source of S. saprophyticus causing human UTIs; human microbiota is another possible origin. Pathogenic S. saprophyticus belonged to 2 lineages with distinctive genetic features that are globally and locally disseminated. Pangenome-wide approaches identified a strong association between pathogenicity and antimicrobial resistance, phages, platelet binding proteins, and an increased recombination rate. Our study provides insight into the origin, transmission, and population structure of pathogenic S. saprophyticus and identifies putative new virulence factors.

https://www.bd.com). The OD600nm of the liquid culture was adjusted to an initial turbidity of 0.5 McFarland with buffers for specific pHs and TSB containing varying concentrations of hormones, and cultures were grown with aeration (180 rpm) at 37°C for 18 h. Assays were performed in triplicate and each experiment was repeated 3 times.

Estimation of Evolutionary Rates
To estimate evolutionary rates in the S. saprophyticus population, we first explored the degree and pattern of temporal signal in the S. saprophyticus phylogeny and determined whether that signal was sufficient for rate estimation. We performed a root-to-tip regression (the divergence of each tip from the root against its date of sampling) for the global collection and separately for each lineage by using TempEst v1.5.3 (4). We used the phylogenetic tree without recombination and the isolation dates of the isolates as inputs.
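A root-to-tip regression of this kind can be illustrated as an ordinary least-squares fit, in which the slope approximates the substitution rate and R² indicates the strength of the temporal signal. The following Python sketch is illustrative only: the function name and the toy values are assumptions, not data from this study, and TempEst additionally searches for the best-fitting root, which is not reproduced here.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, x_intercept, r_squared). The slope approximates the
    evolutionary rate (substitutions/site/year) and the x-intercept
    approximates the date of the root.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    x_intercept = -intercept / slope if slope != 0 else float("nan")
    return slope, x_intercept, r_squared


# Hypothetical toy data: sampling years and root-to-tip distances.
toy_dates = [1997.0, 2002.0, 2007.0, 2012.0, 2017.0]
toy_distances = [0.0010, 0.0015, 0.0020, 0.0025, 0.0030]
slope, root_date, r2 = root_to_tip_regression(toy_dates, toy_distances)
print(f"rate={slope:.2e} subs/site/year, root date={root_date:.1f}, R^2={r2:.3f}")
```

A positive slope with a high R² would suggest enough temporal signal for rate estimation; a flat or negative slope would suggest the opposite.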

Average Nucleotide Identity Analysis
We calculated average nucleotide identity (ANI) for representative strains of S. saprophyticus (40 from lineage G and 20 from lineage S) by using the standalone Python program pyani version 0.2.9 (https://github.com/widdowquinn/pyani) with the ANIb option, which compares genomes by using the BLAST program (https://blast.ncbi.nlm.nih.gov). The closed genome of KS40 was used as the reference for lineage G and the closed genome of KS160 was used as the reference for lineage S.

Intrasample Diversity
We assessed the genetic diversity between isolates recovered from the same sample in the meat processing chain. We determined whether intrasample diversity existed by comparing the SNP differences between these isolates.
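The pairwise SNP comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the isolate names and sequences are hypothetical, the input is assumed to be an already aligned set of sequences of equal length, and sites with gaps or ambiguous bases are skipped, which is a choice made for this sketch and not necessarily the approach used in the study.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count sites where two aligned sequences differ.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snp_distances(alignment):
    """Return {(name_a, name_b): snp_count} for every pair of isolates."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Hypothetical isolates recovered from a single sample.
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAC",
    "isolate_3": "ACGAACGTNC",
}
distances = pairwise_snp_distances(toy_alignment)
for pair, count in distances.items():
    print(pair, count)
```

Any nonzero distance between isolates from the same sample would indicate intrasample diversity; what difference counts as meaningful is left to the analyst.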

Data Availability
All raw sequence data are available in the SRA (https://www.ncbi.nlm.nih.gov/sra) under the study accession no. PRJNA604222. We also provide individual accession numbers for raw sequence data (Appendix 1 Table 1) and the SNP matrices and list of genes in the pangenomes (Appendix 1 Tables 2-6).

Pangenome Analysis of S. saprophyticus Revealed an Open Pangenome
We annotated the 338 S. saprophyticus genomes by using Prokka (5) and constructed the pangenome by using Roary (6) with 85% blastp identity. A total of 10,222 genes were found, 48% (n = 4,925) of which were genes with unknown functions. The core genome shared by all isolates consisted of 1,871 genes. We also noted 188 soft core genes in 95%-99% of the isolates, 856 shell genes in 15%-94%, and 7,307 cloud genes in <15% of the S. saprophyticus population. On average, 75% of the S. saprophyticus genome is constituted by core genes and 25% by accessory genes. The plot of the total number of genes against the number of genomes indicates an open pangenome in which each added genome sequence contributed several new genes. This finding implies that newly sequenced genomes will identify new genes and that the pangenome size of this species will continue to increase (Appendix 2 Figure 2).
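The frequency bands used to partition the pangenome can be expressed as a small classifier. The Python sketch below is illustrative: the gene names and per-gene counts are hypothetical, and treating genes present in at least 99% of genomes as core is an assumption based on the bands quoted in the text. The final check confirms that the category sizes given in the Appendix 2 Figure 2 legend sum to the reported total of 10,222 genes.

```python
def classify_gene(n_present, n_genomes):
    """Assign a pangenome category from the fraction of genomes carrying a gene."""
    if not 0 < n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    frequency = n_present / n_genomes
    if frequency >= 0.99:
        return "core"
    if frequency >= 0.95:
        return "soft_core"
    if frequency >= 0.15:
        return "shell"
    return "cloud"


def summarize_pangenome(presence_counts, n_genomes):
    """Count genes per category given {gene: number of genomes carrying it}."""
    summary = {"core": 0, "soft_core": 0, "shell": 0, "cloud": 0}
    for n_present in presence_counts.values():
        summary[classify_gene(n_present, n_genomes)] += 1
    return summary


# Hypothetical per-gene counts for a collection of 338 genomes.
toy_counts = {"geneA": 338, "geneB": 325, "geneC": 120, "geneD": 3}
toy_summary = summarize_pangenome(toy_counts, n_genomes=338)
print(toy_summary)

# Category sizes as given in the Appendix 2 Figure 2 legend.
reported = {"core": 1871, "soft_core": 188, "shell": 856, "cloud": 7307}
reported_total = sum(reported.values())
print(reported_total)
```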

Lineages and Clinical Origins
We explored gene presence in the pangenome to understand differences in the genetic content of isolates from each of the genetic lineages defined by core SNPs. We used the Scoary pipeline (7) and a Bonferroni-corrected p<0.05 to identify genes that were exclusive to or enriched in the S. saprophyticus genetic lineages. We categorized the hits into biologic function groups on the basis of the annotations predicted by Prokka. For genes associated with different clinical origins (infection and colonization/contamination), we used Benjamini-Hochberg and pairwise p<0.05 (Appendix 2 Tables 1-5).

Appendix 2 Figure 2. Pangenome of Staphylococcus saprophyticus inferred from 338 isolates recovered from human infections and colonization. Among analyzed isolates, 321 were recovered from UTIs, 12 from blood, 4 from colonization, and 1 was reference strain ATCC 15305 (https://www.atcc.org; GenBank accession no. AP008934.1). A) Distribution of genes in the pangenome generated by using Roary (6). We found a total of 10,222 genes. The core genome shared by all isolates consisted of 1,871 genes. We also found 188 soft core genes in 95%-99% of isolates and 856 shell genes in 15%-94% of isolates. In addition, we noted 7,307 cloud genes in <15% of the S. saprophyticus population. B) Gene accumulation plot for the S. saprophyticus pangenome as a function of genomes sequenced, indicating that S. saprophyticus has an open pangenome.
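The gene-association testing described in this section can be sketched as a Fisher exact test on a 2x2 table of gene presence by group, followed by multiple-testing correction. The Python sketch below is a simplified illustration and not Scoary's implementation: the function names and toy counts are assumptions, and the pairwise comparisons that Scoary performs along the phylogeny are omitted.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: gene present / gene absent. Columns: in group / outside group.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: (present in group, present outside, absent in group, absent outside).
toy_tables = {"gene_x": (10, 0, 0, 10), "gene_y": (5, 5, 5, 5)}
raw_p = {gene: fisher_exact_two_sided(*table) for gene, table in toy_tables.items()}
genes = list(raw_p)
bonf = dict(zip(genes, bonferroni([raw_p[g] for g in genes])))
bh = dict(zip(genes, benjamini_hochberg([raw_p[g] for g in genes])))
significant = [g for g in genes if bonf[g] < 0.05]
print(raw_p, bonf, bh, significant)
```

Bonferroni is the stricter of the two corrections, which is consistent with its use here for lineage-specific genes, whereas Benjamini-Hochberg controls the false discovery rate and retains more candidate associations.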