Whole-Genome Analysis of Streptococcus pneumoniae Serotype 4 Causing Outbreak of Invasive Pneumococcal Disease, Alberta, Canada

After the introduction of pneumococcal conjugate vaccines for children, invasive pneumococcal disease caused by Streptococcus pneumoniae serotype 4 declined in all ages in Alberta, Canada, but it has reemerged and spread in adults in Calgary, primarily among persons who are experiencing homelessness or who use illicit drugs. We conducted clinical and molecular analyses to examine the cases and isolates. Whole-genome sequencing analysis indicated relatively high genetic variability of serotype 4 isolates. Phylogenetic analysis identified 1 emergent sequence type (ST) 244 lineage primarily associated within Alberta and nationally distributed clades ST205 and ST695. Isolates from 6 subclades of the ST244 lineage clustered regionally, temporally, and by homeless status. In multivariable logistic regression, factors associated with serotype 4 invasive pneumococcal disease were being male, being <65 years of age, experiencing homelessness, having a diagnosis of pneumonia or empyema, or using illicit drugs.

disease has been nearly eliminated among children and reduced indirectly among adults through herd effect (1-4). PCV7, administered as a 3-dose primary series plus a booster (3+1 dosing schedule), was introduced in Alberta, Canada, in 2002, followed in 2010 by PCV13 (2+1 dosing schedule); both vaccines include serotype 4. In Alberta Province and throughout Canada, invasive pneumococcal disease (IPD) has continued to decline in children <5 years of age since 2010, after PCV13 vaccine introduction, but among older age groups, IPD incidence has remained steady (5,6). No pediatric cases of IPD caused by S. pneumoniae serotype 4 have been diagnosed in Calgary, Alberta, Canada, since 2007 (3), although recent data from Calgary showed low levels of serotype 4 carriage in children identified by using PCR but not by using conventional culture (7).
In 2011, IPD caused by S. pneumoniae serotype 4 began to increase in adults in the province of Alberta, particularly among persons who were homeless. A previous outbreak in Alberta in 2005-2007 included serotypes 5 and 8, primarily in persons experiencing homelessness and those using illicit drugs (8). Homelessness is overrepresented as a factor in adult IPD cases: 18.8% of adults with IPD are homeless, despite only 0.2% of adults in Calgary being homeless (9). We conducted this study to examine clinical and demographic factors associated with serotype 4 IPD and to conduct molecular characterization and phylogenetic analysis from whole-genome sequencing (WGS) data on the serotype 4 isolates collected during the outbreak. Our goal was to clarify the dynamics of an outbreak of serotype 4 IPD in a postvaccine community setting where serotype 4 had previously been uncommon.

Population
An inception cohort including all adult case-patients with serotype 4 IPD was identified through population-based surveillance during 2010-2018 in Calgary (2018 population 1,648,385) and Edmonton, Alberta (2018 population 1,393,380). Epidemic curves were generated for Calgary and Edmonton from the number of cases of serotype 4 IPD reported each year during 2000-2018 (Figure 1). We performed WGS to analyze isolates from all patients. We included all adults (≥18 years of age) with IPD reported in the Calgary S. pneumoniae Epidemiology Research (CASPER) (4) study during 2010-2018 in the clinical analysis.

Data Collection and Ethics
IPD is a reportable disease to the Ministry of Health in Alberta; therefore, all culture-confirmed cases of serotype 4 IPD in Calgary and Edmonton were identified. All pneumococcal isolates identified by diagnostic microbiology laboratories in Alberta must be submitted to Alberta Precision Laboratories-Public Health for pneumococcal serotyping. Serotyping was performed by quellung reaction (10). Clinical information was obtained from chart reviews in Calgary as part of the CASPER study. Ethics approval was provided for the clinical study by the Conjoint Health Research Ethics Board of the University of Calgary.

Analysis of Clinical Factors
We collected clinical data on all pneumococcal disease cases in Calgary identified through the CASPER study. Clinical data were not available for cases from Edmonton, so we included only cases from Calgary in the clinical analysis. We used tests of proportions to compare serotype 4 IPD with non-serotype 4 IPD in a univariable analysis to determine clinical and demographic factors and outcomes. We used the Student 2-tailed t-test to compare risk by age as a continuous variable. We chose clinical, demographic, and outcome factors a priori on the basis of biologic plausibility and clinical relevance. For underlying health conditions, we sorted patients into 3 groups: those having no underlying conditions increasing risk for IPD; those with underlying conditions but immunocompetent; and those with underlying conditions and immunocompromised, according to Public Health Agency of Canada recommendations for immunization (11). For factors with multiple possible responses (e.g., disease manifestation, underlying conditions), which could not be collapsed into 2 groups, we ran a Fisher χ² test to determine the p value. However, although p<0.05 indicates a significant difference between >2 groups, it does not provide information on where the difference occurs. We used stepwise multivariable logistic regression to analyze clinical and demographic factors and determine adjusted odds ratios and 95% CIs for factors associated with infection from IPD serotype 4 compared with IPD from all other serotypes. We included age as a dichotomous variable: <65 or ≥65 years of age.
We did not include indigenous background, intensive care unit (ICU) admission, death, or hospitalization as variables in the model. We excluded indigenous status because it is difficult to determine from chart reviews, which are often missing large amounts of data, and because its effect was not significant in univariable analysis. We excluded ICU admission, death, and hospitalization because they are outcomes, and we were interested in clinical factors associated with serotype 4 IPD. We removed smoking status because of nonsignificance and a large amount of missing data.
Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 27, No. 7, July 2021
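As a rough illustration of the kind of quantity reported for each factor, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a single 2x2 table. It is not the stepwise multivariable model used in the study, which adjusts each factor for the others; the function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a: exposed, serotype 4 IPD      b: exposed, other-serotype IPD
    c: unexposed, serotype 4 IPD    d: unexposed, other-serotype IPD
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study tables.
or_, lo, hi = odds_ratio_ci(a=40, b=60, c=60, d=240)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 corresponds to a factor that would be reported as significantly associated at the 5% level in a univariable comparison.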

Whole Genome Sequencing Analyses
We conducted WGS analyses on S. pneumoniae serotype 4 isolates from Calgary and Edmonton as well as background serotype 4 isolates collected from other provinces in Canada at the National Microbiology Laboratory in Winnipeg, Manitoba, as described elsewhere (19). We prepared DNA samples using Epicenter MasterPure Complete DNA and RNA Extraction Kit (Mandel Scientific, https://www.mandel.ca) and created libraries using Nextera sample preparation kits (Illumina, https://www.illumina.com) with 300 bp (n = 140) and 150 bp (n = 50) paired-end indexed reads generated on the Illumina NextSeq platform. We submitted read data for all S. pneumoniae serotype 4 isolates from Alberta to the National Center for Biotechnology Information Sequence Read Archive (BioProject accession no. PRJNA693536). We assessed the quality of the reads using FastQC version 0.11.4 (20) and assembled them using Shovill (Galaxy version 1.0.4+galaxy0) (21).

Populations
We identified 190 IPD serotype 4 cases in adults (96 from Calgary, 94 from Edmonton) during 2010-2018 and used WGS to analyze isolates obtained from those patients. A total of 1,008 adults sought treatment at Calgary hospitals with IPD during 2010-2018; of these, 100 (10%) cases involved serotype 4 IPD. For clinical analysis, we completed chart reviews for 97% of the 1,008 IPD cases. For cases without full chart reviews, we collected basic demographic information from notifiable disease reports and laboratory reports and included the information in the analysis when possible. Of the 30 cases without a full chart review, 57% lacked one because the patient declined to participate in the study. Chart reviews were often incomplete: smoking status was not consistently reported, leaving 12% of reviews missing large amounts of information, and 56% of reviews lacked a clear indication of indigenous status. We included the 967 patients with full information available in the multivariable analysis.

Epidemic Curves
In Calgary, after PCV7 introduction, serotype 4 had almost disappeared by 2009-2010. The outbreak among adults peaked in 2015-2016 (Figure 1, panel A). The number of cases of serotype 4 decreased and no cases occurred after July 2018, suggesting resolution of the outbreak. In Edmonton, the decline of serotype 4 after PCV7 introduction was less pronounced and the outbreak had 2 peaks, a smaller one in 2011 and a larger one in 2016-2017 (Figure 1). Serotype 4, which was originally prevalent, declined in the initial period after PCV7 introduction but then increased in 2011 after PCV13 was introduced.

Clinical Analysis
Homelessness, illicit drug use, alcohol abuse, and smoking were overrepresented as risk factors among patients with cases of serotype 4 IPD in data from the univariable analysis (Table 1). Persons with underlying conditions who were immunocompromised were underrepresented among those with serotype 4 IPD (Table 1). People with serotype 4 IPD were also younger (mean age 47.0 years, 95% CI 44.2-49.8 years) than those with non-serotype 4 IPD (mean age 58.4 years, 95% CI 57.2-59.5 years) during 2010-2018 (t-test p value <0.001). The most common diagnosis for serotype 4 IPD was bacteremic pneumonia (82%). All serotype 4 IPD cases that occurred during 2010-2018 were susceptible to penicillin, ceftriaxone, and erythromycin.
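The reported age difference can be roughly sanity-checked from the published means and 95% CIs alone. The sketch below backs out each standard error from its CI and applies a normal approximation; this is an assumption made for illustration and not the Student t-test the analysis actually used, so it should be read only as a consistency check on the reported p value.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate a standard error from a symmetric 95% CI (assumes normality)."""
    return (upper - lower) / (2 * z)


def two_sided_p_from_z(z_value):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z_value) / math.sqrt(2))


# Means and 95% CIs as reported in the text.
mean_st4, se_st4 = 47.0, se_from_ci(44.2, 49.8)
mean_other, se_other = 58.4, se_from_ci(57.2, 59.5)

# Standard error of the difference between two independent means.
se_diff = math.hypot(se_st4, se_other)
z_stat = (mean_other - mean_st4) / se_diff
p_value = two_sided_p_from_z(z_stat)

print(f"difference = {mean_other - mean_st4:.1f} years, z = {z_stat:.2f}, p = {p_value:.1e}")
```

Under these assumptions the test statistic is well above 3.29, which is in line with the reported p value of <0.001.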
In results from multivariable logistic regression, we found that being male, being <65 years of age, experiencing homelessness, having a diagnosis of pneumonia or empyema, or using illicit drugs were associated with serotype 4 IPD (Table 2). Alcohol abuse was not significantly associated with serotype 4 IPD in the multivariable logistic regression, suggesting that the association seen in the univariable analysis was attributable to confounding by another factor (Table 2).

WGS
We conducted WGS analyses on 96 S. pneumoniae serotype 4 isolates from Calgary, 94 from Edmonton, and 37 background serotype 4 isolates from the National Microbiology Laboratory, collected from other provinces in Canada (19). Illumina MiSeq sequencing yielded an average 817,775 reads/genome, and average genome coverage was 91X. De novo assembly resulted in an average contig length of 45,875 nt and an N50 length of 85,135 nt.
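For readers less familiar with assembly statistics, the sketch below shows how a mean contig length and an N50 are conventionally derived from a list of contig lengths. The contig lengths are hypothetical toy values, not data from these assemblies, and the function is an illustration of the standard definition rather than the code used by the assembly pipeline.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the assembly size."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths in nucleotides.
toy_contigs = [100, 80, 60, 40, 20]
mean_length = sum(toy_contigs) / len(toy_contigs)
toy_n50 = n50(toy_contigs)
print(f"mean contig length = {mean_length:.0f} nt, N50 = {toy_n50} nt")
```

Because N50 is weighted toward long contigs, it is normally larger than the mean contig length, as is the case for the averages reported above.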
The 190 S. pneumoniae serotype 4 genomes from Alberta clustered into 3 major phylogenetic clades (Figure 2); each clade was associated with an MLST. Most isolates (83.7%, n = 159) were located in clade A and were MLST type ST244. Isolates in clade A were geographically relatively evenly distributed between Calgary (n = 69) and Edmonton (n = 90) and temporally after 2010 (n = 4); ≈19 isolates per year were found during 2011-2018. Clade B (26). Clade C was associated with ST205 (n = 23) and ST15531 (n = 1), a single-locus variant of ST205. Twenty-three of the 24 isolates in clade C were collected from the Calgary region. Although another ST205 isolate from Edmonton and the TIGR4 reference strain (National Center for Biotechnology Information accession no. NC_003028.3) were proximal to clade C in the phylogenetic tree, ClusterPicker excluded them from the clade based on the clustering thresholds used. A further 2 isolates from Calgary and a third from Edmonton were distant phylogenetic outliers of ST2213, ST7776, and ST11662. Most national background isolates collected from other provinces (n = 36) clustered within the ST205 clade C lineage (n = 23), but the ST244 clade A lineage was predominantly associated with Alberta, with fewer national isolates present (n = 10) (Appendix Figure 1, https://wwwnc.cdc.gov/EID/article/27/7/20-4403-App1.pdf). Further phylogenetic analysis of the ST244 isolates from Alberta identified 6 major clades with isolates clustered by city (Figure 3). Clades A1 (19 of 21) and A2 (8 of 8) mainly comprised isolates from Edmonton, whereas clades A3 (9 of 10), A4 (21 of 25), A5 (19 of 19), and A6 (5 of 5) mainly comprised isolates from the Calgary area. Clade A1 emerged in Edmonton in 2011, and clade A2 followed 4 years later in 2015; in Calgary, clade A6 was first seen in 2011, followed by clades A4 in 2013, and A3 and A5 in 2014, with A5 expanding in 2015 (Appendix Figure 2).
Additional information about homelessness, death, ICU admittance, and risk factors was available for case-patients from Calgary. Clade A3 had the highest proportion of isolates associated with homelessness (7 of 9 isolates), whereas clade A4 had the lowest (7 of 21 isolates; Figure 3). There was also a relatively high number of isolates (3 of 5) in clade A6 and from the miscellaneous nonclustered strains from Calgary (10 of 14) associated with homelessness. Although only about half of the isolates in clade A5 were associated with homelessness (10 of 19), a subgroup of 7 of these isolates were identical to each other (no SNV difference). Among the ST205 isolates, 6 of the 23 Calgary isolates were associated with homelessness. We observed no clustering of isolates associated with the other background information (death, risk factors, or ICU admittance).
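The term single-locus variant used for clade C refers to two sequence types whose allelic profiles differ at exactly 1 of the 7 housekeeping loci in the pneumococcal MLST scheme. A minimal sketch of that comparison follows; the allele numbers are hypothetical placeholders and are not the real profiles of ST205 or ST15531.

```python
# The 7 loci of the S. pneumoniae MLST scheme.
LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def differing_loci(profile_a, profile_b):
    """List the loci at which two allelic profiles carry different alleles."""
    return [locus for locus in LOCI if profile_a[locus] != profile_b[locus]]


def is_single_locus_variant(profile_a, profile_b):
    """True when the two profiles differ at exactly one locus."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Hypothetical allele numbers for illustration only.
founder = {"aroE": 1, "gdh": 2, "gki": 3, "recP": 4, "spi": 5, "xpt": 6, "ddl": 7}
variant = dict(founder, ddl=99)          # differs only at ddl
unrelated = dict(founder, aroE=50, gdh=51)  # differs at two loci

print(differing_loci(founder, variant), is_single_locus_variant(founder, variant))
print(differing_loci(founder, unrelated), is_single_locus_variant(founder, unrelated))
```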

Discussion
Outbreaks of IPD have been described most often in vulnerable populations and groups living in crowded conditions, but only a few serotypes have been described in association with outbreaks (27). Outbreaks of disease can be characterized as a temporal increase of disease from epidemiologically linked cases. The temporal relation can vary depending on the causative organism. Pneumococcal outbreaks may occur over a period of several years. In this study, phylogenetic analyses were used to support epidemiologic information linking a susceptible population with a particular serotype. In Calgary we have observed 2 large outbreaks of IPD that particularly affected homeless persons. The one during 2005-2007 was largely caused by serotype 5, although an increase in serotype 8 was also observed (8). In the more recent outbreak during 2010-2018, the incidence of serotype 4 increased. Phylogenetic analysis indicated relatively high genetic variability among the serotype 4 isolates collected over this period. Previously, we conducted WGS on a small sample of serotype 5 cases associated with the 2005-2007 outbreak; results indicated that all isolates were from the same genetic clone (8). In our analysis of the predominant serotype 4 ST244 clone in Alberta, we observed higher diversity with some clustering regionally in Calgary and Edmonton, as well as some temporal clustering and clustering in homeless persons during 2014-2016. There were also some genetically diverse isolates of S. pneumoniae serotype 4 broadly disseminated throughout the community, including among persons who were not homeless. No clustering was observed by age group, gender, site of bacterial isolation, or disease severity, which may indicate that the rise in disease was not because of the emergence of a single, more transmissible clone of serotype 4. 
Because of the longer temporal period over which pneumococcal outbreaks occur, some degree of genetic drift is expected, with strains being disseminated among the susceptible population and sublineages emerging more acutely in pockets that facilitate transmission, forming diversified subclades within the overall outbreak. The relative diversity among subclades can be thought of as smaller outbreaks of more clonal strains within the overarching dissemination of the original strain.
From a genetic perspective, the phylogeny representing the nationwide breadth of serotype 4 strains had a maximum 1,472 SNVs and an average 192 SNVs between strains (Appendix Figure 1), in contrast with the larger overall outbreak lineage (clade A) with a maximum 148 SNVs and average 22 SNVs. Further clonality can be seen among the subclades with each having ≈5 SNVs difference within and ≈10 SNVs between subclades (Appendix Table). A recent report of an outbreak of serotype 5 IPD in British Columbia, Canada stated that its strains differed by only ≈10 SNVs over a 3.5-year period (28).
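The maximum and average SNV differences quoted above are summaries over all pairs of isolates in a core alignment. The sketch below shows one simple way such pairwise counts could be tallied; the toy alignment and isolate names are hypothetical, and skipping ambiguous or gapped positions is an assumption of this example rather than a description of the pipeline used in the study.

```python
from itertools import combinations


def snv_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ,
    skipping positions where either sequence is ambiguous or gapped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def summarize_pairwise(alignment):
    """Return (maximum, mean) SNV distance over all isolate pairs."""
    distances = [
        snv_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    ]
    return max(distances), sum(distances) / len(distances)


# Hypothetical toy alignment of variable sites only.
toy_alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "ACCTACGA",
    "iso4": "TCCTNCGA",
}
max_snv, mean_snv = summarize_pairwise(toy_alignment)
print(f"maximum = {max_snv} SNVs, mean = {mean_snv:.2f} SNVs")
```

Restricting the same summary to the isolates of one clade or subclade gives the within-group values, which is how a nationwide phylogeny can show far larger distances than the outbreak lineage nested inside it.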
The rise of serotype 4 IPD cases occurred during a period of widespread PCV13 use in children, raising questions about the reservoirs of this strain. Before that period, during the period of PCV7 use in children, serotype 4 was largely controlled at all ages, reflecting direct immunity in vaccine recipients and indirect immunity in unvaccinated adults. The reemergence of serotype 4 cases, primarily among adults, suggests reduced herd immunity. When Alberta switched from PCV7 to PCV13 in 2010, there was also a switch from a 4-dose schedule of PCV7 to a 3-dose PCV13 schedule for children. This change raises a question about whether the reduced-dose schedule in children, although still providing direct protection, might be less effective in reducing nasopharyngeal carriage, leading to asymptomatic transmission and reduced herd immunity. A study of pneumococcal carriage among children in Calgary has previously shown the near elimination of serotype 4 carriage after the introduction of PCV7, supporting this possible explanation (29). More recent carriage studies in children in 2016 and 2018, well after the introduction of PCV13, found that serotype 4 was not identified in any sample tested by conventional culture but was identified in 3.5% of children by using PCR (7). It is also possible that PCV13 may not reduce nasopharyngeal carriage and asymptomatic carriage of all vaccine serotypes as effectively as PCV7 did, regardless of the change in number of doses. In support of this possibility, PCV13 has known limited effectiveness to reduce serotype 3 IPD and possibly nasopharyngeal carriage, as described in a 2019 review (30). Serotype 4 IPD was associated with being male or a current user of illicit drugs or experiencing homelessness. Serotype 4 IPD case-patients had lower odds of having an immunocompromising illness, which may be partially associated with being younger, although the association remained when we adjusted for age.
We previously reported that IPD is significantly overrepresented in homeless persons compared with the general population, regardless of season (9). Although the 23-valent pneumococcal polysaccharide vaccine is recommended for homeless persons in Canada, among those for whom we were able to obtain records, vaccination rates were very low (9,11).
The main limitation of this study is that we had complete clinical data only from Calgary. In addition, the total population of the surveillance area was 3,041,765 and for a relatively rare disease like IPD, local random variation in prevalent serotypes may limit the generalizability of our results.
Similar to serotype 5 during the 2005-2007 outbreak, serotype 4 also migrated across a large geographic area in western Canada and was seen in Victoria, British Columbia (31,32). Pneumococcal outbreaks have been reported in overcrowded jails, homeless shelters, and care homes (8,31,33-36). One study found recurrent infections were 5-fold higher among persons who were homeless than those who were not (37). Another study found most outbreaks of pneumococcal disease occurred in crowded settings (38). It is clear that homelessness and drug use are risk factors for illness and should be considered indicators for vaccination. Although we acknowledge the challenge of delivering vaccines to homeless persons, on the basis of these results, we recommended a public health initiative, currently under consideration by public health officials in Alberta, to target the homeless population of Calgary for publicly funded vaccination with both PCV13 and 23-valent pneumococcal polysaccharide vaccine.