Added Value of Next-Generation Sequencing for Multilocus Sequence Typing Analysis of a Pneumocystis jirovecii Pneumonia Outbreak

Pneumocystis jirovecii is a major threat for immunocompromised patients, and clusters of pneumocystis pneumonia (PCP) have been increasingly described in transplant units during the past decade. Exploring an outbreak transmission network requires complementary spatiotemporal and strain-typing approaches. We analyzed a PCP outbreak and demonstrated the added value of next-generation sequencing (NGS) for the multilocus sequence typing (MLST) study of P. jirovecii strains. Thirty-two PCP patients were included. Among the 12 solid organ transplant patients, 5 shared a major and unique genotype that was also found as a minor strain in a sixth patient. A transmission map analysis strengthened the suspicion of nosocomial acquisition of this strain for the 6 patients. NGS-MLST enables accurate determination of subpopulation, which allowed excluding other patients from the transmission network. NGS-MLST genotyping approach was essential to deciphering this outbreak. This innovative approach brings new insights for future epidemiologic studies on this uncultivable opportunistic fungus.

Pneumocystis jirovecii is a major threat for immunocompromised patients, and clusters of pneumocystis pneumonia (PCP) have been increasingly described in transplant units during the past decade. Exploring an outbreak transmission network requires complementary spatiotemporal and straintyping approaches. We analyzed a PCP outbreak and demonstrated the added value of next-generation sequencing (NGS) for the multilocus sequence typing (MLST) study of P. jirovecii strains. Thirty-two PCP patients were included. Among the 12 solid organ transplant patients, 5 shared a major and unique genotype that was also found as a minor strain in a sixth patient. A transmission map analysis strengthened the suspicion of nosocomial acquisition of this strain for the 6 patients. NGS-MLST enables accurate determination of subpopulation, which allowed excluding other patients from the transmission network. NGS-MLST genotyping approach was essential to deciphering this outbreak. This innovative approach brings new insights for future epidemiologic studies on this uncultivable opportunistic fungus. P neumocystis jirovecii pneumonia (PCP) is a life-threatening opportunistic disease that poses a particular threat in healthcare units that manage patients with oncohematologic conditions, patients having received solid organ transplants (SOTs), and HIV-infected patients. The incidence of PCP among SOT recipients has increased markedly since the early 2000s, concomitant with the development of more intensive immunosuppressive therapy and increasing organ transplant procedures (1). During 2001-2010, the incidence of PCP among HIV-positive/AIDS patients in France dropped from 2.4 to 0.7 cases/10 5 /year (-14.2%/ year), whereas incidence increased from 0.13 to 0.35 cases/10 5 /year (+13.3%/year) among patients without HIV infection (2). For SOT patients in France, the incidence of PCP increased by 13% during 2004-2010 (2). Moreover, multiple outbreak clusters in this specific population have been reported (3)(4)(5). Patients receiving renal transplants are particularly at risk; most reported PCP clusters have occurred in kidney transplant units (6). These outbreaks support the hypothesis of a de novo person-to-person transmission of P. jirovecii and its potential for nosocomial spread (7).
Exploring an outbreak transmission network requires spatiotemporal and strain-typing approaches. Various genotyping approaches to the study of P. jirovecii epidemiology have been described (8)(9)(10)(11)(12). Genotyping based on multilocus sequence typing (MLST) is standard for strain characterization, especially for organisms that are not easily cultivable, because it provides excellent discriminatory power and reproducibility (3,4,13,14). Multiple P. jirovecii strains isolated from a unique respiratory sample have frequently been reported (6)(7)(8)10). The proportions of these mixed strains in samples varied substantially according to the genotyping method (6)(7)(8). MLST has been used previously to study PCP outbreaks, but first-generation sequencing technology (i.e., Sanger sequencing) presents limitations for subpopulations characterization. In the use of Sanger sequencing technology, minor variant detection is generally restricted to strains with a relative abundance of DNA (>20% of the total amount) and requires an amplicon cloning step and the sequencing of >5 clones (6,(15)(16)(17).
Next-generation sequencing (NGS) technologies yield improved performance. Because of clonal amplification before sequence acquisition, these technologies enable indepth sequencing and the detection of major, minor, and ultra-minor populations within the same sample (18,19).
During 2009-2014, the incidence of PCP in all SOT patients in our institution was <1 case/year. In 2014, six cases of PCP were diagnosed among SOT patients over a 3-month period (May-July) and 6 more through August 2015. These grouped cases involved recipients of various organs, 5 heart recipients, 4 kidney recipients, 1 lung recipient, 1 liver recipient, and 1 recipient of both a heart and a kidney. This atypical distribution and the substantial increase of incidence triggered the exploration of this outbreak. In this study, we evaluated the added value of an NGS-MLST strategy in the specific context of a PCP cluster among SOT patients, taking advantage of the specific characteristics of NGS in terms of subpopulation detection.

Patient Selection and Samples
The Grenoble Alpes University Hospital has a total of 2,200 beds and performs 150 SOTs each year. In 2014, we followed a total of 1,899 SOT patients in our institution (1,380 kidney, 339 liver, 90 heart, and 90 lung recipients). We included 32 PCP patients in the study (Table 1): 12 SOT recipients (mean age 61.6 years, range 34-75 years) and 20 control patients (mean age 57 years, range 7 months-82 years). Control patients were those with sporadic PCP cases diagnosed in the previous 4 years who had diverse underlying conditions and no obvious epidemiologic link. PCP biologic diagnosis was confirmed by quantitative PCR in respiratory samples (bronchoalveolar lavage fluids, bronchoaspiration, or sputum) as described previously (20).

Clinical Epidemiologic Survey
According to the incubation periods described in the literature (median 53 days, range 1-88 days) (21,22), we collected data concerning patients' interactions within the institution from 6 months preceding the date of symptom onset in the first patient until the date of diagnosis of PCP in the last patient. Because colonized patients are potential infectious sources of P. jirovecii (23,24), we considered patients to be possibly contagious throughout the whole incubation period, even in the absence of symptoms and for 3 days after initiating treatment with trimethoprim/sulfamethoxazole. We defined transmission as probable when all of the following criteria were present: 1) the 2 patients were present in the same floor and in the same corridor on the same day, 2) the source patient was in the contagious phase, and 3) the duration from the probable date of transmission to the appearance of the first symptoms was compatible with the incubation period for PCP. If 1 of these criteria was missing, transmission was defined as possible. We defined the incubation period for PCP as the time elapsed from the strain transmission to the onset of symptoms.

MLST Strategy
We used an MLST strategy to genotype P. jirovecii strains, as recently recommended (25). We chose a combination of 3 loci belonging to the superoxide dismutase (SOD), the mitochondrial large subunit of ribosomal RNA (mtLSU), and the cytochrome b (CYTB) genes. This scheme displays a high level of strain discrimination and is adapted for accurate P. jirovecii strain typing (14). We designed new primers for each locus to obtain ≈750 bp amplicons depending on the chosen NGS procedure, which was performed by using GS Junior + (Roche Diagnostics, Meylan, France). Because we used extended sequences to characterize our strains, we could not refer to any previously used sequence type labeling. Thus, we arbitrarily named haplotypes with a capital letter (A), a number (1), and a lowercase letter (a) corresponding to mtLSU, CYTB, and SOD loci. We then labeled final genotypes by the association of the 3 different haplotypes (e.g., A1a). Primers, PCR conditions, and haplotype sequences are summarized in online Technical Appendix Tables 1, 2 (https://wwwnc.cdc. gov/EID/article/23/8/16-1295-Techapp1.pdf).

Amplicon Library Preparation and Emulsion PCR
We extracted total DNA by using the QIAmp DNA Mini Kit (QIAGEN, Hilden, Germany), according to the manufacturer's instructions and as previously described (20), and stored extracts at -20°C until library preparation. We prepared the amplicon library by using the universal tailed amplicon sequencing design. We first amplified each region of interest by using specific oligonucleotides coupled to M13 universal primers (PCR1) and then targeted these universal sequences in a second PCR (PCR2) to add a multiplex identifier and the sequencing primers as described in the Guidelines for Amplicon Experimental Design-454 Sequencing System (Roche Diagnostics). We purified PCR products with magnetic beads from an AMPureXP Kit (Beckmann Coulter, Beverly, MA, USA). After qualitative and quantitative analysis, we diluted purified PCR2 products to 1 × 10 9 molecules/mL and pooled an equal volume of each dilution to generate the amplicon library. We performed emulsion PCR as recommended in the GS Junior + library preparation procedure.

NGS and Limit of Detection
We performed NGS to sequence 700-800 bp fragments, according to the manufacturer's instructions, using a forward and reverse sequencing strategy for better coverage. Ninety-six PCR products were distributed in 2 comparably filled runs. We added a control sample to assess the limit of detection of subpopulations under our experimental conditions. This control consisted of 3 purified mtLSU amplicons belonging to 3 unique haplotypes, based on the results of the first run at 3 different ratios: 98.9%, 1.0%, and 0.1%.

Bioinformatic Analysis
We analyzed NGS results by using Amplicon Variant Analyzer and GS Reference Mapper software (Roche Diagnostics) and compared sequences to their respective reference sequences; GenBank accession nos. were NC_020331.1 (mtLSU), AF146753.1 (SOD), and AF320344.1 (CYTB). Each polymorphism position was verified and visually validated. We applied a 1% threshold for minor variant consideration based on the result of our artificial mix sample. We calculated haplotype frequency by using Amplicon Variant Analyzer software (dividing number of reads corresponding to the haplotype by the total number of reads for 1 locus).

Statistical Analysis and Hunter Index
We compared co-infection proportions between cluster and control populations by using the Fisher exact test, calculated a p value, and defined statistical significance as p<0.05. The Hunter index (D) was calculated to evaluate the discriminatory power of the 3 loci and their association, as previously described (26). Only single strains or strains with an easy extrapolation of the major genotype (variants in only 1 locus or presence of ultra-major variants in all loci) were taken into account for the D calculation. Related samples sharing the outbreak genotype were also excluded, except 1.

NGS Performance
The mean sequencing depths for NGS of P. jirovecii strains were 810× (5×-1,998×, median 739×) and 265× (8×-1,268×, median 233×) for the first and second runs, respectively. Sequence quality was assessed using FastQC software (http://www.bioinformatics.babraham.ac.uk). For the first run, the median Phred score (or Qscore) per base, up to a 750 bp read length, was >24 (corresponding to a probability of error <0.4%); for the second run, the median Phred score was >20 until 400 bp read length (corresponding to a probability of error <1%).

Limit of Detection
In the artificial control sample consisting of 3 mtLSU haplotypes at defined concentrations (C, 98.9%; B, 1%; and F, 0.1%), the C and B haplotypes were correctly detected with NGS at 99% (C) and 1% (B). The ultra-minor F genotype was not detected, a finding consistent with the limited analysis depth obtained for this sequence (220×).

Nucleotide Polymorphisms, Haplotypes, and Genotypes Determination
We genotyped 12 P. jirovecii samples from the SOT patient cluster and 20 samples from unrelated control patients by using the 3 loci NGS-MLST strategy. Analysis of extended 732 bp mtLSU sequences revealed 4 new polymorphisms located both upstream and downstream of the amplicons classically used for genotyping (online Technical Appendix Figure). These polymorphisms correspond to 3 singlenucleotide polymorphisms at positions 13002, 13505, and 13543 in the mitochondrial genome (GenBank accession no. NC_020331.1) and 1 multinucleotide polymorphism between 13554 and 13560. By contrast, extended CYTB and SOD amplicons did not show new polymorphisms.
Among the 32 samples, we identified 22 different haplotypes for mtLSU, 14 for CYTB, and 4 for SOD, with major and minor variants being considered ( Table 2). We identified C2a as the major genotype in 5 SOT patients from the cluster: 1 double (heart and kidney) recipient (T2), 1 heart recipient (T4), and 3 renal transplant recipients (T5, T8, and T10). We also detected this C2a genotype as a minor variant (2%) in 1 other SOT patient belonging to the 2014-2015 outbreak cluster (T7). With major and minor genotypes being considered, 6/12 SOT patients were carrying the C2a genotype. We did not detect this unusual genotype in any control patient, either as a major or minor strain (Table 2).

Epidemiologic Investigation
We noted the transmission networks between the 6 SOT patients infected with the C2a strain, the time to diagnosis, and the incubation for PCP ( Table 3). The transmission map (Figure 1) pointed out potential nosocomial transmission of the C2a strain involving both heart and kidney transplant units, with the index patient being patient T2, who was a long-standing heart recipient who received a kidney transplant 2 months before PCP diagnosis. Transmission was probable in the following patients: between T2 and T5, within the nephrology department; between T2 and T4, within the heart surgery outpatient clinic; between T5 and T8, in the nephrology department; and between T8 and T10, in the transplant outpatient clinic. Transmission between T5 and T7 was considered only possible because the patients were on the same floor the same day but in perpendicular corridors, and the incubation time for patient T5 was relatively short (4 days

Variants Analysis and Mixed Infections
Among the 32 respiratory samples, 11 (Figure 2). The co-infection proportions were comparable between the 2 sequencing runs (61% and 65%).

Discriminatory Index of the Extended Amplicons
We calculated the Hunter index (D) for each locus of 19 strains (online Technical Appendix Table 3), excluding the clonal C2a strains and uninterpretable genotypes (26). The discriminatory index of the mtLSU locus D was 0.754 when considering only known polymorphisms and 0.883 when considering both known and newly described polymorphisms. Similarly, in terms of CYTB and mtLSU loci association, D increased from 0.918 with only known polymorphisms to 0.953 when adding the newly described polymorphisms. When the 3 loci were considered, D further increased from 0.971 to 0.982.

Discussion
An outbreak of 12 diagnosed PCP cases occurred among SOT recipients in our institute during 2014-2015. This cluster consisted of patients carrying different transplanted organs who were followed up in different organ specialty units.
We used an exhaustive NGS-MLST approach to accurately characterize the major and minor strains implicated. To our knowledge, PCP outbreaks among SOT patients have been commonly described in renal transplant patients, reported more rarely in hepatic transplant recipients, and never reported in cardiac transplant patients (although sporadic cases exist in this population). Also, most reported grouped cases of PCP in SOT recipients described cases occurring in patients grafted with the same organ; grouped cases in patients transplanted with different organs rarely have been described (4,5). In our study, even though the SOT cluster patients were followed up by different units, they belonged to the same epidemic event according to genotyping results. Five SOT cluster patients shared a common major genotype, and the transmission map ( Figure 1) pointed out potential nosocomial transmission of this strain, which might have been present in both the cardiology and nephrology transplant units. This genotype was not detected in the control group, confirming that this P. jirovecii strain is not commonly present within our institute. The epidemiologic survey, guided by the genotyping results, identified potential dates and places of transmission between the patients infected with the same P. jirovecii strain. Five probable and 1 possible transmission events between SOT patients were revealed. Of note, 4/5 suspected transmission events occurred in outpatient areas. Waiting rooms and day hospital corridors have been thought to promote person-to-person contact and increase the risk for P. jirovecii transmission (7). The wearing of masks in waiting rooms frequently visited by immunocompromised patients was recently suggested for patients with respiratory symptoms but should also be recommended to any patient, given that asymptomatic carrier state or patients in incubation state might also be at risk for transmission (7,27). Also, patients could have met in certain common areas of the hospital (e.g., cafeteria and corridors), encounters that are usually not in the medical records. Recent studies confirmed the high probability of P. jirovecii airborne transmission or transmission between patients and healthcare workers (23,28). In the affected units, it would have been of interest to screen healthcare workers or air samples to identify asymptomatic carriers of the C2a P. jirovecii strain or its presence in the environment.
In our study, we used a 1% threshold for minor variant detection according to our artificial mix control. By using this threshold, we detected 21/32 (66%) cases with mixed-strain infections and 17/32 (53%) cases with >2 strains. Previous genotyping studies described P. jirovecii multiple infections with diverse proportions of different strains (from 20% to 75%) (29)(30)(31). Hauser et al. (29) and Gits-Muselli et al. (12) described mixed populations in 70% of cases with the use of PCR-single-strand-conformation polymorphism or microsatellite analysis. Our proportion of mixed strains was even higher (85%) when we considered only the control patients, which might more accurately reflect the reality of P. jirovecii strain diversity in a nonoutbreak environment. Ultra-deep sequencing of P. jirovecii strains outside an outbreak context has indicated a 92% proportion of coinfections (32). The proportion we observed with infections involving >2 strains is higher than previously described (30) because of the more accurate variant detection <10% provided by NGS. We noted a difference in the proportion of co-infections between the cluster group (42%) and control group (85%) (p<0.01) despite identical mean sequencing depth. We can hypothesize that the C2a strain has a particular intrinsic virulence or that this cluster has some specific characteristic, but further study would be required.
By using NGS, we found 1 confirmed C2a genotype as the minor strain in patients belonging to the cluster. The ratio among the different P. jirovecii strains can vary over time in a given patient (33). Thus, a previous minor Rectangles indicate hospitalization periods of >1 day; diamonds indicate 1-day presence in the institution (e.g., emergency services, imaging, laboratory, outpatient clinics). Places involved in the transmission network are shown in color, whereas units not involved are shown in gray. Thick arrow refers to probable nosocomial transmission of P. jirovecii between 2 patients who were present in the same place (same floor and same corridor) on the same day. Thin arrow represents possible nosocomial transmission of P. jirovecii between T5 and T7 (patients were in perpendicular corridors). Subscript letters on patient labels indicate type of transplant (H, heart; K, kidney). D, date of diagnosis and treatment initiation; S, date of symptoms onset. Asterisk indicates that patient was carrying the C2a genotype as a minor strain.
strain can theoretically become a major strain over time.
The patient with a minor C2a strain (T7) was thus added to the transmission map. NGS also enabled us to exclude other SOT and control patients from the transmission map with a higher degree of certainty compared with firstgeneration sequencing (Sanger sequencing), which is not accurate for minor strain detection <20%. Therefore, NGS-MLST appears to have a higher accuracy for describing an outbreak event and for specifying the transmission network.
For experimental purposes, we designed new primers to generate longer amplicons. We discovered 4 new mutations within the 732 bp mtLSU amplicon but no new polymorphism for the SOD or CYTB loci. Currently, no established MLST scheme for PCP typing exists; the 3-loci (mtLSU, CYTB, and SOD) scheme has been proposed and previously used in the context of a PCP outbreak (3,14). The 5th European Conference on Infections in Leukemia also recommended this MLST scheme for P. jirovecii typing (25). The internal transcribed spacer regions have been used but are difficult to amplify and are at risk for in vitro recombination (14,34). Extended 732 bp mtLSU and CYTB sequence polymorphisms conferred a high discriminatory power (D>0.95), considered to be sufficient for strain discrimination (35). In addition, mtLSU and CYTB amplification is facilitated because of their mitochondrial origin (multicopy genes). Hence, the lengthened 732 bp mtLSU PCR product could be of interest in light of the previously described 300 or 370 bp amplicons (14). The sequencing of 732 bp mtLSU and CYTB could be proposed as a rapid screening method to evaluate strain convergence, whereas SOD or internal transcribed spacer regions could be sequenced in a second step. Further studies will be needed to assess the added value of these newly described polymorphisms.
Although our study revealed new features in P. jirovecii typing, it did not provide evidence for or against features such as diploidy or mitochondrial heteroplasmy of P. jirovecii during PCP (32,36). None of our 32 samples exhibited a clear 50/50 allele repartition, which would have suggested heterozygosity, but the presence of heterozygote stages among the described genotypes cannot be entirely ruled out. The mitochondrial genes mtLSU and to a lesser extent CYTB displayed higher diversity than the nuclear locus SOD. However, this finding is not sufficient to confirm mitochondrial heteroplasmy because these differences might also have been attributable to the higher rate of polymorphism in the mitochondrial loci. One third of the samples displayed a unique genotype, suggesting that mitochondrial heteroplasmy was not present during these PCP infections.
This study demonstrated the clear added value of NGS-MLST to analyze major and minor strains in epidemiologic P. jirovecii studies. NGS is increasingly attractive because of the rapid development of bioinformatics and the reduced cost of this approach (37). We believe that NGS-MLST represents the next generation of MLST and that this method will become the new standard for strain typing in the next decade, especially for microorganisms that are not cultivable in vitro.  Ms. Charpentier is an assistant professor at Toulouse University Hospital, France, and previously completed her master's degree on this topic in Grenoble Alpes University Hospital, France. Her primary research interests include medical biology with an emphasis on medical mycology and parasitology.
A genus of unicellular fungi, Pneumocystis was likely originally described by Carlos Chagas in 1909 in guinea pigs, although he confused it with a trypanosome and placed it in a new genus, Schizotrypanum. In 1912, Delanoë and Delanoë at the Pasteur Institute published the first description of the new organism as unrelated to trypanosomes and proposed the species name P. carinii in honor of Antonio Carini.
Human Pneumocystis infections were first reported in 1942 by van der Meer and Brug, but not until 1976 did Frenkel report different morphologic and physiologic characteristics of human and rat Pneumocystis isolates. He proposed the name P. jirovecii in honor of Czech parasitologist Otto Jírovec, who reported Pneumocystis as a cause of interstitial pneumonia in infants, although this name change was not accepted by researchers at the time. When Pneumocystis was reclassified from a protozoan to a fungus, the naming convention shifted from the International Code of Zoological Nomenclature to the International Code of Botanical Nomenclature, and the species epithet was modified from jiroveci to jirovecii.