Comparative Omics Analysis of Historic and Recent Isolates of Bordetella pertussis and Effects of Genome Rearrangements on Evolution

Despite high vaccination coverage, pertussis is increasing in many industrialized countries, including the Czech Republic. To better understand Bordetella pertussis resurgence, we analyzed historic strains and recent clinical isolates by using a comparative omics approach. Whole-genome sequencing showed that historic and recent isolates of B. pertussis have substantial variation in genome organization and form separate phylogenetic clusters. Subsequent RNA sequence analysis and liquid chromatography with mass tandem spectrometry analyses showed that these variations translated into discretely separated transcriptomic and proteomic profiles. When compared with historic strains, recent isolates showed increased expression of flagellar genes and genes involved in lipopolysaccharide biosynthesis and decreased expression of polysaccharide capsule genes. Compared with reference strain Tohama I, all strains had increased expression and production of the type III secretion system apparatus. We detected the potential link between observed effects and insertion sequence element–induced changes in gene context only for a few genes.

reads at the Vienna Biocenter Core Facilities Next Generation Sequencing Unit (Vienna, Austria). After quality control, the reads were demultiplexed and quality trimming and adaptor removal from the reads was performed by using Trimmomatic (2).
Reads were mapped to combined transcriptome made of all strains by using the Salmon algorithm (3). Combined transcriptome was built from all annotated transcripts in all strains, and homologous transcripts were conflated into multistrain-representing genes. Gene homology between the strains was determined by using the Roary pipeline (4). Differential gene expression analysis was performed by using DESeq2 (5). Genes with a log2-fold change <-1 or >1 and a q value <0.05 (p value adjusted for multiple testing correction by the method of Benjamini and Hochberg [6]) were considered significantly deregulated.
To see the overall gene expression differences between the groups of recent clinical isolates and vaccine strains, all isolates within each group were treated as replicates of the same sample. RNA sequencing data from the sequencing runs were deposited in the European Nucleotide Archive under project accession no. PRJEB34096.

Protein Sample Preparation and Label-Free Proteomic Analysis by Using Liquid Chromatography with Mass Tandem Spectrometry Analyses
Cultures of B. pertussis were pelleted by centrifugation at 10,000× g at 4°C for 10 min to separate cell pellets and culture supernatants. Cells were resuspended in digestion buffer (100 mmol/L triethylammonium bicarbonate, pH 8.5, 2% sodium deoxycholate) and lysed by sonication. For analysis of supernatant fractions, supernatants were filtered through 0.22-µm filters and precipitated with 10% (wt/vol) trichloracetic acid (Sigma) overnight at 4°C. Precipitated proteins were collected by centrifugation at 14,000× g at 4°C, for 20 min, washed with 80% (wt/vol) acetone, and dissolved in 100 mmol/L triethylammonium bicarbonate, pH 8.5, 2% sodium deoxycholate digestion buffer. Protein concentrations were determined by using the BCA Protein Assay Kit (Thermo Fischer Scientific), and 20 µg of protein/sample were used for protein analysis. Cysteines were reduced with 5 mol/L Tris(2-carboxyethyl)phosphine (at 60°C for 60 min) and blocked with 10 mmol/L methyl methanethiosulfonate (at room temperature for 10 min). Samples were digested with trypsin (trypsin:protein ratio 1:20) at 37°C overnight.
Digestion of samples was stopped by addition of trifluoracetic acid (Sigma) to a final concentration of 1% (vol/vol). Sodium deoxycholate was removed by extraction with ethyl Page 3 of 6 acetate, and peptides were desalted on a C18 column (Michrom Bio, https://www.bioprocessonline.com).
A Nano Reversed Phase Column (EASY-Spray Column, 50 cm × 75 µm internal diameter, PepMap C18, 2-µm particles, 100 Å pore size; Thermo Fisher Scientific) was used for liquid chromatography-mass spectrometry analysis. Mobile phase buffer A was composed of water and 0.1% formic acid. Mobile phase B was composed of acetonitrile and 0.1% formic acid.
Samples were loaded onto the trap column (Acclaim PepMap300, C18, 5 µm, 300 Å wide pore, 300 µm × 5 mm; Thermo Fisher Scientific) at a flow rate of 15 µL/min. Loading buffer was composed of water, 2% acetonitrile, and 0.1% trifluoroacetic acid. Peptides were eluted with a gradient of phase B ranging from 4% to 35% over 60 min at a flow rate of 300 nL/min. Eluting retention time (maximum deviation 0.7 min), and this feature was also used in quantification experiments. Protein abundance was calculated from obtained label-free protein intensities by using he MaxLFQ algorithm (10). Proteins with <4 mass tandem spectrometry spectral counts were removed from the analysis. Statistics and data interpretation were performed by using Perseus version 1.6.2.3 software (11). Normalized label-free intensities were compared pairwise between recent clinical isolates, vaccine strains, and Tohama I. Similarly to transcriptomic analysis, all isolates within each group of isolates from the Czech Republic were treated as replicates of the same sample, and each abundance ratio was tested for significance by using a 2- The hierarchical clustering analysis was generated by using Perseus 1.6.2.3 software (11).
In brief, intensities of label-free quantified proteins were log2-transformed to reduce the effect of outliers. For analysis, the greatly modulated proteins were separated by using a multiple-sample test with the false discovery rate at a cutoff value of 0.05 by using the permutation test (250 randomizations). Hierarchical clustering was performed on Z-score normalized log2-transformed label-free quantified intensities of greatly modulated proteins within either cell-associated or cellfree fractions.

GO Term Enrichment Analysis
To gain a comprehensive functional annotation of the reference genome, GO terms per gene were deduced by using blast2go (13). For the GO term enrichment analysis, deregulated genes were split into up-regulated and down-regulated genes, and each gene set was analyzed separately. Each GO term, which was associated with >1 gene in the gene set, was tested for enrichment in comparison to the whole transcriptome by using the Fisher exact test. Afterwards, determined p values were corrected for multiple testing by using the method of Benjamini and Page 5 of 6 Hochberg (6), summarized by using the Revigo tool (14), and visualized by using Cytoscape (https://cytoscape.org).